HBaseCon 2015: HBase @ CyberAgent

HBaseCon
HBase @ CyberAgent
Toshihiro Suzuki, Hirotaka Kakishima
Who We Are
● Hirotaka Kakishima
o Database Engineer, CyberAgent, Inc.
● Toshihiro Suzuki
o Software Engineer, CyberAgent, Inc.
o Worked on HBase since 2012
o @brfrn169
We authored “Beginner’s Guide to HBase” in Japanese.
Our office is located in Akihabara, Japan.
Agenda
● About CyberAgent & Ameba
● HBase @ CyberAgent
o Our HBase History
o Use Case: Social Graph Database
About CyberAgent
CyberAgent, Inc. (https://www.cyberagent.co.jp/en/)
● Advertising (agency, tech)
● Games
● Ameba
What’s Ameba?
● Blogging/Social Networking/Game Platform
● 40 million users
Ranking of Domestic Internet Services
[Chart: ranking of websites by monthly unique visitors, desktop and smartphone, by Nielsen 2014]
http://www.nielsen.com/jp/ja/insights/newswire-j/press-release-chart/nielsen-news-release-20141216.html
● Ameba Blog: 1.9 billion blog articles
● Ameba Pigg
● … and more platform services
HBase @ CyberAgent
We Use HBase for
● Log Analysis
● Social Graph
● Recommendations
● Advertising Tech
Our HBase History (1st Gen.)
● For Log Analysis
● HBase 0.90 (CDH3)
[Pipeline diagram: Our Web Application → log transfer (or SCP) & HDFS sink → M/R & store results]
Our HBase History (2nd Gen.)
● For Social Graph Database, 24/7
● HBase 0.92 (CDH4b1), HDFS CDH3u3
● NameNode using Fault Tolerant Server
http://www.nec.com/en/global/prod/express/fault_tolerant/technology.html
Our HBase History (2nd Gen.)
● Replication using our own custom WAL-apply method
● 10 TB of data (excluding HDFS replicas)
● 6 million requests per minute
● Average latency < 20 ms
Our HBase History (3rd Gen.)
● For other social graphs and recommendations
● HBase 0.94 (CDH4.2 〜 CDH4.7)
● NameNode HA
● Chef
● Master-slave replication (some clusters patched HBASE-8207)
Our HBase History (4th Gen.)
● For advertising tech (DSP, DMP, etc.)
● HBase 0.98 (CDH5.3)
● Amazon EC2
● Master-master replication
● Cloudera Manager
Currently
● 10 Clusters in Production
● 10 ~ 50 RegionServers / Cluster
● Uptime:
o 16 months (0.92): Social Graph
o 24 months (0.94): other social graphs
o 2 months (0.98): Advertising Tech
We Cherish the Basics
● Learning the architecture
● Considering table schema (very important)
● Having enough RAM, disks, and network bandwidth
● Splitting large regions and running major compactions at off-peak times (see the sketch below)
● Monitoring metrics & tuning configuration parameters
● Catching up on bug reports in JIRA
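For the region-splitting and major-compaction housekeeping above, a minimal sketch using the client-side admin API, assuming the 0.94/0.98-era HBaseAdmin (the table name "graph" is an example, not from the talk):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class OffPeakMaintenance {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    try {
      admin.split("graph");        // ask the master to split oversized regions
      admin.majorCompact("graph"); // schedule a major compaction at an off-peak time
    } finally {
      admin.close();
    }
  }
}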
Next Challenge
● We are going to migrate our clusters from 0.92 to 1.0
Use Case: Ameba’s Social Graph
The graph data behind our platform for smartphone apps
Requirements
● Scalability
o growing social graph data
● High availability
o 24/7
● Low latency
o for online access
Why HBase
● Auto sharding
● Auto failover
● Low latency
We decided to use HBase and developed a graph database built on it.
How we use HBase
as a Graph Database
System Overview
[Diagram: Clients → Gateway servers → HBase]
Data Model
● Property Graph
[Diagram: node1 --follow--> node2, node1 --follow--> node3, node3 --follow--> node2.
Node properties: node1 {name: Taro, age: 24}, node2 {name: Ichiro, age: 31}, node3 {name: Jiro, age: 54}.
Each follow relationship carries a date property (5/7, 4/1, 3/31).]
API
Graph g = ...
Node node1 = g.addNode();
node1.setProperty("name", valueOf("Taro"));
Node node2 = g.addNode();
node2.setProperty("name", valueOf("Ichiro"));
Relationship rel = node1.addRelationship("follow", node2);
rel.setProperty("date", valueOf("2015-02-19"));
List<Relationship> outRels = node1.out("follow").list();
List<Relationship> inRels = node2.in("follow").list();
Schema Design
● RowKey
o <hash(nodeId)>-<nodeId>
● Column
o n:
o r:<direction>-<type>-<nodeId>
● Value
o Serialized properties
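To make this layout concrete, a minimal sketch in Java; the concrete hash (a short MD5 prefix via commons-codec) and the helper names are illustrative assumptions, not the talk's actual code. Prefixing the key with hash(nodeId) spreads otherwise-sequential node ids across regions, and keeping "-<nodeId>" in the key makes it reversible:

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.hbase.util.Bytes;

public final class GraphKeys {
  private GraphKeys() {}

  // RowKey: <hash(nodeId)>-<nodeId>
  public static byte[] nodeRowKey(String nodeId) {
    // illustrative hash: first 4 hex chars of MD5(nodeId)
    return Bytes.toBytes(DigestUtils.md5Hex(nodeId).substring(0, 4) + "-" + nodeId);
  }

  // Qualifier in family "r": <direction>-<type>-<nodeId>, e.g. "OUTGOING-follow-nodeId2"
  public static byte[] relQualifier(String direction, String type, String nodeId) {
    return Bytes.toBytes(direction + "-" + type + "-" + nodeId);
  }
}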
Schema Design (Example)
Node node1 = g.addNode();
node1.setProperty("name", valueOf("Taro"));
Node node2 = g.addNode();
node2.setProperty("name", valueOf("Ichiro"));
Node node3 = g.addNode();
node3.setProperty("name", valueOf("Jiro"));
Schema Design (Example)
RowKey                 Column  Value
hash(nodeId1)-nodeId1  n:      {“name”: “Taro”}
hash(nodeId2)-nodeId2  n:      {“name”: “Ichiro”}
hash(nodeId3)-nodeId3  n:      {“name”: “Jiro”}
Schema Design (Example)
Relationship rel1 = node1.addRelationship("follow", node2);
rel1.setProperty("date", valueOf("2015-02-19"));
Relationship rel2 = node1.addRelationship("follow", node3);
rel2.setProperty("date", valueOf("2015-02-20"));
Relationship rel3 = node3.addRelationship("follow", node2);
rel3.setProperty("date", valueOf("2015-04-12"));
[Diagram: node1 --follow--> node2, node1 --follow--> node3, node3 --follow--> node2]
Schema Design (Example)
RowKey                 Column                     Value
hash(nodeId1)-nodeId1  n:                         {“name”: “Taro”}
                       r:OUTGOING-follow-nodeId2  {“date”: “2015-02-19”}
                       r:OUTGOING-follow-nodeId3  {“date”: “2015-02-20”}
hash(nodeId2)-nodeId2  n:                         {“name”: “Ichiro”}
                       r:INCOMING-follow-nodeId1  {“date”: “2015-02-19”}
                       r:INCOMING-follow-nodeId3  {“date”: “2015-04-12”}
hash(nodeId3)-nodeId3  n:                         {“name”: “Jiro”}
                       r:OUTGOING-follow-nodeId2  {“date”: “2015-04-12”}
                       r:INCOMING-follow-nodeId1  {“date”: “2015-02-20”}
Each relationship is stored twice: as an OUTGOING column on the source node’s row and as an INCOMING column on the destination node’s row.
Schema Design (Example)
List<Relationship> outRels = node1.out("follow").list();
To answer node1.out("follow"), we read node1’s row and take the columns prefixed r:OUTGOING-follow- (here, the edges to nodeId2 and nodeId3). The whole query touches a single row, as the sketch below illustrates.
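A hedged sketch of that single-row read, reusing the hypothetical GraphKeys helpers from the earlier sketch and a 0.9x-era client API; the table handle and the family name "r" come from the schema above:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class GraphReads {
  // One Get on the node's row, filtered by qualifier prefix.
  public static Result relationships(HTableInterface table, String nodeId,
      String direction, String type) throws IOException {
    Get get = new Get(GraphKeys.nodeRowKey(nodeId));
    get.addFamily(Bytes.toBytes("r"));
    // e.g. "OUTGOING-follow-" matches r:OUTGOING-follow-nodeId2, -nodeId3, ...
    get.setFilter(new ColumnPrefixFilter(Bytes.toBytes(direction + "-" + type + "-")));
    return table.get(get); // each matching cell is one relationship
  }
}

node1.out("follow") corresponds to direction OUTGOING here; node2.in("follow"), shown next, is the same call with INCOMING.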
Schema Design (Example)
List<Relationship> inRels = node2.in("follow").list();
Likewise, node2.in("follow") reads node2’s row and takes the columns prefixed r:INCOMING-follow- (here, the edges from nodeId1 and nodeId3), again a single-row read.
Consistency Problem
● HBase has no native cross-row transaction support
● So an OUTGOING row and its paired INCOMING row can become inconsistent
Consistency Problem
RowKey                 Column                     Value
hash(nodeId1)-nodeId1  n:                         {“name”: “Taro”}
                       r:OUTGOING-follow-nodeId2  {“date”: “2015-02-19”}
hash(nodeId2)-nodeId2  n:                         {“name”: “Ichiro”}
                       r:INCOMING-follow-nodeId1  {“date”: “2015-02-19”}
hash(nodeId3)-nodeId3  n:                         {“name”: “Jiro”}
If the OUTGOING write succeeds but the matching INCOMING write fails (or vice versa), the two rows disagree: that is the inconsistency. The sketch below shows why the two writes cannot be atomic.
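A minimal sketch of the naive client-side write path (not the talk's actual code); Puts.outgoing and Puts.incoming are hypothetical helpers that build the Puts shown in the tables above:

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTableInterface;

public class NaiveFollowWriter {
  // The OUTGOING and INCOMING cells live on different rows (and usually
  // different regions), so they are two separate, non-atomic writes.
  public static void addFollow(HTableInterface table, String from, String to,
      String date) throws IOException {
    table.put(Puts.outgoing(from, "follow", to, date)); // row of `from`
    // <- a failure here leaves the OUTGOING edge without its INCOMING
    //    mirror: exactly the inconsistency shown above
    table.put(Puts.incoming(to, "follow", from, date)); // row of `to`
  }
}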
Coprocessor
● Endpoints
o like a stored procedure in an RDBMS
o push your business logic into the RegionServer
● Observers
o like a trigger in an RDBMS
o insert user code by overriding upcall methods
Using Observers
● We use 2 observers
o WALObserver#postWALWrite
o RegionObserver#postWALRestore
● Both run the same logic
o write the INCOMING row
● The result is eventual consistency between the paired rows
(A hedged sketch of the two observers follows below.)
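As a rough illustration of how the two observers share one code path, a minimal sketch against the 0.98-era coprocessor API (method signatures vary slightly between versions, and IncomingRowWriter is a hypothetical placeholder for the real mirroring logic):

import java.io.IOException;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.BaseWALObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.WALCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.HLogKey;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

// Hypothetical helper standing in for the talk's real logic: scan the WALEdit
// for OUTGOING relationship cells and write the mirrored INCOMING row.
class IncomingRowWriter {
  static void mirror(WALEdit logEdit) throws IOException {
    for (KeyValue kv : logEdit.getKeyValues()) {
      // If the qualifier starts with "OUTGOING-", swap direction and node ids
      // to build the INCOMING row, then Put it to the graph table (elided).
    }
  }
}

// Normal case: runs right after the WAL entry is persisted to HDFS.
public class GraphWALObserver extends BaseWALObserver {
  @Override
  public void postWALWrite(ObserverContext<WALCoprocessorEnvironment> ctx,
      HRegionInfo info, HLogKey logKey, WALEdit logEdit) throws IOException {
    IncomingRowWriter.mirror(logEdit);
  }
}

// Abnormal case: runs when the WAL entry is replayed on another RegionServer.
class GraphRegionObserver extends BaseRegionObserver {
  @Override
  public void postWALRestore(ObserverContext<RegionCoprocessorEnvironment> ctx,
      HRegionInfo info, HLogKey logKey, WALEdit logEdit) throws IOException {
    IncomingRowWriter.mirror(logEdit);
  }
}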
Using Observers (Normal Case)
1. Client writes only the OUTGOING row
2. RegionServer writes to the Memstore
3. RegionServer writes the WAL to HDFS
4. WALObserver#postWALWrite writes the INCOMING row
5. RegionServer responds to the client
Using Observers (Abnormal Case)
1. Client writes only the OUTGOING row
2. RegionServer writes to the Memstore
3. RegionServer writes the WAL to HDFS, then fails before the INCOMING row is written
On another RegionServer:
4. The WAL entry of the OUTGOING row is replayed
5. RegionObserver#postWALRestore writes the INCOMING row
Summary
● We have used HBase in several projects
o Log Analysis, Social Graph, Recommendations, Advertising Tech
● We developed a graph database built on HBase
o HBase is a good fit for storing social graphs
o We use coprocessors to resolve the consistency problem
If you have any questions,
please tweet @brfrn169.
Questions
1 of 90

Recommended

HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase by
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon 2015: S2Graph - A Large-scale Graph Database with HBase
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon
8.8K views49 slides
Programmatic Bidding Data Streams & Druid by
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidCharles Allen
1.8K views56 slides
Real-time analytics with Druid at Appsflyer by
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerMichael Spector
4.4K views38 slides
Imply at Apache Druid Meetup in London 1-15-20 by
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Jelena Zanko
311 views50 slides
Argus Production Monitoring at Salesforce by
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceHBaseCon
3.2K views21 slides
Game Analytics at London Apache Druid Meetup by
Game Analytics at London Apache Druid MeetupGame Analytics at London Apache Druid Meetup
Game Analytics at London Apache Druid MeetupJelena Zanko
100 views45 slides

More Related Content

What's hot

Druid realtime indexing by
Druid realtime indexingDruid realtime indexing
Druid realtime indexingSeoeun Park
3.1K views19 slides
Migrating to MongoDB: Best Practices by
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMongoDB
7.3K views38 slides
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English by
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
1.1K views59 slides
Hadoop Ecosystem by
Hadoop EcosystemHadoop Ecosystem
Hadoop EcosystemLior Sidi
2.7K views51 slides
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management by
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB
35.7K views44 slides
AWS Big Data Demystified #4 data governance demystified [security, networ... by
AWS Big Data Demystified #4   data governance demystified   [security, networ...AWS Big Data Demystified #4   data governance demystified   [security, networ...
AWS Big Data Demystified #4 data governance demystified [security, networ...Omid Vahdaty
642 views45 slides

What's hot(20)

Druid realtime indexing by Seoeun Park
Druid realtime indexingDruid realtime indexing
Druid realtime indexing
Seoeun Park3.1K views
Migrating to MongoDB: Best Practices by MongoDB
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best Practices
MongoDB7.3K views
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English by Omid Vahdaty
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
Omid Vahdaty1.1K views
Hadoop Ecosystem by Lior Sidi
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
Lior Sidi2.7K views
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management by MongoDB
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB35.7K views
AWS Big Data Demystified #4 data governance demystified [security, networ... by Omid Vahdaty
AWS Big Data Demystified #4   data governance demystified   [security, networ...AWS Big Data Demystified #4   data governance demystified   [security, networ...
AWS Big Data Demystified #4 data governance demystified [security, networ...
Omid Vahdaty642 views
Webinar: Choosing the Right Shard Key for High Performance and Scale by MongoDB
Webinar: Choosing the Right Shard Key for High Performance and ScaleWebinar: Choosing the Right Shard Key for High Performance and Scale
Webinar: Choosing the Right Shard Key for High Performance and Scale
MongoDB1.3K views
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid by Yahoo Developer Network
July 2014 HUG : Pushing the limits of Realtime Analytics using DruidJuly 2014 HUG : Pushing the limits of Realtime Analytics using Druid
July 2014 HUG : Pushing the limits of Realtime Analytics using Druid
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C... by Databricks
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Building a Versatile Analytics Pipeline on Top of Apache Spark with Mikhail C...
Databricks6K views
Apache Spark and MongoDB - Turning Analytics into Real-Time Action by João Gabriel Lima
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionApache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
João Gabriel Lima3.5K views
Amazon aws big data demystified | Introduction to streaming and messaging flu... by Omid Vahdaty
Amazon aws big data demystified | Introduction to streaming and messaging flu...Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...
Omid Vahdaty467 views
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise... by HBaseCon
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...
HBaseCon 2015: HBase as an IoT Stream Analytics Platform for Parkinson's Dise...
HBaseCon4.5K views
Webinar: When to Use MongoDB by MongoDB
Webinar: When to Use MongoDBWebinar: When to Use MongoDB
Webinar: When to Use MongoDB
MongoDB6.9K views
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale by ScyllaDB
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
ScyllaDB1.5K views
MongoDB and Hadoop: Driving Business Insights by MongoDB
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
MongoDB3.2K views
Big Data in 200 km/h | AWS Big Data Demystified #1.3 by Omid Vahdaty
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Omid Vahdaty473 views
Web analytics at scale with Druid at naver.com by Jungsu Heo
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
Jungsu Heo5.9K views
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset by Hortonworks
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and SupersetInteractive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Hortonworks3.6K views
High Performance Applications with MongoDB by MongoDB
High Performance Applications with MongoDBHigh Performance Applications with MongoDB
High Performance Applications with MongoDB
MongoDB3K views

Viewers also liked

HBase Meetup Tokyo Summer 2015 #hbasejp by
HBase Meetup Tokyo Summer 2015 #hbasejpHBase Meetup Tokyo Summer 2015 #hbasejp
HBase Meetup Tokyo Summer 2015 #hbasejpCloudera Japan
3.1K views19 slides
まだ間に合う HBaseCon2016 by
まだ間に合う HBaseCon2016まだ間に合う HBaseCon2016
まだ間に合う HBaseCon2016Hirotaka Kakishima
2.8K views46 slides
20150625 cloudera by
20150625 cloudera20150625 cloudera
20150625 clouderaRecruit Technologies
3.9K views47 slides
HBaseとSparkでセンサーデータを有効活用 #hbasejp by
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejpFwardNetwork
4.4K views25 slides
HBase×Impalaで作るアドテク 「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer by
HBase×Impalaで作るアドテク「GMOプライベートDMP」@HBaseMeetupTokyo2015SummerHBase×Impalaで作るアドテク「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer
HBase×Impalaで作るアドテク 「GMOプライベートDMP」@HBaseMeetupTokyo2015SummerMichio Katano
6.3K views49 slides
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase by
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBaseHBaseCon
3.2K views30 slides

Viewers also liked(20)

HBase Meetup Tokyo Summer 2015 #hbasejp by Cloudera Japan
HBase Meetup Tokyo Summer 2015 #hbasejpHBase Meetup Tokyo Summer 2015 #hbasejp
HBase Meetup Tokyo Summer 2015 #hbasejp
Cloudera Japan3.1K views
HBaseとSparkでセンサーデータを有効活用 #hbasejp by FwardNetwork
HBaseとSparkでセンサーデータを有効活用 #hbasejpHBaseとSparkでセンサーデータを有効活用 #hbasejp
HBaseとSparkでセンサーデータを有効活用 #hbasejp
FwardNetwork4.4K views
HBase×Impalaで作るアドテク 「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer by Michio Katano
HBase×Impalaで作るアドテク「GMOプライベートDMP」@HBaseMeetupTokyo2015SummerHBase×Impalaで作るアドテク「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer
HBase×Impalaで作るアドテク 「GMOプライベートDMP」@HBaseMeetupTokyo2015Summer
Michio Katano6.3K views
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase by HBaseCon
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon3.2K views
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S... by Cloudera, Inc.
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
HBaseCon 2013: Apache Drill - A Community-driven Initiative to Deliver ANSI S...
Cloudera, Inc.4.7K views
HBaseCon 2015: HBase Operations in a Flurry by HBaseCon
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon4.1K views
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B... by HBaseCon
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
HBaseCon4.1K views
Real-time HBase: Lessons from the Cloud by HBaseCon
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the Cloud
HBaseCon4.5K views
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving by HBaseCon
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon2.6K views
Rolling Out Apache HBase for Mobile Offerings at Visa by HBaseCon
Rolling Out Apache HBase for Mobile Offerings at Visa Rolling Out Apache HBase for Mobile Offerings at Visa
Rolling Out Apache HBase for Mobile Offerings at Visa
HBaseCon2.6K views
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace by HBaseCon
HBaseCon 2015: Solving HBase Performance Problems with Apache HTraceHBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon 2015: Solving HBase Performance Problems with Apache HTrace
HBaseCon4.5K views
Update on OpenTSDB and AsyncHBase by HBaseCon
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
HBaseCon2.6K views
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems by Cloudera, Inc.
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
Cloudera, Inc.6.1K views
Digital Library Collection Management using HBase by HBaseCon
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBase
HBaseCon3.1K views
HBase Data Modeling and Access Patterns with Kite SDK by HBaseCon
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
HBaseCon4.7K views
HBase at Bloomberg: High Availability Needs for the Financial Industry by HBaseCon
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBaseCon6.7K views
Content Identification using HBase by HBaseCon
Content Identification using HBaseContent Identification using HBase
Content Identification using HBase
HBaseCon3.8K views
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS by HBaseCon
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWSHBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
HBaseCon4K views

Similar to HBaseCon 2015: HBase @ CyberAgent

Zeotap: Moving to ScyllaDB - A Graph of Billions Scale by
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleSaurabh Verma
282 views42 slides
Introduction To Apache Pig at WHUG by
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGAdam Kawa
17K views81 slides
Scio - A Scala API for Google Cloud Dataflow & Apache Beam by
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamNeville Li
5.7K views47 slides
GraphGen: Conducting Graph Analytics over Relational Databases by
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesKonstantinos Xirogiannopoulos
548 views37 slides
GraphGen: Conducting Graph Analytics over Relational Databases by
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesPyData
740 views37 slides
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ... by
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...Databricks
2.3K views61 slides

Similar to HBaseCon 2015: HBase @ CyberAgent(20)

Zeotap: Moving to ScyllaDB - A Graph of Billions Scale by Saurabh Verma
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Saurabh Verma282 views
Introduction To Apache Pig at WHUG by Adam Kawa
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
Adam Kawa17K views
Scio - A Scala API for Google Cloud Dataflow & Apache Beam by Neville Li
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Neville Li5.7K views
GraphGen: Conducting Graph Analytics over Relational Databases by PyData
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
PyData740 views
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ... by Databricks
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Databricks2.3K views
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr... by Alexey Zinoviev
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Alexey Zinoviev1.3K views
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016 by Dan Lynn
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dan Lynn457 views
Powerful geographic web framework GeoDjango by OMEGA (@equal_001)
Powerful geographic web framework GeoDjangoPowerful geographic web framework GeoDjango
Powerful geographic web framework GeoDjango
OMEGA (@equal_001)2.5K views
New! Neo4j AuraDS: The Fastest Way to Get Started with Data Science in the Cloud by Neo4j
New! Neo4j AuraDS: The Fastest Way to Get Started with Data Science in the CloudNew! Neo4j AuraDS: The Fastest Way to Get Started with Data Science in the Cloud
New! Neo4j AuraDS: The Fastest Way to Get Started with Data Science in the Cloud
Neo4j130 views
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム by Masayuki Matsushita
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォームPivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Pivotal Greenplum 次世代マルチクラウド・データ分析プラットフォーム
Sorry - How Bieber broke Google Cloud at Spotify by Neville Li
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
Neville Li3.2K views
Introduction to Spark Datasets - Functional and relational together at last by Holden Karau
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
Holden Karau640 views
Node in Production at Aviary by Aviary
Node in Production at AviaryNode in Production at Aviary
Node in Production at Aviary
Aviary 2.2K views
Webinar: How Banks Use MongoDB as a Tick Database by MongoDB
Webinar: How Banks Use MongoDB as a Tick DatabaseWebinar: How Banks Use MongoDB as a Tick Database
Webinar: How Banks Use MongoDB as a Tick Database
MongoDB2.2K views
Ducksboard - A real-time data oriented webservice architecture by Ducksboard
Ducksboard - A real-time data oriented webservice architectureDucksboard - A real-time data oriented webservice architecture
Ducksboard - A real-time data oriented webservice architecture
Ducksboard4K views
WSO2Con USA 2015: An Introduction to the WSO2 Analytics Platform by WSO2
WSO2Con USA 2015: An Introduction to the WSO2 Analytics PlatformWSO2Con USA 2015: An Introduction to the WSO2 Analytics Platform
WSO2Con USA 2015: An Introduction to the WSO2 Analytics Platform
WSO2468 views
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017 by MLconf
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
Venkatesh Ramanathan, Data Scientist, PayPal at MLconf ATL 2017
MLconf1.1K views
Graph Gurus Episode 1: Enterprise Graph by TigerGraph
Graph Gurus Episode 1: Enterprise GraphGraph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise Graph
TigerGraph179 views
Dirty data? Clean it up! - Datapalooza Denver 2016 by Dan Lynn
Dirty data? Clean it up! - Datapalooza Denver 2016Dirty data? Clean it up! - Datapalooza Denver 2016
Dirty data? Clean it up! - Datapalooza Denver 2016
Dan Lynn1.3K views

More from HBaseCon

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes by
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on KubernetesHBaseCon
3.9K views36 slides
hbaseconasia2017: HBase on Beam by
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on BeamHBaseCon
1.3K views26 slides
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei by
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at HuaweiHBaseCon
1.4K views21 slides
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest by
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in PinterestHBaseCon
936 views42 slides
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程 by
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程HBaseCon
1.1K views21 slides
hbaseconasia2017: Apache HBase at Netease by
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at NeteaseHBaseCon
754 views27 slides

More from HBaseCon(20)

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes by HBaseCon
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
HBaseCon3.9K views
hbaseconasia2017: HBase on Beam by HBaseCon
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
HBaseCon1.3K views
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei by HBaseCon
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
HBaseCon1.4K views
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest by HBaseCon
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon936 views
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程 by HBaseCon
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
HBaseCon1.1K views
hbaseconasia2017: Apache HBase at Netease by HBaseCon
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
HBaseCon754 views
hbaseconasia2017: HBase在Hulu的使用和实践 by HBaseCon
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
HBaseCon878 views
hbaseconasia2017: 基于HBase的企业级大数据平台 by HBaseCon
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
HBaseCon701 views
hbaseconasia2017: HBase at JD.com by HBaseCon
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
HBaseCon828 views
hbaseconasia2017: Large scale data near-line loading method and architecture by HBaseCon
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
HBaseCon598 views
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei by HBaseCon
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
HBaseCon683 views
hbaseconasia2017: HBase Practice At XiaoMi by HBaseCon
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
HBaseCon1.8K views
hbaseconasia2017: hbase-2.0.0 by HBaseCon
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
HBaseCon1.8K views
HBaseCon2017 Democratizing HBase by HBaseCon
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
HBaseCon897 views
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest by HBaseCon
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon646 views
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase by HBaseCon
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon608 views
HBaseCon2017 Transactions in HBase by HBaseCon
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
HBaseCon1.8K views
HBaseCon2017 Highly-Available HBase by HBaseCon
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
HBaseCon1.1K views
HBaseCon2017 Apache HBase at Didi by HBaseCon
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon996 views
HBaseCon2017 gohbase: Pure Go HBase Client by HBaseCon
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon1.7K views

Recently uploaded

What Can Employee Monitoring Software Do?​ by
What Can Employee Monitoring Software Do?​What Can Employee Monitoring Software Do?​
What Can Employee Monitoring Software Do?​wAnywhere
21 views11 slides
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI... by
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...Marc Müller
36 views83 slides
Neo4j y GenAI by
Neo4j y GenAI Neo4j y GenAI
Neo4j y GenAI Neo4j
42 views41 slides
Keep by
KeepKeep
KeepGeniusee
73 views10 slides
SAP FOR CONTRACT MANUFACTURING.pdf by
SAP FOR CONTRACT MANUFACTURING.pdfSAP FOR CONTRACT MANUFACTURING.pdf
SAP FOR CONTRACT MANUFACTURING.pdfVirendra Rai, PMP
11 views2 slides
DSD-INT 2023 Modelling litter in the Yarra and Maribyrnong Rivers (Australia)... by
DSD-INT 2023 Modelling litter in the Yarra and Maribyrnong Rivers (Australia)...DSD-INT 2023 Modelling litter in the Yarra and Maribyrnong Rivers (Australia)...
DSD-INT 2023 Modelling litter in the Yarra and Maribyrnong Rivers (Australia)...Deltares
9 views34 slides

Recently uploaded(20)

What Can Employee Monitoring Software Do?​ by wAnywhere
What Can Employee Monitoring Software Do?​What Can Employee Monitoring Software Do?​
What Can Employee Monitoring Software Do?​
wAnywhere21 views
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI... by Marc Müller
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Marc Müller36 views
Neo4j y GenAI by Neo4j
Neo4j y GenAI Neo4j y GenAI
Neo4j y GenAI
Neo4j42 views
DSD-INT 2023 Modelling litter in the Yarra and Maribyrnong Rivers (Australia)... by Deltares
DSD-INT 2023 Modelling litter in the Yarra and Maribyrnong Rivers (Australia)...DSD-INT 2023 Modelling litter in the Yarra and Maribyrnong Rivers (Australia)...
DSD-INT 2023 Modelling litter in the Yarra and Maribyrnong Rivers (Australia)...
Deltares9 views
MariaDB stored procedures and why they should be improved by Federico Razzoli
MariaDB stored procedures and why they should be improvedMariaDB stored procedures and why they should be improved
MariaDB stored procedures and why they should be improved
Consulting for Data Monetization Maximizing the Profit Potential of Your Data... by Flexsin
Consulting for Data Monetization Maximizing the Profit Potential of Your Data...Consulting for Data Monetization Maximizing the Profit Potential of Your Data...
Consulting for Data Monetization Maximizing the Profit Potential of Your Data...
Flexsin 15 views
SUGCON ANZ Presentation V2.1 Final.pptx by Jack Spektor
SUGCON ANZ Presentation V2.1 Final.pptxSUGCON ANZ Presentation V2.1 Final.pptx
SUGCON ANZ Presentation V2.1 Final.pptx
Jack Spektor22 views
Cycleops - Automate deployments on top of bare metal.pptx by Thanassis Parathyras
Cycleops - Automate deployments on top of bare metal.pptxCycleops - Automate deployments on top of bare metal.pptx
Cycleops - Automate deployments on top of bare metal.pptx
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra... by Marc Müller
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra....NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
Marc Müller38 views
Copilot Prompting Toolkit_All Resources.pdf by Riccardo Zamana
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdf
Riccardo Zamana6 views
DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit... by Deltares
DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit...DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit...
DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit...
Deltares13 views
Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ... by Donato Onofri
Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ...Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ...
Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ...
Donato Onofri711 views
DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge... by Deltares
DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge...DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge...
DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge...
Deltares16 views
DSD-INT 2023 - Delft3D User Days - Welcome - Day 3 - Afternoon by Deltares
DSD-INT 2023 - Delft3D User Days - Welcome - Day 3 - AfternoonDSD-INT 2023 - Delft3D User Days - Welcome - Day 3 - Afternoon
DSD-INT 2023 - Delft3D User Days - Welcome - Day 3 - Afternoon
Deltares13 views
DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko... by Deltares
DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko...DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko...
DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko...
Deltares11 views

HBaseCon 2015: HBase @ CyberAgent

  • 1. HBase @ CyberAgent Toshihiro Suzuki, Hirotaka Kakishima
  • 2. Who We Are ● Hirotaka Kakishima o Database Engineer, CyberAgent, Inc. ● Toshihiro Suzuki o Software Engineer, CyberAgent, Inc. o Worked on HBase since 2012 o @brfrn169
  • 3. Who We Are We authored Beginner’s Guide to HBase in Japanese
  • 4. Who We Are Our office is located in Akihabara, Japan
  • 5. Agenda ● About CyberAgent & Ameba ● HBase @ CyberAgent Our HBase History Use Case: Social Graph Database
  • 7. ● Advertising (agency, tech) ● Games ● Ameba https://www.cyberagent.co.jp/en/ CyberAgent, Inc.
  • 9. ● Blogging/Social Networking/Game Platform ● 40 million users What’s Ameba?
  • 10. Ranking of Domestic Internet Services Desktop Smartphone by Nielsen 2014 http://www.nielsen.com/jp/ja/insights/newswire-j/press-release-chart/nielsen-news-release-20141216.html Rank WebSite Name Monthly Unique Visitors WebSite Name Monthly Unique VisitorsRank
  • 11. Ameba Blog 1.9 billion blog articles
  • 15. We Use HBase for Log Analysis Social Graph Recommendations Advertising Tech
  • 16. ● For Log Analysis ● HBase 0.90 (CDH3) Our HBase History (1st Gen.) Log or SCP Transfer & HDFS Sink M/R & Store Results Our Web Application
  • 17. Our HBase History (2nd Gen.) ● For Social Graph Database, 24/7 ● HBase 0.92 (CDH4b1), HDFS CDH3u3 ● NameNode using Fault Tolerant Server http://www.nec.com/en/global/prod/express/fault_tolerant/technology.html
  • 18. Our HBase History (2nd Gen.) ● Replication using original WAL apply method ● 10TB (not considering HDFS Replicas) ● 6 million requests per minutes ● Average Latency < 20ms
  • 19. Our HBase History (3rd Gen.) ● For other social graph, recommendations ● HBase 0.94 (CDH4.2 〜 CDH4.7) ● NameNode HA ● Chef ● Master-slave replication (some clusters patched HBASE-8207)
  • 20. Our HBase History (4th Gen.) ● For advertising tech (DSP, DMP, etc.) ● HBase 0.98 (CDH5.3) ● Amazon EC2 ● Master-master replication ● Cloudera Manager
  • 21. Currently ● 10 Clusters in Production ● 10 ~ 50 RegionServers / Cluster ● uptime: 16 months (0.92) : Social Graph 24 months (0.94) : Other Social Graph 2 months (0.98) : Advertising tech
  • 22. We Cherish the Basics ● Learning architecture ● Considering Table Schema (very important) ● Having enough RAM, DISKs, Network Bandwidth ● Splitting large regions and running major compaction at off-peak ● Monitoring metrics & tuning configuration parameters ● Catching up BUG reports @ JIRA
  • 23. Next Challenge ● We are going to migrate cluster from 0.92 to 1.0
  • 25. Graph data Platform for Smartphone Apps
  • 26. Requirements ● Scalability o growing social graph data ● High availability o 24/7 ● Low latency o for online access
  • 27. Why HBase ● Auto sharding ● Auto failover ● Low latency We decided to use HBase and developed graph database built on it
  • 28. How we use HBase as a Graph Database
  • 30. Data Model ● Property Graph follow follow follow node1 node2 node3
  • 31. Data Model ● Property Graph follow follow follow node1 node2 node3 name Taro age 24 date 5/7 name Ichiro age 31 date 4/1 date 3/31 name Jiro age 54
  • 32. API Graph g = ... Node node1 = g.addNode(); node1.setProperty("name", valueOf("Taro")); Node node2 = g.addNode(); node2.setProperty("name", valueOf("Ichiro")); Relationship rel = node1.addRelationship("follow", node2); rel.setProperty("date", valueOf("2015-02-19")); List<Relationship> outRels = node1.out("follow").list(); List<Relationship> inRels = node2.in("follow").list();
  • 33. API Graph g = ... Node node1 = g.addNode(); node1.setProperty("name", valueOf("Taro")); Node node2 = g.addNode(); node2.setProperty("name", valueOf("Ichiro")); Relationship rel = node1.addRelationship("follow", node2); rel.setProperty("date", valueOf("2015-02-19")); List<Relationship> outRels = node1.out("follow").list(); List<Relationship> inRels = node2.in("follow").list();
  • 34. API Graph g = ... Node node1 = g.addNode(); node1.setProperty("name", valueOf("Taro")); Node node2 = g.addNode(); node2.setProperty("name", valueOf("Ichiro")); Relationship rel = node1.addRelationship("follow", node2); rel.setProperty("date", valueOf("2015-02-19")); List<Relationship> outRels = node1.out("follow").list(); List<Relationship> inRels = node2.in("follow").list();
  • 35. API Graph g = ... Node node1 = g.addNode(); node1.setProperty("name", valueOf("Taro")); Node node2 = g.addNode(); node2.setProperty("name", valueOf("Ichiro")); Relationship rel = node1.addRelationship("follow", node2); rel.setProperty("date", valueOf("2015-02-19")); List<Relationship> outRels = node1.out("follow").list(); List<Relationship> inRels = node2.in("follow").list();
  • 36. API Graph g = ... Node node1 = g.addNode(); node1.setProperty("name", valueOf("Taro")); Node node2 = g.addNode(); node2.setProperty("name", valueOf("Ichiro")); Relationship rel = node1.addRelationship("follow", node2); rel.setProperty("date", valueOf("2015-02-19")); List<Relationship> outRels = node1.out("follow").list(); List<Relationship> inRels = node2.in("follow").list();
  • 37. API Graph g = ... Node node1 = g.addNode(); node1.setProperty("name", valueOf("Taro")); Node node2 = g.addNode(); node2.setProperty("name", valueOf("Ichiro")); Relationship rel = node1.addRelationship("follow", node2); rel.setProperty("date", valueOf("2015-02-19")); List<Relationship> outRels = node1.out("follow").list(); List<Relationship> inRels = node2.in("follow").list();
  • 38. API Graph g = ... Node node1 = g.addNode(); node1.setProperty("name", valueOf("Taro")); Node node2 = g.addNode(); node2.setProperty("name", valueOf("Ichiro")); Relationship rel = node1.addRelationship("follow", node2); rel.setProperty("date", valueOf("2015-02-19")); List<Relationship> outRels = node1.out("follow").list(); List<Relationship> inRels = node2.in("follow").list();
  • 39. Schema Design ● RowKey o <hash(nodeId)>-<nodeId> ● Column o n: o r:<direction>-<type>-<nodeId> ● Value o Serialized properties
  • 40. Schema Design (Example) Node node1 = g.addNode(); node1.setProperty("name", valueOf("Taro")); Node node2 = g.addNode(); node2.setProperty("name", valueOf("Ichiro")); Node node3 = g.addNode(); node3.setProperty("name", valueOf("Jiro"));
  • 45. Schema Design (Example)
    RowKey                 Column  Value
    hash(nodeId1)-nodeId1  n:      {“name”: “Taro”}
    hash(nodeId2)-nodeId2  n:      {“name”: “Ichiro”}
    hash(nodeId3)-nodeId3  n:      {“name”: “Jiro”}
  • 49. Schema Design (Example)
    Relationship rel1 = node1.addRelationship("follow", node2);
    rel1.setProperty("date", valueOf("2015-02-19"));
    Relationship rel2 = node1.addRelationship("follow", node3);
    rel2.setProperty("date", valueOf("2015-02-20"));
    Relationship rel3 = node3.addRelationship("follow", node2);
    rel3.setProperty("date", valueOf("2015-04-12"));
    [diagram: node1 → node2, node1 → node3, node3 → node2, all “follow”]
  • 53. Schema Design (Example)
    RowKey                 Column                     Value
    hash(nodeId1)-nodeId1  n:                         {“name”: “Taro”}
                           r:OUTGOING-follow-nodeId2  {“date”: “2015-02-19”}
                           r:OUTGOING-follow-nodeId3  {“date”: “2015-02-20”}
    hash(nodeId2)-nodeId2  n:                         {“name”: “Ichiro”}
                           r:INCOMING-follow-nodeId1  {“date”: “2015-02-19”}
                           r:INCOMING-follow-nodeId3  {“date”: “2015-04-12”}
    hash(nodeId3)-nodeId3  n:                         {“name”: “Jiro”}
                           r:OUTGOING-follow-nodeId2  {“date”: “2015-04-12”}
                           r:INCOMING-follow-nodeId1  {“date”: “2015-02-20”}
  • 60. Schema Design (Example)
    List<Relationship> outRels = node1.out("follow").list();
    [diagram: node1’s outgoing “follow” relationships highlighted]
  • 62. Schema Design (Example)
    [the table from slide 53 again; the read touches only node1’s row and its r:OUTGOING-follow-* columns]
  • 66. Schema Design (Example)
    List<Relationship> inRels = node2.in("follow").list();
    [diagram: node2’s incoming “follow” relationships highlighted]
  • 68. Schema Design (Example)
    [the table from slide 53 again; the read touches only node2’s row and its r:INCOMING-follow-* columns]
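In raw HBase client terms, both of the reads above are a single-row Get filtered by qualifier prefix. Here is a minimal sketch against the 0.98-era client API (GraphKeys.rowKey as sketched earlier; opening and closing the table is omitted):

    import java.io.IOException;
    import java.util.List;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RelationshipReads {
      // node1.out("follow").list() boils down to reading node1's row and
      // keeping only family "r" columns whose qualifier starts with
      // "OUTGOING-follow-". Incoming relationships are symmetric: the same
      // Get on node2's row with the prefix "INCOMING-follow-".
      static List<Cell> outgoing(HTableInterface table, String nodeId, String type)
          throws IOException {
        Get get = new Get(GraphKeys.rowKey(nodeId));
        get.addFamily(Bytes.toBytes("r"));
        get.setFilter(new ColumnPrefixFilter(Bytes.toBytes("OUTGOING-" + type + "-")));
        Result result = table.get(get);
        // Each cell's qualifier ends with the other node's id; each value
        // holds the serialized relationship properties ({"date": ...}).
        // Note: listCells() returns null when no columns matched.
        return result.listCells();
      }
    }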
  • 72. Consistency Problem
    ● HBase has no native cross-row transactional support
    ● Possibility of inconsistency between outgoing and incoming rows
  • 73. Consistency Problem
    RowKey                 Column  Value
    hash(nodeId1)-nodeId1  n:      {“name”: “Taro”}
    hash(nodeId2)-nodeId2  n:      {“name”: “Ichiro”}
    hash(nodeId3)-nodeId3  n:      {“name”: “Jiro”}
  • 74. Consistency Problem
    RowKey                 Column                     Value
    hash(nodeId1)-nodeId1  n:                         {“name”: “Taro”}
                           r:OUTGOING-follow-nodeId2  {“date”: “2015-02-19”}
    hash(nodeId2)-nodeId2  n:                         {“name”: “Ichiro”}
                           r:INCOMING-follow-nodeId1  {“date”: “2015-02-19”}  ← Inconsistency: if the system fails between the two writes, only one of these rows exists
    hash(nodeId3)-nodeId3  n:                         {“name”: “Jiro”}
  • 75. Coprocessor
    ● Endpoints
      o like stored procedures in an RDBMS
      o push your business logic into the RegionServer
    ● Observers
      o like triggers in an RDBMS
      o insert user code by overriding upcall methods
  • 76. Using Observers
    ● We use 2 observers
      o WALObserver#postWALWrite
      o RegionObserver#postWALRestore
    ● Both run the same logic (sketched below)
      o write the INCOMING row
    ● Eventual Consistency
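As a rough illustration of this setup, here is a heavily simplified sketch against the 0.98-era coprocessor API. The table name, the mirroring helpers, and the assumption that node ids and types contain no hyphens are ours; real code would also need coprocessor registration, error handling, and batching:

    import java.io.IOException;
    import org.apache.hadoop.hbase.CoprocessorEnvironment;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.coprocessor.WALCoprocessorEnvironment;
    import org.apache.hadoop.hbase.coprocessor.WALObserver;
    import org.apache.hadoop.hbase.regionserver.wal.HLogKey;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
    import org.apache.hadoop.hbase.util.Bytes;

    public class IncomingRowObserver extends BaseRegionObserver implements WALObserver {

      private static final byte[] FAMILY_R = Bytes.toBytes("r");
      private static final byte[] OUTGOING = Bytes.toBytes("OUTGOING-");

      @Override
      public boolean preWALWrite(ObserverContext<WALCoprocessorEnvironment> ctx,
          HRegionInfo info, HLogKey logKey, WALEdit logEdit) throws IOException {
        return false; // do not bypass the default WAL write
      }

      // Normal case: runs right after the OUTGOING edit hits the WAL.
      @Override
      public void postWALWrite(ObserverContext<WALCoprocessorEnvironment> ctx,
          HRegionInfo info, HLogKey logKey, WALEdit logEdit) throws IOException {
        mirror(ctx.getEnvironment(), logEdit);
      }

      // Failure case: runs when the OUTGOING edit is replayed during failover,
      // so the INCOMING row is still written if the RegionServer died first.
      @Override
      public void postWALRestore(ObserverContext<RegionCoprocessorEnvironment> ctx,
          HRegionInfo info, HLogKey logKey, WALEdit logEdit) throws IOException {
        mirror(ctx.getEnvironment(), logEdit);
      }

      private void mirror(CoprocessorEnvironment env, WALEdit edit) throws IOException {
        HTableInterface table = env.getTable(TableName.valueOf("graph")); // table name assumed
        try {
          for (KeyValue kv : edit.getKeyValues()) {
            // Only OUTGOING cells are mirrored, so the INCOMING puts written
            // here do not trigger further mirroring.
            if (Bytes.equals(kv.getFamily(), FAMILY_R)
                && Bytes.startsWith(kv.getQualifier(), OUTGOING)) {
              table.put(incomingPut(kv));
            }
          }
        } finally {
          table.close();
        }
      }

      // Build the INCOMING row for an OUTGOING cell: row of the target node,
      // qualifier "INCOMING-<type>-<sourceId>", same serialized properties.
      private Put incomingPut(KeyValue kv) {
        String row = Bytes.toString(kv.getRow());              // <hash>-<sourceId>
        String sourceId = row.substring(row.indexOf('-') + 1);
        String[] q = Bytes.toString(kv.getQualifier()).split("-", 3); // OUTGOING,<type>,<targetId>
        Put put = new Put(GraphKeys.rowKey(q[2]));
        put.add(FAMILY_R, Bytes.toBytes("INCOMING-" + q[1] + "-" + sourceId), kv.getValue());
        return put;
      }
    }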
  • 77. Using Observers (Normal Case)
    1. The client writes only an OUTGOING row
    2. The RegionServer writes to the Memstore
    3. The RegionServer writes the WAL to HDFS
    4. WALObserver#postWALWrite writes the INCOMING row
    5. The RegionServer responds to the client
  • 82. Using Observers (Abnormal Case)
    1. The client writes only an OUTGOING row
    2. The RegionServer writes to the Memstore
    3. The RegionServer writes the WAL to HDFS
    (the RegionServer fails here, before postWALWrite runs, so the INCOMING row is not written)
  • 86. Using Observers (Abnormal Case)
    On another RegionServer, during failover:
    1. It replays the WAL entry of the OUTGOING row
    2. RegionObserver#postWALRestore writes the INCOMING row
  • 89. Summary
    ● We have used HBase in several projects
      o Log Analysis, Social Graph, Recommendations, Advertising tech
    ● We developed a graph database built on HBase
      o HBase is good for storing social graphs
      o We use coprocessors to resolve consistency problems
  • 90. Questions
    If you have any questions, please tweet @brfrn169.

Editor's Notes

  1. Hi, thank you for coming to this session. Today, we are going to talk to you about HBase @ CyberAgent.
  2. I am Hirotaka Kakishima. I work for CyberAgent as a Database Engineer, and I will present the first part of this talk. The second part will be presented by Toshihiro Suzuki, a Software Engineer at CyberAgent.
  3. We authored beginner’s Guide to HBase in Japanese this year.
  4. Our office is located in Akihabara, Japan.
  5. This is today’s agenda. We are going to introduce our company and services, and we will talk about our HBase history as well as our use case of HBase.
  6. About CyberAgent
  7. CyberAgent is an internet service company in Japan. Our businesses are advertising, games, and Ameba. We have more than 30% of the smartphone advertising market in Japan. We provide smartphone games for iOS, Android, and web browsers. Another big business is Ameba.
  8. What’s Ameba?
  9. Ameba is a Blog, Social Networking and Game service platform. We have 40 million Ameba users.
  10. Here is the ranking of domestic internet services by number of visitors in Japan, announced by Nielsen last year. We ranked 10th in desktop visitors and 9th in smartphone visitors.
  11. To give you a better idea about Ameba, we will introduce Ameba Blog and Ameba Pigg. This is “Ameba Blog”. It is used by more than ten thousand Japanese celebrities, like TV personalities, sports players, and statesmen. We have more than 1.9 billion blog articles as of September 2014.
  12. This is “Ameba Pigg”. It is a 2D virtual world. You can create your avatar, chat, go fishing, and much more.
  13. And we have more services on our platform.
  14. Now we will explain how we use HBase @ CyberAgent.
  15. We use HBase for Social Graph , Recommendations, Advertising technology, and Log Analysis. Toshihiro will talk about how we use HBase as a Social Graph Database later. I will talk about our HBase history.
  16. We have used HBase since 2011. Originally, we used HDFS and HBase for log analysis. We transferred logs using Flume and stored them in HDFS. Then we ran M/R jobs through Hive and stored the results in HBase. Finally, our analysts and managers accessed the results through our web application. We deployed HBase 0.90 with CDH3 on physical servers. This is how we gained our first know-how with HDFS and HBase.
  17. Next, we tried HBase for a 24/7 online social graph database. This time we used HBase 0.92, but because of performance problems, we switched to a different CDH version for HDFS. In this version, the NameNode didn’t have HA functionality, so we used a Fault Tolerant Server from NEC.
  18. Because of bugs in HBase replication, we copied WALs to backup clusters using our own method, and we are still using this method on one cluster. We have 10TB of social graph data (not counting HDFS replicas) and 6 million requests per minute, with an average latency of less than 20ms.
  19. Next is the 3rd generation. Here we upgraded our log analysis system and deployed more clusters for recommendations, trend detection, and another social graph. We used HBase 0.94 with NameNode HA, and we provisioned clusters with Chef. We replicated data between HBase clusters using master-slave replication, but because many of our hostnames include hyphens, some clusters had the HBASE-8207 patch applied.
  20. Recently, we started using HBase 0.98 for advertising technology. We deployed clusters with master-master replication in Amazon EC2, and we started using Cloudera Manager to install, configure, and keep the clusters up and running.
  21. Currently we have 10 clusters in production, and each cluster has between 10 and 50 RegionServers. Almost all clusters have been stable for over a year.
  22. To keep HBase running stably, we cherish the basics: learning the architecture; considering table schema (very important); having enough RAM, disks, and network bandwidth; splitting large regions and running major compactions at off-peak times; monitoring metrics and tuning configuration parameters; and catching up on bug reports in JIRA.
  23. We are going to migrate our cluster from 0.92 to 1.0 this year. From here, Toshihiro will continue this presentation. He will talk about how we use HBase as a social graph database. Thank you.
  24. Hello, everyone. My name is Toshihiro Suzuki. I’m going to talk about Ameba’s social graph, one of the systems where we extensively use HBase.
  25. We provide a platform for smartphone applications where a lot of services are running: for example, games, social networking, and message board services. There is a lot of graph data, such as users and the connections between users, like friends and followers. So we needed a large-scale graph database when we began developing the platform.
  26. Our requirements for the graph database are scalability, high availability and low latency. First, the graph database has to be scalable because web services can grow rapidly and unpredictably. Second, our services are used 24/7. So the graph database needs to be highly available. If a service goes down, it doesn’t only reduce our sales but also discourages our users. In addition, our applications have strict response time requirements because they are user-facing applications for online access. So the graph database has to have low latency.
  27. So we considered using HBase. HBase provides auto sharding and auto failover; because it is designed for distributed environments, administering it is relatively easy. HBase can scale by adding more RegionServers to the cluster as needed, and with auto failover it can recover quickly if any RegionServer goes down. HBase also provides low-latency access. After considerable research and experimentation, we decided to use HBase and developed a graph database built on it.
  28. Next I'll talk about how we use HBase as a Graph Database.
  29. Here is the system overview of our graph database. When accessing graph data, clients don’t communicate with HBase directly, but via Gateways. Gateways talk to HBase when storing or retrieving graph data.
  30. Next I will explain about Data Model. The graph database provides Property Graph Model. In this model, there are nodes and relationships that are the connection between nodes. A relationship has a type and a direction. In this picture, there are 3 nodes -- "node1", "node2" and "node3", and 3 relationships. This relationship has a "follow" type and a direction from "node1" to "node2". This relationship has a "follow" type and a direction from "node2" to "node3".
  31. Nodes and relationships also have properties in key-value format. In this picture, "node1" has 2 properties, name:Taro and age:24, and this relationship has a property, date:May 7th.
  32. Here is the graph database’s API. It’s very simple.
  33. First, you create a graph object.
  34. Next, you call the addNode method to create a node, and set a property “name” with the value “Taro”.
  35. After that, you create another node and set a property “name” with the value “Ichiro”.
  36. Then, you add a relationship of type “follow” from “node1” to “node2” and set a property “date” and its value.
  37. Next, you can get the outgoing relationships from “node1”.
  38. Finally, you can get the incoming relationships to “node2”.
  39. Here is the graph database schema design. A row key consists of a hash value of a node id and the node id. There are 2 Column Families "n" and "r". All nodes are stored with ColumnFamily "n" and empty Qualifier. All relationships are stored with ColumnFamily "r" and Qualifier that consists of direction, type and node id. Properties are serialized and stored as Value.
  40. For example, you create 3 nodes and set “name” properties to them,
  41. node1
  42. node2
  43. node3
  44. And in HBase,
  45. node1
  46. node2
  47. node3
  48. As you can see, the node data are stored in HBase like this. As mentioned before, the row key consists of a hash value of a node id and the node id. The Node’s Column Family is “n” and the Qualifier is empty. Properties are serialized and stored as Value.
  49. Then, you create 3 relationships and set “date” properties to them,
  50. First relationship,
  51. Second relationship,
  52. And third relationship,
  53. And this is how it is reflected in HBase,
  54. First relationship,
  55. Second relationship,
  56. And third relationship,
  57. As you can see, the relationship’s row key is the same as the node’s. The Column Family is “r” and the Qualifier consists of the direction (“OUTGOING” or “INCOMING”), the type (“follow”), and the node id. As with nodes, properties are serialized and stored as the Value.
  58. The next example is how to get “OUTGOING” relationships.
  59. When you want to get “OUTGOING” relationships from “node1”,
  60. You can scan with
  61. the row key “nodeId1” and its hash value
  62. the column family “r” and the qualifier whose prefix is “OUTGOING” and “follow”.
  63. Then you can get these relationships.
  64. Next,
  65. When you want to get “INCOMING” relationships to “node2”,
  66. You can scan with
  67. the row key “nodeId2” and its hash value,
  68. the column family “r” and the qualifier whose prefix is “INCOMING” and “follow”.
  69. Then you can get these relationships.
  70. There is a potential consistency problem. As you know, HBase has no native cross-row transactional support. So there is a possibility of inconsistency between outgoing and incoming rows.
  71. For instance, if the system goes down while you are adding a relationship,
  72. an inconsistency between the outgoing and incoming rows can occur, like this.
  73. To resolve this kind of problem, we use Coprocessors. Coprocessors come in two flavors, Endpoints and Observers. Endpoints are like stored procedures in an RDBMS: you can push your business logic into the RegionServer. Observers are like triggers in an RDBMS: you can insert user code by overriding upcall methods.
  74. We use two observers to resolve the inconsistency problem: the postWALWrite method of WALObserver and the postWALRestore method of RegionObserver. The postWALWrite method is called after a write to the WAL, and the postWALRestore method is called when a WAL entry is replayed during failover. Both observers run the same logic to write the INCOMING row. Thus we ensure eventual consistency between incoming and outgoing rows.
  75. Next I’ll show you how we use observers to resolve inconsistency problems with this animation. First, let’s look at the normal case.
  76. The client sends a put request to RegionServer to write only an outgoing row.
  77. Then, the RegionServer writes the data to the Memstore and then writes the WAL to HDFS.
  78. Then, RegionServer executes our logic in postWALWrite method of WALObserver and it writes the incoming row.
  79. Finally, RegionServer responds to the client. Normally, we ensure consistency like this.
  80. Next, let’s consider a failure.
  81. First of all, the client sends a put request to RegionServer to write only an outgoing row.
  82. Then, the RegionServer writes the data to the Memstore and then writes the WAL to HDFS.
  83. If the RegionServer goes down at that point, our logic in the postWALWrite method isn’t executed and the incoming row isn’t written, so a data inconsistency would occur.
  84. Our logic in postWALRestore method of RegionObserver resolves this problem.
  85. In HBase, when a RegionServer goes down, another RegionServer restores its data from the WALs.
  86. And if that RegionServer replays the WAL entry of an outgoing row, our logic in the postWALRestore method is executed and writes the incoming row. As a result, no data inconsistency occurs even if a RegionServer goes down.
  87. In summary, we have used HBase in several projects: Log Analysis, Social Graph, Recommendations, and Advertising technology. I talked about the Social Graph, which is one of our use cases. In our experience, HBase is good for storing social graphs, and we use coprocessors to resolve consistency problems. Thank you for listening.
  88. If you have any questions, please tweet @brfrn169. Thank you.