@TwitterAds | Confidential
@ctrezzo
HBaseCon 2013
Apache HBase Replication
About me
Active contributor to Apache HBase
Software Engineer @ Twitter
Core Storage Team - Hadoop/HBase
Follow me @ctrezzo
Agenda
Introduction
High-level Architecture
Replication State
Path of a replicated edit
Replication Source
Replication Sink
Replication Source Manager
HBase replication
Asynchronously copy data between two HBase clusters
Push-based architecture
WAL (write-ahead log) shipping, similar to MySQL's binary-log replication
Guarantees of replication
Eventually consistent
Deliver updates at least once
Atomicity of individual updates will be preserved
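The at-least-once guarantee means a batch may arrive more than once, so applying it has to be harmless when repeated. A minimal sketch of that idea, assuming edits carry their original timestamp and the sink stores cells keyed by (row, column, timestamp); `apply_edits` and `ship_with_retry` are hypothetical names, not HBase APIs:

```python
def apply_edits(table, edits):
    """Apply a batch of edits; re-applying the same batch changes nothing."""
    for row, column, timestamp, value in edits:
        # Same key and same value on redelivery, so the write is idempotent.
        table[(row, column, timestamp)] = value
    return table


def ship_with_retry(table, edits, fail_first_ack=True):
    """Ship a batch until it is acknowledged (at-least-once delivery)."""
    attempts = 0
    while True:
        attempts += 1
        apply_edits(table, edits)  # the sink applied the batch ...
        if fail_first_ack and attempts == 1:
            continue  # ... but the ack was lost, so the source resends
        return attempts


sink = {}
batch = [("row1", "cf:a", 100, "v1"), ("row1", "cf:b", 100, "v2")]
attempts = ship_with_retry(sink, batch)
print(attempts, len(sink))
```

The batch is delivered twice here, yet the sink ends up with exactly two cells.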
Administering Replication
Enable replication by setting a parameter in hbase-site.xml
hbase.replication => true
Set up replication topologies with the shell commands
add_peer, remove_peer, disable_peer, enable_peer, list_peers
Create/Alter tables with replication scope set
REPLICATION_SCOPE => '1'
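For reference, the hbase.replication setting above would take the usual property form in hbase-site.xml. The sketch below only renders that stanza as text; the helper name `replication_property` is hypothetical, and the element has to be placed inside the file's configuration section by hand:

```python
import xml.etree.ElementTree as ET


def replication_property(enabled=True):
    """Render the hbase.replication property element as an XML string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.replication"
    ET.SubElement(prop, "value").text = "true" if enabled else "false"
    return ET.tostring(prop, encoding="unicode")


snippet = replication_property()
print(snippet)
```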
High-Level Architecture
Replication State
Persistently stored in ZooKeeper
Status
Master kill switch
Peers
List of remote target clusters
Queues
List of remaining HLogs to replicate and the current position in each log
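The three pieces of state above can be pictured as a small znode tree. This is a toy model under stated assumptions: the base path and the node names `status`, `peers` and `queues` simply follow the slide wording and may not match the names a real cluster uses, and `replication_znodes` is a hypothetical helper:

```python
def replication_znodes(enabled, peers, queues, base="/hbase/replication"):
    """Return {znode path: data} for a toy replication state.

    peers:  {peer_id: cluster_key}
    queues: {(region_server, peer_id): {hlog_name: position}}
    """
    nodes = {f"{base}/status": "true" if enabled else "false"}
    for peer_id, cluster_key in peers.items():
        nodes[f"{base}/peers/{peer_id}"] = cluster_key
    for (region_server, peer_id), hlogs in queues.items():
        for hlog, position in hlogs.items():
            # One child per HLog, holding the current read position.
            nodes[f"{base}/queues/{region_server}/{peer_id}/{hlog}"] = str(position)
    return nodes


state = replication_znodes(
    enabled=True,
    peers={"1": "zk1.example.com:2181:/hbase"},
    queues={("rs1", "1"): {"hlog.0001": 4096, "hlog.0002": 0}},
)
for path in sorted(state):
    print(path, "=>", state[path])
```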
Path of a replicated edit
Replication Source
End-point for shipping WAL entries
One instance for each queue
Runs as a separate thread on region server
Uses AdminProtocol RPC to synchronously ship entries
Filters edits based on replication scope
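The scope filtering and synchronous shipping done by a replication source can be sketched as follows. This is an illustration only: `WalEntry`, `filter_by_scope` and `ship` are hypothetical names, and the blocking `send` callable stands in for the RPC to the remote region server:

```python
from collections import namedtuple

WalEntry = namedtuple("WalEntry", "table row family value")

REPLICATION_SCOPE_LOCAL = 0   # not replicated
REPLICATION_SCOPE_GLOBAL = 1  # replicated, as set by REPLICATION_SCOPE => '1'


def filter_by_scope(entries, scopes):
    """Keep only entries whose (table, family) has replication scope 1."""
    return [
        e for e in entries
        if scopes.get((e.table, e.family), REPLICATION_SCOPE_LOCAL)
        == REPLICATION_SCOPE_GLOBAL
    ]


def ship(entries, scopes, send):
    """Filter a batch, hand it to a blocking send call, return the count shipped."""
    batch = filter_by_scope(entries, scopes)
    if batch:
        send(batch)  # returns only once the sink has accepted the batch
    return len(batch)


scopes = {("ads", "cf1"): 1, ("ads", "cf2"): 0}
entries = [
    WalEntry("ads", "r1", "cf1", "a"),
    WalEntry("ads", "r2", "cf2", "b"),
    WalEntry("ads", "r3", "cf1", "c"),
]
received = []
shipped = ship(entries, scopes, received.extend)
print(shipped, [e.row for e in received])
```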
Replication Sink
End-point for receiving shipped WAL entries
One instance per region server
Synchronously receives entries and applies them using HTable
Batches rows in the same table
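Batching rows by table on the sink side might look like the sketch below. It assumes each shipped entry is a plain (table, row, value) tuple and uses nested dicts in place of real tables; `batch_by_table` and `apply_batches` are hypothetical helpers:

```python
from collections import defaultdict


def batch_by_table(entries):
    """Group (table, row, value) entries into one batch per table, keeping order."""
    batches = defaultdict(list)
    for table, row, value in entries:
        batches[table].append((row, value))
    return dict(batches)


def apply_batches(batches, tables):
    """Apply each table's batch in a single call instead of one call per row."""
    for table, rows in batches.items():
        tables.setdefault(table, {}).update(rows)
    return tables


entries = [("t1", "r1", "a"), ("t2", "r9", "b"), ("t1", "r2", "c")]
batches = batch_by_table(entries)
tables = apply_batches(batches, {})
print(batches)
print(tables)
```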
Load balancing
Balances load on the remote cluster using randomization
Ships edits to a random subset of remote region servers
Default subset size is 10% of the remote region servers
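Picking the random subset of remote region servers could be sketched like this. The rounding up and the minimum of one server are assumptions made for the example, and `choose_sinks` is a hypothetical name:

```python
import math
import random


def choose_sinks(region_servers, ratio=0.1, rng=random):
    """Pick a random subset of remote region servers to ship edits to."""
    if not region_servers:
        return []
    # Assumption: round up, and always keep at least one target.
    count = max(1, math.ceil(len(region_servers) * ratio))
    return rng.sample(region_servers, count)


servers = [f"rs{i}.remote.example.com" for i in range(25)]
sinks = choose_sinks(servers, rng=random.Random(42))
print(len(sinks))
```

With 25 remote servers and the 10% default, three servers are chosen; a fresh call picks a different subset, which spreads the load over time.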
Replication Source Manager
Manages all replication sources
Manages change in replication state
Log rolling
Region server failure
Addition/deletion of peer clusters
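The three kinds of state change listed above can be modelled with a small class. This is a sketch under assumptions, not the HBase class of the same name: queues are plain lists of HLog names, and a queue claimed from a failed server is renamed "peer id, dash, dead server", a naming scheme chosen for the example:

```python
class ReplicationSourceManager:
    """Toy model: one queue of HLog names per peer, plus queues claimed from dead servers."""

    def __init__(self, server, peers=()):
        self.server = server
        self.queues = {peer_id: [] for peer_id in peers}

    def log_rolled(self, new_hlog):
        """Log rolling: every queue must also replicate the new HLog."""
        for hlogs in self.queues.values():
            hlogs.append(new_hlog)

    def add_peer(self, peer_id):
        """Addition of a peer cluster: start an empty queue for it."""
        self.queues.setdefault(peer_id, [])

    def remove_peer(self, peer_id):
        """Deletion of a peer cluster: drop its queue."""
        self.queues.pop(peer_id, None)

    def claim_queues(self, dead):
        """Region server failure: take over the dead server's unfinished queues."""
        for queue_id, hlogs in dead.queues.items():
            self.queues[f"{queue_id}-{dead.server}"] = list(hlogs)
        dead.queues.clear()


rs1 = ReplicationSourceManager("rs1", peers=["1"])
rs1.log_rolled("hlog.1")
rs1.log_rolled("hlog.2")
rs2 = ReplicationSourceManager("rs2", peers=["1"])
rs2.log_rolled("hlog.9")
rs2.claim_queues(rs1)  # rs1 has failed; rs2 finishes its work
print(rs2.queues)
```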
Additional Resources
Apache HBase user mailing list
user@hbase.apache.org
Apache HBase reference guide
https://hbase.apache.org/book.html
Tweet me
@ctrezzo
Questions?
Replication State
Persistently stored in ZooKeeper
Three major replication znodes: Status, Peers, Queues
Status znode
Master kill switch
Controlled by start_replication, stop_replication
Be careful what you wish for
Peers znode
A set of remote clusters registered as possible replication targets
Identified by peer id
Contains status of each peer cluster
Queues znode
Queues identified by region server and peer id
Queues contain a list of HLogs and the current position in each log
