@TwitterAds | Confidential
@ctrezzo
HBaseCon 2013
Apache HBase Replication
Thursday, July 25, 13

HBase Replication


A talk given on HBase Replication at HBaseCon 2013.


  1. Apache HBase Replication. @ctrezzo, HBaseCon 2013. Thursday, July 25, 13. (@TwitterAds | Confidential)
  2. About me
     • Active contributor to Apache HBase
     • Software Engineer @ Twitter
     • Core Storage Team - Hadoop/HBase
     • Follow me @ctrezzo
  3. Agenda
     • Introduction
     • High-level Architecture
     • Replication State
     • Path of a replicated edit
     • Replication Source
     • Replication Sink
     • Replication Source Manager
  4. HBase replication
     • Asynchronously copy data between two HBase clusters
     • Push-based architecture
     • WAL shipping technique similar to MySQL
  5. Guarantees of replication
     • Eventually consistent
     • Deliver updates at least once
     • Atomicity of individual updates will be preserved
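
At-least-once delivery means the same edit can arrive at the remote cluster more than once. The following is a minimal Python sketch, not HBase code, of why a duplicate delivery can be harmless. It assumes that a shipped edit keeps its original timestamp, so a cell is addressed by the same (row, column, timestamp) coordinates on every delivery. The names `apply_edit` and `sink_table` are hypothetical.

```python
from typing import Dict, Tuple

# Cell coordinates (row, column, timestamp) mapped to the stored value.
Cell = Tuple[str, str, int]


def apply_edit(table: Dict[Cell, bytes], row: str, column: str,
               ts: int, value: bytes) -> None:
    """Apply a put. Writing the same coordinates again overwrites the same cell."""
    table[(row, column, ts)] = value


# Hypothetical sink-side table and a single replicated edit.
sink_table: Dict[Cell, bytes] = {}
edit = ("row1", "cf:q", 1374710400000, b"v1")

apply_edit(sink_table, *edit)  # first delivery
apply_edit(sink_table, *edit)  # duplicate delivery after a retry

print(len(sink_table))
```

Under that assumption the table holds one cell after both deliveries, which is the behavior an at-least-once transport relies on.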
  6. Administering Replication
     • Set a parameter in hbase-site.xml: hbase.replication => true
     • Set up replication topologies: add_peer, remove_peer, disable_peer, enable_peer, list_peers
     • Create/alter tables with replication scope set: REPLICATION_SCOPE => '1'
  7. High-Level Architecture
     [Diagram. Components shown: Cluster 1 region servers with a ReplicationSource Manager, ReplicationSource instances, and an HLog; Cluster 2 and Cluster 3 region servers with a ReplicationSink and HTable; Zookeeper with /state, /peers (/1, /2), and /rs; Replication Admin. Steps are numbered 1 to 4.]
  8. Replication State
     • Persistently stored in Zookeeper
     • Status: master kill switch
     • Peers: list of remote target clusters
     • Queues: list of remaining HLogs to replicate and current position in each log
  9. Path of a replicated edit
     [Diagram. Components shown: a Cluster 1 region server with a ReplicationSource and an HLog; Cluster 2 region servers with a ReplicationSink and HTable. Steps are numbered 1 to 4.]
  10. Path of a replicated edit
     [Diagram. The same path shown for three source region servers (Region Server 1, Region Server 2, Region Server X), each with its own ReplicationSource and HLog, shipping to region servers with a ReplicationSink and HTable in Cluster 2.]
  11. Replication Source
     • End-point for shipping WAL entries
     • One instance for each queue
     • Runs as a separate thread on region server
     • Uses AdminProtocol RPC to synchronously ship entries
     • Filters edits based on replication scope
     [Diagram: path of a replicated edit, as on slide 9.]
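
The scope filtering mentioned on the Replication Source slide can be pictured with a small Python sketch. It is an illustration rather than the actual ReplicationSource logic. It assumes that each edit carries its column family and that a scope of 0 (the default) means "keep local" while 1 means "replicate". `WalEdit` and `filter_by_scope` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WalEdit:
    """A simplified WAL entry: one cell written to one column family."""
    row: str
    family: str
    value: bytes


def filter_by_scope(edits: List[WalEdit], scopes: Dict[str, int]) -> List[WalEdit]:
    """Keep only edits whose column family has a non-zero replication scope."""
    return [e for e in edits if scopes.get(e.family, 0) != 0]


# Hypothetical table with one replicated family and one local-only family.
scopes = {"cf_replicated": 1, "cf_local": 0}
edits = [
    WalEdit("row1", "cf_replicated", b"a"),
    WalEdit("row1", "cf_local", b"b"),
    WalEdit("row2", "cf_replicated", b"c"),
]

to_ship = filter_by_scope(edits, scopes)
print([e.row for e in to_ship])
```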
  12. Replication Sink
     • End-point for receiving shipped WAL entries
     • One instance per region server
     • Synchronously receives entries and applies them using HTable
     • Batches rows in the same table
     [Diagram: path of a replicated edit, as on slide 9.]
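
A rough Python sketch of "batches rows in the same table": incoming entries are grouped by table name so that each table can receive one batched write. This is an assumption about the general shape of the step, not the ReplicationSink implementation. The entry layout `(table, row, value)` and the name `batch_by_table` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str, bytes]  # (table name, row key, value)


def batch_by_table(entries: Iterable[Entry]) -> Dict[str, List[Tuple[str, bytes]]]:
    """Group shipped entries by destination table, preserving arrival order."""
    batches: Dict[str, List[Tuple[str, bytes]]] = defaultdict(list)
    for table, row, value in entries:
        batches[table].append((row, value))
    return dict(batches)


shipped = [
    ("users", "u1", b"a"),
    ("events", "e1", b"b"),
    ("users", "u2", b"c"),
]

batches = batch_by_table(shipped)
for table, rows in batches.items():
    # A real sink would issue one batched write per table at this point.
    print(table, len(rows))
```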
  13. Load balancing
     • Balances load on remote cluster using randomization
     • Ships edits to random subset of remote region servers
     • Default is 10%
     [Diagram: Cluster 1 and Cluster 2, labeled "20 Region Servers".]
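
The random-subset idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the HBase implementation: the subset size is taken as the ratio times the number of remote region servers, rounded up, with a floor of one server. The function name `choose_sinks` and the host names are hypothetical.

```python
import math
import random
from typing import List, Optional


def choose_sinks(region_servers: List[str], ratio: float = 0.1,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Pick a random subset of remote region servers to ship edits to."""
    if not region_servers:
        return []
    rng = rng or random.Random()
    # Assumption: round up, and always keep at least one sink.
    count = max(1, math.ceil(len(region_servers) * ratio))
    return rng.sample(region_servers, count)


# Hypothetical remote cluster with 20 region servers, as in the slide's diagram.
remote = [f"rs{i}.example.org" for i in range(20)]
sinks = choose_sinks(remote, ratio=0.1, rng=random.Random(42))
print(len(sinks))
```

With 20 remote region servers and a 10% ratio, each source ships to 2 of them, and different sources are likely to pick different pairs.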
  14. Path of a replicated edit
     [Diagram: path of a replicated edit, as on slide 9.]
  15. Replication Source Manager
     • Manages all replication sources
     • Manages change in replication state
       • Log rolling
       • Region server failure
       • Addition/deletion of peer clusters
     [Diagram: high-level architecture, as on slide 7.]
  16. High-Level Architecture
     [Diagram: high-level architecture, as on slide 7.]
  17. Additional Resources
     • Apache HBase user mailing list: user@hbase.apache.org
     • Apache HBase reference guide: https://hbase.apache.org/book.html
     • Tweet me @ctrezzo
  18. Questions?
  19. Replication State
     • Persistently stored in Zookeeper
     • Three major replication znodes: Status, Peers, Queues

       /hbase/replication
         /state [VALUE: true]
         /peers
           /1 [Value: zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase]
             /peer-state [Value: ENABLED]
           /2 [Value: zk5.host.com,zk6.host.com,zk7.host.com:2181:/hbase]
             /peer-state [Value: DISABLED]
         /rs
           /hostname.example.org,6020,1234
             /1
               /23522342.23422 [VALUE: 254]
               /12340993.22342 [VALUE: 0]
             /2
               /23522342.23422 [VALUE: 34]
               /12340993.22342 [VALUE: 0]
           /hostname2.example.org,6020,1234
             /1
               /23522348.23443 [VALUE: 87]
               /12340999.22362 [VALUE: 0]
             /2
               /23522348.23443 [VALUE: 127]
               /12340999.22362 [VALUE: 0]
  20. Status znode
     • Master kill switch
     • Controlled by start_replication, stop_replication
     • Be careful what you wish for

       /hbase/replication
         /state [VALUE: true]
  21. Peers znode
     • A set of remote clusters registered as possible replication targets
     • Identified by peer id
     • Contains status of each peer cluster

       /hbase/replication
         /peers
           /1 [Value: zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase]
             /peer-state [Value: ENABLED]
           /2 [Value: zk5.host.com,zk6.host.com,zk7.host.com:2181:/hbase]
             /peer-state [Value: DISABLED]
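
Each peer value in the Peers znode is a cluster key made of three colon-separated parts: the Zookeeper quorum, the client port, and the parent znode. A small Python sketch of splitting one apart follows. `ClusterKey` and `parse_cluster_key` are hypothetical helper names, not part of HBase.

```python
from typing import List, NamedTuple


class ClusterKey(NamedTuple):
    quorum: List[str]   # Zookeeper hosts of the peer cluster
    client_port: int    # Zookeeper client port
    znode_parent: str   # root znode of the peer's HBase instance


def parse_cluster_key(key: str) -> ClusterKey:
    """Split 'host1,host2:port:/parent' into its three parts."""
    quorum, port, parent = key.split(":", 2)
    return ClusterKey(quorum.split(","), int(port), parent)


peer_1 = parse_cluster_key("zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase")
print(peer_1.quorum, peer_1.client_port, peer_1.znode_parent)
```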
  22. Queues znode
     • Queues identified by region server and peer id
     • Queues contain list of HLogs and current position in log

       /hbase/replication
         /rs
           /hostname.example.org,6020,1234
             /1
               /23522342.23422 [VALUE: 254]
               /12340993.22342 [VALUE: 0]
             /2
               /23522342.23422 [VALUE: 34]
               /12340993.22342 [VALUE: 0]
           /hostname2.example.org,6020,1234
             /1
               /23522348.23443 [VALUE: 87]
               /12340999.22362 [VALUE: 0]
             /2
               /23522348.23443 [VALUE: 127]
               /12340999.22362 [VALUE: 0]
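
The queue layout above maps naturally onto a nested dictionary: region server, then peer id, then HLog name, with the current position as the value. The Python sketch below mirrors the example values from the slide. It is an in-memory illustration only, with no Zookeeper client involved, and the helper names `pending_logs` and `record_position` are hypothetical.

```python
from typing import Dict, List, Tuple

# region server -> peer id -> HLog name -> current position in that log
Queues = Dict[str, Dict[str, Dict[str, int]]]

queues: Queues = {
    "hostname.example.org,6020,1234": {
        "1": {"23522342.23422": 254, "12340993.22342": 0},
        "2": {"23522342.23422": 34, "12340993.22342": 0},
    },
    "hostname2.example.org,6020,1234": {
        "1": {"23522348.23443": 87, "12340999.22362": 0},
        "2": {"23522348.23443": 127, "12340999.22362": 0},
    },
}


def pending_logs(q: Queues, region_server: str, peer_id: str) -> List[Tuple[str, int]]:
    """Return (HLog name, position) pairs queued for one region server and peer."""
    return list(q.get(region_server, {}).get(peer_id, {}).items())


def record_position(q: Queues, region_server: str, peer_id: str,
                    hlog: str, position: int) -> None:
    """Store how far into an HLog the source has shipped for this peer."""
    q[region_server][peer_id][hlog] = position


rs = "hostname.example.org,6020,1234"
record_position(queues, rs, "2", "23522342.23422", 90)
print(pending_logs(queues, rs, "2"))
```

Because each peer has its own queue per region server, peers 1 and 2 can sit at different positions in the same HLog, as the values 254 and 34 in the slide show.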
