HBase Replication


A talk given on HBase Replication at HBaseCon 2013.

HBase Replication Presentation Transcript

  • @TwitterAds | Confidential @ctrezzo HBaseCon 2013 Apache HBase Replication Thursday, July 25, 13
  • About me
      • Active contributor to Apache HBase
      • Software Engineer @ Twitter, Core Storage Team (Hadoop/HBase)
      • Follow me @ctrezzo
  • Agenda
      • Introduction
      • High-level Architecture
      • Replication State
      • Path of a replicated edit
      • Replication Source
      • Replication Sink
      • Replication Source Manager
  • HBase replication
      • Asynchronously copy data between two HBase clusters
      • Push-based architecture
      • WAL shipping technique similar to MySQL
  • Guarantees of replication
      • Eventually consistent
      • Deliver updates at least once
      • Atomicity of individual updates will be preserved
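At-least-once delivery means a sink may see the same edit more than once, for example after a retried shipment. The sketch below illustrates why that can still converge, under the assumption that a replicated edit keeps its original timestamp so that re-applying it rewrites the same cell version. The `apply_edit` and `replay` helpers and the dict-based store are hypothetical and are not HBase code.

```python
# Hypothetical in-memory model: a cell version is keyed by (row, column, timestamp).
def apply_edit(store, edit):
    """Apply one replicated edit; re-applying the same edit changes nothing."""
    row, column, timestamp, value = edit
    store[(row, column, timestamp)] = value
    return store


def replay(edits):
    """Apply a stream of edits, which may contain duplicates, to an empty store."""
    store = {}
    for edit in edits:
        apply_edit(store, edit)
    return store


edit_a = ("row1", "cf:q", 100, "v1")
edit_b = ("row1", "cf:q", 200, "v2")

once = replay([edit_a, edit_b])
# edit_a is delivered a second time, after edit_b.
duplicated = replay([edit_a, edit_b, edit_a])
```

Both replays end in the same state, which is the property that makes duplicate delivery tolerable in this model.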
  • Administering Replication
      • Simply set parameter in hbase-site.xml: hbase.replication => true
      • Set up replication topologies: add_peer, remove_peer, disable_peer, enable_peer, list_peers
      • Create/alter tables with replication scope set: REPLICATION_SCOPE => '1'
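The `hbase.replication => true` shorthand on the slide corresponds to a property entry in hbase-site.xml. As a rough illustration of the shape of that entry, the Python sketch below builds it with the standard library; the `property_element` helper is hypothetical, and in practice the entry would be added to an existing configuration file by hand.

```python
import xml.etree.ElementTree as ET


def property_element(name, value):
    """Build one <property> entry in the Hadoop-style configuration layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


configuration = ET.Element("configuration")
configuration.append(property_element("hbase.replication", "true"))

# Serialize to a string; a real hbase-site.xml would hold other properties too.
snippet = ET.tostring(configuration, encoding="unicode")
```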
  • High-Level Architecture
      [Diagram labels: Cluster 1 region server with HLog, ReplicationSource Manager, and ReplicationSource; Cluster 2 and Cluster 3 region servers with ReplicationSink and HTable; ZooKeeper znodes /state, /peers (/1, /2), and /rs; Replication Admin]
  • Replication State
      • Persistently stored in ZooKeeper
      • Status: master kill switch
      • Peers: list of remote target clusters
      • Queues: list of remaining HLogs to replicate and current position in each log
  • Path of a replicated edit
      [Diagram labels: Cluster 1 region server with HLog and ReplicationSource; Cluster 2 region servers with ReplicationSink and HTable]
  • Path of a replicated edit
      [Diagram labels: Region Server 1, Region Server 2, and Region Server X in Cluster 1, each with its own HLog and ReplicationSource; Cluster 2 region servers with ReplicationSink and HTable]
  • Replication Source
      • End-point for shipping WAL entries
      • One instance for each queue
      • Runs as a separate thread on region server
      • Uses AdminProtocol RPC to synchronously ship entries
      • Filters edits based on replication scope
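The scope filtering step of the replication source can be pictured as follows. This is a minimal sketch, assuming that scope is tracked per column family and that a scope of 0 means "do not replicate"; the `Edit` class, the `SCOPES` mapping, and the table names are hypothetical stand-ins, not HBase classes.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a WAL entry; not an HBase class.
@dataclass(frozen=True)
class Edit:
    table: str
    family: str
    row: str
    value: str


# (table, column family) -> replication scope, mirroring REPLICATION_SCOPE => '1'.
SCOPES = {("users", "profile"): 1, ("users", "cache"): 0}


def filter_for_replication(edits, scopes):
    """Keep only edits whose column family has a non-zero replication scope."""
    return [e for e in edits if scopes.get((e.table, e.family), 0) != 0]


wal = [
    Edit("users", "profile", "r1", "a"),
    Edit("users", "cache", "r1", "b"),
    Edit("users", "profile", "r2", "c"),
]
to_ship = filter_for_replication(wal, SCOPES)
```

Only the two `profile` edits remain in `to_ship`; the `cache` edit stays local to the source cluster.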
  • Replication Sink
      • End-point for receiving shipped WAL entries
      • One instance per region server
      • Synchronously receives entries and applies them using HTable
      • Batches rows in the same table
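Batching rows in the same table amounts to grouping the received entries by table before applying them. A minimal sketch of that grouping, with entries modelled as hypothetical `(table, row, value)` tuples rather than real WAL entries:

```python
from collections import defaultdict


def batch_by_table(entries):
    """Group (table, row, value) entries by table, preserving arrival order."""
    batches = defaultdict(list)
    for table, row, value in entries:
        batches[table].append((row, value))
    return dict(batches)


shipped = [
    ("users", "r1", "a"),
    ("orders", "o1", "x"),
    ("users", "r2", "b"),
]
batches = batch_by_table(shipped)
```

Each value in `batches` could then be applied to its table in a single call instead of one call per row.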
  • Load balancing
      • Balances load on remote cluster using randomization
      • Ships edits to random subset of remote region servers
      • Default is 10%
      [Diagram labels: Cluster 1; Cluster 2 with 20 region servers]
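The random subset selection can be sketched as below. The 10% ratio and the 20 region servers come from the slide; the `choose_sinks` helper, the host names, the rounding up, and the floor of one server are assumptions made for illustration.

```python
import math
import random


def choose_sinks(region_servers, ratio=0.1, rng=random):
    """Pick a random subset of remote region servers to ship edits to."""
    # Assumption: round up, and always keep at least one sink.
    count = max(1, math.ceil(len(region_servers) * ratio))
    return rng.sample(region_servers, count)


# Hypothetical host names for the 20 region servers in Cluster 2.
cluster_2 = [f"rs{i}.cluster2.example.org" for i in range(20)]

# A seeded generator makes the choice repeatable for this example.
sinks = choose_sinks(cluster_2, ratio=0.1, rng=random.Random(42))
```

With 20 remote region servers and a 10% ratio, each source ships to 2 of them.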
  • Replication Source Manager
      • Manages all replication sources
      • Manages change in replication state
          • Log rolling
          • Region server failure
          • Addition/deletion of peer clusters
  • Additional Resources
      • Apache HBase user mailing list: user@hbase.apache.org
      • Apache HBase reference guide: https://hbase.apache.org/book.html
      • Tweet me @ctrezzo
  • Questions?
  • Replication State
      • Persistently stored in ZooKeeper
      • Three major replication znodes: Status, Peers, Queues
      /hbase/replication
        /state [VALUE: true]
        /peers
          /1 [Value: zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase]
            /peer-state [Value: ENABLED]
          /2 [Value: zk5.host.com,zk6.host.com,zk7.host.com:2181:/hbase]
            /peer-state [Value: DISABLED]
        /rs
          /hostname.example.org,6020,1234
            /1
              /23522342.23422 [VALUE: 254]
              /12340993.22342 [VALUE: 0]
            /2
              /23522342.23422 [VALUE: 34]
              /12340993.22342 [VALUE: 0]
          /hostname2.example.org,6020,1234
            /1
              /23522348.23443 [VALUE: 87]
              /12340999.22362 [VALUE: 0]
            /2
              /23522348.23443 [VALUE: 127]
              /12340999.22362 [VALUE: 0]
  • Status znode
      • Master kill switch
      • Controlled by start_replication, stop_replication
      • Be careful what you wish for
      /hbase/replication
        /state [VALUE: true]
  • Peers znode
      • A set of remote clusters registered as possible replication targets
      • Identified by peer id
      • Contains status of each peer cluster
      /hbase/replication
        /peers
          /1 [Value: zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase]
            /peer-state [Value: ENABLED]
          /2 [Value: zk5.host.com,zk6.host.com,zk7.host.com:2181:/hbase]
            /peer-state [Value: DISABLED]
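Each peer value above has the form quorum:port:znode, that is, the remote cluster's ZooKeeper quorum hosts, its client port, and its parent znode. A minimal sketch of splitting such a value apart; the `parse_cluster_key` helper and its dictionary keys are hypothetical names chosen for this example.

```python
def parse_cluster_key(cluster_key):
    """Split 'host1,host2:port:/znode' into quorum hosts, port, and parent znode."""
    quorum, port, znode_parent = cluster_key.split(":")
    return {
        "quorum": quorum.split(","),
        "client_port": int(port),
        "znode_parent": znode_parent,
    }


# Value of peer 1, copied from the slide.
peer_1 = parse_cluster_key("zk1.host.com,zk2.host.com,zk3.host.com:2181:/hbase")
```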
  • Queues znode
      • Queues identified by region server and peer id
      • Queues contain list of HLogs and current position in log
      /hbase/replication
        /rs
          /hostname.example.org,6020,1234
            /1
              /23522342.23422 [VALUE: 254]
              /12340993.22342 [VALUE: 0]
            /2
              /23522342.23422 [VALUE: 34]
              /12340993.22342 [VALUE: 0]
          /hostname2.example.org,6020,1234
            /1
              /23522348.23443 [VALUE: 87]
              /12340999.22362 [VALUE: 0]
            /2
              /23522348.23443 [VALUE: 127]
              /12340999.22362 [VALUE: 0]
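The queue layout can be modelled as nested mappings from region server to peer id to HLog to position. The sketch below copies the values from the slide into a plain Python dict; the `pending_logs`, `advance`, and `finish_log` helpers are hypothetical, and a real deployment keeps this state in ZooKeeper rather than in memory.

```python
# Region server -> peer id -> HLog name -> current position, copied from the slide.
queues = {
    "hostname.example.org,6020,1234": {
        "1": {"23522342.23422": 254, "12340993.22342": 0},
        "2": {"23522342.23422": 34, "12340993.22342": 0},
    },
    "hostname2.example.org,6020,1234": {
        "1": {"23522348.23443": 87, "12340999.22362": 0},
        "2": {"23522348.23443": 127, "12340999.22362": 0},
    },
}


def pending_logs(queues, peer_id):
    """Count HLogs still queued for one peer across all region servers."""
    return sum(len(peers.get(peer_id, {})) for peers in queues.values())


def advance(queues, region_server, peer_id, hlog, position):
    """Record how far a source has read into one HLog."""
    queues[region_server][peer_id][hlog] = position


def finish_log(queues, region_server, peer_id, hlog):
    """Drop an HLog from a queue once it has been fully replicated."""
    del queues[region_server][peer_id][hlog]


before = pending_logs(queues, "1")
advance(queues, "hostname.example.org,6020,1234", "1", "23522342.23422", 300)
finish_log(queues, "hostname2.example.org,6020,1234", "1", "23522348.23443")
after = pending_logs(queues, "1")
```

Because each peer has its own queue, peer 2's positions are untouched by progress made for peer 1.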