HBaseCon 2013: Apache HBase Replication


Presented by: Chris Trezzo, Twitter

  1. @TwitterAds | Confidential. @ctrezzo. HBaseCon 2013: Apache HBase Replication
  2. About me: Active contributor to Apache HBase. Software Engineer @ Twitter, Core Storage Team (Hadoop/HBase). Follow me @ctrezzo.
  3. Agenda: Introduction; High-level architecture; Replication state; Path of a replicated edit; Replication source; Replication sink; Replication source manager.
  4. HBase replication: Asynchronously copies data between two HBase clusters. Push-based architecture. Uses a WAL-shipping technique similar to MySQL replication.
  5. Guarantees of replication: Eventually consistent. Updates are delivered at least once. Atomicity of individual updates is preserved.
  6. Administering replication: Set hbase.replication => true in hbase-site.xml. Set up replication topologies with add_peer, remove_peer, disable_peer, enable_peer, and list_peers. Create or alter tables so that each column family to be replicated has REPLICATION_SCOPE => '1'.
  7. High-level architecture
  8. Replication state: Persistently stored in ZooKeeper. Status: master kill switch. Peers: list of remote target clusters. Queues: list of remaining HLogs to replicate and the current position in each log.
  9. Path of a replicated edit
  10. Path of a replicated edit
  11. Replication source: End-point for shipping WAL entries. One instance for each queue. Runs as a separate thread on the region server. Uses AdminProtocol RPC to ship entries synchronously. Filters edits based on replication scope.
  12. Replication sink: End-point for receiving shipped WAL entries. One instance per region server. Synchronously receives entries and applies them using HTable. Batches rows destined for the same table.
  13. Load balancing: Balances load on the remote cluster using randomization. Ships edits to a random subset of the remote region servers. The default subset size is 10% of the remote region servers.
  14. Path of a replicated edit
  15. Replication source manager: Manages all replication sources. Manages changes in replication state: log rolling, region server failure, and addition/deletion of peer clusters.
  16. High-level architecture
  17. Additional resources: Apache HBase user mailing list. Apache HBase Reference Guide. Tweet me @ctrezzo.
  18. Questions?
  19. Replication state: Persistently stored in ZooKeeper. Three major replication znodes: Status, Peers, Queues.
  20. Status znode: Master kill switch. Controlled by start_replication and stop_replication. Be careful what you wish for.
  21. Peers znode: The set of remote clusters registered as possible replication targets. Each is identified by a peer id. Contains the status of each peer cluster.
  22. Queues znode: Queues are identified by region server and peer id. Each queue contains a list of HLogs and the current position in each log.
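The guarantees on slide 5 follow from how a source tracks its position in a WAL: it only moves forward once the peer has acknowledged a batch, so a lost acknowledgement means the same edits are shipped again. The Python sketch below models that under simplifying assumptions: `PeerSink`, `ship_wal`, the tuple layout of a WAL entry, and the simulated lost acknowledgement are all hypothetical, and re-applying an identical cell is treated as harmless, which is what lets duplicate delivery still converge.

```python
from dataclasses import dataclass, field


@dataclass
class PeerSink:
    """Hypothetical stand-in for the peer cluster."""
    cells: dict = field(default_factory=dict)  # (row, column, timestamp) -> value
    deliveries: int = 0          # entries received, duplicates included
    drop_next_ack: bool = False  # simulate one lost acknowledgement

    def apply(self, batch):
        for row, column, ts, value in batch:
            self.deliveries += 1
            # Writing the same (row, column, timestamp) cell again leaves the
            # stored data unchanged, so a duplicate delivery is harmless here.
            self.cells[(row, column, ts)] = value
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise ConnectionError("acknowledgement lost")


def ship_wal(wal, position, sink, batch_size=2):
    """Push wal[position:] to the sink and return the new position.

    The position only advances after the sink acknowledges a batch, so a lost
    acknowledgement makes the same batch go out again (at-least-once delivery).
    """
    while position < len(wal):
        batch = wal[position:position + batch_size]
        try:
            sink.apply(batch)
        except ConnectionError:
            continue  # retry the same batch; this sketch retries forever
        position += len(batch)  # per slide 8, HBase keeps this position in ZooKeeper
    return position
```

With `drop_next_ack=True`, five entries, and batches of two, the first batch is delivered twice, so the sink counts 7 deliveries but still holds 5 cells.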
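The replication state described on slides 8 and 19 to 22 lives in ZooKeeper as a small tree. The sketch below flattens that tree into a path-to-data dictionary so the three parts (status, peers, queues) are visible side by side. The znode names `state`, `peers`, `peer-state`, and `rs`, the example cluster key, and the data encodings are assumptions loosely modeled on HBase releases of that era; check the layout of your own version before depending on it.

```python
def replication_znodes(base, enabled, peers, queues):
    """Flatten replication state into a {znode path: data} mapping.

    peers:  peer id -> (cluster key, peer enabled?)
    queues: region server -> peer id -> {HLog name: position in that log}
    Path names and data encodings are illustrative assumptions.
    """
    nodes = {f"{base}/state": str(enabled).lower()}  # master kill switch
    for peer_id, (cluster_key, peer_enabled) in peers.items():
        nodes[f"{base}/peers/{peer_id}"] = cluster_key
        nodes[f"{base}/peers/{peer_id}/peer-state"] = (
            "ENABLED" if peer_enabled else "DISABLED")
    for server, per_peer in queues.items():
        for peer_id, hlogs in per_peer.items():
            for hlog, position in hlogs.items():
                # One znode per remaining HLog, holding the current position.
                nodes[f"{base}/rs/{server}/{peer_id}/{hlog}"] = str(position)
    return nodes
```

For example, one disabled peer and one region server with two queued HLogs yields five znodes: the status switch, two for the peer, and one per HLog.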
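Slides 6 and 11 say the source only ships edits whose column family has REPLICATION_SCOPE set to 1, and ships them synchronously. A minimal sketch of that filtering step follows, assuming a plain dictionary from (table, family) to scope and a fixed number of entries per RPC. `WalEntry`, `replicable`, and `batches` are hypothetical names, and the dictionary lookup is a simplification of how a real source learns each family's scope.

```python
from dataclasses import dataclass

# Scope values named after the HBase constants: 0 keeps edits local, 1 ships them.
REPLICATION_SCOPE_LOCAL = 0
REPLICATION_SCOPE_GLOBAL = 1


@dataclass(frozen=True)
class WalEntry:
    table: str
    row: str
    family: str
    value: str


def replicable(entries, scopes):
    """Keep entries whose (table, family) has REPLICATION_SCOPE 1.

    scopes maps (table, family) -> scope; unknown families count as local.
    """
    return [e for e in entries
            if scopes.get((e.table, e.family), REPLICATION_SCOPE_LOCAL)
            == REPLICATION_SCOPE_GLOBAL]


def batches(entries, max_entries):
    """Split filtered entries into batches, each shipped in one synchronous RPC."""
    for start in range(0, len(entries), max_entries):
        yield entries[start:start + max_entries]
```

Filtering before batching means families left at scope 0 never consume RPC capacity toward the peer.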
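On the receiving side (slide 12), the sink groups the rows of a shipment by table so each table gets one batched write. The sketch below models that grouping with dictionaries standing in for HTable; `group_by_table` and `apply_shipment` are hypothetical helpers, not HBase APIs.

```python
from collections import defaultdict


def group_by_table(entries):
    """Group shipped (table, row, family, value) tuples by destination table,
    keeping arrival order within each table."""
    grouped = defaultdict(list)
    for table, row, family, value in entries:
        grouped[table].append((row, family, value))
    return dict(grouped)


def apply_shipment(entries, tables):
    """Apply one shipment; `tables` maps table name -> {(row, family): value}.

    Returns the number of batched writes, one per destination table.
    """
    writes = 0
    for table, rows in group_by_table(entries).items():
        store = tables.setdefault(table, {})
        for row, family, value in rows:  # stands in for one batched HTable call
            store[(row, family)] = value
        writes += 1
    return writes
```

A shipment that interleaves rows for two tables therefore costs two writes rather than one per row.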
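Slide 13's randomized load balancing can be sketched as two steps: choose a random subset of the peer's region servers, then send each shipment to one random member of that subset. The `ratio` argument stands in for the 10% default (assumed here to correspond to the `replication.source.ratio` setting), and rounding the subset size up with a floor of one server is also an assumption; `choose_sinks` and `next_sink` are hypothetical names.

```python
import math
import random


def choose_sinks(peer_region_servers, ratio=0.1, rng=None):
    """Pick a random subset (about `ratio` of the peer's region servers,
    never fewer than one) to receive this source's shipments."""
    rng = rng or random.Random()
    servers = list(peer_region_servers)
    if not servers:
        return []
    count = max(1, math.ceil(len(servers) * ratio))
    return rng.sample(servers, count)


def next_sink(sinks, rng=None):
    """Each shipment goes to one sink drawn at random from the chosen subset."""
    rng = rng or random.Random()
    return rng.choice(sinks)
```

Because every source region server draws its own subset, write load from many sources spreads across the remote cluster instead of piling onto the same few peer servers.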
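The state changes listed on slide 15 can be modeled as operations on the queue map from slide 22 (region server, then queue id, then HLogs). This sketch is hypothetical throughout: it keeps only HLog names and drops the per-log positions, and the `<queue id>-<dead server>` naming for claimed queues is an assumption. In a real cluster the claim of a dead server's queues must be atomic so that only one surviving region server takes them over; the sketch ignores concurrency.

```python
def on_log_roll(queues, server, new_hlog):
    """A region server rolled its WAL: every queue it owns must also ship the new HLog."""
    for hlogs in queues[server].values():
        hlogs.append(new_hlog)


def claim_queues(queues, dead_server, live_server):
    """Move a failed server's queues to a live server.

    Claimed queues are renamed '<queue id>-<dead server>' so they cannot collide
    with the live server's own queues; returns the new queue ids.
    """
    orphaned = queues.pop(dead_server, {})
    owned = queues.setdefault(live_server, {})
    claimed = []
    for queue_id, hlogs in orphaned.items():
        new_id = f"{queue_id}-{dead_server}"
        owned[new_id] = hlogs
        claimed.append(new_id)
    return claimed


def add_peer(queues, peer_id, current_hlog):
    """New peer: each server starts a queue at its current HLog."""
    for server, owned in queues.items():
        owned[peer_id] = [current_hlog[server]]


def remove_peer(queues, peer_id):
    """Drop every queue, including claimed ones, that ships to `peer_id`."""
    for owned in queues.values():
        for queue_id in [q for q in owned if q.split("-", 1)[0] == peer_id]:
            del owned[queue_id]
```

Keeping claimed queues separate, rather than merging them into the live server's own queue, lets the live server finish the dead server's HLogs while it continues replicating its own.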