HBaseCon 2013: Apache HBase Replication






  1. Apache HBase Replication. HBaseCon 2013. @ctrezzo (@TwitterAds | Confidential)
  2. About me: Active contributor to Apache HBase. Software Engineer @ Twitter, Core Storage Team (Hadoop/HBase). Follow me @ctrezzo.
  3. Agenda: Introduction; High-level Architecture; Replication State; Path of a replicated edit; Replication Source; Replication Sink; Replication Source Manager.
  4. HBase replication: Asynchronously copy data between two HBase clusters. Push-based architecture. WAL shipping technique similar to MySQL.
  5. Guarantees of replication: Eventually consistent. Deliver updates at least once. Atomicity of individual updates will be preserved.
  6. Administering Replication: Simply set a parameter in hbase-site.xml: hbase.replication => true. Set up replication topologies with add_peer, remove_peer, disable_peer, enable_peer, list_peers. Create/alter tables with replication scope set: REPLICATION_SCOPE => '1'.
  7. High-Level Architecture
  8. Replication State: Persistently stored in ZooKeeper. Status: master kill switch. Peers: list of remote target clusters. Queues: list of remaining HLogs to replicate and current position in each log.
  9. Path of a replicated edit
  10. Path of a replicated edit
  11. Replication Source: End-point for shipping WAL entries. One instance for each queue. Runs as a separate thread on the region server. Uses AdminProtocol RPC to synchronously ship entries. Filters edits based on replication scope.
  12. Replication Sink: End-point for receiving shipped WAL entries. One instance per region server. Synchronously receives entries and applies them using HTable. Batches rows in the same table.
  13. Load balancing: Balances load on remote cluster using randomization. Ships edits to random subset of remote region servers. Default is 10%.
  14. Path of a replicated edit
  15. Replication Source Manager: Manages all replication sources. Manages change in replication state: log rolling, region server failure, addition/deletion of peer clusters.
  16. High-Level Architecture
  17. Additional Resources: Apache HBase user mailing list. Apache HBase reference guide. Tweet me @ctrezzo.
  18. Questions?
  19. Replication State: Persistently stored in ZooKeeper. Three major replication znodes: Status, Peers, Queues.
  20. Status znode: Master kill switch. Controlled by start_replication, stop_replication. Be careful what you wish for.
  21. Peers znode: A set of remote clusters registered as possible replication targets. Identified by peer id. Contains status of each peer cluster.
  22. Queues znode: Queues identified by region server and peer id. Queues contain list of HLogs and current position in log.
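
The load-balancing slide says a source ships edits to a random subset of remote region servers, 10% by default. The Python sketch below illustrates that idea only; it is not HBase code. The function names `choose_sinks` and `pick_sink`, the round-up rule, and the "at least one sink" floor are assumptions made for this illustration.

```python
import math
import random


def choose_sinks(region_servers, ratio=0.1, rng=random):
    """Pick a random subset of remote region servers to act as sinks.

    Assumption: the subset size is the ratio of the cluster size,
    rounded up, and never less than one server.
    """
    servers = list(region_servers)
    if not servers:
        return []
    count = max(1, math.ceil(len(servers) * ratio))
    return rng.sample(servers, count)


def pick_sink(sinks, rng=random):
    """Pick one sink from the chosen subset for the next batch of edits."""
    return rng.choice(sinks)


if __name__ == "__main__":
    # Hypothetical host names for a 50-node remote cluster.
    remote = [f"rs{i}.example.com" for i in range(50)]
    sinks = choose_sinks(remote)
    print("sinks:", sinks)
    print("next batch goes to:", pick_sink(sinks))
```

Because each source draws its own subset independently, load from many sources tends to spread across the remote cluster without any coordination between them.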