
HBaseCon 2015: Events @ Box - Using HBase as a Message Queue



Box's /events API powers our desktop sync experience and provides users with a realtime, guaranteed-delivery event stream. To do that, we use HBase to store and serve a separate message queue for each of 30+ million users. Learn how we implemented queue semantics, how we replicate our queues between clusters to enable transparent client failover, and why we chose to build a queueing system on top of HBase.

Published in: Software


  1. David MacKenzie, Box Engineering (@davrmac, @BoxEng). /events @ Box: Using HBase as a message queue
  2. Share, manage and access your content from any device, anywhere
  3. What is the /events API?
     • Realtime stream of all activity happening within a user’s account
     • GET /events?stream_position=1234&stream_type=all
     • Persistent and re-playable
     [Diagram: a client reading events at stream positions 1 to 5]
  4. Why did we build it?
     • Main use case was sync: a switch from batch to incremental diffs
     • Several requirements arose from the sync use case:
       ‒ Guaranteed delivery
       ‒ Clients can be offline for days at a time
       ‒ Arbitrary number of clients consuming each user’s stream
     [Slide callouts: Persistence, Re-playability]
  5. How is it implemented?
     • Each user is assigned a separate section of the HBase key-space
     • Messages are stored in order from oldest to newest within a user’s section of the key-space
     • Reads map directly to scans from the provided position to the user’s end key
     • Row key structure: <pseudo-random prefix>_<user_id>_<position>
       ‒ <pseudo-random prefix>: 2 bytes of the SHA-1 of user_id
       ‒ <position>: millisecond timestamp
  6. Using a timestamp as a queue position
     • Pro: Allows for allocating roughly monotonically increasing positions with no coordination between write requests
     • Con: Isn’t sufficient to guarantee append-only semantics in the presence of parallel writes
     [Diagram: parallel writes and reads at stream positions 1 and 2]
  7. Time-bounding and Back-scanning
     • Need to ensure that clients don’t advance their stream positions past writes that will eventually succeed
       ‒ But clients do need to advance position eventually
       ‒ How do we know when it’s safe?
     • Solution: time-bound writes and back-scan reads
       ‒ Time-bounding: every write to HBase must complete within a fixed time-bound to be considered successful
         ‒ No guaranteed delivery for unsuccessful writes
         ‒ Clients should retry failed writes at higher stream positions
       ‒ Back-scanning: clients cannot advance their stream positions further than (current time – back-scan interval)
         ‒ Back-scan interval >= write time-bound
     • Provides guaranteed delivery but at the cost of duplicate events
  8. [Diagram: interleaved writes and back-scanned reads over stream positions 1 to 4]
  9. Replication
     • Master/slave architecture
       ‒ One cluster per DC
       ‒ Master cluster handles all reads and writes
       ‒ Slave clusters are passive replicas
     • On promotion, clients transparently fail over to the new master cluster
     • Can’t use native HBase replication directly
       ‒ Could cause clients to miss events when failing over to a lagging cluster
     [Diagram: replication before and after a failover to a lagging cluster]
  10. Replication Contd.
      • Replication system needs to be aware of master/slave failovers
        ‒ On failover, stop replicating messages exactly and start appending them to the current ends of the queues
      • Currently, we use a client-level replication system piggybacking on MySQL replication
      • Plan to switch to a system that hooks into HBase replication by configuring itself as a slave HBase cluster
      [Diagram: queue contents before and after a failover]
  11. Why HBase?
      • Closest off-the-rack queuing system is Kafka
        ‒ Developed at LinkedIn. Open sourced in 2011.
        ‒ Originally built to power LinkedIn’s analytics pipeline
        ‒ Very similar model built around “ordered commit logs”
        ‒ Allows for easy addition of new subscribers
        ‒ Allows for varying subscriber consumption patterns, so slow subscribers don’t back up the pipeline
  12. Why HBase and not Kafka?
      • Better consistency vs. availability tradeoffs
        ‒ No automatic rack-aware replica placement
        ‒ No automatic replica re-assignment upon replica failure
        ‒ On replica failure, no fast failover of new writes to new replicas
        ‒ Can’t require a minimum replication factor for new writes without significantly impacting availability on replica failure
      • Replication support
        ‒ Not enough control over Kafka queue positions to implement transparent client failovers between replica clusters
      • Unable to scale to millions of topics
        ‒ Currently tops out in the tens of thousands of topics
        ‒ Design requires very granular topic tracking, which is a barrier to scale
  13. In conclusion…
      • We were able to leverage HBase to store millions of guaranteed-delivery message queues, each of which was:
        ‒ replicated between data centers
        ‒ independently consumable by an arbitrary number of clients
      • Cluster metrics:
        ‒ ~30 nodes per cluster
        ‒ 15K writes/sec at peak, with bursts of up to 40K writes/sec
        ‒ 50K-60K requests/sec at peak
  14. Questions? Twitter: @davrmac, @BoxEng. Engineering Blog, Platform, Open Source
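
The row key layout from slide 5 could be sketched roughly as follows. This is only an illustration under assumptions the slides do not state: the hex encoding of the two SHA-1 bytes, the zero-padded position width, the `\xff` stop-key sentinel, and the helper names `salt`, `row_key` and `scan_range` are all hypothetical.

```python
import hashlib

# Assumed: positions are zero-padded so that byte order matches numeric order.
POSITION_WIDTH = 13


def salt(user_id: int) -> str:
    """First 2 bytes of SHA-1(user_id), hex-encoded (encoding is an assumption)."""
    digest = hashlib.sha1(str(user_id).encode("ascii")).digest()
    return digest[:2].hex()


def row_key(user_id: int, position_ms: int) -> bytes:
    """Build <pseudo-random prefix>_<user_id>_<position> as a sortable byte string."""
    return f"{salt(user_id)}_{user_id}_{position_ms:0{POSITION_WIDTH}d}".encode("ascii")


def scan_range(user_id: int, stream_position: int) -> tuple[bytes, bytes]:
    """Start/stop keys for a read: from the given position to the user's end key."""
    start = row_key(user_id, stream_position)
    # 0xff sorts after every digit, so this bounds the user's section of the key-space.
    stop = f"{salt(user_id)}_{user_id}_".encode("ascii") + b"\xff"
    return start, stop
```

Because the prefix depends only on the user, all of a user's rows stay contiguous and ordered by position, while different users are spread across the key-space.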
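
The time-bounding and back-scanning rules from slide 7 can be illustrated with a small in-memory simulation. It is a sketch, not Box's implementation: the `UserQueue` class, the two interval values, and the way in-flight writes are modelled (a row becomes visible only at its completion time) are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumed values; the slides only require back-scan interval >= write time-bound.
WRITE_TIME_BOUND_MS = 1_000
BACK_SCAN_INTERVAL_MS = 1_500


@dataclass
class UserQueue:
    # Each row is (position_ms, visible_at_ms, event).
    rows: list = field(default_factory=list)

    def write(self, event: str, started_ms: int, completed_ms: int) -> bool:
        """Store the event at its start timestamp if it finished within the time-bound."""
        if completed_ms - started_ms > WRITE_TIME_BOUND_MS:
            # Treated as failed: the caller should retry at a higher position.
            # (This sketch simply drops it; a real late write might still land.)
            return False
        self.rows.append((started_ms, completed_ms, event))
        return True

    def read(self, stream_position: int, now_ms: int) -> tuple[list, int]:
        """Return visible events from stream_position onward and the next position."""
        visible = sorted(
            (row for row in self.rows
             if row[0] >= stream_position and row[1] <= now_ms),
            key=lambda row: row[0],
        )
        # Back-scanning: never advance past (now - back-scan interval), so writes
        # still in flight inside that window are picked up by a later read.
        next_position = max(stream_position, now_ms - BACK_SCAN_INTERVAL_MS)
        return [row[2] for row in visible], next_position
```

In this model, a write that starts before an already-visible event but completes after the first read is still delivered by the next read, because the client's position was held back. The events inside the back-scan window are returned again, which is the duplicate-delivery cost noted on the slide.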
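
The failover-aware replication idea from slide 10 (stop replicating exactly, start appending to the current end of the queue) might look something like the sketch below. Everything here is hypothetical: the `ReplicaQueue` class, its method names, and the rule for picking the appended position are assumptions for illustration, not the replication system described in the talk.

```python
class ReplicaQueue:
    """One user's queue on a replica cluster (hypothetical model)."""

    def __init__(self) -> None:
        self.rows: dict[int, str] = {}  # position -> event
        self.promoted = False

    def end_position(self) -> int:
        return max(self.rows, default=0)

    def _append(self, event: str, now_ms: int) -> int:
        # Assumed rule: new position is the current time, but always past the queue end.
        position = max(now_ms, self.end_position() + 1)
        self.rows[position] = event
        return position

    def promote(self) -> None:
        """This cluster becomes the master after a failover."""
        self.promoted = True

    def write_local(self, event: str, now_ms: int) -> int:
        return self._append(event, now_ms)

    def apply_replicated(self, position: int, event: str, now_ms: int) -> int:
        """Apply a message shipped from the other cluster."""
        if not self.promoted:
            # Passive replica: copy the message at its original position.
            self.rows[position] = event
            return position
        # After promotion: clients may already have read past the original
        # position, so append at the end instead of replicating exactly.
        return self._append(event, now_ms)

    def read_from(self, position: int) -> list[str]:
        return [self.rows[p] for p in sorted(self.rows) if p >= position]
```

With this rule, a message that arrives late from the old master lands after whatever the promoted cluster has already accepted, so a client that failed over and kept reading from its last position still sees it.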