
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase

2,236 views


Blackbird is a large-scale object store built at Rocket Fuel. It stores 100+ TB of data and provides real-time access to 10 billion+ objects in 2-3 milliseconds, at a rate of 1 million+ requests per second. In this talk (an update from HBaseCon 2014), we describe Blackbird's comprehensive collections API and show how it can model collections such as sets and maps, as well as aggregates over these collections, such as counters. We also illustrate the flexibility and power of the API by modeling custom collection types that are unique to the Rocket Fuel context.

Published in: Software


  1. Blackbird Collections: In-situ Stream Processing in HBase. Ishan Chhabra, Nitin Aggarwal, Deepankar Reddy, Rocketfuel Inc.
  2. Quick Recap: Blackbird @ Rocketfuel Inc.
  3. Quick Recap: Blackbird @ Rocketfuel Inc.
  4. Collections: At the center of all non-trivial applications
  5. Usual first attempt: Read-Modify-Write. Event arrives, read the existing 100 elements, write back 101 elements.
  6. Problem: Not sympathetic to HBase internals. Memstore bloat, constant flushing, compactions.
  7. Problem: Asymptotic network usage. DC 1 to DC 2: Write 1 ships 101 elements, Write 2 ships 48 elements.
  8. Problem: Concurrency bug. Event 1: read existing 100 elements, write 101 elements. Event 2, concurrently: read the same 100 elements, write 101 elements, silently overwriting Event 1's update.
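The lost update on slide 8 can be reproduced in a few lines. This is a deliberately simplified Python model, with a plain dict standing in for the HBase row; it is not Blackbird code, only an illustration of why read-modify-write loses writes under concurrency:

```python
# Simplified sketch of the slide-8 race: two clients read-modify-write
# the same row, and one event is silently lost.

store = {"user:42": ["event%d" % i for i in range(100)]}  # 100 existing elements

# Both clients read the current collection before either writes.
snapshot_1 = list(store["user:42"])  # client 1 reads 100 elements
snapshot_2 = list(store["user:42"])  # client 2 reads the same 100 elements

# Client 1 appends its event and writes back 101 elements.
store["user:42"] = snapshot_1 + ["event_A"]
# Client 2 does the same, unaware of client 1's write.
store["user:42"] = snapshot_2 + ["event_B"]

print(len(store["user:42"]))           # 101, not 102: event_A was lost
print("event_A" in store["user:42"])   # False
```

Append-only writes sidestep this entirely: neither client needs to read before writing, so there is no stale snapshot to overwrite with.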
  9. Append-Only Collections: Be sympathetic to HBase internals. Trivial example: lists. But what about sets, maps, counters, etc., and domain-specific collections?
  10. Working example: SegmentSet
  11. Working example: SegmentSet. Keep only the latest entry for a segment; at most 1000 of the most recently updated segments.
  12. Blackbird Collections: Logical Model. A collection of entries; you can only add elements to it; a series of functions is applied during read to enforce its properties.
  13. Blackbird Collections: Logical Model. For every collection: define the structure, and define the series of functions f1, f2, f3, … to apply during read.
  14. So how would SegmentSet work? Initial state in DB.
  15. So how would SegmentSet work? Adding a new element in DB.
  16. So how would SegmentSet work? Enforce properties during read at the client.
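The SegmentSet walkthrough above can be sketched in plain Python. The entry shape and the function names (`latest_per_segment`, `limit_to_n`) are our illustration of the f1, f2 read-time pipeline, not Blackbird's actual API:

```python
# Illustrative model of SegmentSet semantics: writes only append entries;
# reads apply a pipeline of functions to enforce the set's properties.
from collections import namedtuple

Entry = namedtuple("Entry", ["segment_id", "timestamp"])

def latest_per_segment(entries):
    """f1: keep only the most recent entry for each segment."""
    latest = {}
    for e in entries:
        if e.segment_id not in latest or e.timestamp > latest[e.segment_id].timestamp:
            latest[e.segment_id] = e
    return list(latest.values())

def limit_to_n(entries, n=1000):
    """f2: keep at most n of the most recently updated segments."""
    return sorted(entries, key=lambda e: e.timestamp, reverse=True)[:n]

def read(raw_entries):
    """Reads apply the function pipeline; writes stay append-only."""
    return limit_to_n(latest_per_segment(raw_entries))

log = [Entry("seg1", 10), Entry("seg2", 20), Entry("seg1", 30)]  # appends only
print(read(log))  # seg1 keeps only its latest entry (timestamp 30)
```

Note that the stored log still contains all three appends, including the stale seg1 entry; only the view returned by `read` enforces the properties, which is exactly the garbage slide 20 asks about.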
  17. Blackbird Collections: HBase Implementation
  18. Step 1: Write appends to separate columns. The logical collection spans a combined column (views:4587, 100 entries) plus per-append columns: views:2398 (1 entry), views:6798 (2 entries), views:2983 (1 entry).
  19. Step 2: Apply the functions during reads. The combined column (100 entries) plus the appended columns (1 + 2 + 1 entries) yield 104 entries, which f1, f2, … reduce to 92 entries.
  20. But what about all the garbage that is building up?
  21. Step 3: Normalization. The combined column (100 entries) and the appended columns (1 + 2 + 1 entries) are rewritten as a single combined column of 92 entries.
  22. Step 3: Normalization. Two kinds of runs: nightly and weekly. The nightly run looks only at the subset of data changed that day; the weekly run looks at all the data.
  23. Step 3: Normalization. Heavily optimized: under 1h for the nightly run and 2-3h for the weekly run (~50 TB of data). Made fast by MapReduce over snapshots and bulkloads. No impact on live read performance.
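The normalization idea is to periodically rewrite a row's raw appends as the output of the same read-time functions, so reads stop paying for accumulated garbage. Blackbird does this at scale as MapReduce over HBase snapshots plus bulkloads; the hedged sketch below models only the per-row transformation, with example functions of our own invention:

```python
# Sketch of per-row normalization: collapse a row's raw appends into the
# canonical form that reads would compute, and store that instead.

def normalize(row_entries, functions):
    """Apply the collection's function pipeline and return the compacted row."""
    for f in functions:
        row_entries = f(row_entries)
    return row_entries

# Example pipeline (ours, for illustration): dedupe, then keep the last 3.
dedupe = lambda entries: sorted(set(entries))
limit = lambda entries: entries[-3:]

raw = [5, 1, 5, 2, 1, 9, 2, 7]            # 8 stored appends, with duplicates
compacted = normalize(raw, [dedupe, limit])
print(compacted)  # [5, 7, 9] -- same result a read would compute, now stored
```

Because normalization applies the same functions reads do, it never changes what clients observe; it only shrinks what is stored, which is why it can run offline with no impact on live read performance.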
  24. Blackbird Collections: Updated Logical Model. A collection of entries; you can only add elements to it; a series of functions is applied during reads; a series of functions is applied during daily normalization; a series of functions is applied during weekly normalization.
  25. Another Example: Transient Counters
  26. Another Example: Transient Counters. Be able to increment/decrement counts; remove entries if timestamp + time-to-live < current time; keep only the latest 1000 entries.
  27. Another Example: Transient Counters. aggregate(), expire(), limit_to_1000()
  28. Another Example: Transient Counters. During read: aggregate(), expire(). Daily normalization: aggregate(), expire(), limit_to_1000(). Weekly normalization: aggregate(), expire(), limit_to_1000().
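The transient-counter pipeline can be modeled the same way as SegmentSet. The function names below mirror the slide (aggregate, expire, limit_to_1000), but the implementations and the entry layout are our sketch, not Blackbird's:

```python
# Illustrative transient counters: stored entries are (key, delta, timestamp)
# appends; the pipeline turns them into live counter values.

def aggregate(entries):
    """Sum increments/decrements per key, keeping each key's latest timestamp."""
    totals = {}
    for key, delta, ts in entries:
        count, last_ts = totals.get(key, (0, 0))
        totals[key] = (count + delta, max(last_ts, ts))
    return [(k, c, ts) for k, (c, ts) in totals.items()]

def expire(entries, ttl, now):
    """Drop counters whose timestamp + time-to-live has passed."""
    return [(k, c, ts) for k, c, ts in entries if ts + ttl >= now]

def limit_to_1000(entries):
    """Keep only the 1000 most recently updated counters."""
    return sorted(entries, key=lambda e: e[2], reverse=True)[:1000]

now = 100
appends = [("clicks", +1, 50), ("clicks", +1, 90), ("views", +1, 10)]
result = limit_to_1000(expire(aggregate(appends), ttl=30, now=now))
print(result)  # [('clicks', 2, 90)]: 'views' expired (10 + 30 < 100)
```

Per slide 28, reads apply only aggregate() and expire(), while the daily and weekly normalization runs additionally apply limit_to_1000() when rewriting the stored entries.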
  29. Conclusion
  30. Thank you! Questions? Reach us at: ishan@rocketfuel.com, naggarwal@rocketfuel.com, dreddy@rocketfuel.com
