HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir, StumbleUpon

This session is a case study of how we used our existing HBase cluster as a content addressable storage (CAS) for BLOBs. We will discuss how we wrote a CAS implementation with HBase as the backend, Scala and Finagle for the application layer, and caching reverse proxies (Varnish, in our case) for serving BLOBs at scale. The talk will cover why content addressable storage is the right pattern for many web use cases, how to put the possibly underutilized resources of an existing HBase cluster to better use, and the operational gotchas of storing and serving BLOBs from HBase at scale.

  1. Content Addressable Storages for fun and profit.
  2. Berk D. Demir, @bddemir
  3. I break and fix things at @StumbleUpon.
  4. Problem.
  5. Serve lots of static assets with low latency and high availability.
  6. Understanding data.
  7. A lot: 100 million. Small: 19 kilobytes. Frequent: updates.
  8. BLOBs don't change. They get replaced. We want to keep them all without duplication.
  9. Content Addressable Store.
  10. Store immutable content and authoritatively address it with a cryptographic hash.
  11. We had ideas.
  12. Very bad ideas.
  13. Very bad: shared storage, i.e., NFS.
  14. Bad ideas.
  15. Bad: AWS S3, RS Cloud Files, ... Distributed: AFS, Gluster, HDFS (oh my!)
  16. Bad ideas, take 2.
  17. o_O Write a distributed, fault-tolerant, replicating, multi-datacenter, fast CAS for BLOBs.
  18. Reimplementing a lot of things is generally not a good sign.
  19. Reuse. Don't reimplement.
  20. HBase: distributed, fault-tolerant, replicating, multi-datacenter, fast.
  21. Immutable rows with compact keys, separated into different column families based on their access patterns.
  22. One table to rule them all:
      row key: MD5, 16 bytes (SHA-1: 20 bytes)
      m: metadata, 9 bytes
      d: BLOB, many bytes
      MAX_FILESIZE => 20G, VERSIONS => 1, BLOCKCACHE => true, BLOOMFILTER => ROW
      Pre-split into 512 regions at table creation time.
  23. Scala, Finagle, asynchbase, Varnish.
  24. HTTP has a lot to offer.
  25. Verbs: GET, HEAD, PUT, DELETE.
      GET /KwIEec5utYGrKmzXYLgFzg HTTP/1.1
      Host:
  26. Headers:
      Cache-Control: max-age=<1 year>
      Last-Modified: <cell timestamp>
      Content-MD5: <row key: base64>
      Content-Disposition: attachment; filename=su.xpi
  27. HBase and HTTP are the perfect tools to build simple, reliable, fast data services.
  28. Get excited and build things!
  29. Thanks.
  30. Like the design of this slide deck? Direct your positive feedback to Coda Hale (@coda).
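The numbers on slide 7 give a rough sense of scale. This estimate makes two assumptions that the slide does not state. It treats 19 kilobytes as the average BLOB size, and it takes 1 KB as 1,000 bytes. Under those assumptions the logical data set comes to about 1.9 TB:

$$
10^{8}\ \text{BLOBs} \times 19 \times 10^{3}\ \text{bytes} = 1.9 \times 10^{12}\ \text{bytes} \approx 1.9\ \text{TB}
$$

HDFS's default replication factor is 3. If HBase stores the data on HDFS with that default, the raw disk footprint would be roughly 5.7 TB, ignoring compression and per-cell overhead.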
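Slides 8 and 10 describe the core idea. A BLOB is never modified, only replaced, so it can be stored under the hash of its own bytes. Re-uploading identical content then lands on the same key instead of creating a copy.

The following is a minimal in-memory sketch of that idea, not StumbleUpon's implementation:

- A Python dict stands in for the HBase table.
- MD5 is used because slide 22 lists a 16-byte MD5 as the row key.
- The class and method names are hypothetical.

```python
import hashlib
from typing import Dict, Optional


class ContentAddressableStore:
    """In-memory CAS sketch: every BLOB is stored under the MD5 of its bytes.

    The dict stands in for the HBase table; all names are hypothetical.
    """

    def __init__(self) -> None:
        self._blobs: Dict[bytes, bytes] = {}

    def put(self, content: bytes) -> bytes:
        # The address is derived from the content, so identical uploads
        # map to the same 16-byte key and are stored only once.
        key = hashlib.md5(content).digest()
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: bytes) -> Optional[bytes]:
        return self._blobs.get(key)

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentAddressableStore()
k1 = store.put(b"su.xpi, version 1")
k2 = store.put(b"su.xpi, version 1")  # duplicate upload: same key, no new copy
k3 = store.put(b"su.xpi, version 2")  # a "replacement" is simply new content
print(k1 == k2, k1 == k3, len(store))  # True False 2
```

In this sketch a replaced BLOB is never deleted: the old key keeps resolving to the old bytes. That matches slide 8's goal of keeping every version without duplicates.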
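Slide 21 separates data into column families by access pattern. HBase keeps each column family in its own store files, so a read restricted to one family does not have to load the others.

The sketch below assumes, for illustration only, that a HEAD request needs just the small metadata family `m`, while a GET also needs the BLOB family `d`. The table is a hypothetical in-memory dict, and the helper names are made up.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the table: row key -> family -> value.
Table = Dict[bytes, Dict[str, bytes]]


def families_for(method: str) -> List[str]:
    """Column families a read request needs (an assumed access pattern)."""
    if method == "HEAD":
        return ["m"]  # headers only: the small metadata family suffices
    if method == "GET":
        return ["m", "d"]  # headers plus the BLOB body
    raise ValueError(f"not a read method: {method}")


def read(table: Table, key: bytes, method: str) -> Dict[str, bytes]:
    # Fetch only the families this request needs; against HBase this would
    # be a Get restricted to those column families.
    row = table[key]
    return {family: row[family] for family in families_for(method)}


key = bytes(16)
table: Table = {key: {"m": b"metadata", "d": b"...BLOB bytes..."}}
print(read(table, key, "HEAD"))  # {'m': b'metadata'}
```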
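Slide 22 pre-splits the table into 512 regions. A new HBase table otherwise starts as a single region that takes every write until it splits.

MD5 output is spread uniformly over its range. As a result, evenly spaced split points give every region roughly the same share of rows from day one. The slide does not say how the splits were chosen.

The sketch below assumes the split keys are taken from the first two bytes of the binary row key. `split_keys` is a hypothetical helper.

```python
from typing import List


def split_keys(regions: int = 512, prefix_bytes: int = 2) -> List[bytes]:
    """Evenly spaced split points over the leading bytes of an MD5 row key.

    `regions - 1` split points divide the key space into `regions` regions.
    Hypothetical helper; the slides do not show how the splits were computed.
    """
    space = 256 ** prefix_bytes  # 65,536 distinct two-byte prefixes
    if not 1 <= regions <= space:
        raise ValueError("regions must be between 1 and the number of prefixes")
    return [(i * space // regions).to_bytes(prefix_bytes, "big")
            for i in range(1, regions)]


keys = split_keys()
print(len(keys), keys[0].hex(), keys[-1].hex())  # 511 0080 ff80
```

The 511 keys would be supplied as the table's split points at creation time, for example through the `SPLITS` option of the HBase shell's `create` command.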
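The path in slide 25's example request, `/KwIEec5utYGrKmzXYLgFzg`, is 22 characters long. That is exactly the length of a 16-byte value in base64 once its `==` padding is removed. This fits slide 26's `Content-MD5: <row key: base64>`, which suggests the row key doubles as the URL.

The sketch assumes the URL-safe base64 alphabet and stripped padding; neither is stated on the slides. The helper names are hypothetical.

```python
import base64


def key_to_path(row_key: bytes) -> str:
    """Encode a row key as a URL path: URL-safe base64 with '=' padding stripped."""
    return "/" + base64.urlsafe_b64encode(row_key).decode("ascii").rstrip("=")


def path_to_key(path: str) -> bytes:
    """Inverse of key_to_path; rejects tokens that don't decode to 16 bytes."""
    token = path.lstrip("/")
    padded = token + "=" * (-len(token) % 4)  # restore the stripped padding
    key = base64.urlsafe_b64decode(padded)
    if len(key) != 16:
        raise ValueError("not an MD5-sized row key")
    return key


key = path_to_key("/KwIEec5utYGrKmzXYLgFzg")
print(len(key), key_to_path(key))  # 16 /KwIEec5utYGrKmzXYLgFzg
```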
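Slide 26 lists the response headers. A key always names the same bytes, so a far-future `Cache-Control` lifetime is safe. That is what lets a caching reverse proxy such as Varnish answer repeat requests without going back to HBase.

The sketch below builds those headers for one row. It makes two assumptions:

- The cell timestamp is in milliseconds since the epoch (HBase's default).
- `Content-MD5` carries the row key in standard, padded base64.

The function name is hypothetical.

```python
import base64
from email.utils import formatdate
from typing import Dict

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000


def response_headers(row_key: bytes, cell_timestamp_ms: int, filename: str) -> Dict[str, str]:
    """Build slide 26's per-BLOB headers for one row (hypothetical helper)."""
    return {
        # A key always names the same bytes, so caches may keep them for a year.
        "Cache-Control": f"max-age={ONE_YEAR_SECONDS}",
        # Assumes millisecond cell timestamps; HTTP dates are GMT with second resolution.
        "Last-Modified": formatdate(cell_timestamp_ms / 1000, usegmt=True),
        # The row key is the MD5 of the body, so nothing is hashed at serve time.
        "Content-MD5": base64.b64encode(row_key).decode("ascii"),
        "Content-Disposition": f"attachment; filename={filename}",
    }


headers = response_headers(bytes(16), 0, "su.xpi")
print(headers["Cache-Control"])  # max-age=31536000
print(headers["Last-Modified"])  # Thu, 01 Jan 1970 00:00:00 GMT
```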