HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir, StumbleUpon

This session is a case study of how we used our existing HBase cluster as content addressable storage (CAS) for BLOBs. We will discuss how we wrote a CAS implementation using HBase as the backend, Scala and Finagle for the application, and caching reverse proxies (Varnish in our case) for serving BLOBs at scale. The talk will discuss why content addressable storage is the right pattern for many web use cases, how to put the possibly underutilized resources of an existing HBase cluster to better use, and the operational gotchas of storing and serving BLOBs from HBase at scale.
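As a rough illustration of the pattern the abstract describes, the sketch below addresses each BLOB by a hash of its bytes, so identical uploads collapse into one entry and a "replacement" is simply a new entry. It is not the speaker's implementation: the class name and methods are hypothetical, a Python dict stands in for the HBase table, and MD5 is chosen only because the slide transcript lists a 16-byte MD5 key.

```python
import hashlib


class ContentAddressableStore:
    """Toy CAS: the key of a BLOB is the digest of its bytes (hypothetical API)."""

    def __init__(self):
        # A dict stands in for the HBase table used in the talk.
        self._rows = {}

    def put(self, blob: bytes) -> bytes:
        key = hashlib.md5(blob).digest()  # 16-byte key derived from content
        # Same bytes always yield the same key, so a duplicate put is a no-op.
        self._rows.setdefault(key, blob)
        return key

    def get(self, key: bytes) -> bytes:
        blob = self._rows[key]
        # The key doubles as an integrity check on read.
        if hashlib.md5(blob).digest() != key:
            raise ValueError("stored content does not match its key")
        return blob

    def __len__(self) -> int:
        return len(self._rows)


store = ContentAddressableStore()
k1 = store.put(b"logo v1")
k2 = store.put(b"logo v1")  # duplicate upload, stored once
k3 = store.put(b"logo v2")  # a "replacement" is just another immutable BLOB
```

Because entries never change once written, anything in front of the store can cache them without needing invalidation.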

Published in: Technology

  1. Content Addressable Storages for fun and profit.
  2. Berk D. Demir @bddemir
  3. I break and fix things at @StumbleUpon.
  4. Problem.
  5. Serve lots of static assets with low latency and high availability.
  6. Understanding data.
  7. A lot. 100 million. Small. 19 kilobytes. Frequent. Updates.
  8. BLOBs don’t change. They get replaced. We want to keep all without duplicating.
  9. Content Addressable Store
  10. Store immutable content and authoritatively address it with a cryptographic hash.
  11. We had ideas.
  12. Very bad ideas.
  13. Very bad. Shared Storage, i.e., NFS.
  14. Bad ideas.
  15. Bad. AWS S3, RS Cloud Files, ... Distributed: AFS, Gluster, HDFS (Oh my!)
  16. Bad ideas. Take 2
  17. o_O Write a distributed, fault tolerant, replicating, multi datacenter, fast CAS for BLOBs.
  18. Reimplementing a lot of things is generally not a good sign.
  19. Reuse. Don’t reimplement.
  20. HBase: Distributed, Fault tolerant, Replicating, Multi datacenter, Fast.
  21. Immutable rows with compact keys, separated into different column families based on their access patterns.
  22. One table to rule them all.
      Row key: MD5, 16 bytes (SHA-1, 20 bytes)
      m: Metadata, 9 bytes
      d: BLOB, many bytes
      MAX_FILESIZE => 20G, VERSION => 1, BLOCKCACHE => true, BLOOMFILTER => ROW
      Pre-split into 512 regions at table creation time.
  23. Scala, Finagle, asynchbase, Varnish
  24. HTTP has a lot to offer.
  25. Verbs: GET, HEAD, PUT, DELETE
      GET /KwIEec5utYGrKmzXYLgFzg HTTP/1.1
      Host: b9.sustatic.com
  26. Headers
      Cache-Control: max-age=<1 year>
      Last-Modified: <cell timestamp>
      Content-MD5: <row key: base64>
      Content-Disposition: attachment; filename=su.xpi
  27. HBase and HTTP are the perfect tools to build simple, reliable, fast data services.
  28. Get excited and build things!
  29. Thanks.
  30. Like the design of this slide deck? Direct your positive feedback to Coda Hale (@coda)
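
Slides 25 and 26 map a stored row onto plain HTTP: the request path carries the key, and the response headers are derived from the row itself. The sketch below shows one way that mapping could look. The function names are hypothetical, and several details are assumptions rather than facts from the talk: that the path is the unpadded, URL-safe base64 of the 16-byte MD5 key (the 22-character path on slide 25 is consistent with this), that "1 year" means 365 days, and that the cell timestamp is in milliseconds since the epoch.

```python
import base64
import hashlib
from email.utils import formatdate

# Assumption: "1 year" on the slide is taken as 365 days.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def url_key(blob: bytes) -> str:
    """Path component for a BLOB: unpadded URL-safe base64 of its MD5 (assumed encoding)."""
    digest = hashlib.md5(blob).digest()  # 16 bytes
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def response_headers(blob: bytes, cell_timestamp_ms: int) -> dict:
    """Headers from slide 26, derived from the BLOB and its cell timestamp."""
    digest = hashlib.md5(blob).digest()
    return {
        # Content never changes under a given key, so a long max-age is safe.
        "Cache-Control": f"max-age={ONE_YEAR_SECONDS}",
        # HTTP dates are expressed in GMT.
        "Last-Modified": formatdate(cell_timestamp_ms / 1000, usegmt=True),
        # Content-MD5 is the standard (padded) base64 of the digest.
        "Content-MD5": base64.b64encode(digest).decode("ascii"),
    }


headers = response_headers(b"", 0)
```

Since the key is a function of the content, a caching reverse proxy such as Varnish can hold a response for the full max-age without ever serving stale data.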
