Hw09: Practical HBase: Getting the Most From Your HBase Install

  1. 1. HBase, Hadoop World NYC <ul><li>Ryan Rawson, Stumbleupon.com, su.pr </li></ul><ul><li>Jonathan Gray, Streamy.com </li></ul>
  2. 2. A presentation in 2 parts
  3. 3. Part 1
  4. 4. About Me <ul><li>Ryan Rawson </li></ul><ul><li>Senior Software Developer @ Stumbleupon </li></ul><ul><li>HBase committer, core contributor </li></ul>
  5. 5. Stumbleupon <ul><li>Uses HBase in production </li></ul><ul><li>Behind features of our su.pr service </li></ul><ul><li>More later </li></ul>
  6. 6. Adventures with MySQL <ul><li>Scaling MySQL hard, Oracle expensive (and hard) </li></ul><ul><li>Machine cost goes up faster than speed </li></ul><ul><li>Turn off all relational features to scale </li></ul><ul><li>Turn off secondary (!) indexes too! (!!) </li></ul>
  7. 7. MySQL problems cont. <ul><li>Tables can be a problem at sizes as low as 500GB </li></ul><ul><li>Hard to read data quickly at these sizes </li></ul><ul><li>Future doesn’t look so bright as we contemplate 10x sizes </li></ul><ul><li>MySQL master becomes a problem... </li></ul>
  8. 8. Limitations of masters <ul><li>What if your write load exceeds what a single machine can handle? </li></ul><ul><li>All slaves must have same write capacity as master (can’t cheap out on slaves) </li></ul><ul><li>Single point of failure, no easy failover </li></ul><ul><li>Can (sort of) solve this with sharding... </li></ul>
  9. 9. Sharding
  10. 10. Sharding problems <ul><li>Requires either a hashing function or mapping table to determine shard </li></ul><ul><li>Data access code becomes complex </li></ul><ul><li>What if shard sizes become too large... </li></ul>
  11. 11. Resharding!
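The sharding slides above note two things. Sharding needs either a hashing function or a mapping table, and shards eventually have to be resharded. The sketch below is a hypothetical illustration of both lookup styles, not Stumbleupon's code. It also shows why naive hash-modulo sharding makes resharding painful. The function names, the MD5-based hash, and the `user:N` keys are all assumptions made for the example.

```python
import hashlib

def hash_shard(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the key (MD5 is stable across runs, unlike Python's hash())."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Alternative: an explicit mapping table, here a hypothetical dict of key -> shard.
MAPPING_TABLE = {"user:1": 0, "user:2": 1, "user:3": 0}

def table_shard(key: str) -> int:
    return MAPPING_TABLE[key]

def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys that land on a different shard after resharding old_n -> new_n."""
    moved = sum(1 for k in keys if hash_shard(k, old_n) != hash_shard(k, new_n))
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
print(f"{moved_fraction(keys, 4, 5):.0%} of keys move when growing from 4 to 5 shards")
```

With uniformly distributed hashes, a key stays put only when h mod 4 equals h mod 5. That holds for 4 of every 20 values, so roughly 80% of the data has to be copied. A mapping table avoids the mass move, but it is one more structure to keep consistent. That is part of why data access code becomes complex.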
  12. 12. What about schema changes? <ul><li>What about schema changes or migrations? </li></ul><ul><li>MySQL not your friend here </li></ul><ul><li>Only gets harder with more data </li></ul>
  13. 13. HBase to the rescue <ul><li>Clustered, commodity(ish) hardware </li></ul><ul><li>Mostly schema-less </li></ul><ul><li>Dynamic distribution </li></ul><ul><li>Spreads writes out over the cluster </li></ul>
  14. 14. What is HBase? <ul><li>HBase is an open-source distributed database, inspired by Google’s Bigtable </li></ul><ul><li>Part of the Hadoop ecosystem </li></ul><ul><li>Layers on HDFS for storage </li></ul><ul><li>Native connections to MapReduce </li></ul>
  15. 15. HBase storage model <ul><li>Column-oriented database </li></ul><ul><li>Column names are arbitrary data; a row can have a large, variable number of columns </li></ul><ul><li>Rows stored in sorted order </li></ul><ul><li>Supports random reads and writes </li></ul>
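Here is a minimal in-memory sketch of this storage model, assuming a plain Python dict as the backing store. Each row holds whatever column names it likes, and reads and writes go straight to any row. Scans walk rows in sorted byte order. `TinyTable` and the `family:qualifier` column names are hypothetical. Real HBase persists data in HDFS and does not re-sort on every scan.

```python
from collections import defaultdict

class TinyTable:
    """Toy model of the storage model: sparse rows with arbitrary column names,
    random reads/writes, and scans in sorted row-key order."""

    def __init__(self):
        self._rows = defaultdict(dict)  # row key -> {column name -> value}

    def put(self, row: bytes, column: bytes, value: bytes) -> None:
        self._rows[row][column] = value  # random write to any row/column

    def get(self, row: bytes, column: bytes):
        return self._rows.get(row, {}).get(column)  # random read; None if absent

    def scan(self, start: bytes, stop: bytes):
        """Yield (row, columns) for start <= row < stop in sorted (byte) order."""
        for row in sorted(self._rows):
            if row >= stop:
                break  # rows are sorted, so nothing later can be in range
            if row >= start:
                yield row, dict(self._rows[row])

t = TinyTable()
t.put(b"com.example/b", b"stats:clicks", b"7")
t.put(b"com.example/a", b"stats:clicks", b"3")
t.put(b"com.example/a", b"meta:title@2009-10-02", b"Hello")  # column names can carry data
print([row for row, _ in t.scan(b"com.example/", b"com.example0")])
```

Because rows are kept in sorted order, a prefix query becomes a range scan. The stop key `b"com.example0"` works because `0` is the byte right after `/`.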
  16. 18. Tables <ul><li>Table is split into roughly equal-sized “regions” </li></ul><ul><li>Each region is a contiguous range of keys, from start (inclusive) to end (exclusive): [start, end) </li></ul><ul><li>Regions split as they grow, thus dynamically adjusting to your data set </li></ul>
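A region lookup over half-open [start, end) key ranges can be sketched with a sorted list of start keys and a binary search. This is a hypothetical toy, and `RegionMap` is not an HBase class. In real HBase, clients locate regions through catalog tables, and the server chooses the split key itself rather than having it passed in.

```python
import bisect

class RegionMap:
    """Toy map of a table's regions. Region i covers [starts[i], starts[i+1]);
    the first region starts at b"" and the last one runs to the end of the table."""

    def __init__(self, split_keys):
        self.starts = [b""] + sorted(split_keys)

    def region_for(self, row: bytes) -> int:
        # The owning region has the greatest start key that is <= row.
        return bisect.bisect_right(self.starts, row) - 1

    def bounds(self, idx: int):
        end = self.starts[idx + 1] if idx + 1 < len(self.starts) else None  # None: end of table
        return self.starts[idx], end

    def split(self, idx: int, split_key: bytes) -> None:
        """Split region idx into [start, split_key) and [split_key, end)."""
        start, end = self.bounds(idx)
        if not (start < split_key and (end is None or split_key < end)):
            raise ValueError("split key must fall strictly inside the region")
        bisect.insort(self.starts, split_key)

regions = RegionMap([b"g", b"p"])  # three regions: [b"", g), [g, p), [p, end)
print(regions.region_for(b"g"), regions.region_for(b"f"))  # a start key belongs to its own region
regions.split(0, b"c")  # the first region grew too big
print(regions.bounds(1))
```

Using `bisect_right` is what makes the start key inclusive and the end key exclusive.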
  17. 19. Server architecture <ul><li>Similar to HDFS: </li></ul><ul><ul><li>Master = Namenode (ish) </li></ul></ul><ul><ul><li>Regionserver = Datanode (ish) </li></ul></ul><ul><li>Often run these alongside each other! </li></ul>
  18. 20. Server Architecture 2 <ul><li>But not quite the same, HBase stores state in HDFS </li></ul><ul><li>HDFS provides robust data storage across machines, insulating against failure </li></ul><ul><li>Master and Regionserver fairly stateless and machine independent </li></ul>
  19. 21. Region assignment <ul><li>Each region from every table is assigned to a Regionserver </li></ul><ul><li>The master is responsible for assignment and noticing if (when!) regionservers go down </li></ul>
  20. 22. Master Duties <ul><li>When machines fail, move regions from affected machines to others </li></ul><ul><li>When regions split, move regions to balance cluster </li></ul><ul><li>Could move regions to respond to load </li></ul><ul><li>Can run multiple backup masters </li></ul>
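The failover duty can be illustrated with a toy reassignment policy. It assumes a dead server's regions simply go to whichever live regionserver currently carries the fewest. `reassign_dead_server` and the server names are hypothetical, and HBase's real master and balancer logic differ in detail.

```python
from collections import Counter

def reassign_dead_server(assignment: dict, dead: str, live: list) -> dict:
    """Return a new region -> server map with the dead server's regions moved
    onto the least-loaded live servers (ties broken by server name)."""
    if not live:
        raise ValueError("no live regionservers to take over")
    new_assignment = {r: s for r, s in assignment.items() if s != dead}
    load = Counter({s: 0 for s in live})
    load.update(new_assignment.values())  # count regions already on each server
    for region in sorted(r for r, s in assignment.items() if s == dead):
        target = min(live, key=lambda s: (load[s], s))
        new_assignment[region] = target
        load[target] += 1
    return new_assignment

before = {"r1": "rs1", "r2": "rs1", "r3": "rs2", "r4": "rs3"}
after = reassign_dead_server(before, dead="rs1", live=["rs2", "rs3"])
print(sorted(after.items()))
```

Only region-to-server metadata changes here. Because the region data lives in HDFS, the new server can open the region without copying it.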
  21. 23. What Master does NOT do <ul><li>Does not handle any write requests (not a DB master!) </li></ul><ul><li>Does not handle location finding requests </li></ul><ul><li>Not involved in the read/write path! </li></ul><ul><li>Generally does very little most of the time </li></ul>
  22. 24. Distributed coordination <ul><li>To manage master election and server availability we use ZooKeeper </li></ul><ul><li>Set up as a cluster, provides distributed coordination primitives </li></ul><ul><li>An excellent tool for building cluster management systems </li></ul>
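Master election on top of ZooKeeper typically rests on ephemeral znodes, which vanish when the session that created them ends. The sketch below fakes that behaviour in memory under a simple assumption: whichever master first creates the node is active, and backups race again when it disappears. `FakeZooKeeper` and the `/hbase/master` path are illustrative assumptions, not the ZooKeeper client API.

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper: ephemeral nodes disappear when the
    owning session ends. Hypothetical class for illustration only."""

    def __init__(self):
        self.nodes = {}  # path -> owning session id

    def create_ephemeral(self, path: str, session: str) -> bool:
        """Create path if absent; return True only for the session that created it."""
        if path in self.nodes:
            return False
        self.nodes[path] = session
        return True

    def expire_session(self, session: str) -> None:
        # All ephemeral nodes owned by the session vanish together.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}

MASTER_PATH = "/hbase/master"  # hypothetical znode path

def try_become_master(zk: FakeZooKeeper, session: str) -> bool:
    return zk.create_ephemeral(MASTER_PATH, session)

zk = FakeZooKeeper()
candidates = ["master-a", "master-b", "master-c"]
winners = [m for m in candidates if try_become_master(zk, m)]  # every candidate tries; one wins
zk.expire_session("master-a")  # the active master dies and its node disappears
winners_after = [m for m in candidates[1:] if try_become_master(zk, m)]  # backups race again
print(winners, winners_after)
```

This is also how multiple backup masters can coexist. They just wait for the node to disappear.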
  23. 25. Scaling HBase <ul><li>Add more machines to scale </li></ul><ul><li>Base model (Bigtable) scales past 1000TB </li></ul><ul><li>No inherent reason why HBase couldn’t </li></ul>
  24. 26. What to store in HBase? <ul><li>Maybe not your raw log data... </li></ul>
  25. 27. <ul><li>... but the results of processing it with Hadoop! </li></ul><ul><li>By storing the refined version in HBase, you can keep up with huge data demands and serve it to your website </li></ul>
  26. 28. HBase & Hadoop <ul><li>Provides a real time, structured storage layer that integrates with your existing Hadoop clusters </li></ul><ul><li>Provides “out of the box” hookups to MapReduce </li></ul><ul><li>Uses the same loved (or hated) management model as Hadoop </li></ul>
  27. 29. HBase @ Stumbleupon
  28. 30. Stumbleupon & HBase <ul><li>Started investigating the field in Jan ’09 </li></ul><ul><li>Looked at 3 top (at the time) choices: </li></ul><ul><ul><li>Cassandra </li></ul></ul><ul><ul><li>Hypertable </li></ul></ul><ul><ul><li>HBase </li></ul></ul>Cassandra didn’t work for us, and we didn’t like its data model; Hypertable was fast, but we had doubts about its community and project viability (no major users beyond Zvents); HBase was local and had a good community
  29. 31. Stumbleupon & HBase <ul><li>Picked HBase: </li></ul><ul><ul><li>Community </li></ul></ul><ul><ul><li>Features </li></ul></ul><ul><ul><li>Map-reduce, cascading, etc </li></ul></ul><ul><li>Now highly involved and invested </li></ul>
  30. 32. su.pr marketing <ul><li>“Su.pr is the only URL shortener that also helps your content get discovered! Every Su.pr URL exposes your content to StumbleUpon's nearly 8 million users!” </li></ul>
  31. 33. su.pr tech features <ul><li>Real time stats </li></ul><ul><ul><li>Done directly in HBase </li></ul></ul><ul><li>In depth stats </li></ul><ul><ul><li>Use Cascading and MapReduce, and put the results in HBase </li></ul></ul>
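Real-time stats like these are commonly built on per-cell counters that are incremented atomically on each event, and HBase's client API offers atomic increments on a cell. Below is a minimal in-memory sketch. It assumes a hypothetical schema with one row per short URL, a running total, and one counter column per hour. `CounterTable`, `record_click`, and the column names are illustrative, not su.pr's actual design.

```python
import threading
from collections import defaultdict
from datetime import datetime, timezone

class CounterTable:
    """Toy stand-in for a table of atomic counters keyed by (row, column)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cells = defaultdict(int)

    def increment(self, row: str, column: str, amount: int = 1) -> int:
        with self._lock:  # a single lock makes each increment atomic in this toy
            self._cells[(row, column)] += amount
            return self._cells[(row, column)]

    def get(self, row: str, column: str) -> int:
        with self._lock:
            return self._cells.get((row, column), 0)

def record_click(table: CounterTable, short_url: str, when: datetime) -> None:
    # Hypothetical schema: one row per short URL, a total plus one column per hour bucket.
    table.increment(short_url, "clicks:total")
    table.increment(short_url, "clicks:" + when.strftime("%Y%m%d%H"))

stats = CounterTable()
clicked_at = datetime(2009, 10, 2, 14, 30, tzinfo=timezone.utc)
for _ in range(3):
    record_click(stats, "su.pr/abc", clicked_at)
print(stats.get("su.pr/abc", "clicks:2009100214"))
```

Reading the stats back is then a single-row get, which is why no batch job is needed for the real-time view.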
  32. 34. su.pr web access <ul><li>PHP code accesses HBase through the Thrift gateway </li></ul><ul><li>No additional caching other than what HBase provides </li></ul>
  33. 35. Large data storage <ul><li>Over 9 billion rows and 1300 GB in HBase </li></ul><ul><li>Can run MapReduce over a 700 GB table in ~20 min </li></ul><ul><li>That is about 6 million rows/sec </li></ul><ul><li>Scales to 2x that speed on 2x the hardware </li></ul>
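Here is a quick back-of-envelope check of these figures. It uses only the numbers on the slide and assumes decimal units (1 GB = 10^9 bytes).

```python
table_bytes = 700e9   # 700 GB table, decimal units assumed
elapsed_s = 20 * 60   # ~20 minutes
rows_per_s = 6e6      # ~6 million rows/sec, as quoted

bytes_per_s = table_bytes / elapsed_s       # aggregate read bandwidth across the cluster
implied_rows = rows_per_s * elapsed_s       # rows processed during the job
avg_row_bytes = table_bytes / implied_rows  # average stored size per row

print(f"{bytes_per_s / 1e6:.0f} MB/s, {implied_rows / 1e9:.1f} billion rows, ~{avg_row_bytes:.0f} bytes/row")
```

This prints `583 MB/s, 7.2 billion rows, ~97 bytes/row`. The quoted rate therefore implies roughly 580 MB/s of aggregate scan throughput. Doubling that on twice the hardware is the linear scaling the last bullet claims.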
  34. 36. Micro read benches <ul><li>Single reads are 1-10ms depending on disk seeks and caching </li></ul><ul><li>Scans can return hundreds of rows in dozens of ms </li></ul>
  35. 37. Serial read speeds <ul><li>A small table </li></ul><ul><li>A bigger table </li></ul><ul><li>(removed printlns from the code) </li></ul>
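  36. 38. Deployment considerations <ul><li>ZooKeeper needs disk I/O to complete operations (writes are logged to disk before they are acknowledged), so I/O contention slows it down </li></ul><ul><li>Consider hosting it on dedicated machines </li></ul><ul><li>Namenode and HBase master can co-exist </li></ul>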
  37. 39. What to put on your nodes <ul><li>Regionserver requires 2-4 cores and 3 GB+ of RAM </li></ul><ul><li>Can’t run HDFS, HBase, maps, and reduces on a 2 core system </li></ul><ul><li>On my 8 core systems I run a datanode, a regionserver, 2 maps, and 2 reduces </li></ul>
  38. 40. Garbage collection <ul><li>GC tuning becomes important </li></ul><ul><li>Quick tip: use CMS (-XX:+UseConcMarkSweepGC) and -Xmx4000m </li></ul><ul><li>Interested in G1 (if it ever stops crashing) </li></ul>
  39. 41. Batch and interactive <ul><li>These may not be compatible </li></ul><ul><li>Latency goes up with heavy batch load </li></ul><ul><li>May need to use 2 clusters to ensure responsive website </li></ul>
  40. 42. Part 2
  41. 43. HBase @ Streamy <ul><li>History of Data </li></ul><ul><li>RDBMS Issues </li></ul><ul><li>HBase to the Rescue </li></ul><ul><li>Streamy Today and Tomorrow </li></ul><ul><li>Future of HBase </li></ul>
  42. 44. About Me <ul><li>Co-Founder and CTO of Streamy.com </li></ul><ul><li>HBase Committer </li></ul><ul><li>Migrated Streamy from RDBMS to HBase and Hadoop in June 2008 </li></ul>
  43. 45. History of Data: The Prototype <ul><li>Streamy 1.0 built on PostgreSQL </li></ul><ul><ul><ul><li>All of the bells and whistles </li></ul></ul></ul><ul><li>Powered by single low-spec node </li></ul><ul><ul><ul><li>8 core / 8 GB / 2TB / $4k </li></ul></ul></ul>Functionally powerful, woefully slow
  44. 46. History of Data: The Alpha <ul><li>Streamy 1.5 built on optimized PostgreSQL </li></ul><ul><ul><ul><li>Remove bells and whistles, add partitioning </li></ul></ul></ul><ul><li>Powered by high-powered master node </li></ul><ul><ul><ul><li>16 core / 64 GB / 15x146GB 15k RPM / $40k </li></ul></ul></ul>Less powerful, still slow... insanely expensive
  45. 47. History of Data: The Beta <ul><li>Streamy 2.0 built entirely on HBase </li></ul><ul><ul><ul><li>Custom caches, query engines, and API </li></ul></ul></ul><ul><li>Powered by 10 low-spec nodes </li></ul><ul><ul><ul><li>4 core / 4GB / 1TB each / $10k for entire cluster </li></ul></ul></ul>Less functional but fast, scalable, and cheap
  46. 48. RDBMS Issues <ul><li>Poor disk usage patterns </li></ul><ul><li>Black box query engine </li></ul><ul><li>Write speed degrades with table size </li></ul><ul><li>Transactions/MVCC unnecessary overhead </li></ul><ul><li>Expensive </li></ul>
  47. 49. The Read Problem <ul><li>View 30 newest unread stories from blogs </li></ul><ul><ul><li>Not RDBMS friendly, no early-out </li></ul></ul><ul><ul><li>PL/Python heap-merge hack helped </li></ul></ul><ul><ul><li>We knew what to do but DB didn’t listen </li></ul></ul>
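The heap-merge idea mentioned above can be sketched as follows, assuming each feed's stories are already available newest-first. In HBase, for example, they could come from a scan over that feed's rows. A lazy k-way merge pulls only as many stories as it needs to find 30 unread ones. That is the early-out the slide says the RDBMS could not provide. `newest_unread` and the sample data are hypothetical, not Streamy's code.

```python
import heapq
from itertools import islice

def newest_unread(feeds, read_ids, limit=30):
    """Lazily k-way merge per-feed story lists (each sorted newest-first) and stop
    as soon as `limit` unread stories have been found.

    feeds: iterables of (timestamp, story_id) tuples, newest first.
    read_ids: set of story ids the user has already read.
    """
    merged = heapq.merge(*feeds, key=lambda story: story[0], reverse=True)
    unread = (story for story in merged if story[1] not in read_ids)
    return list(islice(unread, limit))  # early-out: nothing past `limit` is pulled

feeds = [
    [(105, "a3"), (101, "a2"), (90, "a1")],
    [(104, "b2"), (95, "b1")],
    [(103, "c1")],
]
print(newest_unread(feeds, read_ids={"b2"}, limit=3))
```

The heap never holds more than one pending story per feed. The cost is therefore driven by the number of feeds and the number of results, not by the total size of the items table.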
  48. 50. The Write Problem <ul><li>Rapidly growing items table </li></ul><ul><ul><li>Crawl index from 1k to 100k feeds </li></ul></ul><ul><ul><li>Indexes, static content, dynamic statistics </li></ul></ul><ul><ul><li>Solutions are imperfect </li></ul></ul>
  49. 51. RDBMS Conclusions <ul><li>Enormous functionality and flexibility </li></ul><ul><ul><ul><li>But you throw it out the door at scale </li></ul></ul></ul><ul><li>Stripped down RDBMS still not attractive </li></ul><ul><li>Turned entire team into DBAs </li></ul><ul><li>Gets in the way of domain-specific optimizations </li></ul>
  50. 52. What We Wanted <ul><li>Transparent partitioning </li></ul><ul><li>Transparent distribution </li></ul><ul><li>Fast random writes </li></ul><ul><li>Good data locality </li></ul><ul><li>Fast random reads </li></ul>
  51. 53. What We Got <ul><li>Transparent partitioning (Regions) </li></ul><ul><li>Transparent distribution (RegionServers) </li></ul><ul><li>Fast random writes (MemStore) </li></ul><ul><li>Good data locality (Column Families) </li></ul><ul><li>Fast random reads (HBase 0.20) </li></ul>
  52. 54. What Else We Got <ul><li>Transparent replication (HDFS) </li></ul><ul><li>High availability (No SPOF) </li></ul><ul><li>MapReduce (Input/OutputFormats) </li></ul><ul><li>Versioning (Column Versions) </li></ul><ul><li>Fast Sequential Reads (Scanners) </li></ul>
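Column versioning means each cell keeps several timestamped values and returns the newest first. Below is a toy sketch that assumes a fixed per-cell version limit; `VersionedCell` is hypothetical. Unlike this toy, HBase does not evict excess versions on every write. They are hidden on read and purged during compactions.

```python
class VersionedCell:
    """Toy multi-version cell: timestamp -> value, newest versions returned first."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions
        self._versions = {}  # timestamp -> value

    def put(self, timestamp: int, value: str) -> None:
        self._versions[timestamp] = value  # the same timestamp overwrites
        if len(self._versions) > self.max_versions:
            del self._versions[min(self._versions)]  # evict the oldest version

    def get(self, n: int = 1, as_of=None):
        """Up to n (timestamp, value) pairs, newest first, optionally as of a time."""
        stamps = sorted((ts for ts in self._versions if as_of is None or ts <= as_of), reverse=True)
        return [(ts, self._versions[ts]) for ts in stamps[:n]]

cell = VersionedCell(max_versions=3)
for ts, value in [(1, "v1"), (2, "v2"), (3, "v3"), (4, "v4")]:
    cell.put(ts, value)
print(cell.get(n=5))  # v1 has been evicted
print(cell.get(as_of=3))
```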
  53. 55. HBase @ Streamy Today
  54. 56. HBase @ Streamy Today <ul><li>All data stored in HBase </li></ul><ul><li>Additional caching of hot data </li></ul><ul><li>Query and indexing engines </li></ul><ul><li>MapReduce crawling and analytics </li></ul><ul><li>Zookeeper/Katta/Lucene </li></ul>
  55. 57. HBase @ Streamy Tomorrow <ul><li>Thumbnail media server </li></ul><ul><li>Slave replication for Backup/DR </li></ul><ul><li>More Cascading </li></ul><ul><li>Better Katta integration </li></ul><ul><li>Realtime MapReduce </li></ul>
  56. 58. HBase on a Budget <ul><li>HBase works on cheap nodes </li></ul><ul><ul><ul><li>But you need a cluster (5+ nodes) </li></ul></ul></ul><ul><ul><ul><li>$10k cluster has 10X capacity of $40k node </li></ul></ul></ul><ul><li>Multiple instances on a single cluster </li></ul><ul><li>EC2 is not cost-effective for 24/7 clusters + bandwidth </li></ul>
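The cost comparison works out as follows. This takes the slide's "10X capacity" at face value and normalizes the single node's capacity to 1. The raw-disk lines use the per-node specs from the History of Data slides and assume 1 TB = 1000 GB.

```python
big_node_cost, big_node_capacity = 40_000, 1.0  # the $40k PostgreSQL node
cluster_cost, cluster_capacity = 10_000, 10.0   # "$10k cluster has 10X capacity"

cost_ratio = (big_node_cost / big_node_capacity) / (cluster_cost / cluster_capacity)
print(f"Cluster is {cost_ratio:.0f}x cheaper per unit of capacity")

# Raw disk only, from the per-node specs on the History of Data slides
big_node_raw_gb = 15 * 146   # 15 x 146 GB 15k RPM drives
cluster_raw_gb = 10 * 1_000  # 10 nodes x 1 TB
print(big_node_raw_gb, cluster_raw_gb)
```

Per unit of capacity the cluster comes out 40x cheaper. On raw disk alone it also has about 4.5x more space than the big node.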
  57. 59. Lessons Learned <ul><li>Layer of abstraction helps tremendously </li></ul><ul><ul><ul><li>Internal Streamy Data API </li></ul></ul></ul><ul><ul><ul><li>Storage of serialized types </li></ul></ul></ul><ul><li>Schema design is about reads, not writes </li></ul><ul><li>What’s good for HBase is good for Streamy </li></ul>
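One common way to design row keys for reads is to prefix with the entity id and append an inverted timestamp. A prefix scan then returns the newest items first, with no sorting at read time. This is shown as an assumption, since the slide does not describe Streamy's actual keys. `story_row_key` and `key_timestamp` are hypothetical helpers.

```python
import struct

MAX_TS = 2**63 - 1  # largest signed 64-bit value (Java's Long.MAX_VALUE)

def story_row_key(feed_id: str, timestamp_ms: int) -> bytes:
    """Hypothetical key: feed id, a 0x00 separator, then MAX_TS - timestamp as
    8 big-endian bytes, so newer stories sort first within a feed.
    Assumes feed ids never contain a 0x00 byte and timestamps are >= 0."""
    return feed_id.encode("utf-8") + b"\x00" + struct.pack(">q", MAX_TS - timestamp_ms)

def key_timestamp(key: bytes) -> int:
    """Recover the original timestamp from the last 8 bytes of a key."""
    return MAX_TS - struct.unpack(">q", key[-8:])[0]

stamps = [1_254_000_000_000, 1_254_500_000_000, 1_253_000_000_000]
keys = sorted(story_row_key("feed42", ts) for ts in stamps)  # byte order, as a scan sees it
print([key_timestamp(k) for k in keys])  # newest first
```

The write path is no harder either way. The layout is chosen so that the common read, newest stories for a feed, is one short sequential scan.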
  58. 60. What’s Next for HBase <ul><li>Inter-cluster / Inter-DC replication </li></ul><ul><ul><ul><li>Slave and Multi-Master </li></ul></ul></ul><ul><li>Master rewrite, more Zookeeper </li></ul><ul><li>Batch operations, HDFS uploader </li></ul><ul><li>No more data loss </li></ul><ul><ul><ul><li>Need HDFS appends </li></ul></ul></ul>
  59. 61. HBase Information <ul><li>Home Page http://hbase.org </li></ul><ul><li>Wiki http://wiki.apache.org/hadoop/Hbase </li></ul><ul><li>Twitter http://twitter.com/hbase </li></ul><ul><li>Freenode IRC #hbase </li></ul><ul><li>Mailing List [email_address] </li></ul>