Rails on HBase

Zachary Pinter and I gave this talk at RailsConf 2011

Published in: Technology

  1. Rails on HBase: Zachary Pinter and Tony Hillerson, RailsConf 2011
  2. What we will cover
  3. What is it?
  4. What are the tradeoffs that HBase makes?
  5. Why HBase is probably the wrong choice for your app
  6. Why HBase might be the right choice for your app
  7. How to use HBase with Rails
  8. What we won’t cover
  9. This is not a deep dive
  10. This is not a guide to setting up a production HBase cluster (see http://cloudera.com)
  11. So... What is HBase?
  12. HBase is based on Bigtable
  13. HBase grew out of Hadoop
  14. Core Concepts
  15. Distributed Column-Oriented Database
  16. Tables (diagram labels: column, row)
        key      core:link
        llb0ga   http://google.com
        llb260   http://en.oreilly.com/rails2011/public/schedule/detail/17886
  17. Column Families (diagram labels: core column family, stats column family)
        key      core:link                                                      stats:hits
        llb0ga   http://google.com                                              100504
        llb260   http://en.oreilly.com/rails2011/public/schedule/detail/17886   2
  18. Column Families
        • Must be defined at creation or added to schema later
        • Should group data accessed and sized similarly
  19. Columns
        • Can be dynamically added given that the family exists
        • Stored as a byte array - do what you need with it
  20. Column Families: Data Locality
  21. Regions (diagram)
        master
        Regionserver: llb0ga ... llbee9
        Regionserver: lld92q ... lq1rro
  22. Main Operations: Get, Put, Incr, Delete, Scan
  23. JRuby-based shell
  24. hbase shell
        ruby-1.9.2-p180 :002 > status
        1 servers, 0 dead, 1.0000 average load
        ruby-1.9.2-p180 :003 > (1..10).each{ status }
        1 servers, 0 dead, 3.0000 average load
        1 servers, 0 dead, 3.0000 average load
        1 servers, 0 dead, 3.0000 average load
        1 servers, 0 dead, 3.0000 average load
        1 servers, 0 dead, 3.0000 average load
        1 servers, 0 dead, 3.0000 average load
        1 servers, 0 dead, 3.0000 average load
        1 servers, 0 dead, 3.0000 average load
        1 servers, 0 dead, 3.0000 average load
        1 servers, 0 dead, 3.0000 average load
         => 1..10
        ruby-1.9.2-p180 :004 >
  25. API: REST, Thrift, Apache Avro, Java API
  26. What are the tradeoffs that HBase makes? NoSQL = Tradeoffs
  27. Highly Distributed
        Pros
        • Built-in sharding
        • No single server holds all the data
        • Different servers operate on different slices of the data
        Cons
        • Makes joins difficult/impossible
        • Limited real-time queries/sorting
  28. Sorted Row Keys
        • Byte array keys
        • Any kind of data you want
        • Sorted in byte order
        • All row access through the key
  29. Sorted Row Keys
        Pros
        • Real-time random lookups by key
        • Real-time range queries
        Cons
        • Conflates unique identifier with default sort
        • Keys are an integral part of schema design, requiring extra consideration
  30. Map / Reduce
        Pros
        • Scales linearly: more servers = faster response times
        • Operates on huge datasets (billions of rows)
        • Works well with dynamic/non-uniform schemas
        Cons
        • Scans every row per query *
        • Not real-time (batch operation)
  31. HBase vs. X
        • http://nosql-database.org
        • http://www.sevenweeks.org/ (Seven Databases in Seven Weeks, Eric Redmond. Don’t miss his talk!)
  32. Why HBase is probably the wrong choice for your app
  33. HBase Limitations
  34. HBase Limitations
        • Limited real-time queries (key lookup, range lookup)
        • Limited sorting (one default sort per table)
        • Limited transactions
        • Manual indexing
        • Joins/Normalization
        • Storage of large binary data *
  35. HBase Requirements
        • At least 5 servers (Zookeeper, Region Server, Name Nodes, Data Nodes, etc.)
        • Memory intensive (2-4 GB bare minimum per server, depending on server role)
        • CPU intensive
        • Rough EC2 cost: 5 c1.xlarge reserved for 1 year = over $850/month (not counting bandwidth!)
  36. Why HBase might be the right choice for your app
  37. Large data sets
        • Ability to aggregate and analyze billions of rows
        • Starts to shine when relational databases break down
        • Is your db already sharded?
        • Are you using SQL queries that take hours to run?
  38. Hadoop Integration
        • HDFS for storing files
        • Native Map/Reduce
        • Supports data locality
        • Doesn’t require data to be transferred
  39. Flexible schema
        • Supports heterogeneous data through column families with mixed key/value pairs
        • Ability to refactor the data store via map/reduce (beats a multi-day ALTER TABLE command)
  40. Rails on HBase
  41. hbase-stargate
        • https://github.com/greglu/hbase-stargate
        • Uses Stargate, HBase REST server
  42. Rhino
        • https://github.com/sqs/rhino/
        • Thrift API
        • Seems to be abandoned?
  43. Massive Record
        • https://github.com/CompanyBook/massive_record
        • It works!
        • Uses Thrift API
  44. Kwisatz: A URL Shortener
  45. Demo: JRuby Scripts
  46. Demo: Rails Front-End
  47. Source: demo source code
        • github.com/effectiveui/jruby-hbase-demo
        • github.com/effectiveui/rails-hbase-massive-record
  48. Getting Set Up
        • (Mac) brew install hbase
        • Cloudera VM: http://bit.ly/dSfPa9
  49. Recap
  50. Where Would I Use HBase?
  51. Where is HBase’s Sweet Spot?
  52. Where Would I Use Rails With HBase?
  53. Questions?
  54. Thank You! Zachary: @zpinter, Tony: @thillerson, EffectiveUI.com
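
The core concepts in the slides (column families, byte-sorted row keys, and the Get/Put/Incr/Delete/Scan operations) can be sketched in plain Ruby. This is only a toy in-memory model to make the ideas concrete, not the HBase client API or any of the gems named in the talk. The class name ToyTable, its method signatures, and the third sample row's URL are hypothetical; the other row keys and column names come from the slides.

```ruby
# Toy in-memory model of an HBase-style table (hypothetical; NOT the HBase API).
# Rows are addressed only by key, columns are named "family:qualifier",
# and scans walk the keys in byte order.
class ToyTable
  def initialize(*families)
    @families = families.map(&:to_s) # families are fixed up front
    @rows = {}                       # row key => { "family:qualifier" => value }
  end

  # Put: write one cell. The column can be new, but its family must exist.
  def put(key, column, value)
    check_family!(column)
    (@rows[key] ||= {})[column] = value
  end

  # Get: random lookup by row key.
  def get(key)
    @rows[key]
  end

  # Incr: bump a counter cell and return the new value.
  def incr(key, column, by = 1)
    check_family!(column)
    row = (@rows[key] ||= {})
    row[column] = row.fetch(column, 0) + by
  end

  # Delete: remove a whole row.
  def delete(key)
    @rows.delete(key)
  end

  # Scan: rows with start_key <= key < stop_key, in byte order.
  def scan(start_key, stop_key)
    @rows.keys
         .select { |k| byte_cmp(k, start_key) >= 0 && byte_cmp(k, stop_key) < 0 }
         .sort { |a, b| byte_cmp(a, b) }
         .map { |k| [k, @rows[k]] }
  end

  private

  # Compare two keys byte by byte, the way sorted row keys are ordered.
  def byte_cmp(a, b)
    a.bytes <=> b.bytes
  end

  def check_family!(column)
    family = column.split(":", 2).first
    return if @families.include?(family)
    raise ArgumentError, "unknown column family: #{family}"
  end
end

# Usage, loosely following the URL shortener rows shown in the slides.
links = ToyTable.new(:core, :stats)
links.put("llb260", "core:link", "http://en.oreilly.com/rails2011/public/schedule/detail/17886")
links.put("llb0ga", "core:link", "http://google.com")
links.put("lq1rro", "core:link", "http://example.com") # hypothetical URL
links.incr("llb0ga", "stats:hits")

# Range query over every key that starts with "llb".
links.scan("llb", "llc").each { |key, cells| puts "#{key} #{cells.inspect}" }
```

Because every lookup goes through the key, the key layout decides which range queries are cheap. In this sketch a scan from "llb" up to "llc" returns only the two "llb..." rows, in sorted order, regardless of insertion order.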
