Your SlideShare is downloading. ×
Rails on HBaseZachary Pinter and Tony HillersonRailsConf 2011
What we will cover
What is it?
What are thetradeoffs that HBase      makes?
Why HBase isprobably the wrongchoice for your app
Why HBase might bethe right choice for      your app
+How to use HBase   with Rails
What we won’t cover
This is not a deep        dive
This is not a guide to    setting up a production HBase        cluster*See http://cloudera.com
So...What is HBase?
HBase is based on    Bigtable
HBase Grew out of    Hadoop
Core Concepts
Distributed ColumnOriented Database
Tables                               column          key                    core:linkrow         llb0ga              http:...
Column Families           core column family            stats column family   key                   core:link             ...
Column Families   Must be defined at creation or added to schema   later Should group data accessed and sized similarly
Columns   Can be dynamically added given that the family   exists Stored as a byte array - do what you need with it
Column Families: Data Locality
Regions                       master                   llb0ga   lld92q                      ...      ...                  ...
Main OperationsGetPutIncrDeleteScan
JRuby based shell
hbase shellruby-1.9.2-p180 :002 > status1 servers, 0 dead, 1.0000 average loadruby-1.9.2-p180 :003 > (1..10).each{ status ...
APIRESTThriftApache AvroJava API
What are the tradeoffs thatHBase makes?NoSQL = Tradeoffs
Highly Distributed                       Pros                                   Cons• Built-in sharding                   ...
Sorted Row KeysBytearray keysAny kind of data you wantSorted in byte orderAll row access through the key
Sorted Row Keys                 Pros                                 Cons• Real-time random lookups by key   • Conflates un...
Map / Reduce                   Pros                                       Cons• Scales linearly: more servers = faster   •...
HBase vs. Xhttp://nosql-database.orghttp://www.sevenweeks.org/Seven Databases in Seven WeeksEric RedmondDon’t Miss His Tal...
Why HBase isprobably the wrongchoice for your app
HBase Limitations
HBase LimitationsLimited real-time queries (key lookup, range lookup)Limited sorting (one default sort per table)Limited t...
HBase Requirements• At least 5 Servers (Zookeeper, Region Server, Name Nodes, Data Nodes, etc)• Memory Intensive (2-4gb ba...
Why HBase might bethe right choice for      your app
Large data setsAbility to aggregate and analyze billions of rowsStarts to shine when relational databases breakdownIs your...
Hadoop IntegrationHDFS for storing filesNative Map/ReduceSupports data localityDoesn’t require data to be transferred
Flexible schemaSupports non-heterogeneous data through columnfamilies with mixed key/value pairsAbility to refactor the da...
Rails on HBase
hbase-stargatehttps://github.com/greglu/hbase-stargateUses Stargate, HBase REST server
Rhinohttps://github.com/sqs/rhino/Thrift APISeems to be abandoned?
Massive Recordhttps://github.com/CompanyBook/massive_recordIt works!Uses Thrift API
Kwisatz: An URL  Shortener
Demo:JRuby Scripts
Demo:Rails Front-End
SourceDemo source codegithub.com/effectiveui/jruby-hbase-demogithub.com/effectiveui/rails-hbase-massive-record
Getting Set Up(Mac) brew install hbaseCloudera VMhttp://bit.ly/dSfPa9
Recap
Where Would I Use    HBase?
Where is HBase’s  Sweet Spot
Where Would I UseRails With HBase?
Questions?
Thank You!Zachary: @zpinterTony: @thillersonEffectiveUI.com
Upcoming SlideShare
Loading in...5
×

Rails on HBase

2,252

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,252
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
6
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "Rails on HBase"

  1. 1. Rails on HBaseZachary Pinter and Tony HillersonRailsConf 2011
  2. 2. What we will cover
  3. 3. What is it?
  4. 4. What are thetradeoffs that HBase makes?
  5. 5. Why HBase isprobably the wrongchoice for your app
  6. 6. Why HBase might bethe right choice for your app
  7. 7. +How to use HBase with Rails
  8. 8. What we won’t cover
  9. 9. This is not a deep dive
  10. 10. This is not a guide to setting up a production HBase cluster*See http://cloudera.com
  11. 11. So...What is HBase?
  12. 12. HBase is based on Bigtable
  13. 13. HBase Grew out of Hadoop
  14. 14. Core Concepts
  15. 15. Distributed ColumnOriented Database
  16. 16. Tables column key core:linkrow llb0ga http://google.com http://en.oreilly.com/rails2011/public/ llb260 schedule/detail/17886
  17. 17. Column Families core column family stats column family key core:link stats:hits llb0ga http://google.com 100504 http://en.oreilly.com/rails2011/ llb260 2 public/schedule/detail/17886
  18. 18. Column Families Must be defined at creation or added to schema later Should group data accessed and sized similarly
  19. 19. Columns Can be dynamically added given that the family exists Stored as a byte array - do what you need with it
  20. 20. Column Families: Data Locality
  21. 21. Regions master llb0ga lld92q ... ... llbee9 lq1rro Regionserver Regionserver
  22. 22. Main OperationsGetPutIncrDeleteScan
  23. 23. JRuby based shell
  24. 24. hbase shellruby-1.9.2-p180 :002 > status1 servers, 0 dead, 1.0000 average loadruby-1.9.2-p180 :003 > (1..10).each{ status }1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load => 1..10ruby-1.9.2-p180 :004 >
  25. 25. APIRESTThriftApache AvroJava API
  26. 26. What are the tradeoffs thatHBase makes?NoSQL = Tradeoffs
  27. 27. Highly Distributed Pros Cons• Built-in sharding • Makes joins difficult/impossible• No single server holds all the data • Limited real-time queries/sorting• Different servers operate on different slices of the data
  28. 28. Sorted Row KeysBytearray keysAny kind of data you wantSorted in byte orderAll row access through the key
  29. 29. Sorted Row Keys Pros Cons• Real-time random lookups by key • Conflates unique identifier with default• Real-time range queries sort • Keys are an integral part of schema design, requiring extra consideration
  30. 30. Map / Reduce Pros Cons• Scales linearly: more servers = faster • Scans every row per query * response times • Not real-time (batch operation)• Operates on huge datasets (billions of rows)• Works well with dynamic/non-uniform schemas
  31. 31. HBase vs. Xhttp://nosql-database.orghttp://www.sevenweeks.org/Seven Databases in Seven WeeksEric RedmondDon’t Miss His Talk!!
  32. 32. Why HBase isprobably the wrongchoice for your app
  33. 33. HBase Limitations
  34. 34. HBase LimitationsLimited real-time queries (key lookup, range lookup)Limited sorting (one default sort per table)Limited transactionsManual indexingJoins/NormalizationStorage of large binary data *
  35. 35. HBase Requirements• At least 5 Servers (Zookeeper, Region Server, Name Nodes, Data Nodes, etc)• Memory Intensive (2-4gb bar minimum per server, depending on server role)• CPU Intensive• Rough EC2 Cost: 5 c1.xlarge reserved for 1 year = over $850/month (not counting bandwidth!)
  36. 36. Why HBase might bethe right choice for your app
  37. 37. Large data setsAbility to aggregate and analyze billions of rowsStarts to shine when relational databases breakdownIs your db already sharded?Are you using SQL queries that take hours to run?
  38. 38. Hadoop IntegrationHDFS for storing filesNative Map/ReduceSupports data localityDoesn’t require data to be transferred
  39. 39. Flexible schemaSupports non-heterogeneous data through columnfamilies with mixed key/value pairsAbility to refactor the data store via map/reduce(beats a multi-day ALTER TABLE command)
  40. 40. Rails on HBase
  41. 41. hbase-stargatehttps://github.com/greglu/hbase-stargateUses Stargate, HBase REST server
  42. 42. Rhinohttps://github.com/sqs/rhino/Thrift APISeems to be abandoned?
  43. 43. Massive Recordhttps://github.com/CompanyBook/massive_recordIt works!Uses Thrift API
  44. 44. Kwisatz: An URL Shortener
  45. 45. Demo:JRuby Scripts
  46. 46. Demo:Rails Front-End
  47. 47. SourceDemo source codegithub.com/effectiveui/jruby-hbase-demogithub.com/effectiveui/rails-hbase-massive-record
  48. 48. Getting Set Up(Mac) brew install hbaseCloudera VMhttp://bit.ly/dSfPa9
  49. 49. Recap
  50. 50. Where Would I Use HBase?
  51. 51. Where is HBase’s Sweet Spot
  52. 52. Where Would I UseRails With HBase?
  53. 53. Questions?
  54. 54. Thank You!Zachary: @zpinterTony: @thillersonEffectiveUI.com

×