Rails on HBase
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Rails on HBase

on

  • 7,930 views

Zachary Pinter and I gave this talk at RailsConf 2011

Zachary Pinter and I gave this talk at RailsConf 2011

Statistics

Views

Total Views
7,930
Views on SlideShare
7,710
Embed Views
220

Actions

Likes
5
Downloads
68
Comments
0

11 Embeds 220

http://en.oreilly.com 112
http://zacharypinter.com 63
http://localhost 32
http://paper.li 4
http://us-w1.rockmelt.com 2
http://feeds.feedburner.com 2
http://localhost:4000 1
http://www.slideshare.net 1
http://twitter.com 1
url_unknown 1
https://twitter.com 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Rails on HBase Presentation Transcript

  • 1. Rails on HBaseZachary Pinter and Tony HillersonRailsConf 2011
  • 2. What we will cover
  • 3. What is it?
  • 4. What are thetradeoffs that HBase makes?
  • 5. Why HBase isprobably the wrongchoice for your app
  • 6. Why HBase might bethe right choice for your app
  • 7. +How to use HBase with Rails
  • 8. What we won’t cover
  • 9. This is not a deep dive
  • 10. This is not a guide to setting up a production HBase cluster*See http://cloudera.com
  • 11. So...What is HBase?
  • 12. HBase is based on Bigtable
  • 13. HBase Grew out of Hadoop
  • 14. Core Concepts
  • 15. Distributed ColumnOriented Database
  • 16. Tables column key core:linkrow llb0ga http://google.com http://en.oreilly.com/rails2011/public/ llb260 schedule/detail/17886
  • 17. Column Families core column family stats column family key core:link stats:hits llb0ga http://google.com 100504 http://en.oreilly.com/rails2011/ llb260 2 public/schedule/detail/17886
  • 18. Column Families Must be defined at creation or added to schema later Should group data accessed and sized similarly
  • 19. Columns Can be dynamically added given that the family exists Stored as a byte array - do what you need with it
  • 20. Column Families: Data Locality
  • 21. Regions master llb0ga lld92q ... ... llbee9 lq1rro Regionserver Regionserver
  • 22. Main OperationsGetPutIncrDeleteScan
  • 23. JRuby based shell
  • 24. hbase shellruby-1.9.2-p180 :002 > status1 servers, 0 dead, 1.0000 average loadruby-1.9.2-p180 :003 > (1..10).each{ status }1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load => 1..10ruby-1.9.2-p180 :004 >
  • 25. APIRESTThriftApache AvroJava API
  • 26. What are the tradeoffs thatHBase makes?NoSQL = Tradeoffs
  • 27. Highly Distributed Pros Cons• Built-in sharding • Makes joins difficult/impossible• No single server holds all the data • Limited real-time queries/sorting• Different servers operate on different slices of the data
  • 28. Sorted Row KeysBytearray keysAny kind of data you wantSorted in byte orderAll row access through the key
  • 29. Sorted Row Keys Pros Cons• Real-time random lookups by key • Conflates unique identifier with default• Real-time range queries sort • Keys are an integral part of schema design, requiring extra consideration
  • 30. Map / Reduce Pros Cons• Scales linearly: more servers = faster • Scans every row per query * response times • Not real-time (batch operation)• Operates on huge datasets (billions of rows)• Works well with dynamic/non-uniform schemas
  • 31. HBase vs. Xhttp://nosql-database.orghttp://www.sevenweeks.org/Seven Databases in Seven WeeksEric RedmondDon’t Miss His Talk!!
  • 32. Why HBase isprobably the wrongchoice for your app
  • 33. HBase Limitations
  • 34. HBase LimitationsLimited real-time queries (key lookup, range lookup)Limited sorting (one default sort per table)Limited transactionsManual indexingJoins/NormalizationStorage of large binary data *
  • 35. HBase Requirements• At least 5 Servers (Zookeeper, Region Server, Name Nodes, Data Nodes, etc)• Memory Intensive (2-4gb bar minimum per server, depending on server role)• CPU Intensive• Rough EC2 Cost: 5 c1.xlarge reserved for 1 year = over $850/month (not counting bandwidth!)
  • 36. Why HBase might bethe right choice for your app
  • 37. Large data setsAbility to aggregate and analyze billions of rowsStarts to shine when relational databases breakdownIs your db already sharded?Are you using SQL queries that take hours to run?
  • 38. Hadoop IntegrationHDFS for storing filesNative Map/ReduceSupports data localityDoesn’t require data to be transferred
  • 39. Flexible schemaSupports non-heterogeneous data through columnfamilies with mixed key/value pairsAbility to refactor the data store via map/reduce(beats a multi-day ALTER TABLE command)
  • 40. Rails on HBase
  • 41. hbase-stargatehttps://github.com/greglu/hbase-stargateUses Stargate, HBase REST server
  • 42. Rhinohttps://github.com/sqs/rhino/Thrift APISeems to be abandoned?
  • 43. Massive Recordhttps://github.com/CompanyBook/massive_recordIt works!Uses Thrift API
  • 44. Kwisatz: An URL Shortener
  • 45. Demo:JRuby Scripts
  • 46. Demo:Rails Front-End
  • 47. SourceDemo source codegithub.com/effectiveui/jruby-hbase-demogithub.com/effectiveui/rails-hbase-massive-record
  • 48. Getting Set Up(Mac) brew install hbaseCloudera VMhttp://bit.ly/dSfPa9
  • 49. Recap
  • 50. Where Would I Use HBase?
  • 51. Where is HBase’s Sweet Spot
  • 52. Where Would I UseRails With HBase?
  • 53. Questions?
  • 54. Thank You!Zachary: @zpinterTony: @thillersonEffectiveUI.com