Rails on HBase

1,423 views

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,423
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
33
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide
  • \n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • T\n
  • T\n
  • T\n
  • T\n
  • T\n
  • T\n
  • T\n
  • T\n
  • T\n
  • T\n
  • T\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • T\n
  • T\n
  • T\nDo you already have a Hadoop cluster?\nhttp://whynosql.com/cassandra-vs-hbase/\nCassandra requires data to be first transferred into hadoop cluster before map/reduce\n
  • T\n
  • T\n
  • T\n
  • T\n
  • T .... Or forego the ORM layer all together\n
  • T\n
  • T\n
  • T\n
  • T\n
  • Z\n
  • Z\n
  • Z\n
  • Z\n
  • \n
  • \n
  • Rails on HBase

    1. 1. Rails on HBaseZachary Pinter and Tony HillersonRailsConf 2011
    2. 2. What we will cover
    3. 3. What is it?
    4. 4. What are thetradeoffs that HBase makes?
    5. 5. Why HBase isprobably the wrongchoice for your app
    6. 6. Why HBase might bethe right choice for your app
    7. 7. +How to use HBase with Rails
    8. 8. What we won’t cover
    9. 9. This is not a deep dive
    10. 10. This is not a guide to setting up a production HBase cluster*See http://cloudera.com
    11. 11. So...What is HBase?
    12. 12. HBase is based on Bigtable
    13. 13. HBase Grew out of Hadoop
    14. 14. Core Concepts
    15. 15. Distributed ColumnOriented Database
    16. 16. Tables column key core:linkrow llb0ga http://google.com http://en.oreilly.com/rails2011/public/ llb260 schedule/detail/17886
    17. 17. Column Families core column family stats column family key core:link stats:hits llb0ga http://google.com 100504 http://en.oreilly.com/rails2011/ llb260 2 public/schedule/detail/17886
    18. 18. Column Families Must be defined at creation or added to schema later Should group data accessed and sized similarly
    19. 19. Columns Can be dynamically added given that the family exists Stored as a byte array - do what you need with it
    20. 20. Column Families: Data Locality
    21. 21. Regions master llb0ga lld92q ... ... llbee9 lq1rro Regionserver Regionserver
    22. 22. Main OperationsGetPutIncrDeleteScan
    23. 23. JRuby based shell
    24. 24. hbase shellruby-1.9.2-p180 :002 > status1 servers, 0 dead, 1.0000 average loadruby-1.9.2-p180 :003 > (1..10).each{ status }1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load1 servers, 0 dead, 3.0000 average load=> 1..10ruby-1.9.2-p180 :004 >
    25. 25. APIRESTThriftApache AvroJava API
    26. 26. What are the tradeoffs thatHBase makes?NoSQL = Tradeoffs
    27. 27. Highly Distributed Pros Cons• Built-in sharding • Makes joins difficult/impossible• No single server holds all the data • Limited real-time queries/sorting• Different servers operate on different slices of the data
    28. 28. Sorted Row KeysBytearray keysAny kind of data you wantSorted in byte orderAll row access through the key
    29. 29. Sorted Row Keys Pros Cons• Real-time random lookups by key • Conflates unique identifier with default• Real-time range queries sort • Keys are an integral part of schema design, requiring extra consideration
    30. 30. Map / Reduce Pros Cons• Scales linearly: more servers = faster • Scans every row per query * response times • Not real-time (batch operation)• Operates on huge datasets (billions of rows)• Works well with dynamic/non-uniform schemas
    31. 31. HBase vs. Xhttp://nosql-database.orghttp://www.sevenweeks.org/Seven Databases in Seven WeeksEric RedmondDon’t Miss His Talk!!
    32. 32. Why HBase isprobably the wrongchoice for your app
    33. 33. HBase Limitations
    34. 34. HBase LimitationsLimited real-time queries (key lookup, rangelookup)Limited sorting (one default sort per table)Limited transactionsManual indexingJoins/NormalizationStorage of large binary data *
    35. 35. HBase Requirements• At least 5 Servers (Zookeeper, Region Server, Name Nodes, Data Nodes, etc)• Memory Intensive (2-4gb bar minimum per server, depending on server role)• CPU Intensive• Rough EC2 Cost: 5 c1.xlarge reserved for 1 year = over $850/month (not counting bandwidth!)
    36. 36. Why HBase might bethe right choice for your app
    37. 37. Large data setsAbility to aggregate and analyze billions of rowsStarts to shine when relational databases breakdownIs your db already sharded?Are you using SQL queries that take hours to run?
    38. 38. Hadoop IntegrationHDFS for storing filesNative Map/ReduceSupports data localityDoesn’t require data to be transferred
    39. 39. Flexible schemaSupports non-heterogeneous data through columnfamilies with mixed key/value pairsAbility to refactor the data store via map/reduce(beats a multi-day ALTER TABLE command)
    40. 40. Rails on HBase
    41. 41. hbase-stargatehttps://github.com/greglu/hbase-stargateUses Stargate, HBase REST server
    42. 42. Rhinohttps://github.com/sqs/rhino/Thrift APISeems to be abandoned?
    43. 43. Massive Recordhttps://github.com/CompanyBook/massive_recordIt works!Uses Thrift API
    44. 44. Kwisatz: An URL Shortener
    45. 45. Demo
    46. 46. SourceDemo source codegithub.com/effectiveui/jruby-hbase-demogithub.com/effectiveui/rails-hbase-massive-record
    47. 47. Getting Set Up(Mac) brew install hbaseCloudera VMhttp://bit.ly/dSfPa9
    48. 48. Recap
    49. 49. Where Would I Use HBase?
    50. 50. Where is HBase’s Sweet Spot
    51. 51. Where Would I UseRails With HBase?
    52. 52. Questions?
    53. 53. Thank You!Zachary: @zpinterTony: @thillersonEffectiveUI.comhttp://www.slideshare.net/thillerson/rails-on-hbasehttp://github.com/effectiveui/*hbase*

    ×