HBase vs Hive: srp vs analytics, 2-14-2012
  • Not about Hadapt; be inclusive of beginners; be brief
  • Not a religious presentation: different systems have different properties that work for different needs
  • 10 GB tpc_h data; CDH3B3 Hive and HBase; single-node desktop workstation, 4 cores, 8 GB, a few drives

Presentation Transcript

  • HBase vs. Hive
    Philip Wickline, Chief Technology Officer, Hadapt
  • Goals
    Brief introduction to the differences between transactional/operational and analytical systems
    Understand when to use Hive, when to use HBase, and why
  • Databases
  • Datastores
  • Differences of Purpose: "Transaction Processing"
    Operational systems
      • Optimized for small, short random-access reads and writes
      • E.g. record that an employee invested $100 in an S&P 500 index fund in his 401(k), *or* record that a user posted something on another user's "wall"
    Traditional DB examples
      • Oracle
      • MySQL
    NoSQL examples
      • HBase
      • MongoDB
      • Cassandra
  • Differences of Purpose: Analytics
    Analytics
      • Optimized for read-only computations about large amounts of data
      • E.g. compute the average amount invested in bond funds and stock funds for all employees at all employers over the last 5 years
    DB examples
      • Netezza
      • Vertica
    NoSQL examples
      • Hive
      • Pig
    [Figure: sample charts; legend entries Plan, Actual, Acme, GM, Newco, Oldco, Bigcorp; x-axis Oct to Mar]
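The two access patterns on these slides can be contrasted in a few lines of Python. This is only a sketch: the record layout, the function names and the sample amounts are hypothetical and chosen to mirror the 401(k) example, not taken from any real system.

```python
# Hypothetical records: (employee_id, employer, fund_type, amount_usd).
investments = []


def record_investment(employee_id, employer, fund_type, amount_usd):
    """Operational-style access: one small write touching a single record."""
    investments.append((employee_id, employer, fund_type, amount_usd))


def average_by_fund_type(records):
    """Analytical-style access: a read-only pass over every record."""
    totals = {}
    for _, _, fund_type, amount in records:
        running_sum, count = totals.get(fund_type, (0.0, 0))
        totals[fund_type] = (running_sum + amount, count + 1)
    return {fund: s / n for fund, (s, n) in totals.items()}


# Made-up sample data for illustration only.
record_investment(1, "Acme", "stock", 100.0)
record_investment(2, "Acme", "bond", 50.0)
record_investment(3, "Newco", "stock", 300.0)
averages = average_by_fund_type(investments)
```

The write touches one record, while the average has to read all of them, which is why the two workloads reward different storage layouts.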
  • HBase Data Model: Conceptual
    From the BigTable paper: "a sparse, distributed, persistent multi-dimensional sorted map"
    (row : bytestring, column family : bytestring, column : bytestring, time : int64) -> byte string
  • HBase Map
    {
      "key_1" : {
        "columnfamily_a" : {
          "column_i" : { 15 : "y", 4 : "m" },
          "column_ii" : { 15 : "d" }
        },
        "columnfamily_b" : {
          "column_other" : { 6 : "w", 3 : "o", 1 : "w" }
        }
      }
    }
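To make the nested map concrete, here is a minimal Python sketch that holds the slide's example in ordinary dictionaries and reads one cell by (row, column family, column, time). The helper `get_cell` and its `as_of` parameter are hypothetical names for illustration; this is not the HBase client API.

```python
# Illustrative model of the slide's nested map; not the HBase client API.
table = {
    "key_1": {
        "columnfamily_a": {
            "column_i": {15: "y", 4: "m"},
            "column_ii": {15: "d"},
        },
        "columnfamily_b": {
            "column_other": {6: "w", 3: "o", 1: "w"},
        },
    },
}


def get_cell(table, row, family, column, as_of=None):
    """Return the newest value with timestamp <= as_of (newest overall if None)."""
    versions = table.get(row, {}).get(family, {}).get(column, {})
    eligible = [ts for ts in versions if as_of is None or ts <= as_of]
    if not eligible:
        return None
    return versions[max(eligible)]


latest = get_cell(table, "key_1", "columnfamily_a", "column_i")
older = get_cell(table, "key_1", "columnfamily_a", "column_i", as_of=10)
missing = get_cell(table, "key_2", "columnfamily_a", "column_i")
```

Here `latest` is the value at timestamp 15, `older` is the value at timestamp 4, and `missing` is `None` because the map is sparse: absent rows and columns simply have no entry.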
  • Hive Data Model: Conceptual
    Traditional Relational Tables
    CUSTKEY  NAME     ADDRESS         NATIONKEY  PHONE         ACCTBAL     COMMENT
    451234   NEWCORP  196 Broadway …  1          111-555-1212  $1,231,285  NULL
    887765   ACME     1 Main st. …    2          222-555-1212  $46,945     "Top customer"
  • HBase Data Model: Physical
    Every cell stored with row, family, column and timestamp
    Allows fast lookup with low copy overhead
    BUT
    Space inefficient (optional compression available) and inefficient to scan
    "key_1"  "cf_a"  "c_i"   15  "foo"
    "key_1"  "cf_a"  "c_ii"  15  "bar"
    "key_2"  "cf_a"  "c_ii"  4   "baz"
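The trade-off on this slide can be sketched in Python by storing the three example cells as a sorted list of fully qualified tuples. The function names are hypothetical, the sketch assumes one version per column, and character counts stand in for bytes; it illustrates the idea rather than HBase's actual file format.

```python
import bisect

# Each cell carries its full coordinates, as on the slide:
# (row, family, column, timestamp, value), kept in sorted order.
cells = sorted([
    ("key_1", "cf_a", "c_i", 15, "foo"),
    ("key_1", "cf_a", "c_ii", 15, "bar"),
    ("key_2", "cf_a", "c_ii", 4, "baz"),
])
keys = [cell[:3] for cell in cells]


def lookup(row, family, column):
    """Binary-search the sorted cells for one (row, family, column)."""
    target = (row, family, column)
    i = bisect.bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return cells[i][4]
    return None


def scan_column(family, column):
    """A column scan walks every cell, repeated coordinates included."""
    return [c[4] for c in cells if c[1] == family and c[2] == column]


def key_overhead(cells):
    """Fraction of stored characters spent on coordinates, not values."""
    key_chars = sum(len(r) + len(f) + len(q) + len(str(ts))
                    for r, f, q, ts, _ in cells)
    value_chars = sum(len(v) for *_, v in cells)
    return key_chars / (key_chars + value_chars)
```

A point lookup needs only a binary search, while `scan_column` must touch every cell, and `key_overhead(cells)` shows that in this toy data most of the stored characters are coordinates rather than values.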
  • Hive Data Model: Physical
    Depends on the underlying storage files
    Can use flat text files, RCFiles, even use HBase for storage
    Standard Row Storage
    C_1  C_2  C_3  C_4
    11   12   13   14
    21   22   23   24
    31   32   33   34
    41   42   43   44
    51   52   53   54
  • Hive Data Model: RCFile
    Break into row groups, and then store as columns
    Row Group 1
    C_1  11  21  31
    C_2  12  22  32
    C_3  13  23  33
    C_4  14  24  34
    Row Group 2
    C_1  41  51
    C_2  42  52
    C_3  43  53
    C_4  44  54
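The row-group layout can be reproduced with a short Python sketch that takes the five rows from the row-storage slide and regroups them. The function names are hypothetical, and sizing a group by row count is a simplification for illustration; nothing here reads or writes real RCFile data.

```python
# The five example rows, in standard row storage order.
rows = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
]
columns = ["C_1", "C_2", "C_3", "C_4"]


def to_row_groups(rows, columns, group_size):
    """Split rows into groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        groups.append({name: [row[i] for row in chunk]
                       for i, name in enumerate(columns)})
    return groups


def read_column(groups, name):
    """Read one column across all groups without touching the others."""
    return [value for group in groups for value in group[name]]


groups = to_row_groups(rows, columns, group_size=3)
c2_values = read_column(groups, "C_2")
```

With `group_size=3` the result matches the slide: the first group holds rows 1 to 3 and the second holds rows 4 and 5, each stored as four column runs. A query that needs only `C_2` reads one run per group instead of every row.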
  • Informal Performance Comparison
                            Hive                             HBase
    Insert speed            batch                            Fast!
    Update speed            NA                               Fast!
    Lookup speed            MR lower bound (10s of seconds)  Fast!
    Data warehouse queries  15x faster on one test           Uh oh
  • THANK YOU