Christopher ShainSoftware Development LeadTresata
   Hadoop for Financial Services   First completely Hadoop-powered analytics    application   Widely recognized as “Big...
   Software Development Lead at Tresata   Background in Financial Services IT     End-User Applications     Data Wareh...
   What is HBase?     From: http://hbase.apache.org:      ▪ “HBase is the Hadoop database.”     I think this is a confu...
   ‘Database’, to many, means:     Transactions     Joins     Indexes   HBase has none of these     More on this later
   HBase is a data storage platform designed to    work hand-in-hand with Hadoop       Distributed       Failure-tolera...
   Need for a low-latency, distributed datastore    with unlimited horizontal scale   Hadoop (MapReduce) doesn’t provide...
   November 2006: Google BigTable whitepaper    published:    http://research.google.com/archive/bigtable.html February ...
   Web Indexing   Social Graph   Messaging (Email etc.)
   HBase is written almost entirely in Java     JVM clients are first-class citizens                                    ...
   All data is stored in Tables   Table rows have exactly one Key, and all rows in a    table are physically ordered by ...
   Defined by the Table   A Column Family is a group of related    columns with it’s own name   All columns must be in ...
      Not exactly the same as rows in a traditional        RDBMS         Key: a byte array (usually a UTF-8 String)     ...
   All cells are created with a timestamp   Column family defines how many versions of    a cell to keep   Updates alwa...
    HBase deletes are a form of write called a     “tombstone”    Indicates that “beyond this point any previously     w...
   Requirement: Store real-time stock tick data     Ticker   Timestamp      Sequence    Bid      Ask      IBM     09:15:0...
Historical Prices:          Keys                   Column             DataType                     Ticker               Va...
Row Key                             Family:Column                                                 Prices:Bid[Ticker].[Rev_...
   HBase scales horizontally   Needs to split data over many RegionServers   Regions are the unit of scale
   All HBase tables are broken into 1 or more    regions   Regions have a start row key and an end row    key   Each Re...
“Users” Table                                         Row Keys in Region                                         “Aaron” –...
Deceptively simple
ZooKeeper                                           Cluster                                                 Backup HBase  ...
   ZooKeeper     Keeps track of which server is the current HBase     Master   HBase Master     Keeps track of Region/...
   RegionServer     Stores table regions   Clients     Need to be smarter than RDBMS clients     First connect to Zoo...
-ROOT- Table                 info:regioninfo.META.[region]   info:server               Points to DataNode hosting         ...
   HBase Master is not necessarily a single point of    failure (SPOF)     Multiple masters can be running     Current ...
ZooKeeper Quorum                 ZooKeeper                   Node     ZooKeeper               ZooKeeper       Node        ...
   HBase tolerates RegionServer failure when    running on HDFS     Data is replicated by HDFS (dfs.replication setting)...
    Similar to log in many RDBMS    All operations by default written to log before considered     ‘committed’ (can be o...
RegionServer Region                  Store                         Store                     Store                        ...
   A RegionServer is not guaranteed to be on    the same physical node as it’s data   Compaction causes RegionServer to ...
   All data is in memory initially (memstore)   HBase is a write-only system     Modifications and deletes are just wri...
   All HBase edits are initially stored in memory    (memstore)   Flushes occur when memstore reaches a    certain size ...
   Triggered when a certain number of HFiles are created for    a given Region Store (+ some other conditions)     By de...
   Triggered every 24 hours (with random    offset) or manually     Large HBase installations usually leave this for    ...
   HBase does not have transactions   However:     Row-level modifications are atomic: All      modifications to a row ...
   DO: Design your schema for linear range scans on your    most common queries.     Scans are the most efficient way to...
   DO: Be mindful of the size of your row and column    keys     They are used in indexes and queries, can be quite     ...
   DO: Query single rows using exact-match on key    (Gets) or Scans for multiple rows     Scans allow efficient I/O vs....
   Deceptively simple                                    HBase Master                   JVM Clients      RegionServer    ...
ZooKeeper Quorum                 ZooKeeper                   Node     ZooKeeper               ZooKeeper       Node        ...
RegionServer Region                  Store                         Store                      Store                       ...
   Requirement: Store an arbitrary set of    preferences for all users   Requirement: Each user may choose to store    a...
   One possible RDBMS solution:     Key/Value table     All values as strings     Flexible, but wastes spaceKeys:     ...
   Store all preferences in the Preferences    column family   Preference name as column name,    preference value as (s...
TriHUG January 2012 Talk by Chris Shain
Upcoming SlideShare
Loading in …5
×

TriHUG January 2012 Talk by Chris Shain

3,617 views

Published on

Intro to Apache HBase

Published in: Technology, Business
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,617
On SlideShare
0
From Embeds
0
Number of Embeds
1,003
Actions
Shares
0
Downloads
71
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

TriHUG January 2012 Talk by Chris Shain

  1. 1. Christopher ShainSoftware Development LeadTresata
  2. 2.  Hadoop for Financial Services First completely Hadoop-powered analytics application Widely recognized as “Big Data Startup to Watch” Winner of the 2011 NCTA Award for Emerging Tech Company of the Year Based in Charlotte NC We are hiring! curious@tresata.com
  3. 3.  Software Development Lead at Tresata Background in Financial Services IT  End-User Applications  Data Warehousing  ETL Email: chris@tresata.com Twitter: @chrisshain
  4. 4.  What is HBase?  From: http://hbase.apache.org: ▪ “HBase is the Hadoop database.”  I think this is a confusing statement
  5. 5.  ‘Database’, to many, means:  Transactions  Joins  Indexes HBase has none of these  More on this later
  6. 6.  HBase is a data storage platform designed to work hand-in-hand with Hadoop  Distributed  Failure-tolerant  Semi-structured  Low latency  Strictly consistent  HDFS-Aware  “NoSQL”
  7. 7.  Need for a low-latency, distributed datastore with unlimited horizontal scale Hadoop (MapReduce) doesn’t provide low- latency Traditional RDBMS don’t scale out horizontally
  8. 8.  November 2006: Google BigTable whitepaper published: http://research.google.com/archive/bigtable.html February 2007: Initial HBase Prototype October 2007: First ‘usable’ HBase January 2008: HBase becomes Apache subproject of Hadoop March 2009: HBase 0.20.0 May 10th, 2010: HBase becomes Apache Top Level Project
  9. 9.  Web Indexing Social Graph Messaging (Email etc.)
  10. 10.  HBase is written almost entirely in Java  JVM clients are first-class citizens HBase Master RegionServer JVM Clients RegionServer Non-JVM Proxy RegionServer Clients (Thrift or REST) RegionServer
  11. 11.  All data is stored in Tables Table rows have exactly one Key, and all rows in a table are physically ordered by key Tables have a fixed number of Column Families (more on this later!) Each row can have many Columns in each column family Each column has a set of values, each with a timestamp Each row:family:column:timestamp combination represents coordinates for a Cell
  12. 12.  Defined by the Table A Column Family is a group of related columns with it’s own name All columns must be in a column family Each row can have a completely different set of columns for a column familyRow: Column Family: Columns:Chris Friends:BobBob Friends Friends:Chris Friends:JamesJames Friends:Bob
  13. 13.  Not exactly the same as rows in a traditional RDBMS  Key: a byte array (usually a UTF-8 String)  Data: Cells, qualified by column family, column, and timestamp (not shown here)Row Key: Column Families : Columns: Cells: (Defined by the (Defined by the Row) (Created with Columns) Table) (May vary between rows) Attributes Attributes:Age 30 Attributes:Height 68Chris Friends Friends:Bob 1 (Bob’s a cool guy) Friends:Jane 0 (Jane and I don’t get along)
  14. 14.  All cells are created with a timestamp Column family defines how many versions of a cell to keep Updates always create a new cell Deletes create a tombstone (more on that later) Queries can include an “as-of” timestamp to return point-in-time values
  15. 15.  HBase deletes are a form of write called a “tombstone” Indicates that “beyond this point any previously written value is dead” Old values can still be read using point-in-time queries Timestamp Write Type Resulting Value Point-In-Time Value “as of” T+1 T+0 PUT (“Foo”) “Foo” “Foo” T+1 PUT (“Bar”) “Bar” “Bar” T+2 DELETE <none> “Bar” T+3 PUT (“Foo Too”) “Foo Too” “Bar”
  16. 16.  Requirement: Store real-time stock tick data Ticker Timestamp Sequence Bid Ask IBM 09:15:03:001 1 179.16 179.18 MSFT 09:15:04:112 2 28.25 28.27 GOOG 09:15:04:114 3 624.94 624.99 IBM 09:15:04:155 4 179.18 179.19 Requirement: Accommodate many simultaneous readers & writers Requirement: Allow for reading of current price for any ticker at any point in time
  17. 17. Historical Prices: Keys Column DataType Ticker Varchar PK Timestamp DateTime Sequence_Number Integer Bid_Price Decimal Ask_Price DecimalLatest Prices: Keys Column DataType PK Ticker Varchar Bid_Price Decimal Ask_Price Decimal
  18. 18. Row Key Family:Column Prices:Bid[Ticker].[Rev_Timestamp].[Rev_Sequence_Number] Prices:Ask HBase throughput will scale linearly with # of nodes No need to keep separate “latest price” table  A scan starting at “ticker” will always return the latest price row
  19. 19.  HBase scales horizontally Needs to split data over many RegionServers Regions are the unit of scale
  20. 20.  All HBase tables are broken into 1 or more regions Regions have a start row key and an end row key Each Region lives on exactly one RegionServer RegionServers may host many Regions When RegionServers die, Master detects this and assigns Regions to other RegionServers
  21. 21. “Users” Table Row Keys in Region “Aaron” – “George”-META- Table “Aaron” Region “Bob”Table Region Server “Chris” “Aaron” – “George” Node01 Row Keys in RegionUsers “George” – “Matthew” Node02 “George” – “Matthew” “Matthew” – “Zachary” Node01 “George” Row Keys in Region “Matthew” – “Zachary” “Matthew” “Nancy” “Zachary”
  22. 22. Deceptively simple
  23. 23. ZooKeeper Cluster Backup HBase HBase Master Master JVM Clients RegionServer RegionServer RegionServerNon-JVM Proxy Clients (Thrift or REST) RegionServer
  24. 24.  ZooKeeper  Keeps track of which server is the current HBase Master HBase Master  Keeps track of Region/RegionServer mapping  Manages the -ROOT- and .META. tables  Responsible for updating ZooKeeper when these change
  25. 25.  RegionServer  Stores table regions Clients  Need to be smarter than RDBMS clients  First connect to ZooKeeper to get RegionServer for a given Table/Region  Then connect directly to RegionServer to interact with the data  All connections over Hadoop RPC – non-JVM clients use proxy (Thrift or REST (Stargate))
  26. 26. -ROOT- Table info:regioninfo.META.[region] info:server Points to DataNode hosting .META. region.… info:serverstartcode .META. Table info:regioninfo [table],[region start key],[region id] info:server Points to DataNode info:serverstartcode hosting table region. Regular User Table … whatever … …
  27. 27.  HBase Master is not necessarily a single point of failure (SPOF)  Multiple masters can be running  Current ‘active’ Master controlled via ZooKeeper  Make sure you have enough ZooKeeper nodes! Master is not needed for client connectivity  Clients connect directly to ZooKeeper to find Regions  Everything Master does can be put off until one is elected
  28. 28. ZooKeeper Quorum ZooKeeper Node ZooKeeper ZooKeeper Node Node HBase HBase HBase Master Master Master(Current) (Standby) (Standby)
  29. 29.  HBase tolerates RegionServer failure when running on HDFS  Data is replicated by HDFS (dfs.replication setting)  Lots of issues around fsync, failure before data is flushed - some probably still not fixed  Thus, data can still be lost if node fails after a write HDFS NameNode is still SPOF, even for HBase
  30. 30.  Similar to log in many RDBMS All operations by default written to log before considered ‘committed’ (can be overridden for ‘disposable fast writes’) Log can be replayed when region is moved to another RegionServer One WAL per RegionServer Flushed periodically (10s by default) WAL HFile Writes MemStore HFile Flushed when MemStore gets too big
  31. 31. RegionServer Region Store Store Store MemStore MemStore MemStore StoreFile StoreFile StoreFile Log HFile HFile HFileHDFS Client Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block BlockHDFS DataNode HDFS DataNode HDFS DataNode HDFS DataNode
  32. 32.  A RegionServer is not guaranteed to be on the same physical node as it’s data Compaction causes RegionServer to write preferentially to local node  But this is a function of HDFS Client, not HBase
  33. 33.  All data is in memory initially (memstore) HBase is a write-only system  Modifications and deletes are just writes with later timestamps  Function of HDFS being append-only Eventually old writes need to be discarded 2 Types of Compactions:  Minor  Major
  34. 34.  All HBase edits are initially stored in memory (memstore) Flushes occur when memstore reaches a certain size  By default 67,108,864 bytes  Controlled by hbase.hregion.memstore.flush.size configuration property Each flush creates a new HFile
  35. 35.  Triggered when a certain number of HFiles are created for a given Region Store (+ some other conditions)  By default 3 HFiles  Controlled by hbase.hstore.compactionThreshold configuration property Compacts most recent HFiles into one  By default, uses RegionServer-local HDFS node Does not eliminate deletes  Only touches most recent HFiles NOTE: All column families are compacted at once (this might change in the future)
  36. 36.  Triggered every 24 hours (with random offset) or manually  Large HBase installations usually leave this for manual operators Re-writes all HFiles into one Processes deletes  Eliminates tombstones  Erases earlier entries
  37. 37.  HBase does not have transactions However:  Row-level modifications are atomic: All modifications to a row will succeed or fail as a unit  Gets are consistent for a given point in time ▪ But Scans may return 2 rows from different points in time  All data read has been ‘durably stored’ ▪ Does NOT mean flushed to disk- can still be lost!
  38. 38.  DO: Design your schema for linear range scans on your most common queries.  Scans are the most efficient way to query a lot of rows quickly DON’T: Use more than 2 or 3 column families.  Some operations (flushing and compacting) operate on the whole row DO: Be aware of the relative cardinality of column families  Wildly differing cardinality leads to sparsity and bad scanning results.
  39. 39.  DO: Be mindful of the size of your row and column keys  They are used in indexes and queries, can be quite large! DON’T: Use monotonically increasing row keys  Can lead to hotspots on writes DO: Store timestamp keys in reverse  Rows in a table need to be read in order, usually you want most recent
  40. 40.  DO: Query single rows using exact-match on key (Gets) or Scans for multiple rows  Scans allow efficient I/O vs. multiple gets DON’T: Use regex-based or non-prefix column filters  Very inefficient DO: Tune the scan cache and batch size parameters  Drastically improves performance when returning lots of rows
  41. 41.  Deceptively simple HBase Master JVM Clients RegionServer RegionServer Non-JVM Proxy RegionServer Clients (Thrift or REST) RegionServer
  42. 42. ZooKeeper Quorum ZooKeeper Node ZooKeeper ZooKeeper Node Node HBase HBase HBase Master Master Master(Current) (Standby) (Standby)
  43. 43. RegionServer Region Store Store Store MemStore MemStore MemStore StoreFile StoreFile StoreFile Log HFile HFile HFileHDFS Client Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block BlockHDFS DataNode HDFS DataNode HDFS DataNode HDFS DataNode
  44. 44.  Requirement: Store an arbitrary set of preferences for all users Requirement: Each user may choose to store a different set of preferences Requirement: Preferences may be of different data types (Strings, Integers, etc) Requirement: Developers will add new preference options all the time, so we shouldn’t need to modify the database structure when adding them
  45. 45.  One possible RDBMS solution:  Key/Value table  All values as strings  Flexible, but wastes spaceKeys: Column: Data Type: UserID Int PK PreferenceName Varchar PreferenceValue Varchar
  46. 46.  Store all preferences in the Preferences column family Preference name as column name, preference value as (serialized) byte array:  HBase client library provides methods for serializing many common data typesRow Key: Family: Column: Value: Age 30 Chris Preferences Hometown “Mineola, NY” Joe Preferences Birthdate 11/13/1987

×