Intro to HBase Internals & Schema Design (for HBase users)



This presentation covers the internals of the HBase system that HBase users should be aware of.

Published in: Technology
  • one of the best presentations on HBase so far!
  • Hi Arockiaraj,
    You will find useful info on how data is distributed by row keys and how salting helps with RS hotspotting in one of my blog posts here:
  • Can you please explain how exactly HBase chooses a region for a given row key when writing?
    To reduce region server hotspotting, row keys have to be salted. How does salting help choose different region servers (i.e. just by appending 0_ or 1_)?
  • Thank you!

    You can find more links to my presentations and posts around HBase at (e.g. published one about HBase Memstore 2 days ago)


  1. Intro to HBase Internals & Schema Design (for HBase Users)
     Alex Baranau, Sematext International, 2012 (Monday, July 9, 12)
  2. About Me
     • Software Engineer at Sematext International
     • @abaranau (abaranau)
  3. Agenda
     • Logical view
     • Physical view
     • Schema design
     • Other/advanced topics
  4. Why?
     • Why should I (an HBase user) care about HBase internals?
     • HBase will not automatically adjust cluster settings to optimal values based on usage patterns
     • Schema design, table settings (defined upon creation), etc. depend on aspects of the HBase implementation
  5. Logical View
  6. Logical View: Regions
     • An HBase cluster serves multiple tables, distinguished by name
     • Each table consists of rows
     • Each row contains cells: (row key, column family, column, timestamp) -> value
     • A table is split into Regions (table shards, each containing full rows), defined by start and end row keys
  7. Logical View: Regions are Shards
     • Regions are “atoms of distribution”
     • Each Region is assigned to a single RegionServer (HBase cluster slave), so the rows of a particular Region are served by a single RS
     • Regions are distributed evenly across RSs
     • A Region has a configurable max size; when a Region reaches max size (or on request) it is split into two smaller Regions, which can be assigned to different RSs
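How a row key maps to a Region can be sketched as a floor lookup over sorted region start keys. This is an illustrative plain-Java model (class and method names are hypothetical, not the actual HBase code path): the Region serving a row key is the one with the greatest start key less than or equal to that key.

```java
import java.util.TreeMap;

// Toy model of region location: each region covers [startKey, nextStartKey).
class RegionLocator {
    // Maps each region's start key to its name; "" marks the first region's open start.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The region serving a row key is the one with the greatest start key <= row key.
    String locate(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        locator.addRegion("", "region-1");  // covers (-inf, "g")
        locator.addRegion("g", "region-2"); // covers ["g", "p")
        locator.addRegion("p", "region-3"); // covers ["p", +inf)
        System.out.println(locator.locate("alex"));  // region-1
        System.out.println(locator.locate("peter")); // region-3
    }
}
```

This is also why sequential keys hotspot a single RS: consecutive keys all floor to the same start key, i.e. the same Region.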
  8. Logical View: Regions on Cluster
     [diagram: client, ZooKeeper quorum, HMaster (active + standby), and Regions distributed across RegionServers]
  9. Logical View: Regions Load
     • It is essential for Regions under load to be evenly distributed across the cluster
     • It is the HBase user’s job to make sure the above holds
     • Note: even distribution of Regions over the cluster doesn’t imply that the load is evenly distributed
  10. Logical View: Regions Load
      • Take into account that rows are stored in an ordered manner
      • Make sure you don’t write rows with sequential keys, to avoid RS hotspotting*: when writing data with monotonically increasing/decreasing keys, data is written to one RS at a time
      • Pre-split the table upon creation: starting with a single region means using one RS for some time, and in general splitting can be expensive
      • Increase the max region size
      * see
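The salting technique mentioned above (and asked about in the comments) can be sketched in plain Java. This is a hypothetical helper, not an HBase API: prefixing each sequential key with a small hash-derived bucket number makes otherwise-adjacent keys land in different key ranges, so with the table pre-split on the prefixes (0_, 1_, ...) they land on different RSs; the cost is that a scan must later read all N prefixed ranges.

```java
// Hypothetical salting sketch: spread sequential keys over BUCKETS key ranges.
class SaltedKeys {
    static final int BUCKETS = 4; // pre-split the table into 4 regions: 0_, 1_, 2_, 3_

    static String salt(String rowKey) {
        // Deterministic: the same row key always gets the same prefix,
        // so reads can recompute it.
        int bucket = Math.abs(rowKey.hashCode() % BUCKETS);
        return bucket + "_" + rowKey;
    }

    public static void main(String[] args) {
        // Monotonically increasing timestamps no longer sort next to each other.
        System.out.println(salt("login_2012-03-01.00:09:17"));
        System.out.println(salt("login_2012-03-01.00:09:18"));
    }
}
```

The prefix only helps because the table was pre-split at the bucket boundaries; salting alone does not move data, it just changes which pre-existing Region (and hence RS) each key falls into.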
  11. Logical View: Slow RSs
      • When load is distributed evenly, watch for the slowest RSs (HBase slaves)
      • Since every Region is served by a single RS, one slow RS can slow down the whole cluster’s performance, e.g. when:
        • data is written to multiple RSs at an even pace (random value-based row keys)
        • data is being read from many RSs when doing a scan
  12. Physical View
  13. Physical View: Write/Read Flow
      [diagram: client HTable with client-side buffer; writes go to a Region’s per-CF MemStore (in a Store) and the Write Ahead Log; MemStores flush to HFiles on HDFS; reads merge MemStores and HFiles]
  14. Physical: Speed up Writing
      • Enabling & increasing the client-side buffer reduces the number of RPC operations
        • warn: possible loss of buffered data in case of client failure; design for failover
        • in case of write failure (networking/server-side issues), it can be handled on the client
      • Disabling the WAL increases write speed
        • warn: possible data loss in case of RS failure
      • Use the bulk import functionality (writes HFiles directly, which can later be added to HBase)
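Why a client-side buffer reduces RPCs can be shown with a toy model (this is not the HTable API, just an illustrative sketch): puts are collected locally and shipped in one batched round trip when the buffer fills, instead of one round trip per put. The comment in flush() also shows the failure mode the slide warns about.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of client-side write buffering (hypothetical; not the HTable API).
class ClientWriteBuffer {
    private final List<String> buffer = new ArrayList<>();
    private final int bufferSize;
    int rpcCount = 0; // round trips made to the server

    ClientWriteBuffer(int bufferSize) { this.bufferSize = bufferSize; }

    void put(String row) {
        buffer.add(row);
        if (buffer.size() >= bufferSize) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        rpcCount++;     // one batched round trip for the whole buffer
        buffer.clear(); // note: rows held here are lost if the client dies before flush
    }
}
```

With a buffer of 10, writing 100 rows costs 10 round trips instead of 100; the trade-off is exactly the data-loss window the slide warns about.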
  15. Physical: Memstore Flushes
      • When a memstore is flushed, N HFiles are created (one per CF)
      • The memstore size that causes flushing is configured on two levels:
        • per RS: % of heap occupied by memstores
        • per table: size in MB of a single memstore (per CF) of a Region
      • When a Region’s memstores flush, the memstores of all CFs are flushed
      • Uneven data amounts between CFs cause too many flushes & creation of too many HFiles (one per CF every time)
      • In most cases, having one CF is the best design
  16. Physical: Memstore Flushes
      • Important: there are memstore size thresholds which cause writes to be blocked, so slow memstore flushes and overuse of memory by memstores can degrade write performance
      • Hint: watch the flush queue size metric on RSs
      • At the same time, the more memory the memstore uses, the better for writing/reading performance (unless it reaches those “write blocking” thresholds)
  17. Physical: Memstore Flushes
      [chart: example of a good situation*]
  18. Physical: HFiles Compaction
      • HFiles are periodically compacted into bigger HFiles containing the same data
      • Reading from fewer HFiles is faster
      • Important: there’s a configured max number of files in a Store which, when reached, causes writes to block
      • Hint: watch the compaction queue size metric on RSs
      [diagram: reads served from a Store’s MemStore (per CF) and its HFiles]
  19. Physical: Data Locality
      • RSs are usually collocated with HDFS DataNodes
      [diagram: slave nodes each running an HBase RegionServer, a MapReduce TaskTracker, and an HDFS DataNode]
  20. Physical: Data Locality
      • HBase tries to assign Regions to RSs so that Region data is stored physically on the same node, but sometimes fails:
        • after Region splits, there’s no guarantee that there’s a node that has all blocks (HDFS level) of the new Region
        • there’s no guarantee that HBase will not re-assign this Region to a different RS in the future (even distribution of Regions takes preference over data locality)
      • There’s ongoing work towards better preserving data locality
  21. Physical: Data Locality
      • Also, data locality can break when:
        • adding new slaves to the cluster
        • removing slaves from the cluster (incl. node failures)
      • Hint: look at networking IO between slaves when writing/reading data; it should be minimal
      • Important:
        • make sure HDFS is well balanced (use the balancer tool)
        • try to rebalance Regions in the HBase cluster if possible (an HBase Master restart will do that) to regain data locality
        • pre-split the table on creation to limit (ideally avoid) splits and Region movement; managing splits manually sometimes helps
  22. Schema Design (very briefly)
  23. Schema: Row Keys
      • Using a row key (or key range) is the most efficient way to retrieve data from HBase
      • Row key design is a major part of schema design
      • Note: no secondary indices are available out of the box

        Row Key                          Data
        ‘login_2012-03-01.00:09:17’      d:{‘user’:‘alex’}
        ...                              ...
        ‘login_2012-03-01.23:59:35’      d:{‘user’:‘otis’}
        ‘login_2012-03-02.00:00:21’      d:{‘user’:‘david’}
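The key design in the table above can be sketched in plain Java (hypothetical helpers, not an HBase API): because rows are stored sorted by key, a fixed-format 'login_<timestamp>' key makes "all logins on a given day" a contiguous key-range scan.

```java
import java.util.TreeMap;

// Sketch of time-ordered row keys; a TreeMap stands in for the sorted table.
class EventKeys {
    static String loginKey(String ts) { return "login_" + ts; }

    // [startRow, stopRow) covering one whole day, exploiting lexicographic order.
    static String[] dayRange(String day, String nextDay) {
        return new String[] { "login_" + day, "login_" + nextDay };
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(loginKey("2012-03-01.00:09:17"), "alex");
        table.put(loginKey("2012-03-01.23:59:35"), "otis");
        table.put(loginKey("2012-03-02.00:00:21"), "david");
        String[] r = dayRange("2012-03-01", "2012-03-02");
        // A scan over [r[0], r[1]) returns exactly the 2012-03-01 logins.
        System.out.println(table.subMap(r[0], r[1]).values()); // [alex, otis]
    }
}
```

Note the slide-10 caveat still applies: keys like these are monotonically increasing, so heavy write loads on such a table would need salting or pre-splitting.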
  24. Schema: Row Keys
      • Redundancy is OK!
      • warn: changing two rows in HBase is not an atomic operation

        Row Key                               Data
        ‘login_2010-01-01.00:09:17’           d:{‘user’:‘alex’}
        ...                                   ...
        ‘login_2012-03-01.23:59:35’           d:{‘user’:‘otis’}
        ‘alex_2010-01-01.00:09:17’            d:{‘action’:‘login’}
        ...                                   ...
        ‘otis_2012-03-01.23:59:35’            d:{‘action’:‘login’}
        ‘alex_login_2010-01-01.00:09:17’      d:{‘device’:’pc’}
        ...                                   ...
        ‘otis_login_2012-03-01.23:59:35’      d:{‘device’:‘mobile’}
  25. Schema: Relations
      • Not relational; no joins
      • Denormalization is OK!
      • Use ‘nested entities’

        Row Key                   Data
        ‘student_abaranau’        d:{ student_firstname:Alex, student_lastname:Baranau,
                                      professor_math_firstname:David, professor_math_lastname:Smart,
                                      professor_cs_firstname:Jack, professor_cs_lastname:Weird }
        ‘prof_dsmart’             d:{...}
  26. Schema: Row Key/CF/Qualifier Size
      • HBase stores cells individually: great for “sparse” data
      • The row key, CF name, and column name are stored with each cell, which may affect the amount of data to be stored and managed:
        • keep them short
        • serialize and store many values into a single cell

        Row Key           Data
        ‘s_abaranau’      d:{ s:Alex#Baranau#cs#2009, p_math:David#Smart, p_cs:Jack#Weird }
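The compound-value trick above (packing several fields into one cell, as in s:Alex#Baranau#cs#2009) can be sketched like this. The helper names and the '#' separator are illustrative assumptions, not an HBase convention; the point is that one cell carries its (row key, CF, qualifier) coordinates once instead of once per field.

```java
// Hypothetical encoder/decoder for packing several fields into one cell value.
class CompoundCell {
    static final String SEP = "#"; // assumes '#' never appears inside a field

    static String encode(String... fields) {
        return String.join(SEP, fields);
    }

    static String[] decode(String cellValue) {
        return cellValue.split(SEP);
    }

    public static void main(String[] args) {
        String cell = encode("Alex", "Baranau", "cs", "2009"); // value of column 's'
        System.out.println(cell);            // Alex#Baranau#cs#2009
        System.out.println(decode(cell)[1]); // Baranau
    }
}
```

The trade-off: the whole cell must be read and decoded even when only one field is needed, and individual fields can no longer be updated independently.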
  27. Other/Advanced Topics
  28. Advanced: Co-Processors
      • The CoProcessors API (HBase 0.92.0+) allows you to:
        • execute logic (querying/aggregation/etc.) on the server side (you may think of it as stored procedures in an RDBMS)
        • perform auditing of actions performed on the server side (you may think of it as triggers in an RDBMS)
        • apply security rules for data access
        • and much more cool stuff
  29. Other: Use Compression
      • Using compression:
        • reduces the amount of data to be stored on disks
        • reduces the amount of data to be transferred when an RS reads data from a non-local replica
        • increases the amount of CPU used, but CPU usually isn’t a bottleneck
      • Favor compression speed over compression ratio; SNAPPY is good
      • Use it wisely:
        • e.g. avoid wasting CPU cycles on compressing images
        • compression can be configured on a per-CF basis, so storing non-compressible data in a separate CF sometimes helps
        • data blocks are uncompressed in memory; avoid letting this cause an OOME
        • note: when scanning (seeking data to return for a scan), many data blocks can be uncompressed even if none of the data will be returned from those blocks
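Since compression is configured per CF, a table can mix a compressed CF for text-like data with an uncompressed CF for incompressible data. A minimal hbase shell sketch (table and CF names are made up; COMPRESSION values depend on the codecs installed on your cluster):

```
create 'logs', {NAME => 'd', COMPRESSION => 'SNAPPY'}, {NAME => 'raw'}
```

Here only CF 'd' is compressed; CF 'raw' (e.g. images) is left as-is so no CPU is wasted on it.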
  30. Other: Use Monitoring
      • Ganglia, Cacti, others*; just use it!
  31. Qs?
      Sematext is hiring!