• Save
Hadoop World 2011: Advanced HBase Schema Design

Like this? Share it with your network


Hadoop World 2011: Advanced HBase Schema Design



While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per ...

While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second.
This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns.



Total Views
Views on SlideShare
Embed Views



46 Embeds 8,795

http://java.dzone.com 3000
http://www.cloudera.com 1818
http://nosql.mypopescu.com 1015
http://www.ctrl-r.org 973
http://d.hatena.ne.jp 753
http://play.daumcorp.com 386
http://www.scoop.it 328
http://wiki-jp.nhncorp.com 248
http://www.bigdatacloud.com 57
http://blog.cloudera.com 31
http://tumblr.hootsuite.com 27
http://feeds.feedburner.com 25
http://pinterest.com 19
http://www.newsblur.com 15
http://architects.dzone.com 13
http://www.hanrss.com 9
http://www.sozialpapier.com 9
http://www.pinterest.com 7
http://ornl.socialtext.net 6
http://wiki.linecorp.com 6
http://www.samsung.net 5
http://www.tatoglu.net 4
http://sozialpapier.com 4
http://webcache.googleusercontent.com 3
http://translate.googleusercontent.com 3
http://www.crowdlens.com 2
https://twitter.com 2
http://wiki.daumkakao.com 2
http://a0.twimg.com 2
http://www.dzone.com 2
http://tatoglu.net 2
http://search.daum.net 2
https://www.cloudera.com 2
http://feedproxy.google.com 2
http://twimblr.appspot.com 2
http://www.google.fr 1
http://dschool.co 1 1
http://cloudera.matt.dev 1 1
http://paper.li 1
http://twitter.com 1
http://cafe.naver.com 1
https://si0.twimg.com 1
http://ctrl-r.org. 1
http://pavan.pinterdev.com 1



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
  • Why can we download the PDF/PPT...?
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

Hadoop World 2011: Advanced HBase Schema Design Presentation Transcript

  • 1. November 2011 – Hadoop World NYCAdvanced HBase Schema DesignLars George, Solutions Architect
  • 2. Agenda1 Intro to HBase Architecture2 Schema Design3 Examples4 Wrap up2 ©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
  • 3. About Me• Solutions Architect @ Cloudera• Apache HBase & Whirr Committer• Author of HBase – The Definitive Guide• Working with HBase since end of 2007• Organizer of the Munich OpenHUG• Speaker at Conferences (Fosdem, Hadoop World)
  • 4. Overview• Schema design is vital• Needs to be done eventually – Same for RDBMS• Exposes architecture and implementation “features”• Might be handled in storage layer – eg. MegaStore, Percolator
  • 5. Configuration Layers (aka “OSI for HBase”)
  • 6. HBase Architecture
  • 7. HBase Architecture• HBase uses HDFS (or similar) as its reliable storage layer – Handles checksums, replication, failover• Native Java API, Gateway for REST, Thrift, Avro• Master manages cluster• RegionServer manage data• ZooKeeper is used the “neural network” – Crucial for HBase – Bootstraps and coordinates cluster
  • 8. Auto Sharding
  • 9. Distribution
  • 10. Auto Sharding and Distribution• Unit of scalability in HBase is the Region• Sorted, contiguous range of rows• Spread “randomly” across RegionServer• Moved around for load balancing and failover• Split automatically or manually to scale with growing data• Capacity is solely a factor of cluster nodes vs. regions per node
  • 11. Column Families
  • 12. Storage Separation• Column Families allow for separation of data – Used by Columnar Databases for fast analytical queries, but on column level only – Allows different or no compression depending on the content type• Segregate information based on access pattern• Data is stored in one or more storage file, called HFiles
  • 13. Merge Reads
  • 14. Bloom Filter• Bloom Filters are generated when HFile is persisted – Stored at the end of each HFile – Loaded into memory• Allows check on row or row+column level• Can filter entire store files from reads – Useful when data is grouped• Also useful when many misses are expected during reads (non existing keys)
  • 15. Bloom Filter
  • 16. Fold, Store, and Shift
  • 17. Fold, Store, and Shift• Logical layout does not match physical one• All values are stored with the full coordinates, including: Row Key, Column Family, Column Qualifier, and Timestamp• Folds columns into “row per column”• NULLs are cost free as nothing is stored• Versions are multiple “rows” in folded table
  • 18. Key Cardinality
  • 19. Key Cardinality• The best performance is gained from using row keys• Time range bound reads can skip store files – So can Bloom Filters• Selecting column families reduces the amount of data to be scanned• Pure value based filtering is a full table scan – Filters often are too, but reduce network traffic
  • 20. Tall-Narrow vs. Flat-Wide Tables• Rows do not split – Might end up with one row per region• Same storage footprint• Put more details into the row key – Sometimes dummy column only – Make use of partial key scans• Tall with Scans, Wide with Gets – Atomicity only on row level• Example: Large graphs, stored as adjacency matrix
  • 21. Example: Mail Inbox <userId> : <colfam> : <messageId> : <timestamp> : <email-message>12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..."12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..." or12345-5fc38314-e290-ae5da5fc375d : data : : 1307097848 : "Hi Lars, ..."12345-725aae5f-d72e-f90f3f070419 : data : : 1307099848 : "Welcome, and ..."12345-cc6775b3-f249-c6dd2b1a7467 : data : : 1307101848 : "To Whom It ..."12345-dcbee495-6d5e-6ed48124632c : data : : 1307103848 : "Hi, how are ..."  Same Storage Requirements
  • 22. Partial Key ScansKey Description<userId> Scan over all messages for a given user ID<userId>-<date> Scan over all messages on a given date for the given user ID<userId>-<date>-<messageId> Scan over all parts of a message for a given user ID and date<userId>-<date>-<messageId>-<attachmentId> Scan over all attachments of a message for a given user ID and date
  • 23. Sequential Keys <timestamp><more key>: {CF: {CQ: {TS : Val}}}• Hotspotting on Regions: bad!• Instead do one of the following: – Salting • Prefix <timestamp> with distributed value • Binning or bucketing rows across regions – Key field swap/promotion • Move <more key> before the timestamp (see OpenTSDB later) – Randomization • Move <timestamp> out of key
  • 24. Key Design
  • 25. Key Design• Based on access pattern, either use sequential or random keys• Often a combination of both is needed – Overcome architectural limitations• Neither is necessarily bad – Use bulk import for sequential keys and reads – Random keys are good for random access patterns
  • 26. Example: Facebook Insights• > 20B Events per Day• 1M Counter Updates per Second – 100 Nodes Cluster – 10K OPS per Node• ”Like” button triggers AJAX request• Event written to log file• 30mins current for website owner Web ➜ Scribe ➜ Ptail ➜ Puma ➜ HBase
  • 27. HBase Counters• Store counters per Domain and per URL – Leverage HBase increment (atomic read-modify- write) feature• Each row is one specific Domain or URL• The columns are the counters for specific metrics• Column families are used to group counters by time range – Set time-to-live on CF level to auto-expire counters by age to save space, e.g., 2 weeks on “Daily Counters” family
  • 28. Key Design• Reversed Domains – Examples: “com.cloudera.www”, “com.cloudera.blog” – Helps keeping pages per site close, as HBase efficiently scans blocks of sorted keys• Domain Row Key =
MD5(Reversed Domain) + Reversed Domain – Leading MD5 hash spreads keys randomly across all regions for load balancing reasons – Only hashing the domain groups per site (and per subdomain if needed)• URL Row Key =
MD5(Reversed Domain) + Reversed Domain + URL ID – Unique ID per URL already available, make use of it
  • 29. Insights Schema
  • 30. Example: OpenTSDB• Metric Type, Tags are stored as IDs• Periodically rolled up
  • 31. Summary• Design for Use-Case – Read, Write, or Both?• Avoid Hotspotting• Consider using IDs instead of full text• Leverage Column Family to HFile relation• Shift details to appropriate position – Composite Keys – Column Qualifiers
  • 32. Summary (cont.)• Schema design is a combination of – Designing the keys (row and column) – Segregate data into column families – Choose compression and block sizes• Similar techniques are needed to scale most systems – Add indexes, partition data, consistent hashing• Denormalization, Duplication, and Intelligent Keys (DDI)
  • 33. Questions?Email: lars@cloudera.comTwitter: @larsgeorge