Intro to HBase - Lars George


Published on

With the public confession of Facebook, HBase is on everyone's lips when it comes to the discussion around the new "NoSQL" area of databases. In this talk, Lars will introduce and present a comprehensive overview of HBase. This includes the history of HBase, the underlying architecture, available interfaces, and integration with Hadoop.

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Intro to HBase - Lars George

  1. 1. HBASE – THE SCALABLEDATA STOREAn Introduction to HBaseJAX UK, October 2012Lars GeorgeDirector EMEA Services
  2. 2. About Me•  Director EMEA Services @ Cloudera •  Consulting on Hadoop projects (everywhere)•  Apache Committer •  HBase and Whirr•  O’Reilly Author •  HBase – The Definitive Guide •  Now in Japanese!•  Contact • 日本語版も出ました!   •  @larsgeorge
  3. 3. Agenda•  Introduction to HBase•  HBase Architecture•  MapReduce with HBase•  Advanced Techniques•  Current Project Status
  5. 5. Why Hadoop/HBase?•  Datasets are constantly growing and intake soars •  Yahoo! has 140PB+ and 42k+ machines •  Facebook adds 500TB+ per day, 100PB+ raw data, on tens of thousands of machines •  Are you “throwing” data away today?•  Traditional databases are expensive to scale and inherently difficult to distribute•  Commodity hardware is cheap and powerful •  $1000 buys you 4-8 cores/4GB/1TB •  600GB 15k RPM SAS nearly $500•  Need for random access and batch processing •  Hadoop only supports batch/streaming
  6. 6. History of Hadoop/HBase•  Google solved its scalability problems •  “The Google File System” published October 2003 •  Hadoop DFS •  “MapReduce: Simplified Data Processing on Large Clusters” published December 2004 •  Hadoop MapReduce •  “BigTable: A Distributed Storage System for Structured Data” published November 2006 •  HBase
  7. 7. Hadoop Introduction•  Two main components •  Hadoop Distributed File System (HDFS) •  A scalable, fault-tolerant, high performance distributed file system capable of running on commodity hardware •  Hadoop MapReduce •  Software framework for distributed computation•  Significant adoption •  Used in production in hundreds of organizations •  Primary contributors: Yahoo!, Facebook, Cloudera
  8. 8. HDFS: Hadoop Distributed File System•  Reliably store petabytes of replicated data across thousands of nodes •  Data divided into 64MB blocks, each block replicated three times•  Master/Slave architecture •  Master NameNode contains block locations •  Slave DataNode manages block on local file system•  Built on commodity hardware •  No 15k RPM disks or RAID required (nor wanted!)
  9. 9. MapReduce•  Distributed programming model to reliably process petabytes of data using its locality •  Built-in bindings for Java and C •  Can be used with any language via Hadoop Streaming•  Inspired by map and reduce functions in functional programming Input  è  Map()  è  Copy/Sort  è  Reduce()  è  Output    
  10. 10. Hadoop…•  … is designed to store and stream extremely large datasets in batch•  … is not intended for realtime querying•  … does not support random access•  … does not handle billions of small files well •  Less than default block size of 64MB and smaller •  Keeps “inodes” in memory on master•  … is not supporting structured data more than unstructured or complex data That is why we have HBase!
  11. 11. Why HBase and not …?•  Question: Why HBase and not <put-your-favorite- nosql-solution-here>?•  What else is there? •  Key/value stores •  Document-oriented stores •  Column-oriented stores •  Graph-oriented stores•  Features to ask for •  In memory or persistent? •  Strict or eventual consistency? •  Distributed or single machine (or afterthought)? •  Designed for read and/or write speeds? •  How does it scale? (if that is what you need)
  12. 12. What is HBase?•  Distributed•  Column-Oriented•  Multi-Dimensional•  High-Availability (CAP anyone?)•  High-Performance•  Storage System Project Goals Billions of Rows * Millions of Columns * Thousands of Versions Petabytes across thousands of commodity servers
  13. 13. HBase is not…•  An SQL Database •  No joins, no query engine, no types, no SQL •  Transactions and secondary indexes only as add-ons but immature•  A drop-in replacement for your RDBMS•  You must be OK with RDBMS anti-schema •  Denormalized data •  Wide and sparsely populated tables •  Just say “no” to your inner DBA Keyword: Impedance Match
  14. 14. HBase Tables
  15. 15. HBase Tables
  16. 16. HBase Tables
  17. 17. HBase Tables
  18. 18. HBase Tables
  19. 19. HBase Tables
  20. 20. HBase Tables
  21. 21. HBase Tables
  22. 22. HBase Tables
  23. 23. HBase Tables
  24. 24. HBase Tables•  Tables are sorted by the Row Key in lexicographical order•  Table schema only defines its Column Families •  Each family consists of any number of Columns •  Each column consists of any number of Versions •  Columns only exist when inserted, NULLs are free •  Columns within a family are sorted and stored together •  Everything except table names are byte[](Table, Row, Family:Column, Timestamp) è Value
  25. 25. Column Family vs. Column•  Use only a few column families •  Causes many files that need to stay open per region plus class overhead per family•  Best used when logical separation between data and meta columns•  Sorting per family can be used to convey application logic or access pattern
  26. 26. HBase Architecture•  Table is made up of any number if regions•  Region is specified by its startKey and endKey •  Empty table: (Table, NULL, NULL) •  Two-region table: (Table, NULL, “com.cloudera.www”) and (Table, “com.cloudera.www”, NULL)•  Each region may live on a different node and is made up of several HDFS files and blocks, each of which is replicated by Hadoop
  27. 27. HBase Architecture (cont.)•  Two types of HBase nodes: Master and RegionServer•  Special tables -ROOT- and.META. store schema information and region locations•  Master server responsible for RegionServer monitoring as well as assignment and load balancing of regions•  Uses ZooKeeper as its distributed coordination service •  Manages Master election and server availability
  28. 28. Web Crawl Example•  Canonical use-case for BigTable•  Store web crawl data •  Table webtable with family content and meta •  Row is reversed URL with Columns •  content:data stores the raw crawled data •  meta:language stores http language header •  meta:type stores http content-type header •  While processing raw data for hyperlinks and images, add families links and images •  links:<rurl> column for each hyperlink •  images:<rurl> column for each image
  29. 29. HBase Clients•  Native Java Client/API•  Non-Java Clients •  REST server •  Avro server •  Thrift server •  Jython, Scala, Groovy DSL•  TableInputFormat/TableOutputFormat for MapReduce •  HBase as MapReduce source and/or target•  HBase Shell •  JRuby shell adding get, put, scan and admin calls
  30. 30. Java API•  CRUD •  get: retrieve an entire, or partial row (R) •  put: create and update a row (CU) •  delete: delete a cell, column, columns, or row (D) Result get(Get get) throws IOException; void put(Put put) throws IOException; void delete(Delete delete) throws IOException;
  31. 31. Java API (cont.)•  CRUD+SI •  scan: Scan any number of rows (S) •  increment: Increment a column value (I)ResultScanner getScanner(Scan scan) throws IOException;Result increment(Increment increment) throws IOException ;
  32. 32. Java API (cont.)•  CRUD+SI+CAS •  Atomic compare-and-swap (CAS)•  Combined get, check, and put operation•  Helps to overcome lack of full transactions
  33. 33. Batch Operations•  Support Get, Put, and Delete•  Reduce network round-trips•  If possible, batch operation to the server to gain better overall throughput void batch(List<Row> actions, Object[] results) throws IOException, InterruptedException; Object[] batch(List<Row> actions) throws IOException, InterruptedException;
  34. 34. Filters•  Can be used with Get and Scan operations•  Server side hinting•  Reduce data transferred to client•  Filters are no guarantee for fast scans •  Still full table scan in worst-case scenario •  Might have to implement your own•  Filters can hint next row key
  35. 35. HBase Extensions•  Hive, Pig, Cascading •  Hadoop-targeted MapReduce tools with HBase integration•  Sqoop •  Read and write to HBase for further processing in Hadoop•  HBase Explorer, Nutch, Heretrix•  SpringData•  Toad
  36. 36. History of HBase•  November 2006 •  Google releases paper on BigTable•  February 2007 •  Initial HBase prototype created as Hadoop contrib•  October 2007 •  First “useable” HBase (Hadoop 0.15.0)•  January 2008 •  Hadoop becomes TLP, HBase becomes subproject•  October 2008 •  HBase 0.18.1 released•  January 2009 •  HBase 0.19.0•  September 2009 •  HBase 0.20.0 released (Performance Release)•  May 2010 •  HBase becomes TLP•  June 2010 •  HBase 0.89.20100621, first developer release•  May 2011 •  HBase 0.90.3 release
  37. 37. HBase Users•  Adobe•  eBay•  Facebook•  Mozilla (Socorro)•  Trend Micro (Advanced Threat Research)•  Twitter•  Yahoo!•  …
  39. 39. HBase Architecture
  40. 40. HBase Architecture (cont.)
  41. 41. HBase Architecture (cont.)•  Based on Log-Structured Merge-Trees (LSM-Trees)•  Inserts are done in write-ahead log first•  Data is stored in memory and flushed to disk on regular intervals or based on size•  Small flushes are merged in the background to keep number of files small•  Reads read memory stores first and then disk based files second•  Deletes are handled with “tombstone” markers•  Atomicity on row level no matter how many columns •  keeps locking model easy
  42. 42. Write Ahead Log
  44. 44. MapReduce with HBase•  Framework to use HBase as source and/or sink for MapReduce jobs•  Thin layer over native Java API•  Provides helper class to set up jobs easier TableMapReduceUtil.initTableMapperJob( “test”, scan, MyMapper.class, ImmutableBytesWritable.class, RowResult.class, job); TableMapReduceUtil.initTableReducerJob( “table”, MyReducer.class, job);
  45. 45. MapReduce with HBase (cont.)•  Special use-case in regards to Hadoop•  Tables are sorted and have unique keys •  Often we do not need a Reducer phase •  Combiner not needed•  Need to make sure load is distributed properly by randomizing keys (or use bulk import)•  Partial or full table scans possible•  Scans are very efficient as they make use of block caches •  But then make sure you do not create to much churn, or better switch caching off when doing full table scans.•  Can use filters to limit rows being processed
  46. 46. TableInputFormat•  Transforms a HBase table into a source for MapReduce jobs•  Internally uses a TableRecordReader which wraps a Scan instance •  Supports restarts to handle temporary issues•  Splits table by region boundaries and stores current region locality
  47. 47. TableOutputFormat•  Allows to use HBase table as output target•  Put and Delete support from mapper or reducer class•  Uses TableOutputCommitter to write data•  Disables auto-commit on table to make use of client side write buffer•  Handles final flush in close()
  48. 48. HFileOutputFormat•  Used to bulk load data into HBase•  Bypasses normal API and generates low-level store files•  Prepares files for final bulk insert•  Needs special handling of sort order and partitioning•  Only supports one column family (for now)•  Can load bulk updates into existing tables
  49. 49. MapReduce Helper•  TableMapReduceUtil•  IdentityTableMapper •  Passes on key and value, where value is a Result instance and key is set to value.getRow()•  IdentityTableReducer •  Stores values into HBase, must be Put or Delete instances•  HRegionPartitioner •  Not set by default, use it to control partioning on Hadoop level
  50. 50. Custom MapReduce over Tables•  No requirement to use provided framework•  Can read from or write to one or many tables in mapper and reducer•  Can split not on regions but arbitrary boundaries•  Make sure to use write buffer in OutputFormat to get best performance (do not forget to call flushCommits() at the end!)
  52. 52. Advanced Techniques•  Key/Table Design•  DDI•  Salting•  Hashing vs. Sequential Keys•  ColumnFamily vs. Column•  Using BloomFilter•  Data Locality•  checkAndPut() and checkAndDelete()•  Coprocessors
  53. 53. Coprocessors•  New addition to feature set•  Based on talk by Jeff Dean at LADIS 2009 •  Run arbitrary code on each region in RegionServer •  High level call interface for clients •  Calls are addressed to rows or ranges of rows while Coprocessors client library resolves locations •  Calls to multiple rows are atomically split •  Provides model for distributed services •  Automatic scaling, load balancing, request routing
  54. 54. Coprocessors in HBase•  Use for efficient computational parallelism•  Secondary indexing (HBASE-2038)•  Column Aggregates (HBASE-1512) •  SQL-like sum(), avg(), max(), min(), etc.•  Access control (HBASE-3025, HBASE-3045) •  Provide basic access control•  Table Metacolumns•  New filtering •  predicate pushdown•  Table/Region access statistics•  HLog extensions (HBASE-3257)
  55. 55. Coprocessor and RegionObserver•  The Coprocessor interface defines these hooks •  preOpen, postOpen: Called before and after the region is reported as online to the master •  preFlush, postFlush: Called before and after the memstore is flushed into a new store file •  preCompact, postCompact: Called before and after compaction •  preSplit, postSplit: Called after the region is split •  preClose, postClose: Called before and after the region is reported as closed to the master
  56. 56. Coprocessor and RegionObserver•  The RegionObserver interface is defines these hooks: •  preGet, postGet: Called before and after a client makes a Get request •  preExists, postExists: Called before and after the client tests for existence using a Get •  prePut, postPut: Called before and after the client stores a value •  preDelete, postDelete: Called before and after the client deletes a value •  preScannerOpen, postScannerOpen: Called before and after the client opens a new scanner •  preScannerNext, postScannerNext: Called before and after the client asks for the next row on a scanner •  preScannerClose, postScannerClose: Called before and after the client closes a scanner •  preCheckAndPut, postCheckAndPut: Called before and after the client calls checkAndPut() •  preCheckAndDelete, postCheckAndDelete: Called before and after the client calls checkAndDelete()
  58. 58. Current Project Status•  HBase 0.90.x “Advanced Concepts” •  Master Rewrite – More Zookeeper •  Intra Row Scanning •  Further optimizations on algorithms and data structures CDH3•  HBase 0.92.x “Coprocessors” •  Multi-DC Replication •  Discretionary Access Control •  Coprocessors CDH4
  59. 59. Current Project Status (cont.)•  HBase 0.94.x “Performance Release” •  Read CRC Improvements •  Seek Optimizations •  WAL Compression •  Prefix Compression (aka Block Encoding) •  Atomic Append •  Atomic put+delete •  Multi Increment and Multi Append •  Per-region (i.e. local) Multi-Row Transactions •  WALPlayer CDH4.x (soon)
  60. 60. Current Project Status (cont.)•  HBase 0.96.x “The Singularity” •  Protobuf RPC •  Rolling Upgrades •  Multiversion Access •  Metrics V2 •  Preview Technologies •  Snapshots •  PrefixTrie Block Encoding CDH5 ?
  61. 61. Ques%ons?