HBase and Accumulo
 Washington DC Hadoop User Group
          Jan 25th, 2012

            Todd Lipcon
     Software Engineer, Cloudera
   todd@cloudera.com / @tlipcon


     Copyright 2011 Cloudera Inc. All rights reserved
Background – Overview

• HBase and Accumulo are both open-source, Apache
  2.0 licensed implementations of Google’s BigTable
  infrastructure, running on Apache Hadoop
• Scalable, distributed storage
   • Petabyte-scale storage: trillions of rows distributed
     across hundreds or thousands of machines
   • Automatic fault tolerance and data distribution as machines
     crash or rejoin the cluster
   • Linear scaling of IOPS and data capacity by adding servers
• Data model is a big sorted hierarchical map

Sorted Map Datastores
• Each row has a row key (like a Primary Key in RDBMS
  terms)
   • Users may query by exact row key or by range of row keys
   • Data is always stored and returned in sorted order
• Each row has some number of columns
   • Each column has a qualifier and some piece of data, like an
     entry in a Map<byte[], byte[]>
   • Different rows may have different sets of columns
   • Each cell has an associated timestamp and may retain a
     history of previous values
• Columns are grouped into column families and locality
  groups
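
The row-key access pattern described on this slide can be sketched as a toy in-memory table. This is only an illustration of "lookup by exact key or by key range, always in sorted order"; the class and method names (SortedTable, put, get, scan) are hypothetical and are not the HBase or Accumulo client API.

```python
import bisect


class SortedTable:
    """Toy table: row key -> {qualifier: value}, kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys (bytes)
        self._rows = {}   # row key -> {qualifier: value}

    def put(self, row, qualifier, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)   # keep row keys in sorted order
            self._rows[row] = {}
        self._rows[row][qualifier] = value

    def get(self, row):
        """Exact lookup by row key; unknown rows yield an empty column map."""
        return self._rows.get(row, {})

    def scan(self, start, stop):
        """Yield (row, columns) for start <= row < stop, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for row in self._keys[lo:hi]:
            yield row, self._rows[row]


table = SortedTable()
table.put(b"tlipcon", b"state", b"CA")
table.put(b"cutting", b"state", b"CA")
table.put(b"cutting", b"height", b"9ft")   # rows may have different columns

rows_a_to_d = [row for row, _ in table.scan(b"a", b"d")]
all_rows = [row for row, _ in table.scan(b"", b"\xff")]
```

Because the keys are kept sorted, a range scan is two binary searches plus a sequential read, which is the property the real systems preserve on disk.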
Sorted Map Datastore
   (logical view as “records”)

   Row key    Data
   cutting    info:  { ‘height’: ‘9ft’, ‘state’: ‘CA’ }
              roles: { ‘ASF’: ‘Director’, ‘Hadoop’: ‘Founder’ }
   tlipcon    info:  { ‘height’: ‘5ft7’, ‘state’: ‘CA’ }
              roles: { ‘Hadoop’: ‘Committer’@ts=2010,
                       ‘Hadoop’: ‘PMC’@ts=2011,
                       ‘Hive’: ‘Contributor’ }

• Row key: implicit PRIMARY KEY in RDBMS terms
• Data is all byte[] in HBase
• Different types of data separated into different “column families”
• Different rows may have different sets of columns (table is sparse)
• A single cell might have different values at different timestamps
• Useful for *-To-Many mappings
Locality Groups
• Different sets of columns may have different properties
  and access patterns
   • Perhaps a few columns are accessed all the time, whereas
     others are large and rarely needed
   • For example, a user’s metadata (1 KB, accessed frequently) and
     their photo (1 MB, cached by CDN and accessed rarely)
• Put metadata in one locality group and photos in
  another
• Locality groups stored separately on disk: access just the
  metadata without reading the photo
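
A rough sketch of the idea, assuming a hypothetical LocalityGroupStore class in which each locality group is simply a separate dictionary standing in for a separate set of on-disk files. The column names and group names are made up for illustration.

```python
class LocalityGroupStore:
    """Toy store that keeps each locality group in its own container."""

    def __init__(self, group_of_column, default_group="default"):
        self._group_of_column = dict(group_of_column)  # column -> group name
        self._default = default_group
        self._groups = {}       # group name -> {(row, column): value}
        self.groups_read = []   # which groups each read had to touch

    def _group(self, column):
        return self._group_of_column.get(column, self._default)

    def put(self, row, column, value):
        self._groups.setdefault(self._group(column), {})[(row, column)] = value

    def get(self, row, column):
        group = self._group(column)
        self.groups_read.append(group)   # only this group's data is consulted
        return self._groups.get(group, {}).get((row, column))


store = LocalityGroupStore({"name": "metadata", "email": "metadata",
                            "photo": "photos"})
store.put("tlipcon", "name", "Todd Lipcon")
store.put("tlipcon", "photo", b"\x89PNG...1MB of image bytes...")

name = store.get("tlipcon", "name")
```

Reading the name touches only the "metadata" group, so the large photo bytes are never loaded.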
Sorted Map Datastore
   (physical view as “cells”)

   info Column Family / Locality Group
   Row key    Column key      Timestamp        Cell value
   cutting    info:height     1273516197868    9ft
   cutting    info:state      1043871824184    CA
   tlipcon    info:height     1273878447049    5ft7
   tlipcon    info:state      1273616297446    CA

   roles Column Family / Locality Group
   Row key    Column key      Timestamp        Cell value
   cutting    roles:ASF       1273871823022    Director
   cutting    roles:Hadoop    1183746289103    Founder
   tlipcon    roles:Hadoop    1300062064923    PMC
   tlipcon    roles:Hadoop    1293388212294    Committer
   tlipcon    roles:Hive      1273616297446    Contributor

• Sorted on disk by row key, column key, descending timestamp
• Timestamps are milliseconds since the Unix epoch
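
The on-disk ordering (row key, column key, descending timestamp) can be reproduced with an ordinary sort. The sketch below uses the cells from the roles family on this slide; the helper names cell_sort_key and latest are illustrative only.

```python
# (row key, column key, timestamp in ms since the Unix epoch, value)
cells = [
    ("tlipcon", "roles:Hadoop", 1293388212294, "Committer"),
    ("cutting", "roles:Hadoop", 1183746289103, "Founder"),
    ("tlipcon", "roles:Hive",   1273616297446, "Contributor"),
    ("tlipcon", "roles:Hadoop", 1300062064923, "PMC"),
    ("cutting", "roles:ASF",    1273871823022, "Director"),
]


def cell_sort_key(cell):
    row, column, timestamp, _value = cell
    # Negating the timestamp puts the newest version of a cell first.
    return (row, column, -timestamp)


sorted_cells = sorted(cells, key=cell_sort_key)


def latest(sorted_cells, row, column):
    """Return the newest value: the first match in sorted order."""
    for r, c, _ts, value in sorted_cells:
        if (r, c) == (row, column):
            return value
    return None


values_in_order = [value for _r, _c, _ts, value in sorted_cells]
current_role = latest(sorted_cells, "tlipcon", "roles:Hadoop")
```

Sorting timestamps in descending order means a reader that wants only the current value can stop at the first matching cell.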
(image from Accumulo manual)
Accumulo/HBase Terminology
Accumulo                          HBase              Definition
Tablet                            Region             A partition of a table (e.g. email inboxes
                                                     starting with ‘a’-‘c’)
TabletServer                      RegionServer       A server in the cluster which hosts a number of
                                                     tablets/regions, providing read/write access
Log/WAL                           HLog/WAL           Write-ahead log – used for durably logging edits
Minor compaction                  Flush              Writing data from memory to disk
Major compaction                  Minor compaction   Merging several on-disk files into a larger one
Major compaction with all files   Major compaction   Merging all of the on-disk files into a larger one
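
To make the flush and compaction rows of the table concrete, here is a toy sketch. ToyTablet and its methods are hypothetical names; real tablets/regions write immutable sorted files to HDFS, which this example replaces with in-memory sorted lists.

```python
import heapq


class ToyTablet:
    """Toy tablet/region: an in-memory buffer plus a list of sorted 'files'."""

    def __init__(self):
        self.memstore = {}   # key -> (sequence number, value)
        self.files = []      # each file: sorted list of (key, seq, value)
        self._seq = 0

    def put(self, key, value):
        self._seq += 1
        self.memstore[key] = (self._seq, value)

    def flush(self):
        """Accumulo 'minor compaction' / HBase 'flush': memory -> one new file."""
        if self.memstore:
            self.files.append(
                sorted((k, s, v) for k, (s, v) in self.memstore.items()))
            self.memstore = {}

    def compact(self, count=None):
        """Merge the oldest `count` files into one (all files if count is None)."""
        count = len(self.files) if count is None else count
        chosen, rest = self.files[:count], self.files[count:]
        if not chosen:
            return
        newest = {}
        for key, seq, value in heapq.merge(*chosen):   # streaming sorted merge
            if key not in newest or seq > newest[key][0]:
                newest[key] = (seq, value)             # keep the latest write
        merged = sorted((k, s, v) for k, (s, v) in newest.items())
        self.files = [merged] + rest


tablet = ToyTablet()
tablet.put("a", "1"); tablet.flush()
tablet.put("a", "2"); tablet.put("b", "3"); tablet.flush()
tablet.put("c", "4"); tablet.flush()
files_after_flushes = len(tablet.files)

tablet.compact(2)   # merge several files: Accumulo "major", HBase "minor"
files_after_partial = len(tablet.files)

tablet.compact()    # merge all files: "major compaction" in HBase terms
final_file = tablet.files[0]
```

The sketch drops overwritten values during every merge for brevity; how many old versions the real systems retain is configurable and is not modeled here.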

That’s all the intro we have time for…

• Check out the excellent Accumulo manual at
  http://incubator.apache.org/accumulo
• And the HBase manual at
  http://hbase.apache.org/book.html
• Also some longer intro videos on Cloudera’s website,
  and an excellent O’Reilly book




Commonalities (the non-controversial stuff)

• Both systems scale well
   • Clusters with >1000 nodes, >1PB
   • Example HBase users: StumbleUpon, TrendMicro, Facebook,
     eBay, Flurry, ngmoco, Mozilla, Adobe, etc.
   • Example Accumulo users: ??????? (I don’t have clearance but
     I’m told they’re big and important)
• Both systems perform well
   • Depending on tuning, one might beat the other at any given
     benchmark, but overall results seem comparable
• Both open source with active development

Commonalities (the non-controversial stuff)

• Storage formats are very similar
   • Used to be the same, then diverged, then re-converged!
   • Multi-level BTrees, bloom filters, compression
   • Prefix compression currently missing in HBase, 95% complete
     for 0.94.0
• Caching code very similar
   • Accumulo uses an older version of HBase’s LRUBlockCache
   • HBase has some recent improvements (off-heap cache), but I
     imagine Accumulo will grab them soon enough.
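
Bloom filters, listed above among the shared storage-format features, let a reader skip files that cannot contain a requested key. The sketch below is a generic textbook Bloom filter, not the implementation used by either project; the bit count, number of hashes, and the use of SHA-256 are arbitrary choices for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


file_filter = BloomFilter()
for row_key in (b"cutting", b"tlipcon"):
    file_filter.add(row_key)

must_read_file = file_filter.might_contain(b"cutting")
```

A False answer means the file definitely lacks the key and can be skipped; a True answer only means the file has to be checked.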



General features

• Both have good MapReduce integration
• Both have a command-line shell
• Both have a pretty good test suite
   • Accumulo used to be ahead here, but we traded some
     ideas and now use similar testing strategies
• Both use ZooKeeper for fault-tolerant metadata storage,
  and support failover Masters




Now for the fun part… BigTable shootout 2012

• Warning: I am necessarily biased as an HBase
  committer.
• I will be comparing the very latest versions
   • HBase 0.92.0 (released only 2 days ago!)
   • Accumulo 1.4 (not yet released, due out mid Feb?)
• Please feel free to loudly disagree after the talk during
  the time allotted for questions – I am happy to be
  proven wrong! I’ll invite Aaron Cordova and John Vines
  up to help answer questions.


Differences – Active contributors and users




   [Slide graphic: logos of contributing organizations, not preserved]
   (I ran out of space)                      (plus various contractors thereof)


Differences – User Mailing list activity




   500-600 messages per month (peak 1088)      50-100 messages per month (peak 105)
                                               [but it’s new at Apache]



     Winner:
Differences – Access Control

• Accumulo has per-cell visibility labels as well as table
  ACLs
   • Each cell carries a visibility expression saying which users
     may see it, e.g. (TS|(SECRET&PROJECTX))
   • Users who don’t have access can’t tell the cell even exists
   • Very useful for classified information!
• HBase has column family ACLs but no built-in per-cell
  visibility support
   • Some early work to add visibility labels, but not done yet
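
A hedged sketch of how a visibility expression such as (TS|(SECRET&PROJECTX)) could be checked against a user's authorizations. This is a small recursive-descent evaluator written for illustration, not Accumulo's parser; the function names are hypothetical, and treating an empty expression as visible to everyone is an assumption of this sketch.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expression):
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if not match:
            raise ValueError(f"bad character at {pos} in {expression!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, authorizations):
    """True if a user holding `authorizations` satisfies `expression`."""
    tokens = tokenize(expression)
    if not tokens:
        return True   # assumption: an unlabeled cell is visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            right = parse_and()        # always parse, even if result is True
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            right = parse_term()
            result = result and right
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r}")
        return token in authorizations   # a plain label: does the user hold it?

    visible = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return visible


label = "(TS|(SECRET&PROJECTX))"
cleared = is_visible(label, {"SECRET", "PROJECTX"})
not_cleared = is_visible(label, {"SECRET"})
```

A scan would simply drop every cell for which is_visible returns False, which is why an unauthorized user cannot tell that the cell exists.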

   Winner:
Differences – Authentication

• Accumulo has a built-in user database
   • Users are authenticated by username/password
   • Passed in plaintext over the wire
• HBase optionally uses Kerberos
   • Central administration (eg via Active Directory)
   • Key-based secure credential exchange
   • Temporary delegation tokens are created for MR jobs, so even
     if a job’s data leaks, credentials are not compromised
   • Consistent with rest of Hadoop ecosystem

   Winner:
Differences – Locality Groups

• HBase has a 1:1 correspondence of Column Families
  and Locality Groups
   • Moving columns from one locality group to another after data
     has been inserted is impossible
• Accumulo has a proper distinction and allows online
  reassignment of column-to-locality-group mappings



Winner:


Differences – extensibility frameworks
• Accumulo has iterators
   • Allows custom processing to be inserted in the read path as
     well as into the table maintenance code. Provides neat
     features like automatic summary maintenance, for example.
• HBase has coprocessors
   • Much more general framework that also subsumes triggers,
     stored procedures, and cluster management hooks (e.g.
     Access Control is an HBase coprocessor).
   • Generality has its cost: very difficult to do some things that
     are simple with iterators
   • Some iterator use cases can be done with HBase filters
• I’ll call this one a tie
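
The iterator idea (custom processing stacked into the read path, for example to maintain summaries automatically) can be sketched with Python generators. This only illustrates the concept; the function names are hypothetical and this is not Accumulo's iterator interface.

```python
from itertools import groupby


def row_prefix_filter(sorted_cells, prefix):
    """Pass through only the cells whose row key starts with `prefix`."""
    for cell in sorted_cells:
        if cell[0].startswith(prefix):
            yield cell


def summing_iterator(sorted_cells):
    """Collapse all versions of the same (row, column) into one summed cell."""
    same_cell = lambda cell: (cell[0], cell[1])
    for (row, column), versions in groupby(sorted_cells, key=same_cell):
        yield row, column, sum(value for _r, _c, value in versions)


# Cells arrive in sorted order, as they would from the underlying files.
stored = [
    ("page/a", "hits", 3),
    ("page/a", "hits", 4),
    ("page/b", "hits", 1),
    ("user/x", "hits", 9),
]

# Iterators stack: each one consumes the sorted stream produced by the last.
page_hit_totals = list(summing_iterator(row_prefix_filter(stored, "page/")))
```

Because each stage only sees a sorted stream, the same stack could in principle run at scan time or while files are being rewritten during compaction, which is how a running total stays up to date without a separate job.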
Differences – Web UI and Monitoring




    Winner:
Differences – Write-ahead logging

• HBase uses HDFS files as a WAL
  • Takes advantage of HDFS performance improvements as they
    are developed
  • Same trusted replication and checksumming schemes as HDFS
• Accumulo has its own Logger implementation
  • Extra daemons to run
  • Does not leverage improvements in HDFS
  • Won’t re-replicate if loggers go down
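
Whichever component stores the log, the write-ahead principle is the same: record the edit durably first, apply it in memory second, and replay the log after a crash. A minimal single-machine sketch follows, using a local JSON-lines file where HBase would use an HDFS file and Accumulo its logger daemons; ToyStore and the file format are assumptions made for illustration.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy key-value store that logs every edit before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memstore = {}
        self._replay()   # recover any edits logged before a crash
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                edit = json.loads(line)
                self.memstore[edit["key"]] = edit["value"]

    def put(self, key, value):
        # 1. Append the edit to the log and force it to stable storage.
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        # 2. Only then apply it to the in-memory state.
        self.memstore[key] = value

    def close(self):
        self._log.close()


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")

store = ToyStore(log_path)
store.put("cutting", "Founder")
store.put("tlipcon", "PMC")
store.close()   # stand-in for a crash: the in-memory map is thrown away

recovered = ToyStore(log_path)
recovered_state = dict(recovered.memstore)
recovered.close()
```

The sketch has no replication at all; the slide's comparison is about where that replication comes from (HDFS versus dedicated loggers), which a single local file cannot show.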


  Winner:
Differences – Other features

• Accumulo has a nice mock Accumulo implementation
   • Nice for testing user software
• Accumulo supports isolated scans on super-wide rows
   • HBase supports wide rows but isolation properties are lost
• Accumulo supports tablet merging
   • If tablets get too small, they’ll merge with neighbors
• Accumulo supports table snapshotting/cloning
• Other sundry features: logical clocks, RPC tracing, RPC
  wire compatibility, and more.

Differences – Other features
• HBase has RPM and Debian packages as part of Apache
  BigTop
   • Integrated (and integration-tested) with Hive, Pig, and others
• HBase has commercial support available from Cloudera,
  as well as several vendors and other projects building
  on top (Lily, OMID, etc)
• HBase has first-class support for REST clients and thin
  Thrift clients
• HBase has inter-cluster wide-area replication
• HBase has significantly more advanced bloom filters
  and other such optimizations (thanks Facebook!)
Summary

• Neither system is better!
• One system may very well be better for your use case,
  or for the community you want to interact with
• Over time, the feature sets are converging
   • RFile vs HFile v2, Security, Caching, Compaction policies,
     Iterators/Coprocessors
• Now that both projects are in Apache, open dialogue,
  code sharing, and friendly competition will help make
  both projects better!


Thanks!

Aaron Cordova and John Vines
(Accumulo committers) will now join
me for some discussion / questions



          Email: todd@cloudera.com
          Twitter: @tlipcon


Editor's Notes

  • #3 Earlier, I said that HBase is a big sorted map. Here is an example of a table. The map key is (row key + column + timestamp). The value is the cell contents. The rows in the map are sorted by key. In this example, Row1 has 3 columns in the "info" column family. Row2 only has a single column. A column can also be empty. Each cell has a timestamp. By default, the timestamp is set to the current time (in milliseconds since the Unix epoch, January 1st 1970) when the cell is written. A client can specify a timestamp when inserting or retrieving data, and specify how many versions of each cell should be maintained. Data in HBase is untyped; everything is an array of bytes. Rows are sorted lexicographically. This order is maintained on disk, so Row1 and Row2 can be read together in just one disk seek.
  • #4 Given that HBase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell, which can be useful for maintaining high-performance counters or statistics. Lastly, it’s possible to write MapReduce jobs that analyze the data in HBase.