HBase Client API (for webapps?)
Nick Dimiduk
Seattle Scalability Meetup
2013-03-27




What are my choices?
   switch (technology) {

       case ‘    ’:
         ...

       case ‘    ’:
         ...

       case ‘    ’:
         ...
   }

Apache HBase




Java client Interfaces
•   Configuration holds details on where to find the cluster, plus tunable
    settings. Roughly equivalent to a JDBC connection string.

•   HConnection represents connections to the cluster.

•   HBaseAdmin handles DDL operations (create, list, drop, alter, &c.)

•   HTablePool is a connection pool for table handles.

•   HTable (HTableInterface) is a handle on a single HBase table.
    Send "commands" to the table (Put, Get, Scan, Delete, Increment).

Java client Example
public static final byte[] TABLE_NAME = Bytes.toBytes("twits");
public static final byte[] TWITS_FAM = Bytes.toBytes("twits");

public static final byte[] USER_COL = Bytes.toBytes("user");
public static final byte[] TWIT_COL = Bytes.toBytes("twit");

private HTablePool pool = new HTablePool();




 https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L23-L30

Java client Example
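private static class Twit {

   // Build a Twit from a Result: user and text come from the twits family;
   // the timestamp bytes sit in the row key, after an MD5-length prefix.
   private Twit(Result r) {
     this(
           r.getColumnLatest(TWITS_FAM, USER_COL).getValue(),
           Arrays.copyOfRange(r.getRow(), Md5Utils.MD5_LENGTH,
             Md5Utils.MD5_LENGTH + longLength),
           r.getColumnLatest(TWITS_FAM, TWIT_COL).getValue());
   }

   // Decode the raw bytes; the stored timestamp is negated, so undo that here.
   private Twit(byte[] user, byte[] dt, byte[] text) {
     this(
           Bytes.toString(user),
           new DateTime(-1 * Bytes.toLong(dt)),
           Bytes.toString(text));
   }

   // ... (rest of the class is not shown on the slide)
}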
https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L129-L143
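
The constructor above slices a fixed-width timestamp out of the row key and negates it. The slides do not show how the key is built, so the following is only a sketch under an assumed layout: MD5(user) followed by the negated timestamp as an 8-byte big-endian long (the encoding Bytes.toBytes(long) uses). The names mk_row_key and parse_timestamp are hypothetical.

```python
import hashlib
import struct

MD5_LENGTH = 16   # bytes in an MD5 digest
LONG_LENGTH = 8   # bytes in a big-endian signed 64-bit long


def mk_row_key(user: str, millis: int) -> bytes:
    """Assumed layout: MD5(user) followed by the negated timestamp."""
    user_hash = hashlib.md5(user.encode("utf-8")).digest()
    # Negating here mirrors the -1 * Bytes.toLong(dt) decode on the slide.
    return user_hash + struct.pack(">q", -millis)


def parse_timestamp(row_key: bytes) -> int:
    """Slice the 8 timestamp bytes out of the key and undo the negation."""
    raw = row_key[MD5_LENGTH:MD5_LENGTH + LONG_LENGTH]
    return -1 * struct.unpack(">q", raw)[0]


key = mk_row_key("TheRealMT", 1338701491422)
print(len(key))              # 24
print(parse_timestamp(key))  # 1338701491422
```

With this layout, a later timestamp produces a key that compares lower byte-for-byte, so under a lexicographic sort a user's newest entries would come first. That is a plausible reason for the negation, not something the slides state.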

Java client Example

       private static Get mkGet(String user, DateTime dt) {
         Get g = new Get(mkRowKey(user, dt));
         g.addColumn(TWITS_FAM, USER_COL);
         g.addColumn(TWITS_FAM, TWIT_COL);
         return g;
       }




https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L60-L65

Ruby, Python client Interface

JRuby, Jython  :'(

Thrift client Interface


1. Generate bindings

2. Run a “Gateway” between clients and cluster

3. ... profit? write code!

Sidebar: Architecture Recap

[Diagram: HBase Clients -> HBase Cluster]

Thrift Architecture

[Diagram: Thrift Clients -> Thrift Gateway -> HBase Cluster]

Thrift client Interface


•   Thrift gateway exposes a client to RegionServers

•   stateless :D

•   ... except for scanners :'(




Thrift client Example

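from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase  # generated bindings; module path depends on your setup

# host and port point at the Thrift gateway, not at the cluster itself
transport = TSocket.TSocket(host, port)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()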




      https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L17-L21

Thrift client Example
# Excerpt from the body of a generator function (hence the yield).
columns = ['info:user','info:name','info:email']
scanner = client.scannerOpen('users', '', columns)
row = client.scannerGet(scanner)
while row:
    yield user_from_row(row[0])
    row = client.scannerGet(scanner)
client.scannerClose(scanner)




     https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L33-L39
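
Because scanners are the one stateful part of the gateway, it may be worth making sure they are closed even when the caller stops iterating early. The sketch below wraps the same three calls in try/finally. FakeClient and Row are hypothetical in-memory stand-ins so the example runs without a gateway; they are not part of the generated bindings.

```python
from collections import namedtuple

# Hypothetical stand-in for the row object the generated bindings return.
Row = namedtuple("Row", ["row", "columns"])


class FakeClient:
    """Hypothetical stub offering the three scanner calls used on the slide."""

    def __init__(self, rows):
        self._rows = rows
        self._cursors = {}
        self._next_id = 0
        self.open_scanners = set()

    def scannerOpen(self, table, start_row, columns):
        scanner_id = self._next_id
        self._next_id += 1
        self._cursors[scanner_id] = iter(self._rows)
        self.open_scanners.add(scanner_id)
        return scanner_id

    def scannerGet(self, scanner_id):
        # One-element list per call, empty list once the scan is exhausted.
        row = next(self._cursors[scanner_id], None)
        return [] if row is None else [row]

    def scannerClose(self, scanner_id):
        self.open_scanners.discard(scanner_id)


def scan_rows(client, table, columns):
    """Yield rows one at a time and always release the scanner."""
    scanner = client.scannerOpen(table, "", columns)
    try:
        row = client.scannerGet(scanner)
        while row:
            yield row[0]
            row = client.scannerGet(scanner)
    finally:
        client.scannerClose(scanner)


client = FakeClient([Row("a", {}), Row("b", {})])
keys = [r.row for r in scan_rows(client, "users", ["info:user"])]
print(keys)                  # ['a', 'b']
print(client.open_scanners)  # set()
```

The finally clause also runs when the generator is closed before it is exhausted, which is the case a plain trailing scannerClose call would miss.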

Thrift client Example

def user_from_row(row):
    user = {}
    for col,cell in row.columns.items():
        user[col[5:]] = cell.value
    return "<User: {user}, {name}, {email}>".format(**user)




         https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L26-L30
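
To see what user_from_row does with the column names, here is a small usage sketch. Row and Cell are hypothetical stand-ins for the objects the generated bindings return, and the sample values are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-ins for the row and cell objects from the bindings.
Cell = namedtuple("Cell", ["value"])
Row = namedtuple("Row", ["row", "columns"])


def user_from_row(row):
    user = {}
    for col, cell in row.columns.items():
        # col looks like "info:email"; col[5:] drops the 5-character "info:" prefix.
        user[col[5:]] = cell.value
    return "<User: {user}, {name}, {email}>".format(**user)


row = Row("TheRealMT", {
    "info:user": Cell("TheRealMT"),
    "info:name": Cell("Mark Twain"),
    "info:email": Cell("samuel@clemens.org"),
})
print(user_from_row(row))
# <User: TheRealMT, Mark Twain, samuel@clemens.org>
```

Note that the fixed slice only works for a family name of exactly four characters; splitting on the first ":" would be the more general choice.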

REST client Interface


1. Stand up a "REST Gateway" between your application and the cluster

2. HTTP verbs translate (roughly) into table commands

3. Decent support for basic DDL and HTable operations
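
As a rough illustration of how HTTP verbs line up with table commands, the sketch below builds requests without sending them. The list and get paths follow the URL shapes shown in this deck; the put and delete shapes are assumptions to check against the REST gateway documentation for your HBase version. GATEWAY and rest_request are hypothetical names.

```python
from urllib.parse import quote
from urllib.request import Request

GATEWAY = "http://localhost:8080"  # hypothetical gateway address


def rest_request(command, table=None, row=None, column=None, body=None):
    """Build (but do not send) the HTTP request for a table command."""
    if command == "list":      # list tables
        method, path = "GET", "/"
    elif command == "get":     # read a row
        method, path = "GET", f"/{quote(table)}/{quote(row)}"
    elif command == "put":     # write one cell (assumed path shape)
        method, path = "PUT", f"/{quote(table)}/{quote(row)}/{quote(column)}"
    elif command == "delete":  # delete a row (assumed path shape)
        method, path = "DELETE", f"/{quote(table)}/{quote(row)}"
    else:
        raise ValueError(f"unknown command: {command}")
    return Request(GATEWAY + path, data=body, method=method,
                   headers={"Accept": "application/json"})


req = rest_request("get", table="users", row="TheRealMT")
print(req.get_method(), req.full_url)
# GET http://localhost:8080/users/TheRealMT
```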




REST Architecture

[Diagram: REST Clients -> REST Gateway -> HBase Cluster]

REST client Interface


•   REST gateway exposes a client to RegionServers

•   stateless :D

•   ... except for scanners :'(




REST client Example

$ curl -H "Accept: application/json" http://host:port/
{
  "table": [ {
               "name": "followers"
             }, {
               "name": "twits"
             }, {
               "name": "users"
             }
           ]
}




REST client Example
$ curl -H ... http://host:port/table/row[/family:qualifier]
{
    "Row": [
        {
            "key": "VGhlUmVhbE1U",
            "Cell": [
                {
                    "$": "c2FtdWVsQGNsZW1lbnMub3Jn",
                    "column": "aW5mbzplbWFpbA==",
                    "timestamp": 1338701491422
                },
                {
                    "$": "TWFyayBUd2Fpbg==",
                    "column": "aW5mbzpuYW1l",
                    "timestamp": 1338701491422
                }
            ]
        }
    ]
}
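
The keys, column names, and values in this response are base64-encoded, which is part of what makes the REST API awkward to consume. A minimal sketch of decoding it with the standard library, using the payload from the slide (the helper names b64 and decode_rows are hypothetical):

```python
import base64
import json

response = """
{"Row": [{"key": "VGhlUmVhbE1U",
          "Cell": [{"$": "c2FtdWVsQGNsZW1lbnMub3Jn",
                    "column": "aW5mbzplbWFpbA==",
                    "timestamp": 1338701491422},
                   {"$": "TWFyayBUd2Fpbg==",
                    "column": "aW5mbzpuYW1l",
                    "timestamp": 1338701491422}]}]}
"""


def b64(text):
    """Decode one base64 field, assuming the bytes are UTF-8 text."""
    return base64.b64decode(text).decode("utf-8")


def decode_rows(payload):
    """Map each decoded row key to a dict of decoded column -> value."""
    rows = {}
    for row in json.loads(payload)["Row"]:
        cells = {b64(c["column"]): b64(c["$"]) for c in row["Cell"]}
        rows[b64(row["key"])] = cells
    return rows


print(decode_rows(response))
# {'TheRealMT': {'info:email': 'samuel@clemens.org', 'info:name': 'Mark Twain'}}
```

Treating every value as UTF-8 is an assumption that holds for this sample; binary cell values would need to stay as bytes.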

REST client Example

<Rows>
  <Row key="VGhlUmVhbE1U">
    <Cells>
       <Cell column="aW5mbzplbWFpbA==" timestamp="1338701491422">
         c2FtdWVsQGNsZW1lbnMub3Jn
       </Cell>
       <Cell ...>
       ...
    </Cells>
  </Row>
</Rows>




Beyond Apache




asynchbase
•   Asynchronous non-blocking interface.

•   Inspired by Twisted Python.

•   Partial implementation of HTableInterface.

•   HBaseClient provides entry-point to data.



                                   https://github.com/OpenTSDB/asynchbase
                  http://tsunanet.net/~tsuna/asynchbase/api/org/hbase/async/HBaseClient.html


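asynchbase

[Diagram: a callback chain as a state machine.
 input => [this state] => [next state] (output), or => [error state] (Exception).
 Example: a Boolean Put response passes through "Interpret response" and becomes
 either an UpdateResult object or an UpdateFailed Exception.]
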
asynchbase Example
       final Scanner scanner = client.newScanner(TABLE_NAME);
       scanner.setFamily(INFO_FAM);
       scanner.setQualifier(PASSWORD_COL);

       ArrayList<ArrayList<KeyValue>> rows = null;
       ArrayList<Deferred<Boolean>> workers = new ArrayList<Deferred<Boolean>>();
       while ((rows = scanner.nextRows(1).joinUninterruptibly()) != null) {
         for (ArrayList<KeyValue> row : rows) {
           KeyValue kv = row.get(0);
           byte[] expected = kv.value();
           String userId = new String(kv.key());
           PutRequest put = new PutRequest(
               TABLE_NAME, kv.key(), kv.family(),
               kv.qualifier(), mkNewPassword(expected));
           Deferred<Boolean> d = client.compareAndSet(put, expected)
             .addCallback(new InterpretResponse(userId))
             .addCallbacks(new ResultToMessage(), new FailureToMessage())
             .addCallback(new SendMessage());
           workers.add(d);
         }
       }

https://github.com/hbaseinaction/twitbase-async/blob/master/src/main/java/HBaseIA/TwitBase/AsyncUsersTool.java#L151-L173
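
The callback chain above can be easier to follow with a toy model. The sketch below is a deliberately simplified, synchronous stand-in written for illustration; it is not the asynchbase or Twisted implementation, and every name in it is hypothetical. Each step receives the previous result: a normal value goes to the callback, an exception goes to the errback.

```python
class Deferred:
    """Toy, synchronous stand-in for a deferred result (illustration only)."""

    def __init__(self):
        self._chain = []  # ordered (callback, errback) pairs

    def add_callbacks(self, callback, errback):
        self._chain.append((callback, errback))
        return self

    def add_callback(self, callback):
        # With no errback of its own, this step lets errors pass through.
        return self.add_callbacks(callback, lambda exc: exc)

    def fire(self, result):
        for callback, errback in self._chain:
            handler = errback if isinstance(result, Exception) else callback
            try:
                result = handler(result)
            except Exception as exc:  # a raising step moves to the error state
                result = exc
        return result


class UpdateFailed(Exception):
    pass


def interpret_response(user_id):
    def interpret(succeeded):
        if not succeeded:
            raise UpdateFailed(user_id)
        return user_id
    return interpret


def result_to_message(user_id):
    return "password updated for " + user_id


def failure_to_message(exc):
    return "update failed for " + str(exc)


def run(succeeded):
    outbox = []

    def send_message(message):
        outbox.append(message)
        return message

    (Deferred()
        .add_callback(interpret_response("TheRealMT"))
        .add_callbacks(result_to_message, failure_to_message)
        .add_callback(send_message)
        .fire(succeeded))
    return outbox


print(run(True))   # ['password updated for TheRealMT']
print(run(False))  # ['update failed for TheRealMT']
```

The errback returning a plain string is what moves the chain back from the error state to the normal path, so the final step sends a message in both cases.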

Others

A spectrum from "reduce day-to-day developer pain" to "full-blown schema
management", roughly in that order:

•   [Orderly]             https://github.com/ndimiduk/orderly
•   Spring-Data Hadoop    http://www.springsource.org/spring-data/
•   Phoenix               https://github.com/forcedotcom/phoenix
•   Kiji.org              http://www.kiji.org/

Apache Futures


•   Protobuf wire messages (0.96)

•   C client (TBD, HBASE-1015)

•   HBase Types (TBD, HBASE-8089)




So, Webapps?




http://www.amazon.com/Back-Point-Rapiers/dp/B0000271GC

Software Architecture


•   Isolate DAO from app logic, separation of concerns, &c.

•   Separate environment configs from code.

•   Watch out for resource contention.
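
One way the first two points might look in practice is sketched below: application logic depends only on a small DAO interface, and connection details come from the environment. All names here (UsersDAO, InMemoryUsersDAO, the HBASE_GATEWAY_* variables) are hypothetical, and the in-memory backend stands in for whichever HBase client would sit behind the interface.

```python
import os
from abc import ABC, abstractmethod


class UsersDAO(ABC):
    """The only surface the application logic is allowed to see."""

    @abstractmethod
    def get_user(self, user_id): ...

    @abstractmethod
    def put_user(self, user_id, name, email): ...


class InMemoryUsersDAO(UsersDAO):
    """Stand-in backend; a Thrift or REST backed DAO would share the interface."""

    def __init__(self):
        self._rows = {}

    def get_user(self, user_id):
        return self._rows.get(user_id)

    def put_user(self, user_id, name, email):
        self._rows[user_id] = {"name": name, "email": email}


def load_config(env=os.environ):
    """Read gateway settings from the environment rather than from code."""
    return {
        "gateway_host": env.get("HBASE_GATEWAY_HOST", "localhost"),
        "gateway_port": int(env.get("HBASE_GATEWAY_PORT", "9090")),
    }


def greeting(dao, user_id):
    """Application logic: knows about users, not about HBase."""
    user = dao.get_user(user_id)
    return f"Hello, {user['name']}!" if user else "Hello, stranger!"


dao = InMemoryUsersDAO()
dao.put_user("TheRealMT", "Mark Twain", "samuel@clemens.org")
print(greeting(dao, "TheRealMT"))  # Hello, Mark Twain!
```

Swapping the backend or pointing at a different gateway then touches neither greeting nor any other application code.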




Deployment Architecture


•   Cache everywhere.

•   Know your component layers.
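
As one small instance of "cache everywhere", the sketch below puts a read-through cache with a time-to-live in front of a lookup function. ReadThroughCache and slow_lookup are hypothetical names; in a real deployment the loader would be a DAO call, and a shared cache tier might replace the in-process dict.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory; reload after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the loader is not called
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_lookup(user_id):
    # Stand-in for a round trip through a gateway to the cluster.
    return {"user": user_id}


cache = ReadThroughCache(slow_lookup)
cache.get("TheRealMT")
cache.get("TheRealMT")
print(cache.misses)  # 1
```

The injectable clock is there only to make expiry easy to test; the trade-off of any TTL cache is that reads can be stale for up to ttl_seconds.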




HBase Warts

•   Know thy (HBase) version: 0.{92,94,96}!

•   Long-running client bug (HBASE-4805).

•   Gateway APIs are only as up to date as the people before you needed them to be.

•   REST API particularly unpleasant for “Web2.0” folk.



Thanks!

Nick Dimiduk
github.com/ndimiduk
@xefyr
n10k.com

[Book cover: Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack, Manning]
hbaseinaction.com
