HBaseCon, May 2012

HBase Coprocessors
Lars George | Solutions Architect
Revision History

Version      Revised By                                    Description of Revision
Version 1    Lars George                                   Initial version




2                     ©2011 Cloudera, Inc. All Rights Reserved. Confidential.
                     Reproduction or redistribution without written permission is
                                             prohibited.
Overview

•  Coprocessors were added to Bigtable
  –  Mentioned during LADIS 2009 talk
•  Runs user code within each region of a
   table
  –  Code splits and moves with its region
•  Defines high level call interface for clients
•  Calls addressed to rows or ranges of rows
•  Implicit automatic scaling, load balancing,
   and request routing
Example Use Cases

•  Bigtable uses Coprocessors
  –  Scalable metadata management
  –  Distributed language model for machine
     translation
  –  Distributed query processing for full-text index
  –  Regular expression search in code repository
•  MapReduce jobs over HBase are often map-
   only jobs
  –  Row keys are already sorted and distinct
  ➜ Could be replaced by Coprocessors
HBase Coprocessors
•  Inspired by Google’s Coprocessors
   –  Not much information available, but general idea is
      understood
•  Define various types of server-side code
   extensions
   –  Associated with table using a table property
   –  Attribute is a path to JAR file
   –  JAR is loaded when region is opened
   –  Blends new functionality with existing
•  Can be chained with Priorities and Load Order

➜ Allows for dynamic RPC extensions
Coprocessor Classes and Interfaces

•  The Coprocessor Interface
  –  All user code must implement this interface
•  The CoprocessorEnvironment Interface
  –  Retains state across invocations
  –  Predefined classes
•  The CoprocessorHost Interface
  –  Ties state and user code together
  –  Predefined classes
Coprocessor Priority

•  System or User


/** Highest installation priority */
static final int PRIORITY_HIGHEST = 0;
/** High (system) installation priority */
static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;
/** Default installation prio for user coprocessors */
static final int PRIORITY_USER = Integer.MAX_VALUE / 2;
/** Lowest installation priority */
static final int PRIORITY_LOWEST = Integer.MAX_VALUE;
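Lower values run earlier, which is how system coprocessors (such as the security extensions) get to mediate before user code; note that PRIORITY_USER works out to 1073741823, the number that appears in a later example output. A plain-Java sketch of the ordering (the PriorityOrder class is invented for illustration, no HBase dependency):

```java
// Plain-Java sketch, no HBase dependency: coprocessors are invoked in
// ascending numeric priority, so a lower value runs earlier.
public class PriorityOrder {
    static final int PRIORITY_HIGHEST = 0;
    static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;
    static final int PRIORITY_USER = Integer.MAX_VALUE / 2;
    static final int PRIORITY_LOWEST = Integer.MAX_VALUE;

    /** True if a coprocessor at priority a runs before one at priority b. */
    static boolean runsBefore(int a, int b) {
        return a < b;
    }

    public static void main(String[] args) {
        // System coprocessors (e.g. the security extensions) run before
        // user coprocessors, letting them veto actions first.
        System.out.println(runsBefore(PRIORITY_SYSTEM, PRIORITY_USER)); // true
        System.out.println(PRIORITY_USER);                              // 1073741823
    }
}
```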
Coprocessor Environment

•  Available Methods
Coprocessor Host

•  Maintains all Coprocessor instances and
   their environments (state)
•  Concrete Classes
  –  MasterCoprocessorHost
  –  RegionCoprocessorHost
  –  WALCoprocessorHost
•  Subclasses provide access to specialized
   Environment implementations
Control Flow
Coprocessor Interface

•  Base for all other types of Coprocessors
•  start() and stop() methods for lifecycle
   management
•  State as defined in the interface:
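The listing that followed on the slide is the lifecycle state enum; in the 0.92-era Coprocessor interface it reads approximately as follows (reproduced from memory as a standalone enum, so treat the exact constants as an approximation):

```java
// Lifecycle states as defined (approximately) in the 0.92-era
// org.apache.hadoop.hbase.Coprocessor interface, shown here as a
// standalone enum for illustration only.
public enum CoprocessorState {
    UNINSTALLED, INSTALLED, STARTING, ACTIVE, STOPPING, STOPPED
}
```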
Observer Classes

•  Comparable to database triggers
  –  Callback functions/hooks for every explicit API
     method, but also all important internal calls
•  Concrete Implementations
  –  MasterObserver
     •  Hooks into HMaster API
  –  RegionObserver
     •  Hooks into Region related operations
  –  WALObserver
     •  Hooks into write-ahead log operations
Region Observers

•  Can mediate (veto) actions
  –  Used by the security policy extensions
  –  Priority allows mediators to run first
•  Hooks into all CRUD+S API calls and more
  –  get(), put(), delete(), scan(), increment(),…
  –  checkAndPut(), checkAndDelete(),…
  –  flush(), compact(), split(),…
•  Pre/Post Hooks for every call
•  Can be used to build secondary indexes,
   filters
Endpoint Classes

•  Define a dynamic RPC protocol, used
   between client and region server
•  Executes arbitrary code, loaded in region
   server
  –  Future development will add code weaving/
     inspection to deny any malicious code
•  Steps to add your own methods
  –  Define and implement your own protocol
  –  Implement endpoint coprocessor
  –  Call HTable’s coprocessorExec() or
     coprocessorProxy()
Coprocessor Loading

•  There are two ways: dynamic or static
  –  Static: use configuration files and table schema
  –  Dynamic: not available (yet)
•  For static loading from configuration:
  –  Order is important (defines the execution order)
  –  Special property key for each host type
  –  Region related classes are loaded for all regions
     and tables
  –  Priority is always System
  –  JAR must be on class path
Loading from Configuration

•  Example:
  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>coprocessor.RegionObserverExample,
      coprocessor.AnotherCoprocessor</value>
  </property>

  <property>
    <name>hbase.coprocessor.master.classes</name>
    <value>coprocessor.MasterObserverExample</value>
  </property>

  <property>
    <name>hbase.coprocessor.wal.classes</name>
    <value>coprocessor.WALObserverExample,
      bar.foo.MyWALObserver</value>
  </property>
Coprocessor Loading (cont.)

•  For static loading from table schema:
  –  Definition per table
  –  For all regions of the table
  –  Only region related classes, not WAL or Master
  –  Added to the HTableDescriptor when the table is
     created or altered
  –  Allows setting the priority and JAR path
  COPROCESSOR$<num> ➜
      <path-to-jar>|<classname>|<priority>
Loading from Table Schema

•  Example:

'COPROCESSOR$1' =>
  'hdfs://localhost:8020/users/leon/test.jar|
   coprocessor.Test|10'

'COPROCESSOR$2' =>
  '/Users/laura/test2.jar|
   coprocessor.AnotherTest|1000'
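The attribute values above follow the <path-to-jar>|<classname>|<priority> format from the previous slide. As a plain-Java sketch (no HBase dependency; the CoprocessorSpec name is invented here, and HBase's own parsing in its host classes may differ in detail), parsing it looks like:

```java
// Hypothetical helper that models the "<path-to-jar>|<classname>|<priority>"
// table-attribute format; illustration only, not HBase code.
public class CoprocessorSpec {
    final String jarPath;
    final String className;
    final int priority;

    CoprocessorSpec(String jarPath, String className, int priority) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
    }

    static CoprocessorSpec parse(String value) {
        String[] parts = value.split("\\|");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected path|class|priority: " + value);
        }
        return new CoprocessorSpec(parts[0].trim(), parts[1].trim(),
            Integer.parseInt(parts[2].trim()));
    }

    public static void main(String[] args) {
        CoprocessorSpec spec = parse(
            "hdfs://localhost:8020/users/leon/test.jar|coprocessor.Test|10");
        System.out.println(spec.className + " @ " + spec.priority);
    }
}
```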
Example: Add Coprocessor
public static void main(String[] args) throws IOException {
  Configuration conf = HBaseConfiguration.create();
  FileSystem fs = FileSystem.get(conf);
  Path path = new Path(fs.getUri() + Path.SEPARATOR +
    "test.jar");
  HTableDescriptor htd = new HTableDescriptor("testtable");
  htd.addFamily(new HColumnDescriptor("colfam1"));
  htd.setValue("COPROCESSOR$1", path.toString() +
    "|" + RegionObserverExample.class.getCanonicalName() +
    "|" + Coprocessor.PRIORITY_USER);
  HBaseAdmin admin = new HBaseAdmin(conf);
  admin.createTable(htd);
  System.out.println(admin.getTableDescriptor(
    Bytes.toBytes("testtable")));
}
Example Output
{NAME => 'testtable', COPROCESSOR$1 =>
'file:/test.jar|coprocessor.RegionObserverExample|
1073741823', FAMILIES => [{NAME => 'colfam1',
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}]}
Region Observers

•  Handles all region related events
•  Hooks for two classes of operations:
  –  Lifecycle changes
  –  Client API Calls
•  All client API calls have a pre/post hook
  –  Can be used to grant access on preGet()
  –  Can be used to update secondary indexes on
     postPut()
Handling Region Lifecycle Events




•  Hook into pending open, open, and pending
   close state changes
•  Called implicitly by the framework
  –  preOpen(), postOpen(),…
•  Used to piggyback on or fail the process, e.g.
  –  Cache warm up after a region opens
  –  Suppress region splitting, compactions, flushes
Region Environment
Special Hook Parameter
public interface RegionObserver extends Coprocessor {

  /**
   * Called before the region is reported as open to the master.
   * @param c the environment provided by the region server
   */
  void preOpen(final
    ObserverContext<RegionCoprocessorEnvironment> c);

  /**
   * Called after the region is reported as open to the master.
   * @param c the environment provided by the region server
   */
  void postOpen(final
    ObserverContext<RegionCoprocessorEnvironment> c);

  // ... many more hooks ...
}
ObserverContext
Chain of Command

•  The complete() and bypass() methods
   allow changing the processing chain
  –  complete() ends the chain at the current
     coprocessor
  –  bypass() completes the pre/post chain but
     uses the last value returned by the
     coprocessors, possibly not calling the actual
     API method (for pre-hooks)
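These chain semantics can be modeled in a few lines of plain Java. This is an illustration of the rules above, not HBase code; all names (HookChain, Ctx, PreHook) are invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Model of the observer chain: complete() stops invoking further
// coprocessors, while bypass() lets the chain continue but suppresses
// the underlying API method (for pre-hooks). Invented names throughout.
public class HookChain {
    static class Ctx {
        boolean complete;  // stop calling further coprocessors
        boolean bypass;    // skip the underlying API method
        void complete() { complete = true; }
        void bypass() { bypass = true; }
    }

    interface PreHook { void run(Ctx ctx); }

    /** Runs pre-hooks in priority order; returns true if the actual
     *  operation should still execute afterwards. */
    static boolean runPreHooks(List<PreHook> hooks) {
        Ctx ctx = new Ctx();
        for (PreHook h : hooks) {
            h.run(ctx);
            if (ctx.complete) break;   // complete(): end the chain here
        }
        return !ctx.bypass;            // bypass(): suppress the API call
    }

    public static void main(String[] args) {
        List<PreHook> hooks = new ArrayList<>();
        hooks.add(Ctx::bypass);  // e.g. veto a split, like preSplit above
        hooks.add(ctx -> { /* still runs; only complete() ends the chain */ });
        System.out.println(runPreHooks(hooks)); // false -> API call skipped
    }
}
```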
Example: Pre-Hook Complete



@Override
public void preSplit(ObserverContext
       <RegionCoprocessorEnvironment> e) {
   e.complete();
}
Master Observer

•  Handles all HMaster related events
  –  DDL type calls, e.g. create table, add column
  –  Region management calls, e.g. move, assign
•  Pre/post hooks with Context
•  Specialized environment provided
Master Environment
Master Services (cont.)

•  Very powerful features
  –  Access the AssignmentManager to modify
     plans
  –  Access the MasterFileSystem to create or
     access resources on HDFS
  –  Access the ServerManager to get the list of
     known servers
  –  Use the ExecutorService to run system-wide
     background processes
•  Be careful (for now)!
Example: Master Post Hook
public class MasterObserverExample
  extends BaseMasterObserver {
  @Override public void postCreateTable(
     ObserverContext<MasterCoprocessorEnvironment> env,
     HRegionInfo[] regions, boolean sync)
     throws IOException {
     String tableName =
       regions[0].getTableDesc().getNameAsString();
     MasterServices services =
       env.getEnvironment().getMasterServices();
     MasterFileSystem masterFileSystem =
       services.getMasterFileSystem();
     FileSystem fileSystem = masterFileSystem.getFileSystem();
     Path blobPath = new Path(tableName + "-blobs");
     fileSystem.mkdirs(blobPath);
  }
}
Example Output

 hbase(main):001:0> create 'testtable', 'colfam1'
 0 row(s) in 0.4300 seconds

 $ bin/hadoop dfs -ls
 Found 1 items
 drwxr-xr-x - larsgeorge supergroup 0 ... /user/larsgeorge/testtable-blobs
Endpoints

•  Dynamic RPC extends server-side
   functionality
  –  Useful for MapReduce like implementations
  –  Handles the Map part server-side, Reduce needs
     to be done client side
•  Based on CoprocessorProtocol interface
•  Routing to regions is based on either single
   row keys, or row key ranges
  –  The call is sent whether or not the row exists,
     since region start and end keys are coarse grained
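The routing rule can be sketched as a floor lookup over region start keys. Plain Java with string keys instead of byte[] and invented names (RegionRouter, locate); a model of the idea rather than the actual HBase client code:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Model of endpoint routing: a row key maps to the region whose start
// key is the greatest one <= the row key. The call is sent there whether
// or not the row actually exists.
public class RegionRouter {
    // start key -> region name ("" stands for the first region's open start)
    final NavigableMap<String, String> regions = new TreeMap<>();

    void addRegion(String startKey, String name) {
        regions.put(startKey, name);
    }

    String locate(String rowKey) {
        // floorEntry: greatest start key <= rowKey
        return regions.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionRouter r = new RegionRouter();
        r.addRegion("", "testtable,,...");        // first region
        r.addRegion("row3", "testtable,row3,..."); // second region
        System.out.println(r.locate("row1")); // routed to the first region
        System.out.println(r.locate("row4")); // routed to the second region
    }
}
```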
Custom Endpoint Implementation

•  Involves two steps:
  –  Extend the CoprocessorProtocol interface
     •  Defines the actual protocol
  –  Extend the BaseEndpointCoprocessor
     •  Provides the server-side code and the dynamic
        RPC method
Example: Row Count Protocol

public interface RowCountProtocol
  extends CoprocessorProtocol {
  long getRowCount()
    throws IOException;
  long getRowCount(Filter filter)
    throws IOException;
  long getKeyValueCount()
    throws IOException;
}
Example: Endpoint for Row Count
public class RowCountEndpoint
  extends BaseEndpointCoprocessor
  implements RowCountProtocol {

  private long getCount(Filter filter,
    boolean countKeyValues) throws IOException {
    Scan scan = new Scan();
    scan.setMaxVersions(1);
    if (filter != null) {
      scan.setFilter(filter);
    }
Example: Endpoint for Row Count
    RegionCoprocessorEnvironment environment =
      (RegionCoprocessorEnvironment)
      getEnvironment();
    // use an internal scanner to perform
    // scanning.
    InternalScanner scanner =
      environment.getRegion().getScanner(scan);
    int result = 0;
Example: Endpoint for Row Count
    try {
      List<KeyValue> curVals =
        new ArrayList<KeyValue>();
      boolean done = false;
      do {
        curVals.clear();
        done = scanner.next(curVals);
        result += countKeyValues ? curVals.size() : 1;
      } while (done);
    } finally {
      scanner.close();
    }
    return result;
  }
Example: Endpoint for Row Count
  @Override
  public long getRowCount() throws IOException {
    return getRowCount(new FirstKeyOnlyFilter());
  }

  @Override
  public long getRowCount(Filter filter) throws IOException {
    return getCount(filter, false);
  }

  @Override
  public long getKeyValueCount() throws IOException {
    return getCount(null, true);
  }
}
Endpoint Invocation

•  There are two ways to invoke the call
  –  By Proxy, using HTable.coprocessorProxy()
     •  Uses a delayed model, i.e. the call is sent when the
        proxied method is invoked
  –  By Exec, using HTable.coprocessorExec()
     •  The call is sent in parallel to all regions and the results
        are collected immediately
•  The Batch.Call class is used by
   coprocessorExec() to wrap the calls per
   region
•  The optional Batch.Callback can be used to
   react upon completion of the remote call
Exec vs. Proxy
Example: Invocation by Exec

public static void main(String[] args) throws IOException {
  Configuration conf = HBaseConfiguration.create();
  HTable table = new HTable(conf, "testtable");
  try {
    Map<byte[], Long> results =
       table.coprocessorExec(RowCountProtocol.class, null, null,
       new Batch.Call<RowCountProtocol, Long>() {
         @Override
         public Long call(RowCountProtocol counter)
         throws IOException {
           return counter.getRowCount();
         }
       });
Example: Invocation by Exec
    long total = 0;
    for (Map.Entry<byte[], Long> entry :
         results.entrySet()) {
      total += entry.getValue().longValue();
      System.out.println("Region: " +
        Bytes.toString(entry.getKey()) +
        ", Count: " + entry.getValue());
    }
    System.out.println("Total Count: " + total);
  } catch (Throwable throwable) {
    throwable.printStackTrace();
  }
}
Example Output

Region: testtable,,
  1303417572005.51f9e2251c...cbcb0c66858f., Count: 2
Region: testtable,row3,
  1303417572005.7f3df4dcba...dbc99fce5d87., Count: 3
Total Count: 5
Batch Convenience

•  The Batch.forMethod() helper quickly maps
   a protocol method into a Batch.Call
•  Useful for single method calls to the
   servers
•  Uses the Java reflection API to retrieve the
   named method
•  Saves you from implementing the
   anonymous inline class
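What Batch.forMethod() does can be approximated in plain Java: look the method up reflectively and wrap it in a call object. All names below (ForMethodSketch, MiniCall, Counter) are invented for the sketch; this is a model of the mechanism, not the HBase implementation:

```java
import java.lang.reflect.Method;

public class ForMethodSketch {
    // Minimal stand-in for Batch.Call; illustration only.
    interface MiniCall<T, R> { R call(T instance) throws Exception; }

    /** Looks up a zero-argument method by name via reflection and wraps
     *  it in a call object, roughly what Batch.forMethod() does. */
    static <T, R> MiniCall<T, R> forMethod(Class<T> protocol, String name)
            throws NoSuchMethodException {
        final Method m = protocol.getMethod(name);
        return instance -> {
            @SuppressWarnings("unchecked")
            R result = (R) m.invoke(instance);
            return result;
        };
    }

    // Toy protocol standing in for RowCountProtocol.
    public interface Counter { long getKeyValueCount(); }

    static long demo() {
        try {
            MiniCall<Counter, Long> call =
                forMethod(Counter.class, "getKeyValueCount");
            Counter fake = () -> 42L;  // pretend this is a region-side stub
            return call.call(fake);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 42
    }
}
```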
Batch Convenience

    Batch.Call call =
      Batch.forMethod(
        RowCountProtocol.class,
        "getKeyValueCount");
    Map<byte[], Long> results =
      table.coprocessorExec(
        RowCountProtocol.class,
        null, null, call);
Call Multiple Endpoints

•  Sometimes you need to call more than
   one endpoint in a single roundtrip call to
   the servers
•  This requires an anonymous inline class,
   since Batch.forMethod cannot handle this
Call Multiple Endpoints

   Map<byte[], Pair<Long, Long>>
   results = table.coprocessorExec(
     RowCountProtocol.class, null, null,
     new Batch.Call<RowCountProtocol,
       Pair<Long, Long>>() {
       public Pair<Long, Long> call(
          RowCountProtocol counter)
       throws IOException {
          return new Pair(
           counter.getRowCount(),
           counter.getKeyValueCount());
       }
     });
Example: Invocation by Proxy


   RowCountProtocol protocol =
     table.coprocessorProxy(
       RowCountProtocol.class,
       Bytes.toBytes("row4"));
   long rowsInRegion =
     protocol.getRowCount();
   System.out.println(
     "Region Row Count: " +
     rowsInRegion);

HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on the Cluster - Cloudera
