HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on the Cluster - Cloudera

Cloudera, Inc.
HBaseCon, May 2012

HBase Coprocessors
Lars George | Solutions Architect
Revision History

Version      Revised By                                    Description of Revision
Version 1    Lars George                                   Initial version




2                     ©2011 Cloudera, Inc. All Rights Reserved. Confidential.
                     Reproduction or redistribution without written permission is
                                             prohibited.
Overview

•  Coprocessors were added to Bigtable
  –  Mentioned during LADIS 2009 talk
•  Runs user code within each region of a
   table
  –  Code split and moves with region
•  Defines high level call interface for clients
•  Calls addressed to rows or ranges of rows
•  Implicit automatic scaling, load balancing,
   and request routing
Example Use Cases

•  Bigtable uses Coprocessors
  –  Scalable metadata management
  –  Distributed language model for machine
     translation
  –  Distributed query processing for full-text index
  –  Regular expression search in code repository
•  MapReduce jobs over HBase are often map-
   only jobs
  –  Row keys are already sorted and distinct
  ➜ Could be replaced by Coprocessors
HBase Coprocessors
•  Inspired by Google’s Coprocessors
   –  Not much information available, but general idea is
      understood
•  Define various types of server-side code
   extensions
   –  Associated with table using a table property
   –  Attribute is a path to JAR file
   –  JAR is loaded when region is opened
   –  Blends new functionality with existing
•  Can be chained with Priorities and Load Order

➜ Allows for dynamic RPC extensions
Coprocessor Classes and Interfaces

•  The Coprocessor Interface
  –  All user code must implement this interface
•  The CoprocessorEnvironment Interface
  –  Retains state across invocations
  –  Predefined classes
•  The CoprocessorHost Class
  –  Ties state and user code together
  –  Predefined classes
Coprocessor Priority

•  System or User


/** Highest installation priority */
static final int PRIORITY_HIGHEST = 0;
/** High (system) installation priority */
static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;
/** Default installation prio for user coprocessors */
static final int PRIORITY_USER = Integer.MAX_VALUE / 2;
/** Lowest installation priority */
static final int PRIORITY_LOWEST = Integer.MAX_VALUE;
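Because a lower value means a higher priority, a system coprocessor (Integer.MAX_VALUE / 4 = 536870911) runs before a default user one (Integer.MAX_VALUE / 2 = 1073741823). The following Java sketch orders a few hypothetical coprocessors that way. `PriorityOrderSketch`, `Entry`, and the class names are invented for illustration, and breaking ties by load order is an assumption based on the "Priorities and Load Order" point above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class PriorityOrderSketch {
  // Same values as the constants defined in the Coprocessor interface.
  static final int PRIORITY_HIGHEST = 0;
  static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;
  static final int PRIORITY_USER = Integer.MAX_VALUE / 2;
  static final int PRIORITY_LOWEST = Integer.MAX_VALUE;

  /** Hypothetical record of one loaded coprocessor. */
  static final class Entry {
    final String className;
    final int priority;
    final int loadSequence; // position in which it was loaded

    Entry(String className, int priority, int loadSequence) {
      this.className = className;
      this.priority = priority;
      this.loadSequence = loadSequence;
    }
  }

  /** Execution order: lowest priority value first, ties broken by load order (assumed). */
  static List<String> executionOrder(List<Entry> loaded) {
    List<Entry> sorted = new ArrayList<>(loaded);
    sorted.sort(Comparator.comparingInt((Entry e) -> e.priority)
        .thenComparingInt(e -> e.loadSequence));
    List<String> names = new ArrayList<>();
    for (Entry e : sorted) {
      names.add(e.className);
    }
    return names;
  }

  public static void main(String[] args) {
    List<Entry> loaded = Arrays.asList(
        new Entry("UserObserverA", PRIORITY_USER, 0),
        new Entry("SecurityObserver", PRIORITY_SYSTEM, 1),
        new Entry("UserObserverB", PRIORITY_USER, 2),
        new Entry("TableObserver", 10, 3));
    // [TableObserver, SecurityObserver, UserObserverA, UserObserverB]
    System.out.println(executionOrder(loaded));
  }
}
```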
Coprocessor Environment

•  Available Methods
Coprocessor Host

•  Maintains all Coprocessor instances and
   their environments (state)
•  Concrete Classes
  –  MasterCoprocessorHost
  –  RegionCoprocessorHost
  –  WALCoprocessorHost
•  Subclasses provide access to specialized
   Environment implementations
Control Flow
Coprocessor Interface

•  Base for all other types of Coprocessors
•  start() and stop() methods for lifecycle
   management
•  State as defined in the interface:
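A simplified model of that lifecycle follows. It assumes the states mirror HBase's Coprocessor.State enum (UNINSTALLED, INSTALLED, STARTING, ACTIVE, STOPPING, STOPPED) and that a host calls start() and stop() while moving between them. `LifecycleSketch`, `Lifecycle`, and `ManagedInstance` are hypothetical stand-ins, not the HBase classes.

```java
final class LifecycleSketch {
  enum State { UNINSTALLED, INSTALLED, STARTING, ACTIVE, STOPPING, STOPPED }

  /** Hypothetical stand-in for the lifecycle methods of the Coprocessor interface. */
  interface Lifecycle {
    void start();
    void stop();
  }

  /** Wraps one coprocessor instance and tracks its state, as a host might. */
  static final class ManagedInstance {
    private final Lifecycle instance;
    private State state = State.UNINSTALLED;

    ManagedInstance(Lifecycle instance) {
      this.instance = instance;
    }

    State state() { return state; }

    void install() {
      if (state != State.UNINSTALLED) {
        throw new IllegalStateException("cannot install from " + state);
      }
      state = State.INSTALLED;
    }

    void start() {
      // This sketch allows a first start and a restart after stop().
      if (state != State.INSTALLED && state != State.STOPPED) {
        throw new IllegalStateException("cannot start from " + state);
      }
      state = State.STARTING;
      instance.start();
      state = State.ACTIVE;
    }

    void stop() {
      if (state != State.ACTIVE) {
        throw new IllegalStateException("cannot stop from " + state);
      }
      state = State.STOPPING;
      instance.stop();
      state = State.STOPPED;
    }
  }

  public static void main(String[] args) {
    StringBuilder log = new StringBuilder();
    ManagedInstance m = new ManagedInstance(new Lifecycle() {
      public void start() { log.append("start;"); }
      public void stop() { log.append("stop;"); }
    });
    m.install();
    m.start();
    m.stop();
    System.out.println(m.state() + " " + log); // STOPPED start;stop;
  }
}
```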
Observer Classes

•  Comparable to database triggers
  –  Callback functions/hooks for every explicit API
     method, but also all important internal calls
•  Concrete Implementations
  –  MasterObserver
     •  Hooks into HMaster API
  –  RegionObserver
     •  Hooks into Region related operations
  –  WALObserver
     •  Hooks into write-ahead log operations
Region Observers

•  Can mediate (veto) actions
  –  Used by the security policy extensions
  –  Priority allows mediators to run first
•  Hooks into all CRUD+S API calls and more
  –  get(), put(), delete(), scan(), increment(),…
  –  checkAndPut(), checkAndDelete(),…
  –  flush(), compact(), split(),…
•  Pre/Post Hooks for every call
•  Can be used to build secondary indexes,
   filters
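To make the mediation idea concrete, the following self-contained sketch models a chain of hypothetical pre-get hooks. An access-check hook with a higher priority (lower value) runs first and vetoes the call by throwing an exception, so neither the later hooks nor the actual read run. `PreGetHook`, `AccessDeniedException`, and the user and row names are invented for illustration; real RegionObserver hooks receive an ObserverContext and the request objects instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VetoSketch {
  /** Hypothetical pre-hook: may throw to veto the operation. */
  interface PreGetHook {
    int priority();
    void preGet(String user, String row);
  }

  static final class AccessDeniedException extends RuntimeException {
    AccessDeniedException(String msg) { super(msg); }
  }

  private final List<PreGetHook> hooks = new ArrayList<>();
  private final Map<String, String> rows = new HashMap<>();
  final List<String> trace = new ArrayList<>();

  void addHook(PreGetHook hook) {
    hooks.add(hook);
    // Lower priority value runs first, so a mediator can veto before others run.
    hooks.sort(Comparator.comparingInt(PreGetHook::priority));
  }

  void put(String row, String value) { rows.put(row, value); }

  String get(String user, String row) {
    for (PreGetHook hook : hooks) {
      hook.preGet(user, row); // an exception here aborts the whole call
    }
    trace.add("get " + row);
    return rows.get(row);
  }

  static VetoSketch demo() {
    VetoSketch region = new VetoSketch();
    region.put("row1", "v1");
    region.addHook(new PreGetHook() { // user-level auditing hook
      public int priority() { return Integer.MAX_VALUE / 2; }
      public void preGet(String user, String row) { region.trace.add("audit " + user); }
    });
    region.addHook(new PreGetHook() { // system-level access check
      public int priority() { return Integer.MAX_VALUE / 4; }
      public void preGet(String user, String row) {
        if (!"admin".equals(user)) {
          throw new AccessDeniedException(user + " may not read " + row);
        }
      }
    });
    return region;
  }

  public static void main(String[] args) {
    VetoSketch region = demo();
    System.out.println(region.get("admin", "row1")); // v1
    try {
      region.get("guest", "row1");
    } catch (AccessDeniedException e) {
      System.out.println("vetoed: " + e.getMessage());
    }
    System.out.println(region.trace); // [audit admin, get row1]
  }
}
```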
Endpoint Classes

•  Define a dynamic RPC protocol, used
   between client and region server
•  Executes arbitrary code, loaded in region
   server
  –  Future development will add code weaving/
     inspection to deny any malicious code
•  Steps to add your own methods
  –  Define and implement your own protocol
  –  Implement endpoint coprocessor
  –  Call HTable’s coprocessorExec() or
     coprocessorProxy()
Coprocessor Loading

•  There are two ways: dynamic or static
  –  Static: use configuration files and table schema
  –  Dynamic: not available (yet)
•  For static loading from configuration:
  –  Order is important (defines the execution order)
  –  Special property key for each host type
  –  Region related classes are loaded for all regions
     and tables
  –  Priority is always System
  –  JAR must be on class path
Loading from Configuration

•  Example:
  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>coprocessor.RegionObserverExample,coprocessor.AnotherCoprocessor</value>
  </property>

  <property>
    <name>hbase.coprocessor.master.classes</name>
    <value>coprocessor.MasterObserverExample</value>
  </property>

  <property>
    <name>hbase.coprocessor.wal.classes</name>
    <value>coprocessor.WALObserverExample,bar.foo.MyWALObserver</value>
  </property>
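Static loading from configuration boils down to turning each comma-separated property value into an ordered list of class names that all receive System priority. A minimal Java sketch of just that parsing step follows, assuming surrounding whitespace is ignored. `ConfigLoadSketch` and `LoadedEntry` are hypothetical, and a real host would also load and instantiate each class.

```java
import java.util.ArrayList;
import java.util.List;

final class ConfigLoadSketch {
  static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;

  /** Hypothetical description of one coprocessor to load. */
  static final class LoadedEntry {
    final String className;
    final int priority;
    final int loadSequence;

    LoadedEntry(String className, int priority, int loadSequence) {
      this.className = className;
      this.priority = priority;
      this.loadSequence = loadSequence;
    }

    @Override public String toString() {
      return loadSequence + ":" + className + "@" + priority;
    }
  }

  /** Parses a value such as that of hbase.coprocessor.region.classes, keeping its order. */
  static List<LoadedEntry> parse(String propertyValue) {
    List<LoadedEntry> entries = new ArrayList<>();
    if (propertyValue == null) {
      return entries; // property not set: nothing to load
    }
    int sequence = 0;
    for (String raw : propertyValue.split(",")) {
      String name = raw.trim();
      if (!name.isEmpty()) {
        // Coprocessors loaded from configuration always get System priority.
        entries.add(new LoadedEntry(name, PRIORITY_SYSTEM, sequence++));
      }
    }
    return entries;
  }

  public static void main(String[] args) {
    System.out.println(parse(
        "coprocessor.RegionObserverExample,\n  coprocessor.AnotherCoprocessor"));
  }
}
```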
Coprocessor Loading (cont.)

•  For static loading from table schema:
  –  Definition per table
  –  For all regions of the table
  –  Only region related classes, not WAL or Master
  –  Added to HTableDescriptor, when table is created
     or altered
  –  Allows setting the priority and JAR path
  COPROCESSOR$<num> ➜
      <path-to-jar>|<classname>|<priority>
Loading from Table Schema

•  Example:

'COPROCESSOR$1' =>
  'hdfs://localhost:8020/users/leon/test.jar|coprocessor.Test|10'

'COPROCESSOR$2' =>
  '/Users/laura/test2.jar|coprocessor.AnotherTest|1000'
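Each attribute value packs three `|`-separated fields. The hypothetical helper below splits one into its parts. It is not HBase's own parser, and it assumes that a missing or blank priority falls back to PRIORITY_USER.

```java
final class CoprocessorSpecSketch {
  static final int PRIORITY_USER = Integer.MAX_VALUE / 2;

  /** Hypothetical parsed form of "<path-to-jar>|<classname>|<priority>". */
  static final class Spec {
    final String jarPath;
    final String className;
    final int priority;

    Spec(String jarPath, String className, int priority) {
      this.jarPath = jarPath;
      this.className = className;
      this.priority = priority;
    }
  }

  static Spec parse(String value) {
    // Limit -1 keeps trailing empty fields, e.g. "a.jar|x.Y|".
    String[] parts = value.split("\\|", -1);
    if (parts.length < 2 || parts.length > 3) {
      throw new IllegalArgumentException("malformed coprocessor spec: " + value);
    }
    String className = parts[1].trim();
    if (className.isEmpty()) {
      throw new IllegalArgumentException("missing class name: " + value);
    }
    int priority = PRIORITY_USER; // assumed default when no priority is given
    if (parts.length == 3 && !parts[2].trim().isEmpty()) {
      priority = Integer.parseInt(parts[2].trim());
    }
    return new Spec(parts[0].trim(), className, priority);
  }

  public static void main(String[] args) {
    Spec s = parse("hdfs://localhost:8020/users/leon/test.jar|coprocessor.Test|10");
    System.out.println(s.jarPath + " -> " + s.className + " @ " + s.priority);
  }
}
```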
Example: Add Coprocessor
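import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;
import coprocessor.RegionObserverExample;

public class AddCoprocessorExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    // JAR with the observer class, in the root of the default file system
    Path path = new Path(fs.getUri() + Path.SEPARATOR + "test.jar");
    HTableDescriptor htd = new HTableDescriptor("testtable");
    htd.addFamily(new HColumnDescriptor("colfam1"));
    // Value format: <path-to-jar>|<classname>|<priority>
    htd.setValue("COPROCESSOR$1", path.toString() +
      "|" + RegionObserverExample.class.getCanonicalName() +
      "|" + Coprocessor.PRIORITY_USER);
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(htd);
    System.out.println(admin.getTableDescriptor(
      Bytes.toBytes("testtable")));
  }
}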
Example Output
{NAME => 'testtable', COPROCESSOR$1 =>
'file:/test.jar|coprocessor.RegionObserverExample|1073741823',
FAMILIES => [{NAME => 'colfam1', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]}
Region Observers

•  Handles all region related events
•  Hooks for two classes of operations:
  –  Lifecycle changes
  –  Client API Calls
•  All client API calls have a pre/post hook
  –  Can be used to grant access on preGet()
  –  Can be used to update secondary indexes on
     postPut()
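As an illustration of the postPut() use case, the sketch below keeps a hypothetical in-memory index (column value to row keys) up to date from a post-put hook that runs only after the write itself. It models rows as a plain map and ignores concurrency and failures. `IndexSketch` and `PostPutHook` are assumptions, not HBase API; in a real deployment the index would more likely be another table written from within the hook.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class IndexSketch {
  /** Hypothetical post-put hook signature. */
  interface PostPutHook {
    void postPut(String row, String oldValue, String newValue);
  }

  private final Map<String, String> rows = new HashMap<>();
  private final List<PostPutHook> postHooks = new ArrayList<>();
  // Secondary index: column value -> sorted set of row keys having that value.
  final Map<String, TreeSet<String>> index = new TreeMap<>();

  IndexSketch() {
    postHooks.add((row, oldValue, newValue) -> {
      if (oldValue != null) {
        // Drop the stale index entry left by the previous value.
        TreeSet<String> stale = index.get(oldValue);
        stale.remove(row);
        if (stale.isEmpty()) {
          index.remove(oldValue);
        }
      }
      index.computeIfAbsent(newValue, v -> new TreeSet<>()).add(row);
    });
  }

  void put(String row, String value) {
    String old = rows.put(row, value); // the actual write
    for (PostPutHook hook : postHooks) {
      hook.postPut(row, old, value);   // runs only after the write succeeded
    }
  }

  public static void main(String[] args) {
    IndexSketch t = new IndexSketch();
    t.put("row1", "red");
    t.put("row2", "blue");
    t.put("row1", "blue");
    System.out.println(t.index); // {blue=[row1, row2]}
  }
}
```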
Handling Region Lifecycle Events




•  Hook into pending open, open, and pending
   close state changes
•  Called implicitly by the framework
  –  preOpen(), postOpen(),…
•  Used to piggyback or fail the process, e.g.
  –  Cache warm up after a region opens
  –  Suppress region splitting, compactions, flushes
Region Environment
Special Hook Parameter
public interface RegionObserver extends Coprocessor {

  /**
   * Called before the region is reported as open to the master.
   * @param c the environment provided by the region server
   */
  void preOpen(final
    ObserverContext<RegionCoprocessorEnvironment> c);

  /**
   * Called after the region is reported as open to the master.
   * @param c the environment provided by the region server
   */
  void postOpen(final
    ObserverContext<RegionCoprocessorEnvironment> c);

  // ... remaining hooks omitted
}
ObserverContext
Chain of Command

•  The complete() and bypass() methods in
   particular allow a coprocessor to change the
   processing chain
  –  complete() ends the chain at the current
     coprocessor; later coprocessors are not called
  –  bypass() tells the framework to skip its default
     processing, so for pre-hooks the actual API
     method is not called and the values supplied
     by the coprocessors are used instead
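The following self-contained sketch simulates how a host could walk a chain of pre-hooks while honoring both flags: complete() stops the walk, and bypass() suppresses the default action. `Context`, `PreHook`, and `runPre` are hypothetical names loosely modeled on ObserverContext, not the HBase classes. The sketch assumes a bypass set by any coprocessor in the chain is remembered even if later coprocessors do not set it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChainSketch {
  /** Hypothetical, simplified stand-in for ObserverContext. */
  static final class Context {
    private boolean bypass;
    private boolean complete;

    void bypass() { bypass = true; }
    void complete() { complete = true; }
  }

  interface PreHook {
    void pre(Context ctx, List<String> trace);
  }

  /** Runs the pre-hooks in order, then the default action unless bypassed. */
  static List<String> runPre(List<PreHook> chain) {
    List<String> trace = new ArrayList<>();
    boolean bypass = false;
    for (PreHook hook : chain) {
      Context ctx = new Context(); // fresh flags for each coprocessor
      hook.pre(ctx, trace);
      bypass |= ctx.bypass;        // any bypass skips the default action
      if (ctx.complete) {
        break;                     // later coprocessors are not called
      }
    }
    if (!bypass) {
      trace.add("default action");
    }
    return trace;
  }

  public static void main(String[] args) {
    PreHook a = (ctx, t) -> t.add("A");
    PreHook completing = (ctx, t) -> { t.add("B"); ctx.complete(); };
    PreHook bypassing = (ctx, t) -> { t.add("C"); ctx.bypass(); };

    System.out.println(runPre(Arrays.asList(a, completing, bypassing)));
    // [A, B, default action]
    System.out.println(runPre(Arrays.asList(bypassing, a)));
    // [C, A]
  }
}
```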
Example: Pre-Hook Complete



@Override
public void preSplit(
    ObserverContext<RegionCoprocessorEnvironment> e) {
  // End the chain here: the preSplit() hooks of any
  // coprocessors later in the chain are not called
  e.complete();
}
Master Observer

•  Handles all HMaster related events
  –  DDL type calls, e.g. create table, add column
  –  Region management calls, e.g. move, assign
•  Pre/post hooks with Context
•  Specialized environment provided
Master Environment
Master Services (cont.)

•  Very powerful features
  –  Access the AssignmentManager to modify
     plans
  –  Access the MasterFileSystem to create or
     access resources on HDFS
  –  Access the ServerManager to get the list of
     known servers
  –  Use the ExecutorService to run system-wide
     background processes
•  Be careful (for now)!
Example: Master Post Hook
public class MasterObserverExample
  extends BaseMasterObserver {
  @Override public void postCreateTable(
      ObserverContext<MasterCoprocessorEnvironment> env,
      HRegionInfo[] regions, boolean sync)
      throws IOException {
    String tableName =
      regions[0].getTableDesc().getNameAsString();
    MasterServices services =
      env.getEnvironment().getMasterServices();
    MasterFileSystem masterFileSystem =
      services.getMasterFileSystem();
    FileSystem fileSystem = masterFileSystem.getFileSystem();
    // Relative path, resolved against the user's home directory
    Path blobPath = new Path(tableName + "-blobs");
    fileSystem.mkdirs(blobPath);
  }
}
Example Output

hbase(main):001:0> create 'testtable', 'colfam1'
0 row(s) in 0.4300 seconds

$ bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x - larsgeorge supergroup 0 ... /user/larsgeorge/testtable-blobs
Endpoints

•  Dynamic RPC extends server-side
   functionality
  –  Useful for MapReduce-like implementations
  –  Handles the Map part server-side; the Reduce part
     must be done client-side
•  Based on CoprocessorProtocol interface
•  Routing to regions is based on either single
   row keys, or row key ranges
  –  The call is sent whether or not the row exists,
     since region start and end keys are coarse-grained
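The routing rule can be sketched with a sorted map from region start keys to region names: a row belongs to the region with the greatest start key that is less than or equal to it, and a key range touches every region from that one up to the range's end. This is a hypothetical illustration that uses String keys in place of byte arrays, and it shows why a call reaches a region even for a row that was never written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RoutingSketch {
  // Region start key -> region name; the first region must start at "" (empty key).
  private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

  void addRegion(String startKey, String regionName) {
    regionsByStartKey.put(startKey, regionName);
  }

  /** Region that receives a call for a single row, whether or not the row exists. */
  String regionFor(String row) {
    return regionsByStartKey.floorEntry(row).getValue();
  }

  /** Regions touched by a call for rows in [startRow, stopRow); null means unbounded. */
  List<String> regionsFor(String startRow, String stopRow) {
    String from = startRow == null ? regionsByStartKey.firstKey()
                                   : regionsByStartKey.floorKey(startRow);
    Map<String, String> view = stopRow == null
        ? regionsByStartKey.tailMap(from, true)
        : regionsByStartKey.subMap(from, true, stopRow, false);
    return new ArrayList<>(view.values());
  }

  public static void main(String[] args) {
    RoutingSketch table = new RoutingSketch();
    table.addRegion("", "region-1");
    table.addRegion("row3", "region-2");
    System.out.println(table.regionFor("row4"));         // region-2
    System.out.println(table.regionsFor(null, null));     // [region-1, region-2]
    System.out.println(table.regionsFor("row1", "row2")); // [region-1]
  }
}
```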
Custom Endpoint Implementation

•  Involves two steps:
  –  Extend the CoprocessorProtocol interface
     •  Defines the actual protocol
  –  Extend the BaseEndpointCoprocessor
     •  Provides the server-side code and the dynamic
        RPC method
Example: Row Count Protocol

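import java.io.IOException;

import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

public interface RowCountProtocol
  extends CoprocessorProtocol {
  long getRowCount()
    throws IOException;
  long getRowCount(Filter filter)
    throws IOException;
  long getKeyValueCount()
    throws IOException;
}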
Example: Endpoint for Row Count
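import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

public class RowCountEndpoint
  extends BaseEndpointCoprocessor
  implements RowCountProtocol {

  private long getCount(Filter filter,
      boolean countKeyValues) throws IOException {
    Scan scan = new Scan();
    scan.setMaxVersions(1);
    if (filter != null) {
      scan.setFilter(filter);
    }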
Example: Endpoint for Row Count
    RegionCoprocessorEnvironment environment =
      (RegionCoprocessorEnvironment) getEnvironment();
    // use an internal scanner to perform scanning
    InternalScanner scanner =
      environment.getRegion().getScanner(scan);
    // long, to match the return type and avoid overflow
    long result = 0;
Example: Endpoint for Row Count
    try {
      List<KeyValue> curVals = new ArrayList<KeyValue>();
      boolean hasMore;
      do {
        curVals.clear();
        // next() fills curVals with the next row and
        // returns whether more rows follow
        hasMore = scanner.next(curVals);
        // skip empty batches, e.g. from an empty region
        if (!curVals.isEmpty()) {
          result += countKeyValues ? curVals.size() : 1;
        }
      } while (hasMore);
    } finally {
      scanner.close();
    }
    return result;
  }
Example: Endpoint for Row Count
  @Override
  public long getRowCount() throws IOException {
    // FirstKeyOnlyFilter returns only the first KeyValue
    // of each row, which is all that is needed to count rows
    return getRowCount(new FirstKeyOnlyFilter());
  }

  @Override
  public long getRowCount(Filter filter) throws IOException {
    return getCount(filter, false);
  }

  @Override
  public long getKeyValueCount() throws IOException {
    return getCount(null, true);
  }
}
Endpoint Invocation

•  There are two ways to invoke the call
  –  By Proxy, using HTable.coprocessorProxy()
     •  Uses a delayed model, i.e. the call is sent when the
        proxied method is invoked
  –  By Exec, using HTable.coprocessorExec()
     •  The call is sent in parallel to all regions and the results
        are collected immediately
•  The Batch.Call class is used by
   coprocessorExec() to wrap the calls per
   region
•  The optional Batch.Callback can be used to
   react upon completion of the remote call
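The exec model can be imitated with a thread pool: one task per region applies a call object to that region's endpoint, the results are gathered into a map keyed by region, and an optional callback is notified as each result is collected. `Call`, `Callback`, and `exec` below are hypothetical stand-ins for Batch.Call, Batch.Callback, and coprocessorExec(). The regions are simulated in memory and no RPC takes place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ExecSketch {
  /** Hypothetical endpoint protocol, standing in for RowCountProtocol. */
  interface RowCounter { long getRowCount(); }

  /** Hypothetical analogue of Batch.Call: what to run against each region. */
  interface Call<T, R> { R call(T endpoint) throws Exception; }

  /** Hypothetical analogue of Batch.Callback: told about each region's result. */
  interface Callback<R> { void update(String region, R result); }

  /** Runs the call against every region in parallel and collects results by region. */
  static <T, R> Map<String, R> exec(Map<String, T> endpointsByRegion,
                                   Call<T, R> call, Callback<R> callback) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      Map<String, Future<R>> futures = new TreeMap<>();
      for (Map.Entry<String, T> e : endpointsByRegion.entrySet()) {
        final T endpoint = e.getValue();
        Callable<R> task = () -> call.call(endpoint);
        futures.put(e.getKey(), pool.submit(task)); // all regions start at once
      }
      Map<String, R> results = new TreeMap<>();
      for (Map.Entry<String, Future<R>> f : futures.entrySet()) {
        R result;
        try {
          result = f.getValue().get(); // blocks until this region has answered
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted", ex);
        } catch (ExecutionException ex) {
          throw new IllegalStateException("call failed for " + f.getKey(), ex.getCause());
        }
        results.put(f.getKey(), result);
        if (callback != null) {
          callback.update(f.getKey(), result);
        }
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    Map<String, RowCounter> regions = new TreeMap<>();
    regions.put("testtable,,1", () -> 2L);
    regions.put("testtable,row3,2", () -> 3L);
    List<String> seen = new ArrayList<>();
    Map<String, Long> counts = exec(regions, RowCounter::getRowCount,
        (region, count) -> seen.add(region + "=" + count));
    long total = 0;
    for (long c : counts.values()) {
      total += c;
    }
    System.out.println(counts + " total=" + total); // ... total=5
  }
}
```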
Exec vs. Proxy
Example: Invocation by Exec

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.util.Bytes;

public class EndpointExecExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable");
    try {
      // null start and end keys: the call goes to every region
      Map<byte[], Long> results =
        table.coprocessorExec(RowCountProtocol.class, null, null,
        new Batch.Call<RowCountProtocol, Long>() {
          @Override
          public Long call(RowCountProtocol counter)
          throws IOException {
            return counter.getRowCount();
          }
        });

Example: Invocation by Exec
      long total = 0;
      for (Map.Entry<byte[], Long> entry :
           results.entrySet()) {
        total += entry.getValue().longValue();
        System.out.println("Region: " +
          Bytes.toString(entry.getKey()) +
          ", Count: " + entry.getValue());
      }
      System.out.println("Total Count: " + total);
    } catch (Throwable throwable) {
      throwable.printStackTrace();
    } finally {
      table.close();
    }
  }
}
Example Output

Region: testtable,,1303417572005.51f9e2251c...cbcb0c66858f., Count: 2
Region: testtable,row3,1303417572005.7f3df4dcba...dbc99fce5d87., Count: 3
Total Count: 5
Batch Convenience

•  Batch.forMethod() helps to quickly
   map a protocol function into a Batch.Call
•  Useful for single method calls to the
   servers
•  Uses the Java reflection API to retrieve the
   named method
•  Saves you from implementing the
   anonymous inline class
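The reflection idea can be shown without HBase: look up a public method by name on the protocol interface once, then return a call object that invokes it on whatever endpoint it is given. `forMethod` and `Call` here are hypothetical re-creations of the idea that support only no-argument methods; they are not the Batch implementation.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ForMethodSketch {
  interface Call<T, R> { R call(T endpoint) throws Exception; }

  interface RowCountProtocol {
    long getRowCount();
    long getKeyValueCount();
  }

  /** Wraps a named no-argument method of the protocol into a Call via reflection. */
  static <T, R> Call<T, R> forMethod(Class<T> protocol, String methodName)
      throws NoSuchMethodException {
    // Fails fast here if the name is misspelled, not later per region.
    final Method method = protocol.getMethod(methodName);
    return endpoint -> {
      try {
        @SuppressWarnings("unchecked")
        R result = (R) method.invoke(endpoint); // primitive results come back boxed
        return result;
      } catch (InvocationTargetException e) {
        // Surface the endpoint's own exception rather than the reflection wrapper.
        Throwable cause = e.getCause();
        throw cause instanceof Exception ? (Exception) cause : e;
      }
    };
  }

  public static void main(String[] args) throws Exception {
    RowCountProtocol endpoint = new RowCountProtocol() {
      public long getRowCount() { return 3; }
      public long getKeyValueCount() { return 7; }
    };
    Call<RowCountProtocol, Long> call =
        forMethod(RowCountProtocol.class, "getKeyValueCount");
    System.out.println(call.call(endpoint)); // 7
  }
}
```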
Batch Convenience

    Batch.Call call =
      Batch.forMethod(
        RowCountProtocol.class,
        "getKeyValueCount");
    Map<byte[], Long> results =
      table.coprocessorExec(
        RowCountProtocol.class,
        null, null, call);
Call Multiple Endpoints

•  Sometimes you need to call more than
   one endpoint method per region in a single
   coprocessorExec() invocation
•  This requires an anonymous inline class,
   since Batch.forMethod() cannot handle this
Call Multiple Endpoints

   Map<byte[], Pair<Long, Long>>
   results = table.coprocessorExec(
     RowCountProtocol.class, null, null,
     new Batch.Call<RowCountProtocol,
       Pair<Long, Long>>() {
       public Pair<Long, Long> call(
          RowCountProtocol counter)
       throws IOException {
          // both methods are invoked against the same region
          return new Pair<Long, Long>(
           counter.getRowCount(),
           counter.getKeyValueCount());
       }
     });
Example: Invocation by Proxy


   RowCountProtocol protocol =
     table.coprocessorProxy(
       RowCountProtocol.class,
       Bytes.toBytes("row4"));
   // the call is sent to the region containing row4 only now
   long rowsInRegion =
     protocol.getRowCount();
   System.out.println(
     "Region Row Count: " +
     rowsInRegion);
2.5K views21 slides

More from Cloudera, Inc.(20)

Partner Briefing_January 25 (FINAL).pptx by Cloudera, Inc.
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.107 views
Cloudera Data Impact Awards 2021 - Finalists by Cloudera, Inc.
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.6.4K views
2020 Cloudera Data Impact Awards Finalists by Cloudera, Inc.
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.6.3K views
Edc event vienna presentation 1 oct 2019 by Cloudera, Inc.
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.4.5K views
Machine Learning with Limited Labeled Data 4/3/19 by Cloudera, Inc.
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.3.6K views
Data Driven With the Cloudera Modern Data Warehouse 3.19.19 by Cloudera, Inc.
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.2.5K views
Introducing Cloudera DataFlow (CDF) 2.13.19 by Cloudera, Inc.
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.4.9K views
Introducing Cloudera Data Science Workbench for HDP 2.12.19 by Cloudera, Inc.
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.2.7K views
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19 by Cloudera, Inc.
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.1.6K views
Leveraging the cloud for analytics and machine learning 1.29.19 by Cloudera, Inc.
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.1.6K views
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19 by Cloudera, Inc.
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.2.5K views
Leveraging the Cloud for Big Data Analytics 12.11.18 by Cloudera, Inc.
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.1.7K views
Modern Data Warehouse Fundamentals Part 3 by Cloudera, Inc.
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.1.3K views
Modern Data Warehouse Fundamentals Part 2 by Cloudera, Inc.
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.2.3K views
Modern Data Warehouse Fundamentals Part 1 by Cloudera, Inc.
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.1.5K views
Extending Cloudera SDX beyond the Platform by Cloudera, Inc.
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.966 views
Federated Learning: ML with Privacy on the Edge 11.15.18 by Cloudera, Inc.
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.2.2K views
Analyst Webinar: Doing a 180 on Customer 360 by Cloudera, Inc.
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.1.4K views
Build a modern platform for anti-money laundering 9.19.18 by Cloudera, Inc.
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.1K views
Introducing the data science sandbox as a service 8.30.18 by Cloudera, Inc.
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.1.2K views

Recently uploaded

RADIUS-Omnichannel Interaction System by
RADIUS-Omnichannel Interaction SystemRADIUS-Omnichannel Interaction System
RADIUS-Omnichannel Interaction SystemRADIUS
15 views21 slides
Business Analyst Series 2023 - Week 3 Session 5 by
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5DianaGray10
209 views20 slides
Voice Logger - Telephony Integration Solution at Aegis by
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at AegisNirmal Sharma
17 views1 slide
20231123_Camunda Meetup Vienna.pdf by
20231123_Camunda Meetup Vienna.pdf20231123_Camunda Meetup Vienna.pdf
20231123_Camunda Meetup Vienna.pdfPhactum Softwareentwicklung GmbH
28 views73 slides
DALI Basics Course 2023 by
DALI Basics Course  2023DALI Basics Course  2023
DALI Basics Course 2023Ivory Egg
14 views12 slides
PharoJS - Zürich Smalltalk Group Meetup November 2023 by
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023Noury Bouraqadi
120 views17 slides

Recently uploaded(20)

RADIUS-Omnichannel Interaction System by RADIUS
RADIUS-Omnichannel Interaction SystemRADIUS-Omnichannel Interaction System
RADIUS-Omnichannel Interaction System
RADIUS15 views
Business Analyst Series 2023 - Week 3 Session 5 by DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10209 views
Voice Logger - Telephony Integration Solution at Aegis by Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma17 views
DALI Basics Course 2023 by Ivory Egg
DALI Basics Course  2023DALI Basics Course  2023
DALI Basics Course 2023
Ivory Egg14 views
PharoJS - Zürich Smalltalk Group Meetup November 2023 by Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi120 views
Understanding GenAI/LLM and What is Google Offering - Felix Goh by NUS-ISS
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix Goh
NUS-ISS41 views
The details of description: Techniques, tips, and tangents on alternative tex... by BookNet Canada
The details of description: Techniques, tips, and tangents on alternative tex...The details of description: Techniques, tips, and tangents on alternative tex...
The details of description: Techniques, tips, and tangents on alternative tex...
BookNet Canada121 views
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab15 views
.conf Go 2023 - Data analysis as a routine by Splunk
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine
Splunk93 views
Future of Learning - Khoong Chan Meng by NUS-ISS
Future of Learning - Khoong Chan MengFuture of Learning - Khoong Chan Meng
Future of Learning - Khoong Chan Meng
NUS-ISS33 views
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2216 views
handbook for web 3 adoption.pdf by Liveplex
handbook for web 3 adoption.pdfhandbook for web 3 adoption.pdf
handbook for web 3 adoption.pdf
Liveplex19 views
SAP Automation Using Bar Code and FIORI.pdf by Virendra Rai, PMP
SAP Automation Using Bar Code and FIORI.pdfSAP Automation Using Bar Code and FIORI.pdf
SAP Automation Using Bar Code and FIORI.pdf
Future of Learning - Yap Aye Wee.pdf by NUS-ISS
Future of Learning - Yap Aye Wee.pdfFuture of Learning - Yap Aye Wee.pdf
Future of Learning - Yap Aye Wee.pdf
NUS-ISS41 views
STPI OctaNE CoE Brochure.pdf by madhurjyapb
STPI OctaNE CoE Brochure.pdfSTPI OctaNE CoE Brochure.pdf
STPI OctaNE CoE Brochure.pdf
madhurjyapb12 views
Transcript: The Details of Description Techniques tips and tangents on altern... by BookNet Canada
Transcript: The Details of Description Techniques tips and tangents on altern...Transcript: The Details of Description Techniques tips and tangents on altern...
Transcript: The Details of Description Techniques tips and tangents on altern...
BookNet Canada130 views
The Importance of Cybersecurity for Digital Transformation by NUS-ISS
The Importance of Cybersecurity for Digital TransformationThe Importance of Cybersecurity for Digital Transformation
The Importance of Cybersecurity for Digital Transformation
NUS-ISS27 views

HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on the Cluster - Cloudera

  • 1. HBaseCon, May 2012 HBase Coprocessors Lars George | Solutions Architect
  • 2. Revision History
    Version      Revised By     Description of Revision
    Version 1    Lars George    Initial version
    ©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
  • 3. Overview
    •  Coprocessors were added to Bigtable
      –  Mentioned during the LADIS 2009 talk
    •  Runs user code within each region of a table
      –  Code is split and moves with the region
    •  Defines a high-level call interface for clients
    •  Calls are addressed to rows or ranges of rows
    •  Implicit automatic scaling, load balancing, and request routing
  • 4. Example Use Cases
    •  Bigtable uses Coprocessors for
      –  Scalable metadata management
      –  Distributed language model for machine translation
      –  Distributed query processing for full-text index
      –  Regular expression search in code repository
    •  MapReduce jobs over HBase are often map-only jobs
      –  Row keys are already sorted and distinct
      ➜ Could be replaced by Coprocessors
  • 5. HBase Coprocessors
    •  Inspired by Google's Coprocessors
      –  Not much information is available, but the general idea is understood
    •  Define various types of server-side code extensions
      –  Associated with a table using a table property
      –  The attribute is a path to a JAR file
      –  The JAR is loaded when the region is opened
      –  Blends new functionality with existing functionality
    •  Can be chained with priorities and load order
    ➜ Allows for dynamic RPC extensions
  • 6. Coprocessor Classes and Interfaces
    •  The Coprocessor Interface
      –  All user code must implement this interface
    •  The CoprocessorEnvironment Interface
      –  Retains state across invocations
      –  Predefined classes
    •  The CoprocessorHost Interface
      –  Ties state and user code together
      –  Predefined classes
  • 7. Coprocessor Priority
    •  System or User
        /** Highest installation priority */
        static final int PRIORITY_HIGHEST = 0;
        /** High (system) installation priority */
        static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;
        /** Default installation priority for user coprocessors */
        static final int PRIORITY_USER = Integer.MAX_VALUE / 2;
        /** Lowest installation priority */
        static final int PRIORITY_LOWEST = Integer.MAX_VALUE;
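The priority constants are ordinary int values, and a smaller value means the coprocessor is called earlier. The following Java sketch prints the resulting numbers and orders a few coprocessors by them. It assumes that ties between equal priorities are broken by load order. The class LoadedCoprocessor and the observer names are hypothetical, not HBase types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PriorityOrder {
  // Same values as the constants on the slide; a lower value means a higher priority.
  static final int PRIORITY_HIGHEST = 0;
  static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4; // 536870911
  static final int PRIORITY_USER = Integer.MAX_VALUE / 2;   // 1073741823
  static final int PRIORITY_LOWEST = Integer.MAX_VALUE;     // 2147483647

  // Hypothetical record of a loaded coprocessor; not an HBase class.
  static final class LoadedCoprocessor {
    final String className;
    final int priority;
    final int loadSequence; // order in which the host loaded it (assumed tie-breaker)

    LoadedCoprocessor(String className, int priority, int loadSequence) {
      this.className = className;
      this.priority = priority;
      this.loadSequence = loadSequence;
    }
  }

  // Sorts by priority value first, then by load sequence for equal priorities.
  static List<String> executionOrder(List<LoadedCoprocessor> loaded) {
    List<LoadedCoprocessor> sorted = new ArrayList<>(loaded);
    sorted.sort(Comparator.comparingInt((LoadedCoprocessor c) -> c.priority)
        .thenComparingInt(c -> c.loadSequence));
    List<String> names = new ArrayList<>();
    for (LoadedCoprocessor c : sorted) {
      names.add(c.className);
    }
    return names;
  }

  public static void main(String[] args) {
    List<LoadedCoprocessor> loaded = new ArrayList<>();
    loaded.add(new LoadedCoprocessor("UserObserverA", PRIORITY_USER, 0));
    loaded.add(new LoadedCoprocessor("SystemObserver", PRIORITY_SYSTEM, 1));
    loaded.add(new LoadedCoprocessor("UserObserverB", PRIORITY_USER, 2));
    System.out.println(PRIORITY_USER);          // 1073741823
    System.out.println(executionOrder(loaded)); // [SystemObserver, UserObserverA, UserObserverB]
  }
}
```

The printed PRIORITY_USER value, 1073741823, is the same number that appears at the end of the COPROCESSOR$1 attribute in the example output on slide 20.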
  • 9. Coprocessor Host
    •  Maintains all Coprocessor instances and their environments (state)
    •  Concrete Classes
      –  MasterCoprocessorHost
      –  RegionCoprocessorHost
      –  WALCoprocessorHost
    •  Subclasses provide access to specialized Environment implementations
  • 11. Coprocessor Interface
    •  Base for all other types of Coprocessors
    •  start() and stop() methods for lifecycle management
    •  State as defined in the interface:
  • 12. Observer Classes
    •  Comparable to database triggers
      –  Callback functions/hooks for every explicit API method, and also for all important internal calls
    •  Concrete Implementations
      –  MasterObserver
        •  Hooks into the HMaster API
      –  RegionObserver
        •  Hooks into region-related operations
      –  WALObserver
        •  Hooks into write-ahead log operations
  • 13. Region Observers
    •  Can mediate (veto) actions
      –  Used by the security policy extensions
      –  Priority allows mediators to run first
    •  Hooks into all CRUD+S API calls and more
      –  get(), put(), delete(), scan(), increment(),…
      –  checkAndPut(), checkAndDelete(),…
      –  flush(), compact(), split(),…
    •  Pre/post hooks for every call
    •  Can be used to build secondary indexes, filters
  • 14. Endpoint Classes
    •  Define a dynamic RPC protocol, used between the client and the region server
    •  Execute arbitrary code, loaded in the region server
      –  Future development will add code weaving/inspection to deny any malicious code
    •  Steps to add your own methods
      –  Define and implement your own protocol
      –  Implement the endpoint coprocessor
      –  Call HTable's coprocessorExec() or coprocessorProxy()
  • 15. Coprocessor Loading
    •  There are two ways: dynamic or static
      –  Static: use configuration files and table schema
      –  Dynamic: not available (yet)
    •  For static loading from configuration:
      –  Order is important (defines the execution order)
      –  Special property key for each host type
      –  Region-related classes are loaded for all regions and tables
      –  Priority is always System
      –  JAR must be on the class path
  • 16. Loading from Configuration
    •  Example:
        <property>
          <name>hbase.coprocessor.region.classes</name>
          <value>coprocessor.RegionObserverExample,coprocessor.AnotherCoprocessor</value>
        </property>
        <property>
          <name>hbase.coprocessor.master.classes</name>
          <value>coprocessor.MasterObserverExample</value>
        </property>
        <property>
          <name>hbase.coprocessor.wal.classes</name>
          <value>coprocessor.WALObserverExample,bar.foo.MyWALObserver</value>
        </property>
  • 17. Coprocessor Loading (cont.)
    •  For static loading from table schema:
      –  Definition per table
      –  For all regions of the table
      –  Only region-related classes, not WAL or Master
      –  Added to HTableDescriptor when the table is created or altered
      –  Allows setting the priority and JAR path
        COPROCESSOR$<num> ➜ <path-to-jar>|<classname>|<priority>
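The attribute value is a single pipe-delimited string, so it can be taken apart with a plain split. The sketch below is an illustration only. CoprocessorSpec is a hypothetical class, it requires all three fields to be present, and HBase's own parsing may handle defaults or whitespace differently.

```java
class CoprocessorSpec {
  final String jarPath;   // hdfs:// URI or local path
  final String className; // fully qualified coprocessor class
  final int priority;     // lower value = higher priority

  private CoprocessorSpec(String jarPath, String className, int priority) {
    this.jarPath = jarPath;
    this.className = className;
    this.priority = priority;
  }

  // Parses "<path-to-jar>|<classname>|<priority>"; this sketch requires all three fields.
  static CoprocessorSpec parse(String value) {
    String[] parts = value.split("\\|", -1); // -1 keeps empty trailing fields
    if (parts.length != 3) {
      throw new IllegalArgumentException(
          "expected <path-to-jar>|<classname>|<priority> but got: " + value);
    }
    return new CoprocessorSpec(
        parts[0].trim(), parts[1].trim(), Integer.parseInt(parts[2].trim()));
  }

  public static void main(String[] args) {
    CoprocessorSpec spec =
        parse("hdfs://localhost:8020/users/leon/test.jar|coprocessor.Test|10");
    System.out.println(spec.className + " @ " + spec.priority + " from " + spec.jarPath);
  }
}
```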
  • 18. Loading from Table Schema
    •  Example:
        'COPROCESSOR$1' =>
          'hdfs://localhost:8020/users/leon/test.jar|coprocessor.Test|10'
        'COPROCESSOR$2' =>
          '/Users/laura/test2.jar|coprocessor.AnotherTest|1000'
  • 19. Example: Add Coprocessor
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(fs.getUri() + Path.SEPARATOR + "test.jar");
        HTableDescriptor htd = new HTableDescriptor("testtable");
        htd.addFamily(new HColumnDescriptor("colfam1"));
        htd.setValue("COPROCESSOR$1", path.toString() +
          "|" + RegionObserverExample.class.getCanonicalName() +
          "|" + Coprocessor.PRIORITY_USER);
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.createTable(htd);
        System.out.println(admin.getTableDescriptor(
          Bytes.toBytes("testtable")));
      }
  • 20. Example Output
      {NAME => 'testtable', COPROCESSOR$1 =>
        'file:/test.jar|coprocessor.RegionObserverExample|1073741823',
        FAMILIES => [{NAME => 'colfam1', BLOOMFILTER => 'NONE',
        REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
        TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
        BLOCKCACHE => 'true'}]}
  • 21. Region Observers
    •  Handle all region-related events
    •  Hooks for two classes of operations:
      –  Lifecycle changes
      –  Client API calls
    •  All client API calls have a pre/post hook
      –  Can be used to grant access on preGet()
      –  Can be used to update secondary indexes on postPut()
  • 22. Handling Region Lifecycle Events
    •  Hook into pending open, open, and pending close state changes
    •  Called implicitly by the framework
      –  preOpen(), postOpen(),…
    •  Used to piggyback on or fail the process, e.g.
      –  Cache warm-up after a region opens
      –  Suppress region splitting, compactions, flushes
  • 24. Special Hook Parameter
      public interface RegionObserver extends Coprocessor {

        /**
         * Called before the region is reported as open to the master.
         * @param c the environment provided by the region server
         */
        void preOpen(final ObserverContext<RegionCoprocessorEnvironment> c);

        /**
         * Called after the region is reported as open to the master.
         * @param c the environment provided by the region server
         */
        void postOpen(final ObserverContext<RegionCoprocessorEnvironment> c);

        // ... further hooks
      }
  • 26. Chain of Command
    •  The complete() and bypass() methods of the ObserverContext in particular let a coprocessor change the processing chain
      –  complete() ends the chain at the current coprocessor
      –  bypass() completes the pre/post chain but uses the last value returned by the coprocessors, possibly without calling the actual API method (for pre-hooks)
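A small simulation makes the effect of the two flags easier to see. The following Java sketch is a simplified model with hypothetical names (Context, PreHook, runPreHooks), not HBase code. It assumes the behavior of the region coprocessor host loop in the HBase 0.92/0.94 era. In that loop, complete() stops calling later observers but does not skip the built-in operation. bypass() marks the built-in operation to be skipped, and the remaining observers still run unless complete() is also called.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a pre-hook chain; not HBase code.
class ChainOfCommand {
  static final class Context {
    private boolean bypass;
    private boolean complete;

    void bypass() { bypass = true; }     // ask the host to skip the built-in operation
    void complete() { complete = true; } // ask the host not to call further observers
  }

  interface PreHook {
    void pre(Context ctx, List<String> trace);
  }

  // Calls the hooks in priority order; returns true if the built-in operation should be skipped.
  static boolean runPreHooks(List<PreHook> hooks, List<String> trace) {
    boolean bypass = false;
    for (PreHook hook : hooks) {
      Context ctx = new Context(); // fresh flags for each observer
      hook.pre(ctx, trace);
      bypass |= ctx.bypass;
      if (ctx.complete) {
        break; // remaining observers are not invoked
      }
    }
    return bypass;
  }

  // Stand-in for an API call such as a split: pre-hooks first, then the built-in work.
  static List<String> split(List<PreHook> hooks) {
    List<String> trace = new ArrayList<>();
    if (!runPreHooks(hooks, trace)) {
      trace.add("built-in split");
    }
    return trace;
  }

  public static void main(String[] args) {
    List<PreHook> completing = new ArrayList<>();
    completing.add((ctx, trace) -> { trace.add("observer A"); ctx.complete(); });
    completing.add((ctx, trace) -> trace.add("observer B"));
    System.out.println(split(completing)); // [observer A, built-in split]

    List<PreHook> bypassing = new ArrayList<>();
    bypassing.add((ctx, trace) -> { trace.add("observer A"); ctx.bypass(); });
    bypassing.add((ctx, trace) -> trace.add("observer B"));
    System.out.println(split(bypassing)); // [observer A, observer B]
  }
}
```

In this model, complete() only shortens the chain. Only bypass() prevents the built-in operation from running.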
  • 27. Example: Pre-Hook Complete
      @Override
      public void preSplit(ObserverContext<RegionCoprocessorEnvironment> e) {
        e.complete();
      }
  • 28. Master Observer
    •  Handles all HMaster-related events
      –  DDL-type calls, e.g. create table, add column
      –  Region management calls, e.g. move, assign
    •  Pre/post hooks with context
    •  Specialized environment provided
  • 30. Master Services (cont.)
    •  Very powerful features
      –  Access the AssignmentManager to modify plans
      –  Access the MasterFileSystem to create or access resources on HDFS
      –  Access the ServerManager to get the list of known servers
      –  Use the ExecutorService to run system-wide background processes
    •  Be careful (for now)!
  • 31. Example: Master Post Hook
      public class MasterObserverExample extends BaseMasterObserver {
        @Override
        public void postCreateTable(
            ObserverContext<MasterCoprocessorEnvironment> env,
            HRegionInfo[] regions, boolean sync) throws IOException {
          String tableName = regions[0].getTableDesc().getNameAsString();
          MasterServices services = env.getEnvironment().getMasterServices();
          MasterFileSystem masterFileSystem = services.getMasterFileSystem();
          FileSystem fileSystem = masterFileSystem.getFileSystem();
          // create a per-table directory next to the table, e.g. "testtable-blobs"
          Path blobPath = new Path(tableName + "-blobs");
          fileSystem.mkdirs(blobPath);
        }
      }
  • 32. Example Output
      hbase(main):001:0> create 'testtable', 'colfam1'
      0 row(s) in 0.4300 seconds

      $ bin/hadoop dfs -ls
      Found 1 items
      drwxr-xr-x - larsgeorge supergroup 0 ... /user/larsgeorge/testtable-blobs
  • 33. Endpoints
    •  Dynamic RPC extends server-side functionality
      –  Useful for MapReduce-like implementations
      –  Handles the Map part server-side; Reduce needs to be done client-side
    •  Based on the CoprocessorProtocol interface
    •  Routing to regions is based on either single row keys or row key ranges
      –  The call is sent whether or not the row exists, since region start and end keys are coarse-grained
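The routing rule can be sketched with a sorted map of region start keys. A row belongs to the region with the greatest start key that is less than or equal to it, and a key range touches every region from that one up to the stop row. RegionRouter is a hypothetical class. It uses Strings instead of HBase's unsigned byte-array comparison, treats null as an unbounded start or stop row, and assumes startRow is not greater than stopRow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical routing model: region start key -> region name. HBase compares row keys
// as unsigned byte arrays; Strings are used here only to keep the sketch short.
class RegionRouter {
  private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

  // Arguments alternate start key and region name; the first region must start at "".
  RegionRouter(String... startKeysAndNames) {
    for (int i = 0; i + 1 < startKeysAndNames.length; i += 2) {
      regionsByStartKey.put(startKeysAndNames[i], startKeysAndNames[i + 1]);
    }
  }

  // The owning region is the one with the greatest start key <= row. Only key
  // boundaries are consulted, so a row that does not exist is still routed.
  String regionFor(String row) {
    return regionsByStartKey.floorEntry(row).getValue();
  }

  // Regions overlapping [startRow, stopRow); null means unbounded on that side.
  List<String> regionsFor(String startRow, String stopRow) {
    String from = startRow == null
        ? regionsByStartKey.firstKey()
        : regionsByStartKey.floorKey(startRow);
    NavigableMap<String, String> overlapping = stopRow == null
        ? regionsByStartKey.tailMap(from, true)
        : regionsByStartKey.subMap(from, true, stopRow, false);
    return new ArrayList<>(overlapping.values());
  }

  public static void main(String[] args) {
    RegionRouter router = new RegionRouter("", "testtable,,", "row3", "testtable,row3,");
    System.out.println(router.regionFor("row4"));      // testtable,row3,
    System.out.println(router.regionFor("row0"));      // testtable,,
    System.out.println(router.regionsFor(null, null)); // [testtable,,, testtable,row3,]
  }
}
```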
  • 34. Custom Endpoint Implementation
    •  Involves two steps:
      –  Extend the CoprocessorProtocol interface
        •  Defines the actual protocol
      –  Extend the BaseEndpointCoprocessor
        •  Provides the server-side code and the dynamic RPC method
  • 35. Example: Row Count Protocol
      public interface RowCountProtocol extends CoprocessorProtocol {
        long getRowCount() throws IOException;
        long getRowCount(Filter filter) throws IOException;
        long getKeyValueCount() throws IOException;
      }
  • 36. Example: Endpoint for Row Count
      public class RowCountEndpoint extends BaseEndpointCoprocessor
          implements RowCountProtocol {

        private long getCount(Filter filter, boolean countKeyValues)
            throws IOException {
          Scan scan = new Scan();
          scan.setMaxVersions(1);
          if (filter != null) {
            scan.setFilter(filter);
          }
  • 37. Example: Endpoint for Row Count
          RegionCoprocessorEnvironment environment =
            (RegionCoprocessorEnvironment) getEnvironment();
          // use an internal scanner to perform scanning.
          InternalScanner scanner = environment.getRegion().getScanner(scan);
          long result = 0;
  • 38. Example: Endpoint for Row Count
          try {
            List<KeyValue> curVals = new ArrayList<KeyValue>();
            boolean hasMore = false;
            do {
              curVals.clear();
              // next() returns true while more rows follow the current one
              hasMore = scanner.next(curVals);
              // an empty region returns one empty batch; do not count it
              if (!curVals.isEmpty()) {
                result += countKeyValues ? curVals.size() : 1;
              }
            } while (hasMore);
          } finally {
            scanner.close();
          }
          return result;
        }
  • 39. Example: Endpoint for Row Count
        @Override
        public long getRowCount() throws IOException {
          return getRowCount(new FirstKeyOnlyFilter());
        }

        @Override
        public long getRowCount(Filter filter) throws IOException {
          return getCount(filter, false);
        }

        @Override
        public long getKeyValueCount() throws IOException {
          return getCount(null, true);
        }
      }
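The counting loop relies on the scanner's contract that next() fills in one row and returns whether more rows follow. The final row therefore arrives together with a false return value, and it must still be counted. The sketch below reproduces that loop against an in-memory stand-in. BatchScanner and scannerOver() are hypothetical. The loop also skips an empty batch: an empty region returns one batch with nothing in it, and counting that batch would report one row for an empty region.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Models the counting loop of the RowCountEndpoint without an HBase dependency.
// BatchScanner is a hypothetical stand-in for InternalScanner: next() appends the
// cells of one row to `out` and returns true while more rows remain.
class RowCountLoop {
  interface BatchScanner {
    boolean next(List<String> out);
  }

  // In-memory scanner over rows, where each row is a list of cell descriptions.
  static BatchScanner scannerOver(List<List<String>> rows) {
    Iterator<List<String>> it = rows.iterator();
    return out -> {
      if (it.hasNext()) {
        out.addAll(it.next());
      }
      return it.hasNext();
    };
  }

  static long count(BatchScanner scanner, boolean countKeyValues) {
    List<String> curVals = new ArrayList<>();
    long result = 0;
    boolean hasMore;
    do {
      curVals.clear();
      hasMore = scanner.next(curVals);
      // The last row arrives with hasMore == false and is still counted;
      // the empty batch of an empty region is not.
      if (!curVals.isEmpty()) {
        result += countKeyValues ? curVals.size() : 1;
      }
    } while (hasMore);
    return result;
  }

  public static void main(String[] args) {
    List<List<String>> rows = List.of(
        List.of("row1:colfam1:a", "row1:colfam1:b"),
        List.of("row2:colfam1:a"));
    System.out.println(count(scannerOver(rows), false));      // 2 rows
    System.out.println(count(scannerOver(rows), true));       // 3 cells
    System.out.println(count(scannerOver(List.of()), false)); // 0 for an empty region
  }
}
```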
  • 40. Endpoint Invocation
    •  There are two ways to invoke the call
      –  By proxy, using HTable.coprocessorProxy()
        •  Uses a delayed model, i.e. the call is sent when the proxied method is invoked
      –  By exec, using HTable.coprocessorExec()
        •  The call is sent in parallel to all regions and the results are collected immediately
    •  The Batch.Call class is used by coprocessorExec() to wrap the calls per region
    •  The optional Batch.Callback can be used to react upon completion of the remote call
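The exec style can be pictured as a fan-out over regions followed by a gather. The Java sketch below models it with an ExecutorService: one task per region, results collected into a map keyed by region name, and an optional callback. ParallelExec and RegionCallback are hypothetical names, and each region is reduced to an in-memory list of row keys. Unlike a real client, this version runs the callback on the caller's thread as each result is collected.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical model of the exec style; none of these names are HBase APIs.
class ParallelExec {
  // Optional per-region notification, similar in spirit to Batch.Callback.
  interface RegionCallback<R> {
    void update(String region, R result);
  }

  // Submits `call` once per region in parallel, then collects results keyed by region.
  static <D, R> Map<String, R> exec(Map<String, D> regions, Function<D, R> call,
      RegionCallback<R> callback) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
    try {
      Map<String, Future<R>> pending = new TreeMap<>();
      for (Map.Entry<String, D> region : regions.entrySet()) {
        pending.put(region.getKey(), pool.submit(() -> call.apply(region.getValue())));
      }
      Map<String, R> results = new TreeMap<>();
      for (Map.Entry<String, Future<R>> entry : pending.entrySet()) {
        R result = entry.getValue().get(); // blocks until this region's call finishes
        results.put(entry.getKey(), result);
        if (callback != null) {
          callback.update(entry.getKey(), result); // runs on the caller's thread here
        }
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for region calls", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("a region call failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // Each "region" is just the list of row keys it holds.
    Map<String, List<String>> regions = new TreeMap<>();
    regions.put("testtable,,", List.of("row1", "row2"));
    regions.put("testtable,row3,", List.of("row3", "row4", "row5"));

    Function<List<String>, Long> countRows = rows -> (long) rows.size();
    Map<String, Long> counts = exec(regions, countRows,
        (region, count) -> System.out.println("Region: " + region + ", Count: " + count));

    long total = 0;
    for (long perRegion : counts.values()) {
      total += perRegion;
    }
    System.out.println("Total Count: " + total); // Total Count: 5
  }
}
```

With the two in-memory regions in main(), the per-region counts are 2 and 3 for a total of 5, the same shape as the example output on slide 44.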
  • 42. Example: Invocation by Exec
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "testtable");
        try {
          Map<byte[], Long> results = table.coprocessorExec(
            RowCountProtocol.class, null, null,
            new Batch.Call<RowCountProtocol, Long>() {
              @Override
              public Long call(RowCountProtocol counter) throws IOException {
                return counter.getRowCount();
              }
            });
  • 43. Example: Invocation by Exec
          long total = 0;
          for (Map.Entry<byte[], Long> entry : results.entrySet()) {
            total += entry.getValue().longValue();
            System.out.println("Region: " + Bytes.toString(entry.getKey()) +
              ", Count: " + entry.getValue());
          }
          System.out.println("Total Count: " + total);
        } catch (Throwable throwable) {
          throwable.printStackTrace();
        } finally {
          table.close();
        }
      }
  • 44. Example Output
      Region: testtable,,1303417572005.51f9e2251c...cbcb0c66858f., Count: 2
      Region: testtable,row3,1303417572005.7f3df4dcba...dbc99fce5d87., Count: 3
      Total Count: 5
  • 45. Batch Convenience
    •  Batch.forMethod() helps to quickly map a protocol function into a Batch.Call
    •  Useful for single method calls to the servers
    •  Uses the Java reflection API to retrieve the named method
    •  Saves you from implementing the anonymous inline class
  • 46. Batch Convenience
      Batch.Call<RowCountProtocol, Long> call = Batch.forMethod(
        RowCountProtocol.class, "getKeyValueCount");
      Map<byte[], Long> results = table.coprocessorExec(
        RowCountProtocol.class, null, null, call);
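The reflection step behind such a helper can be sketched in a few lines: look up the named method once, then invoke it on whatever protocol instance each region call receives. ForMethodSketch, its Call interface, and the RowCounter stand-in are hypothetical. This version handles only methods without arguments and is not the HBase implementation.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical re-creation of the idea behind Batch.forMethod(); not the HBase implementation.
class ForMethodSketch {
  // Per-region call wrapper, similar in shape to Batch.Call.
  interface Call<T, R> {
    R call(T instance);
  }

  // Looks up a public no-argument method by name once, then invokes it reflectively per call.
  static <T, R> Call<T, R> forMethod(Class<T> protocol, String methodName) {
    final Method method;
    try {
      method = protocol.getMethod(methodName);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(
          "no method " + methodName + "() on " + protocol.getName(), e);
    }
    return instance -> {
      try {
        @SuppressWarnings("unchecked")
        R result = (R) method.invoke(instance); // primitive results come back boxed
        return result;
      } catch (IllegalAccessException | InvocationTargetException e) {
        throw new IllegalStateException("call to " + methodName + "() failed", e);
      }
    };
  }

  // Stand-in protocol and "region-side" implementation for the demonstration.
  interface RowCounter {
    long getRowCount();
    long getKeyValueCount();
  }

  static final RowCounter DEMO_REGION = new RowCounter() {
    @Override public long getRowCount() { return 2; }
    @Override public long getKeyValueCount() { return 5; }
  };

  public static void main(String[] args) {
    Call<RowCounter, Long> call = forMethod(RowCounter.class, "getKeyValueCount");
    System.out.println(call.call(DEMO_REGION)); // 5
  }
}
```

A method needing several calls per region, or methods with arguments, falls outside this sketch, which is one reason an explicit anonymous class is still needed in those cases.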
  • 47. Call Multiple Endpoints
    •  Sometimes you need to call more than one endpoint method in a single round trip to the servers
    •  This requires an anonymous inline class, since Batch.forMethod() cannot handle this
  • 48. Call Multiple Endpoints
      Map<byte[], Pair<Long, Long>> results = table.coprocessorExec(
        RowCountProtocol.class, null, null,
        new Batch.Call<RowCountProtocol, Pair<Long, Long>>() {
          public Pair<Long, Long> call(RowCountProtocol counter)
              throws IOException {
            return new Pair<Long, Long>(
              counter.getRowCount(),
              counter.getKeyValueCount());
          }
        });
  • 49. Example: Invocation by Proxy
      RowCountProtocol protocol = table.coprocessorProxy(
        RowCountProtocol.class, Bytes.toBytes("row4"));
      long rowsInRegion = protocol.getRowCount();
      System.out.println("Region Row Count: " + rowsInRegion);
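The JDK's java.lang.reflect.Proxy can demonstrate the delayed model of the proxy. Creating the proxy sends nothing, and each method call looks up the region for the given row at invocation time. DelayedProxy, RowCounter, and the in-memory regions are hypothetical. The "remote" call is just a local method invocation that is recorded in a list.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the proxy style: creating the proxy sends nothing, and each
// method invocation is routed to the region holding `row` at that moment.
class DelayedProxy {
  interface RowCounter {
    long getRowCount();
  }

  // Records every "remote" call so the delayed behavior can be observed.
  static final List<String> sentCalls = new ArrayList<>();

  @SuppressWarnings("unchecked")
  static <T> T proxyFor(Class<T> protocol, String row, TreeMap<String, T> regionsByStartKey) {
    return (T) Proxy.newProxyInstance(
        protocol.getClassLoader(),
        new Class<?>[] {protocol},
        (proxy, method, methodArgs) -> {
          // Region lookup happens per invocation, not when the proxy is created.
          Map.Entry<String, T> region = regionsByStartKey.floorEntry(row);
          sentCalls.add(method.getName() + " -> region starting at '" + region.getKey() + "'");
          return method.invoke(region.getValue(), methodArgs);
        });
  }

  // Two in-memory regions shaped like the example table (start keys "" and "row3").
  static TreeMap<String, RowCounter> demoRegions() {
    TreeMap<String, RowCounter> regions = new TreeMap<>();
    regions.put("", () -> 2L);
    regions.put("row3", () -> 3L);
    return regions;
  }

  public static void main(String[] args) {
    RowCounter counter = proxyFor(RowCounter.class, "row4", demoRegions());
    System.out.println(sentCalls.size());      // 0, nothing has been sent yet
    System.out.println(counter.getRowCount()); // 3
    System.out.println(sentCalls);             // [getRowCount -> region starting at 'row3']
  }
}
```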