HBase – Coprocessors

 Mingjie Lai, Trend Micro
       HUG NYC,
      Oct. 11, 2010
What are Coprocessors
●   Inspired by Google Bigtable Coprocessors (Jeff
    Dean's keynote talk at LADIS 09)
    ●   Arbi...
Current Status
●   Umbrella case: HBASE-2000
●   Coprocessor framework – HBASE-2001:
    ●   Includes RegionObserver, Comm...
RegionObserver
●   If a coprocessor implements this interface, it will be
    interposed in all region actions via upcalls...
RegionObserver




     Client requests   Region server   CP framework   RegionObserver
CommandTarget
●   CommandTarget with Dynamic RPC provides a way to
    define one's own protocol communicated between
    ...
Dynamic RPC: a sample
Given CoprocessorProtocol:
public interface CountProtocol extends CoprocessorProtocol {
  int getRow...
Coprocessors Class Loading
●   Load from configuration: set coprocessors class
    names in HBase configuration
    ●   hb...
Next Steps
●   See HBASE-2000 and subtasks
●   Framework
    ●   MapReduce
        –   Runs concurrently on all regions of...
Next Steps
●   Applications
    ●   HBase access control: HBASE-3025, HBASE-3045.
    ●   Aggregate: HBASE-1512
    ●   Re...
Q&A
Upcoming SlideShare
Loading in...5
×

HBase Coprocessors @ HUG NYC

3,874

Published on

Published in: Technology, Education
0 Comments
7 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,874
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
68
Comments
0
Likes
7
Embeds 0
No embeds

No notes for slide

HBase Coprocessors @ HUG NYC

  1. 1. HBase – Coprocessors Mingjie Lai, Trend Micro HUG NYC, Oct. 11, 2010
  2. 2. What are Coprocessors ● Inspired by Google Bigtable Coprocessors (Jeff Dean's keynote talk at LADIS 09) ● Arbitrary code that runs at each tablet in table server ● High-level call interface for clients – Calls addressed to rows or ranges of rows. coprocessor client library resolves to actual locations – Calls across multiple rows automatically split into multiple parallelized RPC ● Very flexible model for building distributed services – automatic scaling, load balancing, request routing for app
  3. 3. Current Status ● Umbrella case: HBASE-2000 ● Coprocessor framework – HBASE-2001: ● Includes RegionObserver, CommandTarget, CP class loading ● Code submitted for review ● Will be commited to TRUNK very soon, and 0.92 release ● Client side support – HBASE-2002: ● Dynamic RPC, between clients and region servers ● Code submitted for review ● Will be commited to TRUNK soon, and 0.92 release ● The first Coprocessor application – HBASE-3025 and 3045: Coprocessor based access control ● Code complete
  4. 4. RegionObserver ● If a coprocessor implements this interface, it will be interposed in all region actions via upcalls ● Provides hooks for client side requests: HTable.get(), put(), exists(), delete(), scannerOpen(), checkAndPut(), etc. ● Chaining of multiple observers (by priority) ● The first coprocessors application – HBase access control – is built on top of it ● More extensions can be built on top of RegionObserver – Secondary indexes – Filters ● How to develop a RegionObserver ● No new client API defined for RegionObserver ● Need to implement RegionObserver interface and override upcall methods: preGet(), postGet(), prePut(), postPut(), etc.
  5. 5. RegionObserver Client requests Region server CP framework RegionObserver
  6. 6. CommandTarget ● CommandTarget with Dynamic RPC provides a way to define one's own protocol communicated between client and region server, and execute arbitrary code at region server ● CommandTarget methods are triggered by calling dynamic RPC client side method – Htable.coprocessorExec(...), etc. ● How to develop ● Defines protocol interface (extends CoprocessorProtocol) ● Implements this protocol interface ● Extend BaseCommandTarget: protocol will be automatically registered at coprocessor load ● On client side, the CommandTarget can be triggered by: – HTable.coprocessorProxy() - single region – HTable.coprocessorExec() - region range
  7. 7. Dynamic RPC: a sample Given CoprocessorProtocol: public interface CountProtocol extends CoprocessorProtocol { int getRowCount(); }
  8. 8. Coprocessors Class Loading ● Load from configuration: set coprocessors class names in HBase configuration ● hbase.coprocessor.default.classes ● Class names are comma seperated ● They will be picked up when region is opened, as default coprocessors ● Load from table attributes ● Utilize table attribute: a path (e.g. HDFS URI) to jar file ● Loaded when region is opened ● We can utilize CommandTarget to have a way to load coprocessors on demand ● Security is the biggest concern
  9. 9. Next Steps ● See HBASE-2000 and subtasks ● Framework ● MapReduce – Runs concurrently on all regions of the table – Like Hadoop MapReduce: Mappers, reducers, partitioners, intermediates – Not table MapReduce, parallel region MapReduce ● Code weaving – Allow arbitrary code execution right now – Use a rewriting framework like ASM to weave in policies at load time – Improve fault isolation and system integrity protections – Wrap heap allocations to enforce limits – Monitor CPU time – Reject APIs considered unsafe ● On demand Coprocessors class loading
  10. 10. Next Steps ● Applications ● HBase access control: HBASE-3025, HBASE-3045. ● Aggregate: HBASE-1512 ● Region level indexing: HBASE-2038 ● Table metacolumns: HBASE-2893 ● Secondary indexing? ● New Filtering?
  11. 11. Q&A
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×