HBase Coprocessors @ HUG NYC

4,015 views
3,929 views

Published on

Published in: Technology, Education
0 Comments
7 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
4,015
On SlideShare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
68
Comments
0
Likes
7
Embeds 0
No embeds

No notes for slide

HBase Coprocessors @ HUG NYC

  1. 1. HBase – Coprocessors Mingjie Lai, Trend Micro HUG NYC, Oct. 11, 2010
  2. 2. What are Coprocessors ● Inspired by Google Bigtable Coprocessors (Jeff Dean's keynote talk at LADIS 09) ● Arbitrary code that runs at each tablet in table server ● High-level call interface for clients – Calls addressed to rows or ranges of rows. coprocessor client library resolves to actual locations – Calls across multiple rows automatically split into multiple parallelized RPC ● Very flexible model for building distributed services – automatic scaling, load balancing, request routing for app
  3. 3. Current Status ● Umbrella case: HBASE-2000 ● Coprocessor framework – HBASE-2001: ● Includes RegionObserver, CommandTarget, CP class loading ● Code submitted for review ● Will be commited to TRUNK very soon, and 0.92 release ● Client side support – HBASE-2002: ● Dynamic RPC, between clients and region servers ● Code submitted for review ● Will be commited to TRUNK soon, and 0.92 release ● The first Coprocessor application – HBASE-3025 and 3045: Coprocessor based access control ● Code complete
  4. 4. RegionObserver ● If a coprocessor implements this interface, it will be interposed in all region actions via upcalls ● Provides hooks for client side requests: HTable.get(), put(), exists(), delete(), scannerOpen(), checkAndPut(), etc. ● Chaining of multiple observers (by priority) ● The first coprocessors application – HBase access control – is built on top of it ● More extensions can be built on top of RegionObserver – Secondary indexes – Filters ● How to develop a RegionObserver ● No new client API defined for RegionObserver ● Need to implement RegionObserver interface and override upcall methods: preGet(), postGet(), prePut(), postPut(), etc.
  5. 5. RegionObserver Client requests Region server CP framework RegionObserver
  6. 6. CommandTarget ● CommandTarget with Dynamic RPC provides a way to define one's own protocol communicated between client and region server, and execute arbitrary code at region server ● CommandTarget methods are triggered by calling dynamic RPC client side method – Htable.coprocessorExec(...), etc. ● How to develop ● Defines protocol interface (extends CoprocessorProtocol) ● Implements this protocol interface ● Extend BaseCommandTarget: protocol will be automatically registered at coprocessor load ● On client side, the CommandTarget can be triggered by: – HTable.coprocessorProxy() - single region – HTable.coprocessorExec() - region range
  7. 7. Dynamic RPC: a sample Given CoprocessorProtocol: public interface CountProtocol extends CoprocessorProtocol { int getRowCount(); }
  8. 8. Coprocessors Class Loading ● Load from configuration: set coprocessors class names in HBase configuration ● hbase.coprocessor.default.classes ● Class names are comma seperated ● They will be picked up when region is opened, as default coprocessors ● Load from table attributes ● Utilize table attribute: a path (e.g. HDFS URI) to jar file ● Loaded when region is opened ● We can utilize CommandTarget to have a way to load coprocessors on demand ● Security is the biggest concern
  9. 9. Next Steps ● See HBASE-2000 and subtasks ● Framework ● MapReduce – Runs concurrently on all regions of the table – Like Hadoop MapReduce: Mappers, reducers, partitioners, intermediates – Not table MapReduce, parallel region MapReduce ● Code weaving – Allow arbitrary code execution right now – Use a rewriting framework like ASM to weave in policies at load time – Improve fault isolation and system integrity protections – Wrap heap allocations to enforce limits – Monitor CPU time – Reject APIs considered unsafe ● On demand Coprocessors class loading
  10. 10. Next Steps ● Applications ● HBase access control: HBASE-3025, HBASE-3045. ● Aggregate: HBASE-1512 ● Region level indexing: HBASE-2038 ● Table metacolumns: HBASE-2893 ● Secondary indexing? ● New Filtering?
  11. 11. Q&A

×