SlideShare a Scribd company logo
HBase – Coprocessors

 Mingjie Lai, Trend Micro
       HUG NYC,
      Oct. 11, 2010
What are Coprocessors
●   Inspired by Google Bigtable Coprocessors (Jeff
    Dean's keynote talk at LADIS 09)
    ●   Arbitrary code that runs at each tablet in table server
    ●   High-level call interface for clients
         –   Calls addressed to rows or ranges of rows. coprocessor client library
             resolves to actual locations
         –   Calls across multiple rows automatically split into multiple
             parallelized RPC
    ●   Very flexible model for building distributed services
         –   automatic scaling, load balancing, request routing for app
Current Status
●   Umbrella case: HBASE-2000
●   Coprocessor framework – HBASE-2001:
    ●   Includes RegionObserver, CommandTarget, CP class
        loading
    ●   Code submitted for review
    ●   Will be commited to TRUNK very soon, and 0.92 release
●   Client side support – HBASE-2002:
    ●   Dynamic RPC, between clients and region servers
    ●   Code submitted for review
    ●   Will be commited to TRUNK soon, and 0.92 release
●   The first Coprocessor application – HBASE-3025 and
    3045: Coprocessor based access control
    ●   Code complete
RegionObserver
●   If a coprocessor implements this interface, it will be
    interposed in all region actions via upcalls
    ●   Provides hooks for client side requests: HTable.get(), put(),
        exists(), delete(), scannerOpen(), checkAndPut(), etc.
    ●   Chaining of multiple observers (by priority)
    ●   The first coprocessors application – HBase access control –
        is built on top of it
    ●   More extensions can be built on top of RegionObserver
         –   Secondary indexes
         –   Filters
●   How to develop a RegionObserver
    ●   No new client API defined for RegionObserver
    ●   Need to implement RegionObserver interface and override
        upcall methods: preGet(), postGet(), prePut(), postPut(), etc.
RegionObserver




     Client requests   Region server   CP framework   RegionObserver
CommandTarget
●   CommandTarget with Dynamic RPC provides a way to
    define one's own protocol communicated between
    client and region server, and execute arbitrary code at
    region server
●   CommandTarget methods are triggered by calling
    dynamic RPC client side method –
    Htable.coprocessorExec(...), etc.
●   How to develop
    ●   Defines protocol interface (extends CoprocessorProtocol)
    ●   Implements this protocol interface
    ●   Extend BaseCommandTarget: protocol will be automatically
        registered at coprocessor load
    ●   On client side, the CommandTarget can be triggered by:
         –   HTable.coprocessorProxy() - single region
         –   HTable.coprocessorExec() - region range
Dynamic RPC: a sample
Given CoprocessorProtocol:
public interface CountProtocol extends CoprocessorProtocol {
  int getRowCount();
}
Coprocessors Class Loading
●   Load from configuration: set coprocessors class
    names in HBase configuration
    ●   hbase.coprocessor.default.classes
    ●   Class names are comma seperated
    ●   They will be picked up when region is opened, as default
        coprocessors
●   Load from table attributes
    ●   Utilize table attribute: a path (e.g. HDFS URI) to jar file
    ●   Loaded when region is opened
●   We can utilize CommandTarget to have a way to load
    coprocessors on demand
    ●   Security is the biggest concern
Next Steps
●   See HBASE-2000 and subtasks
●   Framework
    ●   MapReduce
        –   Runs concurrently on all regions of the table
        –   Like Hadoop MapReduce: Mappers, reducers, partitioners,
            intermediates
        –   Not table MapReduce, parallel region MapReduce
    ●   Code weaving
        –   Allow arbitrary code execution right now
        –   Use a rewriting framework like ASM to weave in policies at load time
        –   Improve fault isolation and system integrity protections
        –   Wrap heap allocations to enforce limits
        –   Monitor CPU time
        –   Reject APIs considered unsafe
    ●   On demand Coprocessors class loading
Next Steps
●   Applications
    ●   HBase access control: HBASE-3025, HBASE-3045.
    ●   Aggregate: HBASE-1512
    ●   Region level indexing: HBASE-2038
    ●   Table metacolumns: HBASE-2893
    ●   Secondary indexing?
    ●   New Filtering?
Q&A

More Related Content

What's hot

Flume and Hadoop performance insights
Flume and Hadoop performance insightsFlume and Hadoop performance insights
Flume and Hadoop performance insights
Omid Vahdaty
 
SMB Traffic Analyzer @ Samba Xp 2009
SMB Traffic Analyzer @  Samba Xp 2009SMB Traffic Analyzer @  Samba Xp 2009
SMB Traffic Analyzer @ Samba Xp 2009
hhetter
 
He Pi Xii2003
He Pi Xii2003He Pi Xii2003
He Pi Xii2003
FNian
 
Flume-Cassandra Log Processor
Flume-Cassandra Log ProcessorFlume-Cassandra Log Processor
Flume-Cassandra Log Processor
CLOUDIAN KK
 
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Ontico
 
5 Stage Pipeline Semi-custom Design
5 Stage Pipeline Semi-custom Design5 Stage Pipeline Semi-custom Design
5 Stage Pipeline Semi-custom Design
Jiang Xue
 
System performance monitoring pcp + vector
System performance monitoring   pcp + vectorSystem performance monitoring   pcp + vector
System performance monitoring pcp + vector
Sandeep Kunkunuru
 
Ftp server linux
Ftp server linuxFtp server linux
Ftp server linux
Pawan Kumar
 
File Transfer Protocol(FTP)
File Transfer Protocol(FTP)File Transfer Protocol(FTP)
File Transfer Protocol(FTP)
Varnit Yadav
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
Arinto Murdopo
 
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice PellandShared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Fuenteovejuna
 
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
Yahoo Developer Network
 
Health Check Your DB2 UDB For Z/OS System
Health Check Your DB2 UDB For Z/OS SystemHealth Check Your DB2 UDB For Z/OS System
Health Check Your DB2 UDB For Z/OS System
sjreese
 
Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an Introduction
Erik Schmiegelow
 
[Altibase] 10 replication part3 (system design)
[Altibase] 10 replication part3 (system design)[Altibase] 10 replication part3 (system design)
[Altibase] 10 replication part3 (system design)
altistory
 
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Ishan Thakkar
 
Advanced OpenVPN Concepts on pfSense 2.4 & 2.3.3 - pfSense Hangout February 2017
Advanced OpenVPN Concepts on pfSense 2.4 & 2.3.3 - pfSense Hangout February 2017Advanced OpenVPN Concepts on pfSense 2.4 & 2.3.3 - pfSense Hangout February 2017
Advanced OpenVPN Concepts on pfSense 2.4 & 2.3.3 - pfSense Hangout February 2017
Netgate
 
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
javier ramirez
 
PLNOG 13: Michał Dubiel: OpenContrail software architecture
PLNOG 13: Michał Dubiel: OpenContrail software architecturePLNOG 13: Michał Dubiel: OpenContrail software architecture
PLNOG 13: Michał Dubiel: OpenContrail software architecture
PROIDEA
 

What's hot (20)

Flume and Hadoop performance insights
Flume and Hadoop performance insightsFlume and Hadoop performance insights
Flume and Hadoop performance insights
 
SMB Traffic Analyzer @ Samba Xp 2009
SMB Traffic Analyzer @  Samba Xp 2009SMB Traffic Analyzer @  Samba Xp 2009
SMB Traffic Analyzer @ Samba Xp 2009
 
He Pi Xii2003
He Pi Xii2003He Pi Xii2003
He Pi Xii2003
 
Flume-Cassandra Log Processor
Flume-Cassandra Log ProcessorFlume-Cassandra Log Processor
Flume-Cassandra Log Processor
 
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
Tempesta FW - Framework и Firewall для WAF и DDoS mitigation, Александр Крижа...
 
5 Stage Pipeline Semi-custom Design
5 Stage Pipeline Semi-custom Design5 Stage Pipeline Semi-custom Design
5 Stage Pipeline Semi-custom Design
 
System performance monitoring pcp + vector
System performance monitoring   pcp + vectorSystem performance monitoring   pcp + vector
System performance monitoring pcp + vector
 
Ftp server linux
Ftp server linuxFtp server linux
Ftp server linux
 
File Transfer Protocol(FTP)
File Transfer Protocol(FTP)File Transfer Protocol(FTP)
File Transfer Protocol(FTP)
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice PellandShared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
Shared Personalization Service - How To Scale to 15K RPS, Patrice Pelland
 
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
 
Health Check Your DB2 UDB For Z/OS System
Health Check Your DB2 UDB For Z/OS SystemHealth Check Your DB2 UDB For Z/OS System
Health Check Your DB2 UDB For Z/OS System
 
Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an Introduction
 
[Altibase] 10 replication part3 (system design)
[Altibase] 10 replication part3 (system design)[Altibase] 10 replication part3 (system design)
[Altibase] 10 replication part3 (system design)
 
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
 
Advanced OpenVPN Concepts on pfSense 2.4 & 2.3.3 - pfSense Hangout February 2017
Advanced OpenVPN Concepts on pfSense 2.4 & 2.3.3 - pfSense Hangout February 2017Advanced OpenVPN Concepts on pfSense 2.4 & 2.3.3 - pfSense Hangout February 2017
Advanced OpenVPN Concepts on pfSense 2.4 & 2.3.3 - pfSense Hangout February 2017
 
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
How a BEAM runner executes a pipeline. Apache BEAM Summit London 2018
 
PLNOG 13: Michał Dubiel: OpenContrail software architecture
PLNOG 13: Michał Dubiel: OpenContrail software architecturePLNOG 13: Michał Dubiel: OpenContrail software architecture
PLNOG 13: Michał Dubiel: OpenContrail software architecture
 

Similar to HBase Coprocessors @ HUG NYC

Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queues
Paolo Corti
 
haproxy-150423120602-conversion-gate01.pdf
haproxy-150423120602-conversion-gate01.pdfhaproxy-150423120602-conversion-gate01.pdf
haproxy-150423120602-conversion-gate01.pdf
PawanVerma628806
 
HAProxy
HAProxy HAProxy
HAProxy
Arindam Nayak
 
Scalable Web Apps
Scalable Web AppsScalable Web Apps
Scalable Web Apps
Piotr Pelczar
 
HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Future
rcastain
 
The new (is it really ) api stack
The new (is it really ) api stackThe new (is it really ) api stack
The new (is it really ) api stack
Red Hat
 
Nov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars georgeNov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars george
O'Reilly Media
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache Thrift
RX-M Enterprises LLC
 
KrakenD API Gateway
KrakenD API GatewayKrakenD API Gateway
KrakenD API Gateway
Albert Lombarte
 
slides (PPT)
slides (PPT)slides (PPT)
slides (PPT)
webhostingguy
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Modern Data Stack France
 
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
Apache Apex
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 intro
Terry Cho
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platform
Redge Technologies
 
HAProxy 1.9
HAProxy 1.9HAProxy 1.9
Always on high availability best practices for informix
Always on high availability best practices for informixAlways on high availability best practices for informix
Always on high availability best practices for informix
IBM_Info_Management
 
Informix HA Best Practices
Informix HA Best Practices Informix HA Best Practices
Informix HA Best Practices
Scott Lashley
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
Cloudera, Inc.
 
Grpc present
Grpc presentGrpc present
Grpc present
Phạm Hải Anh
 
Towards constrained semantic web
Towards constrained semantic webTowards constrained semantic web
Towards constrained semantic web
☕ Remy Rojas
 

Similar to HBase Coprocessors @ HUG NYC (20)

Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queues
 
haproxy-150423120602-conversion-gate01.pdf
haproxy-150423120602-conversion-gate01.pdfhaproxy-150423120602-conversion-gate01.pdf
haproxy-150423120602-conversion-gate01.pdf
 
HAProxy
HAProxy HAProxy
HAProxy
 
Scalable Web Apps
Scalable Web AppsScalable Web Apps
Scalable Web Apps
 
HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Future
 
The new (is it really ) api stack
The new (is it really ) api stackThe new (is it really ) api stack
The new (is it really ) api stack
 
Nov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars georgeNov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars george
 
Building high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache ThriftBuilding high performance microservices in finance with Apache Thrift
Building high performance microservices in finance with Apache Thrift
 
KrakenD API Gateway
KrakenD API GatewayKrakenD API Gateway
KrakenD API Gateway
 
slides (PPT)
slides (PPT)slides (PPT)
slides (PPT)
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
 
Kubernetes #1 intro
Kubernetes #1   introKubernetes #1   intro
Kubernetes #1 intro
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platform
 
HAProxy 1.9
HAProxy 1.9HAProxy 1.9
HAProxy 1.9
 
Always on high availability best practices for informix
Always on high availability best practices for informixAlways on high availability best practices for informix
Always on high availability best practices for informix
 
Informix HA Best Practices
Informix HA Best Practices Informix HA Best Practices
Informix HA Best Practices
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
Grpc present
Grpc presentGrpc present
Grpc present
 
Towards constrained semantic web
Towards constrained semantic webTowards constrained semantic web
Towards constrained semantic web
 

Recently uploaded

Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 

Recently uploaded (20)

Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 

HBase Coprocessors @ HUG NYC

  • 1. HBase – Coprocessors Mingjie Lai, Trend Micro HUG NYC, Oct. 11, 2010
  • 2. What are Coprocessors ● Inspired by Google Bigtable Coprocessors (Jeff Dean's keynote talk at LADIS 09) ● Arbitrary code that runs at each tablet in table server ● High-level call interface for clients – Calls addressed to rows or ranges of rows. coprocessor client library resolves to actual locations – Calls across multiple rows automatically split into multiple parallelized RPC ● Very flexible model for building distributed services – automatic scaling, load balancing, request routing for app
  • 3. Current Status ● Umbrella case: HBASE-2000 ● Coprocessor framework – HBASE-2001: ● Includes RegionObserver, CommandTarget, CP class loading ● Code submitted for review ● Will be commited to TRUNK very soon, and 0.92 release ● Client side support – HBASE-2002: ● Dynamic RPC, between clients and region servers ● Code submitted for review ● Will be commited to TRUNK soon, and 0.92 release ● The first Coprocessor application – HBASE-3025 and 3045: Coprocessor based access control ● Code complete
  • 4. RegionObserver ● If a coprocessor implements this interface, it will be interposed in all region actions via upcalls ● Provides hooks for client side requests: HTable.get(), put(), exists(), delete(), scannerOpen(), checkAndPut(), etc. ● Chaining of multiple observers (by priority) ● The first coprocessors application – HBase access control – is built on top of it ● More extensions can be built on top of RegionObserver – Secondary indexes – Filters ● How to develop a RegionObserver ● No new client API defined for RegionObserver ● Need to implement RegionObserver interface and override upcall methods: preGet(), postGet(), prePut(), postPut(), etc.
  • 5. RegionObserver Client requests Region server CP framework RegionObserver
  • 6. CommandTarget ● CommandTarget with Dynamic RPC provides a way to define one's own protocol communicated between client and region server, and execute arbitrary code at region server ● CommandTarget methods are triggered by calling dynamic RPC client side method – Htable.coprocessorExec(...), etc. ● How to develop ● Defines protocol interface (extends CoprocessorProtocol) ● Implements this protocol interface ● Extend BaseCommandTarget: protocol will be automatically registered at coprocessor load ● On client side, the CommandTarget can be triggered by: – HTable.coprocessorProxy() - single region – HTable.coprocessorExec() - region range
  • 7. Dynamic RPC: a sample Given CoprocessorProtocol: public interface CountProtocol extends CoprocessorProtocol { int getRowCount(); }
  • 8. Coprocessors Class Loading ● Load from configuration: set coprocessors class names in HBase configuration ● hbase.coprocessor.default.classes ● Class names are comma seperated ● They will be picked up when region is opened, as default coprocessors ● Load from table attributes ● Utilize table attribute: a path (e.g. HDFS URI) to jar file ● Loaded when region is opened ● We can utilize CommandTarget to have a way to load coprocessors on demand ● Security is the biggest concern
  • 9. Next Steps ● See HBASE-2000 and subtasks ● Framework ● MapReduce – Runs concurrently on all regions of the table – Like Hadoop MapReduce: Mappers, reducers, partitioners, intermediates – Not table MapReduce, parallel region MapReduce ● Code weaving – Allow arbitrary code execution right now – Use a rewriting framework like ASM to weave in policies at load time – Improve fault isolation and system integrity protections – Wrap heap allocations to enforce limits – Monitor CPU time – Reject APIs considered unsafe ● On demand Coprocessors class loading
  • 10. Next Steps ● Applications ● HBase access control: HBASE-3025, HBASE-3045. ● Aggregate: HBASE-1512 ● Region level indexing: HBASE-2038 ● Table metacolumns: HBASE-2893 ● Secondary indexing? ● New Filtering?
  • 11. Q&A