Coprocessors – Uses, Abuses and Solutions
• 26 // SEPTEMBER // 2016
COPYRIGHT 2016 BLOOMBERG FINANCE
L.P. ALL RIGHTS RESERVED.
Esther Kundin
(With guest appearance by Clay Baenziger)
Coprocessors
What is a coprocessor?
– Custom jar loaded into HBase daemon process
– Endpoint – like a stored procedure
– Observer – like a trigger
Observers
– Region Observer
• preGet, postGet
• prePut, postPut
– WAL Observer
– Master Observer
• runs in HBase master
• Create, Delete, Modify table
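The trigger analogy can be sketched with a toy model. This is not the HBase API: `ToyObserver`, `ToyRegion`, and their method signatures are hypothetical stand-ins, assumed here only to show how pre and post hooks wrap a put and a get.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a region observer (hypothetical, not the HBase interface). */
interface ToyObserver {
    /** Runs before a write and may rewrite the value that gets stored. */
    default String prePut(String key, String value) { return value; }

    /** Runs after a read and may rewrite what the client sees. */
    default String postGet(String key, String value) { return value; }
}

/** Toy stand-in for a region: an in-memory map plus registered observers. */
class ToyRegion {
    private final Map<String, String> rows = new HashMap<>();
    private final List<ToyObserver> observers = new ArrayList<>();

    void register(ToyObserver observer) { observers.add(observer); }

    void put(String key, String value) {
        // Every observer sees the write before it is stored.
        for (ToyObserver o : observers) value = o.prePut(key, value);
        rows.put(key, value);
    }

    String get(String key) {
        String value = rows.get(key);
        // Every observer sees the result before it is returned to the caller.
        for (ToyObserver o : observers) value = o.postGet(key, value);
        return value;
    }
}

class ToyObserverDemo {
    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.register(new ToyObserver() {
            @Override public String prePut(String key, String value) { return value.trim(); }
            @Override public String postGet(String key, String value) {
                return value == null ? null : value.toUpperCase();
            }
        });
        region.put("Key1", "  abc ");
        System.out.println(region.get("Key1"));
    }
}
```

The point of the sketch is that the hooks run inside the server-side call path, so the client only ever sees the already transformed result.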
Why use a coprocessor?
– Simple filter or aggregation run on your data
– Reduces amount of data being sent to the client
– NOT for complex data analysis
– Ex: Apache Phoenix (“We put the SQL back in NoSQL”)
PORT – A sample use case
Post-Get example (postGet runs in the RegionServer)
Table Representation:
Key Col1 Col2 Col3 Col4 Col5
Key1Abc 1 4 5
Key1Def 2 2 2
Key1Xyz 10 11 12
Coprocessor Result:
Key1 Abc-col1 Def-col2 Abc-col3 Abc-col4 Xyz-col5
Key1 1 2 4 5 12
Problems and Solutions
Coprocessors crash RegionServers
– Exceptions (other than IOExceptions) in the coprocessor bring down the RegionServer
– In other cases, the coprocessor silently unloads
Solution – catch all exceptions
public final void prePut(...) throws IOException {
    try {
        prePutImpl(…);
    }
    catch (IOException ex) {
        // Allow IOExceptions to propagate
        // They won't cause an unload
        throw ex;
    }
    catch (Throwable ex) {
        // Wrap other exceptions as IOException
        LOG.error("prePut: caught ", ex);
        throw new IOException(ex);
    }
}
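Repeating this try/catch in every hook is easy to get wrong, so the same pattern can be factored into one helper. The following is a minimal sketch under stated assumptions: `CoprocessorGuard`, `HookBody`, and `guard` are hypothetical names, not part of HBase, and the hook body is a stand-in for a real `prePutImpl`.

```java
import java.io.IOException;

/** Hypothetical helper: runs a hook body and converts unexpected failures to IOException. */
final class CoprocessorGuard {
    /** A hook body that is allowed to throw anything. */
    interface HookBody {
        void run() throws Exception;
    }

    private CoprocessorGuard() {}

    static void guard(String hookName, HookBody body) throws IOException {
        try {
            body.run();
        } catch (IOException ex) {
            // IOExceptions propagate unchanged, as in the slide's prePut wrapper.
            throw ex;
        } catch (Throwable ex) {
            // Everything else (RuntimeException, Error, ...) is wrapped.
            throw new IOException(hookName + ": unexpected failure", ex);
        }
    }
}

class GuardDemo {
    public static void main(String[] args) {
        try {
            // Stand-in for a buggy hook implementation.
            CoprocessorGuard.guard("prePut", () -> { throw new IllegalStateException("bug"); });
        } catch (IOException ex) {
            System.out.println("wrapped: " + ex.getCause());
        }
    }
}
```

A real hook would then reduce to a single call such as `guard("prePut", () -> prePutImpl(…))`, keeping the exception policy in one place.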
Coprocessors can hog memory
– The coprocessor shares the RegionServer's memory (same JVM heap)
– Memory hogging slows RegionServer performance
Solutions - defensive Java code
– Profile all coprocessor code for memory usage
• Use a generic profiler with a driver for your coprocessor
– Use common Java tricks for limiting memory usage
• Use primitive types and underlying arrays where possible
• Use immutable objects
• Use StringBuilder instead of String concatenation
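Two of these tricks can be sketched in a few lines. The class and method names below are hypothetical and only illustrate the general Java techniques; actual savings should be confirmed with a profiler.

```java
import java.util.List;

class MemoryTipsDemo {
    /** Boxed version: each element is an Integer object that has to be unboxed. */
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) total += v;
        return total;
    }

    /** Primitive version: one contiguous int[] and no per-element objects. */
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) total += v;
        return total;
    }

    /** Concatenation in a loop creates a new intermediate String on each pass. */
    static String joinWithConcat(String[] parts, char separator) {
        String out = "";
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) out += separator;
            out += parts[i];
        }
        return out;
    }

    /** StringBuilder appends into one growing buffer instead. */
    static String joinWithBuilder(String[] parts, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(separator);
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(new int[] {1, 2, 4, 5, 12}));
        System.out.println(joinWithBuilder(new String[] {"Abc", "col1"}, '-'));
    }
}
```

Both pairs return the same results; the difference is only in how many short-lived objects the RegionServer's garbage collector has to deal with.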
Problems with deployment
– Manual Deployment
• disable table
• assign new coprocessor
• enable table
– Rolling out a non-backward-compatible coprocessor is difficult
Solutions
– HBASE-7639 – online schema update is enabled, perhaps it will work
– Hard-code jar path in hbase-site.xml
• Used by Apache Phoenix
• Not the best approach for user-defined coprocessors
Logging and metrics tips
– Update log4j.properties file with a separate log parameter for coprocessors
– Use MDC context to pass parameters to all parts of the coprocessor (http://www.slf4j.org/api/org/slf4j/MDC.html)
– Create an extra column in a Result to pass back an object populated with metrics
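The last tip, passing metrics back in an extra column, needs the metrics object to be encoded as bytes. A minimal sketch follows, assuming a hypothetical `HookMetrics` class with three made-up fields; how the bytes are attached to a Result is left out because it depends on the HBase version in use.

```java
import java.nio.ByteBuffer;

/** Hypothetical per-request metrics that a coprocessor could return to the client. */
final class HookMetrics {
    final long elapsedNanos;
    final int rowsRead;
    final int cellsReturned;

    HookMetrics(long elapsedNanos, int rowsRead, int cellsReturned) {
        this.elapsedNanos = elapsedNanos;
        this.rowsRead = rowsRead;
        this.cellsReturned = cellsReturned;
    }

    /** Fixed-width encoding, suitable as the value of an extra metrics cell. */
    byte[] toBytes() {
        return ByteBuffer.allocate(Long.BYTES + 2 * Integer.BYTES)
                .putLong(elapsedNanos)
                .putInt(rowsRead)
                .putInt(cellsReturned)
                .array();
    }

    /** Client-side decoding of the same layout. */
    static HookMetrics fromBytes(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        return new HookMetrics(buf.getLong(), buf.getInt(), buf.getInt());
    }
}

class HookMetricsDemo {
    public static void main(String[] args) {
        byte[] cellValue = new HookMetrics(1_500_000L, 3, 5).toBytes();
        HookMetrics decoded = HookMetrics.fromBytes(cellValue);
        System.out.println(decoded.rowsRead + " rows, " + decoded.cellsReturned + " cells");
    }
}
```

A fixed-width layout keeps the extra cell small; a client that does not care about metrics can simply ignore the column.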
Unsolved issues
– A bad request can bring down the whole cluster
– A missing jar will bring down the RegionServer
ERROR org.apache.hadoop.hbase.coprocessor.CoprocessorHost: The coprocessor fooCoprocessor threw java.io.FileNotFoundException: File does not exist: /path/to/coprocessor.jar
java.io.FileNotFoundException: File does not exist: /path/to/coprocessor.jar
(Preventing) Abuses
Load Failures
– Affects all region servers – one at a time
– HTable descriptors contain coprocessor class:
• Clean-up can be messy
• HBASE-14190 - Assign system tables ahead of user region assignment
– Set table property: hbase.coprocessor.abortonerror to false
2016-09-24 02:32:07,366 ERROR org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Failed to load coprocessor net.clayb.hbase.coprocessor.RegionObserver
java.io.FileNotFoundException: File does not exist: hdfs://Test/user/foo/clayCoprocessor.jar
(Region server stays alive; only the table stays disabled)
Handler Failure
– Affects some operations and not others (e.g. scan works, not get)
– RPC starvation is a simple and non-obvious failure:
public class RegionObserverInfinity extends BaseRegionObserver {
    public void preGetOp(…) throws IOException {
        // Never returns, so the RPC handler serving this get is never freed
        for (;;) { LOG.trace("Off I go…"); }
    }
}
– Use jstack to see what is up in a region server:
clay@hbase-regionserver:~$ sudo jstack 3990
[…]
net.clayb.RegionObserverInfinity.preGetOp(…) @bci=12, line=28 (Compiled frame; information may be imprecise)
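Why a looping hook starves other requests can be shown with a simplified model. The sketch below assumes a fixed-size thread pool as a stand-in for the RPC handlers; it is not HBase's actual RPC scheduler, and `HandlerStarvationDemo` and `healthyRequestStarves` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class HandlerStarvationDemo {
    /** Returns true if a healthy request times out while every handler is stuck. */
    static boolean healthyRequestStarves(int handlers, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Stand-ins for gets that never return from a looping preGetOp hook.
            for (int i = 0; i < handlers; i++) {
                pool.submit(() -> { release.await(); return null; });
            }
            // A well-behaved request now has to wait in the queue.
            Future<String> healthy = pool.submit(() -> "ok");
            try {
                healthy.get(waitMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException expected) {
                return true;
            } catch (InterruptedException | ExecutionException ex) {
                throw new IllegalStateException(ex);
            }
        } finally {
            // Unblock the stuck tasks and stop the pool so the demo can exit.
            release.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("starved: " + healthyRequestStarves(2, 200));
    }
}
```

Nothing crashes and nothing is logged as an error in this model; the healthy request just never gets a handler, which is what makes the failure non-obvious.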
Coprocessor Whitelisting
– Coprocessors are key to HBase operation:
• AccessController
• TokenProvider
• SecureBulkLoadEndpoint
• MultiRowMutationEndpoint
– hbase.coprocessor.user.enabled – set to false, disables all user coprocessors (e.g. Apache Phoenix)
– HBASE-16700 – “Allow for coprocessor whitelisting” or abuse HBASE-15686
Recap
– Coprocessors are dangerous:
“Coprocessors are an advanced feature of HBase and are intended to be used by system developers only.” – HBase Book
– Write defensive code!
– Needed from the community
• Story for coprocessor deployment
• Process isolation
• JMX metrics
Thank you!

HBaseConEast2016: Coprocessors – Uses, Abuses and Solutions


Editor's Notes

  • #8 - HBase is a good choice for random access over large data
  • #21 HBASE-15686 provides hbase.coprocessor.classloader.included.classes to exclude loading coprocessors based on Java class name (for class isolation – not operator sanity)