Coprocessors in HBase can be used to filter or aggregate data before returning results to clients. However, they also present risks if not implemented carefully. Coprocessors that crash or leak memory can bring down entire region servers. The document provides solutions to these problems, such as catching all exceptions to prevent crashes and using defensive coding practices to limit memory usage. It also discusses challenges with deploying and managing coprocessors at scale. While powerful, coprocessors require careful development and configuration to avoid potential abuses of the system.
5. Why use a coprocessor?
– Simple filter or aggregation run on your data
– Reduces amount of data being sent to the client
– NOT for complex data analysis
– Ex: Apache Phoenix (“We put the SQL back in NoSQL”)
9. Coprocessors crash regionservers
– Exceptions (other than IOExceptions) in the coprocessor bring down the RegionServer
– In other cases, the coprocessor silently unloads
10. Solution – catch all exceptions
public final void prePut(...)
    throws IOException {
  try {
    prePutImpl(…);
  }
  catch(IOException ex) {
    // Allow IOExceptions to propagate
    // They won't cause an unload
    throw ex;
  }
  catch(Throwable ex) {
    // Wrap other exceptions as IOException
    LOG.error("prePut: caught ", ex);
    throw new IOException(ex);
  }
}
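The wrapping pattern above can be tried without an HBase cluster. The following is a minimal, self-contained sketch. `Hook`, `SafeHook`, `guard`, and `outcome` are hypothetical names used only for illustration and are not part of the HBase API. It shows that an IOException passes through unchanged, while any other Throwable reaches the caller wrapped in an IOException.

```java
import java.io.IOException;

// Hypothetical stand-in for the body of a coprocessor hook; not an HBase type.
interface Hook {
    void run() throws Exception;
}

final class SafeHook {
    // Runs the hook. IOExceptions propagate unchanged; anything else is wrapped.
    static void guard(Hook hook) throws IOException {
        try {
            hook.run();
        } catch (IOException ex) {
            throw ex;
        } catch (Throwable ex) {
            throw new IOException(ex);
        }
    }

    // Demo helper: describes what a caller of guard() would observe.
    static String outcome(Hook hook) {
        try {
            guard(hook);
            return "ok";
        } catch (IOException ex) {
            Throwable cause = ex.getCause();
            return cause == null ? "io" : "wrapped:" + cause.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(() -> { }));
        System.out.println(outcome(() -> { throw new IOException("disk"); }));
        System.out.println(outcome(() -> { throw new IllegalStateException("bug"); }));
    }
}
```

Marking the public hook `final`, as the slide does, keeps subclasses from overriding the guarded entry point and bypassing the catch blocks.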
11. Coprocessors can hog memory
– Coprocessors run inside the RegionServer JVM, so they share its memory
– Memory hogging slows RegionServer performance
12. Solutions - defensive Java code
– Profile all coprocessor code for memory usage
• Use a generic profiler with a driver for your
coprocessor
– Use common Java tricks for limiting memory usage
• Use primitive types and underlying arrays where
possible
• Use immutable objects
• StringBuilder vs String concatenation
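Two of the tricks listed above can be sketched in plain Java. This is an illustrative example only; `RowKeyJoiner`, `join`, and `sum` are hypothetical names, and the actual savings depend on the workload, so profiling is still needed.

```java
final class RowKeyJoiner {
    // Builds the result in one pre-sized StringBuilder instead of using
    // repeated String concatenation, which creates a new String per iteration.
    static String join(String[] parts, char sep) {
        int size = 0;
        for (String p : parts) {
            size += p.length() + 1;
        }
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(sep);
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    // Works on a primitive long[] rather than a List<Long>, so no boxed
    // Long objects are allocated while summing.
    static long sum(long[] values) {
        long total = 0L;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ':'));
        System.out.println(sum(new long[] {1L, 2L, 3L}));
    }
}
```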
13. Problems with deployment
– Manual Deployment
• disable table
• assign new coprocessor
• enable table
– Rollout of non-backward-compatible coprocessor
difficult
14. Solutions
– HBASE-7639 – with online schema update enabled, changing the coprocessor without disabling the table may work
– Hard-code jar path in hbase-site.xml
• Used by Apache Phoenix
• Not the best approach for user-defined coprocessors
15. Logging and metrics tips
– Update log4j.properties file with a separate log
parameter for coprocessors
– Use MDC context to pass parameters to all parts of
the coprocessor
(http://www.slf4j.org/api/org/slf4j/MDC.html)
– Create an extra column in a Result to pass back an
object populated with metrics
16. Unsolved issues
– Bad request can bring down the whole cluster
– Missing jar will bring down the RegionServer
ERROR org.apache.hadoop.hbase.coprocessor.CoprocessorHost: The coprocessor fooCoprocessor threw java.io.FileNotFoundException: File does not exist: /path/to/corprocessor.jar
java.io.FileNotFoundException: File does not exist: /path/to/corprocessor.jar
18. Load Failures
– Affects all region servers – one at a time
– Affects some operations and not others (e.g. scan works, get does not)
– Table descriptors contain the coprocessor class, so clean-up can be messy
– HBASE-14190 – Assign system tables ahead of user region assignment
– Set table property:
hbase.coprocessor.abortonerror to false
2016-09-24 02:32:07,366 ERROR org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Failed to load coprocessor net.clayb.hbase.coprocessor.RegionObserver
java.io.FileNotFoundException: File does not exist: hdfs://Test/user/foo/clayCoprocessor.jar
(The region server stays alive; only the table stays disabled)
19. Handler Failure
– RPC starvation is a simple but non-obvious failure:
public class RegionObserverInfinity extends BaseRegionObserver {
  public void preGetOp(…) throws IOException {
    // Never returns, so the RPC handler thread is never released
    for(;;){ LOG.trace("Off I go…"); }
  }
}
– Use jstack to see what is up in a region server:
clay@hbase-regionserver:~$ sudo jstack 3990
[…]
net.clayb.RegionObserverInfinity.preGetOp(…) @bci=12, line=28 (Compiled frame; information may be imprecise)
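The starvation effect can be reproduced without HBase. The sketch below is an analogy, not RegionServer code. A fixed thread pool stands in for the RPC handler pool, and tasks blocked on a latch stand in for hooks that never return. `HandlerStarvationDemo` and `starved` are hypothetical names. Once every handler is occupied, a well-behaved request waits in the queue until the timeout expires.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class HandlerStarvationDemo {
    // Returns true if a normal request times out while stuck hooks
    // occupy every handler thread in the pool.
    static boolean starved(int handlers, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy every handler with a task that does not return on its own.
            for (int i = 0; i < handlers; i++) {
                pool.submit(() -> { release.await(); return null; });
            }
            // A well-behaved request now has to queue behind the stuck ones.
            Future<String> get = pool.submit(() -> "row");
            try {
                get.get(waitMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException ex) {
                return true;
            } catch (Exception ex) {
                throw new IllegalStateException(ex);
            }
        } finally {
            // Unblock the stuck tasks so the demo itself can exit cleanly.
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("starved = " + starved(2, 200));
    }
}
```

In a real region server there is no latch to release, which is why a thread dump from jstack is the practical way to find the hook that holds the handlers.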
20. Coprocessor Whitelisting
– Coprocessors are key to HBase operation:
• security.access.AccessController
• security.token.TokenProvider
• security.access.SecureBulkLoadEndpoint
• MultiRowMutationEndpoint
– hbase.coprocessor.user.enabled – when set to false, disables all user coprocessors (e.g. Apache Phoenix)
– HBASE-16700 – “Allow for coprocessor whitelisting” or abuse HBASE-15686
21. Recap
– Coprocessors are dangerous:
“Coprocessors are an advanced feature of HBase and are intended to be used by system developers only.” – HBase Book
– Write defensive code!
– Needed from the community
• Story for coprocessor deployment
• Process isolation
• JMX metrics
– HBase is a good choice for random access over large data sets
HBASE-15686 provides hbase.coprocessor.classloader.included.classes to exclude loading coprocessors based on Java class name (for class isolation – not operator sanity)