
HBaseCon 2013: A Developer’s Guide to Coprocessors


Presented by: John Weatherford, Telescope


Speaker Notes
  • Thank you for coming. I am John Weatherford. This talk is going to have Java code.
  • We create digital products and applications for multiple devices, and deliver campaigns and solutions across multiple platforms, live events, and more. Our major campaigns include Idol and The Voice.
  • There are two different types of coprocessors: endpoints and observers. Observers are code triggered by an HBase operation; in the relational DB model, this is logically similar to a trigger. Endpoints are code called explicitly as a function on the server; in the relational DB model, this is logically similar to a stored procedure. Coprocessors can run on all regions of every table (loaded via hbase-site.xml), on just the regions of a particular table (loaded as a table attribute), or on the master.
  • RegionObserver, MasterObserver, WALObserver
  • Ridiculous HBase examples: restrict data changes after midnight? Rick Roll a random data request? (A sketch of the first idea follows below.)
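    The "no writes after midnight" idea could be sketched as a region observer. This is purely illustrative and not from the talk; the class name is invented and the hours are hard-coded:

        import java.io.IOException;
        import java.util.Calendar;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
        import org.apache.hadoop.hbase.coprocessor.ObserverContext;
        import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
        import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

        // Illustrative sketch only: reject writes that arrive between midnight and 6am.
        public class NoMidnightWrites extends BaseRegionObserver {
            @Override
            public void prePut(ObserverContext<RegionCoprocessorEnvironment> c,
                               Put put, WALEdit edit, boolean writeToWAL) throws IOException {
                int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
                if (hour < 6) {
                    c.bypass(); // skip the write entirely; nothing is persisted
                }
            }
        }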
  • What is the base coprocessor class? What does all of this extend?
  • Example: the AsyncHBase writer for Apache Flume doesn't allow more than a single column write per operation. The goal of this observer is to let Flume send all the column data we need and simply organize it when it reaches the server.
  • There are two ways to load the jar: through hbase-site.xml and by altering the table. For the demonstration we will be altering the table (a sketch of the hbase-site.xml route follows below). Check the GitHub repo for a base script that can be used to load the coprocessor through each step. SHOW PICTURE: insert picture of the loaded coprocessor.
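    For reference, the hbase-site.xml route mentioned above would look roughly like this (the property name is from the HBase coprocessor documentation; loading this way runs the observer on every region of every table, and the jar must already be on the region servers' classpath):

        <property>
          <name>hbase.coprocessor.region.classes</name>
          <value>telescope.hbase.JsonColumnExpander</value>
        </property>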
  • Each server has local logs that can be accessed through the master UI. Should the coprocessor encounter an error, we can find the output there.
  • Explain what a protocol is. Endpoints aren't triggered by actions on the table, but called directly from the client. It is important to remember endpoints run on all servers that contain any key within the start and end key passed.
  • Remember the endpoint runs on all the region servers, so we are returned a set of results in a map. Call the endpoint in your client code. https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/coprocessor/BaseEndpointCoprocessor.html

Transcript

  • 1. A Developer's Guide to Coprocessors, HBaseCon 2013. John Weatherford https://github.com/jweatherford
  • 2. Who Is Telescope? Telescope is the leading provider of interactive television, audience participation, and customer engagement solutions. Clients include TV networks, producers, digital platforms, studios, and sponsors seeking to reach, engage, and retain mass audiences and consumers in real time.
  • 3. What Is a Coprocessor? Arbitrary code that can run on each server. Extend the functionality of HBase. Avoid bothering the core committers.
  • 4. Two Types of Coprocessors. Observers: react to an event; run code before or after the action (pre-action / post-action). Endpoints: called explicitly as a function by the client; execute code on all regions (the client calls the endpoint on Region 1, Region 2, Region 3).
  • 5. What Can I Do With Coprocessors? Ideas for what can be done: Access Control, Secondary Indexes, Optimized Search, Data Aggregation, Control compaction times, Real-Time Analytics, Reduce result sets, Cache requests, Email split alerts. (One of these is sketched below.)
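    To make the secondary-index idea concrete, here is a rough, hypothetical sketch, not from the talk (the table, family, and class names are invented): a postPut hook mirrors one column value into an index table so rows can be looked up by value.

        import java.io.IOException;
        import java.util.List;
        import org.apache.hadoop.hbase.KeyValue;
        import org.apache.hadoop.hbase.client.HTableInterface;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
        import org.apache.hadoop.hbase.coprocessor.ObserverContext;
        import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
        import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
        import org.apache.hadoop.hbase.util.Bytes;

        public class SecondaryIndexObserver extends BaseRegionObserver {
            private static final byte[] INDEX_TABLE = Bytes.toBytes("test_index"); // invented name

            @Override
            public void postPut(ObserverContext<RegionCoprocessorEnvironment> c,
                                Put put, WALEdit edit, boolean writeToWAL) throws IOException {
                List<KeyValue> kvs = put.get(Bytes.toBytes("twitter"), Bytes.toBytes("name"));
                if (kvs.isEmpty()) { return; }
                // index row key = the column value; the indexed cell points back at the data row
                Put index = new Put(kvs.get(0).getValue());
                index.add(Bytes.toBytes("ref"), Bytes.toBytes("row"), put.getRow());
                HTableInterface indexTable = c.getEnvironment().getTable(INDEX_TABLE);
                try {
                    indexTable.put(index);
                } finally {
                    indexTable.close();
                }
            }
        }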
  • 6. Getting Started With Code

    preGet(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<KeyValue> result)
    postGet(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<KeyValue> result)
    prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, boolean writeToWAL)
    postPut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, boolean writeToWAL)
    preDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, boolean writeToWAL)
    postDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, boolean writeToWAL)
  • 7. Our First Observer. Intercept and modify the action. Consider all circumstances that will trigger the observer. Compile your jar with the same Java version that runs your HBase region servers. Look for output from the coprocessor.
  • 8. Our First Observer: Motivation. Apache Flume only writes one column per put. A single-row put arrives as key: id-1332343, family: twitter, qualifier: json_raw, value: "{twitter: {name: "loljk4u", message: "<3", length: 2, registered: true}, favorite: {name: "Taylor" ...}". In prePut() the observer expands the JSON into individual columns: twitter:name: "loljk4u", twitter:message: "<3", twitter:length: 0x2, twitter:registered: 0xFF, favorite:name: "Taylor", favorite:song: "I knew you were trouble".
  • 9. JsonColumnExpander

    // get the arguments passed to the coprocessor
    public void start(CoprocessorEnvironment env) throws IOException {
        Configuration c = env.getConfiguration();
        families = c.get("families", "").split(":");
    }

    public void prePut(ObserverContext<RegionCoprocessorEnvironment> e, Put put,
                       WALEdit edit, boolean writeToWAL) throws IOException {
        // check for the json_raw column
        if (!put.has(FAMILY, JSON_COLUMN)) { return; }
        String json = Bytes.toString(put.get(FAMILY, JSON_COLUMN).get(0).getValue());
        Map<String, String> columns = parseJson(json); // helper sketched below
        // loop through the json and add each field as a real column
        for (Entry<String, String> column : columns.entrySet()) {
            put.add(FAMILY, Bytes.toBytes(column.getKey()), Bytes.toBytes(column.getValue()));
        }
        // overwrite the original json in the put
        put.add(FAMILY, JSON_COLUMN, "--removed--".getBytes());
    }
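    The talk does not show how the JSON is parsed; the following is a minimal, hypothetical parseJson helper for the sketch above, assuming a JSON library such as Jackson is on the region server classpath and the object is one level deep:

        import java.io.IOException;
        import java.util.HashMap;
        import java.util.Iterator;
        import java.util.Map;
        import com.fasterxml.jackson.databind.JsonNode;
        import com.fasterxml.jackson.databind.ObjectMapper;

        // Hypothetical helper, not from the talk: flatten a one-level JSON object
        // into column-name/value pairs for the Put.
        private Map<String, String> parseJson(String json) throws IOException {
            Map<String, String> columns = new HashMap<String, String>();
            JsonNode root = new ObjectMapper().readTree(json);
            Iterator<Map.Entry<String, JsonNode>> fields = root.fields();
            while (fields.hasNext()) {
                Map.Entry<String, JsonNode> field = fields.next();
                columns.put(field.getKey(), field.getValue().asText());
            }
            return columns;
        }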
  • 10. Loading the Coprocessor. Push the jar to where your cluster can find it:

    $> hadoop fs -put JsonColumnExpander.jar /

    Alter the table to enable the coprocessor:

    $> alter 'test', METHOD => 'table_att', 'coprocessor' =>
       'hdfs:///JsonColumnExpander.jar|telescope.hbase.JsonColumnExpander|1001|arg1=1,arg2=2'

    Verify the load by checking the master web UI.
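    The load can also be checked from the HBase shell; the coprocessor shows up as a table attribute (a sketch; output abbreviated):

    $> describe 'test'
    {NAME => 'test', coprocessor$1 => 'hdfs:///JsonColumnExpander.jar|telescope.hbase.JsonColumnExpander|1001|arg1=1,arg2=2', ...}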
  • 11. Running the Code. Trigger the coprocessor with a put on the table:

    Put put = new Put(Bytes.toBytes("rowkey"));
    put.add(Bytes.toBytes("twitter"), Bytes.toBytes("json_raw"), json_data);

    Check each server's local logs: http://regionnode:60030/logs/hbase-hbase-regionserver-node2.dev-hadoop.telescope.tv.out
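    To see the observer's effect directly (a quick sketch; table is assumed to be an HTable for the same table), read the row back and look for an expanded column instead of the raw JSON:

        Get get = new Get(Bytes.toBytes("rowkey"));
        Result result = table.get(get);
        // if the observer ran, twitter:name now holds a value parsed out of the JSON
        String name = Bytes.toString(result.getValue(Bytes.toBytes("twitter"), Bytes.toBytes("name")));
        System.out.println("twitter:name = " + name);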
  • 12. Creating Your First Endpoint. Define the available methods in a protocol. Implement the protocol. Extend BaseEndpointCoprocessor. Load the endpoint on the table.
  • 13. Endpoint Example

    public interface TrendsProtocol extends CoprocessorProtocol {
        HashMap<String, Long> getData() throws IOException;
    }

    // The endpoint class implements the protocol we wrote above
    public class TrendsEndpoint extends BaseEndpointCoprocessor implements TrendsProtocol {
        @Override
        public HashMap<String, Long> getData() throws IOException {
            RegionCoprocessorEnvironment environment = (RegionCoprocessorEnvironment) getEnvironment();
            InternalScanner scanner = environment.getRegion().getScanner(new Scan());
            HashMap<String, Long> trends = new HashMap<String, Long>();
            try {
                List<KeyValue> curVals = new ArrayList<KeyValue>();
                boolean more;
                do {
                    curVals.clear();
                    more = scanner.next(curVals);
                    for (KeyValue pair : curVals) {
                        // loop through values on the region and process them
                    }
                } while (more);
            } finally {
                scanner.close();
            }
            return trends;
        }
    }
  • 14. Endpoint Returned Results

    htable = HBaseDB.getTable(connection, "hbase_demo");
    Map<byte[], HashMap<String, Long>> results = null;
    results = htable.coprocessorExec(
        TrendsProtocol.class,
        null, // start row
        null, // end row
        new Batch.Call<TrendsProtocol, HashMap<String, Long>>() {
            @Override
            public HashMap<String, Long> call(TrendsProtocol trends) throws IOException {
                return trends.getData();
            }
        }
    );
    for (Map.Entry<byte[], HashMap<String, Long>> entry : results.entrySet()) {
        // process results from each region server
    }
  • 15. Addendum to Endpoints. 0.96 changes endpoints to use protobuf:

    public static abstract class RowCountService implements com.google.protobuf.Service {
        ...
        public interface Interface {
            public abstract void getRowCount(
                com.google.protobuf.RpcController controller,
                CountRequest request,
                com.google.protobuf.RpcCallback<CountResponse> done);
            public abstract void getKeyValueCount(
                com.google.protobuf.RpcController controller,
                CountRequest request,
                com.google.protobuf.RpcCallback<CountResponse> done);
        }
    }
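    For comparison with slide 14, the client side of a 0.96-style protobuf endpoint would look roughly like this sketch, based on the RowCountService example in the HBase documentation (ExampleProtos and the table variable are assumed):

        // Sketch of the 0.96-style client call (coprocessorService replaces coprocessorExec)
        Map<byte[], Long> counts = table.coprocessorService(
            ExampleProtos.RowCountService.class,
            null, // start row
            null, // end row
            new Batch.Call<ExampleProtos.RowCountService, Long>() {
                @Override
                public Long call(ExampleProtos.RowCountService counter) throws IOException {
                    BlockingRpcCallback<ExampleProtos.CountResponse> callback =
                        new BlockingRpcCallback<ExampleProtos.CountResponse>();
                    counter.getRowCount(null, ExampleProtos.CountRequest.getDefaultInstance(), callback);
                    return callback.get().getCount();
                }
            });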
  • 16. Telescope's Coprocessors. Observers collect real-time analytics data for our moderation platform and create aggregate tables for the streaming data. Endpoints optimize searches and transmit only the necessary data, and perform simple reporting queries that don't need the full power of MapReduce.
  • 17. Questions? Already using coprocessors? I would love to hear about it. Curious to know more about a specific part? All code samples and table definitions can be found at https://github.com/jweatherford