Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords - June 2012

3,671 views

Published on

With the HBase 0.92 release it is now possible to embed code directly into HBase server processes. Coprocessors offer a very flexible extension model to move functionality closer to the data.
This presentation explains the key concepts and shows how coprocessors have been used in our social media monitoring system to accomplish the alert feature implemented with prospective search. It allows users to formulate a query by which they will be instantly notified via email about every newly detected document matching the terms.
This session aims to help people interested in coprocessors understand how coprocessors can be adapted to their own use cases and how to develop coprocessor extensions.

Published in: Technology
0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,671
On SlideShare
0
From Embeds
0
Number of Embeds
338
Actions
Shares
0
Downloads
43
Comments
0
Likes
5
Embeds 0
No embeds

No notes for slide

Using HBase Coprocessors to implement Prospective Search - Berlin Buzzwords - June 2012

  1. 1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
  2. 2. 5. Juni 2012•  Why Hadoop and HBase? 2•  Social Media Monitoring •  Prospective Search and Coprocessors•  Challenges & Lessons Learned•  Resources to get startedAgenda
  3. 3. 5. Juni 2012 Software Architect 3 @ sentric Co-founder and organizer of the Swiss HUG Contact: christian.guegi@sentric.ch http://www.sentric.ch @chrisgugiAbout me
  4. 4. 5. Juni 2012•  Spin-off of MeMo News AG, the 4 leading provider for Social Media Monitoring & Analytics in Switzerland•  Big Data expert, focused on Hadoop, HBase and Solr•  Objective: Transforming data into insightsAbout sentric
  5. 5. CC 2.0 by Pete Reed | h"p://flic.kr/p/KS9kf  
  6. 6. 5. Juni 2012 6 Information Information Analysis & Insight Gathering Processing Interpretation PresentationWhy Hadoop and HBase?Social Media Monitoring Process
  7. 7. 5. Juni 2012 7 Cost effective High SMM Reliable scalable Analytical RT Alerting capabilitiesWhy Hadoop and HBase?Requirements
  8. 8. 5. Juni 2012 8Storage HBase /HDFSSearch SolrAnalytics Hadoop MahoutEvent mechanism (MQ) HBase RowLogReal-time alerting Prospective searchWhy Hadoop and HBase?Technology Stack
  9. 9. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
  10. 10. 5. Juni 2012 10Downloaded Articles match?Search AgentsOutput Web-UI Reports RT Alerts Icons by http://dryicons.comSocial Media MonitoringOverview
  11. 11. 5. Juni 2012 11 n Crawler REST HBase RowLog Coprocessor Web-UI MySQL Solr RT Alerts Icons by http://dryicons.comSocial Media MonitoringSolution Architecture
  12. 12. 5. Juni 2012•  Inspired by Google Bigtable 12 coprocessors•  HBase version 0.92•  Embed code directly into server processes•  High-level call interface for clients•  Automatic scaling, load balancing, request routingShort Primer on CoprocessorsOverview
  13. 13. 5. Juni 2012•  Like a database trigger 13 •  Provides event based hooks•  Concrete Implementations •  RegionObserver •  CRUD or DML type operations •  MasterObserver •  DDL or metadata operations and cluster administration •  WALObserver •  Write-ahead-log appending and restorationShort Primer on CoprocessorsObserver Classes
  14. 14. 5. Juni 2012 14 Client:Get() CP1:preGet() CP2:preGet() CP3:preGet() Hregion:Get() CP1:postGet() CP2:postGet() CP3:postGet() RegionServer client responseShort Primer on CoprocessorsObserver Execution
  15. 15. 5. Juni 2012•  Comparable to stored procedures 15 •  Custom RPC protocol, used between client and region server•  Loaded in region server•  Client call APIs over single row or a row range •  Framework translates row keys to region location •  Parallel executionShort Primer on CoprocessorsEndpoint Classes
  16. 16. 5. Juni 2012 16 Client code Batch.Call<CountProtocol,int> Region Server 1 int call(CountProtocol p) { table,,12345678 CountProtocol return p.getRowCount(); } . table,bbb,12345678 CountProtocol HTable coprocessorExec() Region Server 2 table,ccc,12345678 CountProtocol table,ddd,12345678 CountProtocol Map<byte[], Integer> countsByRegionShort Primer on CoprocessorsEndpoint Call Routine
  17. 17. 5. Juni 2012•  HBase Security (Version 0.94) 17•  Aggregate operations avg(), sum() •  AggregatorProtocol•  HBASE-3529: Embedded searchShort Primer on CoprocessorsUse Cases
  18. 18. 5. Juni 2012 18 Processing Put operations Prospective Search HRegion RT Alerts HRegionServer Icons by http://dryicons.comSocial Media MonitoringProspective Search with Coprocessors
  19. 19. 5. Juni 2012•  Standard, virtualized test cluster: 19 4RS/DN, 1HM, 1NN, 3ZK•  Test dataset created from 2h of live index (1GB)•  Drive load on RS/DNSocial Media MonitoringTesting Setup
  20. 20. 5. Juni 2012 1800 20 1600 1400 1200Writes/sec 1000 800 600 400 200 0 0 10 50 100 200 400 800 # of agents Social Media Monitoring Test Results
  21. 21. CC 2.0 by Sean Maurik | h"p://flic.kr/p/JUduu  
  22. 22. 5. Juni 2012•  Everyone is still learning 22•  Some issues only appear at scale•  Production cluster configuration •  Hardware issues •  Tuning cluster configuration to our work loads•  HBase stability•  Monitoring health of HBaseChallenges & Lessons LearnedChallenges
  23. 23. 5. Juni 2012•  Be careful with expensive operations 23 in coprocessors•  At scale, nothing works as advertised•  Monitoring/Operational tooling is most important•  Play with all the configurations and benchmark for tuningChallenges & Lessons LearnedLessons
  24. 24. 5. Juni 2012•  https://blogs.apache.org/hbase/ 24 entry/coprocessor_introduction•  http://hbase.apache.org/apidocs/ index.html•  http://www.lilyproject.org/lily/about/ playground/hbaserowlog.html•  http://www.github.com/sentric/ HBasePSResources to get started
  25. 25. 5. Juni 2012 25 Questions? Christian Gügi christian.guegi@sentric.chBerlin Buzzwords 2012Thank you!

×