HBaseCon 2012 | HBase Filtering - Lars George, Cloudera

This talk will run through the list of filters that are shipped with HBase and show how they are used from a client application. Filters expose varying feature sets and have an equally varying impact on read performance, and neither is directly intuitive. A skilled HBase practitioner should know how to select the proper filter for a given use case, or how to combine sets of filters to achieve what is needed. The talk will conclude with an example of a custom filter and explain how to deploy it on a cluster.

Published in: Technology, Business

Transcript

  • 1. HBaseCon, May 2012. HBase Filters. Lars George, Solutions Architect
  • 2. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters. ©2012 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
  • 3. About Me • Solutions Architect @ Cloudera • Apache HBase & Whirr Committer • Author of HBase: The Definitive Guide • Working with HBase since end of 2007 • Organizer of the Munich OpenHUG • Speaker at conferences (FOSDEM, Hadoop World)
  • 4. Introduction to Filters • Used in combination with get() and scan() API calls • Steps: – Create Filter instance – Create Get or Scan instance – Assign Filter to Get or Scan – Call API and enjoy • More fine-grained control over what is returned to the client
  • 5. Filter Features • Allow the client to further narrow down what is retrieved – Not just per row or column key, or per column family • Predicate Pushdown – Move filtering from client to server to reduce network traffic • Varying performance implications, dependent on the use case
  • 6. Filter Pushdown
  • 7. Filter Features (cont.) • Filters have access to the entire row to decide its fate – Access to KeyValue instances to check row keys, column qualifiers, timestamps, or values • Scan batching might conflict with the above and might trigger an “Incompatible Filter” exception – Example: DependentColumnFilter • There is no cross-invocation state – Cannot filter rows based on dependent rows
  • 8. Available Filters • Many filters are supplied by HBase – Based on row key, column family, or column qualifier – Paging through rows and columns – Based on dependencies • Write your own filters – Use the FilterBase class to get a no-op skeleton and fill in the gaps
  • 9. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 10. Comparison Filters • Based on the CompareFilter class • Adds the compare() method to FilterBase • Takes an operator that defines how the comparison is performed – Predefined by the client API • Also needs a comparator to do the actual check – HBase supplies a large set
  • 11. Comparison Operators
  • 12. Comparators
  • 13. Comparison Filters (cont.) • Not all combinations of operator and comparator make sense – For example, the SubstringComparator returns only 0 (match) or 1 (no match) – Only EQUAL and NOT_EQUAL are useful – Using other operators is allowed but will most likely yield unexpected results
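The point on slide 13 can be illustrated with a small plain-Java model. This is only a sketch: `OperatorModel`, `Op`, and `substringCompare` are hypothetical names, not HBase classes, and the sign convention (include when the operator holds for the comparator result) is an assumption of the sketch rather than a statement about HBase internals.

```java
import java.util.function.IntPredicate;

// Simplified model, NOT the HBase classes: an operator interprets the
// integer returned by a comparator and decides whether a cell is included.
class OperatorModel {
    enum Op {
        LESS(c -> c < 0),
        LESS_OR_EQUAL(c -> c <= 0),
        EQUAL(c -> c == 0),
        NOT_EQUAL(c -> c != 0),
        GREATER_OR_EQUAL(c -> c >= 0),
        GREATER(c -> c > 0);

        private final IntPredicate rule;

        Op(IntPredicate rule) {
            this.rule = rule;
        }

        boolean includes(int compareResult) {
            return rule.test(compareResult);
        }
    }

    // Mimics a substring-style comparator: 0 on match, 1 otherwise.
    static int substringCompare(String needle, String value) {
        return value.contains(needle) ? 0 : 1;
    }

    public static void main(String[] args) {
        for (String value : new String[] {"val-5", "other"}) {
            int result = substringCompare("val", value);
            for (Op op : Op.values()) {
                System.out.println(value + " " + op + " -> " + op.includes(result));
            }
        }
    }
}
```

In this model a comparator that can only return 0 or 1 makes LESS never match and GREATER_OR_EQUAL always match, while LESS_OR_EQUAL and GREATER collapse into EQUAL and NOT_EQUAL. That is why only the latter two carry useful meaning.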
  • 14. Comparison Filters (cont.) • HBase filters are usually filtering data out • Comparison filters work in reverse as they include matching data – Be mindful when selecting the comparison operator!
  • 15. Available Comparison Filters • Row Filter – Based on row key comparisons • Family Filter – Based on column family names • Qualifier Filter – Based on column names, aka qualifiers • Value Filter – Based on the actual value of a column
  • 16. Available Comparison Filters (cont.) • Dependent Column Filter – Based on a timestamp of a reference column – Includes all columns that have the same timestamp – Implies that the entire row is accessible, since batching will not have access to the reference column • No scanner batching allowed! – Useful for loading interdependent changes within a row
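The selection rule described for the Dependent Column Filter can be sketched in plain Java. This is a model under stated assumptions, not the HBase implementation: `DependentColumnModel`, `Cell`, `select`, and `sampleRow` are hypothetical names, and a row is reduced to a list of (qualifier, timestamp) pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model: keep only the cells that share a timestamp with the
// reference column. Not the HBase DependentColumnFilter class.
class DependentColumnModel {
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    static List<String> select(List<Cell> row, String referenceQualifier) {
        // First pass: collect the timestamps of the reference column.
        Set<Long> stamps = new HashSet<>();
        for (Cell cell : row) {
            if (cell.qualifier.equals(referenceQualifier)) {
                stamps.add(cell.timestamp);
            }
        }
        // Second pass: keep every cell written at one of those timestamps.
        List<String> kept = new ArrayList<>();
        for (Cell cell : row) {
            if (stamps.contains(cell.timestamp)) {
                kept.add(cell.qualifier);
            }
        }
        return kept;
    }

    static List<Cell> sampleRow() {
        return Arrays.asList(
            new Cell("col-a", 100L), new Cell("col-ref", 200L),
            new Cell("col-b", 200L), new Cell("col-c", 300L));
    }

    public static void main(String[] args) {
        System.out.println(select(sampleRow(), "col-ref")); // prints [col-ref, col-b]
    }
}
```

The two passes over the row show why the whole row has to be visible at once: if a batch contained only part of the row, the reference column might not be in it and nothing could be matched.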
  • 17. Example Code
      // Classes come from org.apache.hadoop.hbase.client, org.apache.hadoop.hbase.filter,
      // and org.apache.hadoop.hbase.util. "table" is assumed to be an already opened table handle.
      Scan scan = new Scan();
      scan.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-0"));
      // Include only rows whose key sorts at or before "row-22".
      Filter filter = new RowFilter(
          CompareFilter.CompareOp.LESS_OR_EQUAL,
          new BinaryComparator(Bytes.toBytes("row-22")));
      scan.setFilter(filter);
      ResultScanner scanner = table.getScanner(scan);
      for (Result res : scanner) {
        System.out.println(res);
      }
      scanner.close();
  • 18. Example Output
      keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
      keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
      keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
      keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
      keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
      keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
      keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
      keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
      keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
      keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
      keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
      keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
      keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
      keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
      keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
      keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
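The output contains row-100 but no row-3 because row keys are compared byte by byte, not numerically. The following plain-Java sketch reproduces that ordering. `RowKeyOrder` and its methods are hypothetical helper names written for illustration, not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;

// Lexicographic comparison of row keys as unsigned bytes.
class RowKeyOrder {
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff; // first differing byte decides the order
            }
        }
        return a.length - b.length; // a shared prefix: the shorter key sorts first
    }

    static boolean lessOrEqual(String rowKey, String reference) {
        return compare(rowKey.getBytes(StandardCharsets.UTF_8),
                       reference.getBytes(StandardCharsets.UTF_8)) <= 0;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"row-1", "row-100", "row-22", "row-23", "row-3"}) {
            System.out.println(key + " <= row-22 ? " + lessOrEqual(key, "row-22"));
        }
    }
}
```

Here "row-100" passes the LESS_OR_EQUAL check because its fifth byte, '1', sorts before '2', while "row-3" and "row-23" fail it. Zero-padding numeric key parts (for example row-003) is a common way to make byte order agree with numeric order.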
  • 19. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 20. Dedicated Filters • Based directly on the FilterBase class • Often less useful for get() calls, since entire rows are filtered
  • 21. Available Dedicated Filters • Single Column Value Filter – Filter rows based on one specific column – Extra features • “Filter if missing” • “Get latest version only” – Column must be part of the scan selection • Or else it is all or nothing – Also needs a compare operation and an optional comparator
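The “filter if missing” option is easy to misread, so here is a minimal plain-Java model of the decision it controls. `SingleColumnModel` and `keepRow` are hypothetical names, a row is reduced to a map from column name to value, and only equality is modelled. It is a sketch of the semantics, not the HBase class.

```java
import java.util.HashMap;
import java.util.Map;

// Model of a single-column value check with a "filter if missing" switch.
class SingleColumnModel {
    // Returns true when the row should be kept.
    static boolean keepRow(Map<String, String> row, String column,
                           String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            // The column is absent: the switch alone decides the row's fate.
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        Map<String, String> withColumn = new HashMap<>();
        withColumn.put("colfam1:col-5", "val-5");
        Map<String, String> withoutColumn = new HashMap<>();
        withoutColumn.put("colfam1:col-1", "val-1");

        System.out.println(keepRow(withColumn, "colfam1:col-5", "val-5", false));    // true
        System.out.println(keepRow(withoutColumn, "colfam1:col-5", "val-5", false)); // true
        System.out.println(keepRow(withoutColumn, "colfam1:col-5", "val-5", true));  // false
    }
}
```

With the switch off, rows that lack the column pass through untouched, which is often surprising. The same reasoning explains the “all or nothing” remark: if the checked column is not part of the scan selection, the filter never sees it and every row looks like a row with the column missing.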
  • 22. Available Dedicated Filters (cont.) • Single Column Value Exclude Filter – Same as the one before but excludes the selection column • Prefix Filter – Based on prefix of row keys – Can early out the scan! • Combine with start row for best performance
  • 23. Available Dedicated Filters (cont.) • Page Filter – Allows pagination through rows – Needs to be combined with setting the start row on subsequent scans – Can early out the scan when the limit is reached • Key Only Filter – Drops the value for every column • First Key Only Filter – Returns only the first column key – Useful for row counters or “get newest post”-type applications – Can early out the rest of the row scan
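The paging pattern from slide 23 (limit the rows per scan, then restart the next scan after the last row seen) can be sketched without a cluster. In this plain-Java model a sorted set stands in for the table. `PagingModel`, `nextPage`, and `pageSizes` are hypothetical names, and the exclusive restart is modelled with `tailSet(lastRow, false)` rather than any HBase call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Model of paging through sorted row keys, one bounded scan per page.
class PagingModel {
    // Returns at most pageSize keys that sort strictly after lastRow
    // (lastRow == null means "start at the beginning of the table").
    static List<String> nextPage(NavigableSet<String> table, String lastRow, int pageSize) {
        NavigableSet<String> remaining =
            (lastRow == null) ? table : table.tailSet(lastRow, false);
        List<String> page = new ArrayList<>();
        for (String row : remaining) {
            if (page.size() == pageSize) {
                break; // early out once the limit is reached
            }
            page.add(row);
        }
        return page;
    }

    // Pages through a table of the given size and records each page's length.
    static List<Integer> pageSizes(int rows, int pageSize) {
        NavigableSet<String> table = new TreeSet<>();
        for (int i = 1; i <= rows; i++) {
            table.add(String.format("row-%02d", i));
        }
        List<Integer> sizes = new ArrayList<>();
        String lastRow = null;
        while (true) {
            List<String> page = nextPage(table, lastRow, pageSize);
            if (page.isEmpty()) {
                break;
            }
            sizes.add(page.size());
            lastRow = page.get(page.size() - 1); // becomes the next start position
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(pageSizes(7, 3)); // prints [3, 3, 1]
    }
}
```

The client has to carry `lastRow` between calls itself, which matches the slide's note that the filter must be combined with setting the start row on subsequent scans.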
  • 24. Available Dedicated Filters (cont.) • Inclusive Stop Filter – As opposed to the exclusive stop row, this filter will include the final row • Timestamp Filter – Takes a list of timestamps to include in the result • Column Count Get Filter – Used to limit the number of columns returned by a get() call
  • 25. Available Dedicated Filters (cont.) • Column Pagination Filter – Allows paginating through columns within a row – Skips to the offset parameter and returns limit columns • Column Prefix Filter – Analogous to PrefixFilter, here for matching column qualifiers • Random Row Filter
  • 26. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 27. Decorating Filters • Extend filters to gain additional control over the returned data • Skip Filter – Skips the entire row when a column is filtered – Not all filters are compatible • While Match Filter – Aborts the entire scan once the wrapped filter indicates a row or column is omitted
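The difference between a plain cell filter and the two decorators can be modelled in a few lines of plain Java. This is a simplified sketch: `DecoratorModel` and its methods are hypothetical names, rows are lists of integer cell values, and the wrapped filter is an ordinary predicate on a cell.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Model of how the decorators change the effect of a wrapped cell filter.
class DecoratorModel {
    // Wrapped filter alone: drop failing cells, keep the rest of the row.
    static List<List<Integer>> plain(List<List<Integer>> rows, Predicate<Integer> keepCell) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> row : rows) {
            List<Integer> kept = new ArrayList<>();
            for (Integer cell : row) {
                if (keepCell.test(cell)) {
                    kept.add(cell);
                }
            }
            if (!kept.isEmpty()) {
                out.add(kept);
            }
        }
        return out;
    }

    // Skip-style decorator: one failing cell removes the whole row.
    static List<List<Integer>> skip(List<List<Integer>> rows, Predicate<Integer> keepCell) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> row : rows) {
            boolean allPass = true;
            for (Integer cell : row) {
                if (!keepCell.test(cell)) {
                    allPass = false;
                }
            }
            if (allPass) {
                out.add(row);
            }
        }
        return out;
    }

    // While-match-style decorator: the first failing cell ends the scan.
    static List<List<Integer>> whileMatch(List<List<Integer>> rows, Predicate<Integer> keepCell) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> row : rows) {
            for (Integer cell : row) {
                if (!keepCell.test(cell)) {
                    return out;
                }
            }
            out.add(row);
        }
        return out;
    }

    static List<List<Integer>> sample() {
        return Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 0), Arrays.asList(4, 5));
    }

    public static void main(String[] args) {
        Predicate<Integer> nonZero = v -> v != 0;
        System.out.println(plain(sample(), nonZero));      // [[1, 2], [3], [4, 5]]
        System.out.println(skip(sample(), nonZero));       // [[1, 2], [4, 5]]
        System.out.println(whileMatch(sample(), nonZero)); // [[1, 2]]
    }
}
```

The model treats a row atomically in `whileMatch`. What the real filter returns for cells that precede the failing one in the same row is not covered by this sketch.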
  • 28. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 29. Combining Filters • Implemented by the FilterList class – Wraps a list of filters into a Filter-compatible class – Takes an optional operator to decide how to handle the results of each wrapped filter (default: MUST_PASS_ALL)
  • 30. Combining Filters • Filter lists can contain other filter lists • Operator is fixed per list, but a hierarchy allows combinations to be built • Using the proper List implementation helps control filter execution order
  • 31. Example Code
      List<Filter> filters = new ArrayList<Filter>();
      Filter filter1 = new RowFilter(
          CompareFilter.CompareOp.GREATER_OR_EQUAL,
          new BinaryComparator(Bytes.toBytes("row-03")));
      filters.add(filter1);
      Filter filter2 = new RowFilter(
          CompareFilter.CompareOp.LESS_OR_EQUAL,
          new BinaryComparator(Bytes.toBytes("row-06")));
      filters.add(filter2);
      Filter filter3 = new QualifierFilter(
          CompareFilter.CompareOp.EQUAL,
          new RegexStringComparator("col-0[03]"));
      filters.add(filter3);
      FilterList filterList1 = new FilterList(filters);
      …
      FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);
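To see what the two operators do to the same set of filters, the row-key part of the example can be modelled in plain Java. `FilterListModel`, `apply`, and `demo` are hypothetical names, filters are reduced to predicates on the row key, and the qualifier filter from the slide is left out of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Model of combining filter verdicts with MUST_PASS_ALL or MUST_PASS_ONE.
class FilterListModel {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    static List<String> apply(List<String> rows, Operator op,
                              List<Predicate<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            boolean pass = (op == Operator.MUST_PASS_ALL)
                ? filters.stream().allMatch(f -> f.test(row))
                : filters.stream().anyMatch(f -> f.test(row));
            if (pass) {
                out.add(row);
            }
        }
        return out;
    }

    static List<String> demo(Operator op) {
        List<String> rows = Arrays.asList("row-01", "row-03", "row-05", "row-06", "row-08");
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(r -> r.compareTo("row-03") >= 0); // like filter1 on the slide
        filters.add(r -> r.compareTo("row-06") <= 0); // like filter2 on the slide
        return apply(rows, op, filters);
    }

    public static void main(String[] args) {
        System.out.println(demo(Operator.MUST_PASS_ALL)); // [row-03, row-05, row-06]
        System.out.println(demo(Operator.MUST_PASS_ONE)); // all five rows
    }
}
```

With MUST_PASS_ONE every row survives, because each key satisfies at least one of the two bounds. A range check therefore needs MUST_PASS_ALL, and mixed logic needs nested lists, as slide 30 notes.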
  • 32. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 33. Custom Filter • Allows users to add missing filters • Either implement the Filter interface or use the FilterBase skeleton • Provides hooks called at different stages of the read process
  • 34. Filter Interface
      public interface Filter extends Writable {
        public enum ReturnCode {
          INCLUDE, SKIP, NEXT_COL, NEXT_ROW,
          SEEK_NEXT_USING_HINT }
        public void reset();
        public boolean filterRowKey(byte[] buffer,
            int offset, int length);
        public boolean filterAllRemaining();
        public ReturnCode filterKeyValue(KeyValue v);
        public void filterRow(List<KeyValue> kvs);
        public boolean hasFilterRow();
        public boolean filterRow();
        public KeyValue getNextKeyHint(KeyValue currentKV);
      }
  • 35. Filter Return Codes
  • 36. Merge Reads
  • 37. Filter Flow • Filter hooks are called at different stages • Seeks are done initially to find the next KeyValue – Hint from previous filter invocation might help • Early out checks improve performance
  • 38. Example Code
      public class CustomFilter extends FilterBase {
        private byte[] value = null;
        // Start by assuming the row is filtered out; a matching value clears the flag.
        private boolean filterRow = true;

        public CustomFilter() { super(); }

        public CustomFilter(byte[] value) { this.value = value; }

        // Called for every new row: restore the default verdict.
        @Override
        public void reset() { this.filterRow = true; }

        // Called for every KeyValue: remember whether any value matched.
        @Override
        public ReturnCode filterKeyValue(KeyValue kv) {
          if (Bytes.compareTo(value, kv.getValue()) == 0) {
            filterRow = false;
          }
          return ReturnCode.INCLUDE;
        }

        // Called once the row is complete: true drops the row.
        @Override
        public boolean filterRow() { return filterRow; }
        ...
      }
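The order in which these hooks fire is what makes the custom filter work, so the following plain-Java sketch drives the same logic by hand. It is a model only: `FilterFlowModel`, `ValueMatchFilter`, and `scan` are hypothetical names, cell values are plain strings, and the loop in `scan` stands in for the server-side read path in a simplified form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Model of the per-row hook sequence: reset, per-cell check, row verdict.
class FilterFlowModel {
    // Mirrors the slide's filter: a row is dropped unless some cell equals target.
    static final class ValueMatchFilter {
        private final String target;
        private boolean filterRow = true;

        ValueMatchFilter(String target) {
            this.target = target;
        }

        void reset() {
            filterRow = true;
        }

        void filterKeyValue(String value) {
            if (target.equals(value)) {
                filterRow = false;
            }
        }

        boolean filterRow() {
            return filterRow;
        }
    }

    static List<List<String>> scan(List<List<String>> rows, ValueMatchFilter filter) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            filter.reset();                   // state must not leak between rows
            for (String value : row) {
                filter.filterKeyValue(value); // every cell is inspected
            }
            if (!filter.filterRow()) {        // final verdict for the whole row
                out.add(row);
            }
        }
        return out;
    }

    static List<List<String>> sample() {
        return Arrays.asList(
            Arrays.asList("val-01", "val-02"),
            Arrays.asList("val-03", "val-05"),
            Arrays.asList("val-04"));
    }

    public static void main(String[] args) {
        System.out.println(scan(sample(), new ValueMatchFilter("val-05"))); // [[val-03, val-05]]
    }
}
```

Leaving out the `reset()` call in this model would let one matching row clear the flag for every row after it, which is the practical reason the hook exists.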
  • 39. Deploying Custom Filters • Need to provide a JAR file with the filter class • Deploy the JAR to the RegionServers • Add the JAR to HBASE_CLASSPATH • Restart the RegionServers • Tip: Testing on a cluster is more involved, so test on a local machine first
  • 40. Summary
  • 41. Summary (cont.)
