HBaseCon, May 2012

HBase Filters
Lars George, Solutions Architect
Agenda

1    Introduction
2    Comparison Filters
3    Dedicated Filters
4    Decorating Filters
5    Combining Filters
6    Custom Filters




2                   ©2012 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction
                         or redistribution without written permission is prohibited.
About Me

    •  Solutions Architect @ Cloudera
    •  Apache HBase & Whirr Committer
    •  Author of
           HBase – The Definitive Guide
    •  Working with HBase since end
       of 2007
    •  Organizer of the Munich OpenHUG
    •  Speaker at Conferences (FOSDEM,
       Hadoop World)

Introduction to Filters

    •  Used in combination with get() and scan()
       API calls
    •  Steps:
      –  Create Filter instance
      –  Create Get or Scan instance
      –  Assign Filter to Get or Scan
      –  Call API and enjoy
    •  More fine-grained control over what is
       returned to the client

Filter Features

    •  Allow client to further narrow down what is
       retrieved
      –  Not just per row or column key, or per column
         family
    •  Predicate Pushdown
      –  Move filtering from client to server to reduce
         network traffic
    •  Varying performance implications,
       dependent on the use-case


Filter Pushdown




Filter Features (cont.)

    •  Filters have access to the entire row to
       decide its fate
      –  Access to KeyValue instances to check row keys,
         column qualifiers, timestamps, or values
    •  Scan batching might conflict with the above
       and might trigger an “Incompatible Filter”
       exception
      –  Example: DependentColumnFilter
    •  There is no cross invocation state
      –  Cannot filter rows based on dependent rows


Available Filters

    •  Many filters are supplied by HBase
      –  Based on row key, column family, or column
         qualifier
      –  Paging through rows and columns
      –  Based on dependencies

    •  Write your own filters
      –  Use FilterBase class to get a no-op
         skeleton and fill in the gaps


Comparison Filters

 •  Based on CompareFilter class
 •  Adds the compare() method to
    FilterBase
 •  Takes operator that defines how the
    comparison is performed
     –  Predefined by client API
 •  Also needs a comparator to do the actual
    check
     –  HBase supplies a large set

Comparison Operators




Comparators




Comparison Filters (cont.)

 •  Not all combinations of operator and
    comparator make sense
     –  For example, the SubstringComparator
        returns only 0 (match) or 1 (no match)
     –  Only EQUAL and NOT_EQUAL are useful
     –  Using other operators is allowed but will most
        likely yield unexpected results
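
Why only EQUAL and NOT_EQUAL are useful can be seen in a small stdlib-only model. This is a sketch, not HBase code: `SubstringOperatorDemo`, `substringCompare`, and `matches` are hypothetical names, and the operator is simply applied to the comparator result relative to zero. The direction in which HBase itself evaluates LESS and GREATER is not modeled; the point is only that a result limited to 0 or 1 leaves the ordering operators nothing to distinguish.

```java
// Illustrative model only; the names below are not the HBase API.
class SubstringOperatorDemo {

    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // Mimics a comparator that can only answer "match" (0) or "no match" (1).
    static int substringCompare(String needle, String value) {
        return value.contains(needle) ? 0 : 1;
    }

    // Applies the operator to the comparator result relative to zero.
    static boolean matches(Op op, int result) {
        switch (op) {
            case LESS:             return result < 0;
            case LESS_OR_EQUAL:    return result <= 0;
            case EQUAL:            return result == 0;
            case NOT_EQUAL:        return result != 0;
            case GREATER_OR_EQUAL: return result >= 0;
            case GREATER:          return result > 0;
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    static boolean include(Op op, String needle, String value) {
        return matches(op, substringCompare(needle, value));
    }

    public static void main(String[] args) {
        // Print what each operator decides for a matching and a non-matching value.
        for (Op op : Op.values()) {
            System.out.println(op + ": match=" + include(op, "val", "value-1")
                + ", no match=" + include(op, "val", "other"));
        }
    }
}
```

In this model LESS never includes anything, GREATER_OR_EQUAL includes everything, and the remaining two ordering operators collapse into EQUAL and NOT_EQUAL.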




Comparison Filters (cont.)

 •  HBase filters are usually filtering data out
 •  Comparison filters work in reverse as they
    include matching data
     –  Be mindful when selecting the comparison
        operator!




Available Comparison Filters

 •  Row Filter
     –  Based on row key comparisons
 •  Family Filter
     –  Based on column family names
 •  Qualifier Filter
     –  Based on column names, aka qualifiers
 •  Value Filter
     –  Based on the actual value of a column


Available Comparison Filters (cont.)

 •  Dependent Column Filter
     –  Based on a timestamp of a reference column
     –  Includes all columns that have the same
        timestamp
     –  Requires access to the entire row, since a
        batched scan may not see the reference
        column
        •  No scanner batching allowed!
     –  Useful for loading interdependent changes
        within a row
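
The reason batching conflicts with this filter can be shown with a simplified model of the idea. The sketch below is plain Java, not the HBase implementation; `DependentColumnDemo`, `Cell`, and `dependentColumns` are names invented for illustration, and the optional value comparator is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only; not HBase code.
class DependentColumnDemo {

    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Returns the qualifiers of all cells that share a timestamp with the
    // reference column. Needs to see every cell of the row to work.
    static List<String> dependentColumns(List<Cell> row, String reference) {
        Set<Long> referenceStamps = new HashSet<>();
        for (Cell cell : row) {
            if (cell.qualifier.equals(reference)) {
                referenceStamps.add(cell.timestamp);
            }
        }
        List<String> kept = new ArrayList<>();
        for (Cell cell : row) {
            if (referenceStamps.contains(cell.timestamp)) {
                kept.add(cell.qualifier);
            }
        }
        return kept;
    }

    // One row: col-a and col-b were written together, col-c later.
    static List<Cell> sampleRow() {
        return Arrays.asList(
            new Cell("col-a", 10L),
            new Cell("col-b", 10L),
            new Cell("col-c", 20L));
    }

    public static void main(String[] args) {
        System.out.println(dependentColumns(sampleRow(), "col-a"));
        // A "batch" that lacks the reference column cannot be evaluated.
        System.out.println(dependentColumns(sampleRow().subList(1, 3), "col-a"));
    }
}
```

When the slice of the row handed to the model does not contain the reference column, nothing can be matched, which is the situation a batched scan could create.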


Example Code
// Assumes an open table handle named "table".
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("colfam1"),
  Bytes.toBytes("col-0"));
// Keep only rows whose key is <= "row-22" in byte order.
Filter filter = new RowFilter(
  CompareFilter.CompareOp.LESS_OR_EQUAL,
  new BinaryComparator(Bytes.toBytes("row-22")));
scan.setFilter(filter);

ResultScanner scanner = table.getScanner(scan);
for (Result res : scanner) {
  System.out.println(res);
}
scanner.close();

Example Output
keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
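
The listing includes row-10, row-100, and row-2 because row keys are compared byte by byte, not numerically. The following is a minimal stdlib-only sketch of that ordering; `LexOrderDemo`, `compareBytes`, and `lessOrEqual` are names made up for this illustration and are not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;

// Illustrative model only; not HBase code.
class LexOrderDemo {

    // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return a.length - b.length;
    }

    // True when rowKey sorts at or before bound in byte order.
    static boolean lessOrEqual(String rowKey, String bound) {
        return compareBytes(rowKey.getBytes(StandardCharsets.UTF_8),
                            bound.getBytes(StandardCharsets.UTF_8)) <= 0;
    }

    public static void main(String[] args) {
        String[] keys = {"row-2", "row-22", "row-100", "row-23", "row-3"};
        for (String key : keys) {
            System.out.println(key + " <= row-22 ? " + lessOrEqual(key, "row-22"));
        }
    }
}
```

Here "row-100" passes because its fifth character '1' sorts before '2', while "row-3" fails because '3' sorts after '2'. Zero-padding the numeric part to a fixed width makes byte order agree with numeric order.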



Dedicated Filters

 •  Based directly on FilterBase class
 •  Often less useful for get() calls, since
    entire rows are filtered




Available Dedicated Filters

 •  Single Column Value Filter
     –  Filter rows based on one specific column
     –  Extra features
       •  “Filter if missing”
       •  “Get latest version only”
     –  Column must be part of the scan selection
       •  Or else it is all or nothing
     –  Also needs compare operation and an
        optional comparator
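
The effect of "filter if missing" can be modeled in a few lines. This is a hedged, stdlib-only sketch in which a row is a plain map from column name to value; `SingleColumnValueDemo` and `includeRow` are hypothetical names, and the comparison is fixed to equality for brevity.

```java
import java.util.Collections;
import java.util.Map;

// Illustrative model only; not HBase code.
class SingleColumnValueDemo {

    // Returns true when the whole row should be returned to the client.
    static boolean includeRow(Map<String, String> row, String column,
                              String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            // The checked column is absent: the flag alone decides.
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        Map<String, String> withColumn = Collections.singletonMap("status", "active");
        Map<String, String> withoutColumn = Collections.singletonMap("other", "x");
        System.out.println(includeRow(withColumn, "status", "active", false));
        System.out.println(includeRow(withoutColumn, "status", "active", false));
        System.out.println(includeRow(withoutColumn, "status", "active", true));
    }
}
```

With the flag off, rows that lack the column pass through untouched, which is why the option matters when the column is sparse.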


Available Dedicated Filters (cont.)

 •  Single Column Value Exclude Filter
     –  Same as the one before but excludes the
        selection column
 •  Prefix Filter
     –  Based on prefix of row keys
     –  Can early out the scan!
       •  Combine with start row for best performance




Available Dedicated Filters (cont.)
 •  Page Filter
     –  Allows pagination through rows
     –  Needs to be combined with setting the start row on
        subsequent scans
     –  Can early out the scan when limit is reached
 •  Key Only Filter
     –  Drop the value for every column
 •  First Key Only Filter
     –  Return only the first column key
     –  Useful for row counters, or "get the newest post"
        type applications
     –  Can early out rest of row scan
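
The paging pattern described above (limit each scan to one page, then start the next scan just after the last row seen) can be sketched without a cluster. The model below uses a sorted set of row keys in place of a table; `PagingDemo`, `nextPage`, and `allPages` are assumed names, not HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative model only; not HBase code.
class PagingDemo {

    // Returns up to pageSize keys that sort strictly after lastKey.
    // A null lastKey means "start from the beginning".
    static List<String> nextPage(NavigableSet<String> sortedKeys,
                                 String lastKey, int pageSize) {
        NavigableSet<String> remaining =
            (lastKey == null) ? sortedKeys : sortedKeys.tailSet(lastKey, false);
        List<String> page = new ArrayList<>();
        for (String key : remaining) {
            if (page.size() == pageSize) {
                break; // page is full: stop early instead of reading on
            }
            page.add(key);
        }
        return page;
    }

    // Walks the whole key set page by page, restarting after the last key seen.
    static List<List<String>> allPages(NavigableSet<String> sortedKeys, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        String lastKey = null;
        while (true) {
            List<String> page = nextPage(sortedKeys, lastKey, pageSize);
            if (page.isEmpty()) {
                break;
            }
            pages.add(page);
            lastKey = page.get(page.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<String>(
            Arrays.asList("row-1", "row-2", "row-3", "row-4", "row-5"));
        System.out.println(allPages(keys, 2));
    }
}
```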


Available Dedicated Filters (cont.)

 •  Inclusive Stop Filter
     –  As opposed to the exclusive stop row, this
        filter will include the final row
 •  Timestamp Filter
     –  Takes list of timestamps to include in result
 •  Column Count Get Filter
     –  Used to limit number of columns returned by a
        get() call


Available Dedicated Filters (cont.)

 •  Column Pagination Filter
     –  Allows paginating through columns within a
        row
     –  Skips to offset parameter and returns
        limit columns
 •  Column Prefix Filter
     –  Analogous to PrefixFilter, here for matching
        column qualifiers
 •  Random Row Filter

Decorating Filters

 •  Extend filters to gain additional control
    over the returned data
 •  Skip Filter
     –  Skip entire row when a column is filtered
     –  Not all filters are compatible
 •  While Match Filter
     –  Aborts entire scan once the wrapped filter
        indicates a row or column is omitted
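
A rough model of the two decorators, assuming a scan is a list of rows and each row a list of cell values. `DecoratorDemo` and its methods are hypothetical; the real filters wrap another Filter instance and work on KeyValues, and the while-match model is simplified to whole rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative model only; not HBase code.
class DecoratorDemo {

    // Skip idea: drop the whole row as soon as one cell fails the wrapped check.
    static List<List<Integer>> skipRows(List<List<Integer>> rows,
                                        Predicate<Integer> cellPasses) {
        List<List<Integer>> kept = new ArrayList<>();
        for (List<Integer> row : rows) {
            boolean allPass = true;
            for (Integer cell : row) {
                if (!cellPasses.test(cell)) {
                    allPass = false;
                    break;
                }
            }
            if (allPass) {
                kept.add(row);
            }
        }
        return kept;
    }

    // While-match idea: end the entire scan at the first failing cell.
    static List<List<Integer>> whileMatch(List<List<Integer>> rows,
                                          Predicate<Integer> cellPasses) {
        List<List<Integer>> kept = new ArrayList<>();
        for (List<Integer> row : rows) {
            for (Integer cell : row) {
                if (!cellPasses.test(cell)) {
                    return kept; // abort: later rows are never looked at
                }
            }
            kept.add(row);
        }
        return kept;
    }

    static List<List<Integer>> sampleRows() {
        return Arrays.asList(
            Arrays.asList(1, 2),
            Arrays.asList(3, 0),
            Arrays.asList(4, 5));
    }

    public static void main(String[] args) {
        Predicate<Integer> nonZero = v -> v != 0;
        System.out.println(skipRows(sampleRows(), nonZero));
        System.out.println(whileMatch(sampleRows(), nonZero));
    }
}
```

With the same wrapped check, the skip model drops only the middle row, while the while-match model also loses the last row because the scan has already ended.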


Combining Filters

 •  Implemented by the FilterList class
     –  Wraps list of filters into a Filter compatible
        class
     –  Takes optional operator to decide how to
        handle the results of each wrapped filter
        (default: MUST_PASS_ALL)




Combining Filters

 •  Filter lists can contain other filter lists
 •  Operator is fixed per list, but nesting lists
    allows creating combinations
 •  Using the proper List implementation
    helps control filter execution order




List<Filter> filters = new ArrayList<Filter>();

Filter filter1 = new RowFilter(
  CompareFilter.CompareOp.GREATER_OR_EQUAL,
  new BinaryComparator(Bytes.toBytes("row-03")));
filters.add(filter1);
Filter filter2 = new RowFilter(
  CompareFilter.CompareOp.LESS_OR_EQUAL,
  new BinaryComparator(Bytes.toBytes("row-06")));
filters.add(filter2);
Filter filter3 = new QualifierFilter(
  CompareFilter.CompareOp.EQUAL,
  new RegexStringComparator("col-0[03]"));
filters.add(filter3);
FilterList filterList1 = new FilterList(filters);
…
FilterList filterList2 = new FilterList(
  FilterList.Operator.MUST_PASS_ONE, filters);
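
How the operator changes the outcome for the three filters above can be modeled with plain predicates. This sketch does not use the HBase classes; `FilterListDemo`, `combine`, and `sampleChecks` are illustrative names, and row keys are compared as Java strings, which matches byte order for the ASCII keys used here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.regex.Pattern;

// Illustrative model only; not HBase code.
class FilterListDemo {

    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Combines checks on (rowKey, qualifier) with AND or OR semantics.
    static BiPredicate<String, String> combine(
            Operator op, List<BiPredicate<String, String>> checks) {
        return (row, qualifier) -> {
            for (BiPredicate<String, String> check : checks) {
                boolean passed = check.test(row, qualifier);
                if (op == Operator.MUST_PASS_ALL && !passed) {
                    return false; // one failure is enough to reject
                }
                if (op == Operator.MUST_PASS_ONE && passed) {
                    return true;  // one success is enough to accept
                }
            }
            return op == Operator.MUST_PASS_ALL;
        };
    }

    // Stand-ins for filter1, filter2, and filter3.
    static List<BiPredicate<String, String>> sampleChecks() {
        Pattern qualifierPattern = Pattern.compile("col-0[03]");
        BiPredicate<String, String> rowAtLeast = (row, q) -> row.compareTo("row-03") >= 0;
        BiPredicate<String, String> rowAtMost = (row, q) -> row.compareTo("row-06") <= 0;
        BiPredicate<String, String> qualifierMatches =
            (row, q) -> qualifierPattern.matcher(q).find();
        return Arrays.asList(rowAtLeast, rowAtMost, qualifierMatches);
    }

    public static void main(String[] args) {
        BiPredicate<String, String> all = combine(Operator.MUST_PASS_ALL, sampleChecks());
        BiPredicate<String, String> one = combine(Operator.MUST_PASS_ONE, sampleChecks());
        System.out.println(all.test("row-04", "col-03"));
        System.out.println(all.test("row-07", "col-03"));
        System.out.println(one.test("row-07", "col-05"));
    }
}
```

Because `combine` returns a predicate, its result can itself be placed in another list, which mirrors nesting filter lists. In this model MUST_PASS_ONE lets every cell through, since any row key satisfies at least one of the two row bounds.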


Custom Filter

 •  Allows users to add missing filters
 •  Either implement Filter interface or use
    FilterBase skeleton
 •  Provides hooks called at different stages
    of the read process




Filter Interface
public interface Filter extends Writable {
  public enum ReturnCode {
    INCLUDE, SKIP, NEXT_COL, NEXT_ROW,
    SEEK_NEXT_USING_HINT }
  public void reset();
  public boolean filterRowKey(byte[] buffer,
    int offset, int length);
  public boolean filterAllRemaining();
  public ReturnCode filterKeyValue(KeyValue v);
  public void filterRow(List<KeyValue> kvs);
  public boolean hasFilterRow();
  public boolean filterRow();
  public KeyValue getNextKeyHint(KeyValue
    currentKV);
}


Filter Return Codes




Merge Reads




Filter Flow

 •  Filter hooks are called at
    different stages
 •  Seeks are done initially to
    find the next KeyValue
     –  Hint from previous filter
        invocation might help
 •  Early out checks improve
    performance


Example Code
public class CustomFilter extends FilterBase {
  private byte[] value = null;
  // Drop the row unless a matching value is seen.
  private boolean filterRow = true;

  public CustomFilter() { super(); }
  public CustomFilter(byte[] value) { this.value = value; }

  @Override
  public void reset() { this.filterRow = true; }

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    // A single matching value is enough to keep the whole row.
    if (Bytes.compareTo(value, kv.getValue()) == 0) {
      filterRow = false;
    }
    return ReturnCode.INCLUDE;
  }

  @Override
  public boolean filterRow() { return filterRow; }
  ...
}
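
The order in which the hooks fire for this filter can be traced with a stripped-down model. `MiniFilter`, `ValueMatchFilter`, and `scan` below are hypothetical stand-ins written for this sketch; they keep only the three hooks the example overrides and use strings in place of KeyValues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; not HBase code.
class FilterFlowDemo {

    // Cut-down stand-in for the filter hooks; not the HBase interface.
    interface MiniFilter {
        void reset();                      // clears per-row state between rows
        boolean includeCell(String value); // called per cell; true keeps the cell
        boolean filterRow();               // called after the last cell; true drops the row
    }

    // Same logic as CustomFilter: keep a row only if one cell holds the wanted value.
    static final class ValueMatchFilter implements MiniFilter {
        private final String wanted;
        private boolean filterRow = true;

        ValueMatchFilter(String wanted) {
            this.wanted = wanted;
        }

        public void reset() {
            filterRow = true;
        }

        public boolean includeCell(String value) {
            if (wanted.equals(value)) {
                filterRow = false;
            }
            return true; // cells are always kept; the row decision comes later
        }

        public boolean filterRow() {
            return filterRow;
        }
    }

    // Drives the hooks in order: reset, then every cell, then the row decision.
    static List<List<String>> scan(List<List<String>> rows, MiniFilter filter) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            filter.reset();
            List<String> cells = new ArrayList<>();
            for (String value : row) {
                if (filter.includeCell(value)) {
                    cells.add(value);
                }
            }
            if (!filter.filterRow() && !cells.isEmpty()) {
                result.add(cells);
            }
        }
        return result;
    }

    static List<List<String>> sampleRows() {
        return Arrays.asList(
            Arrays.asList("val-01", "val-02"),
            Arrays.asList("val-03", "val-05"),
            Arrays.asList("val-06"));
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleRows(), new ValueMatchFilter("val-05")));
    }
}
```

Without the reset between rows, a match in one row would leak into the next and keep rows that contain no matching value.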
Deploying Custom Filters

 •    Need to provide JAR file with filter class
 •    Deploy JAR to RegionServers
 •    Add JAR to HBASE_CLASSPATH
 •    Restart RegionServers

 •  Tip: Testing on a cluster is more involved, so
    test on a local machine first


Summary





More Related Content

What's hot

Impala presentation
Impala presentationImpala presentation
Impala presentation
trihug
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
Cloudera, Inc.
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
Cloudera, Inc.
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
DataWorks Summit
 
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
DataWorks Summit
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017
larsgeorge
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Cloudera impala
Cloudera impalaCloudera impala
Cloudera impala
Swiss Big Data User Group
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
Matthew Blair
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
larsgeorge
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clusters
enissoz
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
University of Moratuwa
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
HBaseCon
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
DataWorks Summit
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
enissoz
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera, Inc.
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
mas4share
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
David Groozman
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
HBaseCon
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
Cloudera, Inc.
 

What's hot (20)

Impala presentation
Impala presentationImpala presentation
Impala presentation
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
 
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Cloudera impala
Cloudera impalaCloudera impala
Cloudera impala
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clusters
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 

Viewers also liked

HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
HBaseCon
 
A successful Git branching model
A successful Git branching model A successful Git branching model
A successful Git branching model abodeltae
 
Git branching-model
Git branching-modelGit branching-model
Git branching-model
Aaron Huang
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
DataWorks Summit/Hadoop Summit
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
Cloudera, Inc.
 
Getting Git Right
Getting Git RightGetting Git Right
Getting Git Right
Sven Peters
 
HBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBaseHBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBase
Cloudera, Inc.
 
HBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBaseHBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBase
Cloudera, Inc.
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
Cloudera, Inc.
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on Flash
Cloudera, Inc.
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
Cloudera, Inc.
 
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
Cloudera, Inc.
 
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARNHBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
Cloudera, Inc.
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
Cloudera, Inc.
 
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponHBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
Cloudera, Inc.
 
HBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 MinutesHBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 Minutes
Cloudera, Inc.
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart Meter
Cloudera, Inc.
 
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
HBaseCon
 

Viewers also liked (20)

HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
A successful Git branching model
A successful Git branching model A successful Git branching model
A successful Git branching model
 
Git branching-model
Git branching-modelGit branching-model
Git branching-model
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
HBaseCon 2012 | HBase Filtering - Lars George, Cloudera

  • 1. HBaseCon, May 2012 HBase Filters Lars George, Solutions Architect
  • 2. Agenda
      1  Introduction
      2  Comparison Filters
      3  Dedicated Filters
      4  Decorating Filters
      5  Combining Filters
      6  Custom Filters
  • 3. About Me
      •  Solutions Architect @ Cloudera
      •  Apache HBase & Whirr Committer
      •  Author of HBase – The Definitive Guide
      •  Working with HBase since end of 2007
      •  Organizer of the Munich OpenHUG
      •  Speaker at Conferences (Fosdem, Hadoop World)
  • 4. Introduction to Filters
      •  Used in combination with get() and scan() API calls
      •  Steps:
        –  Create Filter instance
        –  Create Get or Scan instance
        –  Assign Filter to Get or Scan
        –  Call API and enjoy
      •  More fine-grained control over what is returned to the client
  • 5. Filter Features
      •  Allow client to further narrow down what is retrieved
        –  Not just per row or column key, or per column family
      •  Predicate Pushdown
        –  Move filtering from client to server to reduce network traffic
      •  Varying performance implications, dependent on the use-case
  • 6. Filter Pushdown
  • 7. Filter Features (cont.)
      •  Filters have access to the entire row to decide its fate
        –  Access to KeyValue instances to check row keys, column qualifiers, timestamps, or values
      •  Scan batching might conflict with the above and might trigger an “Incompatible Filter” exception
        –  Example: DependentColumnFilter
      •  There is no cross-invocation state
        –  Cannot filter rows based on dependent rows
  • 8. Available Filters
      •  Many filters are supplied by HBase
        –  Based on row key, column family, or column qualifier
        –  Paging through rows and columns
        –  Based on dependencies
      •  Write your own filters
        –  Use FilterBase class to get a no-op skeleton and fill in the gaps
  • 9. Agenda
      1  Introduction
      2  Comparison Filters
      3  Dedicated Filters
      4  Decorating Filters
      5  Combining Filters
      6  Custom Filters
  • 10. Comparison Filters
      •  Based on CompareFilter class
      •  Adds the compare() method to FilterBase
      •  Takes operator that defines how the comparison is performed
        –  Predefined by client API
      •  Also needs a comparator to do the actual check
        –  HBase supplies a large set
  • 11. Comparison Operators
  • 12. Comparators
  • 13. Comparison Filters (cont.)
      •  Not all combinations of operator and comparator make sense
        –  For example, the SubstringComparator returns only 0 (match) or 1 (no match)
        –  Only EQUAL and NOT_EQUAL are useful
        –  Using other operators is allowed but will most likely yield unexpected results
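
The point about the SubstringComparator can be illustrated with a small model in plain Java. This is a sketch, not HBase code: the `Op` enum, `substringCompare`, and `include` are hypothetical names, and it assumes the operator is applied directly to the comparator's result, which may differ from how HBase orders the operands internally.

```java
import java.util.function.IntPredicate;

class SubstringOperatorSketch {
    // Simplified stand-in for the comparison operators; not the HBase enum.
    enum Op {
        LESS(c -> c < 0),
        LESS_OR_EQUAL(c -> c <= 0),
        EQUAL(c -> c == 0),
        NOT_EQUAL(c -> c != 0),
        GREATER_OR_EQUAL(c -> c >= 0),
        GREATER(c -> c > 0);

        private final IntPredicate test;

        Op(IntPredicate test) {
            this.test = test;
        }

        boolean matches(int compareResult) {
            return test.test(compareResult);
        }
    }

    // Hypothetical comparator mirroring the slide: 0 on match, 1 otherwise.
    static int substringCompare(String needle, String value) {
        return value.contains(needle) ? 0 : 1;
    }

    // In this model a value is included when the operator accepts the result.
    static boolean include(Op op, String needle, String value) {
        return op.matches(substringCompare(needle, value));
    }

    public static void main(String[] args) {
        String[] values = {"val-5", "other"};
        for (Op op : Op.values()) {
            for (String v : values) {
                System.out.println(op + " / " + v + " -> " + include(op, "val", v));
            }
        }
    }
}
```

In this model LESS never matches, GREATER_OR_EQUAL always matches, and the remaining two operators merely duplicate EQUAL and NOT_EQUAL, which is why only those two are worth using with a 0/1 comparator.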
  • 14. Comparison Filters (cont.) •  HBase filters are usually filtering data out •  Comparison filters work in reverse as they include matching data –  Be mindful when selecting the comparison operator!
  • 15. Available Comparison Filters •  Row Filter –  Based on row keys comparisons •  Family Filter –  Based on column family names •  Qualifier Filter –  Based on column names, aka qualifiers •  Value Filter –  Based on the actual value of a column
  • 16. Available Comparison Filters (cont.) •  Dependent Column Filter –  Based on a timestamp of a reference column –  Includes all columns that have the same timestamp –  Implies that the entire row is accessible, since batching will not have access to the reference column •  No scanner batching allowed! –  Useful for loading interdependent changes within a row
  • 17. Example Code
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("colfam1"),
      Bytes.toBytes("col-0"));
    Filter filter = new RowFilter(
      CompareFilter.CompareOp.LESS_OR_EQUAL,
      new BinaryComparator(Bytes.toBytes("row-22")));
    scan.setFilter(filter);
    ResultScanner scanner = table.getScanner(scan);
    for (Result res : scanner) {
      System.out.println(res);
    }
    scanner.close();
  • 18. Example Output
    keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
    keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
    keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
    keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
    keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
    keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
    keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
    keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
    keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
    keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
    keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
    keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
    keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
    keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
    keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
    keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
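The output lists row-100 even though 100 is numerically larger than 22. A likely explanation is that row keys are compared byte by byte rather than as numbers. The sketch below models such a comparison in plain Java; `compareKeys` and `lessOrEqual` are hypothetical helpers written for illustration, not HBase calls.

```java
import java.nio.charset.StandardCharsets;

public class RowKeyOrderSketch {
    // Unsigned, byte-by-byte comparison; a key that is a prefix of another sorts first.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Models a LESS_OR_EQUAL check of a row key against an upper bound.
    static boolean lessOrEqual(String rowKey, String bound) {
        return compareKeys(rowKey.getBytes(StandardCharsets.UTF_8),
                           bound.getBytes(StandardCharsets.UTF_8)) <= 0;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"row-100", "row-22", "row-220", "row-3"}) {
            System.out.println(key + " <= row-22 ? " + lessOrEqual(key, "row-22"));
        }
    }
}
```

Under this ordering "row-100" sorts before "row-22" because the byte for '1' is smaller than the byte for '2', while "row-3" sorts after it. Zero-padding numeric parts of row keys (as in "row-03") is one way to make byte order agree with numeric order.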
  • 19. Agenda 1 Introduction 2 Comparison Filters 3 Dedicated Filters 4 Decorating Filters 5 Combining Filters 6 Custom Filters
  • 20. Dedicated Filters •  Based directly on FilterBase class •  Often less useful for get() calls, since entire rows are filtered
  • 21. Available Dedicated Filters •  Single Column Value Filter –  Filter rows based on one specific column –  Extra features •  “Filter if missing” •  “Get latest version only” –  Column must be part of the scan selection •  Or else it is all or nothing –  Also needs compare operation and an optional comparator
  • 22. Available Dedicated Filters (cont.) •  Single Column Value Exclude Filter –  Same as the one before but excludes the selection column •  Prefix Filter –  Based on prefix of row keys –  Can early out the scan! •  Combine with start row for best performance
  • 23. Available Dedicated Filters (cont.) •  Page Filter –  Allows pagination through rows –  Needs to be combined with setting the start row on subsequent scans –  Can early out the scan when limit is reached •  Key Only Filter –  Drop the value for every column •  First Key Only Filter –  Return only the first column key –  Useful for row counters, or “get newest post” type applications –  Can early out rest of row scan
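For the Page Filter, each follow-up scan needs a start row that lies just past the last row of the previous page. One common approach, assumed here rather than taken from the slides, is to append a single zero byte to the last row key seen, which gives the smallest key that sorts strictly after it. The helper name `nextStartRow` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NextStartRowSketch {
    // Returns lastRow with one 0x00 byte appended (Arrays.copyOf pads with zeros).
    static byte[] nextStartRow(byte[] lastRow) {
        return Arrays.copyOf(lastRow, lastRow.length + 1);
    }

    public static void main(String[] args) {
        byte[] lastRow = "row-22".getBytes(StandardCharsets.UTF_8);
        byte[] start = nextStartRow(lastRow);
        System.out.println("last row length:  " + lastRow.length);
        System.out.println("start row length: " + start.length);
    }
}
```

Since scan start rows are inclusive, starting the next scan at the last row key itself would return that row a second time; the appended byte avoids the overlap.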
  • 24. Available Dedicated Filters (cont.) •  Inclusive Stop Filter –  As opposed to the exclusive stop row, this filter will include the final row •  Timestamp Filter –  Takes list of timestamps to include in result •  Column Count Get Filter –  Used to limit number of columns returned by a get() call
  • 25. Available Dedicated Filters (cont.) •  Column Pagination Filter –  Allows paginating through columns within a row –  Skips to offset parameter and returns limit columns •  Column Prefix Filter –  Analogous to PrefixFilter, here for matching column qualifiers •  Random Row Filter
  • 26. Agenda 1 Introduction 2 Comparison Filters 3 Dedicated Filters 4 Decorating Filters 5 Combining Filters 6 Custom Filters
  • 27. Decorating Filters •  Extend filters to gain additional control over the returned data •  Skip Filter –  Skip entire row when a column is filtered –  Not all filters are compatible •  While Match Filter –  Aborts entire scan once the wrapped filter indicates a row or column is omitted
  • 28. Agenda 1 Introduction 2 Comparison Filters 3 Dedicated Filters 4 Decorating Filters 5 Combining Filters 6 Custom Filters
  • 29. Combining Filters •  Implemented by the FilterList class –  Wraps list of filters into a Filter compatible class –  Takes optional operator to decide how to handle the results of each wrapped filter (default: MUST_PASS_ALL)
  • 30. Combining Filters •  Filter lists can contain other filter lists •  Operator is fixed per list, but a hierarchy allows creating combinations •  Using the proper List implementation helps control filter execution order
  • 31.
    List<Filter> filters = new ArrayList<Filter>();
    Filter filter1 = new RowFilter(
      CompareFilter.CompareOp.GREATER_OR_EQUAL,
      new BinaryComparator(Bytes.toBytes("row-03")));
    filters.add(filter1);
    Filter filter2 = new RowFilter(
      CompareFilter.CompareOp.LESS_OR_EQUAL,
      new BinaryComparator(Bytes.toBytes("row-06")));
    filters.add(filter2);
    Filter filter3 = new QualifierFilter(
      CompareFilter.CompareOp.EQUAL,
      new RegexStringComparator("col-0[03]"));
    filters.add(filter3);
    FilterList filterList1 = new FilterList(filters);
    …
    FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);
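The difference between the two operators can be modeled in plain Java with predicates. This is a simplified sketch of the semantics only, not the FilterList implementation: the `Cell` class and the `mustPassAll`, `mustPassOne`, and `exampleFilters` helpers are hypothetical, and string comparison stands in for byte comparison, which is equivalent for the ASCII keys used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class FilterListSketch {
    // Hypothetical minimal cell: just a row key and a column qualifier.
    static final class Cell {
        final String row;
        final String qualifier;
        Cell(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    // MUST_PASS_ALL: every wrapped filter has to accept the cell.
    static boolean mustPassAll(List<Predicate<Cell>> filters, Cell cell) {
        return filters.stream().allMatch(f -> f.test(cell));
    }

    // MUST_PASS_ONE: a single accepting filter is enough.
    static boolean mustPassOne(List<Predicate<Cell>> filters, Cell cell) {
        return filters.stream().anyMatch(f -> f.test(cell));
    }

    // The three conditions from the slide, expressed as predicates.
    static List<Predicate<Cell>> exampleFilters() {
        Pattern qualifier = Pattern.compile("col-0[03]");
        List<Predicate<Cell>> filters = new ArrayList<>();
        filters.add(c -> c.row.compareTo("row-03") >= 0);
        filters.add(c -> c.row.compareTo("row-06") <= 0);
        filters.add(c -> qualifier.matcher(c.qualifier).find());
        return filters;
    }

    public static void main(String[] args) {
        List<Predicate<Cell>> filters = exampleFilters();
        Cell[] cells = {
            new Cell("row-04", "col-03"),
            new Cell("row-04", "col-01"),
            new Cell("row-09", "col-01")
        };
        for (Cell c : cells) {
            System.out.println(c.row + "/" + c.qualifier
                + " all=" + mustPassAll(filters, c)
                + " one=" + mustPassOne(filters, c));
        }
    }
}
```

In this model, MUST_PASS_ONE lets every cell through for these three conditions, because any row key satisfies at least one of the two row bounds. Nesting a MUST_PASS_ALL list for the row range inside a MUST_PASS_ONE list is the kind of hierarchy that makes such a combination selective.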
  • 32. Agenda 1 Introduction 2 Comparison Filters 3 Dedicated Filters 4 Decorating Filters 5 Combining Filters 6 Custom Filters
  • 33. Custom Filter •  Allows users to add missing filters •  Either implement Filter interface or use FilterBase skeleton •  Provides hooks called at different stages of the read process
  • 34. Filter Interface
    public interface Filter extends Writable {
      public enum ReturnCode {
        INCLUDE, SKIP, NEXT_COL, NEXT_ROW,
        SEEK_NEXT_USING_HINT }
      public void reset();
      public boolean filterRowKey(byte[] buffer,
        int offset, int length);
      public boolean filterAllRemaining();
      public ReturnCode filterKeyValue(KeyValue v);
      public void filterRow(List<KeyValue> kvs);
      public boolean hasFilterRow();
      public boolean filterRow();
      public KeyValue getNextKeyHint(KeyValue
        currentKV);
    }
  • 35. Filter Return Codes
  • 36. Merge Reads
  • 37. Filter Flow •  Filter hooks are called at different stages •  Seeks are done initially to find the next KeyValue –  Hint from previous filter invocation might help •  Early out checks improve performance
  • 38. Example Code
    public class CustomFilter extends FilterBase {
      private byte[] value = null;
      // Start by assuming the row is dropped until a matching value is seen.
      private boolean filterRow = true;

      public CustomFilter() { super(); }
      public CustomFilter(byte[] value) { this.value = value; }

      @Override
      public void reset() { this.filterRow = true; }

      @Override
      public ReturnCode filterKeyValue(KeyValue kv) {
        // A single matching value is enough to keep the whole row.
        if (Bytes.compareTo(value, kv.getValue()) == 0) {
          filterRow = false;
        }
        return ReturnCode.INCLUDE;
      }

      @Override
      public boolean filterRow() { return filterRow; }
      ...
    }
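How the three hooks of this filter interact across rows can be traced with a small plain-Java model. It is a sketch under stated assumptions: `ValueMatchFilter` mirrors only the state handling of the slide's CustomFilter using strings instead of KeyValue instances, and the `scan` driver is a hypothetical, simplified stand-in for the server-side call order (reset per row, one call per value, then the row-level decision).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowHookSketch {
    // Mirrors the state handling of the slide's CustomFilter.
    static final class ValueMatchFilter {
        private final String wanted;
        private boolean filterRow = true;

        ValueMatchFilter(String wanted) { this.wanted = wanted; }

        void reset() { filterRow = true; }

        void filterKeyValue(String value) {
            if (wanted.equals(value)) {
                filterRow = false;
            }
        }

        boolean filterRow() { return filterRow; }
    }

    // Hypothetical driver: returns the keys of the rows that survive the filter.
    static List<String> scan(Map<String, List<String>> rows, ValueMatchFilter filter) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            filter.reset();                      // clear state left by the previous row
            for (String value : row.getValue()) {
                filter.filterKeyValue(value);    // inspect every value in the row
            }
            if (!filter.filterRow()) {           // true means "drop this row"
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("row-1", Arrays.asList("val-1", "val-2"));
        rows.put("row-2", Arrays.asList("val-3", "val-9"));
        rows.put("row-3", Arrays.asList("val-4"));
        System.out.println(scan(rows, new ValueMatchFilter("val-9")));
    }
}
```

Without the reset between rows, the flag cleared by a match in one row would carry over and every later row would be kept as well, which is why the filter restores its default in reset().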
  • 39. Deploying Custom Filters •  Need to provide JAR file with filter class •  Deploy JAR to RegionServers •  Add JAR to HBASE_CLASSPATH •  Restart RegionServers •  Tip: Testing on cluster more involved, test on local machine first
  • 40. Summary
  • 41. Summary (cont.)