HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures

WWW.NGDATA.COM
The information herein is the property of NGDATA and is considered proprietary and confidential
HBase SEP & Indexer
Mining needles from massive haystacks
Steven Noels
HBaseCon, 2013-06-13, San Francisco

HBase is a great haystack
(but where are the needles?)
What HBase Offers
• rows of column family-contained columns containing timestamp-versioned cells
• rowkey-based random access through sorted row order
• get / put / delete / scan operations
• scale-out across region servers
What Most People Need
• sorted rows of column family-contained columns containing timestamp-versioned cells
• rowkey-based random access through sorted row order
• get / put / delete / scan operations
• scale-out across region servers
• fast (indexed) random access using secondary column keys
• index generation and maintenance

• Lily RowLog
• hbase-solr-dataimport
Import HBase data into Solr using the DataImportHandler
https://code.google.com/p/hbase-solr-dataimport/
• HBasene
HBase as the backing store for the TF-IDF representations for Lucene
https://github.com/akkumar/hbasene
• hbase-secondary-index
https://github.com/mayanhui/hbase-secondary-index
• hbase-indexed
https://github.com/danix800/hbase-indexed
• Culvert
A Robust Framework for Secondary Indexing
https://github.com/jyates/culvert
• Co-processors
Earlier attempts
HBase Indexing and Search
1. many data prerequisites
2. leaky abstractions
3. no drop-in approach

• maintaining alternate data views
• aggregates
• counts
• general side-effects to updates
• keeping secondary systems in lock-step sync with updates
Indexing isn’t just about Search
1. HBase update
2. trigger
3. process
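
The three steps above can be sketched in a few lines. This is a toy illustration only: `SideEffectTable`, `listeners`, and `count_updates` are hypothetical names, not part of HBase or HBase SEP, and the trigger here runs inline rather than through any real HBase mechanism.

```python
from collections import Counter


class SideEffectTable:
    """Toy table (not the HBase API) that fires listeners after each update."""

    def __init__(self):
        self.rows = {}        # rowkey -> {column: value}
        self.listeners = []   # callables invoked with (rowkey, column, value)

    def put(self, rowkey, column, value):
        # 1. HBase update (here: a plain dict write)
        self.rows.setdefault(rowkey, {})[column] = value
        # 2. trigger: notify every registered listener of the update
        for listener in self.listeners:
            listener(rowkey, column, value)


# 3. process: a side effect that maintains a count of updates per column
updates_per_column = Counter()


def count_updates(rowkey, column, value):
    updates_per_column[column] += 1


table = SideEffectTable()
table.listeners.append(count_updates)
table.put("row1", "info:name", "alice")
table.put("row2", "info:name", "bob")
table.put("row1", "info:age", "42")
```

The same listener slot could just as well feed a search index or a cross-reference table; the counter is only the smallest side effect to show.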

HBase ‘Side-Effect Processor’
• A mechanism for triggering and processing side-effect events, based upon HBase updates
Companion project: HBase Indexer
• Maps HBase row updates into Solr index updates
The Solution: HBase SEP + Indexer
Open Source, Apache License
http://github.com/NGDATA/hbase-sep
http://github.com/NGDATA/hbase-indexer

• structured ad-hoc search of HBase-backed Solr indexes
• faceted search
• auxiliary index or view structures
• observation matrices for CF-style recommendations
• maintenance of auxiliary cross-reference tables (link mgmt)
• computing data aggregates, counter maintenance
Use cases for HBase SEP & Indexing
What about co-processors? Sysadmins don’t like running application code on HBase region servers.

Use Case: Faceted Search in Lily
[Figure: facets, result set count, facet counts; HBase and SolrCloud]

Approach:
• SEP = fake HBase region servers, pass on update events to Indexer
• light-weight, embeddable process
• piggybacks on HBase replication mechanism
• Indexer = maps HBase HLog update events into Solr updates
• no impact on write path
SEP / Trigger fundamentals
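
Why a log-driven consumer leaves the write path untouched can be shown with a toy model. Everything below is an assumption made for illustration: `WriteAheadLog`, `FakePeer`, and `drain()` are hypothetical names, and real HBase replication ships log edits to the peer rather than being polled as this sketch does.

```python
class WriteAheadLog:
    """Toy append-only log standing in for the HBase HLog (hypothetical API)."""

    def __init__(self):
        self.entries = []

    def append(self, rowkey, column, value):
        self.entries.append((rowkey, column, value))


class FakePeer:
    """Toy consumer posing as a replication peer (hypothetical API)."""

    def __init__(self, log, handler):
        self.log = log
        self.handler = handler
        self.position = 0      # offset of the next log entry to deliver

    def drain(self):
        """Deliver every entry written since the last call; return the count."""
        batch = self.log.entries[self.position:]
        for entry in batch:
            self.handler(*entry)
        self.position += len(batch)
        return len(batch)


index = {}   # toy stand-in for Solr: rowkey -> {field: value}


def index_update(rowkey, column, value):
    index.setdefault(rowkey, {})[column] = value


log = WriteAheadLog()
peer = FakePeer(log, index_update)

log.append("row1", "info:name", "alice")   # write path: a log append only
index_before_drain = dict(index)           # nothing has been indexed yet
shipped = peer.drain()                     # delivery happens later, off the write path
```

The writer never calls the indexer; the consumer keeps its own position in the log, so a slow or stopped consumer delays indexing but not writes.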
Using HBase replication to trigger indexing
[Figure: Fake HBase ‘Cluster’ (SEP + Indexer) feeding the Index (Solr)]

SEP & Indexer data flow anatomy

• option 1: co-locate with HBase Region Servers
Deployment
HBase RS SEP+IDX Solr
ZooKeeper arbitration

• option 2: co-locate with Solr index engine nodes
Deployment
HBase RS SEP+IDX Solr
ZooKeeper arbitration

HBase Indexer: two options

• row- and column-based mapping
HBase Indexer features
[Figure: an HBase table (rowkey, col1, col2, col3, col4) mapped to Solr(Cloud) documents (rowkey, row content), by row or by column]
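
A minimal sketch of the difference between row-based and column-based mapping, assuming a row is a dict of column name to value. The function names and the `rowkey-column` document id are illustrative assumptions, not the HBase Indexer's actual scheme.

```python
def map_by_row(rowkey, columns):
    """Row-based mapping: one document per HBase row, one field per column."""
    doc = {"id": rowkey}
    doc.update(columns)
    return [doc]


def map_by_column(rowkey, columns):
    """Column-based mapping: one document per cell.

    The id format (rowkey-column) is an assumption made for this sketch.
    """
    return [
        {"id": f"{rowkey}-{column}", "value": value}
        for column, value in sorted(columns.items())
    ]


row = {"col1": "1", "col3": "42"}
row_docs = map_by_row("r1", row)
column_docs = map_by_column("r1", row)
```

Row-based mapping suits tables where a row is one entity; column-based mapping suits wide rows where each cell is an item in its own right.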

• configurable data extraction mechanisms
• HBase Bytes
• Tika / SolrCell (+ content extraction)
• optional formatters
• non-programmatic indexer configuration
• index mgmt CLI
HBase Indexer features
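
What "non-programmatic" configuration with pluggable extraction could look like, as a rough sketch: the mapping is plain data and the extraction code is generic. `CONFIG`, `DECODERS`, and `extract` are hypothetical names; the real HBase Indexer configuration format is not reproduced here. The only HBase-specific assumption is that ints are stored as 4-byte big-endian values, as HBase's `Bytes.toBytes(int)` writes them.

```python
import struct

# Hypothetical declarative mapping: Solr field -> (HBase column, value type)
CONFIG = {
    "name_s": ("info:name", "string"),
    "age_i": ("info:age", "int"),
}

# Decoders for raw cell bytes, keyed by the value type named in CONFIG.
DECODERS = {
    "string": lambda raw: raw.decode("utf-8"),
    "int": lambda raw: struct.unpack(">i", raw)[0],   # 4-byte big-endian int
}


def extract(rowkey, cells, config=CONFIG):
    """Build one index document from raw cells, driven only by the config."""
    doc = {"id": rowkey}
    for field, (column, value_type) in config.items():
        if column in cells:                 # skip columns the row lacks
            doc[field] = DECODERS[value_type](cells[column])
    return doc


cells = {"info:name": b"alice", "info:age": struct.pack(">i", 42)}
doc = extract("row1", cells)
```

Adding a field to the index then means editing the mapping, not writing code.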

• http://github.com/NGDATA/hbase-sep and hbase-indexer
• easy setup:
1. switch on HBase replication, and …
2. profit.
• few prerequisites on data model
• multiple approaches for mapping HBase rows to Solr
• can be used for other secondary operations
• open source, Apache license
Questions? stevenn@ngdata.com
Wrap-up
HBase SEP & Indexer

HBase SEP & Indexer are part of Cloudera Search
➜ joint development between Cloudera & NGDATA
➜ try it out: www.cloudera.com/downloads
