
HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures

Presented by: Steven Noels, NGDATA

Published in: Technology


  1. HBase SEP & Indexer
     Mining needles from massive haystacks
     Steven Noels
     HBaseCon, 2013-06-13, San Francisco
     WWW.NGDATA.COM. The information herein is the property of NGDATA and is considered proprietary and confidential.
  2. HBase is a great haystack (but where are the needles?)
     What HBase Offers
     • rows of column family-contained columns containing timestamp-versioned cells
     • rowkey-based random access through sorted row order
     • get / put / delete / scan operations
     • scale-out across region servers
     What Most People Need
     • sorted rows of column family-contained columns containing timestamp-versioned cells
     • rowkey-based random access through sorted row order
     • get / put / delete / scan operations
     • scale-out across region servers
     • fast (indexed) random access using secondary column keys
     • index generation and maintenance
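The gap between the two lists on this slide (rowkey access versus access by a secondary column key) can be illustrated with a toy in-memory table. This is a minimal sketch, not HBase code: `TinyTable` and its methods are hypothetical names, and the "index" is simply a dictionary that is kept up to date on every put.

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy stand-in for an HBase table (hypothetical, in-memory only)."""

    def __init__(self):
        self.rowkeys = []  # rowkeys kept in sorted order, as HBase does
        self.rows = {}     # rowkey -> {column: value}
        self.index = {}    # (column, value) -> set of rowkeys (secondary index)

    def put(self, rowkey, column, value):
        row = self.rows.get(rowkey)
        if row is None:
            row = self.rows[rowkey] = {}
            insort(self.rowkeys, rowkey)
        old = row.get(column)
        if old is not None:
            # Index maintenance: drop the stale entry before adding the new one.
            self.index[(column, old)].discard(rowkey)
        row[column] = value
        self.index.setdefault((column, value), set()).add(rowkey)

    def scan(self, start, stop):
        """Rowkey range scan: cheap, because rowkeys are sorted."""
        lo = bisect_left(self.rowkeys, start)
        hi = bisect_left(self.rowkeys, stop)
        return self.rowkeys[lo:hi]

    def find_by_scan(self, column, value):
        """Without an index: every row has to be inspected."""
        return sorted(k for k, r in self.rows.items() if r.get(column) == value)

    def find_by_index(self, column, value):
        """With a secondary index: a single dictionary lookup."""
        return sorted(self.index.get((column, value), ()))
```

Both lookups return the same rowkeys; the difference is that `find_by_scan` touches every row, while `find_by_index` depends on the index being maintained on each update, which is the "index generation and maintenance" item in the list.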
  3. HBase Indexing and Search
     Earlier attempts
     • Lily RowLog
     • hbase-solr-dataimport: Import HBase data into Solr using the DataImportHandler
       https://code.google.com/p/hbase-solr-dataimport/
     • HBasene: HBase as the backing store for the TF-IDF representations for Lucene
       https://github.com/akkumar/hbasene
     • hbase-secondary-index
       https://github.com/mayanhui/hbase-secondary-index
     • hbase-indexed
       https://github.com/danix800/hbase-indexed
     • Culvert: A Robust Framework for Secondary Indexing
       https://github.com/jyates/culvert
     • Co-processors
     1. many data prerequisites
     2. leaky abstractions
     3. no drop-in approach
  4. Indexing isn’t just about Search
     • maintaining alternate data views
     • aggregates
     • counts
     • general side-effects to updates
     • keeping secondary systems in lock-step sync with updates
     1. HBase update
     2. trigger
     3. process
  5. The Solution: HBase SEP + Indexer
     HBase ‘Side-Effect Processor’
     • A mechanism for triggering and processing side-effect events, based upon HBase updates
     Companion project: HBase Indexer
     • Maps HBase row updates into Solr index updates
     Open Source, Apache License
     http://github.com/NGDATA/hbase-sep
     http://github.com/NGDATA/hbase-indexer
  6. Use cases for HBase SEP & Indexing
     • structured ad-hoc search of HBase-backed Solr indexes
     • faceted search
     • auxiliary index or view structures
     • observation matrices for CF-style recommendations
     • maintenance of auxiliary cross-reference tables (link mgmt)
     • computing data aggregates, counter maintenance
     What about co-processors?
     Sysadmins don’t like running application code on HBase region servers.
  7. Use Case: Faceted Search in Lily
     [Diagram labels: facets, result set count, facet counts; HBase, Solr Cloud]
  8. SEP / Trigger fundamentals
     Approach:
     • SEP = fake HBase region servers, pass on update events to Indexer
     • light-weight, embeddable process
     • piggybacks on HBase replication mechanism
     • Indexer = maps HBase HLog update events into Solr updates
     • no impact on write path
     [Diagram: Using HBase replication for Indexing triggering. Labels: Fake HBase ‘Cluster’, SEP + Indexer, Index (Solr)]
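The idea on this slide, that index updates are derived from the log that replication already ships rather than from extra work in the write path, can be sketched with a toy log and a consumer that tails it. This is only an illustration under stated assumptions: `WriteAheadLog` and `LogTailingIndexer` are hypothetical names, and neither is the HBase replication API nor the hbase-sep API.

```python
class WriteAheadLog:
    """Toy stand-in for the HBase HLog: an append-only list of edits."""

    def __init__(self):
        self.entries = []

    def append(self, rowkey, column, value):
        # The write path only appends; it never waits for the indexer.
        self.entries.append((rowkey, column, value))


class LogTailingIndexer:
    """Plays the role of SEP + Indexer: consumes log entries after the fact."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # position of the next unprocessed log entry
        self.index = {}  # rowkey -> {column: value}; stands in for Solr

    def poll(self):
        """Apply all entries written since the last poll; return how many."""
        batch = self.log.entries[self.offset:]
        for rowkey, column, value in batch:
            self.index.setdefault(rowkey, {})[column] = value
        self.offset += len(batch)
        return len(batch)
```

A writer calls `log.append(...)` and returns immediately; the index only catches up when `poll()` runs, so the index is eventually consistent with the table rather than updated in the same operation.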
  9. SEP & Indexer data flow anatomy
  10. Deployment
      • option 1: co-locate with HBase Region Servers
      [Diagram labels: HBase RS, SEP+IDX, Solr, ZooKeeper arbitration]
  11. Deployment
      • option 2: co-locate with Solr index engine nodes
      [Diagram labels: HBase RS, SEP+IDX, Solr, ZooKeeper arbitration]
  12. HBase Indexer: two options
  13. HBase Indexer features
      • row- and column-based mapping
      [Diagram: an HBase table (rowkey, col1, col2, col3, col4) next to Solr(Cloud) documents (rowkey, row content), labelled "row:" and "column:"]
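One way to read "row- and column-based mapping" is that a row can become either a single index document or one document per cell. The sketch below shows that distinction on plain dictionaries. The function names and the document id scheme (`rowkey-column`) are assumptions made for illustration; they are not taken from the HBase Indexer code.

```python
def map_row(rowkey, cells):
    """Row-based mapping: one index document per HBase row."""
    doc = {"id": rowkey}
    doc.update(cells)  # every column becomes a field of the same document
    return [doc]


def map_columns(rowkey, cells):
    """Column-based mapping: one index document per cell."""
    return [
        # Hypothetical id scheme: rowkey and column joined by a dash.
        {"id": f"{rowkey}-{col}", "row": rowkey, "column": col, "value": val}
        for col, val in sorted(cells.items())
    ]
```

For a row with two columns, `map_row` yields one document with two fields, while `map_columns` yields two documents that each point back to the originating row.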
  14. HBase Indexer features
      • configurable data extraction mechanisms
      • HBase Bytes
      • Tika / SolrCell (+ content extraction)
      • optional formatters
      • non-programmatic indexer configuration
      • index mgmt CLI
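The "HBase Bytes" extraction item refers to cell values stored as raw bytes; HBase's `Bytes` utility encodes ints and longs as big-endian and strings as UTF-8. A minimal sketch of configuration-driven extraction follows. The `EXTRACTORS` table, the `extract` function, and the shape of `field_config` are hypothetical, meant only to show how a declarative field-to-column mapping might drive decoding; they are not the HBase Indexer configuration format.

```python
import struct

# Decoders for values written with HBase's Bytes utility
# (big-endian 4-byte int, big-endian 8-byte long, UTF-8 string).
EXTRACTORS = {
    "int": lambda raw: struct.unpack(">i", raw)[0],
    "long": lambda raw: struct.unpack(">q", raw)[0],
    "string": lambda raw: raw.decode("utf-8"),
}


def extract(cells, field_config):
    """Build an index document from raw cells.

    cells:        {column: bytes}
    field_config: {index_field: (column, type_name)}  (hypothetical shape)
    """
    doc = {}
    for field, (column, type_name) in field_config.items():
        raw = cells.get(column)
        if raw is not None:  # columns absent from the row are skipped
            doc[field] = EXTRACTORS[type_name](raw)
    return doc
```

Because the mapping is data rather than code, changing which columns are indexed means editing `field_config`, which is the spirit of the "non-programmatic indexer configuration" item.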
  15. Wrap-up: HBase SEP & Indexer
      • http://github.com/NGDATA/hbase-sep and hbase-indexer
      • easy setup:
        1. switch on HBase replication, and …
        2. profit.
      • few prerequisites on data model
      • multiple approaches for mapping HBase rows to Solr
      • can be used for other secondary operations
      • open source, Apache license
      Questions? stevenn@ngdata.com
  16. HBase SEP & Indexer are part of Cloudera Search
      ➜ joint development between Cloudera & NGDATA
      ➜ try it out: www.cloudera.com/downloads
