HBaseCon 2013: HBase SEP - Reliable Maintenance of Auxiliary Index Structures

Presented by: Steven Noels, NGDATA

Transcript

  • 1. HBase SEP & Indexer: Mining needles from massive haystacks
      Steven Noels
      HBaseCon, 2013-06-13, San Francisco
      WWW.NGDATA.COM. The information herein is the property of NGDATA and is considered proprietary and confidential.
  • 2. HBase is a great haystack (but where are the needles?)
      What HBase Offers
        • rows of column family-contained columns containing timestamp-versioned cells
        • rowkey-based random access through sorted row order
        • get / put / delete / scan operations
        • scale-out across region servers
      What Most People Need
        • sorted rows of column family-contained columns containing timestamp-versioned cells
        • rowkey-based random access through sorted row order
        • get / put / delete / scan operations
        • scale-out across region servers
        • fast (indexed) random access using secondary column keys
        • index generation and maintenance
  • 3. Earlier attempts: HBase Indexing and Search
        • Lily RowLog
        • hbase-solr-dataimport: Import HBase data into Solr using the DataImportHandler
          https://code.google.com/p/hbase-solr-dataimport/
        • HBasene: HBase as the backing store for the TF-IDF representations for Lucene
          https://github.com/akkumar/hbasene
        • hbase-secondary-index
          https://github.com/mayanhui/hbase-secondary-index
        • hbase-indexed
          https://github.com/danix800/hbase-indexed
        • Culvert: A Robust Framework for Secondary Indexing
          https://github.com/jyates/culvert
        • Co-processors
      1. many data prerequisites
      2. leaky abstractions
      3. no drop-in approach
  • 4. Indexing isn’t just about Search
        • maintaining alternate data views
        • aggregates
        • counts
        • general side-effects to updates
        • keeping secondary systems in lock-step sync with updates
      1. HBase update
      2. trigger
      3. process
  • 5. The Solution: HBase SEP + Indexer
      HBase ‘Side-Effect Processor’
        • A mechanism for triggering and processing side-effect events, based upon HBase updates
      Companion project: HBase Indexer
        • Maps HBase row updates into Solr index updates
      Open Source, Apache License
        http://github.com/NGDATA/hbase-sep
        http://github.com/NGDATA/hbase-indexer
  • 6. Use cases for HBase SEP & Indexing
        • structured ad-hoc search of HBase-backed Solr indexes
        • faceted search
        • auxiliary index or view structures
        • observation matrices for CF-style recommendations
        • maintenance of auxiliary cross-reference tables (link mgmt)
        • computing data aggregates, counter maintenance
      What about co-processors? Sysadmins don’t like running application code on HBase region servers.
  • 7. Use Case: Faceted Search in Lily
      [Diagram; labels: facets, result set count, facet counts, HBase, Solr Cloud]
  • 8. SEP / Trigger fundamentals: Using HBase replication for Indexing triggering
      Approach:
        • SEP = fake HBase region servers, pass on update events to Indexer
        • light-weight, embeddable process
        • piggybacks on HBase replication mechanism
        • Indexer = maps HBase HLog update events into Solr updates
        • no impact on write path
      [Diagram; labels: Fake HBase ‘Cluster’, SEP + Indexer, Index (Solr)]
  • 9. SEP & Indexer data flow anatomy
  • 10. Deployment
        • option 1: co-locate with HBase Region Servers
      [Diagram; labels: HBase RS, SEP+IDX, Solr, ZooKeeper arbitration]
  • 11. Deployment
        • option 2: co-locate with Solr index engine nodes
      [Diagram; labels: HBase RS, SEP+IDX, Solr, ZooKeeper arbitration]
  • 12. HBase Indexer: two options
  • 13. HBase Indexer features
        • row- and column-based mapping
      [Diagram: an HBase table (rowkey, col1, col2, col3, col4) mapped to Solr(Cloud) documents (rowkey, row content), shown for "row:" and "column:" mapping; the cell values did not survive extraction]
  • 14. HBase Indexer features
        • configurable data extraction mechanisms
        • HBase Bytes
        • Tika / SolrCell (+ content extraction)
        • optional formatters
        • non-programmatic indexer configuration
        • index mgmt CLI
  • 15. Wrap-up: HBase SEP & Indexer
        • http://github.com/NGDATA/hbase-sep and hbase-indexer
        • easy setup: 1. switch on HBase replication, and … 2. profit.
        • few prerequisites on data model
        • multiple approaches for mapping HBase rows to Solr
        • can be used for other secondary operations
        • open source, Apache license
      Questions? stevenn@ngdata.com
  • 16. HBase SEP & Indexer are part of Cloudera Search
      ➜ joint development between Cloudera & NGDATA
      ➜ try it out: www.cloudera.com/downloads
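
Slide 2 contrasts rowkey-based access with "fast (indexed) random access using secondary column keys". The following is a minimal in-memory sketch of that difference, not HBase code: the `Table` class and its methods are hypothetical names chosen for illustration. It keeps rows sorted by rowkey, as HBase does, and maintains an inverted map from (column, value) to rowkeys as the secondary index.

```python
import bisect
from typing import Dict, List, Set, Tuple


class Table:
    """Hypothetical toy table: sorted rowkeys plus a secondary index."""

    def __init__(self) -> None:
        self._keys: List[str] = []                      # rowkeys, kept sorted
        self._rows: Dict[str, Dict[str, str]] = {}      # rowkey -> {column: value}
        self._by_value: Dict[Tuple[str, str], Set[str]] = {}  # secondary index

    def put(self, rowkey: str, column: str, value: str) -> None:
        row = self._rows.get(rowkey)
        if row is None:
            row = self._rows[rowkey] = {}
            bisect.insort(self._keys, rowkey)
        old = row.get(column)
        if old is not None:
            # Index maintenance: drop the stale entry before adding the new one.
            self._by_value[(column, old)].discard(rowkey)
        row[column] = value
        self._by_value.setdefault((column, value), set()).add(rowkey)

    def scan(self, start: str, stop: str) -> List[str]:
        """Rowkey range scan [start, stop): cheap thanks to sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return self._keys[lo:hi]

    def find_by_scan(self, column: str, value: str) -> List[str]:
        """Without an index: every row has to be inspected."""
        return [k for k in self._keys if self._rows[k].get(column) == value]

    def find_by_index(self, column: str, value: str) -> List[str]:
        """With the secondary index: one dictionary lookup."""
        return sorted(self._by_value.get((column, value), ()))


table = Table()
table.put("r1", "info:city", "Ghent")
table.put("r2", "info:city", "Paris")
table.put("r3", "info:city", "Ghent")
ghent_rows = table.find_by_index("info:city", "Ghent")
```

The extra work in `put` is the "index generation and maintenance" the slide lists as a need: every update has to keep the auxiliary structure consistent with the rows.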
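
Slides 4 and 8 describe the flow "HBase update, trigger, process": the SEP poses as a replication peer, receives update events, and hands them to the Indexer, which turns them into index updates. The sketch below mimics that flow in plain Python under stated assumptions. `RowUpdate`, `ReplicationSink`, `InMemoryIndex`, and `make_indexer` are hypothetical names, not the hbase-sep or hbase-indexer API, and the dictionary-backed index merely stands in for Solr.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RowUpdate:
    """Hypothetical stand-in for one replicated row update event."""
    row: str
    cells: Dict[str, str]      # "family:qualifier" -> value
    deleted: bool = False


Listener = Callable[[List[RowUpdate]], None]


class ReplicationSink:
    """Plays the role of the fake region server: it receives batches of
    replicated updates and passes them on to subscribed listeners."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def replicate(self, batch: List[RowUpdate]) -> None:
        for listener in self._listeners:
            listener(batch)


class InMemoryIndex:
    """Hypothetical stand-in for a search index such as Solr."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, str]] = {}

    def add(self, doc_id: str, fields: Dict[str, str]) -> None:
        self.docs.setdefault(doc_id, {}).update(fields)

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)


def make_indexer(index: InMemoryIndex) -> Listener:
    """Build a listener that maps row updates onto index updates."""
    def on_events(events: List[RowUpdate]) -> None:
        for event in events:
            if event.deleted:
                index.delete(event.row)
            else:
                index.add(event.row, event.cells)
    return on_events


index = InMemoryIndex()
sink = ReplicationSink()
sink.subscribe(make_indexer(index))

# Updates arrive asynchronously in the real system; here they are just batches.
sink.replicate([
    RowUpdate("u1", {"info:name": "Ada"}),
    RowUpdate("u2", {"info:name": "Bob"}),
])
sink.replicate([
    RowUpdate("u1", {"info:city": "Ghent"}),
    RowUpdate("u2", {}, deleted=True),
])
```

Because the listener only sees events after they have been written and shipped, the client's write never waits on indexing, which is the "no impact on write path" point from slide 8. Other listeners (counters, aggregates, cross-reference tables) could subscribe to the same sink.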
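
Slide 13 mentions row- and column-based mapping, but its diagram did not survive extraction. One plausible reading, sketched here as an assumption rather than as the Indexer's actual behavior, is that row-based mapping produces one index document per HBase row, while column-based mapping produces one document per cell. The function names and the document id scheme are hypothetical.

```python
from typing import Dict, List


def map_row(rowkey: str, cells: Dict[str, str]) -> List[Dict[str, str]]:
    """Row-based mapping (assumed): the whole row becomes one document."""
    doc = {"id": rowkey}
    doc.update(cells)
    return [doc]


def map_columns(rowkey: str, cells: Dict[str, str]) -> List[Dict[str, str]]:
    """Column-based mapping (assumed): each cell becomes its own document.
    The "<rowkey>-<column>" id format is an illustrative choice."""
    return [
        {"id": f"{rowkey}-{column}", "row": rowkey, "column": column, "value": value}
        for column, value in sorted(cells.items())
    ]


cells = {"col1": "a", "col3": "b"}
row_docs = map_row("1", cells)
column_docs = map_columns("1", cells)
```

Row-based mapping suits tables where a row is one entity; column-based mapping suits wide rows where each column is an item worth finding on its own.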
