SlideShare a Scribd company logo

HBaseCon 2013: Full-Text Indexing for Apache HBase

Presented by: Maryann Xue, Intel

1 of 13
Full-text indexing for HBase
Mary-Ann Xue
wei.xue@intel.com
Overview
• Why Full-text Indexing for HBase
• Organize Indices
• Index Building
• Index Splitting
• Index Searching
• Performance Results
Why Full-text Indexing for HBase?
• Fast retrieval by more than one column
– HBase: lack of indexing for non-key columns
• Effective search for non-exact matches:
containing, starting-with, ending-with, etc.
– HBase: supports only byte-order indexing; no text
structure awareness
HBase table “t1”
Organize Indices
• One Lucene directory for one HBase region
Region t1,,…
…
Region t1,aa,…
Region t1,bb,…
Region t1,xx,…
Lucene indices on HDFS
Dir:
t1,,…
Dir:
t1,xx,…
Dir:
t1,aa,…
Dir:
t1,bb,…
Organize Indices
• One Lucene document for one HBase record
rowkey  document field “ID”
indexed column(s)  a user-specified field
r1 (rowkey) v1 (f:c1) v2 (f:c2) v3 (f:c3) v4 (f:c4)HBase record
ID field_a field_bLucene document
Index Building
• Implemented with HBase Coprocessors
(Region Observers)
• The main hooks are:
Updates (Put / Delete)
WAL restore
Region open/close
Memstore flush
Region splitting
Ad

Recommended

HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...Cloudera, Inc.
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseCloudera, Inc.
 
Dancing with the elephant h base1_final
Dancing with the elephant   h base1_finalDancing with the elephant   h base1_final
Dancing with the elephant h base1_finalasterix_smartplatf
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardMatthew Blair
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...Cloudera, Inc.
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon
 

More Related Content

What's hot

HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseCloudera, Inc.
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesHBaseCon
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseenissoz
 
HBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBaseHBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBaseHBaseCon
 
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase Cloudera, Inc.
 
Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future HBaseCon
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Suman Srinivasan
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012larsgeorge
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars GeorgeJAX London
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache HiveHBaseCon
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicasenissoz
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017larsgeorge
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWSHBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWSHBaseCon
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesData Con LA
 

What's hot (20)

HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBase
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application Archetypes
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
 
HBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBaseHBaseCon 2015 General Session: State of HBase
HBaseCon 2015 General Session: State of HBase
 
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
 
Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
Real-Time Video Analytics Using Hadoop and HBase (HBaseCon 2013)
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache Hive
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicas
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWSHBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use Cases
 

Viewers also liked

20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case StudiesEvan Liu
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
 
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems Cloudera, Inc.
 
HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...
HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...
HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...Cloudera, Inc.
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)alexbaranau
 
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...Cloudera, Inc.
 
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase Cloudera, Inc.
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 
HBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - SematextHBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - SematextCloudera, Inc.
 
HBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseCloudera, Inc.
 
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...Cloudera, Inc.
 
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...Cloudera, Inc.
 
Real-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudHBaseCon
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBaseHBaseCon
 

Viewers also liked (20)

20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
 
HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...
HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...
HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural...
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...
HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in th...
 
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase
HBaseCon 2013:High-Throughput, Transactional Stream Processing on Apache HBase
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
HBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - SematextHBaseCon 2012 | Real-time Analytics with HBase - Sematext
HBaseCon 2012 | Real-time Analytics with HBase - Sematext
 
HBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBase
 
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
 
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
 
Real-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the CloudReal-time HBase: Lessons from the Cloud
Real-time HBase: Lessons from the Cloud
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
 

Similar to HBaseCon 2013: Full-Text Indexing for Apache HBase

Nyc hadoop meetup introduction to h base
Nyc hadoop meetup   introduction to h baseNyc hadoop meetup   introduction to h base
Nyc hadoop meetup introduction to h base智杰 付
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxBhavanaHotchandani
 
Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Rohit Agrawal
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Aman Sinha
 
hive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxhive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxvishwasgarade1
 
Musings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseMusings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseJesse Yates
 
Ils on a shoe string budget
Ils on a shoe string budgetIls on a shoe string budget
Ils on a shoe string budgetJolene81
 
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetuprcmuir
 
Hive and HiveQL - Module6
Hive and HiveQL - Module6Hive and HiveQL - Module6
Hive and HiveQL - Module6Rohit Agrawal
 
Indexing and-hashing
Indexing and-hashingIndexing and-hashing
Indexing and-hashingAmi Ranjit
 
Bi publisher for hyperion planning
Bi publisher for hyperion planningBi publisher for hyperion planning
Bi publisher for hyperion planningMaxim Levko
 

Similar to HBaseCon 2013: Full-Text Indexing for Apache HBase (20)

Nyc hadoop meetup introduction to h base
Nyc hadoop meetup   introduction to h baseNyc hadoop meetup   introduction to h base
Nyc hadoop meetup introduction to h base
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptx
 
HBase.pptx
HBase.pptxHBase.pptx
HBase.pptx
 
Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7Advance Hive, NoSQL Database (HBase) - Module 7
Advance Hive, NoSQL Database (HBase) - Module 7
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Apache hive
Apache hiveApache hive
Apache hive
 
Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018Apache Drill talk ApacheCon 2018
Apache Drill talk ApacheCon 2018
 
ACS DataMart_ppt
ACS DataMart_pptACS DataMart_ppt
ACS DataMart_ppt
 
ACS DataMart_ppt
ACS DataMart_pptACS DataMart_ppt
ACS DataMart_ppt
 
hive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxhive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptx
 
Musings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBaseMusings on Secondary Indexing in HBase
Musings on Secondary Indexing in HBase
 
Ils on a shoe string budget
Ils on a shoe string budgetIls on a shoe string budget
Ils on a shoe string budget
 
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr MeetupImproved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
Improved Search With Lucene 4.0 - NOVA Lucene/Solr Meetup
 
Hive and HiveQL - Module6
Hive and HiveQL - Module6Hive and HiveQL - Module6
Hive and HiveQL - Module6
 
Indexing and-hashing
Indexing and-hashingIndexing and-hashing
Indexing and-hashing
 
HBase
HBaseHBase
HBase
 
01 hbase
01 hbase01 hbase
01 hbase
 
Bi publisher for hyperion planning
Bi publisher for hyperion planningBi publisher for hyperion planning
Bi publisher for hyperion planning
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Breaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologyBreaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologySafe Software
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, GoogleISPMAIndia
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stackSummit
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1Inbay UK
 
Bit N Build Poland
Bit N Build PolandBit N Build Poland
Bit N Build PolandGDSC PJATK
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaISPMAIndia
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringMassimo Talia
 
Bringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxBringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxMaarten Balliauw
 
H3 Platform CXL Solution_Memory Fabric Forum.pptx
H3 Platform CXL Solution_Memory Fabric Forum.pptxH3 Platform CXL Solution_Memory Fabric Forum.pptx
H3 Platform CXL Solution_Memory Fabric Forum.pptxMemory Fabric Forum
 
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...DianaGray10
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdfLLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdfThomas Poetter
 
Artificial-Intelligence-in-Marketing-Data.pdf
Artificial-Intelligence-in-Marketing-Data.pdfArtificial-Intelligence-in-Marketing-Data.pdf
Artificial-Intelligence-in-Marketing-Data.pdfIsidro Navarro
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...ISPMAIndia
 
Battle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsBattle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsEvangelia Mitsopoulou
 
Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?MENGSAYLOEM1
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERNRonnelBaroc
 
AI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvementAI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvementMimmo Squillace
 
Importance of magazines in education ppt
Importance of magazines in education pptImportance of magazines in education ppt
Importance of magazines in education pptsafnarafeek2002
 
My self introduction to know others abut me
My self  introduction to know others abut meMy self  introduction to know others abut me
My self introduction to know others abut meManoj Prabakar B
 
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes", Volodymyr TsapFwdays
 

Recently uploaded (20)

Breaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologyBreaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI Technology
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stack
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1
 
Bit N Build Poland
Bit N Build PolandBit N Build Poland
Bit N Build Poland
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineering
 
Bringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxBringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptx
 
H3 Platform CXL Solution_Memory Fabric Forum.pptx
H3 Platform CXL Solution_Memory Fabric Forum.pptxH3 Platform CXL Solution_Memory Fabric Forum.pptx
H3 Platform CXL Solution_Memory Fabric Forum.pptx
 
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdfLLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI.pdf
 
Artificial-Intelligence-in-Marketing-Data.pdf
Artificial-Intelligence-in-Marketing-Data.pdfArtificial-Intelligence-in-Marketing-Data.pdf
Artificial-Intelligence-in-Marketing-Data.pdf
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
 
Battle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsBattle of React State Managers in frontend applications
Battle of React State Managers in frontend applications
 
Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
 
AI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvementAI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvement
 
Importance of magazines in education ppt
Importance of magazines in education pptImportance of magazines in education ppt
Importance of magazines in education ppt
 
My self introduction to know others abut me
My self  introduction to know others abut meMy self  introduction to know others abut me
My self introduction to know others abut me
 
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap"Running Open-Source LLM models on Kubernetes",  Volodymyr Tsap
"Running Open-Source LLM models on Kubernetes", Volodymyr Tsap
 

HBaseCon 2013: Full-Text Indexing for Apache HBase

  • 1. Full-text indexing for HBase Mary-Ann Xue wei.xue@intel.com
  • 2. Overview • Why Full-text Indexing for HBase • Organize Indices • Index Building • Index Splitting • Index Searching • Performance Results
  • 3. Why Full-text Indexing for HBase? • Fast retrieval by more than one column – HBase: lack of indexing for non-key columns • Effective search for non-exact matches: containing, starting-with, ending-with, etc. – HBase: supports only byte-order indexing; no text structure awareness
  • 4. HBase table “t1” Organize Indices • One Lucene directory for one HBase region Region t1,,… … Region t1,aa,… Region t1,bb,… Region t1,xx,… Lucene indices on HDFS Dir: t1,,… Dir: t1,xx,… Dir: t1,aa,… Dir: t1,bb,…
  • 5. Organize Indices • One Lucene document for one HBase record rowkey  document field “ID” indexed column(s)  a user-specified field r1 (rowkey) v1 (f:c1) v2 (f:c2) v3 (f:c3) v4 (f:c4)HBase record ID field_a field_bLucene document
  • 6. Index Building • Implemented with HBase Coprocessors (Region Observers) • The main hooks are: Updates (Put / Delete) WAL restore Region open/close Memstore flush Region splitting
  • 7. Index Building: updates and WAL restore • In prePut(), postDelete() or preWALRestore(), a generated Lucene document is added, updated or deleted accordingly. put request “row1” add record into memstore add Lucene document Y update-mode && missing columns? look up HBase table for those missing columns of “row1” compose Lucene document with new & existing values compose Lucene document ignoring missing columns N
  • 8. Index Building: updates and WAL restore • For deletes, always look up the HBase table to see if all columns for index building have been deleted. delete request “row1” add delete mark into memstore update Lucene document Y all columns for index deleted? delete Lucene documents having field “ID” as “row1” compose Lucene document with existing and deleted values N look up HBase table for any index column of “row1”
  • 9. Index Building: memstore flush • Fork a tread to do Lucene index commit as memstore flush starts; join this thread in postFlush() Memstore flush starts Flush memstore to storefiles Commit Lucene index segments Memstore flush completes 1 1 2
  • 10. Index Splitting • Split Lucene indices on region splitting: Copy indices from parent dir to daughter dirs • Problem: Index splitting is time-consuming Should not block updates to new daughter regions
  • 11. Daughter region indices Index Splitting (optimized for non update-mode) • Split indices in a background thread • Temp directory for new updates • Merge old and new dirs when copying is done Main index dir (under copying) Index dir of parent region Temp index dir for new updates HBase updates Will be merged after index copying is done
  • 12. Index Searching • coprocessor endpoint (enhanced with local result combiner) Index-search Client HRegionServer … HDFS Endpoint dispatcher & combiner HRegion Index- searcher HRegion Index- searcher HRegion Index- searcher HRegionServer … Endpoint dispatcher & combiner HRegion Index- searcher HRegion Index- searcher HRegion Index- searcher Index DIR Index DIR Index DIR … Index DIR Index DIR Index DIR …
  • 13. Performance Results • Total record count: 500,000,000 • Record size: 1KB • Memstore: 128MB • 6 nodes: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (24 cores), memory 48G • ~1s search time out of 10 billion records 42735 37537 1 Insertion without pre-split regions W/O Index W/ Index 63613 55928 1 Insertion with pre-split regions W/O Index W/ Index record / sec record / sec

Editor's Notes

  1. Consistency, locality, high throughput
  2. Define: one or multiple columns to one fieldOne doc for each recordAdditional ID field
  3. Use zookeeper nodes for track of index splitting status, so that when opening an indexed region, we know if was undergoing index splitting before being moved.Parent directory cleanup performed in Hmaster: a chore thread IndexSplitCleanerWill need more Lucene capabilities for this optimization to be enabled in update-mode