SlideShare a Scribd company logo
1 of 16
HBase Secondary Indexing
Implementation and analysis of secondary index in HBase
Gino McCarty, Ranjan Kumar
What’s next
● HBase Introduction
● HBase Scan
● Secondary index
● CoProcessors
● Secondary Index using CoProcessors
● Testing Infrastructure
● Benchmarks
● Challenges and conclusions
HBase
● Open source, non-relational, distributed database built on top of Hadoop.
● Columnar data store, also called Tabular data store.
● Architecture - Hadoop, HDFS, Row keys, Column Families
Image source: http://blogs.igalia.com/dpino/2012/10/31/introduction-to-hbase-and-nosql-systems/
EmpId Lastname Firstname Salary
1 Smith Joe 40000
2 Jones Mary 50000
3 Johnson Cathy 44000
Row-oriented
1,Smith,Joe,40000;
2,Jones,Mary,50000;
3,Johnson,Cathy,44000;
Column-oriented
1,2,3;
Smith,Jones,Johnson;
Joe,Mary,Cathy;
40000,50000,44000;
HBase
HBase Scan
Scan - API for data retrieval
● Scan scan = new Scan(startRow, stopRow)
● scan.addFamily(f);
● scan.addColumn(c);
Filters - control the amount of returned to the client
● scan.setFilter(new ValueFilter(CompareOp.EQUAL, new
SubStringComparator(“3”)));
Optimizations - caching, batching (still scans the entire table- client timeouts,
lease expiring)
● scan.setCaching(1000);
● scan.setBatch(1000);
Secondary Index
● The way you design your row key affects everything
● Secondary index to avoid table scans. Indexes can be
stored in another index table.
● Create the index on a column and store the row keys
corresponding to that column in a separate table.
● RegionServers use the index table to perform selective
scan.
HBase and CoProcessor Architecture
Trigger Based Observer
System
● Region Observer
● Master Observer
● Log Observer
EndPoints
(Stored Procedures)
● Server Side Execution
● Distributed
● Can be called by Clients
or Observers
Secondary Indexing utilizing CoProcessors
● For every index we
create
o Regions have
their own
assigned index
o Indexes and
regions are kept
together
o Immune to
Splits and
Server outages
Sample Code Implementation
● Optimizing for Queries based
on Cartesian Products of Key
Values
● Example, what was the air time
of all carriers that flew on a
wednesday.
HbaseAdmin admin = new IndexAdmin(conf);
HTableDescriptor htd = new
HTableDescriptor(TableName.valueOf(tableName));
HColumnDescriptor hcd = new HColumnDescriptor(columnFamily);
htd.addFamily(hcd);
IndexSpecification iSpec = new IndexSpecification(indexName);
iSpec.addIndexColumn(hcd,indexColumnQualifier,
ValueType.String, 10);
TableIndices tableIndices = new TableIndices();
tableIndices.addIndex(iSpec);
htd.setValue(Constants.INDEX_SPEC_KEY,
tableIndices.toByteArray());
admin.createTable(htd);
Testing Environment
● Testing Platforms:
o Hadoop-2.2.0
o HBase-0.98.8
 Single Laptop Benchmark
● HBase Pseudo Distributed Mode
 QEMU Based 3 Node Virtual Cluster
● 2GB Ram and 2 Core Intel i7 per node
● Data Set: 7 Million Rows DataSet
- Airline On-Time Statistics and Delay Causes
- http://stat-computing.org/dataexpo/2009/the-data.html
- Roughly 80GB DataStore Size Per Node
HIndex
● Secondary Index for HBase
● Implementation by Huawei developers
● Uses coprocessors to inject code into master and region
servers.
● Creates an additional index table for every column on which
an index is desired.
HIndex conclusions
● Limited to HBase .94 version.
● Need to build the code with hbase source i.e. it is not
available independently as a jar.
● Very little documentation and support.
● Unstable at many many edgecases
Benchmarks
Query1:
new RowFilter(CompareOp.EQUAL, new SubstringComparator(“WN”)
38ms per record
Query2:
new ValueFilter(CompareOp.GREATER,new BinaryComparator(60));
3.9ms per record
Query Time taken on
single node(ms)
Total Records
on single node
Time taken on
cluster(ms)
Total Records
on cluster
Query1 50910 346435 62291 689409
Query2 207757 3807652 125777 7785038
Average Performance over 3 Runs
Table Name Keys Column Families and Values
flight_data year,
month,
dayofMonth,
dayOfWeek,
Departure
Time,
Carrier
(flight)
Month,
dayOfWeek, Carrier,
Flight Number, Origin,
Destination,
(trip)
Distance,
AirTime,
Arrival Delay
Challenges & Conclusions
Indexing data across regions
● Co-locating index with data in the same region
● Make column family a part of the index
Handling region split
● Split the index table by having a custom splitter as per the row key
distribution of the data table
Secondary indexes improve query performance at the expense of extra space.
CoProcessors add extra overhead for each query processing.
Questions
References:
HBase - CoProcessors - https://blogs.apache.org/hbase/entry/coprocessor_introduction
HIndex - https://github.com/Huawei-Hadoop/hindex
HIndex Overview - http://www.slideshare.net/rajeshbabuchintaguntla/apache-con-hindex

More Related Content

What's hot

How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PostgreSQL-Consulting
 

What's hot (20)

How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
Background Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbitBackground Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbit
 
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky HaryadiPGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
PGConf.ASIA 2019 - High Availability, 10 Seconds Failover - Lucky Haryadi
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
 
Performance Profiling in Rust
Performance Profiling in RustPerformance Profiling in Rust
Performance Profiling in Rust
 
PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...
PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...
PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
Automating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQLAutomating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQL
 
Hadoop enhancements using next gen IA technologies
Hadoop enhancements using next gen IA technologiesHadoop enhancements using next gen IA technologies
Hadoop enhancements using next gen IA technologies
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and how
 
PostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication CheatsheetPostgreSQL Streaming Replication Cheatsheet
PostgreSQL Streaming Replication Cheatsheet
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
 
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
(APP310) Scheduling Using Apache Mesos in the Cloud | AWS re:Invent 2014
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce API
 
GitLab PostgresMortem: Lessons Learned
GitLab PostgresMortem: Lessons LearnedGitLab PostgresMortem: Lessons Learned
GitLab PostgresMortem: Lessons Learned
 
Aerospike Go Language Client
Aerospike Go Language ClientAerospike Go Language Client
Aerospike Go Language Client
 
Ansible with AWS
Ansible with AWSAnsible with AWS
Ansible with AWS
 
Feed Burner Scalability
Feed Burner ScalabilityFeed Burner Scalability
Feed Burner Scalability
 
GlusterFS As an Object Storage
GlusterFS As an Object StorageGlusterFS As an Object Storage
GlusterFS As an Object Storage
 
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
 

Viewers also liked

NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
Carol McDonald
 
HGrid A Data Model for Large Geospatial Data Sets in HBase
HGrid A Data Model for Large Geospatial Data Sets in HBaseHGrid A Data Model for Large Geospatial Data Sets in HBase
HGrid A Data Model for Large Geospatial Data Sets in HBase
Dan Han
 

Viewers also liked (14)

Hindex: Secondary indexes for faster HBase queries
Hindex: Secondary indexes for faster HBase queriesHindex: Secondary indexes for faster HBase queries
Hindex: Secondary indexes for faster HBase queries
 
HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.
 
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
 
HBase Coprocessor Introduction
HBase Coprocessor IntroductionHBase Coprocessor Introduction
HBase Coprocessor Introduction
 
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
HBaseCon 2013: A Developer’s Guide to Coprocessors
HBaseCon 2013: A Developer’s Guide to CoprocessorsHBaseCon 2013: A Developer’s Guide to Coprocessors
HBaseCon 2013: A Developer’s Guide to Coprocessors
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
HBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBase
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path
 
HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBase
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use Cases
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
HGrid A Data Model for Large Geospatial Data Sets in HBase
HGrid A Data Model for Large Geospatial Data Sets in HBaseHGrid A Data Model for Large Geospatial Data Sets in HBase
HGrid A Data Model for Large Geospatial Data Sets in HBase
 

Similar to HBase Secondary Indexing

Get started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseGet started with Microsoft SQL Polybase
Get started with Microsoft SQL Polybase
Henk van der Valk
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1
Sperasoft
 

Similar to HBase Secondary Indexing (20)

Valtech - Big Data & NoSQL : au-delà du nouveau buzz
Valtech  - Big Data & NoSQL : au-delà du nouveau buzzValtech  - Big Data & NoSQL : au-delà du nouveau buzz
Valtech - Big Data & NoSQL : au-delà du nouveau buzz
 
מיכאל
מיכאלמיכאל
מיכאל
 
DAC4B 2015 - Polybase
DAC4B 2015 - PolybaseDAC4B 2015 - Polybase
DAC4B 2015 - Polybase
 
Hypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.comHypertable Distilled by edydkim.github.com
Hypertable Distilled by edydkim.github.com
 
Get started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseGet started with Microsoft SQL Polybase
Get started with Microsoft SQL Polybase
 
Analysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRAAnalysis of historical movie data by BHADRA
Analysis of historical movie data by BHADRA
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An Introduction
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
 
Basics of big data analytics hadoop
Basics of big data analytics hadoopBasics of big data analytics hadoop
Basics of big data analytics hadoop
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1
 
Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)
 
Introduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe SeilerIntroduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe Seiler
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLAdding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
 
Sf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBaseSf NoSQL MeetUp: Apache Hadoop and HBase
Sf NoSQL MeetUp: Apache Hadoop and HBase
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Hadoop and big data training
Hadoop and big data trainingHadoop and big data training
Hadoop and big data training
 
Basic of Big Data
Basic of Big Data Basic of Big Data
Basic of Big Data
 

Recently uploaded

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
gajnagarg
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 

Recently uploaded (20)

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 

HBase Secondary Indexing

  • 1. HBase Secondary Indexing Implementation and analysis of secondary index in HBase Gino McCarty, Ranjan Kumar
  • 2. What’s next ● HBase Introduction ● HBase Scan ● Secondary index ● CoProcessors ● Secondary Index using CoProcessors ● Testing Infrastructure ● Benchmarks ● Challenges and conclusions
  • 3. HBase ● Open source, non-relational, distributed database built on top of Hadoop. ● Columnar data store, also called Tabular data store. ● Architecture - Hadoop, HDFS, Row keys, Column Families Image source: http://blogs.igalia.com/dpino/2012/10/31/introduction-to-hbase-and-nosql-systems/ EmpId Lastname Firstname Salary 1 Smith Joe 40000 2 Jones Mary 50000 3 Johnson Cathy 44000 Row-oriented 1,Smith,Joe,40000; 2,Jones,Mary,50000; 3,Johnson,Cathy,44000; Column-oriented 1,2,3; Smith,Jones,Johnson; Joe,Mary,Cathy; 40000,50000,44000;
  • 5. HBase Scan Scan - API for data retrieval ● Scan scan = new Scan(startRow, stopRow) ● scan.addFamily(f); ● scan.addColumn(c); Filters - control the amount of returned to the client ● scan.setFilter(new ValueFilter(CompareOp.EQUAL, new SubStringComparator(“3”))); Optimizations - caching, batching (still scans the entire table- client timeouts, lease expiring) ● scan.setCaching(1000); ● scan.setBatch(1000);
  • 6. Secondary Index ● The way you design your row key affects everything ● Secondary index to avoid table scans. Indexes can be stored in another index table. ● Create the index on a column and store the row keys corresponding to that column in a separate table. ● RegionServers use the index table to perform selective scan.
  • 7. HBase and CoProcessor Architecture Trigger Based Observer System ● Region Observer ● Master Observer ● Log Observer EndPoints (Stored Procedures) ● Server Side Execution ● Distributed ● Can be called by Clients or Observers
  • 8.
  • 9. Secondary Indexing utilizing CoProcessors ● For every index we create o Regions have their own assigned index o Indexes and regions are kept together o Immune to Splits and Server outages
  • 10. Sample Code Implementation ● Optimizing for Queries based on Cartesian Products of Key Values ● Example, what was the air time of all carriers that flew on a wednesday. HbaseAdmin admin = new IndexAdmin(conf); HTableDescriptor htd = new HTableDescriptor(TableName.valueOf(tableName)); HColumnDescriptor hcd = new HColumnDescriptor(columnFamily); htd.addFamily(hcd); IndexSpecification iSpec = new IndexSpecification(indexName); iSpec.addIndexColumn(hcd,indexColumnQualifier, ValueType.String, 10); TableIndices tableIndices = new TableIndices(); tableIndices.addIndex(iSpec); htd.setValue(Constants.INDEX_SPEC_KEY, tableIndices.toByteArray()); admin.createTable(htd);
  • 11. Testing Environment ● Testing Platforms: o Hadoop-2.2.0 o HBase-0.98.8  Single Laptop Benchmark ● HBase Pseudo Distributed Mode  QEMU Based 3 Node Virtual Cluster ● 2GB Ram and 2 Core Intel i7 per node ● Data Set: 7 Million Rows DataSet - Airline On-Time Statistics and Delay Causes - http://stat-computing.org/dataexpo/2009/the-data.html - Roughly 80GB DataStore Size Per Node
  • 12. HIndex ● Secondary Index for HBase ● Implementation by Huawei developers ● Uses coprocessors to inject code into master and region servers. ● Creates an additional index table for every column on which an index is desired.
  • 13. HIndex conclusions ● Limited to HBase .94 version. ● Need to build the code with hbase source i.e. it is not available independently as a jar. ● Very little documentation and support. ● Unstable at many many edgecases
  • 14. Benchmarks Query1: new RowFilter(CompareOp.EQUAL, new SubstringComparator(“WN”) 38ms per record Query2: new ValueFilter(CompareOp.GREATER,new BinaryComparator(60)); 3.9ms per record Query Time taken on single node(ms) Total Records on single node Time taken on cluster(ms) Total Records on cluster Query1 50910 346435 62291 689409 Query2 207757 3807652 125777 7785038 Average Performance over 3 Runs Table Name Keys Column Families and Values flight_data year, month, dayofMonth, dayOfWeek, Departure Time, Carrier (flight) Month, dayOfWeek, Carrier, Flight Number, Origin, Destination, (trip) Distance, AirTime, Arrival Delay
  • 15. Challenges & Conclusions Indexing data across regions ● Co-locating index with data in the same region ● Make column family a part of the index Handling region split ● Split the index table by having a custom splitter as per the row key distribution of the data table Secondary indexes improve query performance at the expense of extra space. CoProcessors add extra overhead for each query processing.
  • 16. Questions References: HBase - CoProcessors - https://blogs.apache.org/hbase/entry/coprocessor_introduction HIndex - https://github.com/Huawei-Hadoop/hindex HIndex Overview - http://www.slideshare.net/rajeshbabuchintaguntla/apache-con-hindex