Secondary Indexing in Phoenix
Jesse Yates
HBase Committer
Software Engineer
HBase BoF – June 25, 2013
Outline
• Motivation
• History
• HBase Consistent Indexing
– Index Management
– Recovery Mechanism
• Conclusion
A quick note…
Outline
• Motivation
• History
• HBase Consistent Indexing
– Index Management
– Recovery Mechanism
• Conclusion
Why do we need them?
• Sorted by key
– Great for accessing on that key
What if we want to access by another dimension!?
A short example
• Easy to search by name of food
• Hard to search on another dimension
Name    | Type       | Date Received | Manufacturer    | Current Count
Apple   | Macintosh  | 6/23/13       | Good Farm Inc.  | 200
Turkey  | Breast     | 6/23/13       | Tasty Meat Co.  | 42
Chicken | Drumstick  | 6/18/13       | Pretty Ok Food  | 3
Jam     | Strawberry | 6/18/10       | Mash It Up Inc. | 700
A short example
Name    | Type       | Date Received | Manufacturer    | Current Count
Apple   | Macintosh  | 6/23/13       | Good Farm Inc.  | 200
Turkey  | Breast     | 6/23/13       | Tasty Meat Co.  | 42
Chicken | Drumstick  | 6/18/13       | Pretty Ok Food  | 3
Jam     | Strawberry | 6/18/10       | Mash It Up Inc. | 700

Date Received | Name    | Type       | Manufacturer    | Current Count
6/18/13       | Jam     | Strawberry | Mash It Up Inc. | 700
6/18/13       | Chicken | Drumstick  | Pretty Ok Food  | 3
6/23/13       | Apple   | Macintosh  | Good Farm Inc.  | 200
6/23/13       | Turkey  | Breast     | Tasty Meat Co.  | 42
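The second table above is effectively a copy of the data keyed on a different column. A minimal Java sketch of that idea follows, assuming in-memory sorted maps stand in for HBase tables. The class name, method names, ISO date strings, and the index row key layout (date, separator, name) are all assumptions made for illustration, not part of the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a primary table plus one secondary index.
final class FoodInventory {
    // Primary "table": sorted by name, like an HBase table sorted by row key.
    private final TreeMap<String, String> dateByName = new TreeMap<>();
    // Index "table": row key is date + separator + name, so rows sort by date first.
    private final TreeMap<String, String> nameByIndexKey = new TreeMap<>();

    void put(String name, String isoDate) {
        String oldDate = dateByName.put(name, isoDate);
        if (oldDate != null) {
            // Clean up the stale index row left behind by the previous value.
            nameByIndexKey.remove(indexKey(oldDate, name));
        }
        nameByIndexKey.put(indexKey(isoDate, name), name);
    }

    // Range scan over the index: every name received on the given date.
    List<String> namesReceivedOn(String isoDate) {
        String start = isoDate + (char) 0; // inclusive lower bound
        String stop = isoDate + (char) 1;  // exclusive upper bound
        return new ArrayList<>(nameByIndexKey.subMap(start, stop).values());
    }

    private static String indexKey(String isoDate, String name) {
        return isoDate + (char) 0 + name;
    }
}
```

Searching by date becomes a short range scan over the index instead of a full scan of the primary table, at the cost of one extra write (and possibly one delete) per update.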
Outline
• Motivation
• History
• HBase Consistent Indexing
– Index Management
– Recovery Mechanism
• Conclusion
HBase is “Special”…
• Partitioned Keys (“HRegion”)
• Scales because regions are independent
• Built-in data recovery mechanisms
Hasn’t someone tried this?
• Omid
• Percolator
• Culvert
• Lily
• TrendMicro
• Client-coordinated
We’ve gotten better…
• NGData
– HBase-SEP
– HBase-Indexer
• Intel
– Lucene Full Text Indexing
Still missing some things
• In-HBase index storage
– Just another table in HBase
• Simple consistency guarantees
– If X fails, then Y
• Minimal overhead for covered indexes
– Network roundtrips
Outline
• Motivation
• History
• HBase Consistent Indexing
– Index Management
– Recovery Mechanism
• Conclusion
Two Major Components
• Index Management
– Builds index updates
– Ensures index is ‘cleaned up’
• Recovery Mechanism
– Ensures index updates are “ACID”
Index Management
• Lives within a RegionCoprocessorObserver
• Access to the local HRegion
• Specifies the mutations to apply to the index tables
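import java.util.Map;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
public interface IndexBuilder {
  void setup(RegionCoprocessorEnvironment env); // gives the builder access to the region's environment
  Map<Mutation, String> getIndexUpdate(Put put); // index mutations for a Put (String presumably names the index table)
  Map<Mutation, String> getIndexUpdate(Delete delete); // index mutations for a Delete
}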
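To make the shape of an implementation concrete, here is a hedged sketch of a builder that indexes a single column. It deliberately uses simplified stand-in classes (SimplePut, IndexMutation) instead of the real HBase Put and Mutation so that it runs without HBase on the classpath. Every name below is hypothetical, and reading the String in the returned map as the target index table name is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for an HBase Put: a row key plus column -> value pairs.
final class SimplePut {
    final String row;
    final Map<String, String> columns;

    SimplePut(String row, Map<String, String> columns) {
        this.row = row;
        this.columns = columns;
    }
}

// Simplified stand-in for an HBase Mutation aimed at an index table.
final class IndexMutation {
    final String indexRow;   // row key in the index table
    final String primaryRow; // pointer back to the primary row

    IndexMutation(String indexRow, String primaryRow) {
        this.indexRow = indexRow;
        this.primaryRow = primaryRow;
    }
}

// Hypothetical builder that indexes a single column.
final class SingleColumnIndexBuilder {
    private final String indexedColumn;
    private final String indexTable;

    SingleColumnIndexBuilder(String indexedColumn, String indexTable) {
        this.indexedColumn = indexedColumn;
        this.indexTable = indexTable;
    }

    // Same shape as getIndexUpdate(Put): each mutation maps to the table it targets.
    Map<IndexMutation, String> getIndexUpdate(SimplePut put) {
        Map<IndexMutation, String> updates = new LinkedHashMap<>();
        String value = put.columns.get(indexedColumn);
        if (value != null) {
            // Index row key: indexed value first, then the primary row key.
            updates.put(new IndexMutation(value + "/" + put.row, put.row), indexTable);
        }
        return updates; // empty when the Put does not touch the indexed column
    }
}
```

A real builder would also handle Delete and could read the current row state from the local region, which this sketch leaves out.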
Index Management
Key Observation #1
“We shouldn’t need to provide stronger
guarantees than HBase - that is just asking for
a bad time.”
- Jon Hsieh*
* Paraphrased
HBase ACID
• Does NOT give you:
– Cross-row consistency
– Cross-table consistency
• Does give you:
– Durable data on success
– Visibility on success without partial rows
Key Observation #2
“Secondary indexing is inherently an easier
problem than full transactions… secondary
index updates are idempotent.”
- Lars Hofhansl
Idempotent Index Updates
• Doesn’t need full transactions
• Replay as many times as needed
• Can tolerate a little lag
– As long as we get the order right
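The idempotence argument can be sketched in a few lines of Java. This is a toy model, assuming each index cell is identified by its row and timestamp and that an index update reuses the timestamp of the primary write. Both are assumptions made for illustration, and the class is hypothetical.

```java
import java.util.TreeMap;

// Hypothetical model of an index table whose cells are keyed by (row, timestamp).
final class VersionedIndexTable {
    private final TreeMap<String, TreeMap<Long, String>> rows = new TreeMap<>();

    // Writing the same (row, timestamp, value) again overwrites the identical cell,
    // so replaying an update any number of times leaves the table unchanged.
    void apply(String row, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(timestamp, value);
    }

    // The newest timestamp wins, regardless of the order in which updates arrived.
    String latest(String row) {
        TreeMap<Long, String> versions = rows.get(row);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    // Total number of stored cells across all rows.
    int cellCount() {
        int count = 0;
        for (TreeMap<Long, String> versions : rows.values()) {
            count += versions.size();
        }
        return count;
    }
}
```

Because the timestamp, not the arrival order, decides which version is current, a late or repeated replay cannot make the index wrong, only briefly behind.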
Taking a little ACID…
Durable Indexing: Standard Write Path
[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
Durable Indexing
[Diagram: RegionCoprocessorHost, Indexer (Index Builder, WAL Updater), WAL ("Durable!"), RegionCoprocessorHost, Indexer, Index Table (x3)]
Failure Situations
• Before writing the WAL
– Nothing is durable, nothing is visible
Durable Indexing
[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer (x2), Index Table (x3)]
Failure Situations
• Before writing the WAL ✔
– Nothing is durable, nothing is visible
Failure Situations
• Before writing the WAL ✔
– Nothing is durable, nothing is visible
• After writing WAL, before index update
– WAL Replay updates the index table and the primary table
Durable Indexing
[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer (x2), Index Table (x3)]
Failure Situations
• Before writing the WAL ✔
– Nothing is durable, nothing is visible
• After writing WAL, before index update ✔
– WAL Replay updates the index table and the primary table
Failure Situations
• Before writing the WAL ✔
– Nothing is durable, nothing is visible
• After writing WAL, before index update ✔
– WAL Replay updates the index table and the primary table
• Mid-index update
– WAL Replay finishes index update, primary table update
Durable Indexing
[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer (x2), Index Table (x3)]
Failure Situations
• Before writing the WAL ✔
– Nothing is durable, nothing is visible
• After writing WAL, before index update ✔
– WAL Replay updates the index table and the primary table
• Mid-index update ✔
– WAL Replay finishes index update, primary table update
Failure Situations
• Before writing the WAL ✔
– Nothing is durable, nothing is visible
• After writing WAL, before index update ✔
– WAL Replay updates the index table and the primary table
• Mid-index update ✔
– WAL Replay finishes index update, primary table update
• After index updates, before primary
– WAL Replay restores primary state, idempotently applies index updates
Durable Indexing
[Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer (x2), Index Table (x3)]
Failure Situations
• Before writing the WAL ✔
– Nothing is durable, nothing is visible
• After writing WAL, before index update ✔
– WAL Replay updates the index table and the primary table
• Mid-index update ✔
– WAL Replay finishes index update, primary table update
• After index updates, before primary ✔
– WAL Replay restores primary state, idempotently applies index updates
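The failure cases above can be walked through with a small simulation. This is only a sketch under stated assumptions: a single WAL entry carries both the primary edit and its index edit, the index table and the WAL survive a crash, and the MemStore does not. The class, the crash points, and the index key format are hypothetical. The mid-index-update case is not modelled separately, since here it would behave like a crash after the WAL write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the durable-indexing write path and its recovery.
final class DurableIndexSim {
    enum CrashPoint { NONE, BEFORE_WAL, AFTER_WAL, AFTER_INDEX }

    static final class WalEntry {
        final String row;
        final String value;
        final long timestamp;

        WalEntry(String row, String value, long timestamp) {
            this.row = row;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    final List<WalEntry> wal = new ArrayList<>();      // durable, survives a crash
    final Map<String, String> index = new HashMap<>(); // separate table, survives a crash
    Map<String, String> memStore = new HashMap<>();    // in memory, lost on a crash

    // Returns true if the write completed, false if the simulated server crashed.
    boolean write(String row, String value, long timestamp, CrashPoint crash) {
        if (crash == CrashPoint.BEFORE_WAL) return crashNow();
        wal.add(new WalEntry(row, value, timestamp));
        if (crash == CrashPoint.AFTER_WAL) return crashNow();
        index.put(value + "@" + timestamp, row);
        if (crash == CrashPoint.AFTER_INDEX) return crashNow();
        memStore.put(row, value);
        return true;
    }

    private boolean crashNow() {
        memStore = new HashMap<>(); // everything held only in memory is gone
        return false;
    }

    // Replay re-applies every logged edit; both puts are idempotent.
    void replayWal() {
        for (WalEntry e : wal) {
            index.put(e.value + "@" + e.timestamp, e.row);
            memStore.put(e.row, e.value);
        }
    }
}
```

In every crash case, replaying the WAL leaves the primary and index state in agreement: either both contain the write or neither does.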
Special Note: Failed Index Updates
• Index is corrupted
– Index Table does not exist
– Index table does not have the right schema
– Etc.
• Fail-fast behavior
– Kill the whole server
– Forces WAL Replay to enforce correctness
– Modular enough to support alternative schemes
Key Points
• Custom KeyValues to enable index durability in primary table WAL
• Custom WALEdit Codec for index update with WAL Replay
• Will see index updates before primary
– Only a little bit of lag and never ‘wrong’
– Matches HBase consistency
• Fail-fast behavior to enforce correctness
Upcoming Work
• Performance testing
• Standard covered index managers
• Index cleanup on compaction
Outline
• Motivation
• History
• HBase Consistent Indexing
– Index Management
– Recovery Mechanism
• Conclusion
Conclusion
• Fully transparent to client
• Easy to build custom index maintenance
• Meets current HBase consistency guarantees
• Supports HBase 0.94.9+
– Coming to 0.96/0.98 soon!
hbase-index
https://github.com/forcedotcom/phoenix/tree/master/contrib/hbase-index
Detailed Blog Post
http://jyates.github.io/2013/06/11/hbase-consistent-secondary-indexing.html
Bonus!
• Usable as a standalone module
• Coming to phoenix*
– Built-in support
• Future: added to HBase core (?)
* https://github.com/forcedotcom/phoenix
Thanks! Questions?
@jesse_yates
jesse.k.yates@gmail.com

Secondary Indexing in Phoenix - Hadoop Summit 2012 - HBase BoF

  • 1. Secondary Indexing in Phoenix Jesse Yates HBase Committer Software Engineer HBase BoF – June 25, 2013
  • 2. Outline • Motivation • History • HBase Consistent Indexing – Index Management – Recovery Mechanism • Conclusion
  • 3. A quick note…
  • 4. Outline • Motivation • History • HBase Consistent Indexing – Index Management – Recovery Mechanism • Conclusion
  • 5. Why do we need them? • Sorted by key – Great for accessing on that key • What if we want to access by another dimension!?
  • 6. A short example • Easy to search by name of food • Hard to search on another dimension
      Name | Type | Date Received | Manufacturer | Current Count
      Apple | Macintosh | 6/23/13 | Good Farm Inc. | 200
      Turkey | Breast | 6/23/13 | Tasty Meat Co. | 42
      Chicken | Drumstick | 6/18/13 | Pretty Ok Food | 3
      Jam | Strawberry | 6/18/10 | Mash It Up Inc. | 700
  • 7. A short example
      Table keyed by Name:
      Name | Type | Date Received | Manufacturer | Current Count
      Apple | Macintosh | 6/23/13 | Good Farm Inc. | 200
      Turkey | Breast | 6/23/13 | Tasty Meat Co. | 42
      Chicken | Drumstick | 6/18/13 | Pretty Ok Food | 3
      Jam | Strawberry | 6/18/10 | Mash It Up Inc. | 700
      Same rows keyed by Date Received:
      Date Received | Name | Type | Manufacturer | Current Count
      6/18/13 | Jam | Strawberry | Mash It Up Inc. | 700
      6/18/13 | Chicken | Drumstick | Pretty Ok Food | 3
      6/23/13 | Apple | Macintosh | Good Farm Inc. | 200
      6/23/13 | Turkey | Breast | Tasty Meat Co. | 42
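The food example on slides 6 and 7 can be sketched in a few lines of Java. This is only a toy model, not Phoenix or HBase code: a `TreeMap` stands in for a table sorted by row key, and the class name, the `|` separator, and the ISO date format are all assumptions made for illustration. Only three of the rows are included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: a TreeMap stands in for an HBase table, which keeps rows sorted by key.
class FoodIndexSketch {
    // Primary "table": row key = food name, value = date received (ISO format so it sorts as text).
    static TreeMap<String, String> primaryTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("Apple", "2013-06-23");
        table.put("Turkey", "2013-06-23");
        table.put("Chicken", "2013-06-18");
        return table;
    }

    // Index "table": row key = date + '|' + primary key, value = the primary row key.
    static TreeMap<String, String> buildDateIndex(Map<String, String> primary) {
        TreeMap<String, String> index = new TreeMap<>();
        for (Map.Entry<String, String> row : primary.entrySet()) {
            index.put(row.getValue() + "|" + row.getKey(), row.getKey());
        }
        return index;
    }

    // Range scan over the index: every primary key received on the given date.
    // '}' is the character right after '|', so it bounds all keys with the prefix "date|".
    static List<String> receivedOn(TreeMap<String, String> index, String date) {
        return new ArrayList<>(index.subMap(date + "|", date + "}").values());
    }
}
```

Looking up by date becomes a short range scan over the index instead of a scan of the whole primary table, which is the point the two tables on the slide make.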
  • 8. Outline • Motivation • History • HBase Consistent Indexing – Index Management – Recovery Mechanism • Conclusion
  • 9. HBase is “Special”… • Partitioned Keys (“HRegion”) • Scales because regions are independent • Built-in data recovery mechanisms
  • 10. Hasn’t someone tried this? • Omid • Percolator • Culvert • Lily • TrendMicro • Client-coordinated
  • 11. We’ve gotten better… • NGData – HBase-SEP – HBase-Indexer • Intel – Lucene Full Text Indexing
  • 12. Still missing some things • In-HBase index storage – Just another table in HBase • Simple consistency guarantees – If X fails, then Y • Minimal overhead for covered indexes – Network roundtrips
  • 13. Outline • Motivation • History • HBase Consistent Indexing – Index Management – Recovery Mechanism • Conclusion
  • 14. Two Major Components • Index Management – Build index updates – Ensures index is ‘cleaned up’ • Recovery Mechanism – Ensures index updates are “ACID”
  • 15. Index Management • Lives within a RegionCoprocessorObserver • Access to the local HRegion • Specifies the mutations to apply to the index tables
      public interface IndexBuilder {
        public void setup(RegionCoprocessorEnvironment env);
        public Map<Mutation, String> getIndexUpdate(Put put);
        public Map<Mutation, String> getIndexUpdate(Delete delete);
      }
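To make the shape of the `IndexBuilder` contract on slide 15 concrete, here is a minimal sketch that avoids the HBase dependency. `SimplePut`, `SimpleIndexBuilder`, and `SingleColumnIndexBuilder` are hypothetical stand-ins invented for this example, not classes from hbase-index; the only idea taken from the slide is that a builder turns one primary mutation into a map of index mutations to target table names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for an HBase Put: a row key plus column -> value pairs (not the real class).
class SimplePut {
    final String row;
    final Map<String, String> columns = new LinkedHashMap<>();

    SimplePut(String row) { this.row = row; }

    SimplePut add(String column, String value) {
        columns.put(column, value);
        return this;
    }
}

// Simplified version of the slide's interface: index mutation -> name of the index table.
interface SimpleIndexBuilder {
    Map<SimplePut, String> getIndexUpdate(SimplePut put);
}

// Hypothetical builder that indexes a single column into a single index table.
class SingleColumnIndexBuilder implements SimpleIndexBuilder {
    private final String indexedColumn;
    private final String indexTable;

    SingleColumnIndexBuilder(String indexedColumn, String indexTable) {
        this.indexedColumn = indexedColumn;
        this.indexTable = indexTable;
    }

    @Override
    public Map<SimplePut, String> getIndexUpdate(SimplePut put) {
        Map<SimplePut, String> updates = new LinkedHashMap<>();
        String value = put.columns.get(indexedColumn);
        if (value == null) {
            return updates; // the put does not touch the indexed column: nothing to index
        }
        // Index row key = indexed value + '|' + primary row key; the cell points back to the primary row.
        SimplePut indexPut = new SimplePut(value + "|" + put.row).add("primary_row", put.row);
        updates.put(indexPut, indexTable);
        return updates;
    }
}
```

A real builder would also have to handle deletes and clean up stale index rows when an indexed value changes, which this sketch leaves out.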
  • 16. Index Management
  • 17. Key Observation #1 “We shouldn’t need to provide stronger guarantees than HBase - that is just asking for a bad time.” - Jon Hsieh (paraphrased)
  • 18. HBase ACID • Does NOT give you: – Cross-row consistency – Cross-table consistency • Does give you: – Durable data on success – Visibility on success without partial rows
  • 19. Key Observation #2 “Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.” - Lars Hofhansl
  • 20. Idempotent Index Updates • Doesn’t need full transactions • Replay as many times as needed • Can tolerate a little lag – As long as we get the order right
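The idempotence claim on slides 19 and 20 can be illustrated with a toy model of versioned cells. The sketch below assumes that each index update carries an explicit timestamp (for example, the timestamp of the primary edit); that assumption, the class name, and the method names are illustrative only and are not taken from the slides.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for an index table with HBase-style versioned cells.
class IdempotentIndexSketch {
    // index row -> (timestamp -> value)
    final Map<String, TreeMap<Long, String>> cells = new TreeMap<>();

    // Applying an update writes a value at an explicit timestamp.
    // Writing the same (row, timestamp, value) again leaves the table unchanged.
    void apply(String row, long timestamp, String value) {
        cells.computeIfAbsent(row, r -> new TreeMap<>()).put(timestamp, value);
    }

    // A read returns the value with the newest timestamp, or null if the row is absent.
    String read(String row) {
        TreeMap<Long, String> versions = cells.get(row);
        return versions == null ? null : versions.lastEntry().getValue();
    }
}
```

Because every update is pinned to a timestamp, replaying it any number of times produces the same state, and a late replay of an older update cannot hide a newer one. That is one way to "get the order right" without full transactions.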
  • 21. Taking a little ACID…
  • 23. Durable Indexing: Standard Write Path (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
  • 24. Durable Indexing: Standard Write Path (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)
  • 25. Durable Indexing (diagram labels: RegionCoprocessorHost, WAL, RegionCoprocessorHost, Indexer, Index Builder, WAL Updater, “Durable!”, Indexer, Index Table ×3)
  • 26. Failure Situations • Before writing the WAL – Nothing is durable, nothing is visible
  • 27. Durable Indexing (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer, Indexer, Index Table ×3)
  • 28. Failure Situations • Before writing the WAL – Nothing is durable, nothing is visible ✔
  • 29. Failure Situations • Before writing the WAL – Nothing is durable, nothing is visible ✔ • After writing WAL, before index update – WAL Replay updates the index table and the primary table
  • 30. Durable Indexing (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer, Indexer, Index Table ×3)
  • 31. Failure Situations • Before writing the WAL – Nothing is durable, nothing is visible ✔ • After writing WAL, before index update – WAL Replay updates the index table and the primary table ✔
  • 32. Failure Situations • Before writing the WAL – Nothing is durable, nothing is visible ✔ • After writing WAL, before index update – WAL Replay updates the index table and the primary table ✔ • Mid-index update – WAL Replay finishes index update, primary table update
  • 33. Durable Indexing (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer, Indexer, Index Table ×3)
  • 34. Failure Situations • Before writing the WAL – Nothing is durable, nothing is visible ✔ • After writing WAL, before index update – WAL Replay updates the index table and the primary table ✔ • Mid-index update – WAL Replay finishes index update, primary table update ✔
  • 35. Failure Situations • Before writing the WAL – Nothing is durable, nothing is visible ✔ • After writing WAL, before index update – WAL Replay updates the index table and the primary table ✔ • Mid-index update – WAL Replay finishes index update, primary table update ✔ • After index updates, before primary – WAL Replay restores primary state, idempotently applies index updates
  • 36. Durable Indexing (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer, Indexer, Index Table ×3)
  • 37. Failure Situations • Before writing the WAL – Nothing is durable, nothing is visible ✔ • After writing WAL, before index update – WAL Replay updates the index table and the primary table ✔ • Mid-index update – WAL Replay finishes index update, primary table update ✔ • After index updates, before primary – WAL Replay restores primary state, idempotently applies index updates ✔
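The four failure situations above can be walked through with a small simulation. This is a sketch under simplifying assumptions, not the hbase-index implementation: the WAL is a plain list, the two index tables and the primary table are sets (so re-adding a row is idempotent), and the names `DurableIndexingSketch`, `CrashPoint`, `indexA`, and `indexB` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simulates the write order from the slides: WAL first, then index tables, then the primary table.
class DurableIndexingSketch {
    enum CrashPoint { BEFORE_WAL, AFTER_WAL, MID_INDEX, AFTER_INDEX, NONE }

    final List<String> wal = new ArrayList<>();    // durable log of accepted writes
    final Set<String> indexA = new TreeSet<>();    // first hypothetical index table
    final Set<String> indexB = new TreeSet<>();    // second hypothetical index table
    final Set<String> primary = new TreeSet<>();   // primary table

    // Performs the write and stops early at the given crash point.
    void write(String row, CrashPoint crash) {
        if (crash == CrashPoint.BEFORE_WAL) return;   // nothing durable, nothing visible
        wal.add(row);
        if (crash == CrashPoint.AFTER_WAL) return;    // durable, but no table updated yet
        indexA.add(row);
        if (crash == CrashPoint.MID_INDEX) return;    // only some index tables updated
        indexB.add(row);
        if (crash == CrashPoint.AFTER_INDEX) return;  // index updated, primary not yet
        primary.add(row);
    }

    // Recovery: re-apply every logged write to the index tables and the primary table.
    void replayWal() {
        for (String row : wal) {
            indexA.add(row);
            indexB.add(row);
            primary.add(row);
        }
    }

    // True when the row is either in all three tables or in none of them.
    boolean consistent(String row) {
        boolean a = indexA.contains(row);
        return a == indexB.contains(row) && a == primary.contains(row);
    }
}
```

Running `write` with any crash point followed by `replayWal` leaves the index tables and the primary table in agreement: the row is absent everywhere if the crash came before the WAL write and present everywhere otherwise. Replaying updates that had already been applied is harmless because they are idempotent.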
  • 38. Special Note: Failed Index Updates • Index is corrupted – Index Table does not exist – Index table does not have write schema – Etc. • Fail-fast behavior – Kill the whole server – Forces WAL Replay to enforce correctness – Modular enough to support alternative schemes
  • 39. Key Points • Custom KeyValues to enable index durability in primary table WAL • Custom WALEdit Codec for index update with WAL Replay • Will see index updates before primary – Only a little bit of lag and never ‘wrong’ – Matches HBase consistency • Fail-fast behavior to enforce correctness
  • 40. Upcoming Work • Performance testing • Standard covered index managers • Index cleanup on compaction
  • 41. Outline • Motivation • History • HBase Consistent Indexing – Index Management – Recovery Mechanism • Conclusion
  • 42. Conclusion • Fully transparent to client • Easy to build custom index maintenance • Meets current HBase consistency guarantees • Supports HBase 0.94.9+ – Coming to 0.96/0.98 soon!
  • 43. hbase-index https://github.com/forcedotcom/phoenix/tree/master/contrib/hbase-index
  • 44. Detailed Blog Post http://jyates.github.io/2013/06/11/hbase-consistent-secondary-indexing.html
  • 45. Bonus! • Usable as a standalone module • Coming to Phoenix (https://github.com/forcedotcom/phoenix) – Built-in support • Future: added to HBase core (?)
  • 46. Thanks! Questions! @jesse_yates jesse.k.yates@gmail.com