Secondary Indexing in Phoenix
Jesse Yates, HBase Committer, Software Engineer
HBase BoF – June 25, 2013

Secondary Indexing in Phoenix - Hadoop Summit 2012 - HBase BoF

Overview of the secondary indexing implementation coming soon in Phoenix (https://github.com/forcedotcom/phoenix)

Published in: Business, Technology

Transcript of "Secondary Indexing in Phoenix - Hadoop Summit 2012 - HBase BoF"

  1. Secondary Indexing in Phoenix
     Jesse Yates, HBase Committer, Software Engineer
     HBase BoF – June 25, 2013
  2. Outline
     • Motivation
     • History
     • HBase Consistent Indexing
       – Index Management
       – Recovery Mechanism
     • Conclusion
  3. A quick note…
  4. Outline
     • Motivation
     • History
     • HBase Consistent Indexing
       – Index Management
       – Recovery Mechanism
     • Conclusion
  5. Why do we need them?
     • Sorted by key
       – Great for accessing on that key
     What if we want to access by another dimension!?
  6. A short example
     • Easy to search by name of food
     • Hard to search on another dimension

     Name     Type        Date Received  Manufacturer     Current Count
     Apple    Macintosh   6/23/13        Good Farm Inc.   200
     Turkey   Breast      6/23/13        Tasty Meat Co.   42
     Chicken  Drumstick   6/18/13        Pretty Ok Food   3
     Jam      Strawberry  6/18/10        Mash It Up Inc.  700
  7. A short example

     Name     Type        Date Received  Manufacturer     Current Count
     Apple    Macintosh   6/23/13        Good Farm Inc.   200
     Turkey   Breast      6/23/13        Tasty Meat Co.   42
     Chicken  Drumstick   6/18/13        Pretty Ok Food   3
     Jam      Strawberry  6/18/10        Mash It Up Inc.  700

     Date Received  Name     Type        Manufacturer     Current Count
     6/18/13        Jam      Strawberry  Mash It Up Inc.  700
     6/18/13        Chicken  Drumstick   Pretty Ok Food   3
     6/23/13        Apple    Macintosh   Good Farm Inc.   200
     6/23/13        Turkey   Breast      Tasty Meat Co.   42
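
The two tables on slide 7 can be mimicked with sorted maps. The following is a minimal sketch, not Phoenix or HBase code: the class name, the `date|name` index key layout, and the ISO date strings (chosen so that keys sort chronologically) are all assumptions made for illustration. The dates follow the primary table on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FoodIndexSketch {
    // Primary "table": rows sorted by row key (food name), value is the date received.
    public static TreeMap<String, String> primary() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("Apple", "2013-06-23");
        table.put("Turkey", "2013-06-23");
        table.put("Chicken", "2013-06-18");
        table.put("Jam", "2010-06-18");
        return table;
    }

    // Index "table": the row key is "<date>|<name>" (hypothetical layout), so rows
    // sort by date first and all rows for one date are contiguous.
    public static TreeMap<String, String> buildDateIndex(Map<String, String> primary) {
        TreeMap<String, String> index = new TreeMap<>();
        for (Map.Entry<String, String> row : primary.entrySet()) {
            index.put(row.getValue() + "|" + row.getKey(), row.getKey());
        }
        return index;
    }

    // Range scan over the index: '}' is the character right after '|', so the
    // half-open range [date + "|", date + "}") covers exactly the keys for that date.
    public static List<String> namesReceivedOn(TreeMap<String, String> index, String date) {
        return new ArrayList<>(index.subMap(date + "|", date + "}").values());
    }

    public static void main(String[] args) {
        TreeMap<String, String> index = buildDateIndex(primary());
        System.out.println(namesReceivedOn(index, "2013-06-23"));
    }
}
```

Without the index, answering "what arrived on a given date" means reading every row of the primary table; with it, the query becomes a short range scan over a second sorted table.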
  8. Outline
     • Motivation
     • History
     • HBase Consistent Indexing
       – Index Management
       – Recovery Mechanism
     • Conclusion
  9. HBase is “Special”…
     • Partitioned Keys (“HRegion”)
     • Scales because regions are independent
     • Built-in data recovery mechanisms
  10. Hasn’t someone tried this?
     • Omid
     • Percolator
     • Culvert
     • Lily
     • TrendMicro
     • Client-coordinated
  11. We’ve gotten better…
     • NGData
       – HBase-SEP
       – HBase-Indexer
     • Intel
       – Lucene Full Text Indexing
  12. Still missing some things
     • In-HBase index storage
       – Just another table in HBase
     • Simple consistency guarantees
       – If X fails, then Y
     • Minimal overhead for covered indexes
       – Network roundtrips
  13. Outline
     • Motivation
     • History
     • HBase Consistent Indexing
       – Index Management
       – Recovery Mechanism
     • Conclusion
  14. Two Major Components
     • Index Management
       – Builds index updates
       – Ensures index is ‘cleaned up’
     • Recovery Mechanism
       – Ensures index updates are “ACID”
  15. Index Management
     • Lives within a RegionCoprocessorObserver
     • Access to the local HRegion
     • Specifies the mutations to apply to the index tables

     public interface IndexBuilder {
       public void setup(RegionCoprocessorEnvironment env);
       public Map<Mutation, String> getIndexUpdate(Put put);
       public Map<Mutation, String> getIndexUpdate(Delete delete);
     }
  16. Index Management
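
To make the `IndexBuilder` contract on slide 15 more concrete, here is a minimal sketch of a builder that indexes a single column. It is deliberately self-contained: `SimplePut`, `SimpleIndexBuilder`, and `SingleColumnIndexBuilder` are hypothetical stand-ins, not HBase or hbase-index classes, and the reading of `Map<Mutation, String>` as "index mutation mapped to the name of the index table it targets" is an assumption, since the slide does not spell it out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IndexBuilderSketch {
    // Hypothetical stand-in for an HBase Put: one row key plus column -> value pairs.
    public static final class SimplePut {
        public final String row;
        public final Map<String, String> columns = new LinkedHashMap<>();

        public SimplePut(String row) { this.row = row; }

        public SimplePut add(String column, String value) {
            columns.put(column, value);
            return this;
        }
    }

    // Simplified analogue of the slide's interface. Assumption: each returned
    // mutation is mapped to the name of the index table it should be written to.
    public interface SimpleIndexBuilder {
        Map<SimplePut, String> getIndexUpdate(SimplePut put);
    }

    // Indexes one column; the index row key is "<column value>|<primary row key>".
    public static final class SingleColumnIndexBuilder implements SimpleIndexBuilder {
        private final String column;
        private final String indexTable;

        public SingleColumnIndexBuilder(String column, String indexTable) {
            this.column = column;
            this.indexTable = indexTable;
        }

        @Override
        public Map<SimplePut, String> getIndexUpdate(SimplePut put) {
            Map<SimplePut, String> updates = new LinkedHashMap<>();
            String value = put.columns.get(column);
            if (value != null) {
                // Only puts that touch the indexed column produce an index update.
                SimplePut indexPut = new SimplePut(value + "|" + put.row).add("primary_row", put.row);
                updates.put(indexPut, indexTable);
            }
            return updates;
        }
    }

    public static void main(String[] args) {
        SimpleIndexBuilder builder = new SingleColumnIndexBuilder("date_received", "food_by_date");
        SimplePut put = new SimplePut("Apple").add("date_received", "2013-06-23");
        for (Map.Entry<SimplePut, String> e : builder.getIndexUpdate(put).entrySet()) {
            System.out.println(e.getValue() + " <- " + e.getKey().row);
        }
    }
}
```

The builder only decides what to write. Applying the updates to the index tables and making them durable is the job of the recovery mechanism covered on the following slides.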
  17. Key Observation #1
     “We shouldn’t need to provide stronger guarantees than HBase - that is just asking for a bad time.”
     - Jon Hsieh*
     * Paraphrased
  18. HBase ACID
     • Does NOT give you:
       – Cross-row consistency
       – Cross-table consistency
     • Does give you:
       – Durable data on success
       – Visibility on success without partial rows
  19. Key Observation #2
     “Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.”
     - Lars Hofhansl
  20. Idempotent Index Updates
     • Doesn’t need full transactions
     • Replay as many times as needed
     • Can tolerate a little lag
       – As long as we get the order right
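
One way to picture why replay is safe: HBase cells are versioned by timestamp, so writing the same value to the same row at the same timestamp a second time changes nothing, and a late replay of an older update cannot hide a newer one. The sketch below models that with nested sorted maps. It is an illustration under that assumption, not the hbase-index implementation, and the class and method names are hypothetical.

```java
import java.util.TreeMap;

public class IdempotentReplaySketch {
    // row key -> (timestamp -> value); a read returns the value at the highest timestamp.
    private final TreeMap<String, TreeMap<Long, String>> cells = new TreeMap<>();

    // Applying an update writes one cell at an explicit timestamp. Re-applying the
    // same (row, timestamp, value) overwrites that cell with identical content.
    public void apply(String row, long timestamp, String value) {
        cells.computeIfAbsent(row, r -> new TreeMap<>()).put(timestamp, value);
    }

    // Latest visible value for the row, or null if the row was never written.
    public String read(String row) {
        TreeMap<Long, String> versions = cells.get(row);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    // Number of distinct versions stored for the row.
    public int versionCount(String row) {
        TreeMap<Long, String> versions = cells.get(row);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        IdempotentReplaySketch index = new IdempotentReplaySketch();
        index.apply("2013-06-23|Apple", 1L, "count=200");
        index.apply("2013-06-23|Apple", 2L, "count=180");
        // Replay both updates, oldest first and then again out of order.
        index.apply("2013-06-23|Apple", 1L, "count=200");
        index.apply("2013-06-23|Apple", 2L, "count=180");
        index.apply("2013-06-23|Apple", 1L, "count=200");
        System.out.println(index.read("2013-06-23|Apple"));
        System.out.println(index.versionCount("2013-06-23|Apple"));
    }
}
```

Because the timestamp travels with the update, "getting the order right" does not depend on the order in which replays arrive.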
  21. Taking a little ACID…
  22. [No transcript text]
  23. Durable Indexing: Standard Write Path
     [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
  24. Durable Indexing: Standard Write Path
     [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
  25. Durable Indexing
     [Diagram: RegionCoprocessorHost, WAL, RegionCoprocessorHost, Indexer, IndexBuilder, WAL Updater, "Durable!", Indexer, Index Table (x3)]
  26. Failure Situations
     • Before writing the WAL
       – Nothing is durable, nothing is visible
  27. Durable Indexing
     [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer, Indexer, Index Table (x3)]
  28. Failure Situations
     • Before writing the WAL ✔
       – Nothing is durable, nothing is visible
  29. Failure Situations
     • Before writing the WAL ✔
       – Nothing is durable, nothing is visible
     • After writing WAL, before index update
       – WAL Replay updates the index table and the primary table
  30. Durable Indexing
     [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer, Indexer, Index Table (x3)]
  31. Failure Situations
     • Before writing the WAL ✔
       – Nothing is durable, nothing is visible
     • After writing WAL, before index update ✔
       – WAL Replay updates the index table and the primary table
  32. Failure Situations
     • Before writing the WAL ✔
       – Nothing is durable, nothing is visible
     • After writing WAL, before index update ✔
       – WAL Replay updates the index table and the primary table
     • Mid-index update
       – WAL Replay finishes index update, primary table update
  33. Durable Indexing
     [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer, Indexer, Index Table (x3)]
  34. Failure Situations
     • Before writing the WAL ✔
       – Nothing is durable, nothing is visible
     • After writing WAL, before index update ✔
       – WAL Replay updates the index table and the primary table
     • Mid-index update ✔
       – WAL Replay finishes index update, primary table update
  35. Failure Situations
     • Before writing the WAL ✔
       – Nothing is durable, nothing is visible
     • After writing WAL, before index update ✔
       – WAL Replay updates the index table and the primary table
     • Mid-index update ✔
       – WAL Replay finishes index update, primary table update
     • After index updates, before primary
       – WAL Replay restores primary state, idempotently applies index updates
  36. Durable Indexing
     [Diagram: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore, Indexer, Indexer, Index Table (x3)]
  37. Failure Situations
     • Before writing the WAL ✔
       – Nothing is durable, nothing is visible
     • After writing WAL, before index update ✔
       – WAL Replay updates the index table and the primary table
     • Mid-index update ✔
       – WAL Replay finishes index update, primary table update
     • After index updates, before primary ✔
       – WAL Replay restores primary state, idempotently applies index updates
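
The four failure cases above can be walked through with a small simulation. This is a sketch under simplifying assumptions, not the real write path: the WAL is an in-memory list, the primary table and two index tables are sorted maps, a "crash" simply stops the write at a chosen point, and every name here is hypothetical. The only thing taken from the slides is the order of steps (WAL, then index tables, then primary) and the claim that replaying the WAL repairs each case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WalReplaySketch {
    public enum CrashPoint { BEFORE_WAL, AFTER_WAL, MID_INDEX, AFTER_INDEX, NONE }

    public final List<String[]> wal = new ArrayList<>();            // entries: {row, value}
    public final TreeMap<String, String> primary = new TreeMap<>();
    public final TreeMap<String, String> indexA = new TreeMap<>();
    public final TreeMap<String, String> indexB = new TreeMap<>();

    // Write path in the order the slides describe. Returns false if the
    // simulated crash interrupted the write before it completed.
    public boolean write(String row, String value, CrashPoint crash) {
        if (crash == CrashPoint.BEFORE_WAL) return false;
        wal.add(new String[] {row, value});          // durable from here on
        if (crash == CrashPoint.AFTER_WAL) return false;
        indexA.put(value + "|" + row, row);
        if (crash == CrashPoint.MID_INDEX) return false;
        indexB.put(value + "|" + row, row);
        if (crash == CrashPoint.AFTER_INDEX) return false;
        primary.put(row, value);
        return true;
    }

    // Replay re-applies every logged edit. The puts are idempotent, so entries
    // that were already written before the crash are simply written again.
    public void replayWal() {
        for (String[] entry : wal) {
            String row = entry[0];
            String value = entry[1];
            indexA.put(value + "|" + row, row);
            indexB.put(value + "|" + row, row);
            primary.put(row, value);
        }
    }

    // Consistent means: both indexes hold exactly one entry per primary row.
    public boolean consistent() {
        if (indexA.size() != primary.size() || indexB.size() != primary.size()) return false;
        for (String row : primary.keySet()) {
            String key = primary.get(row) + "|" + row;
            if (!indexA.containsKey(key) || !indexB.containsKey(key)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (CrashPoint crash : CrashPoint.values()) {
            WalReplaySketch region = new WalReplaySketch();
            region.write("Apple", "2013-06-23", crash);
            region.replayWal();
            System.out.println(crash + " consistent=" + region.consistent()
                    + " rows=" + region.primary.size());
        }
    }
}
```

In the `BEFORE_WAL` case nothing was logged, so replay leaves all three tables empty, which matches "nothing is durable, nothing is visible". In every other case the logged edit is enough to rebuild both the index entries and the primary row.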
  38. Special Note: Failed Index Updates
     • Index is corrupted
       – Index table does not exist
       – Index table does not have the right schema
       – Etc.
     • Fail-fast behavior
       – Kill the whole server
       – Forces WAL Replay to enforce correctness
       – Modular enough to support alternative schemes
  39. Key Points
     • Custom KeyValues to enable index durability in primary table WAL
     • Custom WALEdit Codec for index update with WAL Replay
     • Will see index updates before primary
       – Only a little bit of lag and never ‘wrong’
       – Matches HBase consistency
     • Fail-fast behavior to enforce correctness
  40. Upcoming Work
     • Performance testing
     • Standard covered index managers
     • Index cleanup on compaction
  41. Outline
     • Motivation
     • History
     • HBase Consistent Indexing
       – Index Management
       – Recovery Mechanism
     • Conclusion
  42. Conclusion
     • Fully transparent to client
     • Easy to build custom index maintenance
     • Meets current HBase consistency guarantees
     • Supports HBase 0.94.9+
       – Coming to 0.96/0.98 soon!
  43. hbase-index
     https://github.com/forcedotcom/phoenix/tree/master/contrib/hbase-index
  44. Detailed Blog Post
     http://jyates.github.io/2013/06/11/hbase-consistent-secondary-indexing.html
  45. Bonus!
     • Usable as a standalone module
     • Coming to phoenix*
       – Built-in support
     • Future: added to HBase core (?)
     * https://github.com/forcedotcom/phoenix
  46. Thanks! Questions!
     @jesse_yates
     jesse.k.yates@gmail.com