Strongly Consistent Global Indexes for Phoenix

Without transactional tables, global indexes in Phoenix can easily get out of sync with their data tables. Transactional tables require a separate transaction manager, have some restrictions and performance penalties, and are still in beta. This technical talk lays out a design for strongly consistent global indexes that does not need an external transaction manager. In addition to strongly consistent indexing, the proposed design aims for minimal impact on read performance, minimal code changes, and significant operational simplification by eliminating index rebuilds. Our implementation of the design and initial performance testing have been very promising with respect to these goals.

In Phoenix, global indexing is implemented using a separate table for each secondary index of a table. Updating a table with one or more global indexes requires updating multiple table regions, which are likely distributed over multiple region servers. Translating a single table update into a multi-table write poses consistency issues, because Phoenix does not provide a reliable multi-table update capability without transactional tables.

View this presentation to learn more...

Published in: Technology

  1. STRONGLY CONSISTENT GLOBAL INDEXES for Nontransactional Tables
     Designed by: Kadir Ozdemir
     Presenter: Gokcen Iskender
  2. Outline
     ● Background
     ● What is new for mutable global indexes
     ● What is new for immutable global indexes
     ● Correctness of the new approach
     ● Performance implications
  3. Terminology
     ● Global - Indexed data is stored in a separate physical table from the base table
     ● Immutable - Once data is written to the base table (and automatically persisted to the index), no indexed column in a row will ever change (though the row may be deleted or age out due to a TTL setting)
     ● Mutable - Data can be freely changed
     ● Mutation - An upsert or a delete
  4. Background - Global Mutable Indexes
     [Diagram: the application server (Application, Phoenix Client, HBase Client) sends an upsert/delete batch of mutations to the region server for a data table region (Indexer, Region, WAL, HFile), which writes to the region server for an index table region (Region, WAL, HFile). Steps are numbered 1 to 4.]
  5. Background - Global Immutable Indexes
     [Diagram: the application server (Application, Phoenix Client, HBase Client) sends an upsert/delete batch of mutations to the region server for a data table region and to the region servers for the index table regions (each with Region, WAL, HFile).]
  6. Global Indexes Can Get Out-of-Sync Easily!
     MUTABLE Global Indexes
     1. The Indexer goes through the data table mutations and prepares the corresponding mutations for the index tables
     2. It applies the mutations to the data table
     3. It applies the mutations to the index tables. --> These writes are likely to be remote, because index table regions are likely to be on other region servers, and they can fail due to RPC timeouts, network problems, region server failures, etc.
     IMMUTABLE Global Indexes
     1. Mutations are prepared on the client side
     2. Data table and index table mutations are sent to the region servers in parallel
     3. There is no deterministic order in which the mutations are applied, so the index and the table can get out of sync
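
The mutable failure mode on this slide can be reproduced in a few lines. This is only a minimal sketch: plain Python dicts stand in for the HBase tables, and `old_style_update` and the `index_rpc_fails` flag are hypothetical names chosen for illustration, not Phoenix APIs.

```python
# Plain dicts stand in for the data table and its global index (an assumption).
data_table = {1: {"C1": "A", "C3": "Y"}}
index_table = {("A", 1): {"C3": "Y"}}


def old_style_update(pk, new_row, index_rpc_fails):
    """Pre-redesign mutable path: data table first, then index, no recovery."""
    old_row = data_table[pk]
    data_table[pk] = dict(new_row)          # the data table write succeeds
    if index_rpc_fails:                     # the remote index write is lost
        return
    del index_table[(old_row["C1"], pk)]
    index_table[(new_row["C1"], pk)] = {"C3": new_row["C3"]}


old_style_update(1, {"C1": "B", "C3": "Y"}, index_rpc_fails=True)

# Index rows whose key no longer matches the data row they point to.
stale = [key for key in index_table if data_table[key[1]]["C1"] != key[0]]
print(stale)
```

After the failed index write, the index still maps `A` to row 1 while the data table says `B`, and nothing in the read path notices.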
  7. Consistent Global Index Design Objectives
     ● Global indexes should always be in sync with their data tables
     ● Consistency should not come with a significant throughput or latency cost
     ● The redesign should not require rewriting existing Phoenix modules
     ● Consistent indexes should simplify operations by eliminating index rebuilds
     Phoenix JIRAs: PHOENIX-5156 and PHOENIX-5211
  8. Observations
     ● An index table row can always be reconstructed from the corresponding data table row
     ● In HBase, writes are fast, so we can add an extra write phase without severely impacting write performance
     ● Distributed two-phase commit protocols, i.e., transactions, are known to be expensive, and the existing solutions are in beta
  9. New Design
     ● VERIFIED column on index rows
     ● Reordered operations
     ● Extra write phase
  10. Design Change for Mutable Global Indexes
      Current Design
      Write Path
      ● Update the data table
      ● Update the index tables (and wish for the best)
      Read Path
      ● Read the index rows (and assume they are all good)
      New Design
      Write Path
      ● Update the index table rows with unverified status
      ● Update the data table
      ● Update the index table rows with verified status
      Read Path
      ● Read the index rows and check their verified flag
      ● If a row is unverified, reconstruct the row from the data table
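
The new read path can be sketched as follows, assuming plain dicts in place of HBase tables and a hypothetical `scan_index` helper. The starting state models an update of C1 from A to B that stopped after the data table write. Dropping an unverified index row that the data table no longer backs is an assumption of this sketch; the slide only states that unverified rows are reconstructed from the data table.

```python
# Dicts stand in for the tables (an assumption). The data row already holds
# C1 = B, but neither index row reached the "verified" phase.
data_table = {1: {"C1": "B", "C3": "Y"}}
index_table = {
    ("A", 1): {"C3": "Y", "VERIFIED": False},
    ("B", 1): {"C3": "Y", "VERIFIED": False},
}


def scan_index(c1_value):
    """Return (pk, C3) pairs for C1 == c1_value, repairing unverified rows."""
    results = []
    for key in [k for k in index_table if k[0] == c1_value]:
        _, pk = key
        index_row = index_table[key]
        if not index_row["VERIFIED"]:
            data_row = data_table.get(pk)
            if data_row is None or data_row["C1"] != c1_value:
                # Assumption: an index row with no backing data row is dropped.
                del index_table[key]
                continue
            # Rebuild the index row from the data table and mark it verified.
            index_row = {"C3": data_row["C3"], "VERIFIED": True}
            index_table[key] = index_row
        results.append((pk, index_row["C3"]))
    return results


print(scan_index("A"))  # []
print(scan_index("B"))  # [(1, 'Y')]
```

Verified rows are returned without touching the data table, which is why the design expects little impact on read latency when writes normally complete all three phases.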
  11. Design Change for Immutable Global Indexes
      Current Design
      Write Path
      ● Update the data table and the index tables in parallel (and wish for the best)
      Read Path
      ● Read the index rows (and assume they are all good)
      New Design (same as Mutable)
      Write Path
      ● Update the index table rows with unverified status
      ● Update the data table
      ● Update the index table rows with verified status
      Read Path
      ● Read the index rows and check their verified flag
      ● If a row is unverified, reconstruct the row from the data table
  12. Global Mutable Indexes - Mutate
      [Diagram: the application server (Application, Phoenix Client, HBase Client) sends an upsert/delete batch of mutations to the region server for a data table region (Indexer, Region, WAL, HFile). The Indexer writes to two region servers for index table regions (each with Region, WAL, HFile). Steps are numbered 0 to 9.]
  13. Global Mutable Indexes Batch Example - Update
      Data Table:
        Pk  C1  C2  C3
        1   A   X   Y
      Index (on C1, include C3):
        Pk    C3
        A, 1  Y
      Update C1 from A to B:
      1. Index tables are updated in parallel
         Update - Put {{A, 1}, VERIFIED=false}
         Insert - Put {{B, 1}, VERIFIED=false}
      2. Data table write
      3. Index tables set to verified/deleted
         Delete {A, 1} ---> The delete is deferred to the third phase: if the old index row were deleted in the first phase and a later phase failed, it could not be recovered without a rebuild.
         Put {{B, 1}, VERIFIED = true}
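
The update example can be written out as data. In this sketch, `plan_update` is a hypothetical helper that only lists the mutations of each phase as tuples of (operation, table, key, cells); it does not talk to HBase, and the tuple layout is an assumption made for illustration.

```python
def plan_update(pk, old_c1, new_c1, c3):
    """Return the mutations of each phase for changing the indexed column C1."""
    phase1 = [  # index tables, in parallel: both rows become unverified
        ("put", "index", (old_c1, pk), {"VERIFIED": False}),
        ("put", "index", (new_c1, pk), {"C3": c3, "VERIFIED": False}),
    ]
    phase2 = [  # data table write
        ("put", "data", pk, {"C1": new_c1}),
    ]
    phase3 = [  # old index row deleted only now, new index row verified
        ("delete", "index", (old_c1, pk), None),
        ("put", "index", (new_c1, pk), {"VERIFIED": True}),
    ]
    return [phase1, phase2, phase3]


for number, phase in enumerate(plan_update(1, "A", "B", "Y"), start=1):
    print(number, phase)
```

Because the delete of `{A, 1}` appears only in the third list, a failure in phase 1 or 2 leaves the old index row in place, marked unverified, where a later read can still check it against the data table.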
  14. Global Mutable Indexes Batch Example - Delete
      Data Table:
        Pk  C1  C2  C3
        1   A   X   Y
      Index (on C1, include C3):
        Pk    C3
        A, 1  Y
      Delete row with Pk = 1:
      1. Index tables are updated in parallel
         Update - Put {{A, 1}, VERIFIED=false}
      2. Delete data table row
         Delete {1}
      3. Delete index table row
         Delete {A, 1}
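
A small sketch of the delete example, again with dicts standing in for the tables. The `delete_row` function and its `phases_to_run` parameter are hypothetical; the parameter simulates a failure by stopping after the given number of phases.

```python
# Dicts stand in for the data table and the index on C1 (an assumption).
data_table = {1: {"C1": "A", "C2": "X", "C3": "Y"}}
index_table = {("A", 1): {"C3": "Y", "VERIFIED": True}}


def delete_row(pk, phases_to_run=3):
    """Three-phase delete; phases_to_run < 3 simulates a failure part way."""
    c1 = data_table[pk]["C1"]
    if phases_to_run >= 1:
        index_table[(c1, pk)]["VERIFIED"] = False   # phase 1: unverify index row
    if phases_to_run >= 2:
        del data_table[pk]                          # phase 2: delete data row
    if phases_to_run >= 3:
        del index_table[(c1, pk)]                   # phase 3: delete index row


delete_row(1, phases_to_run=2)   # the failure happens before phase 3
print(data_table)
print(index_table)
```

When the delete stops after phase 2, the leftover index row is unverified, so a reader goes to the data table, finds no row, and does not return the deleted data.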
  15. Global Immutable Indexes - Mutate
      [Diagram: the application server (Application, Phoenix Client, HBase Client) sends an upsert/delete batch of mutations to the region server for a data table region and to the region servers for the index table regions (each with Region, WAL, HFile). Steps are numbered 1 to 3.]
  16. Global Mutable & Immutable Indexes - Read
      [Diagram: the application server (Application, Phoenix Client, HBase Client) issues a Select, which becomes a Scan on the region servers for the index table regions (Scan Region Observer, Global Index Checker, Region, WAL, HFile). The region server for a data table region (Ungrouped Aggregate Region Observer, Indexer, Region, WAL, HFile) also takes part. Steps are numbered 0 to 7.]
  17. Correctness - Without concurrent updates
      ● VERIFIED = true => the index update happened after the data table update
      ● VERIFIED = false => the data is read from the data table
      ● Missing index row cases: not possible, because
        ○ The index table is always updated before the data table, in strict order, so having a row in the data table implies that the index table update was attempted.
        ○ If the index update fails, the data table update is not attempted. Index update failures therefore cannot leave a data table row without its corresponding index row.
        ○ Since an index row is deleted only after the corresponding data table row is deleted, data row deletes cannot cause a missing index row.
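
The "missing index row is not possible" argument can be checked by brute force over every point at which a write might stop. This is a sketch under the same assumptions as the slides' examples: dicts stand in for the tables, and `insert_row` and `data_rows_missing_from_index` are hypothetical helpers, not Phoenix code.

```python
def insert_row(data_table, index_table, pk, row, phases_to_run):
    """Three-phase insert that stops after phases_to_run phases."""
    key = (row["C1"], pk)
    if phases_to_run >= 1:
        index_table[key] = {"C3": row["C3"], "VERIFIED": False}  # phase 1
    if phases_to_run >= 2:
        data_table[pk] = dict(row)                               # phase 2
    if phases_to_run >= 3:
        index_table[key]["VERIFIED"] = True                      # phase 3


def data_rows_missing_from_index(data_table, index_table):
    """Primary keys of data rows that have no index row at all."""
    return [pk for pk, row in data_table.items()
            if (row["C1"], pk) not in index_table]


for phases_to_run in range(4):           # stop after 0, 1, 2 or 3 phases
    data_table, index_table = {}, {}
    insert_row(data_table, index_table, 1, {"C1": "A", "C3": "Y"}, phases_to_run)
    # The invariant from the slide: no data row without an index row.
    assert data_rows_missing_from_index(data_table, index_table) == []
    print(phases_to_run, sorted(data_table), index_table)
```

An index row without a data row can occur (stopping after phase 1), but that row is unverified, so the read path resolves it against the data table.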
  18. Correctness - With concurrent updates
      ● Detect concurrent updates and do not proceed with Phase 3 for them
      ● Read-repair reconstructs the index rows from the data table
  19. Upgrade
      ● No schema change is needed, since the VERIFIED column is an existing empty column.
      ● It is advised to rebuild indexes after PHOENIX-5156 is applied, to make sure that the index is consistent for both old and new data.
  20. Performance
      Preliminary results:
      ● 25% increase in write latency
      ● No noticeable increase in read latency
      Test Env:
      ● Data table with two indexes
      ● 200K large rows in the data table
      ● 10 node AWS cluster
        ○ 4 core nodes, 2.3 GHz, 10 GB disk, 32 GB memory VMs
  21. Resources
      Phoenix Secondary Indexing: http://phoenix.apache.org/secondary_indexing.html
      PHOENIX-5018, PHOENIX-5190, PHOENIX-5156, PHOENIX-5211
      Design doc: https://docs.google.com/document/d/1Vsf23GCT0_CK4q8g_xaXyE_4Dw3aH71BfZypEy3T9iQ/edit?usp=sharing
      kozdemir@salesforce.com
  22. Thank You!
