Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

HBaseCon2017 Transactions in HBase

1,087 views

Published on

In the age of NoSQL, big data storage engines such as HBase have given up ACID semantics of traditional relational databases, in exchange for high scalability and availability. However, it turns out that in practice, many applications require consistency guarantees to protect data from concurrent modification in a massively parallel environment. In the past few years, several transaction engines have been proposed as add-ons to HBase; three different engines, namely Omid, Tephra, and Trafodion were open-sourced in Apache alone. In this talk, we will introduce and compare the different approaches from various perspectives including scalability, efficiency, operability and portability, and make recommendations pertaining to different use cases.

Published in: Technology
  • Be the first to comment

HBaseCon2017 Transactions in HBase

  1. 1. Transactions in HBase Andreas Neumann
 Gokul Gunasekaran HbaseCon June 2017 gokul at cask.co
 anew at apache.org @caskoid
  2. 2. Goals of this Talk - Why transactions? - Optimistic Concurrency Control - Three Apache projects: Omid, Tephra, Trafodion - How are they different? 2
  3. 3. Transactions in noSQL? History • SQL: RDBMS, EDW, … • noSQL: MapReduce, HDFS, HBase, … • n(ot)o(nly)SQL: Hive, Phoenix, … Motivation: • Data consistency under highly concurrent loads • Partial outputs after failure • Consistent view of data for long-running jobs • (Near) real-time processing 3
  4. 4. Stream Processing 4 HBase Table ...Queue ... ... Flowlet ... ...
  5. 5. HBase Table ...Queue ... ... Flowlet ... ... Write Conflict! 5
  6. 6. Transactions to the Rescue 6 HBase Table ...Queue ... ... Flowlet - Atomicity of all writes involved - Protection from concurrent update
  7. 7. ACID Properties From good old SQL: • Atomic - Entire transaction is committed as one • Consistent - No partial state change due to failure • Isolated - No dirty reads, transaction is only visible after commit • Durable - Once committed, data is persisted reliably 7
  8. 8. What is HBase? 8 … Client Region Server Region Region… Coprocessor Region Server Region Region… Coprocessor
  9. 9. What is HBase? 9 Simplified: • Distributed Key-Value Store • Key = <row>.<family>.<column>.<timestamp> • Partitioned into Regions (= continuous range of rows) • Each Region Server hosts multiple regions • Optional: Coprocessor in Region Server • Durable writes
  10. 10. ACID Properties in HBase • Atomic • At cell, row, and region level • Not across regions, tables or multiple calls • Consistent - No built-in rollback mechanism • Isolated - Timestamp filters provide some level of isolation • Durable - Once committed, data is persisted reliably How to implement full ACID? 10
  11. 11. Implementing Transactions • Traditional approach (RDBMS): locking • May produce deadlocks • Causes idle wait • complex and expensive in a distributed env • Optimistic Concurrency Control • lockless: allow concurrent writes to go forward • on commit, detect conflicts with other transactions • on conflict, roll back all changes and retry • Snapshot Isolation • Similar to repeatable read • Take snapshot of all data at transaction start • Read isolation 11
  12. 12. Optimistic Concurrency Control 12 time x=10client1: start fail/rollback client2: start read x commit must see the old value of x
  13. 13. Optimistic Concurrency Control 13 time incr xclient1: start commit client2: start incr x commit x=10 rollback x=11 sees the old 
 value of x=10
  14. 14. Conflicting Transactions 14 time tx:A
  15. 15. Conflicting Transactions 14 time tx:A tx:B
  16. 16. Conflicting Transactions 14 time tx:A tx:B tx:C (A fails)
  17. 17. Conflicting Transactions 14 time tx:A tx:B tx:C (A fails) tx:D (A fails)
  18. 18. Conflicting Transactions 14 time tx:A tx:B tx:C (A fails) tx:D (A fails) tx:E (E fails)
  19. 19. Conflicting Transactions 14 time tx:A tx:B tx:C (A fails) tx:D (A fails) tx:E (E fails) tx:F (F fails)
  20. 20. Conflicting Transactions 14 time tx:A tx:B tx:C (A fails) tx:D (A fails) tx:E (E fails) tx:F (F fails) tx:G
  21. 21. Conflicting Transactions • Two transactions have a conflict if • they write to the same cell • they overlap in time
 • If two transactions conflict, the one that commits later rolls back • Active change set = set of transactions t such that: • t is committed, and • there is at least one in-flight tx t’ that started before t’s commit time
 • This change set is needed in order to perform conflict detection. 15
  22. 22. HBase Transactions in Apache 16 Apache Omid (incubating) (incubating) (incubating)
  23. 23. In Common • Optimistic Concurrency Control must: • maintain Transaction State: • what tx are in flight and committed? • what is the change set of each tx? (for conflict detection, rollback) • what transactions are invalid (failed to roll back due to crash etc.) • generate unique transaction IDs • coordinate the life cycle of a transaction • start, detect conflicts, commit, rollback • All of { Omid, Tephra, Trafodion } implement this • but vary in how they do it 17
  24. 24. Apache Tephra • Based on the original Omid paper: Daniel Gómez Ferro, Flavio Junqueira, Ivan Kelly, Benjamin Reed, Maysam Yabandeh:
 Omid: Lock-free transactional support for distributed data stores. ICDE 2014.
 • Transaction Manager: • Issues unique, monotonic transaction IDs • Maintains the set of excluded (in-flight and invalid) transactions • Maintains change sets for active transactions • Performs conflict detection • Client: • Uses transaction ID as timestamp for writes • Filters excluded transactions for isolation • Performs rollback 18
  25. 25. Transaction Lifecycle 19
  26. 26. Transaction Lifecycle 19 in progress start new tx
  27. 27. Transaction Lifecycle 19 in progress start new tx write to HBase
  28. 28. Transaction Lifecycle 19 in progress start new tx write to HBase detect conflicts
  29. 29. Transaction Lifecycle 19 in progress start new tx write to HBase detect conflicts ok complete make visible
  30. 30. Transaction Lifecycle 19 in progress start new tx write to HBase aborting conflicts detect conflicts ok complete make visible
  31. 31. Transaction Lifecycle 19 in progress start new tx write to HBase aborting conflicts roll back in HBase ok detect conflicts ok complete make visible
  32. 32. Transaction Lifecycle 19 in progress start new tx write to HBase aborting conflicts invalid failure roll back in HBase ok detect conflicts ok complete make visible
  33. 33. Transaction Lifecycle 19 in progress start new tx write to HBase aborting conflicts invalid failure roll back in HBase ok time out detect conflicts ok complete make visible
  34. 34. Transaction Lifecycle 19 in progress start new tx write to HBase aborting conflicts invalid failure roll back in HBase ok time out detect conflicts ok complete make visible • Transaction consists of: • transaction ID (unique timestamp) • exclude list (in-flight and invalid tx)
 • Transactions that do complete • must still participate in conflict detection • disappear from transaction state
 when they do not overlap with in-flight tx
 • Transactions that do not complete • time out (by transaction manager) • added to invalid list
  35. 35. Apache Tephra 20 Tx
 ManagerClient A HBase Region Server x:10 37 Region Server in-flight: …
  36. 36. Apache Tephra 20 Tx
 ManagerClient A HBase Region Server x:10 37 Region Server in-flight: … start()
 id: 42, excludes = {…} ,42
  37. 37. Apache Tephra 20 Tx
 ManagerClient A HBase Region Server x:10 37 write 
 x=11 x:11 42 Region Server in-flight: … start()
 id: 42, excludes = {…} ,42
  38. 38. Apache Tephra 20 Tx
 ManagerClient A HBase Region Server x:10 37 write 
 x=11 x:11 42 Region Server write: 
 y=17 y:17 42 in-flight: … start()
 id: 42, excludes = {…} ,42
  39. 39. HBase Apache Tephra 21 Tx
 Manager Client B Region Server x:10 37 x:11 42 Region Server y:17 42 in-flight: …,42
  40. 40. HBase Apache Tephra 21 Tx
 Manager Client B Region Server x:10 37 x:11 42 Region Server y:17 42 in-flight: …,42 start()
 id: 48, excludes = {…,42} ,48
  41. 41. HBase Apache Tephra 21 Tx
 Manager read x Client B Region Server x:10 37 x:11 42 Region Server y:17 42 in-flight: …,42 start()
 id: 48, excludes = {…,42} ,48
  42. 42. HBase Apache Tephra 21 Tx
 Manager read x Client B x:10 Region Server x:10 37 x:11 42 Region Server y:17 42 in-flight: …,42 start()
 id: 48, excludes = {…,42} ,48
  43. 43. Region Server HBase Region Server x:10 37 y:17 42 Apache Tephra 22 Tx
 ManagerClient A x:11 42 in-flight: …,42
  44. 44. Region Server HBase Region Server x:10 37 y:17 42 Apache Tephra 22 Tx
 ManagerClient A x:11 42 commit()
 conflict in-flight: …,42
  45. 45. Region Server HBase Region Server x:10 37 y:17 42 Apache Tephra 22 Tx
 ManagerClient A x:11 42 roll back commit()
 conflict in-flight: …,42
  46. 46. Region Server HBase Region Server x:10 37 y:17 42 Apache Tephra 22 Tx
 ManagerClient A x:11 42 roll back commit()
 conflict x:10 37 in-flight: …,42
  47. 47. Region Server HBase Region Server x:10 37 y:17 42 Apache Tephra 22 Tx
 ManagerClient A x:11 42 roll back commit()
 conflict x:10 37 in-flight: …,42 in-flight: … make
 visible
  48. 48. HBase Apache Tephra 23 Region Server x:10 37 x:11 42 Region Server y:17 42 Tx
 ManagerClient A in-flight: …,42
  49. 49. HBase Apache Tephra 23 Region Server x:10 37 x:11 42 Region Server y:17 42 Tx
 ManagerClient A in-flight: …,42 commit()
 success in-flight: …
  50. 50. HBase Apache Tephra 23 Region Server x:10 37 x:11 42 Region Server y:17 42 Tx
 ManagerClient A in-flight: …,42 commit()
 success in-flight: …Client C start()
 id: 52, excludes: {…} in-flight: …,52
  51. 51. HBase Apache Tephra 23 Region Server x:10 37 x:11 42 Region Server y:17 42 read x x:11 Tx
 ManagerClient A in-flight: …,42 commit()
 success in-flight: …Client C start()
 id: 52, excludes: {…} in-flight: …,52
  52. 52. Apache Tephra 24 … Client Region Server Region Region… Coprocessor Region Server Region Region… Coprocessor HBase Tx
 Manager Tx id generation Tx lifecycle
 rollback Tx state lifecycle
 transitions data
 operations
  53. 53. Apache Tephra • HBase coprocessors • For efficient visibility filtering (on region-server side) • For eliminating invalid cells on flush and compaction • Programming Abstraction • TransactionalHTable: • Implements HTable interface • Existing code is easy to port • TransactionContext: • Implements transaction lifecycle 25
  54. 54. Apache Tephra - Example txTable = new TransactionAwareHTable(table);
 txContext = new TransactionContext(txClient, txTable);
 txContext.start(); try {
 // perform Hbase operations in txTable txTable.put(…); ... } catch (Exception e) { // throws TransactionFailureException(e)
 txContext.abort(e); } // throws TransactionConflictException if so
 txContext.finish(); 26
  55. 55. Apache Tephra - Strengths • Compatible with existing, non-tx data in HBase • Programming model • Same API as HTable, keep existing client code • Conflict detection granularity • Row, Column, Off • Special “long-running tx” for MapReduce and similar jobs • HA and Fault Tolerance • Checkpoints and WAL for transaction state, Standby Tx Manager • Replication compatible • Checkpoint to HBase, use HBase replication • Secure, Multi-tenant 27
  56. 56. Apache Tephra - Not-So Strengths • Exclude list can grow large over time • RPC, post-filtering overhead • Solution: Invalid tx pruning on compaction - complex! • Single Transaction Manager • performs all lifecycle state transitions, including conflict detection • conflict detection requires lock on the transaction state • becomes a bottleneck • Solution: distributed Transaction Manager with consensus protocol 28
  57. 57. Apache Trafodion • A complete distributed database (RDBMS) • transaction system is not available by itself • APIs: jdbc, SQL • Inspired by original HBase TRX (transactional region server • migrated transaction logic into coprocessors • coprocessors cache in-flight data in-memory • transaction state (change sets) in coprocessors • conflict detection with 2-phase commit • Transaction Manager • orchestrates transaction lifecycle across involved region servers • multiple instances, but one per client 29 (incubating)
  58. 58. Apache Trafodion 30
  59. 59. Apache Trafodion 31 Tx
 ManagerClient A HBase Region Server x:10 Region Server in-flight: …
  60. 60. Apache Trafodion 31 Tx
 ManagerClient A HBase Region Server x:10 Region Server in-flight: … start()
 id:42 ,42
  61. 61. Apache Trafodion 31 Tx
 ManagerClient A HBase Region Server x:10 Region Server in-flight: … start()
 id:42 ,42write 
 x=11 x:11 region:
 … ,42
  62. 62. Apache Trafodion 31 Tx
 ManagerClient A HBase Region Server x:10 Region Server in-flight: … start()
 id:42 ,42 write: 
 y=17 y:17 write 
 x=11 x:11 region:
 … ,42
  63. 63. Apache Trafodion 32 Tx
 Manager Client B in-flight: …,42 HBase Region Server x:10 Region Server x:11 y:17
  64. 64. Apache Trafodion 32 Tx
 Manager Client B in-flight: …,42 start()
 id: 48 ,48 HBase Region Server x:10 Region Server x:11 y:17
  65. 65. Apache Trafodion 32 Tx
 Manager read x Client B in-flight: …,42 start()
 id: 48 ,48 HBase Region Server x:10 Region Server x:11 y:17
  66. 66. Apache Trafodion 32 Tx
 Manager read x Client B x:10 in-flight: …,42 start()
 id: 48 ,48 HBase Region Server x:10 Region Server x:11 y:17
  67. 67. HBase Apache Trafodion 33 Tx
 ManagerClient A in-flight: …,42 Region Server x:10 Region Server x:11 y:17
  68. 68. HBase Apache Trafodion 33 Tx
 ManagerClient A commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17
  69. 69. HBase Apache Trafodion 33 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17
  70. 70. HBase Apache Trafodion 33 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17
  71. 71. HBase Apache Trafodion 33 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17 2. roll back
  72. 72. HBase Apache Trafodion 33 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17 2. roll back
  73. 73. HBase Apache Trafodion 33 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 in-flight: … Region Server x:10 Region Server x:11 y:17 2. roll back
  74. 74. HBase Apache Trafodion 34 Tx
 ManagerClient A in-flight: …,42 Region Server x:10 Region Server x:11 y:17
  75. 75. HBase Apache Trafodion 34 Tx
 ManagerClient A commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17
  76. 76. HBase Apache Trafodion 34 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17
  77. 77. HBase Apache Trafodion 34 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17
  78. 78. HBase Apache Trafodion 34 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17 2. commit!
  79. 79. HBase Apache Trafodion 34 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 Region Server x:10 Region Server x:11 y:17 2. commit! x:11 y:17
  80. 80. HBase Apache Trafodion 34 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 in-flight: … Region Server x:10 Region Server x:11 y:17 2. commit! x:11 y:17
  81. 81. HBase Apache Trafodion 35 Client Region Server Region Region… Coprocessor Region Server Region Region… Coprocessor Tx
 Manager Tx id generation conflicts Tx state Tx life cycle (commit) transitions
 region ids 2-phase
 commit data
 operations Tx lifecycle In-flight data Client 2 Tx 2 Manager
  82. 82. Apache Trafodion • Scales well: • Conflict detection is distributed: no single bottleneck • Commit coordination by multiple transaction managers • Optimization: bypass 2-hase commit if single region • Coprocessors cache in-flight data in Memory • Flushed to HBase only on commit • Committed read (not snapshot, not repeatable read) • Option: cause conflicts for reads, too • HA and Fault Tolerance • WAL for all state • All services are redundant and take over for each other • Replication: Only in paid (non-Apache) add-on 36
  83. 83. Apache Trafodion - Strengths • Very good scalability • Scales almost linearly • Especially for very small transactions • Familiar SQL/jdbc interface for RDB programmers • Redundant and fault-tolerant • Secure and multi-tenant: • Trafodion/SQL layer provides authn+authz 37
  84. 84. Apache Trafodion - Not-So Strengths • Monolithic, not available as standalone transaction system • Heavy load on coprocessors • memory and compute • Large transactions (e.g., MapReduce) will cause Out-of-memory • no special support for long-running transactions 38
  85. 85. Apache Omid • Evolution of Omid based on the Google Percolator paper: Daniel Peng, Frank Dabek: Large-scale Incremental Processing Using Distributed Transactions and Notifications, USENIX 2010.
 • Idea: Move as much transaction state as possible into HBase • Shadow cells represent the state of a transaction • One shadow cell for every data cell written • Track committed transactions in an HBase table • Transaction Manager (TSO) has only 3 tasks • issue transaction IDs • conflict detection • write to commit table 39
  86. 86. Apache Omid 40
  87. 87. Apache Omid 41 Tx
 Manager Client A HBase Region Server x:10 37: commit.40 Region Server Commits
 37: 40
  88. 88. Apache Omid 41 Tx
 Manager Client A start()
 id: 42 HBase Region Server x:10 37: commit.40 Region Server Commits
 37: 40
  89. 89. Apache Omid 41 Tx
 Manager Client A start()
 id: 42 HBase Region Server x:10 37: commit.40 write 
 x=11 x:11 42: in-flight Region Server Commits
 37: 40
  90. 90. Apache Omid 41 Tx
 Manager Client A start()
 id: 42 HBase Region Server x:10 37: commit.40 write 
 x=11 x:11 42: in-flight Region Server Commits
 37: 40 write: 
 y=17 y:17 42: in-flight
  91. 91. HBase Apache Omid 42 Tx
 Manager Client B Region Server x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight
  92. 92. HBase Apache Omid 42 Tx
 Managerstart()
 id: 48 Client B Region Server x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight
  93. 93. HBase Apache Omid 42 Tx
 Managerstart()
 id: 48 read x Client B Region Server x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight
  94. 94. HBase Apache Omid 42 Tx
 Managerstart()
 id: 48 read x Client B x:10 Region Server x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight
  95. 95. Region Server HBase Region Server x:10 37: commit.40 y:17 42: in-flight Apache Omid 43 Tx
 Manager Client A Commits
 37: 40 x:11 42: in-flight
  96. 96. Region Server HBase Region Server x:10 37: commit.40 y:17 42: in-flight Apache Omid 43 Tx
 Manager Client A Commits
 37: 40 x:11 42: in-flight commit()
 conflict
  97. 97. Region Server HBase Region Server x:10 37: commit.40 y:17 42: in-flight Apache Omid 43 Tx
 Manager Client A Commits
 37: 40 x:11 42: in-flight roll back commit()
 conflict
  98. 98. Region Server HBase Region Server x:10 37: commit.40 y:17 42: in-flight Apache Omid 43 Tx
 Manager Client A Commits
 37: 40 x:11 42: in-flight roll back commit()
 conflict x:10 37: commit.40
  99. 99. HBase Apache Omid 44 Region Server Tx
 Manager Client A x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight
  100. 100. HBase Apache Omid 44 Region Server Tx
 Manager Client A x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight commit()
 success:50 42: 50
  101. 101. HBase Apache Omid 44 Region Server Tx
 Manager Client A x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight mark as
 committed 42: commit.50 42: commit.50 commit()
 success:50 42: 50
  102. 102. HBase Apache Omid 44 Region Server Tx
 Manager Client A Client C start()
 id: 52 x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight mark as
 committed 42: commit.50 42: commit.50 commit()
 success:50 42: 50
  103. 103. HBase Apache Omid 44 Region Server Tx
 Manager Client A Client C start()
 id: 52 x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight mark as
 committed 42: commit.50 42: commit.50 read x x:11 commit()
 success:50 42: 50
  104. 104. Apache Omid - Future • Atomic commit with linking? • Eliminate need for commit table 45 HBase Region Server x:10 37: commit.40 x:11 42: in-flight Region Server Commits
 37: 40y:17
  105. 105. HBase Apache Omid 46 Client Region Server Region Region… Coprocessor Region Server Region Region… Coprocessor Tx
 Manager Tx id generation Conflict detection start
 commit data
 operations
 + shadow cells Tx state Tx lifecycle
 rollback commit commit
 table
  106. 106. Apache Omid - Strengths • Transaction state is in the database • Shadow cells plus commit table • Scales with the size of the cluster • Transaction Manager is lightweight • Generation of tx IDs delegated to timestamp oracle • Conflict detection • Writing to commit table • Fault Tolerance: • After failure, fail all existing transactions attempting to commit • Self-correcting: Read clients can delete invalid cells 47
  107. 107. Apache Omid - Not So Strengths • Storage intensive - shadow cells double the space • I/O intensive - every cell requires two writes 1. write data and shadow cell 2. record commit in shadow cell • Reads may also require two reads from HBase (commit table) • Producer/Consumer: will often find the (uncommitted) shadow cell • Scans: high througput sequential read disrupted by frequent lookups • Security/Multi-tenancy: • All clients need access to commit table • Read clients need write access to repair invalid data • Replication: Not implemented 48
  108. 108. Summary 49 Apache Tephra Apache Trafodion Apache Omid Tx State Tx Manager Distributed to 
 region servers Tx Manager (changes) HBase (shadows/commits) Conflict detection Tx Manager Distributed to regions, 2- phase commit Tx Manager ID generation Tx Manager Distributed to multiple Tx Managers Tx Manager API HTable SQL Custom Multi-tenant Yes Yes No Strength Scans, Large Tx, API
 Scalable, full SQL Scale, throughput So so Scale, Throughput API not Hbase, Large Tx Scans, Producer/Consumer
  109. 109. Links Join the community: 50 Apache Omid (incubating)
 http://omid.apache.org/ (incubating)
 http://trafodion.apache.org/ (incubating)
 http://tephra.apache.org/
  110. 110. Thank you … for listening to my talk. Credits: - Sean Broeder, Narendra Goyal (Trafodion) - Francisco Perez-Sorrosal (Omid) 51
  111. 111. Thank you … for listening to my talk. Credits: - Sean Broeder, Narendra Goyal (Trafodion) - Francisco Perez-Sorrosal (Omid) 51 Questions?

×