TRANSACTIONS OVER HBASE
Alex Baranau @abaranau
Gary Helmling @gario
Continuuity

WHO WE ARE
• We’ve built Continuuity Reactor: the world’s first scale-out application server for Hadoop
• Fast, easy development, deployment, and management of Hadoop and HBase apps
• The Continuuity team has years of experience using and contributing to open source, and we intend to continue doing so.

AGENDA
• Transactions in stream processing: Why? What?
• Implementation: How?
  • Omid-style transactions explained
  • Transaction Manager
• What’s next?

THE REACTOR
• Continuuity Reactor is an app platform built on Hadoop and HBase
• Collect, Process, Store, and Query data
• A Flow is a real-time processor with an exactly-once guarantee
• A Flow is composed of Flowlets connected via queues
• All processing happens in transactions with ACID guarantees

PROCESSING IN A FLOW
[Diagram, three build steps: Queue, Flowlet, HBase Table]

TRANSACTIONS: WHAT?
• Atomic - The entire transaction is committed as one unit
• Consistent - No partial state changes are left behind by a failure
• Isolated - No dirty reads; a transaction's writes are visible only after it commits
• Durable - Once committed, data is persisted reliably

WHAT ABOUT HBASE?
• Atomic operations on a cell value: checkAndPut, checkAndDelete, increment, append
• Atomic batches of operations on rows within a single region
• No cross-region atomic operations
• No cross-table atomic operations
• No atomic operations spanning multiple RPCs

IMPLEMENTATION OVERVIEW

OMID-STYLE TRANSACTIONS
• Multi-Version Concurrency Control
  • Cell version (timestamp) = transaction ID
  • All writes in the same transaction use the transaction ID as their timestamp
  • Reads exclude other, uncommitted transactions (for isolation)
• Optimistic Concurrency Control
  • Conflict detection at commit of a transaction
  • Write conflict: two overlapping transactions write the same row
  • On conflict, one transaction is rolled back (whichever commits later)

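The MVCC read rule above fits in a few lines. This is a minimal illustration, not Continuuity's code. The hypothetical `Transaction` fields mirror the write pointer, read pointer, and exclude list used throughout these slides. A cell version counts as visible if it is the transaction's own write, or if it is at or below the read pointer and not excluded.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Transaction:
    """Hypothetical client-side view of a transaction."""
    write_pointer: int                      # timestamp used for every write in this transaction
    read_pointer: int                       # upper bound on timestamps this transaction may read
    excludes: FrozenSet[int] = frozenset()  # uncommitted or invalid transaction IDs to skip


def is_visible(tx: Transaction, ts: int) -> bool:
    """True if the cell version written at timestamp `ts` is visible to `tx`."""
    if ts == tx.write_pointer:
        return True  # a transaction sees its own uncommitted writes
    return ts <= tx.read_pointer and ts not in tx.excludes


def read_latest(tx: Transaction, versions: Dict[int, int]) -> Optional[int]:
    """Return the newest visible value from a {timestamp: value} map of one cell's versions."""
    for ts in sorted(versions, reverse=True):
        if is_visible(tx, ts):
            return versions[ts]
    return None
```

Take row1:col1 = {1001: 10, 1002: 11}. `Transaction(1002, 1001)` reads its own 11, while `Transaction(1003, 1001, frozenset({1002}))` skips the uncommitted version 1002 and reads 10.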
OPTIMISTIC CONCURRENCY CONTROL
• Avoids the cost of locking rows and tables
• No deadlocks or lock escalations
• Conflicts are more expensive: detection at commit, plus rollback of the losing transaction
• Good if conflicts are rare: short transactions, disjoint partitioning of work

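Conflict detection at commit amounts to intersecting the committing transaction's change set with the change sets of transactions that committed after it started. This minimal sketch assumes each committed change set is stored with a commit point: the Tx Manager's next write pointer at the moment of that commit. Any transaction with an ID below that commit point therefore overlapped with it. The function name and data shapes are hypothetical.

```python
from typing import Iterable, Set, Tuple


def has_conflict(tx_id: int, change_set: Set[str],
                 committed: Iterable[Tuple[int, Set[str]]]) -> bool:
    """Return True if transaction `tx_id` must be rolled back.

    committed: (commit_point, rows) for every transaction that has already committed.
    A transaction that started before a commit received an ID below that commit point,
    so `commit_point > tx_id` means the two transactions overlapped.
    """
    for commit_point, rows in committed:
        if commit_point > tx_id and change_set & rows:
            return True  # an overlapping transaction wrote a common row and committed first
    return False
```

Only this set intersection is paid at commit and no locks are held while the work runs. That is why the approach suits short transactions over disjoint partitions of the data.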
TRANSACTIONS IN CONTEXT
[Diagram: ZooKeeper; Tx Manager (active) and Tx Manager (standby); HBase with Master 1, Master 2, and RegionServers RS 1 to RS 4; Clients 1 to N]

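TRANSACTION LIFE CYCLE
[Diagram: transaction states as seen by the Client and the Tx Manager, which communicate over an RPC API]
• none → start tx: the Tx Manager starts the transaction → in progress
• in progress: the client does its work and writes to HBase
• try commit: the Tx Manager checks for conflicts → complete
• commit failed: the client rolls back its writes in HBase, then tries to abort → abort succeeded
• rollback failed: the client invalidates the transaction → invalid
• time out → invalid
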
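The life cycle can be written down as a small state machine. This sketch is one reading of the diagram. The event names are hypothetical, and treating a time-out as a move to invalid is an assumption based on the "time out" label rather than something the slides spell out.

```python
from enum import Enum


class TxState(Enum):
    NONE = "none"
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"
    ABORTED = "abort succeeded"
    INVALID = "invalid"


# (current state, event) -> next state; event names are hypothetical
TRANSITIONS = {
    (TxState.NONE, "start"): TxState.IN_PROGRESS,
    # commit succeeded: the conflict check found nothing
    (TxState.IN_PROGRESS, "commit"): TxState.COMPLETE,
    # commit failed, the client rolled back its HBase writes, then aborted
    (TxState.IN_PROGRESS, "abort"): TxState.ABORTED,
    # rollback failed: the leftover writes must be hidden from future reads
    (TxState.IN_PROGRESS, "invalidate"): TxState.INVALID,
    # assumption: a transaction that times out is treated like an invalidated one
    (TxState.IN_PROGRESS, "timeout"): TxState.INVALID,
}


def step(state: TxState, event: str) -> TxState:
    """Return the next state, rejecting transitions the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state.value!r}") from None
```

Keeping the transitions in one table makes it obvious that a transaction leaves "in progress" exactly once, and that both a failed rollback and a time-out end in the invalid state.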
CLIENT SIDE: TX AWARE
[Animated diagram, condensed: an HBase table (Cell, TS, Value), the Tx Manager, Client 1, and Client 2]
• Initial state: HBase holds row1:col1 at TS 1001 = 10; the Tx Manager has write = 1002, read = 1001
• Client 1 starts a transaction and gets write = 1002, read = 1001; the Tx Manager moves to write = 1003, read = 1001, inprogress=[1002]
• Client 1 increments row1:col1: it reads 10 and writes 11 at TS 1002
• Client 2 starts a transaction and gets write = 1003, read = 1001, excluded=[1002]; the Tx Manager moves to write = 1004, read = 1001, inprogress=[1002, 1003]
• Client 2 increments row1:col1: with 1002 excluded it also reads 10, and writes 11 at TS 1003
• Client 2 commits; no conflict is found, and the Tx Manager now has inprogress=[1002]
• Client 1 commits: conflict! Transaction 1003 overlapped with 1002, wrote the same row, and committed first
• Client 1 rolls back by deleting its write at TS 1002; HBase now holds row1:col1 at TS 1001 = 10 and 1003 = 11
• Client 1 aborts; the Tx Manager now has write = 1004, read = 1003, inprogress=[]
• Client 1 starts a new transaction and gets write = 1004, read = 1003; the Tx Manager moves to write = 1005, read = 1003
• Client 1 reads row1:col1 and sees 11 (TS 1003)

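Rollback works because every write in a transaction carries the same timestamp, its write pointer. Undoing the transaction therefore means deleting exactly those versions and nothing else; in HBase, that is a Delete aimed at one specific version. The sketch models the table as a hypothetical in-memory map from cell to {timestamp: value} and replays the last steps of the walkthrough above. It is an illustration, not the Reactor client.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Cell = Tuple[str, str]               # (row, column)
Table = Dict[Cell, Dict[int, int]]   # cell -> {timestamp: value}


def rollback(table: Table, write_pointer: int, change_set: Iterable[Cell]) -> None:
    """Undo a transaction by deleting only the versions it wrote at its write pointer."""
    for cell in change_set:
        table.get(cell, {}).pop(write_pointer, None)


def read(table: Table, cell: Cell, read_pointer: int,
         excludes: FrozenSet[int] = frozenset()) -> Optional[int]:
    """Newest value at or below read_pointer whose timestamp is not excluded."""
    versions = table.get(cell, {})
    visible = [ts for ts in versions if ts <= read_pointer and ts not in excludes]
    return versions[max(visible)] if visible else None


# State right after the conflict: transactions 1002 and 1003 both wrote 11.
table: Table = {("row1", "col1"): {1001: 10, 1002: 11, 1003: 11}}
rollback(table, 1002, [("row1", "col1")])
assert table == {("row1", "col1"): {1001: 10, 1003: 11}}
# A new transaction with read pointer 1003 sees the committed value.
assert read(table, ("row1", "col1"), 1003) == 11
```

The client needs its own change set to do this, which is why a transaction-aware client records every cell it writes.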
CLIENT SIDE: TX AWARE
[Second variant of the animation: the rollback fails]
• Starting point: HBase holds row1:col1 at TS 1001 = 10, 1002 = 11, and 1003 = 11; Client 1 holds write = 1002, read = 1001; the Tx Manager has write = 1004, read = 1001, inprogress=[1002]
• Client 1 commits: conflict!
• Client 1 tries to roll back, but the rollback fails, so row1:col1 at TS 1002 = 11 stays in HBase
• Client 1 invalidates the transaction; the Tx Manager now has write = 1004, read = 1003, inprogress=[], invalid=[1002]
• Client 1 starts a new transaction and gets write = 1004, read = 1003, exclude = [1002]; the Tx Manager moves to write = 1005, read = 1003
• Client 1 reads row1:col1: TS 1002 is below the read pointer but excluded, so the leftover write is invisible! The read returns 11 (TS 1003)

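One extra cell shows why the exclude list matters. In the sketch, `row2:col1` is a hypothetical cell that only the invalid transaction 1002 wrote; it does not appear on the slides. Because 1002 is at or below the new read pointer 1003, the exclude list is the only thing that keeps the leftover value out of later reads. The table model and the `read` helper are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Cell = Tuple[str, str]               # (row, column)
Table = Dict[Cell, Dict[int, int]]   # cell -> {timestamp: value}


def read(table: Table, cell: Cell, read_pointer: int,
         excludes: FrozenSet[int] = frozenset()) -> Optional[int]:
    """Newest value at or below read_pointer whose timestamp is not excluded."""
    versions = table.get(cell, {})
    visible = [ts for ts in versions if ts <= read_pointer and ts not in excludes]
    return versions[max(visible)] if visible else None


table: Table = {
    ("row1", "col1"): {1001: 10, 1002: 11, 1003: 11},  # the rollback of 1002 failed
    ("row2", "col1"): {1002: 7},  # hypothetical: written only by the invalid tx 1002
}
invalid = frozenset({1002})

# New transaction: write = 1004, read = 1003, exclude = [1002]
assert read(table, ("row1", "col1"), 1003, invalid) == 11
assert read(table, ("row2", "col1"), 1003, invalid) is None
# Without the exclude list, the invalid write would leak into the read.
assert read(table, ("row2", "col1"), 1003) == 7
```

This is also why the invalid list has to travel with every new transaction until the leftover data has been cleaned up.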
TRANSACTION MANAGER
• Creates new transactions
• Provides monotonically increasing write pointers
• Maintains all in-progress, committed, and invalid transactions
• Detects conflicts
• Transaction =
    Write pointer: timestamp for HBase writes
    Read pointer: upper-bound timestamp for reads
    Excludes: list of timestamps to exclude from reads

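The responsibilities listed above fit in a small in-memory class. This is a sketch under stated assumptions, not Continuuity's Transaction Manager. Class and method names are hypothetical, and conflicts are tracked per row. The read pointer is assumed to stop just below the oldest in-progress transaction. That rule was chosen because it reproduces the read pointers shown in the client-side walkthrough; a real manager may advance it differently.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Transaction:
    write_pointer: int        # timestamp for HBase writes
    read_pointer: int         # upper-bound timestamp for reads
    excludes: FrozenSet[int]  # timestamps to exclude from reads


class ConflictError(Exception):
    """Raised when a commit overlaps a committed transaction on the same row."""


class TransactionManager:
    """Hypothetical Tx Manager; all required state is held in memory."""

    def __init__(self, next_write_pointer: int, read_pointer: int) -> None:
        self.next_write_pointer = next_write_pointer
        self.read_pointer = read_pointer
        self.in_progress: Set[int] = set()
        self.invalid: Set[int] = set()
        # (commit point, rows written); a real manager would prune old entries
        self.committed: List[Tuple[int, FrozenSet[str]]] = []

    def _update_read_pointer(self) -> None:
        # Assumed rule: stop just below the oldest in-progress transaction.
        if self.in_progress:
            self.read_pointer = min(self.in_progress) - 1
        else:
            self.read_pointer = self.next_write_pointer - 1

    def start(self) -> Transaction:
        tx = Transaction(self.next_write_pointer, self.read_pointer,
                         frozenset(self.in_progress | self.invalid))
        self.in_progress.add(tx.write_pointer)
        self.next_write_pointer += 1  # monotonically increasing write pointers
        self._update_read_pointer()
        return tx

    def commit(self, tx: Transaction, change_set: Iterable[str]) -> None:
        rows = frozenset(change_set)
        for commit_point, committed_rows in self.committed:
            # committed after tx started (overlap) and touched a common row
            if commit_point > tx.write_pointer and rows & committed_rows:
                raise ConflictError(f"tx {tx.write_pointer} conflicts on {sorted(rows & committed_rows)}")
        self.committed.append((self.next_write_pointer, rows))
        self.in_progress.discard(tx.write_pointer)
        self._update_read_pointer()

    def abort(self, tx: Transaction) -> None:
        """Called after the client rolled back its writes successfully."""
        self.in_progress.discard(tx.write_pointer)
        self._update_read_pointer()

    def invalidate(self, tx: Transaction) -> None:
        """Called when rollback failed: the writes stay excluded until cleaned up."""
        self.in_progress.discard(tx.write_pointer)
        self.invalid.add(tx.write_pointer)
        self._update_read_pointer()
```

Replaying the walkthrough works as follows. Start two transactions: 1002, then 1003 with excludes {1002}. Commit 1003, after which committing 1002 raises `ConflictError`. Invalidating 1002 then moves the read pointer to 1003, and the next `start()` returns write pointer 1004 with excludes {1002}.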
TRANSACTION MANAGER
• Simple & fast
  • All required state is in memory
• Single point of failure?
  • Persist all state to a write-ahead log
  • A secondary Tx Manager watches for failure of the primary
  • Failover can happen quickly

TRANSACTION MANAGER
[Diagram, three build steps: the Tx Manager's current state (in progress, committed, invalid, read point, write point) and its Tx Log in HDFS]
• start(): write point ++ and the new transaction is added to in progress (+); the Tx Log records "start, write pt"
• commit(): the transaction moves from in progress (-) to committed (+); the Tx Log records "commit, write pt"

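The write-ahead log is an append-only sequence of (event, write pointer) records that is replayed on startup to rebuild the in-memory state. In this minimal sketch, the JSON-lines record format, the `abort` and `invalidate` records, and the function names are all assumptions; the slides show only "start" and "commit" entries. On HDFS, durability would also require syncing the output stream (for example with `hflush`/`hsync`), which a local stream cannot model.

```python
import io
import json
from typing import Dict, Iterable, Optional, TextIO


def append(log: TextIO, event: str, write_pointer: int) -> None:
    """Persist one record before the in-memory change is acknowledged to the client."""
    log.write(json.dumps({"event": event, "wp": write_pointer}) + "\n")
    log.flush()


def new_state() -> Dict:
    return {"next_write_pointer": 0, "in_progress": set(), "committed": set(), "invalid": set()}


def replay(lines: Iterable[str], state: Optional[Dict] = None) -> Dict:
    """Rebuild Tx Manager state by re-applying log records in order."""
    state = new_state() if state is None else state
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        event, wp = record["event"], record["wp"]
        if event == "start":
            state["in_progress"].add(wp)
            state["next_write_pointer"] = max(state["next_write_pointer"], wp + 1)
        elif event == "commit":
            state["in_progress"].discard(wp)
            state["committed"].add(wp)
        elif event == "abort":
            state["in_progress"].discard(wp)
        elif event == "invalidate":
            state["in_progress"].discard(wp)
            state["invalid"].add(wp)
        else:
            raise ValueError(f"unknown log record: {record!r}")
    return state


if __name__ == "__main__":
    log = io.StringIO()  # stand-in for the Tx Log file in HDFS
    for event, wp in [("start", 1002), ("start", 1003), ("commit", 1003), ("invalidate", 1002)]:
        append(log, event, wp)
    print(replay(log.getvalue().splitlines()))
```

Logging before applying each change is what makes point-in-time recovery possible: any state the clients have observed is already in the log.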
TRANSACTION SNAPSHOTS
• The write-ahead log provides persistence
• Guarantees point-in-time recovery
• The longer the log grows, the longer recovery takes
• Periodically write a snapshot of the full transaction state
• Snapshot + all newer logs provide the full state

TRANSACTION SNAPSHOTS
[Diagram, five build steps: the Tx Manager's current state (in progress, committed, invalid, read point, write point), an in-memory State Snapshot, and Tx Logs and a Tx Snapshot in HDFS]
• Before the snapshot, the Tx Manager appends to Tx Log A
• (1) The Tx Manager rolls over to a new log, Tx Log B
• (2) It copies its current state into a State Snapshot
• (3) It writes the State Snapshot to HDFS as a Tx Snapshot
• (4) Tx Snapshot + Tx Log B now hold the full state, so Tx Log A is no longer needed for recovery

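Recovery from a snapshot plus newer logs works as follows. The assumptions, all hypothetical: logs are numbered in the order they are rolled, a snapshot records the number of the first log it does not cover, and records are (event, write pointer) pairs. The state is copied only after the roll to Tx Log B, so a few records may be reflected in both the snapshot and Tx Log B. The `apply` step below leaves the state unchanged for such records, so replaying them is harmless.

```python
import copy
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (event, write pointer), e.g. ("start", 1004)


def apply(state: Dict, record: Record) -> None:
    """Apply one log record; re-applying a record already reflected in `state` is a no-op."""
    event, wp = record
    if event == "start":
        if wp not in state["committed"] and wp not in state["invalid"]:
            state["in_progress"].add(wp)
        state["next_write_pointer"] = max(state["next_write_pointer"], wp + 1)
    elif event == "commit":
        state["in_progress"].discard(wp)
        state["committed"].add(wp)
    elif event == "invalidate":
        state["in_progress"].discard(wp)
        state["invalid"].add(wp)
    else:
        raise ValueError(f"unknown record: {record!r}")


def take_snapshot(state: Dict, first_uncovered_log: int) -> Dict:
    """Copy the full state; every log numbered below `first_uncovered_log` becomes redundant."""
    return {"state": copy.deepcopy(state), "first_uncovered_log": first_uncovered_log}


def recover(snapshot: Dict, logs: Dict[int, List[Record]]) -> Dict:
    """Snapshot + all newer logs = full state."""
    state = copy.deepcopy(snapshot["state"])
    for log_id in sorted(logs):
        if log_id >= snapshot["first_uncovered_log"]:
            for record in logs[log_id]:
                apply(state, record)
    return state
```

Usage: append records to log 1 (Tx Log A), roll to log 2 (Tx Log B), call `take_snapshot(state, 2)`, and keep appending to log 2. Log 1 can then be deleted, and `recover(snapshot, logs)` still rebuilds the current state.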
TRANSACTION CLEANUP
[Diagram: the failed-rollback case. HBase still holds row1:col1 at TS 1002 = 11 next to 1001 = 10 and 1003 = 11; Client 1 (write = 1002, read = 1001) shows "rollback failed"; the Tx Manager has write = 1004, read = 1001, inprogress=[1002]]

TRANSACTION CLEANUP: DATA JANITOR
• RegionObserver coprocessor
• Maintains an in-memory snapshot of the recent invalid & in-progress sets
• Periodically updates it from the transaction snapshot in HDFS
• Purges data from invalid transactions and older versions on flush & compaction

TRANSACTION CLEANUP: DATA JANITOR
[Animated diagram, condensed: the Tx Snapshot, the Data Janitor (RegionObserver) inside HBase, the MemStore, a custom RegionScanner, and the HFile]
• The Data Janitor refreshes its copy of the Tx Snapshot: read point = 1003, write point = 1005, in progress = [1004], committed = [], invalid = [1002]
• The MemStore holds row1:col1 at TS 1004 = 12, 1003 = 11, and 1002 = 11
• On preFlush(), the Data Janitor wraps the flush in a custom RegionScanner that filters cells on their way into the new HFile:
  • TS 1004 = 12 is written (1004 is in progress)
  • TS 1003 = 11 is written (the latest version visible at the read point)
  • TS 1002 = 11 is dropped (1002 is invalid)
• The resulting HFile holds row1:col1 at TS 1004 = 12 and 1003 = 11

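The flush above can be modeled as a per-cell filter over versions, newest first. In HBase, a RegionObserver's preFlush and preCompact hooks can wrap the store scanner with a filtering scanner (exact signatures vary by HBase version); the Python below models only the filtering decision. Assumptions: the function name is hypothetical, and one committed version per cell is kept (`max_committed_versions`). Pruning older committed versions against the snapshot's read point is also an assumption, and is safe only if no running transaction still reads below that point.

```python
from typing import FrozenSet, List, Tuple

Version = Tuple[int, int]  # (timestamp, value)


def janitor_filter(versions: List[Version], read_point: int,
                   in_progress: FrozenSet[int], invalid: FrozenSet[int],
                   max_committed_versions: int = 1) -> List[Version]:
    """Return the versions of one cell to write to the HFile; input is newest first."""
    kept: List[Version] = []
    committed_kept = 0
    for ts, value in versions:
        if ts in invalid:
            continue  # leftover data from an invalid transaction: purge it
        if ts > read_point or ts in in_progress:
            kept.append((ts, value))  # not yet visible to every reader: keep as is
        elif committed_kept < max_committed_versions:
            kept.append((ts, value))  # newest committed version(s) at or below the read point
            committed_kept += 1
        # any older committed version falls through and is dropped
    return kept
```

With the snapshot above (read point 1003, in progress [1004], invalid [1002]), the MemStore versions 1004 = 12, 1003 = 11, 1002 = 11 come out as 1004 = 12 and 1003 = 11, matching the HFile in the diagram.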
WHAT’S NEXT?
• Open Source
• Continue scaling the Tx Manager
• Transaction Groups?
• Integration across other transactional stores

QUESTIONS?
Looking for the chance to work with a team that is defining a new category within Big Data?
We are hiring!
http://continuuity.com/careers
careers@continuuity.com
