SlideShare a Scribd company logo
1 of 51
Download to read offline
Transactions in HBase
Andreas Neumann
ApacheCon Big Data May 2017
anew at apache.org
@caskoid
Goals of this Talk
- Why transactions?
- Optimistic Concurrency Control
- Three Apache projects: Omid, Tephra, Trafodion
- How are they different?
2
Transactions in noSQL?
History
• SQL: RDBMS, EDW, …
• noSQL: MapReduce, HDFS, HBase, …
• n(ot)o(nly)SQL: Hive, Phoenix, …
Motivation:
• Data consistency under highly concurrent loads
• Partial outputs after failure
• Consistent view of data for long-running jobs
• (Near) real-time processing
3
Stream Processing
4
HBase
Table
...Queue ...
...
Flowlet
... ...
HBase
Table
...Queue ...
...
Flowlet
... ...
Write Conflict!
5
Transactions to the Rescue
6
HBase
Table
...Queue ...
...
Flowlet
- Atomicity of all writes involved
- Protection from concurrent update
ACID Properties
From good old SQL:
• Atomic - Entire transaction is committed as one
• Consistent - No partial state change due to failure
• Isolated - No dirty reads, transaction is only visible after commit
• Durable - Once committed, data is persisted reliably
7
What is HBase?
8
…
Client
Region Server
Region Region…
Coprocessor
Region Server
Region Region…
Coprocessor
What is HBase?
9
Simplified:
• Distributed Key-Value Store
• Key = <row>.<family>.<column>.<timestamp>
• Partitioned into Regions (= continuous range of rows)
• Each Region Server hosts multiple regions
• Optional: Coprocessor in Region Server
• Durable writes
ACID Properties in HBase
• Atomic
• At cell, row, and region level
• Not across regions, tables or multiple calls
• Consistent - No built-in rollback mechanism
• Isolated - Timestamp filters provide some level of isolation
• Durable - Once committed, data is persisted reliably
How to implement full ACID?
10
Implementing Transactions
• Traditional approach (RDBMS): locking
• May produce deadlocks
• Causes idle wait
• complex and expensive in a distributed env
• Optimistic Concurrency Control
• lockless: allow concurrent writes to go forward
• on commit, detect conflicts with other transactions
• on conflict, roll back all changes and retry
• Snapshot Isolation
• Similar to repeatable read
• Take snapshot of all data at transaction start
• Read isolation
11
Optimistic Concurrency Control
12
time
x=10client1: start fail/rollback
client2: start read x commit
must see the
old value of x
Optimistic Concurrency Control
13
time
incr xclient1: start commit
client2: start incr x commit
x=10
rollback
x=11
sees the old 

value of x=10
Conflicting Transactions
14
time
tx:A
tx:B
tx:C (A fails)
tx:D (A fails)
tx:E (E fails)
tx:F (F fails)
tx:G
Conflicting Transactions
• Two transactions have a conflict if
• they write to the same cell
• they overlap in time

• If two transactions conflict, the one that commits later rolls back
• Active change set = set of transactions t such that:
• t is committed, and
• there is at least one in-flight tx t’ that started before t’s commit time

• This change set is needed in order to perform conflict detection.
15
HBase Transactions in Apache
16
Apache Omid (incubating)
(incubating)
(incubating)
In Common
• Optimistic Concurrency Control must:
• maintain Transaction State:
• what tx are in flight and committed?
• what is the change set of each tx? (for conflict detection, rollback)
• what transactions are invalid (failed to roll back due to crash etc.)
• generate unique transaction IDs
• coordinate the life cycle of a transaction
• start, detect conflicts, commit, rollback
• All of { Omid, Tephra, Trafodion } implement this
• but vary in how they do it
17
Apache Tephra
• Based on the original Omid paper:
Daniel Gómez Ferro, Flavio Junqueira, Ivan Kelly, Benjamin Reed, Maysam Yabandeh:

Omid: Lock-free transactional support for distributed data stores. ICDE 2014.

• Transaction Manager:
• Issues unique, monotonic transaction IDs
• Maintains the set of excluded (in-flight and invalid) transactions
• Maintains change sets for active transactions
• Performs conflict detection
• Client:
• Uses transaction ID as timestamp for writes
• Filters excluded transactions for isolation
• Performs rollback
18
Transaction Lifecycle
19
in progress
start new tx
write
to
HBase
aborting
conflicts
invalid
failure
roll back
in HBase
ok
time
out
detect conflicts
ok
complete
make visible
• Transaction consists of:
• transaction ID (unique timestamp)
• exclude list (in-flight and invalid tx)

• Transactions that do complete
• must still participate in conflict detection
• disappear from transaction state

when they do not overlap with in-flight tx

• Transactions that do not complete
• time out (by transaction manager)
• added to invalid list
Apache Tephra
20
Tx

ManagerClient A
HBase
Region Server
x:10 37
write 

x=11
x:11 42
Region Server
write: 

y=17
y:17 42
in-flight:
…
start()

id: 42, excludes = {…}
,42
HBase
Apache Tephra
21
Tx

Manager
read x
Client B
x:10
Region Server
x:10 37
x:11 42
Region Server
y:17 42
in-flight:
…,42
start()

id: 48, excludes = {…,42} ,48
Region Server
HBase
Region Server
x:10 37 y:17 42
Apache Tephra
22
Tx

ManagerClient A
x:11 42
roll back
commit()

conflict
x:10 37
in-flight:
…,42
in-flight:
…
make

visible
HBase
Apache Tephra
23
Region Server
x:10 37
x:11 42
Region Server
y:17 42
read x
x:11
Tx

ManagerClient A
in-flight:
…,42
commit()

success
in-flight:
…Client C start()

id: 52, excludes: {…}
in-flight:
…,52
Apache Tephra
24
…
Client
Region Server
Region Region…
Coprocessor
Region Server
Region Region…
Coprocessor
HBase
Tx

Manager
Tx id generation
Tx lifecycle

rollback
Tx state
lifecycle

transitions
data

operations
Apache Tephra
• HBase coprocessors
• For efficient visibility filtering (on region-server side)
• For eliminating invalid cells on flush and compaction
• Programming Abstraction
• TransactionalHTable:
• Implements HTable interface
• Existing code is easy to port
• TransactionContext:
• Implements transaction lifecycle
25
Apache Tephra - Example
txTable = new TransactionAwareHTable(table);

txContext = new TransactionContext(txClient, txTable);

txContext.start();
try {

// perform Hbase operations in txTable
txTable.put(…);
...
} catch (Exception e) {
// throws TransactionFailureException(e)

txContext.abort(e);
}
// throws TransactionConflictException if so

txContext.finish();
26
Apache Tephra - Strengths
• Compatible with existing, non-tx data in HBase
• Programming model
• Same API as HTable, keep existing client code
• Conflict detection granularity
• Row, Column, Off
• Special “long-running tx” for MapReduce and similar jobs
• HA and Fault Tolerance
• Checkpoints and WAL for transaction state, Standby Tx Manager
• Replication compatible
• Checkpoint to HBase, use HBase replication
• Secure, Multi-tenant
27
Apache Tephra - Not-So Strengths
• Exclude list can grow large over time
• RPC, post-filtering overhead
• Solution: Invalid tx pruning on compaction - complex!
• Single Transaction Manager
• performs all lifecycle state transitions, including conflict detection
• conflict detection requires lock on the transaction state
• becomes a bottleneck
• Solution: distributed Transaction Manager with consensus protocol
28
Apache Trafodion
• A complete distributed database (RDBMS)
• transaction system is not available by itself
• APIs: jdbc, SQL
• Inspired by original HBase TRX (transactional region server
• migrated transaction logic into coprocessors
• coprocessors cache in-flight data in-memory
• transaction state (change sets) in coprocessors
• conflict detection with 2-phase commit
• Transaction Manager
• orchestrates transaction lifecycle across involved region servers
• multiple instances, but one per client
29
(incubating)
Apache Trafodion
30
Apache Trafodion
31
Tx

ManagerClient A
HBase
Region Server
x:10
Region Server
in-flight:
…
start()

id:42
,42
write: 

y=17
y:17
write 

x=11
x:11
region:

…
,42
Apache Trafodion
32
Tx

Manager
read x
Client B
x:10
in-flight:
…,42
start()

id: 48 ,48
HBase
Region Server
x:10
Region Server
x:11 y:17
HBase
Apache Trafodion
33
Tx

ManagerClient A
1. conflicts?
commit()

in-flight:
…,42
in-flight:
…
Region Server
x:10
Region Server
x:11 y:17
2. roll back
HBase
Apache Trafodion
34
Tx

ManagerClient A
1. conflicts?
commit()

in-flight:
…,42
in-flight:
…
Region Server
x:10
Region Server
x:11 y:17
2. commit!
x:11 y:17
HBase
Apache Trafodion
35
Client
Region Server
Region Region…
Coprocessor
Region Server
Region Region…
Coprocessor
Tx

Manager
Tx id generation
conflicts
Tx state
Tx life cycle (commit)
transitions

region ids
2-phase

commit
data

operations
Tx lifecycle
In-flight data
Client 2 Tx 2
Manager
Apache Trafodion
• Scales well:
• Conflict detection is distributed: no single bottleneck
• Commit coordination by multiple transaction managers
• Optimization: bypass 2-hase commit if single region
• Coprocessors cache in-flight data in Memory
• Flushed to HBase only on commit
• Committed read (not snapshot, not repeatable read)
• Option: cause conflicts for reads, too
• HA and Fault Tolerance
• WAL for all state
• All services are redundant and take over for each other
• Replication: Only in paid (non-Apache) add-on
36
Apache Trafodion - Strengths
• Very good scalability
• Scales almost linearly
• Especially for very small transactions
• Familiar SQL/jdbc interface for RDB programmers
• Redundant and fault-tolerant
• Secure and multi-tenant:
• Trafodion/SQL layer provides authn+authz
37
Apache Trafodion - Not-So Strengths
• Monolithic, not available as standalone transaction system
• Heavy load on coprocessors
• memory and compute
• Large transactions (e.g., MapReduce) will cause Out-of-memory
• no special support for long-running transactions
38
Apache Omid
• Evolution of Omid based on the Google Percolator paper:
Daniel Peng, Frank Dabek: Large-scale Incremental Processing Using
Distributed Transactions and Notifications, USENIX 2010.

• Idea: Move as much transaction state as possible into HBase
• Shadow cells represent the state of a transaction
• One shadow cell for every data cell written
• Track committed transactions in an HBase table
• Transaction Manager (TSO) has only 3 tasks
• issue transaction IDs
• conflict detection
• write to commit table
39
Apache Omid
40
Apache Omid
41
Tx

Manager
Client A
start()

id: 42
HBase
Region Server
x:10 37: commit.40
write 

x=11
x:11 42: in-flight
Region Server Commits

37: 40
write: 

y=17
y:17 42: in-flight
HBase
Apache Omid
42
Tx

Managerstart()

id: 48
read x
Client B
x:10
Region Server
x:10 37: commit.40
x:11 42: in-flight
Region Server
y:17
Commits

37: 4042: in-flight
Region Server
HBase
Region Server
x:10 37: commit.40 y:17 42: in-flight
Apache Omid
43
Tx

Manager
Client A
Commits

37: 40
x:11 42: in-flight
roll back
commit()

conflict
x:10 37: commit.40
HBase
Apache Omid
44
Region Server
Tx

Manager
Client A
Client C start()

id: 52
x:10 37: commit.40
x:11 42: in-flight
Region Server
y:17
Commits

37: 4042: in-flight
mark as

committed
42: commit.50
42: commit.50
read x
x:11
commit()

success:50
42: 50
Apache Omid - Future
• Atomic commit with linking?
• Eliminate need for commit table
45
HBase
Region Server
x:10 37: commit.40
x:11 42: in-flight
Region Server Commits

37: 40y:17
HBase
Apache Omid
46
Client
Region Server
Region Region…
Coprocessor
Region Server
Region Region…
Coprocessor
Tx

Manager
Tx id generation
Conflict detection
start

commit
data

operations

+ shadow cells
Tx state
Tx lifecycle

rollback
commit
commit

table
Apache Omid - Strengths
• Transaction state is in the database
• Shadow cells plus commit table
• Scales with the size of the cluster
• Transaction Manager is lightweight
• Generation of tx IDs delegated to timestamp oracle
• Conflict detection
• Writing to commit table
• Fault Tolerance:
• After failure, fail all existing transactions attempting to commit
• Self-correcting: Read clients can delete invalid cells
47
Apache Omid - Not So Strengths
• Storage intensive - shadow cells double the space
• I/O intensive - every cell requires two writes
1. write data and shadow cell
2. record commit in shadow cell
• Reads may also require two reads from HBase (commit table)
• Producer/Consumer: will often find the (uncommitted) shadow cell
• Scans: high througput sequential read disrupted by frequent lookups
• Security/Multi-tenancy:
• All clients need access to commit table
• Read clients need write access to repair invalid data
• Replication: Not implemented
48
Summary
49
Apache Tephra Apache Trafodion Apache Omid
Tx State Tx Manager
Distributed to 

region servers
Tx Manager (changes)
HBase (shadows/commits)
Conflict detection Tx Manager
Distributed to regions, 2-
phase commit
Tx Manager
ID generation Tx Manager
Distributed to multiple
Tx Managers
Tx Manager
API HTable SQL Custom
Multi-tenant Yes Yes No
Strength Scans, Large Tx, API
 Scalable, full SQL Scale, throughput
So so Scale, Throughput API not Hbase, Large Tx Scans, Producer/Consumer
Links
Join the community:
50
Apache Omid (incubating)

http://omid.apache.org/
(incubating)

http://trafodion.apache.org/
(incubating)

http://tephra.apache.org/
Thank you
… for listening to my talk.
Credits:
- Sean Broeder, Narendra Goyal (Trafodion)
- Francisco Perez-Sorrosal (Omid)
51
Questions?

More Related Content

What's hot

[Altibase] 10 replication part3 (system design)
[Altibase] 10 replication part3 (system design)[Altibase] 10 replication part3 (system design)
[Altibase] 10 replication part3 (system design)
altistory
 
Cassandra drivers
Cassandra driversCassandra drivers
Cassandra drivers
Tyler Hobbs
 

What's hot (6)

A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loose...
Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loose...Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loose...
Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loose...
 
[Altibase] 10 replication part3 (system design)
[Altibase] 10 replication part3 (system design)[Altibase] 10 replication part3 (system design)
[Altibase] 10 replication part3 (system design)
 
[Altibase] 12 replication part5 (optimization and monitoring)
[Altibase] 12 replication part5 (optimization and monitoring)[Altibase] 12 replication part5 (optimization and monitoring)
[Altibase] 12 replication part5 (optimization and monitoring)
 
Cassandra drivers
Cassandra driversCassandra drivers
Cassandra drivers
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in Manufacturing
 

Similar to Transaction in HBase, by Andreas Neumann, Cask

HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-final
Maryann Xue
 

Similar to Transaction in HBase, by Andreas Neumann, Cask (20)

HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
TDC2017 | São Paulo - Trilha Containers How we figured out we had a SRE team ...
TDC2017 | São Paulo - Trilha Containers How we figured out we had a SRE team ...TDC2017 | São Paulo - Trilha Containers How we figured out we had a SRE team ...
TDC2017 | São Paulo - Trilha Containers How we figured out we had a SRE team ...
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional Management
 
Blockchain meets database
Blockchain meets databaseBlockchain meets database
Blockchain meets database
 
LeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityLeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo University
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBase
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Introduction to Ethereum
Introduction to EthereumIntroduction to Ethereum
Introduction to Ethereum
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flink
 
Apache Tephra
Apache TephraApache Tephra
Apache Tephra
 
hbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinteresthbaseconasia2019 Recent work on HBase at Pinterest
hbaseconasia2019 Recent work on HBase at Pinterest
 
Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 
HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-final
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
 
Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015
 
DEVIEW - 오픈소스를 활용한 분산아키텍처 구현기술
DEVIEW - 오픈소스를 활용한 분산아키텍처 구현기술DEVIEW - 오픈소스를 활용한 분산아키텍처 구현기술
DEVIEW - 오픈소스를 활용한 분산아키텍처 구현기술
 

More from Cask Data

Introducing a horizontally scalable, inference-based business Rules Engine fo...
Introducing a horizontally scalable, inference-based business Rules Engine fo...Introducing a horizontally scalable, inference-based business Rules Engine fo...
Introducing a horizontally scalable, inference-based business Rules Engine fo...
Cask Data
 

More from Cask Data (11)

Introducing a horizontally scalable, inference-based business Rules Engine fo...
Introducing a horizontally scalable, inference-based business Rules Engine fo...Introducing a horizontally scalable, inference-based business Rules Engine fo...
Introducing a horizontally scalable, inference-based business Rules Engine fo...
 
About CDAP
About CDAPAbout CDAP
About CDAP
 
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask #BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask
 
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to..."Who Moved my Data? - Why tracking changes and sources of data is critical to...
"Who Moved my Data? - Why tracking changes and sources of data is critical to...
 
Building Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache TwillBuilding Enterprise Grade Applications in Yarn with Apache Twill
Building Enterprise Grade Applications in Yarn with Apache Twill
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
 
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3 Introducing Athena: 08/19 Big Data Application Meetup, Talk #3
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3
 
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015
 
Brown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bag
Brown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bagBrown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bag
Brown Bag : CDAP (f.k.a Reactor) Streams Deep DiveStream on file brown bag
 
HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Transaction in HBase, by Andreas Neumann, Cask

  • 1. Transactions in HBase Andreas Neumann ApacheCon Big Data May 2017 anew at apache.org @caskoid
  • 2. Goals of this Talk - Why transactions? - Optimistic Concurrency Control - Three Apache projects: Omid, Tephra, Trafodion - How are they different? 2
  • 3. Transactions in noSQL? History • SQL: RDBMS, EDW, … • noSQL: MapReduce, HDFS, HBase, … • n(ot)o(nly)SQL: Hive, Phoenix, … Motivation: • Data consistency under highly concurrent loads • Partial outputs after failure • Consistent view of data for long-running jobs • (Near) real-time processing 3
  • 6. Transactions to the Rescue 6 HBase Table ...Queue ... ... Flowlet - Atomicity of all writes involved - Protection from concurrent update
  • 7. ACID Properties From good old SQL: • Atomic - Entire transaction is committed as one • Consistent - No partial state change due to failure • Isolated - No dirty reads, transaction is only visible after commit • Durable - Once committed, data is persisted reliably 7
  • 8. What is HBase? 8 … Client Region Server Region Region… Coprocessor Region Server Region Region… Coprocessor
  • 9. What is HBase? 9 Simplified: • Distributed Key-Value Store • Key = <row>.<family>.<column>.<timestamp> • Partitioned into Regions (= continuous range of rows) • Each Region Server hosts multiple regions • Optional: Coprocessor in Region Server • Durable writes
  • 10. ACID Properties in HBase • Atomic • At cell, row, and region level • Not across regions, tables or multiple calls • Consistent - No built-in rollback mechanism • Isolated - Timestamp filters provide some level of isolation • Durable - Once committed, data is persisted reliably How to implement full ACID? 10
  • 11. Implementing Transactions • Traditional approach (RDBMS): locking • May produce deadlocks • Causes idle wait • complex and expensive in a distributed env • Optimistic Concurrency Control • lockless: allow concurrent writes to go forward • on commit, detect conflicts with other transactions • on conflict, roll back all changes and retry • Snapshot Isolation • Similar to repeatable read • Take snapshot of all data at transaction start • Read isolation 11
  • 12. Optimistic Concurrency Control 12 time x=10client1: start fail/rollback client2: start read x commit must see the old value of x
  • 13. Optimistic Concurrency Control 13 time incr xclient1: start commit client2: start incr x commit x=10 rollback x=11 sees the old 
 value of x=10
  • 14. Conflicting Transactions 14 time tx:A tx:B tx:C (A fails) tx:D (A fails) tx:E (E fails) tx:F (F fails) tx:G
  • 15. Conflicting Transactions • Two transactions have a conflict if • they write to the same cell • they overlap in time
 • If two transactions conflict, the one that commits later rolls back • Active change set = set of transactions t such that: • t is committed, and • there is at least one in-flight tx t’ that started before t’s commit time
 • This change set is needed in order to perform conflict detection. 15
  • 16. HBase Transactions in Apache 16 Apache Omid (incubating) (incubating) (incubating)
  • 17. In Common • Optimistic Concurrency Control must: • maintain Transaction State: • what tx are in flight and committed? • what is the change set of each tx? (for conflict detection, rollback) • what transactions are invalid (failed to roll back due to crash etc.) • generate unique transaction IDs • coordinate the life cycle of a transaction • start, detect conflicts, commit, rollback • All of { Omid, Tephra, Trafodion } implement this • but vary in how they do it 17
  • 18. Apache Tephra • Based on the original Omid paper: Daniel Gómez Ferro, Flavio Junqueira, Ivan Kelly, Benjamin Reed, Maysam Yabandeh:
 Omid: Lock-free transactional support for distributed data stores. ICDE 2014.
 • Transaction Manager: • Issues unique, monotonic transaction IDs • Maintains the set of excluded (in-flight and invalid) transactions • Maintains change sets for active transactions • Performs conflict detection • Client: • Uses transaction ID as timestamp for writes • Filters excluded transactions for isolation • Performs rollback 18
  • 19. Transaction Lifecycle 19 in progress start new tx write to HBase aborting conflicts invalid failure roll back in HBase ok time out detect conflicts ok complete make visible • Transaction consists of: • transaction ID (unique timestamp) • exclude list (in-flight and invalid tx)
 • Transactions that do complete • must still participate in conflict detection • disappear from transaction state
 when they do not overlap with in-flight tx
 • Transactions that do not complete • time out (by transaction manager) • added to invalid list
  • 20. Apache Tephra 20 Tx
 ManagerClient A HBase Region Server x:10 37 write 
 x=11 x:11 42 Region Server write: 
 y=17 y:17 42 in-flight: … start()
 id: 42, excludes = {…} ,42
  • 21. HBase Apache Tephra 21 Tx
 Manager read x Client B x:10 Region Server x:10 37 x:11 42 Region Server y:17 42 in-flight: …,42 start()
 id: 48, excludes = {…,42} ,48
  • 22. Region Server HBase Region Server x:10 37 y:17 42 Apache Tephra 22 Tx
 ManagerClient A x:11 42 roll back commit()
 conflict x:10 37 in-flight: …,42 in-flight: … make
 visible
  • 23. HBase Apache Tephra 23 Region Server x:10 37 x:11 42 Region Server y:17 42 read x x:11 Tx
 ManagerClient A in-flight: …,42 commit()
 success in-flight: …Client C start()
 id: 52, excludes: {…} in-flight: …,52
  • 24. Apache Tephra 24 … Client Region Server Region Region… Coprocessor Region Server Region Region… Coprocessor HBase Tx
 Manager Tx id generation Tx lifecycle
 rollback Tx state lifecycle
 transitions data
 operations
  • 25. Apache Tephra • HBase coprocessors • For efficient visibility filtering (on region-server side) • For eliminating invalid cells on flush and compaction • Programming Abstraction • TransactionalHTable: • Implements HTable interface • Existing code is easy to port • TransactionContext: • Implements transaction lifecycle 25
  • 26. Apache Tephra - Example txTable = new TransactionAwareHTable(table);
 txContext = new TransactionContext(txClient, txTable);
 txContext.start(); try {
 // perform Hbase operations in txTable txTable.put(…); ... } catch (Exception e) { // throws TransactionFailureException(e)
 txContext.abort(e); } // throws TransactionConflictException if so
 txContext.finish(); 26
  • 27. Apache Tephra - Strengths • Compatible with existing, non-tx data in HBase • Programming model • Same API as HTable, keep existing client code • Conflict detection granularity • Row, Column, Off • Special “long-running tx” for MapReduce and similar jobs • HA and Fault Tolerance • Checkpoints and WAL for transaction state, Standby Tx Manager • Replication compatible • Checkpoint to HBase, use HBase replication • Secure, Multi-tenant 27
  • 28. Apache Tephra - Not-So Strengths • Exclude list can grow large over time • RPC, post-filtering overhead • Solution: Invalid tx pruning on compaction - complex! • Single Transaction Manager • performs all lifecycle state transitions, including conflict detection • conflict detection requires lock on the transaction state • becomes a bottleneck • Solution: distributed Transaction Manager with consensus protocol 28
  • 29. Apache Trafodion • A complete distributed database (RDBMS) • transaction system is not available by itself • APIs: jdbc, SQL • Inspired by original HBase TRX (transactional region server • migrated transaction logic into coprocessors • coprocessors cache in-flight data in-memory • transaction state (change sets) in coprocessors • conflict detection with 2-phase commit • Transaction Manager • orchestrates transaction lifecycle across involved region servers • multiple instances, but one per client 29 (incubating)
  • 31. Apache Trafodion 31 Tx
 ManagerClient A HBase Region Server x:10 Region Server in-flight: … start()
 id:42 ,42 write: 
 y=17 y:17 write 
 x=11 x:11 region:
 … ,42
  • 32. Apache Trafodion 32 Tx
 Manager read x Client B x:10 in-flight: …,42 start()
 id: 48 ,48 HBase Region Server x:10 Region Server x:11 y:17
  • 33. HBase Apache Trafodion 33 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 in-flight: … Region Server x:10 Region Server x:11 y:17 2. roll back
  • 34. HBase Apache Trafodion 34 Tx
 ManagerClient A 1. conflicts? commit()
 in-flight: …,42 in-flight: … Region Server x:10 Region Server x:11 y:17 2. commit! x:11 y:17
  • 35. HBase Apache Trafodion 35 Client Region Server Region Region… Coprocessor Region Server Region Region… Coprocessor Tx
 Manager Tx id generation conflicts Tx state Tx life cycle (commit) transitions
 region ids 2-phase
 commit data
 operations Tx lifecycle In-flight data Client 2 Tx 2 Manager
  • 36. Apache Trafodion • Scales well: • Conflict detection is distributed: no single bottleneck • Commit coordination by multiple transaction managers • Optimization: bypass 2-hase commit if single region • Coprocessors cache in-flight data in Memory • Flushed to HBase only on commit • Committed read (not snapshot, not repeatable read) • Option: cause conflicts for reads, too • HA and Fault Tolerance • WAL for all state • All services are redundant and take over for each other • Replication: Only in paid (non-Apache) add-on 36
  • 37. Apache Trafodion - Strengths • Very good scalability • Scales almost linearly • Especially for very small transactions • Familiar SQL/jdbc interface for RDB programmers • Redundant and fault-tolerant • Secure and multi-tenant: • Trafodion/SQL layer provides authn+authz 37
  • 38. Apache Trafodion - Not-So Strengths • Monolithic, not available as standalone transaction system • Heavy load on coprocessors • memory and compute • Large transactions (e.g., MapReduce) will cause Out-of-memory • no special support for long-running transactions 38
  • 39. Apache Omid • Evolution of Omid based on the Google Percolator paper: Daniel Peng, Frank Dabek: Large-scale Incremental Processing Using Distributed Transactions and Notifications, USENIX 2010.
 • Idea: Move as much transaction state as possible into HBase • Shadow cells represent the state of a transaction • One shadow cell for every data cell written • Track committed transactions in an HBase table • Transaction Manager (TSO) has only 3 tasks • issue transaction IDs • conflict detection • write to commit table 39
  • 41. Apache Omid 41 Tx
 Manager Client A start()
 id: 42 HBase Region Server x:10 37: commit.40 write 
 x=11 x:11 42: in-flight Region Server Commits
 37: 40 write: 
 y=17 y:17 42: in-flight
  • 42. HBase Apache Omid 42 Tx
 Managerstart()
 id: 48 read x Client B x:10 Region Server x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight
  • 43. Region Server HBase Region Server x:10 37: commit.40 y:17 42: in-flight Apache Omid 43 Tx
 Manager Client A Commits
 37: 40 x:11 42: in-flight roll back commit()
 conflict x:10 37: commit.40
  • 44. HBase Apache Omid 44 Region Server Tx
 Manager Client A Client C start()
 id: 52 x:10 37: commit.40 x:11 42: in-flight Region Server y:17 Commits
 37: 4042: in-flight mark as
 committed 42: commit.50 42: commit.50 read x x:11 commit()
 success:50 42: 50
  • 45. Apache Omid - Future • Atomic commit with linking? • Eliminate need for commit table 45 HBase Region Server x:10 37: commit.40 x:11 42: in-flight Region Server Commits
 37: 40y:17
  • 46. HBase Apache Omid 46 Client Region Server Region Region… Coprocessor Region Server Region Region… Coprocessor Tx
 Manager Tx id generation Conflict detection start
 commit data
 operations
 + shadow cells Tx state Tx lifecycle
 rollback commit commit
 table
  • 47. Apache Omid - Strengths • Transaction state is in the database • Shadow cells plus commit table • Scales with the size of the cluster • Transaction Manager is lightweight • Generation of tx IDs delegated to timestamp oracle • Conflict detection • Writing to commit table • Fault Tolerance: • After failure, fail all existing transactions attempting to commit • Self-correcting: Read clients can delete invalid cells 47
  • 48. Apache Omid - Not So Strengths • Storage intensive - shadow cells double the space • I/O intensive - every cell requires two writes 1. write data and shadow cell 2. record commit in shadow cell • Reads may also require two reads from HBase (commit table) • Producer/Consumer: will often find the (uncommitted) shadow cell • Scans: high througput sequential read disrupted by frequent lookups • Security/Multi-tenancy: • All clients need access to commit table • Read clients need write access to repair invalid data • Replication: Not implemented 48
  • 49. Summary 49 Apache Tephra Apache Trafodion Apache Omid Tx State Tx Manager Distributed to 
 region servers Tx Manager (changes) HBase (shadows/commits) Conflict detection Tx Manager Distributed to regions, 2- phase commit Tx Manager ID generation Tx Manager Distributed to multiple Tx Managers Tx Manager API HTable SQL Custom Multi-tenant Yes Yes No Strength Scans, Large Tx, API
 Scalable, full SQL Scale, throughput So so Scale, Throughput API not Hbase, Large Tx Scans, Producer/Consumer
  • 50. Links Join the community: 50 Apache Omid (incubating)
 http://omid.apache.org/ (incubating)
 http://trafodion.apache.org/ (incubating)
 http://tephra.apache.org/
  • 51. Thank you … for listening to my talk. Credits: - Sean Broeder, Narendra Goyal (Trafodion) - Francisco Perez-Sorrosal (Omid) 51 Questions?