Omid: Scalable and Highly Available
Transaction Processing for Phoenix

Ohad Shacham, Edward Bortnikov ⎪ PhoenixCon, Jun 13, 2017
Let’s Get Started …
2
Our Yahoo Journey with Transactions over HBase



Omid for Users: Semantics, API, Integration with Phoenix



Omid for Programmers: Architecture and Use Cases



Omid, Advanced: Scalability, HA, Low-Latency
Transaction Processing in NoSQL @Yahoo
3
Motivation: Data Pipelines (Search, Mail, etc.)



Stream Processing is a Popular Pattern

Compute Tasks process Data Items that arrive in Real Time

Intermediate Artifacts stored in NoSQL (KV-)Storage



Extensive Use of Hadoop Technologies (Storm, HBase)



Scale: Thousands of Hadoop Nodes
Content Indexing for Search
[Pipeline diagram: Crawl, Docproc, and Link Analysis stages connected through the Crawl Schedule, Content, Queue, and Links stores; compute runs in Storm, state lives in HBase]
Zooming in on Tasks
Document processing


Read page content from the store 


Compute search index features


Update computed features

Link processing


Read outgoing links for a page


Update reference for all linked-to pages



[Each task runs as a transaction: begin, read/update, commit; see the sketch below]
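For illustration, here is a minimal sketch of the document-processing task as a single Omid transaction, written in the style of the programming example later in this deck (table name, row key, column names, and the computeFeatures() helper are made up):

// Illustrative sketch only: table, row key, columns, and computeFeatures() are hypothetical
TransactionManager tm = HBaseTransactionManager.newInstance();
TTable content = new TTable("CONTENT");

Transaction tx = tm.begin();                                         // begin

Result page = content.get(tx, new Get(Bytes.toBytes("page-42")));    // read page content from the store
byte[] features = computeFeatures(page);                             // compute search index features

Put update = new Put(Bytes.toBytes("page-42"));                      // update computed features
update.add(Bytes.toBytes("meta"), Bytes.toBytes("features"), features);
content.put(tx, update);

tm.commit(tx);                                                       // commit: all updates appear atomically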
Transaction Processing: ACID 101
6
Multiple data accesses in a single logical operation

Atomic 


“All or nothing” – no partial effect observable

Consistent


The DB transitions from one valid state to another

Isolated


Appear to execute in isolation 

Durable


Committed data cannot disappear
Omid (‫)امید‬
7
2011: Incepted @Yahoo Research ("Omid1")

2014: Large-Scale Deployment @Yahoo

2014/5: Major Re-Design for Scalability & HA ("Omid2")

2016: Apache Incubator

2017: Prototype Integration with Phoenix

Transaction Processing Service for Apache HBase
Contributors
8
Ohad Shacham (Yahoo Research)

Francisco Perez Sorrosal (Yahoo)

Edward Bortnikov (Yahoo Research)

Eshcar Hillel (Yahoo Research)

Idit Keidar (Yahoo, Technion)

Ivan Kelly (Midokura)

Sameer Paranjpye (Databricks)

Matthieu Morel (Skyscanner)

Igor Katkov (Atlassian)

Yonatan Gottesman (Yahoo Research)
Omid 101
9
Client Library + Runtime Service



Database Agnostic (can work with other backends)



Snapshot Isolation consistency 



Very Scalable (>380K peak tps) and Highly Available
Omid Programming Example
10
TransactionManager tm = HBaseTransactionManager.newInstance();

TTable txTable = new TTable("MY_TX_TABLE");



Transaction tx = tm.begin(); // Control path



Put row1 = new Put(Bytes.toBytes("EXAMPLE_ROW1"));

row1.add(family, qualifier, Bytes.toBytes("val1"));

txTable.put(tx, row1); // Data path



Put row2 = new Put(Bytes.toBytes("EXAMPLE_ROW2"));

row2.add(family, qualifier, Bytes.toBytes("val2")); 

txTable.put(tx, row2); // Data path



tm.commit(tx); // Control path
Snapshot Isolation (SI) Semantics
Distinct read (snapshot) and write (commit) points

No write-write conflicts allowed (illustrated in the sketch below)
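To make the second point concrete, a hedged sketch reusing the tm, txTable, family, and qualifier names from the programming example (it assumes commit signals a conflict by throwing Omid's RollbackException):

Transaction txA = tm.begin();
Transaction txB = tm.begin();                       // concurrent with txA

Put a = new Put(Bytes.toBytes("EXAMPLE_ROW1"));
a.add(family, qualifier, Bytes.toBytes("A"));
txTable.put(txA, a);                                // txA writes the row

Put b = new Put(Bytes.toBytes("EXAMPLE_ROW1"));
b.add(family, qualifier, Bytes.toBytes("B"));
txTable.put(txB, b);                                // txB writes the same row

tm.commit(txA);                                     // first committer wins
try {
    tm.commit(txB);
} catch (RollbackException e) {
    // write-write conflict with txA: txB is aborted, its tentative writes are discarded
}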
Tephra: Sibling Technology
12
Transaction Processing technology for HBase



SI Semantics. Design Similar to Omid1 



Apache Incubator since 2016



Integrated with Phoenix to provide ACID semantics (BETA)

Implements some Phoenix-specific scenarios
Phoenix-Omid Integration
13
Work in Progress under JIRA PHOENIX-3623



Backward Compatible – Configurable TP Provider Choice

Current Options: Tephra and Omid



How?

Internal Transaction Abstraction Layer (TAL) API

Multiple Implementations, Configurable Instantiation (see the sketch below)
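As an illustration only (these names are hypothetical, not the actual Phoenix TAL API), the idea is a narrow provider interface with one implementation per engine, chosen by configuration:

// Hypothetical sketch of a Transaction Abstraction Layer; not the real Phoenix interface
interface TransactionProvider {
    Object begin() throws Exception;            // returns a provider-specific transaction handle
    void commit(Object tx) throws Exception;    // may throw on a write-write conflict
    void abort(Object tx) throws Exception;
}

// One implementation per engine (e.g. an Omid-backed and a Tephra-backed provider),
// instantiated from configuration, e.g. a made-up key such as "txn.provider" = OMID | TEPHRA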
Transaction Processing, Refactored
14
[Diagram: before the refactoring, Phoenix calls the Tephra client directly; after it, Phoenix talks to the Transaction Abstraction Layer, which delegates to either a Tephra client or an Omid client]
How Omid Works
[Architecture diagram: the Client sends Begin/Commit requests to the Transaction Manager (TSO), which performs conflict detection and persists commit records to the Commit Table; the Client reads and writes the data tables directly and verifies commits against the Commit Table]

15

Lock-Free SI Implementation. Exploits Built-in MVCC.
Execution Example

[Diagram, tr = t1: on Begin, the Transaction Manager issues read timestamp t1 to the Client. The Client writes tentative versions (k1, v1, t1) and (k2, v2, t1) to the data table, and reads any other key k' at the last committed version t' < t1.]

16
Execution Example

[Diagram, tr = t1, tc = t2: on Commit, the Client sends Commit: t1, {k1, k2} to the Transaction Manager; after conflict detection the Transaction Manager assigns commit timestamp t2 and writes the record (t1, t2) to the Commit Table. The data table still holds the tentative versions (k1, v1, t1) and (k2, v2, t1).]

17
Execution Example

[Diagram, tr = t3: a later reader with snapshot t3 reads k1, finds the tentative version (k1, v1, t1), and must look up t1 in the Commit Table (Read(t1)) to learn whether and when it committed. This extra Commit Table lookup on every read of a tentative version is a bottleneck!]

18
Post-Commit Timestamp Replication

[Diagram, tr = t1, tc = t2: after committing, the Client updates the commit cells in the data table, turning (k1, v1, t1) and (k2, v2, t1) into (k1, v1, t1, t2) and (k2, v2, t1, t2), and then deletes the (t1, t2) entry from the Commit Table (Delete(t1)).]

19
Using Commit Cells

[Diagram, tr = t3: a reader with snapshot t3 now reads k1 and finds (k1, v1, t1, t2) with the commit timestamp already embedded, so it can decide visibility locally, with no Commit Table lookup.]

20
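A simplified, illustrative sketch of the read-side decision implied by slides 15 through 20 (the names here are not Omid's actual internals): use the commit cell when present, and fall back to the Commit Table otherwise.

// Illustrative visibility check for a version written at startTs, read under snapshot tr.
// commitTsFromCell is the commit cell value if present (null otherwise);
// commitTable.get(startTs) returns the commit timestamp, or null if still in flight or aborted.
boolean isVisible(long startTs, Long commitTsFromCell, long tr) {
    if (commitTsFromCell != null) {
        return commitTsFromCell < tr;              // fast path: no Commit Table lookup
    }
    Long commitTs = commitTable.get(startTs);      // tentative version: consult the Commit Table
    return commitTs != null && commitTs < tr;      // visible only if it committed before our snapshot
}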
Phoenix – New Scenarios for Omid
21
Secondary Indexes

On-the-Fly Index Creation

Atomic Updates

Query by Secondary Key



Extended Snapshot Isolation 

Read-Your-Own-Writes Queries
On-the-Fly Secondary Index Creation
22
CREATE INDEX (CI) in parallel with writes to the base table



How? Distinguish between the pre-CI and post-CI data



CREATE INDEX command issue time defines a timestamp

1. All data committed before snapshot: scanned, bulk-inserted into index 

2. All data generated after snapshot: triggers random update of index

3. All transactions in flight at snapshot time: aborted (FENCE); see the sketch below
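The three cases can be summarized in a small, purely illustrative classifier keyed off the fence timestamp (this is not Phoenix or Omid code):

// fenceTs: timestamp taken when CREATE INDEX is issued; commitTs == 0 means "not committed yet"
static String classifyForIndexCreation(long startTs, long commitTs, long fenceTs) {
    if (commitTs != 0 && commitTs < fenceTs)
        return "committed before the fence: bulk-inserted into the index by the CI scan";   // case 1
    if (startTs > fenceTs)
        return "started after the fence: index maintained on each write by the coprocessor"; // case 2
    return "in flight at the fence: aborted when it attempts to commit";                     // case 3
}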
Secondary Index: Creation and Maintenance
23
[Timeline diagram: T1 and T2 (committed before CREATE INDEX started) are bulk-inserted into the index; T3 (in flight when CREATE INDEX started) is aborted, enforced upon commit; T4, T5, and T6 (started after the fence, T4 during index creation and T5, T6 after CREATE INDEX completed) have their index entries added by a coprocessor, with the index update running as a stored procedure]
Extended Snapshot Isolation
24
CREATE TABLE T (ID INT);

BEGIN;

1: INSERT INTO T SELECT ID+10 FROM T;

2: INSERT INTO T SELECT ID+100 FROM T;

COMMIT;

Traditional SI: Read-Your-Writes



Challenge: Circular Dependency

(a statement that reads its own inserts keeps generating new rows, i.e., runs in an infinite loop)



Solution: Moving Snapshot

(a series of checkpoint snapshots: each statement reads as of the previous checkpoint, so it never sees its own writes, while the next statement does)
Moving Snapshot Implementation
25
[Diagram: the transaction advances through checkpoints; writes by Statement 1 are made at the checkpoint for Statement 1 and become visible only from the checkpoint for Statement 2 onward]

Timestamps allocated by TM in blocks.

Client promotes the checkpoint (see the sketch below).
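A small illustrative model of the moving snapshot (not the actual Omid API): the Transaction Manager hands out a block of timestamps, each statement writes at the current checkpoint, and a statement only sees versions strictly below its checkpoint, so it never reads its own inserts while the next statement does.

// Illustrative model only: timestamps come in a block allocated by the TM, starting at base
class MovingSnapshot {
    private long checkpoint;                     // current statement reads versions < checkpoint

    MovingSnapshot(long base) { this.checkpoint = base; }

    long writeTimestamp()  { return checkpoint; }     // this statement's writes land at the checkpoint
    void promote()         { checkpoint++; }          // client promotes before the next statement
    boolean visible(long versionTs) {
        return versionTs < checkpoint;                // own writes (== checkpoint) stay invisible
    }
}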
Omid Scalability
26
Extremely lean Client-Transaction Manager protocol

Omid1 and Tephra replicate the entire transaction state to the client upon BEGIN



Aggressive batching of writes to the Commit Table (CT) in the Transaction Manager (see the sketch after this list)



Concurrent conflict detection (experimental)



HA algorithm incurs zero overhead on the common (non-failover) path
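A rough illustration of the batching idea (CommitRecord, writeAll, and ackClient are made-up names, not Omid internals): many commit records share one Commit Table write, and each commit is acknowledged only after the batch containing it is durable.

// Illustrative batching of Commit Table writes inside the Transaction Manager
class CommitTableBatcher {
    private final java.util.List<CommitRecord> pending = new java.util.ArrayList<>();
    private final int maxBatch;

    CommitTableBatcher(int maxBatch) { this.maxBatch = maxBatch; }

    synchronized void add(CommitRecord record) {
        pending.add(record);
        if (pending.size() >= maxBatch) flush();
    }

    synchronized void flush() {
        commitTable.writeAll(pending);                   // one I/O for many commits
        for (CommitRecord r : pending) r.ackClient();    // reply only once the batch is durable
        pending.clear();
    }
}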
Throughput Benchmark

YCSB workload driver

12-core Transaction Manager

1G network

[Bar chart: throughput in tps * 10^3 (0 to 550) for Omid1, Omid1 Non-Durable, Omid, and Omid Non-Durable]
Overhead in Production: Web Search Indexing

[Stacked bar chart: task latency in ms (0 to 2500) for document inversion, duplicate detection, out-link processing, in-link processing, and stream-to-runtime tasks, broken down into Begin, Read, Compute, Update, and Commit + CT update phases]
Low-Latency Omid (Experimental)
29
Original Design: Throughput-Oriented Applications in Mind

Sometimes, this comes at the expense of latency 

Example: writes to Commit Table batched at the Transaction Manager



Key: Dissolve the Transaction Manager I/O Bottleneck

Distribute the Commit Table and the Writes to it



How? 

The client, rather than the TM, persists the Commit Timestamp (CTS)

CTS embedded in the first row written by the transaction (see the sketch below)
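A hedged sketch of that commit path (tso.commitRequest, COMMIT_FAMILY, and the helper names are illustrative, not the shipped API): the Transaction Manager still detects conflicts and allocates the commit timestamp, but the client persists it as a commit cell in the first row it wrote.

// Illustrative low-latency commit: the client, not the TM, persists the commit timestamp (CTS)
void lowLatencyCommit(Transaction tx) throws Exception {
    long cts = tso.commitRequest(startTimestampOf(tx), writeSetOf(tx));  // conflict check + CTS
    byte[] firstRow = firstRowWrittenBy(tx);
    Put commitCell = new Put(firstRow);                  // the commit record lives with the data
    commitCell.add(COMMIT_FAMILY, COMMIT_QUALIFIER, Bytes.toBytes(cts));
    dataTable.put(commitCell);                           // no write to a centralized Commit Table
    // post-commit: replicate the CTS into the remaining written rows, as in regular Omid
}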
Benchmark: Single-Write Transaction Workload
[Chart: latency in msec (0 to 80) vs. throughput in tps * 10^3 (0 to 300) for Omid and Low-Latency Omid]
Summary
31
Scalable, Highly Available Open Source Transaction Processing



Battle-Tested, Ready for Public Cloud



Integration with Apache Phoenix Underway (GA in 2017)
Thanks to Our Partners for Being Awesome

32
Backup

33
Architecture, Recapped
[Architecture diagram: the Client sends Begin/Commit to the Transaction Manager (TSO) and reads/writes the data tables directly; the TSO persists commit records to the Commit Table and the Client verifies commits against it. The single TSO is a Single Point of Failure (SPoF).]

34
HA: Primary-Backup Transaction Manager
[Diagram: a Primary and a Backup Transaction Manager (TSO); recovery state is kept in ZooKeeper; the Client and the Commit Table interact with whichever instance is currently Primary]

35
Split Brain
[Diagram: split brain, where both the old and the new Transaction Manager believe they are Primary; racing writes to the Commit Table can violate SI. Take I: fence the Commit Table upon every write (slow!)]

36
HA Algorithm – Key Ideas
37
Old and New Primaries may write conflicting commit records

No Locks!



Client detects inconsistencies, invalidates problematic records



Lease-Based Leader Election 

Optimization: Local lease check before/after writing to CT (see the sketch below)

Zero Overhead in Non-Recovery Scenarios
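For illustration only (lease, commitTable, and CommitRecord are made-up names), the local lease check brackets the Commit Table write so a demoted primary never acknowledges a commit that a new primary might not know about:

// Illustrative lease check around a Commit Table write on the primary TM
boolean persistCommitSafely(CommitRecord record) {
    if (!lease.stillValidLocally()) return false;   // cheap local check, no extra coordination
    commitTable.write(record);
    if (!lease.stillValidLocally()) return false;   // a new primary may have taken over meanwhile:
                                                    // do not ack; the record can be invalidated later
    return true;                                    // common path: zero extra I/O
}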
