Omid: Scalable and Highly Available
Transaction Processing for Phoenix

Ohad Shacham, Edward Bortnikov ⎪ PhoenixCon, Jun 13, 2017
Let’s Get Started …
Our Yahoo Journey with Transactions over HBase



Omid for Users: Semantics, API, Integration with Phoenix



Omid for Programmers: Architecture and Use Cases



Omid, Advanced: Scalability, HA, Low-Latency
Transaction Processing in NoSQL @Yahoo
Motivation: Data Pipelines (Search, Mail, etc.)



Stream Processing a Popular Pattern

Compute Tasks process Data Items that arrive in Real Time

Intermediate Artifacts stored in NoSQL (KV-)Storage



Extensive Use of Hadoop Technologies (Storm, HBase)



Scale: Thousands of Hadoop Nodes
Content Indexing for Search
[Diagram: a stream of Crawl, Docproc, and Link Analysis tasks running on Storm, with the crawl schedule, content, queue, and links stored in HBase.]
Zooming in on Tasks
Document processing


Read page content from the store 


Compute search index features


Update computed features

Link processing


Read outgoing links for a page


Update reference for all linked-to pages



[Diagram annotation: each of the two tasks is bracketed by begin and commit.]
Transaction Processing: ACID 101
Multiple data accesses in a single logical operation

Atomic 


“All or nothing” – no partial effect observable

Consistent


The DB transitions from one valid state to another

Isolated


Appear to execute in isolation 

Durable


Committed data cannot disappear
Omid (امید)
2011: Incepted @Yahoo Research (“Omid1”)
2014: Large-Scale Deployment @Yahoo
2014/5: Major Re-Design for Scalability & HA (“Omid2”)
2016: Apache Incubator
2017: Prototype Integration with Phoenix
Transaction Processing Service for Apache HBase
Contributors
Ohad Shacham (Yahoo Research)
Francisco Perez Sorrosal (Yahoo)
Edward Bortnikov (Yahoo Research)
Eshcar Hillel (Yahoo Research)
Idit Keidar (Yahoo, Technion)
Ivan Kelly (Midokura)
Sameer Paranjpye (Databricks)
Matthieu Morel (Skyscanner)
Igor Katkov (Atlassian)
Yonatan Gottesman (Yahoo Research)
Omid 101
Client Library + Runtime Service



Database Agnostic (can work with other backends)



Snapshot Isolation consistency 



Very Scalable (>380K peak tps) and Highly Available
Omid Programming Example
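import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.omid.transaction.HBaseTransactionManager;
import org.apache.omid.transaction.TTable;
import org.apache.omid.transaction.Transaction;
import org.apache.omid.transaction.TransactionManager;

// The slide does not define the column family and qualifier; these names are placeholders.
byte[] family = Bytes.toBytes("MY_CF");
byte[] qualifier = Bytes.toBytes("MY_Q");

TransactionManager tm = HBaseTransactionManager.newInstance();
TTable txTable = new TTable("MY_TX_TABLE");

Transaction tx = tm.begin(); // Control path

Put row1 = new Put(Bytes.toBytes("EXAMPLE_ROW1"));
row1.add(family, qualifier, Bytes.toBytes("val1"));
txTable.put(tx, row1); // Data path

Put row2 = new Put(Bytes.toBytes("EXAMPLE_ROW2"));
row2.add(family, qualifier, Bytes.toBytes("val2"));
txTable.put(tx, row2); // Data path

tm.commit(tx); // Control path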
Snapshot Isolation (SI) Semantics
Distinct read (snapshot) and write (commit) points

No write-write conflicts allowed
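The two rules above can be illustrated with a small in-memory sketch. It is not Omid's code: the class SnapshotIsolationSketch and its methods are hypothetical names, and a single counter stands in for the timestamp source. A transaction takes its read point at begin and its commit point at commit; the commit is rejected if any key in its write set was committed by another transaction after the read point.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch of snapshot-isolation commit checks; not Omid's implementation.
class SnapshotIsolationSketch {
    private long clock = 0;                                        // stand-in for a timestamp source
    private final Map<String, Long> lastCommit = new HashMap<>();  // key -> commit timestamp of its last writer

    // Begin: returns the read (snapshot) point of the new transaction.
    long begin() {
        return ++clock;
    }

    // Commit: returns the commit timestamp, or -1 if a write-write conflict forces an abort.
    long tryCommit(long readTs, Set<String> writeSet) {
        for (String key : writeSet) {
            Long committedAt = lastCommit.get(key);
            if (committedAt != null && committedAt > readTs) {
                return -1; // another transaction committed this key after our snapshot
            }
        }
        long commitTs = ++clock;
        for (String key : writeSet) {
            lastCommit.put(key, commitTs);
        }
        return commitTs;
    }

    public static void main(String[] args) {
        SnapshotIsolationSketch tm = new SnapshotIsolationSketch();
        long first = tm.begin();
        long second = tm.begin();
        System.out.println(tm.tryCommit(first, Set.of("k1", "k2"))); // commits
        System.out.println(tm.tryCommit(second, Set.of("k2")));      // aborts: k2 changed after the snapshot
    }
}
```

Two transactions that write disjoint keys would both commit in this sketch, since only overlapping write sets are checked.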
Tephra: Sibling Technology
Transaction Processing technology for HBase



SI Semantics. Design Similar to Omid1 



Apache Incubator since 2016



Integrated with Phoenix to provide ACID semantics (BETA)

Implements some Phoenix-specific scenarios
Phoenix-Omid Integration
Work in Progress under JIRA PHOENIX-3623



Backward Compatible – Configurable TP Provider Choice

Current Options: Tephra and Omid



How?

Internal Transaction Abstraction Layer (TAL) API

Multiple Implementations, Configurable Instantiation
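A minimal sketch of what a provider-neutral abstraction with configurable instantiation could look like. Every name here (TxContext, TxProvider, TxContexts, the tx.provider property key) is hypothetical and chosen for illustration; none is taken from Phoenix, Tephra, or Omid, and both implementations are stubs. Defaulting to Tephra is an assumption meant to mirror the backward-compatibility goal.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical provider-neutral transaction interface (not the actual Phoenix TAL API).
interface TxContext {
    void begin();
    void commit();
    String providerName();
}

// Stub standing in for a Tephra-backed implementation.
class TephraTxContext implements TxContext {
    public void begin() { /* would delegate to the Tephra client */ }
    public void commit() { /* would delegate to the Tephra client */ }
    public String providerName() { return "TEPHRA"; }
}

// Stub standing in for an Omid-backed implementation.
class OmidTxContext implements TxContext {
    public void begin() { /* would delegate to the Omid client */ }
    public void commit() { /* would delegate to the Omid client */ }
    public String providerName() { return "OMID"; }
}

// One enum constant per provider; each knows how to instantiate its own context.
enum TxProvider {
    TEPHRA { TxContext newContext() { return new TephraTxContext(); } },
    OMID { TxContext newContext() { return new OmidTxContext(); } };

    abstract TxContext newContext();
}

class TxContexts {
    // Picks the implementation from configuration; the property key and the default are assumptions.
    static TxContext fromConfig(Properties conf) {
        String name = conf.getProperty("tx.provider", "TEPHRA");
        return TxProvider.valueOf(name.toUpperCase(Locale.ROOT)).newContext();
    }
}
```

Callers depend only on TxContext, so adding a provider means adding one implementation and one enum constant.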
Transaction Processing, Refactored
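[Diagram: before the refactoring, Phoenix calls the Tephra client directly. After it, Phoenix calls the Transaction Abstraction Layer, which delegates to either the Tephra client or the Omid client.]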
How Omid Works
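[Diagram: the client sends Begin/Commit requests to the Transaction Manager (TSO), which performs conflict detection and persists commits to the Commit Table. The client reads and writes the data tables directly and verifies commits against the Commit Table.]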
Lock-Free SI Implementation. Exploits Built-in MVCC.
Execution Example
tr = t1
[Diagram: the client sends Begin to the Transaction Manager and receives the read timestamp t1. It then issues Write (k1, v1, t1) and Write (k2, v2, t1) to the data tables, which now hold (k1, v1, t1) and (k2, v2, t1), and Read (k’, last committed t’ < t1).]
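Execution Example
tr = t1
tc = t2
[Diagram: the client sends Commit: t1, {k1, k2} to the Transaction Manager, which returns the commit timestamp t2 and issues Write (t1, t2) to the Commit Table. The data tables hold (k1, v1, t1) and (k2, v2, t1); the Commit Table holds (t1, t2).]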
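Execution Example
tr = t3
[Diagram: a client with read timestamp t3 issues Read (k1, t3) against the data tables, which hold (k1, v1, t1) and (k2, v2, t1), and then Read (t1) against the Commit Table, which holds (t1, t2). The diagram is annotated "Bottleneck!".]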

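Post-Commit Timestamp Replication
tr = t1
tc = t2
[Diagram: after the Transaction Manager returns t2, the client updates the commit cells, so the data tables hold (k1, v1, t1, t2) and (k2, v2, t1, t2), and then issues Delete(t1) to remove the (t1, t2) entry from the Commit Table.]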
Using Commit Cells
tr = t3
[Diagram: a client with read timestamp t3 issues Read (k1, t3) against the data tables, which hold (k1, v1, t1, t2) and (k2, v2, t1, t2), so the commit timestamp t2 is available in the cell itself.]
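The read path from the execution-example slides can be sketched in memory. This is an illustration under assumptions, not Omid's code: CommitCellSketch, Cell, and the map-based Commit Table are hypothetical, and only one version per key is kept for brevity. A reader with read timestamp tr first looks for a commit timestamp stored in the cell; only when it is missing does it consult the Commit Table. A version is visible when its commit timestamp is smaller than tr.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical single-version model of data tables plus a Commit Table; not Omid's implementation.
class CommitCellSketch {
    // One stored version: value, writer's start timestamp, and the commit timestamp once replicated.
    static final class Cell {
        final String value;
        final long startTs;
        Long commitTs; // null until the commit timestamp has been copied into the cell
        Cell(String value, long startTs) { this.value = value; this.startTs = startTs; }
    }

    final Map<String, Cell> dataTable = new HashMap<>();
    final Map<Long, Long> commitTable = new HashMap<>(); // start timestamp -> commit timestamp
    int commitTableLookups = 0;                          // counts reads that had to hit the Commit Table

    void write(String key, String value, long startTs) {
        dataTable.put(key, new Cell(value, startTs));
    }

    void commit(long startTs, long commitTs) {
        commitTable.put(startTs, commitTs);
    }

    // Post-commit step: copy the commit timestamp into the written cells, then drop the Commit Table entry.
    void replicateCommit(long startTs, String... keys) {
        Long commitTs = commitTable.get(startTs);
        if (commitTs == null) return;
        for (String key : keys) {
            Cell cell = dataTable.get(key);
            if (cell != null && cell.startTs == startTs) cell.commitTs = commitTs;
        }
        commitTable.remove(startTs);
    }

    // Returns the value if the stored version committed before readTs, otherwise empty.
    Optional<String> read(String key, long readTs) {
        Cell cell = dataTable.get(key);
        if (cell == null) return Optional.empty();
        Long commitTs = cell.commitTs;
        if (commitTs == null) {
            commitTableLookups++;
            commitTs = commitTable.get(cell.startTs);
        }
        boolean visible = commitTs != null && commitTs < readTs;
        return visible ? Optional.of(cell.value) : Optional.empty();
    }
}
```

Once replicateCommit has run, reads of those cells no longer increment commitTableLookups, which is the effect the commit cells are meant to have.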
Phoenix – New Scenarios for Omid
Secondary Indexes

On-the-Fly Index Creation

Atomic Updates

Query by Secondary Key



Extended Snapshot Isolation 

Read-Your-Own-Writes Queries
On-the-Fly Secondary Index Creation
CREATE INDEX (CI) in parallel with writes to the base table



How? Distinguish between the pre-CI and post-CI data



The CREATE INDEX command's issue time defines a snapshot timestamp

1. All data committed before snapshot: scanned, bulk-inserted into index 

2. All data generated after snapshot: triggers random update of index

3. All transactions in flight at snapshot time: aborted (FENCE)
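The three cases can be written as one classification function. A rough sketch with assumed names (IndexFenceSketch, Action, classify): fenceTs stands for the timestamp defined by CREATE INDEX, and a commit timestamp of null means the transaction has not committed yet.

```java
// Hypothetical classification of a transaction relative to the CREATE INDEX fence timestamp.
class IndexFenceSketch {
    enum Action { BULK_INSERT, INCREMENTAL_UPDATE, ABORT }

    static Action classify(long startTs, Long commitTs, long fenceTs) {
        if (startTs >= fenceTs) {
            return Action.INCREMENTAL_UPDATE; // case 2: data generated after the snapshot
        }
        if (commitTs != null && commitTs < fenceTs) {
            return Action.BULK_INSERT;        // case 1: committed before the snapshot
        }
        return Action.ABORT;                  // case 3: in flight at snapshot time
    }

    public static void main(String[] args) {
        System.out.println(classify(1, 2L, 5));   // committed before the fence
        System.out.println(classify(3, null, 5)); // started before the fence, still running
        System.out.println(classify(6, null, 5)); // started after the fence
    }
}
```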
Secondary Index: Creation and Maintenance
[Timeline diagram: transactions T1 through T6 shown relative to the points "CREATE INDEX started" and "CREATE INDEX complete". Labels: "Bulk-inserted into index"; "Abort (enforced upon commit)"; "Added by a coprocessor" (twice); "Index update (stored procedure)".]
Extended Snapshot Isolation
CREATE TABLE T (ID INT);

BEGIN;
1: INSERT INTO T SELECT ID+10 FROM T;
2: INSERT INTO T SELECT ID+100 FROM T;
COMMIT;

Traditional SI: Read-Your-Writes
Challenge: Circular Dependency (Statement in Infinite Loop)
Solution: Moving Snapshot (series of checkpoint snapshots)
Moving Snapshot Implementation
[Diagram: a timestamp axis marking the checkpoint for Statement 1, the checkpoint for Statement 2, and the writes by Statement 1.]

Timestamps allocated by TM in blocks.

Client promotes the checkpoint.
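A small simulation of the moving snapshot, under stated assumptions: the class MovingSnapshotSketch, the block size of 50, and the in-memory row list are illustrative and not taken from Omid or Phoenix. The transaction owns a block of timestamps. Each statement promotes the checkpoint locally, stamps its writes with that checkpoint, and reads only rows stamped below it, so it never sees its own output while still seeing the output of earlier statements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-transaction model of checkpoint snapshots; not Omid's or Phoenix's implementation.
class MovingSnapshotSketch {
    static final int BLOCK_SIZE = 50; // assumed number of timestamps reserved per transaction

    static final class Row {
        final int id;
        final long writeTs;
        Row(int id, long writeTs) { this.id = id; this.writeTs = writeTs; }
    }

    private final List<Row> table = new ArrayList<>();
    private final long blockStart;
    private long writeTs;

    MovingSnapshotSketch(long blockStart, int... committedIds) {
        this.blockStart = blockStart;
        this.writeTs = blockStart;
        for (int id : committedIds) {
            table.add(new Row(id, blockStart - 1)); // rows committed before this transaction began
        }
    }

    // Client-side promotion: no round trip, just the next timestamp in the reserved block.
    long promoteCheckpoint() {
        if (writeTs + 1 >= blockStart + BLOCK_SIZE) {
            throw new IllegalStateException("timestamp block exhausted");
        }
        return ++writeTs;
    }

    // Models INSERT INTO T SELECT ID + delta FROM T while scanning the growing table.
    void insertSelect(int delta) {
        long checkpoint = promoteCheckpoint();
        for (int i = 0; i < table.size(); i++) {
            Row row = table.get(i);
            if (row.writeTs < checkpoint) {   // rows written by this statement are skipped
                table.add(new Row(row.id + delta, checkpoint));
            }
        }
    }

    int rowCount() {
        return table.size();
    }

    public static void main(String[] args) {
        MovingSnapshotSketch tx = new MovingSnapshotSketch(100, 1, 2);
        tx.insertSelect(10);   // statement 1 reads {1, 2}
        tx.insertSelect(100);  // statement 2 reads {1, 2, 11, 12}
        System.out.println(tx.rowCount()); // prints 8
    }
}
```

Without the timestamp filter, the scan in insertSelect would keep re-reading the rows it appends and never terminate, which is the circular dependency named on the previous slide.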
Omid Scalability
Extremely lean Client-Transaction Manager protocol
By contrast, Omid1 and Tephra replicate the entire state to the client side upon BEGIN

Aggressive batching of writes to the Commit Table (CT) in the Transaction Manager

Concurrent conflict detection (experimental)

HA algorithm incurs zero overhead on the mainstream (non-recovery) path
Throughput Benchmark
YCSB workload driver
12-core Transaction Manager
1G network
[Chart: throughput (tps × 10^3) for Omid1, Omid1 Non Durable, Omid, and Omid Non Durable.]
Overhead in Production: Web Search Indexing
[Chart: task latency (ms) for document inversion, duplicate detection, out-link processing, in-link processing, and stream to runtime, broken down into Begin, Read, Compute, Update, and Commit + CT update.]
Low-Latency Omid (Experimental)
Original Design: Throughput-Oriented Applications in Mind

Sometimes, this comes at the expense of latency 

Example: writes to Commit Table batched at the Transaction Manager



Key: Dissolve the Transaction Manager I/O Bottleneck

Distribute the Commit Table and the Writes to it



How? 

The client, rather than the TM, persists the Commit Timestamp (CTS)

CTS embedded in the first row written by the transaction
Benchmark: Single-Write Transaction Workload
[Chart: latency (msec) versus throughput (tps × 10^3) for Omid and Low latency.]
Summary
Scalable, Highly Available Open Source Transaction Processing



Battle-Tested, Ready for Public Cloud



Integration with Apache Phoenix Underway (GA in 2017)
Thanks to Our Partners for Being Awesome

Backup

Architecture, Recapped
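[Diagram: the client sends Begin/Commit requests to the Transaction Manager (TSO), which persists commits to the Commit Table. The client reads and writes the data tables and verifies commits against the Commit Table. The Transaction Manager is marked as a single point of failure (SPoF).]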
HA: Primary-Backup Transaction Manager
[Diagram: a primary and a backup Transaction Manager (TSO) serve the client in front of the data tables and the Commit Table, with recovery state kept in ZooKeeper (ZK).]
Split Brain
[Diagram: a client commits while both a primary and a backup Transaction Manager (TSO) can write to the Commit Table.]
Race Conditions Violate SI
Take I: Fence CT upon every write (slow!)
HA Algorithm – Key Ideas
Old and New Primaries may write conflicting commit records

No Locks!



Client detects inconsistencies, invalidates problematic records



Lease-Based Leader Election 

Optimization: Local lease check before/after writing to CT

Zero Overhead in Non-Recovery Scenarios
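One way to picture the local lease check, as a sketch only: LeaseGuardSketch, its constructor arguments, and the Runnable standing in for a Commit Table write are assumptions for illustration, and the slides do not say what a real Transaction Manager does when the second check fails. Both checks use only local state, so they add no remote round trip while the lease is valid.

```java
import java.util.function.LongSupplier;

// Hypothetical lease guard around a Commit Table write; not Omid's implementation.
class LeaseGuardSketch {
    private final LongSupplier clockMillis;   // injectable clock, e.g. System::currentTimeMillis
    private volatile long leaseExpiryMillis;  // local view of when the leader lease ends

    LeaseGuardSketch(LongSupplier clockMillis, long leaseExpiryMillis) {
        this.clockMillis = clockMillis;
        this.leaseExpiryMillis = leaseExpiryMillis;
    }

    // Called after a successful lease renewal.
    void renew(long newExpiryMillis) {
        leaseExpiryMillis = newExpiryMillis;
    }

    private boolean leaseHeld() {
        return clockMillis.getAsLong() < leaseExpiryMillis;
    }

    // Returns true only if the lease was held both before and after the write.
    // A false result means the write was skipped, or it may have raced with a new primary.
    boolean writeUnderLease(Runnable commitTableWrite) {
        if (!leaseHeld()) {
            return false;
        }
        commitTableWrite.run();
        return leaseHeld();
    }
}
```

A caller would treat a false result as a signal to stop acting as primary; how the affected commit record is then handled is left to the client-side invalidation described above.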

The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 

Omid: Scalable and Highly Available Transaction Processing for Phoenix

  • 1. Omid: Scalable and Highly Available Transaction Processing for Phoenix. Ohad Shacham, Edward Bortnikov ⎪ PhoenixCon, Jun 13, 2017
  • 2. Let’s Get Started … Our Yahoo Journey with Transactions over HBase; Omid for Users: Semantics, API, Integration with Phoenix; Omid for Programmers: Architecture and Use Cases; Omid, Advanced: Scalability, HA, Low-Latency
  • 3. Transaction Processing in NoSQL @Yahoo. Motivation: Data Pipelines (Search, Mail, etc.). Stream Processing is a Popular Pattern: Compute Tasks process Data Items that arrive in Real Time; Intermediate Artifacts are stored in NoSQL (KV-)Storage. Extensive Use of Hadoop Technologies (Storm, HBase). Scale: Thousands of Hadoop Nodes.
  • 4. Content Indexing for Search. [Diagram labels: Crawl, Docproc, Link Analysis, Stream; Crawl schedule, Content, Queue, Links; STORM; HBase]
  • 5. Zooming in on Tasks. Document processing: Read page content from the store; Compute search index features; Update computed features. Link processing: Read outgoing links for a page; Update reference for all linked-to pages. [Diagram: each task is bracketed by begin and commit]
  • 6. Transaction Processing: ACID 101. Multiple data accesses in a single logical operation. Atomic: “All or nothing”, no partial effect observable. Consistent: The DB transitions from one valid state to another. Isolated: Transactions appear to execute in isolation. Durable: Committed data cannot disappear.
  • 7. Omid (امید): Transaction Processing Service for Apache HBase. 2011: Incepted @Yahoo Research (“Omid1”). 2014: Large-Scale Deployment @Yahoo. 2014/5: Major Re-Design for Scalability & HA (“Omid2”). 2016: Apache Incubator. 2017: Prototype Integration with Phoenix.
  • 8. Contributors: Ohad Shacham (Yahoo Research); Francisco Perez Sorrosal (Yahoo); Edward Bortnikov (Yahoo Research); Eshcar Hillel (Yahoo Research); Idit Keidar (Yahoo, Technion); Ivan Kelly (Midokura); Sameer Paranjpye (Databricks); Matthieu Morel (Skyscanner); Igor Katkov (Atlassian); Yonatan Gottesman (Yahoo Research)
  • 9. Omid 101. Client Library + Runtime Service. Database Agnostic (can work with other backends). Snapshot Isolation consistency. Very Scalable (>380K peak tps) and Highly Available.
  • 10. Omid Programming Example
      // family and qualifier are byte[] column identifiers defined elsewhere
      TransactionManager tm = HBaseTransactionManager.newInstance();
      TTable txTable = new TTable("MY_TX_TABLE");
      Transaction tx = tm.begin();                          // Control path
      Put row1 = new Put(Bytes.toBytes("EXAMPLE_ROW1"));
      row1.add(family, qualifier, Bytes.toBytes("val1"));
      txTable.put(tx, row1);                                // Data path
      Put row2 = new Put(Bytes.toBytes("EXAMPLE_ROW2"));
      row2.add(family, qualifier, Bytes.toBytes("val2"));
      txTable.put(tx, row2);                                // Data path
      tm.commit(tx);                                        // Control path
  • 11. Snapshot Isolation (SI) Semantics. Distinct read (snapshot) and write (commit) points. No write-write conflicts allowed.
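
The write-write rule on slide 11 can be illustrated with a small in-memory sketch. This is not Omid's code: the class `SiConflictSketch`, its methods, and the single shared counter used for both read and commit timestamps are assumptions made for illustration. A transaction aborts if any key in its write set was committed by another transaction after its own snapshot was taken.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy transaction manager; names are illustrative, not Omid's API.
final class SiConflictSketch {
    private long clock = 0;                                        // source of all timestamps
    private final Map<String, Long> lastCommit = new HashMap<>();  // key -> last commit timestamp

    /** Starts a transaction and returns its read (snapshot) timestamp. */
    synchronized long begin() {
        return ++clock;
    }

    /** Returns the commit timestamp, or -1 if a write-write conflict forces an abort. */
    synchronized long tryCommit(long startTs, Set<String> writeSet) {
        for (String key : writeSet) {
            Long committedAt = lastCommit.get(key);
            if (committedAt != null && committedAt > startTs) {
                return -1; // someone committed this key after our snapshot
            }
        }
        long commitTs = ++clock;
        for (String key : writeSet) {
            lastCommit.put(key, commitTs);
        }
        return commitTs;
    }

    public static void main(String[] args) {
        SiConflictSketch tm = new SiConflictSketch();
        long a = tm.begin();                                // snapshot 1
        long b = tm.begin();                                // snapshot 2
        System.out.println(tm.tryCommit(a, Set.of("k1"))); // prints 3
        System.out.println(tm.tryCommit(b, Set.of("k1"))); // prints -1 (conflict on k1)
    }
}
```

Reads never block or abort in this scheme; only overlapping writers to the same key conflict, which is what distinguishes SI from stricter serializable isolation.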
  • 12. Tephra: Sibling Technology. Transaction Processing technology for HBase. SI Semantics. Design Similar to Omid1. Apache Incubator since 2016. Integrated with Phoenix to provide ACID semantics (BETA). Implements some Phoenix-specific scenarios.
  • 13. Phoenix-Omid Integration. Work in Progress under JIRA PHOENIX-3623. Backward Compatible: Configurable TP Provider Choice. Current Options: Tephra and Omid. How? Internal Transaction Abstraction Layer (TAL) API; Multiple Implementations, Configurable Instantiation.
  • 14. Transaction Processing, Refactored. [Diagram: before the refactor, Phoenix sits directly on the Tephra Client; after it, Phoenix sits on the Transaction Abstraction Layer, which fronts the Tephra Client and the Omid Client]
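  • 15. How Omid Works. Lock-Free SI Implementation. Exploits Built-in MVCC. [Diagram: the Client sends Begin/Commit to the Transaction Manager (TSO), which performs Conflict Detection and persists commits to the Commit Table; the Client reads and writes the Data servers and verifies commits against the Commit Table]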
  • 16. Execution Example (tr = t1). [Diagram: the Client sends Begin to the Transaction Manager and receives t1; it writes (k1, v1, t1) and (k2, v2, t1) to the Data servers; a read of k’ returns the last version committed at t’ < t1]
  • 17. Execution Example (tr = t1, tc = t2). [Diagram: the Client sends Commit: t1, {k1, k2} to the Transaction Manager, which returns t2 and writes (t1, t2) to the Commit Table; the Data servers hold (k1, v1, t1) and (k2, v2, t1)]
  • 18. Execution Example (tr = t3). [Diagram: a Client issues Read (k1, t3) against the Data servers, which hold (k1, v1, t1) and (k2, v2, t1), then issues Read (t1) against the Commit Table, which holds (t1, t2); the diagram is annotated “Bottleneck!”]
  • 19. Post-Commit Timestamp Replication (tr = t1, tc = t2). [Diagram: after commit, the Client updates the commit cells so the Data servers hold (k1,v1,t1,t2) and (k2,v2,t1,t2), then issues Delete(t1) against the Commit Table entry (t1, t2)]
  • 20. Using Commit Cells (tr = t3). [Diagram: the Client issues Read (k1, t3) against the Data servers, which hold (k1,v1,t1,t2) and (k2,v2,t1,t2)]
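
The read path walked through on slides 16 to 20 can be sketched in a few lines. The following is a simplified in-memory model, not Omid's implementation: the class `CommitCellSketch`, the `Version` type, and the lookup counter are all hypothetical names introduced here. A reader first checks the commit cell stored next to the data; only if it is missing does it fall back to the Commit Table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical in-memory model of commit cells and the Commit Table.
final class CommitCellSketch {
    static final class Version {
        final String value;
        final long startTs;   // timestamp the writer began with (t1 on the slides)
        Long commitTs;        // commit cell; null until replicated after commit

        Version(String value, long startTs) {
            this.value = value;
            this.startTs = startTs;
        }
    }

    final Map<Long, Long> commitTable = new HashMap<>(); // startTs -> commitTs
    int commitTableLookups = 0;                          // counts the expensive path

    /** Resolves a version's commit timestamp; empty means "not committed". */
    OptionalLong resolveCommitTs(Version v) {
        if (v.commitTs != null) {
            return OptionalLong.of(v.commitTs);          // fast path: commit cell
        }
        commitTableLookups++;                            // slow path: Commit Table
        Long ts = commitTable.get(v.startTs);
        return ts == null ? OptionalLong.empty() : OptionalLong.of(ts);
    }

    /** A version is visible to a reader if it committed before the reader's snapshot. */
    boolean isVisible(Version v, long readTs) {
        OptionalLong c = resolveCommitTs(v);
        return c.isPresent() && c.getAsLong() < readTs;
    }

    /** Post-commit step: copy the commit timestamp into the cells, then drop the table entry. */
    void replicateAndClean(long startTs, Version... versions) {
        Long commitTs = commitTable.get(startTs);
        if (commitTs == null) {
            return;
        }
        for (Version v : versions) {
            if (v.startTs == startTs) {
                v.commitTs = commitTs;
            }
        }
        commitTable.remove(startTs);
    }

    public static void main(String[] args) {
        CommitCellSketch store = new CommitCellSketch();
        Version k1 = new Version("v1", 1);               // (k1, v1, t1)
        store.commitTable.put(1L, 2L);                   // (t1, t2)
        System.out.println(store.isVisible(k1, 3));      // true, via the Commit Table
        store.replicateAndClean(1, k1);                  // k1 becomes (k1, v1, t1, t2)
        System.out.println(store.isVisible(k1, 3));      // true, via the commit cell
        System.out.println(store.commitTableLookups);    // 1
    }
}
```

In this model the second read never touches the Commit Table, which is the point of replicating the commit timestamp into the data cells.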
  • 21. Phoenix: New Scenarios for Omid. Secondary Indexes: On-the-Fly Index Creation; Atomic Updates; Query by Secondary Key. Extended Snapshot Isolation: Read-Your-Own-Writes Queries.
  • 22. On-the-Fly Secondary Index Creation. CREATE INDEX (CI) runs in parallel with writes to the base table. How? Distinguish between the pre-CI and post-CI data. The CREATE INDEX command issue time defines a timestamp (the snapshot). 1. All data committed before the snapshot: scanned, bulk-inserted into the index. 2. All data generated after the snapshot: triggers a random update of the index. 3. All transactions in flight at snapshot time: aborted (FENCE).
  • 23. Secondary Index: Creation and Maintenance. [Timeline diagram: transactions T1 to T6 placed around the “CREATE INDEX started” and “CREATE INDEX complete” events; labels: Bulk-inserted into index; Abort (enforced upon commit); Added by a coprocessor (twice); Index update (stored procedure)]
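
The three cases listed on slide 22 amount to classifying each transaction against the CREATE INDEX timestamp. Below is a minimal sketch of that classification; `IndexFenceSketch`, `Handling`, and `classify` are hypothetical names, and the sketch assumes that a transaction's commit timestamp (if any) is known when it is classified.

```java
import java.util.OptionalLong;

// Hypothetical classifier for transactions relative to a CREATE INDEX fence timestamp.
final class IndexFenceSketch {
    enum Handling { BULK_INSERT, ABORT, INCREMENTAL_UPDATE }

    static Handling classify(long startTs, OptionalLong commitTs, long fenceTs) {
        if (commitTs.isPresent() && commitTs.getAsLong() < fenceTs) {
            return Handling.BULK_INSERT;        // case 1: committed before the snapshot
        }
        if (startTs < fenceTs) {
            return Handling.ABORT;              // case 3: in flight at snapshot time
        }
        return Handling.INCREMENTAL_UPDATE;     // case 2: started after the snapshot
    }

    public static void main(String[] args) {
        long fence = 100;
        System.out.println(classify(10, OptionalLong.of(20), fence));   // BULK_INSERT
        System.out.println(classify(90, OptionalLong.empty(), fence));  // ABORT
        System.out.println(classify(110, OptionalLong.empty(), fence)); // INCREMENTAL_UPDATE
    }
}
```

Aborting the in-flight transactions is what keeps the two populations disjoint: every surviving write is either fully covered by the bulk scan or fully covered by the incremental index updates.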
  • 24. Extended Snapshot Isolation
      CREATE TABLE T (ID INT);
      BEGIN;
      1: INSERT INTO T SELECT ID+10 FROM T;
      2: INSERT INTO T SELECT ID+100 FROM T;
      COMMIT;
    Traditional SI: Read-Your-Writes. Challenge: Circular Dependency (Statement in Infinite Loop). Solution: Moving Snapshot (series of checkpoint snapshots).
  • 25. Moving Snapshot Implementation. Timestamps allocated by TM in blocks. Client promotes the checkpoint. [Diagram labels: Checkpoint for Statement 1; Checkpoint for Statement 2; Writes by Statement 1]
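
The moving-snapshot idea from slides 24 and 25 can be shown with a toy model of the two INSERT ... SELECT statements. This sketch is an assumption-laden simplification: `MovingSnapshotSketch` and its members are hypothetical, the table is a plain list of IDs, and the checkpoint is a local counter rather than a timestamp taken from a block allocated by the TM. Each statement sees committed rows plus the transaction's own writes from earlier checkpoints, but never the rows it is writing itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one transaction running INSERT ... SELECT statements.
final class MovingSnapshotSketch {
    private static final class Write {
        final int id;
        final long checkpoint;   // checkpoint of the statement that wrote this row

        Write(int id, long checkpoint) {
            this.id = id;
            this.checkpoint = checkpoint;
        }
    }

    private final List<Integer> committed = new ArrayList<>();
    private final List<Write> ownWrites = new ArrayList<>();
    private long checkpoint = 0;

    MovingSnapshotSketch(List<Integer> initialRows) {
        committed.addAll(initialRows);
    }

    /** Models: INSERT INTO T SELECT ID + delta FROM T. */
    void insertSelectPlus(int delta) {
        checkpoint++;                                 // promote the checkpoint for this statement
        List<Integer> visible = new ArrayList<>(committed);
        for (Write w : ownWrites) {
            if (w.checkpoint < checkpoint) {          // only writes from earlier statements
                visible.add(w.id);
            }
        }
        for (int id : visible) {
            ownWrites.add(new Write(id + delta, checkpoint));
        }
    }

    /** Makes the transaction's writes part of the committed table and returns it. */
    List<Integer> commit() {
        for (Write w : ownWrites) {
            committed.add(w.id);
        }
        ownWrites.clear();
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        MovingSnapshotSketch tx = new MovingSnapshotSketch(List.of(1));
        tx.insertSelectPlus(10);    // statement 1 sees {1}, writes 11
        tx.insertSelectPlus(100);   // statement 2 sees {1, 11}, writes 101 and 111
        System.out.println(tx.commit()); // prints [1, 11, 101, 111]
    }
}
```

The filter on `w.checkpoint` is the part that matters: with plain read-your-writes, a statement scanning T would keep encountering the rows it had just inserted, which is the infinite loop named on slide 24.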
  • 26. Omid Scalability. Extremely lean Client-Transaction Manager protocol (by contrast, Omid1 and Tephra replicate the entire state to the client side upon BEGIN). Aggressive batching of writes to the Commit Table (CT) in the Transaction Manager. Concurrent conflict detection (experimental). HA algorithm incurs zero overhead in the mainstream (non-recovery) path.
  • 27. Throughput Benchmark. YCSB workload driver; 12-core Transaction Manager; 1G network. [Bar chart: throughput in tps × 10^3 (axis 0 to 550) for Omid1, Omid1 Non Durable, Omid, and Omid Non Durable]
  • 28. Overhead in Production: Web Search Indexing. [Bar chart: task latency in ms (axis 0 to 2500) for document inversion, duplicate detection, out-link processing, in-link processing, and stream to runtime; legend: Commit + CT update, Begin, Compute, Read, Update]
  • 29. Low-Latency Omid (Experimental). Original Design: Throughput-Oriented Applications in Mind. Sometimes, this comes at the expense of latency. Example: writes to the Commit Table are batched at the Transaction Manager. Key: Dissolve the Transaction Manager I/O Bottleneck; Distribute the Commit Table and the Writes to it. How? The client, rather than the TM, persists the Commit Timestamp (CTS). The CTS is embedded in the first row written by the transaction.
  • 30. Benchmark: Single-Write Transaction Workload. [Chart: latency (msec) against throughput (tps × 10^3) for two series, Omid and Low latency]
  • 31. Summary. Scalable, Highly Available Open Source Transaction Processing. Battle-Tested, Ready for Public Cloud. Integration with Apache Phoenix Underway (GA in 2017).
  • 32. Thanks to Our Partners for Being Awesome
  • 34. Architecture, Recapped. [Diagram: the same architecture as slide 15 (Client, Begin/Commit, Transaction Manager (TSO), Persist Commit, Commit Table, Verify commit, Read/Write to Data servers), now annotated with “SPoF”]
  • 35. HA: Primary-Backup Transaction Manager. [Diagram: Client, Data servers, and Commit Table as before, with a Primary Transaction Manager (TSO), a Backup Transaction Manager, and Recovery state kept in ZK]
  • 37. HA Algorithm: Key Ideas. Old and New Primaries may write conflicting commit records. No Locks! The client detects inconsistencies and invalidates problematic records. Lease-Based Leader Election. Optimization: Local lease check before/after writing to CT. Zero Overhead in Non-Recovery Scenarios.