SlideShare a Scribd company logo
1 of 33
Omid: A Transactional Framework for HBase
Francisco Perez-Sorrosal
Ohad Shacham
Hadoop Summit SJ
June 29th, 2016
Outline
 Background
 Basic Concepts
 Use cases
 Architecture
 Transaction Management
 High Availability
 Performance
 Summary
Hadoop Summit SJ (June 29th 2016)2
 New Big data apps → new requirements:
● Low-latency
● Incremental data processing
● e.g. Percolator
 Multiple clients updating same data concurrently
● Problem: Conflicts/Inconsistencies may arise
● Solution: Transactional Access to Data
Background
Hadoop Summit SJ (June 29th 2016)3
 Transaction → Abstract UoW to manage data with certain
guarantees
● ACID
● Relational databases
 Big data → NoSQL datastores → Transactions in NoSQL
● Relaxed Guarantees:
○ e.g. Atomicity, Consistency
Background
● Hard to Scale
○ Data partition
○ Data replication
Hadoop Summit SJ (June 29th 2016)4
5
 Flexible
 Reliable
 High Performant
 Scalable
…OLTP framework that allows BigData apps to
execute ACID transactions on top of HBase
+ =
Consistency in
BigData Apps
Omid is a…
Hadoop Summit SJ (June 29th 2016)
Why use Omid?
 Simplifies development of apps requiring consistency
● Multi-row/multi-table transactions on HBase
● Simple & well-known interface
 Good performance & reliability
 Lock-free
 Snapshot Isolation
 HBase is a blackbox
● No HBase code modification
● No changes on table schemas
 Used successfully at Yahoo
Hadoop Summit SJ (June 29th 2016)6
Snapshot Isolation
▪ Transaction T2 overlaps in time with T1 & T3, but spatially:
● T1 ∩ T2 = ∅
● T2 ∩ T3 = { R4 } Transactions T2 and T3 conflict
▪ Transaction T4 does not have conflicts
TxId
T1
T2
T3
T4
Time Overlap Spatial Overlap (WriteSet)
R1 R2 R3 R4
R3 R4
R2 R4
R1 R3
Hadoop Summit SJ (June 29th 2016)7
Sieve
Use Cases: Sieve @ Yahoo
HBase
Internet
Crawler Doc Proc Aggregation
Omid
Feeder
Real-Time
Index
Notifications
Transactional Data Flow
Hadoop Summit SJ (June 29th 2016)8
Hive Metastore Thrift Server
Use Cases:
HBase
HBaseStore
Omid
Hadoop Summit SJ (June 29th 2016)
ObjectStore
Relational
Database
9
Transactional App
Architectural Components
HBase
Omid Client
Transaction Status Oracle
(TSO)
Timestamp
Oracle
Get Start/Commit
Timestamps
Start/Commit TXs
Keep track &
Validate TXs
Commit Table
Compactor
Commit data
R/W data
Guarantee
SI
App Table
Shadow
CellsApp TableApp Table
Shadow
Cells
Hadoop Summit SJ (June 29th 2016)10
Client APIs
▪ Transaction Manager → Create Transactional contexts
Transaction begin();
void commit(Transaction tx);
void rollback(Transaction tx);
▪ Transactional Tables (TTable) → Data access
Result get(Transaction tx, Get g);
void put(Transaction tx, Put p);
ResultScanner getScanner(Transaction tx, Scan s);
Hadoop Summit SJ (June 29th 2016)11
TX Management (Begin TX phase)
Omid Client TSO TO Table/SC CommitTable
Begin TX Get ST
ST=1
TX(ST=1)
R/W Ops for TX (ST=1)
App
Begin TX
R/W Ops (within TX context)
TX Context
R/W Results for TX with ST=1
Read Ops:
Get right results
for TX’s SnapshotWrite Ops:
Build Writeset
for TX
Hadoop Summit SJ (June 29th 2016)12
TX Management (Commit TX Phase)
Omid Client TSO TO Table/SC CommitTable
Commit TX (Writeset)
Get CT
CT=2
TX(CT=2)
App
Commit TX
Check Conflicts
of TX Writeset
in Conflict Map
Persist commit details (ST/CT) for TX
Hadoop Summit SJ (June 29th 2016)13
TX Management (Complete TX Phase)
Omid Client TSO TO Table/SC CommitTable
Update SC for TX (ST=1/CT=2)
App
Complete commit (Cleanup entry for TX with ST=1)
Result
Hadoop Summit SJ (June 29th 2016)14
Transactional App
High Availability
HBase
Omid Client
Transaction Status Oracle
Timestamp
Oracle
Get Start/Commit
Timestamps
Start/Commit TXs
Commit Table
Compactor
Commit data
R/W data
Guarantee
SI
App Table
Shadow
CellsApp TableApp Table
Shadow
Cells
Single
point of
failure
Hadoop Summit SJ (June 29th 2016)15
Timestamp
Oracle
Transaction Status Oracle
Transactional App
High Availability
HBase
Omid Client
Transaction Status Oracle
Timestamp
Oracle
Get Start/Commit
Timestamps
Start/Commit TXs
Commit Table
Compactor
Commit data
R/W data
Guarantee
SI
App Table
Shadow
CellsApp TableApp Table
Shadow
CellsRecovery
State
Primary
/
Backup
Hadoop Summit SJ (June 29th 2016)16
High Availability – Failing Scenario
Omid Client TSO P TSO B Table/SC CommitTableApp
Begin TX
Begin TX Get ST
ST=1
TX(ST=1)
TX 1
TO
Data Store Commit Table
Write(k1, v1) (ST=1)
TX 1 Write(k1, v1)
(k1, v1, 1)
Hadoop Summit SJ (June 29th 2016)17
High Availability – Failing Scenario
Omid Client TSO P TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Write(k2, v2) (ST=1)
Write(k2, v2)
(k1, v1, 1)
(k2, v2, 1)
Commit TX 1{k1, k2}
Commit TX 1
Get CT
CT=2
Persist commit details for TX 1
Hadoop Summit SJ (June 29th 2016)18
High Availability – Failing Scenario
Omid Client TSO B Table/SC CommitTableApp
Begin TX
Begin TX Get ST
ST=3
TX(ST=3)
TX 3
TO
Data Store Commit Table
Read(k1) (ST=3)
TX 3 Read(k1)
(k1, v1, 1)
(k1, v1, 1)
(k2, v2, 1)Hadoop Summit SJ (June 29th 2016)19
High Availability – Failing Scenario
Omid Client TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Return TX 1 CT
(k1, v1, 1)
! exist
! exist
Read(k2) (ST=3)
(k2, v2, 1)
TX 3 Read(k2)
(k2, v2, 1)
CT = 2
Return TX 1 CT
v2
(1, 2)Hadoop Summit SJ (June 29th 2016)20
Timestamp
Oracle
Transaction Status Oracle
Transactional App
High Availability
HBase
Omid Client
Transaction Status Oracle
Timestamp
Oracle
Get Start/Commit
Timestamps
Start/Commit TXs
Commit Table
Compactor
R/W data
Guarantee
SI
App Table
Shadow
CellsApp TableApp Table
Shadow
CellsRecovery
State
Hadoop Summit SJ (June 29th 2016)21
High Availability – Solution
Omid Client TSO P TSO B Table/SC CommitTableApp
Begin TX
Begin TX Get ST
ST=1
TX(ST=1,E=1)
TX 1, 1
TO
Data Store Commit Table
Write(k1, v1) (ST=1)
TX 1 Write(k1, v1)
(k1, v1, 1)
Hadoop Summit SJ (June 29th 2016)22
High Availability – Solution
Omid Client TSO P TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Write(k2, v2) (ST=1)
Write(k2, v2)
(k1, v1, 1)
(k2, v2, 1)
Commit TX 1{k1, k2}
Commit TX 1
Get CT
CT=2
Persist commit details for TX 1
Hadoop Summit SJ (June 29th 2016)23
High Availability – Solution
Omid Client TSO B Table/SC CommitTableApp
Begin TX
Begin TX Get ST
ST=3
TX(ST=3,E=3)
TX 3,3
TO
Data Store Commit Table
Read(k1) (ST=3)
TX 3 Read(k1)
(k1, v1, 1)
(k1, v1, 1)
(k2, v2, 1)Hadoop Summit SJ (June 29th 2016)24
High Availability – Solution
Omid Client TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Return TX1 CT
(k1, v1, 1)
! exist
(k2, v2, 1)
Invalid
Try invalidate
(1, -, invalid)
! exist
Read(k2) (ST=3)
(k2, v2, 1)
TX 3 Read(k2)
Hadoop Summit SJ (June 29th 2016)25
High Availability – Solution
Omid Client TSO B Table/SC CommitTableApp TO
Data Store Commit Table
Return TX 1 CT
(k1, v1, 1)
! exist
! exist
(k2, v2, 1)
(1, 2, invalid)Hadoop Summit SJ (June 29th 2016)26
High Availability
 No runtime overhead in mainstream execution
• Minor overhead after failover
 TSO uses regular writes
 Leases for leader election
• Lease status check before/after writing to Commit Table
Hadoop Summit SJ (June 29th 2016)27
Perf. Improvements: Read-Only Txs
Omid Client TSO/TO Table/SC
Begin TX
TX(ST=1)
Read Ops for TX (ST=1)
App
Begin TX
Read Ops (in TX context)
TX Context
Read Results in Snapshot
Commit TX
Writeset is ∅, so no need to contact TSO!!!Success
Hadoop Summit SJ (June 29th 2016)28
TSO
HBase
Perf. Improvements: Commit Table Writes
Omid
Client
HBase
TSO
Commit
Table
Commit
Data
Omid
Client
Commit
Data
Hadoop Summit SJ (June 29th 2016)29
HBase
TSO
Perf. Improvements: Commit Table Writes
Omid
Client
HBase
TSO
Commit
Table
Commit
Data
Omid
Client
Commit
Data
Hadoop Summit SJ (June 29th 2016)30
0
50
100
150
200
250
300
350
400
1 2 4 6
Tps*103
Commit Table: # Region servers
Omid Throughput with Improvements
Hadoop Summit SJ (June 29th 2016)31
Summary
 Transactions in NoSQL
• Use cases in incremental big data processing
• Snapshot Isolation: Scalable consistency model
 Omid
• Web-scale TPS for HBase
• Reliable and performant
• Battle-tested
• http://omid.incubator.apache.org/
Hadoop Summit SJ (June 29th 2016)32
Questions?
Hadoop Summit SJ (June 29th 2016)33
@ApacheOmid
Apache Omid Incubator Page

More Related Content

What's hot

Data Time Travel by Delta Time Machine
Data Time Travel by Delta Time MachineData Time Travel by Delta Time Machine
Data Time Travel by Delta Time MachineDatabricks
 
Using Spark for Timeseries Graph Analytics ved
Using Spark for Timeseries Graph Analytics vedUsing Spark for Timeseries Graph Analytics ved
Using Spark for Timeseries Graph Analytics vedVed Mulkalwar
 
Big size meteorological data processing and mobile displaying system using ...
Big size meteorological data processing and mobile displaying system using ...Big size meteorological data processing and mobile displaying system using ...
Big size meteorological data processing and mobile displaying system using ...BJ Jang
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)Ryan Blue
 
Deep dive into stateful stream processing in structured streaming by Tathaga...
Deep dive into stateful stream processing in structured streaming  by Tathaga...Deep dive into stateful stream processing in structured streaming  by Tathaga...
Deep dive into stateful stream processing in structured streaming by Tathaga...Databricks
 
Productionizing your Streaming Jobs
Productionizing your Streaming JobsProductionizing your Streaming Jobs
Productionizing your Streaming JobsDatabricks
 
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Tathagata Das
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution Chen Wu
 
[Foss4 g2013]the architecture of mobile traffic map service final
[Foss4 g2013]the architecture of mobile traffic map service final[Foss4 g2013]the architecture of mobile traffic map service final
[Foss4 g2013]the architecture of mobile traffic map service finalBJ Jang
 
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSPDiscretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSPTathagata Das
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptxAndrew Lamb
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduceBlock Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduceVasia Kalavri
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Petr Zapletal
 

What's hot (20)

Data Time Travel by Delta Time Machine
Data Time Travel by Delta Time MachineData Time Travel by Delta Time Machine
Data Time Travel by Delta Time Machine
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
Using Spark for Timeseries Graph Analytics ved
Using Spark for Timeseries Graph Analytics vedUsing Spark for Timeseries Graph Analytics ved
Using Spark for Timeseries Graph Analytics ved
 
Big size meteorological data processing and mobile displaying system using ...
Big size meteorological data processing and mobile displaying system using ...Big size meteorological data processing and mobile displaying system using ...
Big size meteorological data processing and mobile displaying system using ...
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
 
Advancing Scientific Data Support in ArcGIS
Advancing Scientific Data Support in ArcGISAdvancing Scientific Data Support in ArcGIS
Advancing Scientific Data Support in ArcGIS
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
 
Deep dive into stateful stream processing in structured streaming by Tathaga...
Deep dive into stateful stream processing in structured streaming  by Tathaga...Deep dive into stateful stream processing in structured streaming  by Tathaga...
Deep dive into stateful stream processing in structured streaming by Tathaga...
 
Productionizing your Streaming Jobs
Productionizing your Streaming JobsProductionizing your Streaming Jobs
Productionizing your Streaming Jobs
 
Spark Streaming into context
Spark Streaming into contextSpark Streaming into context
Spark Streaming into context
 
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
 
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution
 
Reading HDF family of formats via NetCDF-Java / CDM
Reading HDF family of formats via NetCDF-Java / CDMReading HDF family of formats via NetCDF-Java / CDM
Reading HDF family of formats via NetCDF-Java / CDM
 
[Foss4 g2013]the architecture of mobile traffic map service final
[Foss4 g2013]the architecture of mobile traffic map service final[Foss4 g2013]the architecture of mobile traffic map service final
[Foss4 g2013]the architecture of mobile traffic map service final
 
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSPDiscretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduceBlock Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 

Viewers also liked

Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopOwen O'Malley
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Julian Hyde
 
Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016Chester Chen
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...DataWorks Summit/Hadoop Summit
 
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Sumeet Singh
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingPaco Nathan
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionDataWorks Summit/Hadoop Summit
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Gyula Fóra
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance DataWorks Summit/Hadoop Summit
 
Cyber Summit 2016: Issues and Challenges Facing Municipalities In Securing Data
Cyber Summit 2016: Issues and Challenges Facing Municipalities In Securing DataCyber Summit 2016: Issues and Challenges Facing Municipalities In Securing Data
Cyber Summit 2016: Issues and Challenges Facing Municipalities In Securing DataCybera Inc.
 
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Sumeet Singh
 
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyHDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyDataWorks Summit
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache StormP. Taylor Goetz
 

Viewers also liked (20)

Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache Hadoop
 
Data Process Systems, connecting everything
Data Process Systems, connecting everythingData Process Systems, connecting everything
Data Process Systems, connecting everything
 
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
 
Practical advice to build a data driven company
Practical advice to build a data driven companyPractical advice to build a data driven company
Practical advice to build a data driven company
 
Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016Real time machine learning visualization with spark -- Hadoop Summit 2016
Real time machine learning visualization with spark -- Hadoop Summit 2016
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
 
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
Hadoop Summit Dublin 2016: Hadoop Platform at Yahoo - A Year in Review
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark Streaming
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Tracing your security telemetry with Apache Metron
Tracing your security telemetry with Apache MetronTracing your security telemetry with Apache Metron
Tracing your security telemetry with Apache Metron
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
 
Cyber Summit 2016: Issues and Challenges Facing Municipalities In Securing Data
Cyber Summit 2016: Issues and Challenges Facing Municipalities In Securing DataCyber Summit 2016: Issues and Challenges Facing Municipalities In Securing Data
Cyber Summit 2016: Issues and Challenges Facing Municipalities In Securing Data
 
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
Keynote Hadoop Summit Dublin 2016: Hadoop Platform Innovations - Pushing The ...
 
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyHDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Kafka Security
Kafka SecurityKafka Security
Kafka Security
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 

Similar to Omid: A transactional Framework for HBase

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingGuozhang Wang
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...DataStax Academy
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify StoryNeville Li
 
MongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' MeetupMongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' MeetupMongoDB
 
Traitement temps réel chez Streamroot - Golang Paris Juin 2016
Traitement temps réel chez Streamroot - Golang Paris Juin 2016Traitement temps réel chez Streamroot - Golang Paris Juin 2016
Traitement temps réel chez Streamroot - Golang Paris Juin 2016Simon Caplette
 
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Ververica
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureGabriele Modena
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Amazon Web Services
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013alanfgates
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesC4Media
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0Petr Zapletal
 

Similar to Omid: A transactional Framework for HBase (20)

Omid: A Transactional Framework for HBase
Omid: A Transactional Framework for HBaseOmid: A Transactional Framework for HBase
Omid: A Transactional Framework for HBase
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
ch02-mapreduce.pptx
ch02-mapreduce.pptxch02-mapreduce.pptx
ch02-mapreduce.pptx
 
MongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' MeetupMongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
 
Delta Architecture
Delta ArchitectureDelta Architecture
Delta Architecture
 
Traitement temps réel chez Streamroot - Golang Paris Juin 2016
Traitement temps réel chez Streamroot - Golang Paris Juin 2016Traitement temps réel chez Streamroot - Golang Paris Juin 2016
Traitement temps réel chez Streamroot - Golang Paris Juin 2016
 
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming Architecture
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Strata Stinger Talk October 2013
Strata Stinger Talk October 2013Strata Stinger Talk October 2013
Strata Stinger Talk October 2013
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+Tables
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
 

Recently uploaded

Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 

Recently uploaded (20)

Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 

Omid: A transactional Framework for HBase

  • 1. Omid: A Transactional Framework for HBase Francisco Perez-Sorrosal Ohad Shacham Hadoop Summit SJ June 29th, 2016
  • 2. Outline  Background  Basic Concepts  Use cases  Architecture  Transaction Management  High Availability  Performance  Summary Hadoop Summit SJ (June 29th 2016)2
  • 3.  New Big data apps → new requirements: ● Low-latency ● Incremental data processing ● e.g. Percolator  Multiple clients updating same data concurrently ● Problem: Conflicts/Inconsistencies may arise ● Solution: Transactional Access to Data Background Hadoop Summit SJ (June 29th 2016)3
  • 4.  Transaction → Abstract UoW to manage data with certain guarantees ● ACID ● Relational databases  Big data → NoSQL datastores → Transactions in NoSQL ● Relaxed Guarantees: ○ e.g. Atomicity, Consistency Background ● Hard to Scale ○ Data partition ○ Data replication Hadoop Summit SJ (June 29th 2016)4
  • 5. 5  Flexible  Reliable  High Performant  Scalable …OLTP framework that allows BigData apps to execute ACID transactions on top of HBase + = Consistency in BigData Apps Omid is a… Hadoop Summit SJ (June 29th 2016)
  • 6. Why use Omid?  Simplifies development of apps requiring consistency ● Multi-row/multi-table transactions on HBase ● Simple & well-known interface  Good performance & reliability  Lock-free  Snapshot Isolation  HBase is a blackbox ● No HBase code modification ● No changes on table schemas  Used successfully at Yahoo Hadoop Summit SJ (June 29th 2016)6
  • 7. Snapshot Isolation ▪ Transaction T2 overlaps in time with T1 & T3, but spatially: ● T1 ∩ T2 = ∅ ● T2 ∩ T3 = { R4 } Transactions T2 and T3 conflict ▪ Transaction T4 does not have conflicts TxId T1 T2 T3 T4 Time Overlap Spatial Overlap (WriteSet) R1 R2 R3 R4 R3 R4 R2 R4 R1 R3 Hadoop Summit SJ (June 29th 2016)7
  • 8. Sieve Use Cases: Sieve @ Yahoo HBase Internet Crawler Doc Proc Aggregation Omid Feeder Real-Time Index Notifications Transactional Data Flow Hadoop Summit SJ (June 29th 2016)8
  • 9. Hive Metastore Thrift Server Use Cases: HBase HBaseStore Omid Hadoop Summit SJ (June 29th 2016) ObjectStore Relational Database 9
  • 10. Transactional App Architectural Components HBase Omid Client Transaction Status Oracle (TSO) Timestamp Oracle Get Start/Commit Timestamps Start/Commit TXs Keep track & Validate TXs Commit Table Compactor Commit data R/W data Guarantee SI App Table Shadow CellsApp TableApp Table Shadow Cells Hadoop Summit SJ (June 29th 2016)10
  • 11. Client APIs ▪ Transaction Manager → Create Transactional contexts Transaction begin(); void commit(Transaction tx); void rollback(Transaction tx); ▪ Transactional Tables (TTable) → Data access Result get(Transaction tx, Get g); void put(Transaction tx, Put p); ResultScanner getScanner(Transaction tx, Scan s); Hadoop Summit SJ (June 29th 2016)11
  • 12. TX Management (Begin TX phase) Omid Client TSO TO Table/SC CommitTable Begin TX Get ST ST=1 TX(ST=1) R/W Ops for TX (ST=1) App Begin TX R/W Ops (within TX context) TX Context R/W Results for TX with ST=1 Read Ops: Get right results for TX’s SnapshotWrite Ops: Build Writeset for TX Hadoop Summit SJ (June 29th 2016)12
  • 13. TX Management (Commit TX Phase) Omid Client TSO TO Table/SC CommitTable Commit TX (Writeset) Get CT CT=2 TX(CT=2) App Commit TX Check Conflicts of TX Writeset in Conflict Map Persist commit details (ST/CT) for TX Hadoop Summit SJ (June 29th 2016)13
  • 14. TX Management (Complete TX Phase) Omid Client TSO TO Table/SC CommitTable Update SC for TX (ST=1/CT=2) App Complete commit (Cleanup entry for TX with ST=1) Result Hadoop Summit SJ (June 29th 2016)14
  • 15. Transactional App High Availability HBase Omid Client Transaction Status Oracle Timestamp Oracle Get Start/Commit Timestamps Start/Commit TXs Commit Table Compactor Commit data R/W data Guarantee SI App Table Shadow CellsApp TableApp Table Shadow Cells Single point of failure Hadoop Summit SJ (June 29th 2016)15
  • 16. Timestamp Oracle Transaction Status Oracle Transactional App High Availability HBase Omid Client Transaction Status Oracle Timestamp Oracle Get Start/Commit Timestamps Start/Commit TXs Commit Table Compactor Commit data R/W data Guarantee SI App Table Shadow CellsApp TableApp Table Shadow CellsRecovery State Primary / Backup Hadoop Summit SJ (June 29th 2016)16
  • 17. High Availability – Failing Scenario Omid Client TSO P TSO B Table/SC CommitTableApp Begin TX Begin TX Get ST ST=1 TX(ST=1) TX 1 TO Data Store Commit Table Write(k1, v1) (ST=1) TX 1 Write(k1, v1) (k1, v1, 1) Hadoop Summit SJ (June 29th 2016)17
  • 18. High Availability – Failing Scenario Omid Client TSO P TSO B Table/SC CommitTableApp TO Data Store Commit Table Write(k2, v2) (ST=1) Write(k2, v2) (k1, v1, 1) (k2, v2, 1) Commit TX 1{k1, k2} Commit TX 1 Get CT CT=2 Persist commit details for TX 1 Hadoop Summit SJ (June 29th 2016)18
  • 19. High Availability – Failing Scenario Omid Client TSO B Table/SC CommitTableApp Begin TX Begin TX Get ST ST=3 TX(ST=3) TX 3 TO Data Store Commit Table Read(k1) (ST=3) TX 3 Read(k1) (k1, v1, 1) (k1, v1, 1) (k2, v2, 1)Hadoop Summit SJ (June 29th 2016)19
  • 20. High Availability – Failing Scenario Omid Client TSO B Table/SC CommitTableApp TO Data Store Commit Table Return TX 1 CT (k1, v1, 1) ! exist ! exist Read(k2) (ST=3) (k2, v2, 1) TX 3 Read(k2) (k2, v2, 1) CT = 2 Return TX 1 CT v2 (1, 2)Hadoop Summit SJ (June 29th 2016)20
  • 21. Timestamp Oracle Transaction Status Oracle Transactional App High Availability HBase Omid Client Transaction Status Oracle Timestamp Oracle Get Start/Commit Timestamps Start/Commit TXs Commit Table Compactor R/W data Guarantee SI App Table Shadow CellsApp TableApp Table Shadow CellsRecovery State Hadoop Summit SJ (June 29th 2016)21
  • 22. High Availability – Solution Omid Client TSO P TSO B Table/SC CommitTableApp Begin TX Begin TX Get ST ST=1 TX(ST=1,E=1) TX 1, 1 TO Data Store Commit Table Write(k1, v1) (ST=1) TX 1 Write(k1, v1) (k1, v1, 1) Hadoop Summit SJ (June 29th 2016)22
  • 23. High Availability – Solution Omid Client TSO P TSO B Table/SC CommitTableApp TO Data Store Commit Table Write(k2, v2) (ST=1) Write(k2, v2) (k1, v1, 1) (k2, v2, 1) Commit TX 1{k1, k2} Commit TX 1 Get CT CT=2 Persist commit details for TX 1 Hadoop Summit SJ (June 29th 2016)23
  • 24. High Availability – Solution Omid Client TSO B Table/SC CommitTableApp Begin TX Begin TX Get ST ST=3 TX(ST=3,E=3) TX 3,3 TO Data Store Commit Table Read(k1) (ST=3) TX 3 Read(k1) (k1, v1, 1) (k1, v1, 1) (k2, v2, 1)Hadoop Summit SJ (June 29th 2016)24
  • 25. High Availability – Solution Omid Client TSO B Table/SC CommitTableApp TO Data Store Commit Table Return TX1 CT (k1, v1, 1) ! exist (k2, v2, 1) Invalid Try invalidate (1, -, invalid) ! exist Read(k2) (ST=3) (k2, v2, 1) TX 3 Read(k2) Hadoop Summit SJ (June 29th 2016)25
  • 26. High Availability – Solution Omid Client TSO B Table/SC CommitTableApp TO Data Store Commit Table Return TX 1 CT (k1, v1, 1) ! exist ! exist (k2, v2, 1) (1, 2, invalid)Hadoop Summit SJ (June 29th 2016)26
  • 27. High Availability  No runtime overhead in mainstream execution • Minor overhead after failover  TSO uses regular writes  Leases for leader election • Lease status check before/after writing to Commit Table Hadoop Summit SJ (June 29th 2016)27
  • 28. Perf. Improvements: Read-Only Txs Omid Client TSO/TO Table/SC Begin TX TX(ST=1) Read Ops for TX (ST=1) App Begin TX Read Ops (in TX context) TX Context Read Results in Snapshot Commit TX Writeset is ∅, so no need to contact TSO!!!Success Hadoop Summit SJ (June 29th 2016)28
  • 29. TSO HBase Perf. Improvements: Commit Table Writes Omid Client HBase TSO Commit Table Commit Data Omid Client Commit Data Hadoop Summit SJ (June 29th 2016)29
  • 30. HBase TSO Perf. Improvements: Commit Table Writes Omid Client HBase TSO Commit Table Commit Data Omid Client Commit Data Hadoop Summit SJ (June 29th 2016)30
  • 31. 0 50 100 150 200 250 300 350 400 1 2 4 6 Tps*103 Commit Table: # Region servers Omid Throughput with Improvements Hadoop Summit SJ (June 29th 2016)31
  • 32. Summary  Transactions in NoSQL • Use cases in incremental big data processing • Snapshot Isolation: Scalable consistency model  Omid • Web-scale TPS for HBase • Reliable and performant • Battle-tested • http://omid.incubator.apache.org/ Hadoop Summit SJ (June 29th 2016)32
  • 33. Questions? Hadoop Summit SJ (June 29th 2016)33 @ApacheOmid Apache Omid Incubator Page