
HBaseCon 2013: High-Throughput, Transactional Stream Processing on Apache HBase


Presented by: Andreas Neumann (Continuuity) and Alex Baranau (Continuuity)


Transcript of "HBaseCon 2013: High-Throughput, Transactional Stream Processing on Apache HBase"

  1. HIGH-THROUGHPUT TRANSACTIONAL STREAM PROCESSING ON HBASE
     Alex Baranau @abaranau
     Andreas Neumann @anew68
     Thursday, June 6, 13
  2. WHO WE ARE
     Continuuity Proprietary and Confidential
     • We’ve built Continuuity Reactor: the world’s first scale-out application server for Hadoop
     • Fast, easy development, deployment and management of Hadoop and HBase apps
     • Continuuity team has years of experience in using and contributing to Open Source, and we intend to continue doing so.
  3. AGENDA
     • Transactions in stream processing: what? why?
     • Omid-style transactions explained
     • Queues: heart of stream processing
     • What’s next?
  4. THE REACTOR
     • Continuuity Reactor is an app platform built on Hadoop and HBase
     • Collect, Process, Store, and Query data.
     • A Flow is a real-time processor with exactly-once guarantee
     • A flow is composed of flowlets, connected via queues
     • All processing happens with ACID guarantees in transactions
  5. PROCESSING IN A FLOWLET [diagram labels: Queue, Flowlet]
  6. PROCESSING IN A FLOWLET [diagram labels: Queue, Flowlet, DataSet]
  7. TRANSACTIONS: WHY? [diagram labels: Queue, Flowlet, DataSet]
  8. PROCESSING WITH TX [diagram labels: Queue, Flowlet, DataSet]
  9. TRANSACTIONS: WHAT?
     • Atomic - Entire transaction is committed as one
     • Consistent - No partial state change due to failure
     • Isolated - No dirty reads, transaction is only visible after commit
     • Durable - Once committed, data is persisted reliably
  10. OMID-STYLE TRANSACTIONS
      • Multi-Version Concurrency Control with Version = HBase Timestamp
        • All writes in the same transaction use the transaction ID as timestamp
        • Reads exclude other, uncommitted transactions (for isolation)
      • Optimistic Concurrency Control
        • Conflict detection at commit of transaction
        • Write Conflict: two overlapping transactions write the same row
        • Rollback of one transaction in case of conflict (whichever commits later)
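
The conflict rule on this slide can be illustrated with a small in-memory sketch. It is not the Reactor or Omid implementation: the `TxOracle` and `Tx` classes, the single shared pointer counter, and the idea of remembering every committed write set are assumptions made for illustration only. It shows two overlapping transactions writing the same row, with the later committer being rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    start: int                                 # pointer handed out at start
    writes: set = field(default_factory=set)   # rows written by this tx


class TxOracle:
    """Toy in-memory oracle (hypothetical): hands out pointers, checks conflicts."""

    def __init__(self):
        self.next_pointer = 1
        self.committed = []   # (commit_pointer, rows) of committed transactions

    def start(self) -> Tx:
        tx = Tx(start=self.next_pointer)
        self.next_pointer += 1
        return tx

    def try_commit(self, tx: Tx) -> bool:
        # Conflict: a transaction that committed after we started wrote one of our rows.
        for commit_pointer, rows in self.committed:
            if commit_pointer > tx.start and rows & tx.writes:
                return False   # the caller has to roll back its writes
        self.committed.append((self.next_pointer, frozenset(tx.writes)))
        self.next_pointer += 1
        return True


oracle = TxOracle()
t1, t2 = oracle.start(), oracle.start()   # two overlapping transactions
t1.writes.add("row-a")
t2.writes.add("row-a")
first = oracle.try_commit(t1)     # commits first, so it wins
second = oracle.try_commit(t2)    # commits later, so it conflicts

t3 = oracle.start()               # starts after t1 committed: no overlap
t3.writes.add("row-a")
third = oracle.try_commit(t3)
```

A production oracle would also prune old write sets once no running transaction can overlap them; the sketch keeps them forever for brevity.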
  11. OMID-STYLE TRANSACTIONS [diagram labels: start tx, do work, has conflicts, commit tx, Tx Oracle, get write pointer, HBase, write with version=pointer, rollback, abort tx]
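
As a rough illustration of "write with version=pointer" and of reads that skip uncommitted transactions, the sketch below uses a plain dictionary in place of an HBase table. The `VersionedStore` class, its method names, and the `excluded` set are hypothetical; a real client would express the same filter through HBase timestamps.

```python
class VersionedStore:
    """Toy stand-in for a versioned table: row -> {version: value}."""

    def __init__(self):
        self.cells = {}

    def put(self, row, version, value):
        # Every write of a transaction is stamped with that transaction's pointer.
        self.cells.setdefault(row, {})[version] = value

    def delete_version(self, row, version):
        # Rollback removes exactly the cell versions the transaction wrote.
        self.cells.get(row, {}).pop(version, None)

    def get(self, row, read_pointer, excluded):
        # Newest version that is <= read_pointer and not written by an excluded tx.
        versions = self.cells.get(row, {})
        visible = [v for v in versions if v <= read_pointer and v not in excluded]
        return versions[max(visible)] if visible else None


store = VersionedStore()
store.put("user:1", 10, "committed value")   # assume tx 10 has committed
store.put("user:1", 12, "in-flight value")   # assume tx 12 is still running

# A reader that may see pointers up to 13 but excludes the in-flight tx 12.
seen = store.get("user:1", read_pointer=13, excluded={12})

# Tx 12 hits a conflict and is rolled back.
store.delete_version("user:1", 12)
after_rollback = store.get("user:1", read_pointer=13, excluded=set())
```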
  12. OPTIMISTIC CONCURRENCY CONTROL
      • Optimistic Concurrency Control
        • Avoids cost of locking rows and tables
        • No deadlocks or lock escalations
        • Cost of conflict detection and possible rollback is higher
        • Good if conflicts are rare: short transaction, disjoint partitioning of work
  13. OMID-STYLE TRANSACTIONS [diagram labels: has conflicts, create tx, track tx ops, check conflicts, make tx visible, commit tx, no conflicts, start tx, do work, remove tx, abort tx, get ops to rollback, get new tx, add ops to tx, try commit, Tx Oracle, Tx Agent]
  14. TRANSACTION ORACLE
      • Simple & Fast
      • Single point of failure?
        • Persist all state to a write-ahead log
        • Secondary oracle that subscribes to log
        • Failover can happen quickly
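
The write-ahead-log idea can be sketched as follows. This is a simplified model, not the actual oracle: the record format `(kind, pointer)`, the class names, and the Python list standing in for a durable shared log are all assumptions. The point is that a secondary which applies the same log records ends up with the same state as the primary.

```python
class OracleState:
    """In-memory oracle state that is fully determined by the log records."""

    def __init__(self):
        self.next_pointer = 1
        self.in_progress = set()

    def apply(self, record):
        kind, pointer = record
        if kind == "start":
            self.in_progress.add(pointer)
            self.next_pointer = max(self.next_pointer, pointer + 1)
        elif kind in ("commit", "abort"):
            self.in_progress.discard(pointer)


class PrimaryOracle:
    def __init__(self, log):
        self.log = log
        self.state = OracleState()

    def _record(self, record):
        self.log.append(record)    # persist first...
        self.state.apply(record)   # ...then change the in-memory state

    def start(self):
        pointer = self.state.next_pointer
        self._record(("start", pointer))
        return pointer

    def commit(self, pointer):
        self._record(("commit", pointer))


log = []                           # stand-in for a durable, shared write-ahead log
primary = PrimaryOracle(log)
a = primary.start()
b = primary.start()
primary.commit(a)

# The secondary subscribes to the log and replays it to rebuild the same state.
secondary = OracleState()
for record in log:
    secondary.apply(record)
```

Because the secondary is already caught up with the log, taking over only requires it to start accepting requests, which is one way to read "failover can happen quickly".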
  15. QUEUES
      • Flowlets pass data to each other on queues
      • Every consumer (flowlet) can be partitioned
      • More than one consumer (flowlet) can read a queue
      • Queues are partitioned to scale throughput
  16. QUEUES & FLOWLETS [diagram labels: Flowlet, Queue, Instance1, Instance2]
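
One common way to split a queue across flowlet instances is to hash a key of each entry, so that every entry is owned by exactly one instance. The slides do not say which partitioning scheme Reactor uses; the hash-by-key approach, the `owner` function, and the sample entries below are assumptions for illustration.

```python
import zlib


def owner(key: str, instances: int) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % instances


entries = [("user-1", "click"), ("user-2", "view"), ("user-1", "buy")]
instances = 2

per_instance = {i: [] for i in range(instances)}
for key, event in entries:
    per_instance[owner(key, instances)].append((key, event))
```

With this scheme all entries for the same key go to the same instance, which keeps the instances' work disjoint and so fits the earlier point that optimistic concurrency works best when conflicts are rare.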
  17. QUEUE DESIGN
      • Queue entries are written in sequence
      • Write pointer only goes forward
      • Queue entries are read sequentially
      • Read pointer only goes forward
      • Reader waits for entry to be written
      • Transactions are used for isolation & consistency guarantees
  18. QUEUE OPERATION [diagram labels: Producer, Consumer, Queue, WritePointer, ReadPointer, Consumer State, Claimed Entries List, enqueue, dequeue, inc & get, put, write, read, entry meta, valid? (true; set false), [data], start tx, commit tx, abort tx]
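
The queue mechanics named in the diagram (forward-only pointers, a validity flag per entry, a list of claimed entries in consumer state) can be approximated in memory as below. This is a sketch under assumptions, not the HBase-backed implementation: `ToyQueue`, `ToyConsumer`, and their methods are hypothetical, and the increment and the put happen in one step here, whereas a real reader may have to wait for an entry whose pointer was already handed out.

```python
class ToyQueue:
    def __init__(self):
        self.write_pointer = 0
        self.entries = {}                 # entry id -> [data, valid]

    def enqueue(self, data):
        self.write_pointer += 1           # "inc & get": the pointer only moves forward
        entry_id = self.write_pointer
        self.entries[entry_id] = [data, True]
        return entry_id

    def invalidate(self, entry_id):
        # Producer transaction aborted: the slot stays, but is marked invalid.
        self.entries[entry_id][1] = False


class ToyConsumer:
    def __init__(self, queue):
        self.queue = queue
        self.read_pointer = 0
        self.claimed = []                 # entries claimed but not yet acked

    def dequeue(self):
        while self.read_pointer < self.queue.write_pointer:
            self.read_pointer += 1        # the read pointer only moves forward too
            data, valid = self.queue.entries[self.read_pointer]
            if valid:
                self.claimed.append(self.read_pointer)
                return data
        return None                       # nothing to read yet; a real reader would wait

    def ack(self):
        self.claimed.clear()              # consumer transaction committed


queue = ToyQueue()
queue.enqueue("a")
aborted = queue.enqueue("b")
queue.invalidate(aborted)                 # the producer's transaction was aborted
queue.enqueue("c")

consumer = ToyConsumer(queue)
results = [consumer.dequeue(), consumer.dequeue(), consumer.dequeue()]
```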
  19. PERFORMANCE CONSIDERATIONS
      • Every entry costs at least 4 writes:
        • Enqueue = Increment + Put = 2 writes to WAL
        • Dequeue = 2 x Get + Put = 1 write to WAL
        • Ack = Get + Put = 1 write to WAL
      • Caching consumer state in-memory (still persisting every change)
        • Dequeue = Get + Put
        • Ack = Put
  20. PERFORMANCE IMPROVEMENTS
      • Prefetching n entries and caching them in state:
        • Dequeue = 1/n x Put
      • Batch enqueues + dequeues
        • Enqueue = 2/n x Put
        • Dequeue = 1/n x Get + 1/n x Put
        • Ack = 1/n x Put
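
The batching figures can be added up to see the amortized cost per entry. The function below only restates the per-operation numbers from the slides (2/n for enqueue, 1/n for dequeue, 1/n for ack); the function name and the choice to count only writes, not Gets, are assumptions of this sketch.

```python
from fractions import Fraction


def wal_writes_per_entry(n: int) -> Fraction:
    """Amortized WAL writes per queue entry for a batch size of n."""
    enqueue = Fraction(2, n)   # Increment + Put, shared by n entries
    dequeue = Fraction(1, n)   # one Put of consumer state per n entries
    ack = Fraction(1, n)       # one Put per n acked entries
    return enqueue + dequeue + ack


unbatched = wal_writes_per_entry(1)     # the unbatched baseline
batched = wal_writes_per_entry(100)     # a batch of 100 entries
```

With n = 1 this gives the 4 writes per entry quoted for the unbatched case, and the total falls as 4/n, so a batch of 100 needs 1/25 of a write per entry.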
  21. PERFORMANCE NUMBERS
      • enqueue: 10K ops/sec per producer per node
      • dequeue: 5K ops/sec per consumer per node
        • 2 vs. 1 RPC calls compared to the enqueue op
  22. HBASE WISHLIST
      • Filters for Get, not just max timestamp (for transactional read)
      • Filters for Increment and CheckAndPut (for transactional writes)
      • Ability to aggregate writes to WAL in co-processors (for faster queues)
      • No-read atomic Append operation
      • No-read atomic Increment operation
  23. QS?
      Looking for the chance to work with a team that is defining a new category within Big Data? We are hiring! careers@continuuity.com
