HBaseCon 2013: High-Throughput, Transactional Stream Processing on Apache HBase

Presented by: Andreas Neumann (Continuuity) and Alex Baranau (Continuuity)

    Presentation Transcript

    • HIGH-THROUGHPUT TRANSACTIONAL STREAM PROCESSING ON HBASE
        Alex Baranau @abaranau, Andreas Neumann @anew68
        Thursday, June 6, 13
    • WHO WE ARE
        • We’ve built Continuuity Reactor: the world’s first scale-out application server for Hadoop
        • Fast, easy development, deployment and management of Hadoop and HBase apps
        • The Continuuity team has years of experience using and contributing to open source, and we intend to continue doing so.
        Continuuity Proprietary and Confidential
    • AGENDA
        • Transactions in stream processing: what? why?
        • Omid-style transactions explained
        • Queues: heart of stream processing
        • What’s next?
    • THE REACTOR
        • Continuuity Reactor is an app platform built on Hadoop and HBase
        • Collect, process, store, and query data
        • A Flow is a real-time processor with an exactly-once guarantee
        • A Flow is composed of flowlets, connected via queues
        • All processing happens in transactions, with ACID guarantees
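The idea behind "exactly-once with ACID guarantees" can be illustrated with a small in-memory sketch. It is not Reactor's API: the `Flowlet` class, `process_one`, and the list, dict, and set standing in for a queue, a DataSet, and the consumer's acknowledgements are all hypothetical. What it shows is that the DataSet writes and the ack of the queue entry are committed together, so a failure discards both and the entry is simply delivered again.

```python
from typing import Any, Callable, Optional


class Flowlet:
    """Hypothetical model of a flowlet whose DataSet writes and queue ack
    are committed in one transaction."""

    def __init__(self, queue: list, dataset: dict):
        self.queue = queue        # stands in for the input queue
        self.dataset = dataset    # stands in for a DataSet
        self.acked: set = set()   # indices of entries whose processing committed

    def _next_entry(self) -> Optional[tuple]:
        for i, entry in enumerate(self.queue):
            if i not in self.acked:
                return i, entry
        return None

    def process_one(self, handler: Callable[[Any, dict], None]) -> Optional[str]:
        """Process the next unacked entry; return "committed", "aborted",
        or None if nothing is left."""
        item = self._next_entry()
        if item is None:
            return None
        index, entry = item
        writes: dict = {}              # transaction-local buffer, invisible to others
        try:
            handler(entry, writes)     # user code may fail at any point
        except Exception:
            return "aborted"           # drop the buffer; the entry stays unacked
        # Commit: the DataSet writes and the ack become visible together.
        self.dataset.update(writes)
        self.acked.add(index)
        return "committed"
```

If a handler fails halfway through an entry, none of its writes reach the DataSet and the retry counts that entry once, which is the property the slide calls exactly-once.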
    • PROCESSING IN A FLOWLET
        [Diagram: a flowlet consuming entries from a queue]
    • PROCESSING IN A FLOWLET
        [Diagram: a flowlet consuming entries from a queue and writing to a DataSet]
    • TRANSACTIONS: WHY?
        [Diagram: queue, flowlet, and DataSet]
    • PROCESSING WITH TX
        [Diagram: queue, flowlet, and DataSet, with the processing done in a transaction]
    • TRANSACTIONS: WHAT?
        • Atomic: the entire transaction is committed as one
        • Consistent: no partial state changes due to failure
        • Isolated: no dirty reads; a transaction is only visible after commit
        • Durable: once committed, data is persisted reliably
    • OMID-STYLE TRANSACTIONS
        • Multi-version concurrency control (MVCC), with version = HBase timestamp
            • All writes in the same transaction use the transaction ID as their timestamp
            • Reads exclude the writes of other, uncommitted transactions (for isolation)
        • Optimistic concurrency control
            • Conflict detection at commit of the transaction
            • Write conflict: two overlapping transactions write the same row
            • In case of conflict, one transaction is rolled back (whichever commits later)
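Because transaction IDs double as HBase timestamps, deciding what a read may see only needs the reader's snapshot. The sketch below is a minimal model under stated assumptions: each cell is a dict from timestamp to value, and the hypothetical `Snapshot` carries the reader's transaction ID, a read pointer, and the set of transactions that were still uncommitted when it started. These names are illustrative, not an HBase or Omid API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-transaction snapshot (names are illustrative)."""
    tx_id: int            # also the HBase timestamp used for this tx's writes
    read_pointer: int     # versions above this are ignored
    excluded: frozenset = field(default_factory=frozenset)  # uncommitted tx ids

    def is_visible(self, version: int) -> bool:
        if version == self.tx_id:      # a transaction always sees its own writes
            return True
        return version <= self.read_pointer and version not in self.excluded


def read_cell(versions: Dict[int, str], snap: Snapshot) -> Optional[str]:
    """Newest value in `versions` (timestamp -> value) visible to `snap`."""
    for version in sorted(versions, reverse=True):
        if snap.is_visible(version):
            return versions[version]
    return None
```

For example, with versions `{1: "a", 3: "b", 5: "c"}` and a snapshot for transaction 6 that excludes the still-running transaction 5, `read_cell` skips `"c"` and returns `"b"`.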
    • OMID-STYLE TRANSACTIONS
        [Diagram: start tx, getting a write pointer from the Tx Oracle; do work, writing to HBase with version = write pointer; commit tx; if it has conflicts, roll back and abort tx]
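The start, write, commit, and rollback cycle on this slide fits in a few lines of client-side logic. In this sketch, everything is an assumption: the table is a dict of rows, each a dict from timestamp to value; `write_pointers` stands in for the Tx Oracle handing out write pointers; and `has_conflicts` stands in for the oracle's commit-time check. Because every write in the transaction carries the same version, rollback can find and delete exactly those cells.

```python
from typing import Callable, Dict, Iterator, Set

# Hypothetical in-memory stand-in for an HBase table:
# row key -> {timestamp (version): value}
Table = Dict[str, Dict[int, str]]


def run_tx(table: Table,
           write_pointers: Iterator[int],
           writes: Dict[str, str],
           has_conflicts: Callable[[int, Set[str]], bool]) -> bool:
    """Run one transaction and return True if it committed."""
    tx = next(write_pointers)                   # start tx: get write pointer
    for row, value in writes.items():           # do work: version = pointer
        table.setdefault(row, {})[tx] = value
    if not has_conflicts(tx, set(writes)):      # commit tx
        return True
    for row in writes:                          # rollback: delete our versions
        del table[row][tx]
        if not table[row]:                      # drop rows left empty
            del table[row]
    return False                                # abort tx
```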
    • OPTIMISTIC CONCURRENCY CONTROL
        • Avoids the cost of locking rows and tables
        • No deadlocks or lock escalations
        • The cost of conflict detection and possible rollback is higher
        • Good if conflicts are rare: short transactions, disjoint partitioning of work
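The commit-time write-conflict rule ("two overlapping transactions write the same row; whichever commits later is rolled back") can be sketched with a logical clock. `ConflictDetector` and `CommittedTx` are hypothetical names; a transaction overlaps another if the other committed after it started, and the later committer loses.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class CommittedTx:
    commit_seq: int
    rows: FrozenSet[str]


class ConflictDetector:
    """Hypothetical commit-time write-write conflict check:
    the first committer wins, the later one must roll back."""

    def __init__(self) -> None:
        self.clock = 0                            # logical time for starts and commits
        self.committed: List[CommittedTx] = []

    def start(self) -> int:
        """Begin a transaction; returns its start time."""
        self.clock += 1
        return self.clock

    def try_commit(self, start: int, rows: Set[str]) -> bool:
        for other in self.committed:
            # Overlapping: the other tx committed after this one started.
            if other.commit_seq > start and other.rows & rows:
                return False                      # conflict: we commit later, so we lose
        self.clock += 1
        self.committed.append(CommittedTx(self.clock, frozenset(rows)))
        return True
```

No locks are taken at any point; the price is that a losing transaction has done its work for nothing, which is why the approach pays off with short transactions and disjoint partitioning of work. A real oracle would also have to prune `committed` entries that no running transaction can overlap with.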
    • OMID-STYLE TRANSACTIONS
        [Diagram: interaction between the Tx Agent and the Tx Oracle. Start tx: get new tx (oracle: create tx). Do work: add ops to tx (oracle: track tx ops). Commit tx: try commit (oracle: check conflicts). No conflicts: the oracle makes the tx visible. Has conflicts: get ops to rollback, roll back, abort tx (oracle: remove tx).]
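The agent/oracle exchange on this slide can be modeled in memory as follows. This is a sketch under assumptions, not Continuuity's implementation: `TxOracle` and `run` are hypothetical, the method names simply mirror the boxes in the diagram, and the oracle's `in_progress` map doubles as the list of transactions readers must exclude. Note the ordering on the abort path: the agent deletes the transaction's writes before asking the oracle to remove it.

```python
from typing import Dict, Iterable, List, Set, Tuple


class TxOracle:
    """Hypothetical in-memory Tx Oracle; method names mirror the slide's boxes."""

    def __init__(self) -> None:
        self.clock = 0
        self.in_progress: Dict[int, Set[str]] = {}     # tx -> rows it wrote; readers exclude these
        self.commits: List[Tuple[int, Set[str]]] = []  # (commit time, rows)

    def new_tx(self) -> int:                           # create tx
        self.clock += 1
        self.in_progress[self.clock] = set()
        return self.clock

    def add_ops(self, tx: int, rows: Iterable[str]) -> None:  # track tx ops
        self.in_progress[tx].update(rows)

    def try_commit(self, tx: int) -> bool:             # check conflicts
        rows = self.in_progress[tx]
        # tx ids come from the same clock, so "committed after tx started"
        # is simply commit time > tx.
        if any(t > tx and rows & other for t, other in self.commits):
            return False
        self.clock += 1
        self.commits.append((self.clock, set(rows)))
        del self.in_progress[tx]                       # make tx visible
        return True

    def ops_to_rollback(self, tx: int) -> Set[str]:    # get ops to rollback
        return set(self.in_progress[tx])

    def remove_tx(self, tx: int) -> None:              # remove tx
        # Call only after the agent has deleted the tx's writes.
        del self.in_progress[tx]


def run(oracle: TxOracle, table: Dict[str, Dict[int, str]],
        writes: Dict[str, str]) -> bool:
    """Hypothetical Tx Agent driving one transaction against `oracle`."""
    tx = oracle.new_tx()                               # start tx
    for row, value in writes.items():                  # do work
        table.setdefault(row, {})[tx] = value
    oracle.add_ops(tx, writes.keys())
    if oracle.try_commit(tx):                          # commit tx
        return True
    for row in oracle.ops_to_rollback(tx):             # rollback
        table[row].pop(tx, None)
    oracle.remove_tx(tx)                               # abort tx
    return False
```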
    • TRANSACTION ORACLE
        • Simple & fast
        • Single point of failure?
            • Persist all state to a write-ahead log
            • Secondary oracle that subscribes to the log
            • Failover can happen quickly
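Persisting every state change before applying it makes the oracle's in-memory state a pure function of the log, so a standby can rebuild it by replaying. The sketch below assumes a Python list of JSON strings as the "log" (in practice it would have to be durable and readable by the standby; the slide does not say what it is) and hypothetical `PrimaryOracle` / `SecondaryOracle` classes that track only the next transaction ID and the in-progress set.

```python
import json
from typing import List


class OracleState:
    """State rebuilt purely by applying log entries in order."""

    def __init__(self) -> None:
        self.next_tx = 1
        self.in_progress: set = set()

    def apply(self, entry: dict) -> None:
        if entry["op"] == "start":
            self.in_progress.add(entry["tx"])
            self.next_tx = entry["tx"] + 1
        elif entry["op"] in ("commit", "abort"):
            self.in_progress.discard(entry["tx"])


class PrimaryOracle:
    def __init__(self, log: List[str]) -> None:
        self.log = log                # stand-in for a durable, shared log
        self.state = OracleState()

    def _record(self, entry: dict) -> None:
        self.log.append(json.dumps(entry))   # 1. persist the change
        self.state.apply(entry)              # 2. then apply it in memory

    def start_tx(self) -> int:
        tx = self.state.next_tx
        self._record({"op": "start", "tx": tx})
        return tx

    def commit(self, tx: int) -> None:
        self._record({"op": "commit", "tx": tx})

    def abort(self, tx: int) -> None:
        self._record({"op": "abort", "tx": tx})


class SecondaryOracle:
    """Follows the log so that failover only has to replay a short tail."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.position = 0
        self.state = OracleState()

    def catch_up(self) -> None:
        while self.position < len(self.log):
            self.state.apply(json.loads(self.log[self.position]))
            self.position += 1

    def promote(self) -> PrimaryOracle:
        """Fail over: replay what is left, then continue on the same log."""
        self.catch_up()
        primary = PrimaryOracle(self.log)
        primary.state = self.state
        return primary
```

Because the secondary keeps calling `catch_up`, promoting it only costs replaying the entries written since its last poll, which is what lets failover happen quickly.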
    • QUEUES
        • Flowlets pass data to each other on queues
        • Every consumer (flowlet) can be partitioned
        • More than one consumer (flowlet) can read a queue
        • Queues are partitioned to scale throughput
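The two kinds of fan-out listed here, several consumers per queue and several instances per consumer, can be sketched as below. The slide does not say how entries are assigned to instances; hash partitioning on an entry key is an assumption made for this sketch (round-robin would be another option), and `owner`, `route`, and the keys are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def owner(key: str, num_instances: int) -> int:
    """Instance of a consumer flowlet that gets the entry with this key.
    crc32 rather than hash(): str hashes are randomized per process."""
    return zlib.crc32(key.encode("utf-8")) % num_instances


def route(keys: Iterable[str],
          consumers: Dict[str, int]) -> Dict[Tuple[str, int], List[str]]:
    """Every consumer flowlet sees every entry; within a flowlet, each entry
    goes to exactly one of its instances."""
    out: Dict[Tuple[str, int], List[str]] = {}
    for key in keys:
        for flowlet, instances in consumers.items():
            out.setdefault((flowlet, owner(key, instances)), []).append(key)
    return out
```

With `route(keys, {"counter": 2, "indexer": 3})`, each key lands in exactly one of the two `counter` buckets and exactly one of the three `indexer` buckets.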
    • QUEUES & FLOWLETS
        [Diagram: a queue feeding a flowlet that runs as two instances, Instance1 and Instance2]
    • QUEUE DESIGN
        • Queue entries are written in sequence
            • The write pointer only goes forward
        • Queue entries are read sequentially
            • The read pointer only goes forward
            • The reader waits for an entry to be written
        • Transactions are used for isolation and consistency guarantees
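The pointer discipline above can be modeled with a condition variable. In this hypothetical `SequentialQueue`, claiming a slot (advancing the write pointer) and writing its data are separate steps, so a reader can find that the next slot has been claimed but not yet filled; that is the case where it has to wait. Transactions are left out, and a single reader is assumed.

```python
import threading
from typing import Any, Dict, Optional


class SequentialQueue:
    """Hypothetical in-memory model of the queue design on the slide.
    Transactions are left out; assumes a single reader."""

    def __init__(self) -> None:
        self._entries: Dict[int, Any] = {}     # slot -> data
        self._write_pointer = 0                # last slot handed to a producer
        self._read_pointer = 0                 # last slot consumed
        self._cond = threading.Condition()

    def claim_slot(self) -> int:
        """Reserve the next slot; the write pointer only goes forward."""
        with self._cond:
            self._write_pointer += 1
            return self._write_pointer

    def write(self, slot: int, data: Any) -> None:
        """Fill a claimed slot and wake any waiting reader."""
        with self._cond:
            self._entries[slot] = data
            self._cond.notify_all()

    def enqueue(self, data: Any) -> int:
        slot = self.claim_slot()
        self.write(slot, data)
        return slot

    def dequeue(self, timeout: Optional[float] = None) -> Optional[Any]:
        """Read the next slot in sequence, waiting until it has been written.
        Returns None on timeout."""
        with self._cond:
            slot = self._read_pointer + 1
            if not self._cond.wait_for(lambda: slot in self._entries, timeout):
                return None
            self._read_pointer = slot          # the read pointer only goes forward
            return self._entries.pop(slot)
```

If slot 2 is written before slot 1, the reader still returns slot 1 first, once it arrives, which keeps the sequential read order.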
    • QUEUE OPERATION
        [Diagram: a Producer and a Consumer sharing a Queue of entries, each holding [data] and a "valid?" meta flag. Enqueue: start tx, inc & get the WritePointer, put the entry with valid? = true, commit tx; abort tx sets valid? to false. Dequeue: start tx, inc & get the ReadPointer, read the entry, record it in the Consumer State's Claimed Entries List, commit or abort tx.]
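One possible reading of this diagram, as a single-process sketch: all class and method names are hypothetical, transaction visibility is omitted, and the exact steps are an interpretation of the figure. A plausible reason for the "valid?" flag is that an aborted producer cannot give back the slot it claimed, because the write pointer only goes forward, so it marks the entry invalid and consumers skip it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    data: str
    valid: bool = True                 # the entry's "valid?" meta flag


@dataclass
class ConsumerState:
    read_pointer: int = 0
    claimed: List[int] = field(default_factory=list)   # Claimed Entries List


class TxQueue:
    """Hypothetical single-process model; transaction visibility is omitted."""

    def __init__(self) -> None:
        self.write_pointer = 0
        self.entries: Dict[int, Entry] = {}

    # Producer
    def enqueue(self, data: str) -> int:
        self.write_pointer += 1                          # inc & get WritePointer
        self.entries[self.write_pointer] = Entry(data)   # put [data], valid = true
        return self.write_pointer

    def abort_enqueue(self, slot: int) -> None:
        self.entries[slot].valid = False                 # abort tx: valid = false

    # Consumer
    def dequeue(self, state: ConsumerState) -> Optional[str]:
        """Claim the next valid entry after the read pointer, or return None."""
        while state.read_pointer < self.write_pointer:
            state.read_pointer += 1                      # inc & get ReadPointer
            entry = self.entries[state.read_pointer]
            if entry.valid:                              # skip aborted enqueues
                state.claimed.append(state.read_pointer)
                return entry.data
        return None

    def ack(self, state: ConsumerState, slot: int) -> None:
        state.claimed.remove(slot)                       # commit tx: processing done
```

Under this reading, entries still on the claimed list after a consumer failure are the ones whose processing never committed.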
    • PERFORMANCE CONSIDERATIONS
        • Every entry costs at least 4 writes:
            • Enqueue = Increment + Put = 2 writes to WAL
            • Dequeue = 2 x Get + Put = 1 write to WAL
            • Ack = Get + Put = 1 write to WAL
        • Caching consumer state in memory (still persisting every change):
            • Dequeue = Get + Put
            • Ack = Put
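Tallying the operations makes the "at least 4 writes" figure explicit. The sketch assumes, as the slide's own numbers imply, that each Increment and each Put costs one WAL write and a Get costs none; it also assumes enqueue is unaffected by caching consumer state. The tally shows that caching removes reads, not writes.

```python
from collections import Counter
from typing import Dict, Tuple

# HBase operations per queue entry, as listed on the slide.
BASELINE = {
    "enqueue": Counter(increment=1, put=1),
    "dequeue": Counter(get=2, put=1),
    "ack":     Counter(get=1, put=1),
}
# Consumer state cached in memory; enqueue is assumed unchanged.
CACHED_STATE = {
    "enqueue": Counter(increment=1, put=1),
    "dequeue": Counter(get=1, put=1),
    "ack":     Counter(put=1),
}

WAL_WRITING = ("increment", "put")   # Get is a read and is not logged


def per_entry(plan: Dict[str, Counter]) -> Tuple[int, int]:
    """Return (WAL writes, Gets) needed to move one entry through the queue."""
    total = sum(plan.values(), Counter())
    return sum(total[op] for op in WAL_WRITING), total["get"]
```

`per_entry(BASELINE)` gives `(4, 3)` and `per_entry(CACHED_STATE)` gives `(4, 1)`: both still write the WAL four times per entry, but the cached variant drops from three Gets to one.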
    • PERFORMANCE IMPROVEMENTS
        • Prefetching n entries and caching them in state:
            • Dequeue = 1/n x Put
        • Batching enqueues and dequeues:
            • Enqueue = 2/n x Put
            • Dequeue = 1/n x Get + 1/n x Put
            • Ack = 1/n x Put
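Summing the batched figures gives a per-entry cost of 4/n WAL writes and 1/n Gets, which reduces to the unbatched 4 writes when n = 1. The helper below is a hypothetical sketch that does this arithmetic with exact fractions, using only the batched numbers from the slide.

```python
from fractions import Fraction
from typing import Tuple


def batched_cost(n: int) -> Tuple[Fraction, Fraction]:
    """Per-entry (WAL writes, Gets) when enqueues and dequeues are batched
    n entries at a time, using the figures on the slide."""
    if n < 1:
        raise ValueError("batch size n must be at least 1")
    enqueue_puts = Fraction(2, n)
    dequeue_gets, dequeue_puts = Fraction(1, n), Fraction(1, n)
    ack_puts = Fraction(1, n)
    return enqueue_puts + dequeue_puts + ack_puts, dequeue_gets
```

For example, with n = 100 the cost is 1/25 WAL write per entry, so a stream of 10,000 entries per second needs about 400 WAL writes per second instead of 40,000.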
    • PERFORMANCE NUMBERS
        • Enqueue: 10K ops/sec per producer per node
        • Dequeue: 5K ops/sec per consumer per node
            • A dequeue makes 2 RPC calls, compared to 1 for an enqueue
    • HBASE WISHLIST
        • Filters for Get, not just max timestamp (for transactional reads)
        • Filters for Increment and CheckAndPut (for transactional writes)
        • Ability to aggregate writes to the WAL in coprocessors (for faster queues)
        • No-read atomic Append operation
        • No-read atomic Increment operation
    • QS?
        Looking for the chance to work with a team that is defining a new category within Big Data? We are hiring! careers@continuuity.com