Omid
Efficient Transaction Management and
  Incremental Processing for HBase




                                                             Daniel Gómez Ferro
                                                                         21/03/2013



       Copyright © 2013 Yahoo! All rights reserved. No reproduction or distribution allowed without express written permission.
About me

§ Research engineer at Yahoo!

§ Long term Omid committer

§ Apache S4 committer

§ Contributor to:
 • ZooKeeper
 • HBase
Motivation
Motivation

§ Traditional DBMS:            § NoSQL stores:
 • SQL                          • No SQL
 • Consistent                   • Weak consistency
 • Not scalable                 • Scalable



           § Some use cases require both
                • Scalability
                • Consistent updates
Motivation

§ Lower latency
 • MapReduce latencies measured in hours
 • Low-latency systems usually don’t provide strong
   consistency guarantees


§ Requested feature
 • Support for transactions in HBase is a recurrent request


§ Percolator
 • Google implemented transactions on BigTable
Incremental Processing

§ Small, consistent updates to a shared dataset

§ Useful when:

 • Very large datasets

 • Consistent updates

 • Incremental changes
Why Incremental Processing
        Offline                       Online
      MapReduce               Incremental Processing
   Fault tolerance               Fault tolerance
     Scalability                   Scalability
                                   Shared state

 [Diagram: each MapReduce run produces a whole new state
  (State0 → State1); incremental processing instead applies
  small updates to a single shared state]



          Using Percolator, Google dramatically
              reduced crawl-to-index time
Transactions on HBase
Omid

§ ‘Hope’ in Persian

§ Optimistic Concurrency Control system
 • Lock free
 • Aborts transactions at commit time

§ Implements Snapshot Isolation

§ Low overhead

§ https://github.com/yahoo/omid
Snapshot Isolation

§ Each transaction reads from its own snapshot

§ Snapshot:
 • Immutable
 • Identified by creation time
 • Contains all values committed before creation time


§ Transactions conflict if two conditions apply:
 A) Write to the same row
 B) Overlap in time
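The two conditions above can be sketched as a simple predicate. This is illustrative Java, not Omid's actual implementation; it assumes each transaction exposes its write set and its start/commit timestamps:

```java
import java.util.Set;

public class ConflictCheck {
    // Under snapshot isolation two transactions conflict iff they
    // (B) overlap in time AND (A) wrote to at least one common row.
    public static boolean conflicts(Set<String> writes1, long start1, long commit1,
                                    Set<String> writes2, long start2, long commit2) {
        boolean overlap = start1 < commit2 && start2 < commit1; // condition B
        if (!overlap) {
            return false;
        }
        for (String row : writes1) {                            // condition A
            if (writes2.contains(row)) {
                return true;
            }
        }
        return false;
    }
}
```

If either condition fails, both transactions can commit safely.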
Overlapping transactions

 [Timeline: Transaction 2 overlaps Transactions 1 and 3;
  Transaction 4 starts only after the others have finished]


  § Transaction 2 could conflict with 1 or 3
    • If they write to the same row

  § Transaction 4 doesn’t conflict
    • No overlapping transactions
Simple API

§ Based on Java Transaction API and HBase API

§ TransactionManager:
     Transaction begin();
     void commit(Transaction t) throws RollbackException;
     void rollback(Transaction t);


§ TTable:
     Result get(Transaction t, Get g);
     void put(Transaction t, Put p);
     ResultScanner getScanner(Transaction t, Scan s);
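The call pattern behind this API can be shown with a self-contained sketch. Rows and values are plain Strings here (the real TTable uses HBase Get/Put/Result objects), and the commit-time conflict check is a toy stand-in for Omid's Status Oracle:

```java
import java.util.HashMap;
import java.util.Map;

public class ApiSketch {
    static class RollbackException extends Exception {}

    static class Transaction {
        final long startTs;                               // snapshot id
        final Map<String, String> writes = new HashMap<>();
        Transaction(long startTs) { this.startTs = startTs; }
    }

    static class TTable {
        final Map<String, String> committed = new HashMap<>();
        void put(Transaction t, String row, String value) {
            t.writes.put(row, value);                     // buffered until commit
        }
        String get(Transaction t, String row) {
            String own = t.writes.get(row);               // read-your-own-writes
            return own != null ? own : committed.get(row);
        }
    }

    static class TransactionManager {
        private long clock = 0;
        private final Map<String, Long> lastCommit = new HashMap<>();

        Transaction begin() { return new Transaction(++clock); }

        // Optimistic: abort at commit time if any written row was
        // committed by another transaction after our snapshot started.
        void commit(Transaction t, TTable table) throws RollbackException {
            for (String row : t.writes.keySet()) {
                Long ts = lastCommit.get(row);
                if (ts != null && ts > t.startTs) {
                    throw new RollbackException();
                }
            }
            long commitTs = ++clock;
            for (Map.Entry<String, String> e : t.writes.entrySet()) {
                table.committed.put(e.getKey(), e.getValue());
                lastCommit.put(e.getKey(), commitTs);
            }
        }
    }
}
```

A typical client flow is `begin()`, one or more `put()`/`get()` calls, then `commit()`, retrying the whole transaction on `RollbackException`.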
Omid Architecture
Architecture

1.  Centralized server
2.  Transactional metadata replicated to clients
3.  Store transaction ids in HBase timestamps


 [Diagram: (1) clients talk to the centralized Status Oracle (SO);
  (2) transactional metadata is replicated from the SO to the
  Omid clients; (3) clients read and write the HBase region
  servers directly, storing transaction ids in HBase timestamps]
Omid Performance
§ Centralized server = bottleneck?
 • Focus on good performance
 • In our experiments, it never became the bottleneck
 [Plot: latency in ms (0–30) vs throughput in TPS (1K–100K),
  one curve per transaction size from 2 to 512 written rows]
Metadata Replication

§ Transactional metadata is replicated to clients

§ Clients guarantee Snapshot Isolation
 • Ignore data not in their snapshot
 • Talk directly to HBase
 • Conflicts resolved by the Status Oracle


§ Expensive, but scalable up to 1000 clients
Fault Tolerance

§ Omid uses BookKeeper for fault tolerance
 • BookKeeper is a Distributed Write Ahead Log


§ Before answering a commit request
 • Log it to BookKeeper asynchronously
 • Wait for a reply
 • Notify the client


§ If the Status Oracle crashes
 • Recover the state from BookKeeper
Fault Tolerance Overhead

§ Omid batches writes to BookKeeper
 • Write every 5 ms or when the batch > 1 KB

§ Recovery: reads log’s tail (bounded time)
 [Plot: throughput in TPS over time; when the Status Oracle
  crashes, throughput drops to zero and recovers after a
  downtime of less than 25 s]
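The batching policy above can be sketched with stdlib Java. The log is an in-memory list standing in for the BookKeeper write-ahead log, and the flush thresholds are the ones from the slide (5 ms or batch > 1 KB):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class CommitLogBatcher {
    static final int MAX_BATCH_BYTES = 1024;  // flush when batch > 1 KB...
    static final long FLUSH_INTERVAL_MS = 5;  // ...or every 5 ms

    private final ByteArrayOutputStream batch = new ByteArrayOutputStream();
    private long lastFlush = System.currentTimeMillis();
    final List<byte[]> log = new ArrayList<>(); // stand-in for BookKeeper

    public synchronized void append(byte[] commitRecord) {
        batch.write(commitRecord, 0, commitRecord.length);
        if (batch.size() > MAX_BATCH_BYTES
                || System.currentTimeMillis() - lastFlush >= FLUSH_INTERVAL_MS) {
            flush();
        }
    }

    public synchronized void flush() {
        if (batch.size() == 0) {
            return;
        }
        log.add(batch.toByteArray()); // real code: async write to the ledger
        batch.reset();
        lastFlush = System.currentTimeMillis();
    }
}
```

Batching amortizes the per-entry log cost, which is why the fault-tolerance overhead stays small.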
Example Application
   TF-IDF on Tweets
TF-IDF

§ Term Frequency – Inverse Document Frequency
 • How important is a word to a document
 • Useful for search engines


§ Given a set of words (query)
 • Return documents with highest TF-IDF
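As a refresher, the standard definition can be computed in a few lines. This toy version assumes tf(t, d) = (count of t in d) / (terms in d) and idf(t) = ln(N / documents containing t), and that the term occurs in at least one document:

```java
import java.util.List;

public class TfIdf {
    public static double tfIdf(String term, List<String> doc,
                               List<List<String>> corpus) {
        long inDoc = doc.stream().filter(term::equals).count();
        double tf = (double) inDoc / doc.size();
        long containing = corpus.stream().filter(d -> d.contains(term)).count();
        double idf = Math.log((double) corpus.size() / containing);
        return tf * idf;
    }
}
```

A term that appears in every document gets idf = ln(1) = 0, so it is useless for ranking; rare-but-frequent-in-one-document terms score highest.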
TF-IDF on Tweets

§ Given a set of words
 • Return relevant HashTags


§ Document
 • Collection of tweets with the same hashtag


§ Update the index incrementally
Implementation

§ Read the tweet stream and put each tweet in a queue
§ Workers process each tweet in parallel
 • One transaction per tweet
 • Update frequencies consistently
 • In case of abort, retry


§ Queries have a consistent view of the database
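The abort-and-retry loop each worker runs can be sketched as follows. `Attempt` stands in for the real transactional work (begin, update frequencies, commit); the bounded-retry policy is an assumption, not taken from the slides:

```java
public class WorkerRetry {
    static class RollbackException extends Exception {}

    interface Attempt { void run() throws RollbackException; }

    // Retry the transactional work until it commits, up to maxAttempts.
    // Returns the number of attempts it took to commit.
    public static int runWithRetries(Attempt work, int maxAttempts)
            throws RollbackException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                work.run();       // begin + update frequencies + commit
                return attempt;   // committed
            } catch (RollbackException aborted) {
                // conflict with a concurrent worker: retry with a
                // fresh transaction (and therefore a fresh snapshot)
            }
        }
        throw new RollbackException(); // give up after maxAttempts
    }
}
```

Because every retry starts a new transaction, each attempt reads a fresh snapshot and cannot observe the half-applied updates of the aborted one.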
Problems


§ Hard to distribute load to other machines

§ Complex processing causes big transactions
 • More likely to conflict
 • More expensive to retry


§ API is too low level for data processing
Future Work
Future work


§ Framework for Incremental Processing
 • Simpler API
 • Trigger-based, easy to decompose operations
 • Auto scalable



§ Integrate Omid with other Data Stores
Contributors


       Ben Reed
       Flavio Junqueira
       Francisco Pérez-Sorrosal
       Ivan Kelly
       Matthieu Morel
       Maysam Yabandeh
Questions?
