Published in Technology
  • 1. Large-scale Incremental Processing Using Distributed Transactions and Notifications [email_address]
  • 2. Agenda
    • Introduction
    • Design
      • Bigtable overview
      • Transaction
      • Timestamps
      • Notifications
      • Discussion
    • Reference
  • 3. Introduction
    • why Percolator?
      • data processing tasks that transform a large repository of data via small, independent mutations.
        • RDBMS?
        • MapReduce?
      • a system for incrementally processing updates to a large data set
        • create the Google web search index
        • reduce the average age of documents in Google search results by 50%
  • 4. Introduction(cont.)
    • Percolator
      • features
        • random access to a multi-PB repository
        • ACID-compliant transactions
        • snapshot isolation semantics
        • observers: like triggers in DBMS, applications are structured as a series of observers
      • user scenarios
        • computation should be very large in some dimension
        • can be broken down into small updates
        • have some strong consistency requirements
  • 5. Design
    • Two main abstractions
      • ACID transactions over a random-access repository
      • observers
    • Components
      • a Percolator worker, a Bigtable tablet server, and a GFS chunkserver on each machine
      • timestamp oracle
      • lightweight lock service
  • 6. Design: Bigtable overview
    • Bigtable
      • single-row transactions (HBase?)
      • Percolator’s API closely resembles Bigtable’s API
      • the Percolator library largely consists of Bigtable operations wrapped in Percolator-specific computations
    • Challenges
      • multirow transactions
      • the observer framework
  • 7. Design: Transactions
    • Cross-row, cross-table transactions with ACID snapshot-isolation semantics
      • not serializable (snapshot isolation permits write skew)
    • No central transaction manager; built as a client library that accesses Bigtable
      • a lock service would need to be replicated, distributed and load-balanced, and would have to write to a persistent data store
      • so locks are stored in special in-memory columns in the same Bigtable that stores the data
  • 8. Design: Transactions(cont.)
    • The transaction’s constructor asks the timestamp oracle for a start timestamp.
      • determines the consistent snapshot seen by Get()
      • calls to Set() are buffered until commit time.
    • 2-phase commit
      • phase 1 (prewrite): try to lock all the cells being written
      • phase 2 (commit): obtain the commit timestamp, then release each lock and make its write visible by replacing the lock with a write record
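The Get()/Set()/commit flow on this slide can be sketched in a few lines. This is a minimal single-process sketch, not Percolator's implementation: `Store`, `Oracle`, and `Transaction` are hypothetical names, plain dictionaries stand in for the Bigtable data, lock, and write columns, and a counter stands in for the timestamp oracle. A locked cell simply raises instead of waiting or cleaning up.

```python
import itertools
from collections import defaultdict


class Oracle:
    """Hypothetical stand-in for the timestamp oracle: strictly increasing integers."""
    def __init__(self):
        self._counter = itertools.count(1)

    def next(self):
        return next(self._counter)


class Store:
    """Toy stand-in for one Bigtable table with data, lock and write columns per cell."""
    def __init__(self):
        self.data = defaultdict(dict)   # cell -> {start_ts: value}
        self.lock = defaultdict(dict)   # cell -> {start_ts: primary_cell}
        self.write = defaultdict(dict)  # cell -> {commit_ts: start_ts}


class Transaction:
    def __init__(self, store, oracle):
        self.store, self.oracle = store, oracle
        self.start_ts = oracle.next()   # fixes the snapshot seen by get()
        self.writes = {}                # set() calls are buffered until commit

    def set(self, cell, value):
        self.writes[cell] = value

    def get(self, cell):
        s = self.store
        # A lock at or below our snapshot may belong to a writer that commits below it.
        if any(ts <= self.start_ts for ts in s.lock[cell]):
            raise RuntimeError("cell is locked; a real client would wait or clean up")
        visible = [c for c in s.write[cell] if c <= self.start_ts]
        if not visible:
            return None
        return s.data[cell][s.write[cell][max(visible)]]

    def _prewrite(self, cell, value, primary):
        s = self.store
        if any(c >= self.start_ts for c in s.write[cell]):
            return False                # someone committed after our snapshot
        if s.lock[cell]:
            return False                # someone else holds a lock on this cell
        s.data[cell][self.start_ts] = value
        s.lock[cell][self.start_ts] = primary
        return True

    def _rollback(self, cells):
        for cell in cells:
            self.store.lock[cell].pop(self.start_ts, None)
            self.store.data[cell].pop(self.start_ts, None)

    def commit(self):
        if not self.writes:
            return True
        cells = list(self.writes)
        primary = cells[0]
        # Phase 1: lock every cell being written.
        for cell in cells:
            if not self._prewrite(cell, self.writes[cell], primary):
                self._rollback(cells)
                return False
        # Phase 2: replace each lock with a write record, primary first.
        commit_ts = self.oracle.next()
        for cell in cells:
            self.store.write[cell][commit_ts] = self.start_ts
            del self.store.lock[cell][self.start_ts]
        return True
```

In this sketch, two transactions that start from the same snapshot and write the same cell cannot both commit: the second one fails in phase 1 because it finds a write record newer than its start timestamp.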
  • 9. Design: Transactions(cont.)
    • Error recovery
      • client failure while a transaction is being committed
        • lazy approach to cleanup
        • failure detection: check whether the transaction’s primary lock is still present
        • roll back
      • client failure during the second phase of commit
        • past the commit point
        • roll forward
    • Lock cleanup
      • only clean up a lock that belongs to a dead or stuck worker (liveness tracked via Chubby)
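The roll-back versus roll-forward decision above can be illustrated with a small function. It is a sketch under assumptions: `resolve_lock` and its arguments are hypothetical, the three dictionaries stand in for the data, lock, and write columns (cell -> {timestamp: ...}), and `owner_is_dead` stands in for the liveness check that the slide attributes to Chubby.

```python
def resolve_lock(data, lock, write, cell, owner_is_dead):
    """Clean up one stranded lock on `cell` and return the action taken.

    lock[cell]  maps start_ts -> primary cell of the owning transaction
    write[cell] maps commit_ts -> start_ts of the committed write
    """
    if not lock.get(cell):
        return "none"
    start_ts, primary = next(iter(lock[cell].items()))

    if start_ts in lock.get(primary, {}):
        # Primary lock still present: the transaction never reached its commit point.
        if not owner_is_dead:
            return "wait"               # never clean up after a live worker
        for c in (primary, cell):       # erase the primary first, then this cell
            lock[c].pop(start_ts, None)
            data.get(c, {}).pop(start_ts, None)
        return "rolled_back"

    # Primary lock is gone: look for the write record that replaced it.
    commit_ts = next(
        (c for c, s in write.get(primary, {}).items() if s == start_ts), None
    )
    if commit_ts is None:
        # No write record either: the transaction was already rolled back.
        lock[cell].pop(start_ts)
        data.get(cell, {}).pop(start_ts, None)
        return "rolled_back"

    # Past the commit point: finish the commit on this cell.
    write.setdefault(cell, {})[commit_ts] = start_ts
    lock[cell].pop(start_ts)
    return "rolled_forward"
```

The primary cell acts as the single source of truth, so any client that meets a stranded secondary lock reaches the same decision without a central coordinator.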
  • 10. Design: Timestamps
    • Hands out timestamps in strictly increasing order.
      • batches timestamp requests
      • 2 million timestamps per second from a single machine
    • Guarantees that Get() returns all writes committed before the transaction’s start timestamp.
      • T_W < T_R: a write committed at T_W is visible to a reader that starts at T_R
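One way to hand out strictly increasing timestamps cheaply is to reserve a range in stable storage and then serve requests from memory. The sketch below assumes that design; `TimestampOracle`, `get_timestamps`, and `_persist_high_water_mark` are hypothetical names, and the "stable storage" is just an attribute.

```python
import threading


class TimestampOracle:
    """Hands out strictly increasing integer timestamps from a reserved range."""

    def __init__(self, batch_size=1000):
        self._mutex = threading.Lock()
        self._next = 1
        self._limit = 0            # highest timestamp reserved so far
        self._batch_size = batch_size
        self.persist_count = 0     # how often "stable storage" was written

    def _persist_high_water_mark(self, value):
        # Hypothetical stand-in for a write to stable storage. After a restart,
        # a real service would resume above the stored value so that no
        # timestamp is ever handed out twice.
        self._limit = value
        self.persist_count += 1

    def get_timestamps(self, n=1):
        """Return n consecutive timestamps; one call can serve a batch of transactions."""
        with self._mutex:
            last = self._next + n - 1
            if last > self._limit:
                self._persist_high_water_mark(last + self._batch_size)
            first = self._next
            self._next += n
            return range(first, first + n)
```

A worker can call `get_timestamps(n)` once on behalf of many pending transactions, which is the client-side half of the batching mentioned on the slide.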
  • 11. Design: Notifications
    • Observer
      • registers a function and a set of columns with Percolator
      • Percolator scans two special columns and calls the corresponding observers
        • Ack
        • Notify
      • in practice, very few observers (about 10); only one observer runs on a particular column
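The register/notify/ack cycle can be sketched as follows. `NotificationTable` and its methods are hypothetical, and the encoding of the two special columns (a logical write timestamp in `notify`, the last processed timestamp in `ack`) is an assumption made for illustration.

```python
class NotificationTable:
    """Toy table in which a write to an observed column triggers an observer."""

    def __init__(self):
        self.observers = {}  # column -> function(table, row, value)
        self.values = {}     # (row, column) -> latest value
        self.notify = {}     # (row, column) -> logical time of latest write
        self.ack = {}        # (row, column) -> logical time last handled
        self._clock = 0

    def register(self, column, fn):
        # Mirrors the slide: one observer per column.
        if column in self.observers:
            raise ValueError(f"column {column!r} already has an observer")
        self.observers[column] = fn

    def write(self, row, column, value):
        self._clock += 1
        self.values[(row, column)] = value
        if column in self.observers:
            self.notify[(row, column)] = self._clock

    def scan(self):
        """Run the observer for every cell whose notify is newer than its ack."""
        runs = 0
        # sorted() copies the items, so observers may write during the scan.
        for (row, column), ts in sorted(self.notify.items()):
            if ts > self.ack.get((row, column), 0):
                self.ack[(row, column)] = ts
                self.observers[column](self, row, self.values[(row, column)])
                runs += 1
        return runs
```

Because an observer may itself write to another observed column, repeated calls to `scan()` drive a chain of observers until no dirty cells remain.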
  • 12. Design: Discussion
    • Many RPCs per work unit
      • about 50 RPCs to process a single document
      • solutions
        • Add conditional mutations to the Bigtable API
        • Batch operations
        • Prefetch
    • All API calls are blocking
      • Rely on running thousands of threads to provide enough parallelism
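The thread-per-request model in the last bullet can be illustrated with a thread pool. This is only a sketch: `blocking_rpc` is a hypothetical stand-in that sleeps instead of calling a server, and the thread count is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_rpc(doc_id):
    """Hypothetical blocking call: the thread is parked until the reply arrives."""
    time.sleep(0.01)
    return doc_id * 2


def process_all(doc_ids, threads=64):
    """Overlap many blocking calls by giving each one its own pool thread."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order even though calls finish out of order.
        return list(pool.map(blocking_rpc, doc_ids))
```

Each call still blocks its own thread, so throughput grows with the number of threads rather than with any change to the API, which is why the code stays simple while the thread count gets large.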
  • 13. Reference
    • “Large-scale Incremental Processing Using Distributed Transactions and Notifications”, OSDI’10
    • http://www.infoq.com/cn/news/2010/10/google-percolator
  • 14. HBase Coprocessor
    • provides a framework for distributed computation directly within the HBase server processes, and for flexible, generic extension of HBase
      • Observer
        • RegionObserver
        • MasterObserver
        • WALObserver
      • Endpoint
  • 15.
    • Thank you!