Large-scale Incremental Processing Using Distributed Transactions and Notifications [email_address]
Agenda
  Introduction
  Design
    Bigtable overview
    Transactions
    Timestamps
    Notifications
    Discussion
  Reference
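Introduction
  Why Percolator?
    Some data processing tasks transform a large repository of data via small, independent mutations
    RDBMS? Cannot meet the storage and throughput requirements at this scale
    MapReduce? Efficient only for large batches; a small update means reprocessing the whole repository
  Percolator: a system for incrementally processing updates to a large data set
    Used to create the Google web search index
    Reduced the average age of documents in Google search results by 50%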
Introduction (cont.)
  Percolator features
    Random access to a multi-PB repository
    ACID-compliant transactions with snapshot-isolation semantics
    Observers: like triggers in a DBMS; applications are structured as a series of observers
  User scenarios
    The computation is very large in some dimension
    It can be broken down into small updates
    It has strong consistency requirements
Design
  Two main abstractions
    ACID transactions over a random-access repository
    Observers
  Components
    On every machine: a Percolator worker, a Bigtable tablet server, and a GFS chunkserver
    Timestamp oracle
    Lightweight lock service
Design: Bigtable overview
  Bigtable provides single-row transactions (HBase?)
  Percolator's API closely resembles Bigtable's API
  The Percolator library largely consists of Bigtable operations wrapped in Percolator-specific computations
  Challenges
    Multirow transactions
    The observer framework
Design: Transactions
  Cross-row, cross-table transactions with ACID snapshot-isolation semantics
    Not serializable
  No central transaction manager; built as a client library accessing Bigtable
  The lock service needs to be replicated, distributed and balanced, and must write to a persistent data store
    Locks are stored in special in-memory columns in the same Bigtable that stores the data
Design: Transactions (cont.)
  The transaction's constructor asks the timestamp oracle for a start timestamp
    The start timestamp determines the consistent snapshot seen by Get()
  Calls to Set() are buffered until commit time
  Two-phase commit
    Phase 1: try to lock all the cells being written
    Phase 2: obtain the commit timestamp, then release each lock and make the write visible by replacing the lock with a write record
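
The protocol on this slide can be sketched in a few lines of Python. This is a minimal single-process illustration, not Percolator's actual code: `Oracle`, `Store`, `Transaction`, and `Conflict` are hypothetical names, plain dictionaries stand in for the Bigtable data, lock, and write columns, and a failed commit cleans up its own locks eagerly, where the slides describe lazy cleanup.

```python
import itertools


class Oracle:
    """Toy timestamp oracle: hands out strictly increasing integers."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next(self):
        return next(self._counter)


class Store:
    """Toy stand-in for one Bigtable holding data, lock, and write columns."""

    def __init__(self):
        self.data = {}   # (row, start_ts) -> value
        self.lock = {}   # row -> (start_ts, primary_row)
        self.write = {}  # (row, commit_ts) -> start_ts of the committed data


class Conflict(Exception):
    """Raised when a read hits a pending lock (assumption: no retry logic)."""


class Transaction:
    def __init__(self, store, oracle):
        self.store, self.oracle = store, oracle
        self.start_ts = oracle.next()  # fixes the snapshot seen by get()
        self.buffer = {}               # set() calls are buffered until commit

    def set(self, row, value):
        self.buffer[row] = value

    def get(self, row):
        lock = self.store.lock.get(row)
        if lock is not None and lock[0] <= self.start_ts:
            raise Conflict(f"row {row!r} is locked by a pending transaction")
        # Latest write record committed at or before our start timestamp.
        visible = [(c, s) for (r, c), s in self.store.write.items()
                   if r == row and c <= self.start_ts]
        if not visible:
            return None
        _, data_ts = max(visible)
        return self.store.data[(row, data_ts)]

    def commit(self):
        rows = list(self.buffer)
        if not rows:
            return True
        primary, locked = rows[0], []
        # Phase 1: lock every cell being written.
        for row in rows:
            newer = any(r == row and c >= self.start_ts
                        for (r, c) in self.store.write)
            if newer or row in self.store.lock:
                for r in locked:  # eager cleanup, a simplification
                    del self.store.lock[r]
                    del self.store.data[(r, self.start_ts)]
                return False
            self.store.data[(row, self.start_ts)] = self.buffer[row]
            self.store.lock[row] = (self.start_ts, primary)
            locked.append(row)
        # Phase 2: get the commit timestamp, then replace each lock with a
        # write record, primary first.
        commit_ts = self.oracle.next()
        for row in rows:
            self.store.write[(row, commit_ts)] = self.start_ts
            del self.store.lock[row]
        return True
```

A transaction that started before a conflicting commit still reads its old snapshot through `get()`, but its own `commit()` on the same row returns `False`, which is the write-write conflict rule of snapshot isolation.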
Design: Transactions (cont.)
  Error recovery
    Client failure while the transaction is being committed
      Lazy approach to cleanup
      Failure judgment: check the primary lock
      Primary lock still present: roll back
    Client failure during the second phase of commit
      Past the commit point (primary lock already replaced by a write record): roll forward
  Lock cleanup
    Only clean up a lock that belongs to a dead or stuck worker (liveness is tracked via Chubby)
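
The roll-back versus roll-forward decision can be sketched as a single function. This is an illustrative sketch under stated assumptions: `resolve_stale_lock` is a hypothetical name, the two dictionaries stand in for the lock and write columns, and the caller is assumed to have already established (for example through the lock service) that the lock's owner is dead or stuck. Cleanup of the uncommitted data itself is omitted.

```python
def resolve_stale_lock(locks, writes, row):
    """Clean up a lock left behind on `row` by a failed transaction.

    locks:  row -> (start_ts, primary_row)
    writes: (row, commit_ts) -> start_ts
    Returns "rolled back" or "rolled forward".
    """
    start_ts, primary = locks[row]
    primary_lock = locks.get(primary)

    if primary_lock is not None and primary_lock[0] == start_ts:
        # Primary lock still present: the commit point was never reached.
        del locks[primary]       # erase the primary first
        locks.pop(row, None)     # then the lock we stumbled on
        return "rolled back"

    # Primary lock is gone. Look for the write record that replaced it.
    commit_ts = next(
        (c for (r, c), s in writes.items() if r == primary and s == start_ts),
        None,
    )
    if commit_ts is not None:
        # Past the commit point: finish the job for the dead client.
        writes[(row, commit_ts)] = start_ts
        del locks[row]
        return "rolled forward"

    # No primary lock and no write record: someone already rolled it back.
    del locks[row]
    return "rolled back"
```

Checking only the primary keeps the decision atomic: whichever of the cleanup and the commit changes the primary cell first wins.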
Design: Timestamps
  The timestamp oracle hands out timestamps in strictly increasing order
    Timestamp requests are batched
    About 2 million timestamps per second from a single machine
  Guarantee: Get() returns all writes committed before the transaction's start timestamp
    If W committed at T_W and R reads at T_R, with T_W < T_R, then R sees W's writes
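
One way to picture how batching keeps the oracle fast is an oracle that reserves timestamps in blocks and serves requests from memory. The sketch below is an assumption-laden illustration, not the real service: `TimestampOracle`, `block_size`, `persisted_limit`, and `recover` are hypothetical names, and an integer field stands in for stable storage.

```python
class TimestampOracle:
    """Toy oracle that persists only a high-water mark, once per block."""

    def __init__(self, block_size=1000, first_ts=1):
        self.block_size = block_size
        self.next_ts = first_ts          # next timestamp to hand out
        self.persisted_limit = first_ts  # exclusive bound, "on stable storage"
        self.persist_count = 0           # number of simulated durable writes

    def get_timestamps(self, n=1):
        """Return n consecutive timestamps as a range (one batched request)."""
        while self.next_ts + n > self.persisted_limit:
            # Reserve another block with a single durable write.
            self.persisted_limit += self.block_size
            self.persist_count += 1
        first = self.next_ts
        self.next_ts += n
        return range(first, first + n)

    @classmethod
    def recover(cls, persisted_limit, block_size=1000):
        """Restart after a crash: skip ahead to the persisted bound so no
        timestamp that might already have been handed out is reused."""
        return cls(block_size=block_size, first_ts=persisted_limit)
```

Because a restarted oracle jumps to the persisted bound, timestamps stay strictly increasing across failures at the cost of leaving some values unused.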
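Design: Notifications
  An observer registers a function and a set of columns with Percolator
  Percolator tracks changes in two special columns and calls the corresponding observers
    Notify: marks a cell as changed
    Ack: records the latest start timestamp at which the observer ran
  In practice there are very few observers (about 10), and each column is observed by a single observer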
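
The register, notify, and ack cycle might look roughly like this in miniature. Everything here is a hypothetical stand-in (`NotificationTable`, `register`, `write`, `scan`), a logical counter replaces real start timestamps, and observers run inline rather than as distributed transactions.

```python
class NotificationTable:
    """Toy table with notify/ack bookkeeping for observed columns."""

    def __init__(self):
        self.observers = {}  # column -> function(row, value)
        self.cells = {}      # (row, column) -> value
        self.notify = set()  # (row, column) cells marked as changed
        self.ack = {}        # (row, column) -> time of the last observer run
        self.clock = 0       # stand-in for oracle timestamps

    def register(self, column, fn):
        if column in self.observers:
            raise ValueError(f"column {column!r} already has an observer")
        self.observers[column] = fn

    def write(self, row, column, value):
        self.cells[(row, column)] = value
        if column in self.observers:
            self.notify.add((row, column))  # mark the cell dirty

    def scan(self):
        """Run observers for all dirty cells; return how many ran."""
        ran = 0
        while self.notify:
            row, column = self.notify.pop()
            self.clock += 1
            self.ack[(row, column)] = self.clock
            # An observer may write to other observed columns, which
            # queues further work for this same loop.
            self.observers[column](row, self.cells[(row, column)])
            ran += 1
        return ran
```

Chaining falls out naturally: an observer on one column writes another observed column, and the scan picks that up as new work.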
Design: Discussion
  Many RPCs per work unit
    About 50 Bigtable operations to process a single document
    Solutions
      Add conditional mutations to the Bigtable API
      Batch operations
      Prefetch
  All API calls are blocking
    Rely on running thousands of threads to provide enough parallelism
Reference
  "Large-scale Incremental Processing Using Distributed Transactions and Notifications", OSDI'10
  http://www.infoq.com/cn/news/2010/10/google-percolator
HBase Coprocessor
  Provides a framework for distributed computation directly within the HBase server processes, and for flexible, generic extension
  Observer
    RegionObserver
    MasterObserver
    WALObserver
  Endpoint
Thank you!
