    Percolator Presentation Transcript

    • Large-scale Incremental Processing Using Distributed Transactions and Notifications
    • Agenda
      • Introduction
      • Design
        • Bigtable overview
        • Transaction
        • Timestamps
        • Notifications
        • Discussion
      • Reference
    • Introduction
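      • Why Percolator?
        • some data processing tasks transform a large repository of data via small, independent mutations
          • RDBMS? cannot meet the storage or throughput requirements at this scale
          • MapReduce? relies on large batches for efficiency, so it cannot process small updates individually
        • Percolator is a system for incrementally processing updates to a large data set
          • used to build the Google web search index
          • reduced the average age of documents in Google search results by 50%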
    • Introduction(cont.)
      • Percolator
        • features
          • random access to a multi-PB repository
          • ACID-compliant transactions
          • snapshot isolation semantics
          • observers: like triggers in DBMS, applications are structured as a series of observers
        • user scenarios
          • computation should be very large in some dimension
          • can be broken down into small updates
          • have some strong consistency requirements
    • Design
      • Two main abstractions
        • ACID transactions over a random-access repository
        • observers
      • Components
        • on every machine: a Percolator worker, a Bigtable tablet server, and a GFS chunkserver
        • timestamp oracle
        • lightweight lock service
    • Design: Bigtable overview
      • Bigtable
        • single-row transactions only (HBase?)
        • Percolator’s API closely resembles Bigtable’s API
        • the Percolator library largely consists of Bigtable operations wrapped in Percolator-specific computations
      • Challenges
        • multirow transactions
        • the observer framework
    • Design: Transactions
      • Cross-row, cross-table transactions with ACID snapshot-isolation semantics
        • not serializable: snapshot isolation still permits write skew
      • No central transaction manager; built as a client library that accesses Bigtable
        • a lock service would need to be replicated, distributed, and balanced, and would need to write to a persistent data store
        • so locks are stored in special in-memory columns in the same Bigtable that stores the data
    • Design: Transactions(cont.)
      • The transaction’s constructor asks the timestamp oracle for a start timestamp.
        • determines the consistent snapshot seen by Get()
        • calls to Set() are buffered until commit time.
      • Two-phase commit
        • phase 1: try to lock all the cells being written
        • phase 2: obtain the commit timestamp, then release each lock and make the write visible by replacing the lock with a write record
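The commit flow on the two slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Percolator's actual code: `Oracle`, `Table`, and `Transaction` are hypothetical names, the "Bigtable" is three in-memory dictionaries standing in for the data, lock, and write columns, and the real system's single-row Bigtable transactions are assumed rather than modeled.

```python
import itertools
from collections import defaultdict


class Oracle:
    """Hypothetical timestamp oracle: hands out strictly increasing integers."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next(self):
        return next(self._counter)


class Table:
    """In-memory stand-in for the data, lock, and write columns of each cell."""

    def __init__(self):
        self.data = defaultdict(dict)   # cell -> {start_ts: value}
        self.lock = {}                  # cell -> (start_ts, primary_cell)
        self.write = defaultdict(dict)  # cell -> {commit_ts: start_ts}


class Transaction:
    def __init__(self, table, oracle):
        self.table, self.oracle = table, oracle
        self.start_ts = oracle.next()   # fixes the snapshot seen by get()
        self.writes = {}                # buffered until commit()

    def set(self, cell, value):
        self.writes[cell] = value

    def get(self, cell):
        t = self.table
        pending = t.lock.get(cell)
        if pending is not None and pending[0] < self.start_ts:
            # The real system would wait or try to clean up the lock.
            raise RuntimeError("cell is locked by a pending transaction")
        visible = [ts for ts in t.write[cell] if ts < self.start_ts]
        if not visible:
            return None
        # The newest write record points at the data version to read.
        return t.data[cell][t.write[cell][max(visible)]]

    def _prewrite(self, cell, value, primary):
        t = self.table
        if any(ts > self.start_ts for ts in t.write[cell]):
            return False                # someone committed after our snapshot
        if cell in t.lock:
            return False                # another transaction holds the lock
        t.data[cell][self.start_ts] = value
        t.lock[cell] = (self.start_ts, primary)
        return True

    def commit(self):
        cells = list(self.writes)
        if not cells:
            return True
        primary, locked = cells[0], []
        for cell in cells:              # phase 1: lock every written cell
            if not self._prewrite(cell, self.writes[cell], primary):
                for c in locked:        # undo our own locks and data
                    del self.table.lock[c]
                    del self.table.data[c][self.start_ts]
                return False
            locked.append(cell)
        commit_ts = self.oracle.next()
        for cell in cells:              # phase 2: primary first, then the rest
            self.table.write[cell][commit_ts] = self.start_ts
            del self.table.lock[cell]
        return True
```

In this sketch, two transactions that start before either commits and both write the same cell cannot both succeed: the second one fails in `_prewrite` because it finds a write record newer than its own start timestamp.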
    • Design: Transactions(cont.)
      • Error recovery
        • client failure while a transaction is being committed
          • lazy approach to cleanup
          • the primary lock decides whether the transaction has committed
          • roll back
        • client failure during the second phase of commit.
          • past the commit point
          • roll forward
      • Lock cleanup
        • only clean up a lock that belongs to a dead or stuck worker (liveness is checked through Chubby)
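The recovery rule above (roll back if the primary lock is still present, roll forward if the primary has already been replaced by a write record) can be sketched as follows. The function name `resolve_stale_lock`, the dictionary layout, and the `owner_alive` flag are assumptions made for this illustration; in the real system the liveness check goes through Chubby and the primary is modified inside a Bigtable row transaction.

```python
def resolve_stale_lock(data, lock, write, cell, owner_alive):
    """Clean up a lock that another transaction left on `cell`.

    data:  {cell: {start_ts: value}}
    lock:  {cell: (start_ts, primary_cell)}
    write: {cell: {commit_ts: start_ts}}
    """
    if cell not in lock:
        return "no-lock"
    if owner_alive:
        return "wait"                   # never clean up after a live worker

    start_ts, primary = lock[cell]
    primary_lock = lock.get(primary)

    if primary_lock is not None and primary_lock[0] == start_ts:
        # Primary still locked: the commit point was never reached.
        # Erase the primary first, then this cell (roll back).
        for c in dict.fromkeys((primary, cell)):
            lock.pop(c, None)
            data.get(c, {}).pop(start_ts, None)
        return "rolled-back"

    # Primary lock is gone: look for the write record it was replaced with.
    commit_ts = None
    for c_ts, s_ts in write.get(primary, {}).items():
        if s_ts == start_ts:
            commit_ts = c_ts
    if commit_ts is None:
        # The primary was rolled back earlier; discard this secondary too.
        del lock[cell]
        data.get(cell, {}).pop(start_ts, None)
        return "rolled-back"

    # Past the commit point: finish the work the dead client left undone.
    write.setdefault(cell, {})[commit_ts] = start_ts
    del lock[cell]
    return "rolled-forward"
```

Because cleanup is lazy, only the cell a reader actually stumbles on is repaired here; other secondaries of the same dead transaction are resolved the same way when they are next encountered.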
    • Design: Timestamps
      • Hands out timestamps in strictly increasing order.
        • batches timestamp requests
        • 2 million timestamps per second from a single machine
      • Guarantees that Get() returns all writes committed before the transaction’s start timestamp.
        • T_W < T_R (the writer’s commit timestamp precedes the reader’s start timestamp)
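One way a single machine can hand out strictly increasing timestamps at a high rate is to reserve a whole range with one write to stable storage and then serve requests from memory. The sketch below illustrates that idea only; `TimestampOracle`, `batch_size`, and the `persisted_max` attribute (a stand-in for stable storage) are hypothetical names, and no networking or request batching on the client side is modeled.

```python
class TimestampOracle:
    """Toy oracle that persists only the upper bound of each reserved range."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.persisted_max = 0   # stand-in for a value kept in stable storage
        self.next_ts = 1         # in-memory state, lost on a crash
        self.persist_count = 0   # how many "disk writes" were needed

    def _reserve_range(self):
        # One durable write covers the next `batch_size` timestamps.
        self.persisted_max += self.batch_size
        self.persist_count += 1

    def get_timestamp(self):
        if self.next_ts > self.persisted_max:
            self._reserve_range()
        ts = self.next_ts
        self.next_ts += 1
        return ts

    def restart(self):
        # After a crash, skip to the end of the reserved range so that no
        # timestamp is ever handed out twice or out of order.
        self.next_ts = self.persisted_max + 1
```

The price of this design is that a restart skips the unused remainder of the current range, which is harmless because only the ordering of timestamps matters, not their density.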
    • Design: Notifications
      • Observer
        • registers a function and a set of columns with Percolator
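        • Percolator uses two special columns to find changed cells and invoke the corresponding observers
          • Ack: records the latest start timestamp at which the observer ran
          • Notify: marks cells that have been written and still need an observer run
        • in practice there are very few observers (about 10), with one observer per column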
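The notification mechanism can be illustrated with a small in-memory model. Everything here is an assumption made for the sketch: `NotificationTable`, its methods, and the integer `clock` are hypothetical, the scan is a simple loop rather than a distributed scan, and observers run as plain function calls rather than as transactions.

```python
class NotificationTable:
    """Toy table with a notify set and an ack map per (row, column) cell."""

    def __init__(self):
        self.observers = {}   # column -> callback(table, row, value)
        self.values = {}      # (row, column) -> (value, written_at)
        self.notify = set()   # cells written since the last observer run
        self.ack = {}         # (row, column) -> written_at last processed
        self.clock = 0        # logical clock standing in for timestamps

    def register(self, column, callback):
        self.observers[column] = callback

    def write(self, row, column, value):
        self.clock += 1
        self.values[(row, column)] = (value, self.clock)
        if column in self.observers:
            self.notify.add((row, column))   # mark the cell as dirty

    def scan_once(self):
        """Run observers for every cell that is dirty right now."""
        ran = 0
        for row, column in sorted(self.notify):   # iterate over a snapshot
            value, written_at = self.values[(row, column)]
            self.notify.discard((row, column))
            if self.ack.get((row, column), 0) >= written_at:
                continue                          # this change was handled
            self.ack[(row, column)] = written_at
            self.observers[column](self, row, value)
            ran += 1
        return ran
```

An application is then a chain of observers: an observer on a hypothetical `raw` column can write a `parsed` column, which in turn triggers the observer registered there on the next scan.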
    • Design: Discussion
      • Many RPCs per work unit
        • about 50 Bigtable operations to process a single document
        • solutions
          • Add conditional mutations in Bigtable API
          • Batch operations
          • Prefetch
      • All API calls blocking
        • relies on running thousands of threads to provide enough parallelism
    • Reference
      • “Large-scale Incremental Processing Using Distributed Transactions and Notifications”, OSDI’10
    • HBase Coprocessor
      • provides a framework for both distributed computation directly within the HBase server processes and flexible, generic extension
        • Observer
          • RegionObserver
          • MasterObserver
          • WALObserver
        • Endpoint
      • Thank you!