Google Mesa
Sameer Tiwari
Hadoop Architect, Pivotal Inc.
stiwari@pivotal.io @sameertech Aug 12, 2014
What is Mesa?
● Geo-Replicated, Near Real-Time, Scalable
Data Warehousing for Google’s Internet
Advertising Business.
● OK, so what is it really?
o It's an atomic, consistent, available, near-real-time, scalable store
Salient features
● DW for Ad serving at Google
● Metadata on BigTable
● Data on Colossus
● Millions of row updates/second, billions of queries/day fetching trillions of rows
● Supports multiple indexes
● Runs on tens of thousands of machines across geographic regions
Data Model
● Tables are specified by table schemas
● A table schema defines a key space K and a value space V
o K and V are sets
o Each is represented as a tuple of columns
o The schema also specifies an aggregation function F: V x V -> V
● Each column is stored separately
● For consistency, updates are multi-versioned; they are batched for throughput
● Data is amenable to aggregation
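To make the key/value/aggregation model concrete, here is a minimal Python sketch; the column names and the SUM-style aggregation are illustrative assumptions, not Mesa's actual schema definition.

# Minimal sketch of the data model: a table maps a key tuple to a value
# tuple, and an aggregation function F folds together values that share a key.
# Column names and the SUM aggregation below are illustrative assumptions.

def aggregate(v1, v2):
    """F: V x V -> V. Associative (and here commutative), which is what
    allows pre-aggregation and compaction to be applied in any grouping."""
    return (v1[0] + v2[0], v1[1] + v2[1])     # (clicks, cost)

def apply_updates(table, updates):
    """Fold a batch of (key, value) rows into the table using F."""
    for key, value in updates:
        table[key] = aggregate(table[key], value) if key in table else value
    return table

table = {}
apply_updates(table, [
    (("2014-08-12", 100, "US"), (10, 2.50)),
    (("2014-08-12", 100, "US"), (5, 1.25)),   # same key, values aggregated
    (("2014-08-12", 200, "UK"), (7, 3.00)),
])
print(table[("2014-08-12", 100, "US")])       # -> (15, 3.75)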
Data Model
● Pre-aggregates data into deltas (no repeated row keys within a delta) and assigns each delta a version
● Compaction is multi-level
● A Controller handles updates and maintenance, working with BigTable
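As an illustration of the delta idea, here is a rough sketch of merging two sorted deltas during compaction, reusing the aggregate function from the previous sketch; Mesa's real multi-level compaction policy is more involved than this single pairwise merge.

def merge_deltas(delta_a, delta_b, aggregate):
    """delta_a, delta_b: lists of (key, value) sorted by key, with no repeated
    keys within a delta. Returns one merged delta with the same property."""
    merged, i, j = [], 0, 0
    while i < len(delta_a) and j < len(delta_b):
        ka, va = delta_a[i]
        kb, vb = delta_b[j]
        if ka == kb:
            merged.append((ka, aggregate(va, vb)))   # same key: apply F
            i += 1
            j += 1
        elif ka < kb:
            merged.append((ka, va))
            i += 1
        else:
            merged.append((kb, vb))
            j += 1
    merged.extend(delta_a[i:])
    merged.extend(delta_b[j:])
    return merged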
Controller
● 4 sub-systems
o Updates
o Compaction
o Checksum
o Schema change
● Does no work itself; it only schedules it (see the sketch below)
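A hypothetical sketch of that split, with the Controller only enqueuing tasks and workers in the four pools pulling them; all class and method names here are invented for illustration.

import queue

TASK_TYPES = ("update", "compaction", "checksum", "schema_change")

class Controller:
    """Schedules work for the four sub-systems; never executes it itself."""
    def __init__(self):
        self.queues = {t: queue.Queue() for t in TASK_TYPES}

    def schedule(self, task_type, table_name, detail):
        self.queues[task_type].put((table_name, detail))

def run_one_task(controller, task_type):
    """A worker from the matching pool pulls one task and executes it."""
    table_name, detail = controller.queues[task_type].get_nowait()
    # ... the worker would perform the update/compaction/checksum/schema change
    return table_name, detail

c = Controller()
c.schedule("compaction", "ad_clicks", {"level": 1})
print(run_one_task(c, "compaction"))          # -> ('ad_clicks', {'level': 1})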
Storage and Indexes
- Append-only (AO), log-structured, read-only files
- Rows organized as compressed row-blocks
- The index stores the starting key of each row-block
- Naive lookup:
- Binary search on the index to find the row-block
- Binary search within the row-block
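A sketch of the naive lookup just described, assuming the index holds the first key of each row-block and that blocks are already decompressed into sorted (key, value) lists.

import bisect

def lookup(index_first_keys, blocks, key):
    """index_first_keys[i] is the first key of blocks[i]; blocks are sorted."""
    # Step 1: binary search the index for the last block starting at or before key.
    i = bisect.bisect_right(index_first_keys, key) - 1
    if i < 0:
        return None
    block = blocks[i]                 # in Mesa the compressed block is decoded here
    # Step 2: binary search within the row-block.
    keys = [k for k, _ in block]
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return block[j][1]
    return None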
Query sub system
● Limited query engine with filtering/predicate support
● Used by higher-level systems such as Dremel and MySQL
● Has multiple stateless query servers
● Works against both BigTable (metadata) and Colossus (data)
● Provides a sharding and load-balancing mechanism
● Groups similar queries onto the same subset of query servers
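One plausible reading of the last bullet is label-based routing so that similar queries hit warm caches; the hashing scheme below is purely an assumed illustration, not the mechanism described in the paper.

import hashlib

def pick_servers(query_label, servers, subset_size=3):
    """Deterministically map a query label to a stable subset of query servers."""
    h = int(hashlib.md5(query_label.encode()).hexdigest(), 16)
    n = len(servers)
    return [servers[(h + k) % n] for k in range(min(subset_size, n))]

servers = [f"qs-{i}" for i in range(10)]
print(pick_servers("dremel-reporting", servers))   # same label -> same subset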
Multi Datacenter Deployment
● Tables are multi-versioned
o (Serve old data while the new version is in progress)
● The committer is stateless and sends updates to multiple datacenters
o Built on top of the versions database, a globally replicated, consistent store built on distributed Paxos
● Data is replicated asynchronously across Mesa instances
● Only metadata is synchronously replicated, via the Paxos-based versions database
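A very rough sketch of the sync-metadata / async-data split, assuming a versions-database interface with a Paxos-backed commit underneath; all class and method names are hypothetical.

class VersionsDB:
    """Stand-in for the globally replicated, consistent metadata store."""
    def __init__(self):
        self.metadata = {}            # version -> update metadata
        self.committed = 0
    def publish(self, version, meta):
        self.metadata[version] = meta # synchronously replicated via Paxos in Mesa
    def mark_committed(self, version):
        self.committed = version      # queries may now be answered at this version

class MesaInstance:
    def __init__(self):
        self.applied = 0
    def incorporate(self, version):
        self.applied = version        # pulls the actual delta data asynchronously
        return True

def commit_update(vdb, instances, version, meta, quorum):
    vdb.publish(version, meta)                             # metadata: sync
    acks = sum(inst.incorporate(version) for inst in instances)
    if acks >= quorum:                                     # enough instances caught up
        vdb.mark_committed(version)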
Optimizations
● Delta pruning: skip deltas that cannot contribute to a query (similar in spirit to filter pushdown)
● Resume key, one per data block
o Data is returned a block at a time along with a resume key, so if a query server dies, another one can pick up where it left off
● Parallelizing workloads: uses MapReduce to shard
o While writing a delta, Mesa samples row keys; the sample is used to choose the right number of mappers/reducers (sketch below)
o The workers are the same four worker types scheduled by the Controller
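A sketch of how sampled row keys could drive the mapper/reducer count and split points; the sampling rate and rows-per-worker target are assumptions, not numbers from the paper.

def plan_shards(sampled_keys, sampling_rate, target_rows_per_worker):
    """sampled_keys: sorted keys sampled at `sampling_rate` while writing a delta."""
    estimated_rows = int(len(sampled_keys) / sampling_rate)
    num_workers = max(1, estimated_rows // target_rows_per_worker)
    # Pick split points evenly from the sample so each shard gets roughly equal rows.
    step = max(1, len(sampled_keys) // num_workers)
    split_points = sampled_keys[step::step][: num_workers - 1]
    return num_workers, split_points

workers, splits = plan_shards(sampled_keys=list(range(0, 10_000, 10)),
                              sampling_rate=0.001,
                              target_rows_per_worker=250_000)
print(workers, splits)                # -> 4 [2500, 5000, 7500]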
Optimizations
● Schema changes: two techniques
o Create, copy, replay, and delete: expensive
o Link existing data and add default values for new columns: this is what Mesa uses
● New instances of Mesa use P2P mechanisms to bootstrap and come online
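A sketch of the "link and add default values" idea: old deltas are left untouched, and rows written under the old schema get a default for the new column at read time. Column names and the default are hypothetical.

OLD_SCHEMA = ("clicks", "cost")
NEW_SCHEMA = ("clicks", "cost", "conversions")   # column added by the schema change
DEFAULTS = {"conversions": 0}

def read_row(raw_values, written_schema, read_schema=NEW_SCHEMA):
    """Project a row written under `written_schema` into `read_schema`,
    filling defaults for columns the old schema did not have."""
    as_dict = dict(zip(written_schema, raw_values))
    return tuple(as_dict.get(col, DEFAULTS.get(col)) for col in read_schema)

print(read_row((15, 3.75), OLD_SCHEMA))          # -> (15, 3.75, 0)
print(read_row((7, 3.0, 2), NEW_SCHEMA))         # -> (7, 3.0, 2)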
Handling Data Corruption
● Mesa runs on ~50K boxes
● Online: during updates
o Each Mesa instance is logically the same but may differ physically in its deltas
o Check checksums of indexes and data
o Row order, key ranges, and aggregate values should match across instances
● Offline
o Run global checksums over all indexes
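A sketch of the kind of lightweight cross-instance check hinted at above: each instance computes a small summary (row-order hash, key range, aggregate totals) over the same logical data, and the summaries are compared; the exact summary fields are an assumption.

import hashlib

def summary(rows):
    """rows: (key, value) pairs in row order for the same logical data/version.
    Instances whose physical deltas differ should still produce equal summaries."""
    h = hashlib.sha256()
    first_key, last_key, total_clicks = None, None, 0
    for key, value in rows:
        h.update(repr((key, value)).encode())    # sensitive to row order
        first_key = key if first_key is None else first_key
        last_key = key
        total_clicks += value[0]                 # aggregate-value check
    return (first_key, last_key, total_clicks, h.hexdigest())

rows = [(("2014-08-12", 100, "US"), (15, 3.75)),
        (("2014-08-12", 200, "UK"), (7, 3.00))]
assert summary(rows) == summary(list(rows))      # different instances should agree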
Reference
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing (VLDB 2014)
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/42851.pdf
