2. What is Mesa?
● Geo-Replicated, Near Real-Time, Scalable
Data Warehousing for Google’s Internet
Advertising Business.
● OK, so what is it really?
o It's an atomic, consistent, available, near
real-time, scalable store
3. Salient features
● Data warehouse (DW) for ad serving at Google
● Metadata on BigTable
● Data on Colossus
● Billions of queries/day fetching trillions of rows; millions of row updates/second
● Support Multiple indexes
● Runs on tens of thousands of machines across
geographies
4. Data Model
● Tables are specified by table schemas
● A table schema specifies the key space K and the value space V
o K and V are sets
o Each is represented as a tuple of columns
o The schema also specifies an aggregation function for the values
● Each column is stored separately
● For consistency, updates are multi-versioned;
for throughput, they are batched
● Data is amenable to aggregation
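
A minimal sketch of the data model above, assuming SUM as the aggregation function for every value column. The column names (date, publisher_id, clicks, cost) and the names TableSchema and apply_update are hypothetical, chosen only for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class TableSchema:
    key_columns: Tuple[str, ...]      # columns that make up the key space K
    value_columns: Tuple[str, ...]    # columns that make up the value space V
    # One aggregation function per value column (assumed: SUM everywhere).
    aggregators: Tuple[Callable[[float, float], float], ...]

    def aggregate(self, old: Tuple, new: Tuple) -> Tuple:
        """Combine two value tuples column by column."""
        return tuple(f(a, b) for f, a, b in zip(self.aggregators, old, new))


def apply_update(table: Dict[Tuple, Tuple], schema: TableSchema,
                 key: Tuple, value: Tuple) -> None:
    """Insert a new row, or aggregate into the existing row with the same key."""
    if key in table:
        table[key] = schema.aggregate(table[key], value)
    else:
        table[key] = value


schema = TableSchema(
    key_columns=("date", "publisher_id"),
    value_columns=("clicks", "cost"),
    aggregators=(operator.add, operator.add),
)
table: Dict[Tuple, Tuple] = {}
apply_update(table, schema, ("2014-01-01", 100), (10, 32))
apply_update(table, schema, ("2014-01-01", 100), (5, 3))
```

Two updates for the same key collapse into one row, which is what makes the data "amenable to aggregation".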
5. Data Model
● Pre-aggregates data into deltas (no
repeated row keys within a delta) and assigns
each delta a version
● Compaction is multi-level
● A Controller handles updates and maintenance,
and works with BigTable
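
A rough sketch of deltas and compaction, assuming a single integer value per key and SUM aggregation. The dictionary layout and the names build_delta and compact are hypothetical. Multi-level compaction would apply compact repeatedly to adjacent deltas.

```python
from collections import defaultdict


def build_delta(rows, version):
    """Pre-aggregate raw (key, value) rows so each key appears once in the delta."""
    agg = defaultdict(int)
    for key, value in rows:
        agg[key] += value
    return {"versions": (version, version), "rows": dict(sorted(agg.items()))}


def compact(older, newer):
    """Merge two adjacent deltas into one that covers both version ranges."""
    if older["versions"][1] + 1 != newer["versions"][0]:
        raise ValueError("deltas must cover adjacent version ranges")
    merged = dict(older["rows"])
    for key, value in newer["rows"].items():
        merged[key] = merged.get(key, 0) + value
    return {
        "versions": (older["versions"][0], newer["versions"][1]),
        "rows": dict(sorted(merged.items())),
    }


d0 = build_delta([("a", 1), ("b", 2), ("a", 3)], version=0)
d1 = build_delta([("b", 5), ("c", 1)], version=1)
d01 = compact(d0, d1)
```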
6. Controller
● 4 sub-systems
o Updates
o Compaction
o Checksum
o Schema change
● Does not do any work, only schedules it
7. Storage and Indexes
- Append-only (AO), log-structured, read-only
- Rows organized as compressed row-blocks
- Each index entry holds the starting key of a row-block
- Naive lookup
- Binary search on the index to find the row-block
- Binary search within the row-block
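
The naive lookup above can be sketched as two binary searches. This is an illustration only: compression is left out, each row-block is assumed to be a pair of parallel sorted lists (keys, values), and the name lookup is hypothetical.

```python
from bisect import bisect_left, bisect_right


def lookup(index, row_blocks, key):
    """Return the value for key, or None if it is absent.

    index[i] is the first key of row_blocks[i];
    row_blocks[i] is a (keys, values) pair with keys sorted.
    """
    # Binary search on the index: last row-block whose first key <= key.
    i = bisect_right(index, key) - 1
    if i < 0:
        return None  # key sorts before the first row-block
    keys, values = row_blocks[i]
    # Binary search inside the chosen row-block.
    j = bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return values[j]
    return None


row_blocks = [(["a", "c"], [1, 2]), (["f", "h"], [3, 4])]
index = [keys[0] for keys, _ in row_blocks]
```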
8. Query sub system
● Limited query engine with filtering/predicate support
● Used by higher-level systems such as
Dremel and MySQL
● Has multiple stateless Query Servers
● Works with both BigTable and Colossus
● Provides sharding and load-balancing (LB) mechanisms
● Groups similar queries to a subset of
Servers
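
One way to picture the grouping of similar queries, assuming "similar" means "on the same table". The hashing scheme, the server names, and the functions server_subset and route are assumptions made for this sketch, not Mesa's actual mechanism.

```python
import hashlib


def server_subset(table, servers, subset_size):
    """Deterministically map a table name to a fixed subset of query servers."""
    digest = hashlib.sha256(table.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(subset_size)]


def route(table, servers, subset_size, load):
    """Pick the least-loaded server within the table's subset."""
    candidates = server_subset(table, servers, subset_size)
    return min(candidates, key=lambda s: load.get(s, 0))


servers = ["qs0", "qs1", "qs2", "qs3", "qs4"]
subset = server_subset("ads_clicks", servers, 2)
```

Because the servers are stateless, any server could answer any query; sending queries for one table to the same few servers simply makes it more likely that the data they need is already cached.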
9. Multi Datacenter Deployment
● Tables are multi-versioned
o (Serve old data while new is in-progress)
● Committer is stateless and sends updates to
multiple datacenters
o Built on top of versionsDB, a globally replicated and
consistent store built on top of distributed Paxos.
● Data is replicated asynchronously across Mesa instances
● Only metadata is replicated synchronously, using the
Paxos-based versionsDB
10. Optimizations
● Delta pruning - similar to Filter pushdown
● Resume key: one key per data block
o Data is returned a block at a time, so if a
QueryServer dies, another one can pick it up.
● Parallelizing workloads: Uses MR to shard
o While writing a delta, Mesa samples row keys, which are
used to figure out the right number of
Mappers/Reducers.
o The workers are the same 4 workers scheduled by
the Controller
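
A small sketch of the first two optimizations. It assumes each delta carries min_key/max_key metadata and that the resume key is the last key of the block just returned; both are assumptions made for illustration, as are the names prune_deltas and stream_blocks.

```python
from bisect import bisect_right


def prune_deltas(deltas, lo, hi):
    """Delta pruning: skip deltas whose key range cannot match the filter [lo, hi]."""
    return [d for d in deltas if d["max_key"] >= lo and d["min_key"] <= hi]


def stream_blocks(rows, block_size, resume_key=None):
    """Yield (block, resume_key) pairs from rows sorted by key.

    A different server can continue a broken stream by passing the last
    resume key the client received.
    """
    start = 0
    if resume_key is not None:
        keys = [k for k, _ in rows]          # linear copy, fine for a sketch
        start = bisect_right(keys, resume_key)
    for i in range(start, len(rows), block_size):
        block = rows[i:i + block_size]
        yield block, block[-1][0]


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
# The first server returns one block, then dies.
first_block, last_key = next(stream_blocks(rows, block_size=2))
# Another server picks up from the resume key.
remaining = [blk for blk, _ in stream_blocks(rows, 2, resume_key=last_key)]
```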
11. Optimizations
● Schema changes - two techniques
o Create, copy, replay, and delete - expensive
o Link and add default values - this is the one used in Mesa
● New instances of Mesa use P2P
mechanisms to bootstrap and come online.
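
The "link and add default values" idea can be illustrated as follows, assuming old deltas keep their old schema and missing columns are filled in at read time. The function read_row and the column names are hypothetical.

```python
def read_row(row, delta_schema, current_schema, defaults):
    """Present a row written under an older schema in the current schema."""
    named = dict(zip(delta_schema, row))
    return tuple(
        named[col] if col in named else defaults[col]
        for col in current_schema
    )


old_schema = ("clicks", "cost")
new_schema = ("clicks", "cost", "impressions")   # hypothetical added column
converted = read_row((10, 3.5), old_schema, new_schema, {"impressions": 0})
```

No existing data is copied or rewritten, which is why this is cheaper than the create/copy/replay/delete approach.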
12. Handling Data Corruption
● Mesa runs on ~50K boxes
● Online - during updates.
o Fact: Mesa instances are logically the same but
may differ physically in their deltas
o Check checksums of indexes/data
o Row order, key range, and aggregate values should be
the same across instances
● Offline
o Run global checksums of all indexes
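
Since instances can differ physically, a cross-instance check has to compare logical content. A minimal sketch, assuming SUM aggregation, in-memory dictionaries as deltas, and SHA-256 as the checksum; the name logical_checksum is hypothetical.

```python
import hashlib


def logical_checksum(deltas):
    """Checksum the aggregated rows, independent of how they are split into deltas."""
    merged = {}
    for delta in deltas:
        for key, value in delta.items():
            merged[key] = merged.get(key, 0) + value
    h = hashlib.sha256()
    for key in sorted(merged):               # fixed row order
        h.update(repr((key, merged[key])).encode("utf-8"))
    return h.hexdigest()


instance_a = [{"a": 1, "b": 2}, {"a": 3}]    # two uncompacted deltas
instance_b = [{"a": 4, "b": 2}]              # same data, already compacted
corrupted = [{"a": 4, "b": 3}]
```

Instances A and B produce the same checksum even though their deltas differ, while the corrupted copy does not.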