The document summarizes a 2009 talk on Google's mission to organize the world's information and make it universally accessible and useful. It traces Google's history from its earliest storage systems, including a disk case built from Lego bricks, to its modern large-scale data centers, and describes the ever-increasing data and computation demands of Google's services. It outlines how Google plans for failure and for expansion across applications, infrastructure, and hardware. Key systems Google developed to manage data and computing at this scale include the Google File System (GFS), MapReduce, and BigTable.
Navigating the Transition from relational to NoSQL - CloudCon Expo 2012 (Dipti Borkar)
For more deep NoSQL content from Couchbase, check out http://www.couchbase.com/webinars
NoSQL databases have emerged as a better match than relational systems for modern interactive applications, offering cost-effective data management at “Big Data” scale. But there are significant differences between structured and schema-less database technology. What should architects and technical managers know as they explore NoSQL solutions for their teams?
In this workshop you will learn:
- How to evaluate NoSQL (both technical advantages and limitations) as a potential data management approach
- Critical differences between NoSQL and RDBMS for designing, building and running production applications
- Ideal use cases for NoSQL technology and sample reference architectures
Hadoop has proven to be an invaluable tool for many companies over the past few years. Yet it has its quirks, and knowing them up front can save valuable time. This session is a rundown of the recurring lessons learned from running various Hadoop clusters in production since version 0.15.
What to expect from Hadoop - and what not? How to integrate Hadoop into existing infrastructure? Which data formats to use? What compression? Small files vs big files? Append or not? Essential configuration and operations tips. What about querying all the data? The project, the community and pointers to interesting projects that complement the Hadoop experience.
Slides from my lightning talk at the Boston Predictive Analytics Meetup hosted at Predictive Analytics World, Boston, October 1, 2012.
Full code and data are available on github: http://bit.ly/pawdata
These are the slides from my presentation on Running R in the Database using Oracle R Enterprise. The second half of the presentation was a live demo of Oracle R Enterprise; unfortunately, the demo is not included in these slides.
You’ve successfully deployed Hadoop, but are you taking advantage of all of Hadoop’s features to operate a stable and effective cluster? In the first part of the talk, we will cover issues that have been seen over the last two years on hundreds of production clusters with detailed breakdown covering the number of occurrences, severity, and root cause. We will cover best practices and many new tools and features in Hadoop added over the last year to help system administrators monitor, diagnose and address such incidents.
The second part of our talk discusses new features for making daily operations easier. This includes features such as ACLs for simplified permission control, snapshots for data protection and more. We will also cover tuning configuration and features that improve cluster utilization, such as short-circuit reads and datanode caching.
How to bootstrap an SRE team into your company: how to hire them, what to have them work on, and how to interact with them as a team. Finally, some thoughts on general practices to consider before your SREs arrive. There are also kitten pictures.
Site Reliability Engineering enables agility and stability.
SREs use software engineering to automate themselves out of the job.
My advice, if you want to implement this change in your company, is to start with action items: alter your training and hiring, implement error budgets, do blameless postmortems, and reduce toil.
The Social Requirements Engineering (SRE) Approach to Developing a Large-scale Personal Learning Environment Infrastructure (Ralf Klamma)
Effie Lai-Chong Law, Arunangsu Chatterjee, Dominik Renzel and Ralf Klamma
Department of Computer Science, University of Leicester, UK
Chair of Computer Science 5 - Information Systems, RWTH Aachen University, Germany
EC-TEL 2012, Saarbrücken, Germany
September 21, 2012
I'm No Hero: Full Stack Reliability at LinkedIn (Todd Palino)
The operations engineer is often seen as the hero, toiling away late nights on call to keep the systems running through failures of hardware and of code. While developers try as hard as possible to move quickly and break things, we stand as the voice of reason urging caution. We’re the only ones who truly understand the systems, but you’ll rarely find documentation because it’s just too complex and changeable to write down. When we’re doing our jobs well, we’re unappreciated because nobody understands how difficult it is. When things break, everyone thinks we’re doing our jobs badly. These are not the things we aspire to.
At LinkedIn, Site Reliability Engineers are one layer in a stack that starts with the way we manage our code and basic hardware, and is built with common systems for application management, monitoring, and alerting. Each layer has its own specialist engineers, focused on making their piece as resilient as it can be and building it to integrate with the rest of the stack. This lets Software Engineers concentrate on developing their applications, without having to spend time building systems to build, package, and distribute their code. SREs can dedicate their time to integrating applications with the stack, architecting and scaling deployments, as well as developing tools and documentation to make the job easier. When the inevitable failure happens, many experts come together to quickly identify and resolve the problem and improve the entire stack for everyone.
Description:
Presentation at the International Industry-Academia Workshop on Cloud Reliability and Resilience. 7-8 November 2016, Berlin, Germany.
Organized by EIT Digital and Huawei GRC, Germany.
Twitter: @CloudRR2016
Overview of Kafka: how it works, the components of Kafka, and use cases.
Kafka at LinkedIn. Download the slides to see animations explaining how the components fit.
This was presented at the Kafka meetup held on June 11, 2016 at the LinkedIn Bangalore office.
Stephen McHenry - Chancellor of Site Reliability Engineering, Google
1. Woulda, Coulda, Shoulda
The World of Tera, Peta & Exa
Stephen McHenry
Chancellor of Site Reliability Engineering
April 22, 2009
2. Overview
• Mission Statement
• Some History
• Planning for
• Failure
• Expansion
• Applications
• Infrastructure
• Hardware
• The Future
3. Google’s Mission
To organize the world’s information
and make it universally
accessible and useful
4. Overview
• Mission Statement
• Some History
• Planning for
• Failure
• Expansion
• Applications
• Infrastructure
• Hardware
• The Future
5. Lego Disk Case
One of our earliest storage systems
13. Current Data Center
14. Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
• Failure
• Expansion
• Applications
• Infrastructure
• Hardware
• The Future
16. How much information is out there?
How large is the Web?
• Tens of billions of documents? Hundreds of billions?
• ~10KB/doc => 100s of Terabytes
Then there's everything else:
• Email, personal files, closed databases, broadcast media, print, etc.
Estimated 5 Exabytes/year (growing at 30%)*
• 800MB/year/person, ~90% of it in magnetic media
The Web is just a tiny starting point
* Source: How Much Information? 2003 (UC Berkeley)
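As a sanity check on the slide's arithmetic, here is a quick back-of-envelope calculation; the midpoint figure is an illustrative assumption, since the slide only gives orders of magnitude:

```python
# Back-of-envelope check: tens of billions of documents at ~10 KB each
# lands in the hundreds-of-terabytes range, as the slide claims.
docs = 50e9            # "tens of billions" of documents (illustrative midpoint)
bytes_per_doc = 10e3   # ~10 KB per document
total_bytes = docs * bytes_per_doc
print(f"{total_bytes / 1e12:.0f} TB")  # -> 500 TB, i.e. "100s of Terabytes"
```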
17. Google takes its mission seriously
Started with the Web (html)
Added various document formats
• Images
• Commercial data: ads and shopping (Froogle)
• Enterprise (corporate data)
• News
• Email (Gmail)
• Scholarly publications
• Local information
• Maps
• Yellow pages
• Satellite images
• Instant messaging and VoIP
• Communities (Orkut)
• Printed media
• …
18. Ever-Increasing Computation Needs
Every Google service sees continuing growth in computational needs:
• More queries: more users, and happier users, issue more queries
• More data: a bigger web, bigger mailboxes, more blogs, etc.
• Better results: find the right information, and find it faster
[Diagram: a self-reinforcing cycle - more queries, more data, better results]
19. Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
• Failure
• Expansion
• Applications
• Infrastructure
• Hardware
• The Future
20. When Your Data Center Reaches 170°F
21. The Joys of Real Hardware
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packetloss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external vips for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for dns
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.
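To see why software must treat failure as routine, here is a hedged back-of-envelope reading of those numbers; the cluster size and the independence assumption are simplifications, not figures from the talk:

```python
import math

machines = 1000                    # a typical cluster size implied by the slide
machine_failures_per_year = 1000   # "~1000 individual machine failures"

failures_per_day = machine_failures_per_year / 365
print(f"~{failures_per_day:.1f} machine failures per day")  # ~2.7/day

# Modeling failures as independent and uniform in time (a simplification),
# the chance that at least one machine fails in any given hour:
rate_per_hour = machine_failures_per_year / (365 * 24)
p_any_failure = 1 - math.exp(-rate_per_hour)
print(f"~{p_any_failure:.0%} chance of a failure in any given hour")  # ~11%
```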
22. Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
• Failure
• Expansion
• Applications
• Infrastructure
• Hardware
• The Future
23. Components of Web Search
Crawler (Spider): collects the documents
• Crawling process: get a link from the list of links to explore, fetch the page, parse the page to extract links, add URLs to the queue; expired pages are re-fetched from the index
• Tradeoff between size and speed
• High networking bandwidth requirements
• Be gentle to serving hosts while doing it
Indexer: generates the index - similar to the back of a book (but big!)
• Requires several days on thousands of computers
• More than 20 billion web documents (Web, Images, News, Usenet messages, …)
• Pre-compute query-independent ranking (PageRank, etc.)
Query serving: processes user queries
• Finding all relevant documents: search over tens of Terabytes, 1000s of times/second
• Scoring: a mix of query-dependent and query-independent factors
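The indexer's core data structure is an inverted index: a map from each term to the documents containing it, with query serving reduced to intersecting posting lists. A minimal sketch of the idea (not Google's implementation; document IDs and tokenization are deliberately simplified):

```python
from collections import defaultdict

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """AND-query: intersect the posting lists of all query terms."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {1: "cheap tires online", 2: "tires and wheels", 3: "cheap restaurants"}
idx = build_index(docs)
print(search(idx, "cheap tires"))  # -> {1}
```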
24. Google Query Serving Infrastructure
[Diagram: a query enters the Google Web Server, which consults misc. servers (spell checker, ad server) and fans out to index servers and doc servers; index shards I0…IN and doc shards D0…DM are each replicated across many rows of machines]
Elapsed time: 0.25s, machines involved: 1000+
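The diagram describes a scatter-gather pattern: the web server fans a query out to every shard, falls back to a replica when a row is down, and merges the partial results. A minimal sketch of that pattern (hypothetical shard and replica objects, not the actual serving code):

```python
# Scatter-gather over sharded, replicated backends: query every shard,
# trying replicas in order until one answers, then merge partial results.
def query_shard(replicas, query):
    for replica in replicas:          # replicas of one shard, in preference order
        try:
            return replica(query)     # each replica is a plain callable here
        except ConnectionError:
            continue                  # this replica is down; try the next row
    return []                         # whole shard unavailable: degrade gracefully

def scatter_gather(shards, query):
    partials = [query_shard(replicas, query) for replicas in shards]
    merged = [hit for partial in partials for hit in partial]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)  # sort by score

# Toy setup: two shards, a "hit" is (doc_id, score).
shards = [
    [lambda q: [("doc-a", 0.9)], lambda q: [("doc-a", 0.9)]],
    [lambda q: [("doc-b", 0.7), ("doc-c", 0.4)]],
]
print(scatter_gather(shards, "tires"))
# -> [('doc-a', 0.9), ('doc-b', 0.7), ('doc-c', 0.4)]
```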
25. Ads System
As challenging as search, but with some transactional semantics
Problem: find useful ads based on what the user is interested in at that moment
• A form of mind reading
Two systems:
• Ads for search results pages (search for tires or restaurants)
• Ads for web browsing/email (or 'content ads')
Extract a contextual meaning from web pages
Do the same thing for data from a gazillion advertisers
Match those up and score them
Do it faster than the original content provider can respond to the web page!
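One simple way to picture "extract a contextual meaning and match it against advertiser data" is keyword-overlap scoring between page terms and ad keywords. A toy sketch of the matching step only; the real system is far more sophisticated, and the ad names here are invented:

```python
def score_ad(page_terms: set[str], ad_keywords: set[str]) -> float:
    """Jaccard-style overlap between page context and ad keywords."""
    if not ad_keywords:
        return 0.0
    return len(page_terms & ad_keywords) / len(page_terms | ad_keywords)

page = {"cheap", "tires", "winter", "driving"}
ads = {
    "TireMart":  {"tires", "cheap", "wheels"},
    "SoupPlace": {"restaurants", "soup"},
}
ranked = sorted(ads, key=lambda name: score_ad(page, ads[name]), reverse=True)
print(ranked[0])  # -> TireMart
```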
27. Language Translation (by Machine)
Information is more useful if more people can understand it
Translation is a long-standing, challenging Artificial Intelligence problem
Key insight:
• Transform it into a statistical modeling problem
• Train it with tons of data!
Doubling the training corpus size yields a ~0.5% higher score (Chinese-English and Arabic-English)
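The "statistical modeling" insight can be pictured with the simplest possible estimator: translation probabilities read off as relative co-occurrence counts in a parallel corpus, which is why more training data sharpens the model. A toy sketch under that assumption; real systems use far richer models, alignment, and smoothing:

```python
from collections import Counter

# Tiny "parallel corpus" of aligned (source_word, target_word) pairs.
aligned_pairs = [("maison", "house"), ("maison", "house"),
                 ("maison", "home"), ("chat", "cat")]

pair_counts = Counter(aligned_pairs)
source_counts = Counter(src for src, _ in aligned_pairs)

def p_translation(src: str, tgt: str) -> float:
    """Relative-frequency estimate p(tgt | src) = count(src, tgt) / count(src)."""
    return pair_counts[(src, tgt)] / source_counts[src]

print(p_translation("maison", "house"))  # ~0.67; more data sharpens this estimate
```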
28. Data + CPUs = Playground
Substantial fraction of internet available for processing
Easy-to-use teraflops/petabytes
Cool problems, great fun…
29. Learning From Data
Searching for Britney Spears…
30. Query Frequency Over Time
[Charts: query volume over time for queries containing “eclipse”, “world series”, “full moon”, “summer olympics”, “watermelon”, and “opteron”]
32. A Simple Challenge For Our Computing Platform
1. Create the world's largest computing infrastructure
2. Make sure we can afford it
Need to drive efficiency of the computing infrastructure to unprecedented levels:
• indices containing more documents
• updated more often
• faster queries
• faster product development cycles
• …
33. Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
• Failure
• Expansion
• Applications
• Infrastructure
• Hardware
• The Future
35. GFS: Google File System
Planning for unprecedented quantities of data storage, and for failure(s)
Google has unique FS requirements:
• Huge read/write bandwidth
• Reliability over thousands of nodes
• Mostly operating on large data blocks
• Need efficient distributed operations
GFS usage @ Google:
• Many clusters
• Filesystem clusters of up to 5000+ machines
• Pools of 10000+ clients
• 5+ PB filesystems
• 40 GB/s read/write load in a single cluster
• (in the presence of frequent HW failures)
36. GFS Setup
[Diagram: multiple clients issue requests to replicated GFS masters; chunks (C0, C1, C2, C3, C5, …) are stored and replicated across Machine 1 … Machine N, alongside misc. servers]
• Master manages metadata
• Data transfers happen directly between clients and machines
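The key design point on this slide is that the master holds only metadata (which chunks make up a file, and where each replica lives) while bulk data flows directly between clients and chunkservers. A minimal sketch of that read path, with hypothetical class and file names rather than the actual GFS API:

```python
# Sketch of a GFS-like read path: the master answers metadata lookups only;
# the client then fetches chunk bytes directly from a chunkserver replica.
class Master:
    def __init__(self):
        # file name -> list of (chunk_id, [replica locations])
        self.metadata = {"/logs/day1": [("c0", ["machine1", "machine7"]),
                                        ("c1", ["machine2", "machine9"])]}

    def lookup(self, path):
        return self.metadata[path]

class ChunkServer:
    def __init__(self, chunks):
        self.chunks = chunks                 # chunk_id -> bytes

    def read(self, chunk_id):
        return self.chunks[chunk_id]

def read_file(master, servers, path):
    data = b""
    for chunk_id, replicas in master.lookup(path):   # metadata via the master
        server = servers[replicas[0]]                # data path goes direct
        data += server.read(chunk_id)
    return data

servers = {"machine1": ChunkServer({"c0": b"hello "}),
           "machine2": ChunkServer({"c1": b"world"})}
print(read_file(Master(), servers, "/logs/day1"))    # b'hello world'
```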
37. MapReduce - Large Scale Processing
Okay, GFS lets us store lots of data… now what?
We need to process that data in new and interesting ways!
• Fast: locality optimization, optimized sorter, lots of tuning work done...
• Robust: handles machine failure, bad records, …
• Easy to use: little boilerplate, supports many formats, …
• Scalable: can easily add more machines to handle more data or reduce the run-time
• Widely applicable: can solve a broad range of problems
• Monitoring: status page, counters, …
The Plan - Develop a robust compute infrastructure that allows rapid development of complex analyses, and is tolerant to failure(s)
38. MapReduce - Large Scale Processing
MapReduce:
• A framework to simplify large-scale computations on large clusters
• Good for batch operations
• User writes two simple functions: map and reduce
• Underlying library/framework takes care of messy details
• Greatly simplifies large, distributed data processing
Lots of uses inside Google: Ads, Froogle, Google Earth, Google Local, Google News, Google Print, Machine Translation, Sawmill (logs analysis), Search My History, search quality, spelling, web search indexing, …many other internal projects...
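The canonical illustration of "user writes two simple functions" is word count. A minimal single-process sketch of the programming model; the real framework shards the input, runs these functions on thousands of machines, and handles the failures described above:

```python
from collections import defaultdict

def map_fn(doc: str):
    """Map: emit (word, 1) for every word in one input record."""
    for word in doc.split():
        yield word, 1

def reduce_fn(word: str, counts: list):
    """Reduce: sum all partial counts for one key."""
    return word, sum(counts)

def mapreduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)                 # the "shuffle" phase
    for record in inputs:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

docs = ["the cat sat", "the dog sat"]
print(mapreduce(docs, map_fn, reduce_fn))
# -> {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```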
39. Large Scale Processing - (semi) Structured Data
Why not just use a commercial DB?
• Scale is too large for most commercial databases
• Even if it weren't, cost would be very high
• Building internally means the system can be applied across many projects for low incremental cost
• Low-level storage optimizations help performance significantly; much harder to do when running on top of a database layer
Okay, traditional relational databases are woefully inadequate at this scale… now what?
The Plan - Build a large-scale, distributed solution for semi-structured data that is resistant to failure(s)
40. Large Scale Processing - (semi) Structured Data
BigTable:
• A large-scale storage system for semi-structured data
• Database-like model, but data stored on thousands of machines
• Fault-tolerant, persistent
• Scalable:
  Thousands of servers
  Terabytes of in-memory data
  Petabytes of disk-based data
  Millions of reads/writes per second, efficient scans
  Billions of URLs, many versions/page (~20K/version)
  Hundreds of millions of users, thousands of queries/sec
  100TB+ of satellite image data
• Self-managing:
  Servers can be added/removed dynamically
  Servers adjust to load imbalance
• Design/initial implementation started beginning of 2004
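BigTable's "database-like model" is essentially a sorted, multi-dimensional map: (row, column, timestamp) -> value. A minimal in-memory sketch of that data model only; the real system distributes sorted row ranges across thousands of servers, and the example row keys are illustrative:

```python
import time

class TinyBigTable:
    """Sorted map of (row, column, timestamp) -> value, newest version first."""
    def __init__(self):
        self.cells = {}                         # (row, column) -> [(ts, value), ...]

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else time.time()
        versions = self.cells.setdefault((row, column), [])
        versions.append((ts, value))
        versions.sort(reverse=True)             # keep the newest version first

    def get(self, row, column):
        versions = self.cells.get((row, column), [])
        return versions[0][1] if versions else None

    def scan(self, row_prefix):
        """Efficient prefix scans are the point of keeping rows sorted."""
        for (row, column) in sorted(self.cells):
            if row.startswith(row_prefix):
                yield row, column, self.get(row, column)

t = TinyBigTable()
t.put("com.example/index", "contents:", "<html>…</html>")
t.put("com.example/index", "anchor:cnn", "Example")
print(list(t.scan("com.example")))
```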
41. BigTable Usage
Useful for structured/semi-structured data:
• URLs: contents, crawl metadata, links, anchors, pagerank, …
• Per-user data: user preference settings, recent queries/search results, …
• Geographic data: physical entities, roads, satellite imagery, annotations, …
Production use or active development for ~70 projects: Google Print, My Search History, Orkut, the crawling/indexing pipeline, Google Maps/Google Earth, Blogger, …
Currently ~500 BigTable cells
Largest BigTable cell manages ~3000TB of data spread over several thousand machines (larger cells planned)
42. Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
• Failure
• Expansion
• Applications
• Infrastructure
• Hardware
• The Future
43. A Simple Challenge For Our Computing Platform
1. Create the world's largest computing infrastructure
2. Make sure we can afford it
Need to drive efficiency of the computing infrastructure to unprecedented levels:
• indices containing more documents
• updated more often
• faster queries
• faster product development cycles
• …
44. Innovative Solutions Needed In Several Areas
Server design and architecture
Power efficiency
System software
Large scale networking
Performance tuning and optimization
System management and repairs automation
45. Pictorial History
• Brainstorming circa 2003
• Container-based data centers
• Battery per server instead of a traditional UPS: 99.9% efficient backup power!
• Application of best practices leads to PUE below 1.2
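PUE (Power Usage Effectiveness) is total facility power divided by IT power, so "PUE below 1.2" means under 0.2 W of cooling and distribution overhead per watt of IT load. A quick illustration using the 10 MW IT load from the "Data Center Vitals" slide below; the typical-PUE comparison is an assumption based on commonly cited industry figures of the period, not from the talk:

```python
# PUE = total facility power / IT equipment power.
it_load_mw = 10.0        # IT load from the "Data Center Vitals" slide
pue = 1.2                # "best practices lead to PUE below 1.2"

total_mw = it_load_mw * pue
overhead_mw = total_mw - it_load_mw
print(f"total draw: {total_mw:.1f} MW, overhead: {overhead_mw:.1f} MW")
# -> total draw: 12.0 MW, overhead: 2.0 MW
# A then-typical PUE of ~2.0 would have meant ~10 MW of overhead instead.
```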
50. Data Center Vitals
• Capacity: 10 MW IT load
• Area: 75,000 sq ft total under roof
• Overall power density: 133 W/sq ft
• Prototype container delivered January 2005
• Data center built 2004-2005
• Construction completed September 2005
• Went live November 21, 2005
51. Additional Vitals
• 45 containers, approx. 40,000 servers
• Single- and 2-story on facing sides of the hangar
• Bridge crane for container handling
52. Overview
• Mission Statement
• Some History
• The Challenge
• Planning for
• Failure
• Expansion
• Applications
• Infrastructure
• Hardware
• The Future
53. Planning for the Future
• Manage Total Cost of Ownership
• Reduce Water Usage
• Reduce Power Consumption
• Manage E-Waste
54. Total Cost of Ownership - TCO
Earnings and sustainability are (often) aligned
• Careful application of best practices leads to much lower energy use, which leads to lower TCO for facilities. Examples:
  o Manage air flow: avoid hot/cold mixing
  o Raise the inlet temperature
  o Use free cooling (Belgium has no chillers!)
  o Optimize power distribution
• Don't need exotic technologies
• But: need to break down traditional silos
  o Between capex and opex
  o Between facilities and IT
  o Manage everyone by impact on TCO
55. Water resources management is the next "elephant in the room" we are all going to have to address.
56. A Great Wave Rising: The coming U.S. crisis in water policy
[Photos: Lake Powell at 53% full (from ESPN!); Shasta Lake]
57. Lake Mead water could dry up by 2021*
[Images: Lake Mead historical levels; Lake Mead at 45% full; new docks at Lake Oroville]
* Scripps Institution of Oceanography, UCSD, Feb 2008
58. Georgia’s Lake Lanier
[Photos: March 4, 2007 vs. February 11, 2008]
59. Lake Hartwell, GA - November 2008
60. Water – The Next “Big Elephant”
Why?
• Water resources are becoming (a lot) scarcer and more variable
How do data centers fit in?
• For every 10 MW consumed, the average data center uses ~150,000 gallons of water per day for cooling.
• Upstream of the data center, the same 10 MW of delivered power consumes 480,000 gallons of water per day to generate that power.
References:
• U.S. Dept. of Energy - Energy Demands on Water Resources - Dec. 2006
• National Renewable Energy Laboratory - Consumptive Water Use for U.S. Power Production - Dec. 2003
• USGS - Water Use at Home - Jan. 2009
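Those two figures fold into a single per-kWh number, which makes the "use less power" conclusion on the next slide concrete. A quick calculation from the slide's own numbers:

```python
# Water cost of a 10 MW data center, per the slide's figures.
it_load_mw = 10.0
onsite_gal_per_day = 150_000     # cooling water at the data center
upstream_gal_per_day = 480_000   # water consumed generating that power

kwh_per_day = it_load_mw * 1000 * 24   # 240,000 kWh/day
print(f"on-site:  {onsite_gal_per_day / kwh_per_day:.2f} gal/kWh")   # 0.62
print(f"upstream: {upstream_gal_per_day / kwh_per_day:.2f} gal/kWh") # 2.00
# Every kWh saved avoids ~2.6 gallons of water overall, so power
# efficiency is also water efficiency.
```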
61. Water Consumption (gpd) by DC Type
[Chart: water consumption in gallons per day by data center type]
Factoid: the typical 'water-less' DC uses about a third more water than the evaporatively cooled Google DC
Using less power is the most significant factor for reducing water consumption
62. Water Recycling: Our data center in St. Ghislain, Belgium
Google's data center in Belgium uses 100% reclaimed water from an industrial canal
63. Power - Cutting Waste / Smarter Computing
Fact: the typical PC wastes half the electricity it uses
Fact: over 60% of all corporate PCs are left on overnight
• End-user devices are the largest portion of the IT footprint
• Power efficiency is critical as billions of devices are deployed
• The technology exists today to save energy and money:
  o Buy power-efficient laptops / PCs / servers (Google saves $30 per server every year)
  o Enable power management (power management suites: ROI < 1 year)
  o Transition to lightweight devices (reduce power from 150W to less than 5W)
Potential: 50% emissions reduction
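The 150 W to 5 W transition translates directly into energy and cost. A rough illustration, assuming always-on operation and an illustrative electricity price, neither of which comes from the talk:

```python
# Savings from replacing a 150 W desktop with a ~5 W lightweight device.
watts_before, watts_after = 150, 5
hours_per_year = 24 * 365

kwh_saved = (watts_before - watts_after) * hours_per_year / 1000
price_per_kwh = 0.10          # illustrative assumption, not from the talk
print(f"{kwh_saved:.0f} kWh/year saved, ~${kwh_saved * price_per_kwh:.0f}/year")
# -> 1270 kWh/year saved, ~$127/year per device
```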
64. E-waste is a Growing Problem
• Hazardous
• High volume because of obsolescence
• Ubiquitous (computers, appliances, consumer electronics, cell phones)
Solutions:
• The 4 R's: reduce, reuse, repair, recycle
• Dispose of the remainder responsibly
65. Thank you!