Warehouse Scale Computer
Introduction
• Had scale been the only distinguishing feature
of these systems we might simply refer to
them as datacenters.
• Datacenters are buildings where multiple
servers and communication gear are
co-located because of their common
environmental requirements and physical
security needs, and for ease of maintenance.
• In that sense, a WSC is a type of datacenter.
Introduction
• Traditional datacenters, however, typically host a large
number of relatively small- or medium-sized
applications, each running on a dedicated hardware
infrastructure that is de-coupled and protected from
other systems in the same facility.
• Those datacenters host hardware and software for
multiple organizational units or even different
companies.
• Different computing systems within such a datacenter
often have little in common in terms of hardware,
software, or maintenance infrastructure, and tend not
to communicate with each other at all.
Introduction
• WSCs currently power the services offered by
companies such as Google, Amazon, Facebook, and
Microsoft’s online services division.
• They differ significantly from traditional datacenters:
1. They belong to a single organization.
2. They use a relatively homogeneous hardware and system software platform.
3. They share a common systems management layer.
• Often, much of the application, middleware, and
system software is built in-house compared to the
predominance of third-party software running in
conventional datacenters.
Introduction
• Most importantly, WSCs run a smaller number of
very large applications (or Internet services), and
the common resource management
infrastructure allows significant deployment
flexibility.
• The requirements of homogeneity, single-
organization control, and enhanced focus on cost
efficiency motivate designers to take new
approaches in constructing and operating these
systems.
Introduction
• Internet services must achieve high
availability, typically aiming for at least 99.99%
uptime (“four nines”, about an hour of
downtime per year).
• Achieving fault-free operation on a large
collection of hardware and system software is
hard and is made more difficult by the large
number of servers involved.
Introduction
• Although it might be theoretically possible to
prevent hardware failures in a collection of
10,000 servers, it would surely be extremely
expensive.
• Consequently, WSC workloads must be
designed to gracefully tolerate large numbers
of component faults with little or no impact
on service level performance and availability.
ARCHITECTURAL OVERVIEW OF WSCS
• The hardware implementation of a WSC will differ
significantly from one installation to the next.
• Even within a single organization such as Google,
systems deployed in different years use different
basic elements, reflecting the hardware
improvements provided by the industry.
• However, the architectural organization of these
systems has been relatively stable over the last few
years.
• Therefore, it is useful to describe this general
architecture at a high level as it sets the background for
subsequent discussions.
ARCHITECTURAL OVERVIEW OF WSCS
• Being satisfied with neither the metric nor the
US system, rack designers use “rack units” to
measure the height of servers.
• 1U is 1.75 inches or 44.45 mm; a typical rack is
42U high.
• The 19-inch (48.26-cm) rack is still the
standard framework to hold servers, despite
this standard going back to railroad hardware
from the 1930s.
ARCHITECTURAL OVERVIEW OF WSCS
Sketch of the typical elements in warehouse-scale systems: 1U server (left), 7’ rack with
Ethernet switch (middle), and diagram of a small cluster with a cluster-level Ethernet switch/
router (right).
ARCHITECTURAL OVERVIEW OF WSCS
• Previous Figure depicts the high-level building blocks
for WSCs.
• A set of low-end servers, typically in a 1U or blade
enclosure format, are mounted within a rack and
interconnected using a local Ethernet switch.
• These rack-level switches, which can use 1- or 10-Gbps
links, have a number of uplink connections to one or
more cluster-level (or datacenter-level) Ethernet
switches.
• This second-level switching domain can potentially
span more than ten thousand individual servers.
ARCHITECTURAL OVERVIEW OF WSCS
• In the case of a blade enclosure there is an
additional first level of networking
aggregation within the enclosure where
multiple processing blades connect to a small
number of networking blades through an I/O
bus such as PCIe.
ARCHITECTURAL OVERVIEW OF WSCS
• A 7-foot (213.36-cm) rack offers 48 U, so it’s not a
coincidence that the most popular switch for a
rack is a 48-port Ethernet switch.
• This product has become a commodity that costs
as little as $30 per port for a 1 Gbit/sec Ethernet
link in 2011.
• Note that the bandwidth within the rack is the
same for each server, so it does not matter where
the software places the sender and the receiver
as long as they are within the same rack.
ARCHITECTURAL OVERVIEW OF WSCS
• This flexibility is ideal from a software
perspective.
• These switches typically offer two to eight
uplinks, which leave the rack to go to the next
higher switch in the network hierarchy.
• Thus, the bandwidth leaving the rack is 6 to 24
times smaller—48/8 to 48/2—than the
bandwidth within the rack. This ratio is called
oversubscription.
• The uplink has 48/n times lower bandwidth, where n = the number of uplink ports (see the sketch below).
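A minimal sketch of this oversubscription arithmetic (assuming a 48-port rack switch, as above; the uplink counts are illustrative):

```python
def oversubscription(server_ports: int = 48, uplink_ports: int = 2) -> float:
    """Ratio of aggregate server-facing bandwidth to uplink bandwidth,
    assuming every port runs at the same link speed."""
    return server_ports / uplink_ports

# 2 uplinks -> 24:1, 8 uplinks -> 6:1, matching the 6-to-24x range above.
for uplinks in (2, 4, 8):
    print(f"{uplinks} uplinks: {oversubscription(48, uplinks):.0f}:1 oversubscription")
```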
ARCHITECTURAL OVERVIEW OF WSCS
• Alas, large oversubscription means
programmers must be aware of the
performance consequences when placing
senders and receivers in different racks.
• This increased software-scheduling burden is
another argument for network switches
designed specifically for the datacenter.
ARCHITECTURAL OVERVIEW OF WSCS
Picture of a row of servers in a Google WSC, 2012.
ARCHITECTURAL OVERVIEW OF WSCS
• Array Switch
• A switch that connects an array of racks.
• An array switch should have 10X the bisection bandwidth of a rack switch.
• The cost of an n-port switch grows as n².
• Array switches often utilize content-addressable memory chips and FPGAs to support high-speed packet inspection.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• The previous figures show the latency, bandwidth, and capacity of the memory hierarchy inside a WSC, both as a table and visually.
• Each server contains:
16 GBytes of memory with a 100-nanosecond access
time and transfers at 20 GBytes/sec and
2 terabytes of disk that offers a 10-millisecond access
time and transfers at 200 MBytes/sec.
• There are two sockets per board, and they share one
1 Gbit/sec Ethernet port.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• Every pair of racks includes one rack switch and holds
80 2U servers.
• Networking software plus switch overhead increases
the latency to DRAM to 100 microseconds and the disk
access latency to 11 milliseconds.
• Thus, the total storage capacity of a rack is roughly 1
terabyte of DRAM and 160 terabytes of disk storage.
• The 1 Gbit/sec Ethernet limits the remote bandwidth
to DRAM or disk within the rack to 100 MBytes/sec.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• The array switch can handle 30 racks, so storage
capacity of an array goes up by a factor of 30: 30
terabytes of DRAM and 4.8 petabytes of disk.
• The array switch hardware and software
increases latency to DRAM within an array to 500
microseconds and disk latency to 12 milliseconds.
• The bandwidth of the array switch limits the
remote bandwidth to either array DRAM or array
disk to 10 MBytes/sec.
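A small sketch that rolls the per-server figures above up into the rack and array totals (80 servers per rack and 30 racks per array, per these slides):

```python
servers_per_rack = 80
racks_per_array = 30

# Per-server resources from the slides.
dram_per_server_gb = 16
disk_per_server_tb = 2

rack_dram_gb = servers_per_rack * dram_per_server_gb   # 1,280 GB, "roughly 1 terabyte"
rack_disk_tb = servers_per_rack * disk_per_server_tb   # 160 TB

# The slides round rack DRAM to 1 TB before scaling up to the array.
array_dram_tb = 1 * racks_per_array                     # 30 TB
array_disk_pb = rack_disk_tb * racks_per_array / 1000   # 4.8 PB

print(rack_dram_gb, rack_disk_tb, array_dram_tb, array_disk_pb)
```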
ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• The previous figures show that network overhead dramatically increases latency from local DRAM to rack DRAM and array DRAM, but both still have more than 10 times better latency than the local disk.
• The network collapses the difference in
bandwidth between rack DRAM and rack disk
and between array DRAM and array disk.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• What is the average memory latency, assuming that 90% of accesses are local to the server, 9% are outside the server but within the rack, and 1% are outside the rack but within the array?
• (90% × 0.1) + (9% × 100) + (1% × 300) = 12.09 microseconds
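The same calculation as a quick sketch (DRAM latencies in microseconds; the 300-microsecond array figure is the one used in the worked answer above):

```python
# Fraction of accesses served at each level of the hierarchy.
fractions = {"local DRAM": 0.90, "rack DRAM": 0.09, "array DRAM": 0.01}
# DRAM latency at each level, in microseconds, as used in the worked example.
latency_us = {"local DRAM": 0.1, "rack DRAM": 100.0, "array DRAM": 300.0}

avg_us = sum(fractions[level] * latency_us[level] for level in fractions)
print(f"average DRAM latency = {avg_us:.2f} microseconds")  # 12.09
```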
ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• How long does it take to transfer 1000MB between disks
within the server, between servers in the rack, and
between servers in different racks of an array?
• Within server: 1000/200=5 sec
• Within rack: 1000/100=10 sec
• Within array: 1000/10= 100 sec
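The same block-transfer arithmetic as a sketch (bandwidths in MBytes/sec from the memory-hierarchy figures above):

```python
transfer_mb = 1000
bandwidth_mb_per_s = {
    "within the server (local disk)": 200,
    "within the rack (1 Gbit/sec Ethernet)": 100,
    "within the array (array switch)": 10,
}

for scope, bw in bandwidth_mb_per_s.items():
    print(f"{scope}: {transfer_mb / bw:.0f} seconds")   # 5 s, 10 s, 100 s
```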
ARCHITECTURAL OVERVIEW OF WSCS
ARCHITECTURAL OVERVIEW OF WSCS
ARCHITECTURAL OVERVIEW OF WSCS
• The WSC needs 20 arrays to reach 50,000
servers, so there is one more level of the
networking hierarchy.
• Next Figure shows the conventional Layer 3
routers to connect the arrays together and to
the Internet.
ARCHITECTURAL OVERVIEW OF WSCS
The Layer 3 network used to link arrays together and to the Internet
[Greenberg et al. 2009].
Some WSCs use a separate border router to connect the Internet to the
datacenter Layer 3 switches.
ARCHITECTURAL OVERVIEW OF WSCS
Sample three-stage fat tree topology.
ARCHITECTURAL OVERVIEW OF WSCS
• Another way to tackle network scalability is to
offload some traffic to a special-purpose
network.
• For example, if storage traffic is a big component
of overall traffic, we could build a separate
network to connect servers to storage units.
• If that traffic is more localized (not all servers
need to be attached to all storage units) we can
build smaller-scale networks, thus reducing costs.
ARCHITECTURAL OVERVIEW OF WSCS
• Historically, that’s how all storage was networked:
a SAN (storage area network) connected servers
to disks, typically using FibreChannel networks
rather than Ethernet.
• Today, Ethernet is becoming more common since
it offers comparable speeds, and protocols such
as FCoE (FibreChannel over Ethernet) and iSCSI
(SCSI over IP) allow Ethernet networks to
integrate well with traditional SANs.
ARCHITECTURAL OVERVIEW OF WSCS
• WSCs using VMs (or, more generally, task
migration) pose further challenges to
networks since connection endpoints (i.e., IP
address/port combinations) can move from
one physical machine to another.
• Typical networking hardware, as well as network management software, does not anticipate such moves and in fact often explicitly assumes that they are not possible.
ARCHITECTURAL OVERVIEW OF WSCS
• For example, network designs often assume that all machines in a given rack have IP addresses in a common subnet, which simplifies administration and minimizes the number of required forwarding table entries in routing tables.
• More importantly, frequent migration makes it impossible to manage the network manually; programming network elements needs to be automated, so the same cluster manager that decides the placement of computations also needs to update the network state.
ARCHITECTURAL OVERVIEW OF WSCS
• The Need of SDN
• The need for a programmable network has led
to much interest in OpenFlow
[http://www.openflow.org/] and software-
defined networking (SDN), which moves the
network control plane out of the individual
switches into a logically centralized controller.
ARCHITECTURAL OVERVIEW OF WSCS
ARCHITECTURAL OVERVIEW OF WSCS
• The Need of SDN
• Controlling a network from a logically centralized
server offers many advantages; in particular, common
networking algorithms such as computing reachability,
shortest paths, or max-flow traffic placement become
much simpler to solve, compared to their
implementation in current networks where each
individual router must solve the same problem while
dealing with limited visibility (direct neighbors only),
inconsistent network state (routers that are out of
synch with the current network state), and many
independent and concurrent actors (routers).
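To illustrate why centralization simplifies such algorithms, here is a minimal sketch (not any real SDN controller API) that computes shortest paths over a globally known topology with plain breadth-first search, a problem each router in a distributed design must otherwise approximate with only local visibility:

```python
from collections import deque

def shortest_paths(topology, src):
    """BFS over a complete, centrally held view of the switch topology."""
    paths = {src: [src]}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbor in topology.get(node, []):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [neighbor]
                queue.append(neighbor)
    return paths

# Hypothetical two-level topology: rack switches r1..r4 behind array switches a1, a2.
topology = {
    "a1": ["a2", "r1", "r2"], "a2": ["a1", "r3", "r4"],
    "r1": ["a1"], "r2": ["a1"], "r3": ["a2"], "r4": ["a2"],
}
print(shortest_paths(topology, "r1")["r4"])  # ['r1', 'a1', 'a2', 'r4']
```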
ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• Disk drives or Flash devices are either connected directly to each individual server and managed by a global distributed file system (such as Google’s GFS), or they can be part of Network Attached Storage (NAS) devices directly connected to the cluster-level switching fabric.
• A NAS tends to be a simpler solution to deploy
initially because it allows some of the data
management responsibilities to be outsourced to
a NAS appliance vendor.
ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• Keeping storage separate from computing nodes also
makes it easier to enforce quality of service guarantees
since the NAS runs no compute jobs besides the
storage server.
• In contrast, attaching disks directly to compute nodes
can reduce hardware costs (the disks leverage the
existing server enclosure) and improve networking
fabric utilization (each server network port is
effectively dynamically shared between the computing
tasks and the file system).
ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• The replication model between these two
approaches is also fundamentally different.
• A NAS tends to provide high availability through
replication or error correction capabilities within
each appliance, whereas systems like GFS
implement replication across different machines
and consequently will use more networking
bandwidth to complete write operations.
ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• However, GFS-like systems are able to keep data
available even after the loss of an entire server
enclosure or rack and may allow higher aggregate
read bandwidth because the same data can be
sourced from multiple replicas.
• Trading off higher write overheads for lower cost,
higher availability, and increased read bandwidth
was the right solution for many of Google’s early
workloads.
ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• An additional advantage of having disks co-
located with compute servers is that it enables
distributed system software to exploit data
locality.
• Given how networking performance has outpaced disk performance over the last decades, such locality advantages are less useful for disks, but they may remain beneficial for faster modern storage devices such as those using Flash storage.
ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• NAND Flash technology has made Solid State Drives
(SSDs) affordable for a growing class of storage needs
in WSCs.
• While the cost per byte stored in SSDs will remain
much higher than in disks for the foreseeable future,
many Web services have I/O rates that cannot be easily
achieved with disk based systems.
• Since SSDs can deliver IO rates many orders of
magnitude higher than disks, they are increasingly
displacing disk drives as the repository of choice for
databases in Web services.
ARCHITECTURAL OVERVIEW OF WSCS
HDD interiors almost resemble a high-tech record player.
OCZ's Vector SSD is one of the fastest around
The OCZ RevoDrive Hybrid.
ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• Types of NAND Flash
• There are primarily two types of NAND Flash widely used
today, Single-Level Cell (SLC) and Multi-Level Cell (MLC).
NAND Flash stores data in a large array of cells.
• Each cell can store data: one bit per cell for SLC NAND, and two bits per cell for MLC. So, SLC NAND would store a “0” or “1” in each cell, and MLC NAND would store “00”, “01”, “10”, or “11” in each cell.
• SLC and MLC NAND offer different levels of performance
and endurance characteristics at different price points, with
SLC being the higher performing and more costly of the
two.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• The data manipulated by WSC workloads tends to fall into
two categories:
• data that is private to individual running tasks and data that
is part of the shared state of the distributed workload.
• Private data tends to reside in local DRAM or disk, is rarely
replicated, and its management is simplified by virtue of its
single user semantics.
• In contrast, shared data must be much more durable and is
accessed by a large number of clients, and thus requires a
much more sophisticated distributed storage system.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• Google’s GFS is an example of a storage system with a
simple file-like abstraction (Google’s Colossus system has
since replaced GFS, but follows a similar architectural
philosophy so we choose to describe the better known GFS
here).
• GFS was designed to support the Web search indexing system (the system that turned crawled Web pages into index files for use in Web search), and therefore focuses on high throughput for thousands of concurrent readers/writers and robust performance under high hardware failure rates.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• GFS users typically manipulate large quantities of
data, and thus GFS is further optimized for large
operations.
• The system architecture consists of a master,
which handles metadata operations, and
thousands of chunk server (slave) processes
running on every server with a disk drive, to
manage the data chunks on those drives.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• In GFS, fault tolerance is provided by replication
across machines instead of within them, as is the
case in RAID systems.
• Cross-machine replication allows the system to
tolerate machine and network failures and
enables fast recovery, since replicas for a given
disk or machine can be spread across thousands
of other machines.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• Although the initial version of GFS only
supported simple replication, today’s version
(Colossus) has added support for more space-
efficient Reed-Solomon codes, which tend to
reduce the space overhead of replication by
roughly a factor of two over simple replication
for the same level of availability.
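A back-of-the-envelope sketch of that space saving (the (data, parity) split below is an illustrative assumption, not Colossus’s actual encoding parameters):

```python
def replication_overhead(copies: int) -> float:
    """Bytes stored per byte of user data with simple replication."""
    return float(copies)

def reed_solomon_overhead(data_chunks: int, parity_chunks: int) -> float:
    """Bytes stored per byte of user data with a Reed-Solomon (data + parity) code."""
    return (data_chunks + parity_chunks) / data_chunks

print(replication_overhead(3))       # 3.0x for 3-way replication
print(reed_solomon_overhead(6, 3))   # 1.5x for an assumed RS(6,3) code
# 3.0 / 1.5 = 2: roughly the factor-of-two reduction mentioned above.
```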
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• An important factor in maintaining high availability is distributing file
chunks across the whole cluster in such a way that a small number of
correlated failures is extremely unlikely to lead to data loss.
• GFS takes advantage of knowledge about the known possible correlated
fault scenarios and attempts to distribute replicas in a way that avoids
their co-location in a single fault domain.
• Wide distribution of chunks across disks over a whole cluster is also key for
speeding up recovery.
• Since replicas of chunks in a given disk are spread across possibly all
machines in a storage cluster, reconstruction of lost data chunks is
performed in parallel at high speed.
• Quick recovery is important since long recovery time windows leave
under-replicated chunks vulnerable to data loss should additional faults
hit the cluster.
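A toy sketch of the placement idea (the rack labels and replica count are assumptions for illustration; the real GFS/Colossus placement policy considers more fault domains and constraints):

```python
import random

def place_replicas(servers, num_replicas=3):
    """Pick replica servers so that no two replicas share a fault domain (e.g., a rack)."""
    chosen, used_domains = [], set()
    for server in random.sample(list(servers), k=len(servers)):
        domain = servers[server]
        if domain not in used_domains:
            chosen.append(server)
            used_domains.add(domain)
        if len(chosen) == num_replicas:
            break
    return chosen

# Hypothetical servers mapped to their rack (the fault domain to avoid sharing).
servers = {"s1": "rack1", "s2": "rack1", "s3": "rack2", "s4": "rack3", "s5": "rack3"}
print(place_replicas(servers))   # e.g. ['s4', 's1', 's3'] - one replica per rack
```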
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• STRUCTURED WSC STORAGE
• The simple file abstraction of GFS and Colossus may suffice
for systems that manipulate large blobs of data, but
application developers also need the WSC equivalent of
database-like functionality, where data sets can be
structured and indexed for easy small updates or complex
queries.
• A blob (binary large object, basic large object, BLOB, or BLOb) is a collection of binary data stored as a single entity in a database management system. Blobs are typically images, audio, or other multimedia objects, though sometimes binary executable code is stored as a blob.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• STRUCTURED WSC STORAGE
• Structured distributed storage systems such as Google’s BigTable
and Amazon’s Dynamo were designed to fulfill those needs.
• Compared to traditional database systems, BigTable and Dynamo
sacrifice some features, such as richness of schema representation
and strong consistency, in favor of higher performance and
availability at massive scales.
• BigTable, for example, presents a simple multi-dimensional sorted
map consisting of row keys (strings) associated with multiple values
organized in columns, forming a distributed sparse table space.
Column values are associated with timestamps in order to support
versioning and time-series.
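A tiny in-memory sketch of that data model (a stand-in for illustration, not BigTable’s actual API): a sparse map from (row key, column, timestamp) to a value, scanned in sorted row-key order.

```python
import time
from collections import defaultdict
from typing import Optional

class TinyTable:
    """A sparse, multi-versioned map: (row, column, timestamp) -> value."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value) versions
        self._cells = defaultdict(list)

    def put(self, row: str, column: str, value: str, ts: Optional[float] = None) -> None:
        self._cells[(row, column)].append((ts if ts is not None else time.time(), value))

    def get(self, row: str, column: str) -> Optional[str]:
        versions = self._cells.get((row, column))
        return max(versions)[1] if versions else None   # newest timestamp wins

    def scan(self, row_prefix: str):
        """Yield cells in sorted row-key order, as BigTable-style scans do."""
        for (row, column) in sorted(self._cells):
            if row.startswith(row_prefix):
                yield row, column, self.get(row, column)

table = TinyTable()
table.put("com.example/index", "anchor:home", "Example")
table.put("com.example/index", "anchor:home", "Example v2")   # a newer version
print(list(table.scan("com.example")))
```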
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• STRUCTURED WSC STORAGE
• The choice of eventual consistency in BigTable and Dynamo shifts
the burden of resolving temporary inconsistencies to the
applications using these systems.
• A number of application developers within Google have found it
inconvenient to deal with weak consistency models and the
limitations of the simple data schemes in BigTable.
• Second-generation structured storage systems such as MegaStore and subsequently Spanner have been designed to address such concerns.
• Both MegaStore and Spanner provide richer schemas and SQL-like functionality while providing simpler, stronger consistency models.
ARCHITECTURAL OVERVIEW OF WSCS
Weak Consistency
• The protocol is said to support weak
consistency if:
• All accesses to synchronization
variables are seen by all processes (or
nodes, processors) in the same order
(sequentially) - these are
synchronization operations.
• Accesses to critical sections are seen
sequentially.
• All other accesses may be seen in
different order on different processes
(or nodes, processors).
• The set of both read and write
operations in between different
synchronization operations is the same
in each process.
Strong Consistency
• The protocol is said to support
strong consistency if:
• All accesses are seen by all
parallel processes (or nodes,
processors etc.) in the same
order (sequentially)
• Therefore only one consistent
state can be observed, as
opposed to weak consistency,
where different parallel
processes (or nodes etc.) can
perceive variables in different
states.
ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• INTERPLAY OF STORAGE AND NETWORKING TECHNOLOGY
• The success of WSC distributed storage systems can be
partially attributed to the evolution of datacenter
networking fabrics.
• It has been observed that the gap between networking and disk performance has widened to the point that disk locality is no longer relevant in intra-datacenter computations.
• This observation enables dramatic simplifications in the
design of distributed disk-based storage systems as well as
utilization improvements since any disk byte in a WSC
facility can in principle be utilized by any task regardless of
their relative locality.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND
SPECIFICATIONS
• The design of a datacenter is often classified as
belonging to “Tier I–IV”.
• The Uptime Institute, a professional services
organization specializing in datacenters, and the
Telecommunications Industry Association (TIA), an
industry group accredited by ANSI and made up of
approximately 400 member companies, both advocate
a 4-tier classification loosely based on the power
distribution, uninterruptible power supply (UPS),
cooling delivery and redundancy of the datacenter.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND SPECIFICATIONS
• Tier I datacenters have a single path for power distribution, UPS,
and cooling distribution, without redundant components.
• Tier II adds redundant components to this design (N + 1), improving
availability.
• Tier III datacenters have one active and one alternate distribution path for utilities. Each path has redundant components and is concurrently maintainable; that is, they provide redundancy even during maintenance.
• Tier IV datacenters have two simultaneously active power and
cooling distribution paths, redundant components in each path, and
are supposed to tolerate any single equipment failure without
impacting the load.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND SPECIFICATIONS
• The Uptime Institute’s specification is generally performance-based (with notable exceptions for the amount of backup diesel fuel, water storage, and ASHRAE temperature design points).
• The specification describes topology rather than
prescribing a specific list of components to meet the
requirements, so there are many architectures that can
achieve a given tier classification.
• In contrast, TIA-942 is very prescriptive and specifies a
variety of implementation details such as building
construction, ceiling height, voltage levels, types of racks,
and patch cord labeling, for example.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND
SPECIFICATIONS
• Formally achieving tier classification certification is
difficult and requires a full review from one of the
granting bodies, and most datacenters are not formally
rated.
• Most commercial datacenters fall somewhere between
tiers III and IV, choosing a balance between
construction cost and reliability.
• Generally, the lowest of the individual subsystem
ratings (cooling, power, etc.) determines the overall
tier classification of the datacenter.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND
SPECIFICATIONS
• Real-world datacenter reliability is strongly influenced
by the quality of the organization running the
datacenter, not just by the design.
• The Uptime Institute reports that over 70% of
datacenter outages are the result of human error,
including management decisions on staffing,
maintenance, and training.
• Theoretical availability estimates used in the industry
range from 99.7% for tier II datacenters to 99.98% and
99.995% for tiers III and IV, respectively.
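For intuition, the sketch below converts those availability figures into expected downtime per year:

```python
HOURS_PER_YEAR = 24 * 365

for tier, availability in [("Tier II", 0.997), ("Tier III", 0.9998), ("Tier IV", 0.99995)]:
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    print(f"{tier}: {availability:.3%} available -> {downtime_hours:.1f} hours of downtime per year")
# Tier II ~26.3 h, Tier III ~1.8 h, Tier IV ~0.4 h per year
```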
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• The broadest definition of WSC energy efficiency would
measure the energy used to run a particular workload
(say, to sort a petabyte of data).
• Unfortunately, no two companies run the same
workload and real-world application mixes change all
the time so it is hard to benchmark real-world WSCs
this way.
• Thus, even though such benchmarks have been
contemplated as far back as 2008 they haven’t yet
been found and we doubt they ever will.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• However, it is useful to view energy efficiency
as the product of three factors we can
independently measure and optimize:
• The first term (a) measures facility efficiency, the second (b) measures server power conversion efficiency, and the third (c) measures the server’s architectural efficiency.
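The factorization referred to in these bullets appeared as a figure on the original slide; reconstructed from the surrounding text (following the formulation in Barroso and Hölzle’s book, so treat the exact notation as an assumption), it is:

Efficiency = Computation / Total Energy
           = (1 / PUE) × (1 / SPUE) × (Computation / Total Energy to Electronic Components)
                 (a)          (b)                           (c)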
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• THE PUE METRIC
• Power usage effectiveness (PUE) reflects the
quality of the datacenter building infrastructure
itself, and captures the ratio of total building
power to IT power (the power consumed by the
actual computing and network equipment, etc.).
(Sometimes IT power is also referred to as
“critical power.”)
• PUE = (Facility power) / (IT Equipment power)
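A quick numerical sketch of the metric (the power figures are illustrative assumptions, not measurements from a specific facility):

```python
def pue(total_facility_power_kw: float, it_equipment_power_kw: float) -> float:
    """Power usage effectiveness: total building power / IT ("critical") power."""
    return total_facility_power_kw / it_equipment_power_kw

# A facility drawing 15 MW to power 10 MW of IT load has PUE = 1.5, i.e.,
# 0.5 W of cooling and power-distribution overhead per watt of IT load.
print(pue(15_000, 10_000))
```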
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• THE PUE METRIC
• PUE has gained a lot of traction as a datacenter
efficiency metric since widespread reporting
started around 2009.
• We can easily measure PUE by adding electrical
meters to the lines powering the various parts of
a datacenter, thus determining how much power
is used by chillers or a UPS.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• THE PUE METRIC
• Historically, the PUE for the average
datacenter has been embarrassingly poor.
• According to a 2006 study, 85% of current
datacenters were estimated to have a PUE of
greater than 3.0.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• THE PUE METRIC
• In other words, the building’s mechanical and electrical
systems consumed twice as much power as the actual
computing load! Only 5% had a PUE of 2.0 or better.
• A subsequent EPA survey of over 100 datacenters
reported an average PUE value of 1.91, and a 2012
Uptime Institute survey of over 1100 datacenters
covering a range of geographies and datacenter sizes
reported an average PUE value between 1.8 and 1.89.
ARCHITECTURAL OVERVIEW OF WSCS
Uptime Institute survey of PUE for 1100+ datacenters.
ARCHITECTURAL OVERVIEW OF WSCS
• SOURCES OF EFFICIENCY LOSSES IN
DATACENTERS
• For illustration, let us walk through the losses
in a typical datacenter.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• The second term (b) accounts for overheads inside
servers or other IT equipment using a metric analogous
to PUE, server PUE (SPUE).
• SPUE consists of the ratio of total server input power to
its useful power, where useful power includes only the
power consumed by the electronic components
directly involved in the computation: motherboard,
disks, CPUs, DRAM, I/O cards, and so on.
• Substantial amounts of power may be lost in the
server’s power supply, voltage regulator modules
(VRMs), and cooling fans.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• The product of PUE and SPUE constitutes an accurate assessment of the end-to-end electromechanical efficiency of a WSC. Such a true (or total) PUE metric, TPUE, is defined as PUE × SPUE.
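A sketch of that bookkeeping (the PUE and SPUE values are illustrative assumptions):

```python
def tpue(pue: float, spue: float) -> float:
    """True (total) PUE: building-level overhead (PUE) times server-level overhead (SPUE)."""
    return pue * spue

# With a PUE of 1.2 and an SPUE of 1.25, 1.5 W must enter the building
# for every watt that reaches the electronic components doing the computing.
print(tpue(1.2, 1.25))   # 1.5
```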
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• MEASURING ENERGY EFFICIENCY
• Similarly, server-level benchmarks such as Joulesort and
SPECpower characterize other aspects of computing
efficiency.
• Joulesort measures the total system energy to perform an
out-of-core sort and derives a metric that enables the
comparison of systems ranging from embedded devices to
supercomputers.
• SPECpower focuses on server-class systems and computes
the performance-to-power ratio of a system running a
typical business application on an enterprise Java platform.
ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• MEASURING ENERGY EFFICIENCY
• Two separate benchmarking efforts aim to
characterize the efficiency of storage systems: the
Emerald Program by the Storage Networking
Industry Association (SNIA) and the SPC-2/E by
the Storage Performance Council.
• Both benchmarks measure storage servers under
different kinds of request activity and report
ratios of transaction throughput per Watt.
ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• To better understand the potential impact of energy-
related optimizations, let us examine the total cost of
ownership (TCO) of a datacenter.
• At the top level, costs split up into capital expenses
(Capex) and operational expenses (Opex).
• Capex refers to investments that must be made
upfront and that are then depreciated over a certain
time frame; examples are the construction cost of a
datacenter or the purchase price of a server.
ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• Opex refers to the recurring monthly costs of
actually running the equipment, excluding
depreciation: electricity costs, repairs and
maintenance, salaries of on-site personnel,
and so on.
• Thus, we have:
TCO = datacenter depreciation + datacenter Opex + server
depreciation + server Opex
ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• The monthly depreciation cost (or amortization
cost) that results from the initial construction
expense depends on the duration over which the
investment is amortized (which is related to its
expected lifetime) and the assumed interest rate.
• Typically, datacenters are depreciated over
periods of 10–15 years.
• Under U.S. accounting rules, it is common to use
straight-line depreciation where the value of the
asset declines by a fixed amount each month.
ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• For example, if we depreciate a $12/W
datacenter over 12 years, the depreciation cost is
$0.08/W per month.
• If we had to take out a loan to finance
construction at an interest rate of 8%, the
associated monthly interest payments add an
additional cost of $0.05/W, for a total of $0.13/W
per month.
• Typical interest rates vary over time, but many
companies will pay interest in the 7–12% range.
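A sketch of that arithmetic (straight-line depreciation plus the level payment on a loan covering the full construction cost; interpreting the quoted $0.05/W interest figure as the average interest embedded in such a payment is an assumption):

```python
def monthly_costs(capex_per_watt: float, years: int, annual_rate: float):
    """Straight-line depreciation and the interest portion of a level loan payment."""
    months = years * 12
    depreciation = capex_per_watt / months
    r = annual_rate / 12
    loan_payment = capex_per_watt * r / (1 - (1 + r) ** -months)  # standard amortization formula
    return depreciation, loan_payment - depreciation

dep, interest = monthly_costs(capex_per_watt=12.0, years=12, annual_rate=0.08)
print(f"depreciation ${dep:.2f}/W/month, interest ${interest:.2f}/W/month, "
      f"total ${dep + interest:.2f}/W/month")   # ~$0.08 + ~$0.05 = ~$0.13
```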
ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• To put the cost of energy into perspective, Hamilton did a case study to estimate the costs of a WSC.
• He determined that the CAPEX of this 8 MW facility was $88M, and that the roughly 46,000 servers and corresponding networking equipment added another $79M to the CAPEX for the WSC.
ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• We can now price the total cost of energy, since U.S. accounting rules allow us to convert CAPEX into OPEX.
• We can just amortize CAPEX as a fixed amount each month for the effective life of the equipment.
• Note that the amortization rates differ significantly, from 10 years for the facility to 4 years for the networking equipment and 3 years for the servers.
• Hence, the WSC facility lasts a decade, but you need to replace the servers every 3 years and the networking equipment every 4 years.
• By amortizing the CAPEX, Hamilton came up with a monthly OPEX, including accounting for the cost of borrowing money (5% annually) to pay for the WSC.
• At $3.8M, the monthly OPEX is about 2% of the CAPEX.
ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Since many companies with WSCs are competing vigorously in the marketplace, up until recently they have been reluctant to share their latest innovations with the public (and each other).
• In 2009, Google described a state-of-the-art WSC as of 2005.
• Google graciously provided an update of the 2007 status of their WSC, making this section the most up-to-date description of a Google WSC.
• Even more recently, Facebook described their latest datacenter as part of http://opencompute.org.
ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• Both Google and Microsoft have built WSCs using shipping
containers.
• The idea of building a WSC from containers is to make WSC
design modular.
• Each container is independent, and the only external
connections are networking, power, and water.
• The containers in turn supply networking, power, and
cooling to the servers placed inside them, so the job of the
WSC is to supply networking, power, and cold water to the
containers and to pump the resulting warm water to
external cooling towers and chillers.
ARCHITECTURAL OVERVIEW OF WSCS
ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• The diagram (previous slide) is a cutaway drawing of a Google container.
• A container holds up to 1160 servers, so 45 containers
have space for 52,200 servers. (This WSC has about
40,000 servers.)
• The servers are stacked 20 high in racks that form two
long rows of 29 racks (also called bays) each, with one
row on each side of the container.
• The rack switches are 48-port, 1 Gbit/sec Ethernet
switches, which are placed in every other rack.
ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• The Google WSC that we are looking at contains 45 40-foot-long containers in a 300-foot by 250-foot space, or 75,000 square feet (about 7000 square meters).
• To fit in the warehouse, 30 of the containers are stacked two high, or 15 pairs of stacked containers.
• Although the location was not revealed, it was built at the time that Google developed WSCs in The Dalles, Oregon, which provides a moderate climate and is near cheap hydroelectric power and Internet backbone fiber.
ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• This WSC offers 10 megawatts with a PUE of 1.23 over the prior 12
months.
• Of that 0.230 of PUE overhead, 85% goes to cooling losses (0.195
PUE) and 15% (0.035) goes to power losses.
• The system went live in November 2005, and this section describes
its state as of 2007.
• A Google container can handle up to 250 kilowatts. That means the
container can handle 780 watts per square foot (0.09 square
meters), or 133 watts per square foot across the entire 75,000-
square-foot space with 40 containers.
• However, the containers in this WSC average just 222 kilowatts.
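A sketch reproducing the power-density and PUE-overhead arithmetic above (the 40 ft × 8 ft container footprint is an assumption used to recover the roughly 780 W per square foot figure):

```python
containers = 40               # the 10 MW capacity corresponds to 40 containers at full load
container_power_kw = 250
facility_sq_ft = 75_000
container_sq_ft = 40 * 8      # assumed 40 ft x 8 ft container footprint

watts_per_sq_ft_in_container = container_power_kw * 1000 / container_sq_ft         # ~781
watts_per_sq_ft_overall = containers * container_power_kw * 1000 / facility_sq_ft  # ~133

pue = 1.23
cooling_overhead = (pue - 1) * 0.85   # 0.195 of PUE goes to cooling losses
power_overhead = (pue - 1) * 0.15     # 0.035 of PUE goes to power losses

print(watts_per_sq_ft_in_container, watts_per_sq_ft_overall, cooling_overhead, power_overhead)
```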
ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• Servers In A Google WSC
• The server in Figure 6.21 has two sockets, each containing a dual-core AMD Opteron processor running at 2.2 GHz. The photo shows eight DIMMs, and these servers are typically deployed with 8 GB of DDR2 DRAM.
• A novel feature is that the memory bus is downclocked to 533 MHz from the standard 666 MHz, since the slower bus has little impact on performance but a significant impact on power.
• The baseline design has a single network interface card (NIC) for a 1 Gbit/sec Ethernet link.
ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• Servers In A Google WSC
• Although the photo in Figure 6.21 shows two SATA disk drives, the baseline server has just one.
• The peak power of the baseline is about 160 watts, and idle power is 85 watts.
• This baseline node is supplemented to offer a storage (or “diskfull”) node.
• First, a second tray containing 10 SATA disks is connected to the server.
• To get one more disk, a second disk is placed into the empty spot on the motherboard, giving the storage node 12 SATA disks.
• Finally, since a storage node could saturate a single 1 Gbit/sec Ethernet link, a second Ethernet NIC was added.
• Peak power for a storage node is about 300 watts, and it idles at 198 watts.

More Related Content

What's hot

What's hot (20)

NUMA
NUMANUMA
NUMA
 
Vlsi testing
Vlsi testingVlsi testing
Vlsi testing
 
Shared-Memory Multiprocessors
Shared-Memory MultiprocessorsShared-Memory Multiprocessors
Shared-Memory Multiprocessors
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
Physical design
Physical design Physical design
Physical design
 
Vliw
VliwVliw
Vliw
 
ASIC design verification
ASIC design verificationASIC design verification
ASIC design verification
 
Amba bus
Amba busAmba bus
Amba bus
 
sigma delta converters
sigma delta converterssigma delta converters
sigma delta converters
 
System On Chip
System On ChipSystem On Chip
System On Chip
 
Code division duplexing
Code division duplexingCode division duplexing
Code division duplexing
 
Mac protocols for ad hoc wireless networks
Mac protocols for ad hoc wireless networks Mac protocols for ad hoc wireless networks
Mac protocols for ad hoc wireless networks
 
GSM channels
GSM channelsGSM channels
GSM channels
 
Metastability
MetastabilityMetastability
Metastability
 
Topic : X.25, Frame relay and ATM
Topic :  X.25, Frame relay and ATMTopic :  X.25, Frame relay and ATM
Topic : X.25, Frame relay and ATM
 
Fpga architectures and applications
Fpga architectures and applicationsFpga architectures and applications
Fpga architectures and applications
 
FPGA
FPGAFPGA
FPGA
 
Vx works RTOS
Vx works RTOSVx works RTOS
Vx works RTOS
 
Satellite data network communication
Satellite data network communicationSatellite data network communication
Satellite data network communication
 
Zynq architecture
Zynq architectureZynq architecture
Zynq architecture
 

Similar to Warehouse scale computer

Elastistore flexible elastic buffering for virtual-channel-based networks on...
Elastistore  flexible elastic buffering for virtual-channel-based networks on...Elastistore  flexible elastic buffering for virtual-channel-based networks on...
Elastistore flexible elastic buffering for virtual-channel-based networks on...I3E Technologies
 
SAN overview.pptx
SAN overview.pptxSAN overview.pptx
SAN overview.pptxMugabo4
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RSimon Huang
 
Data Center for Cloud Computing - DC3X
Data Center for Cloud Computing - DC3XData Center for Cloud Computing - DC3X
Data Center for Cloud Computing - DC3XRenaud Blanchette
 
Why new hardware may not make Oracle databases faster
Why new hardware may not make Oracle databases fasterWhy new hardware may not make Oracle databases faster
Why new hardware may not make Oracle databases fasterSolarWinds
 
Madge LANswitch 3LS Application Guide
Madge LANswitch 3LS Application GuideMadge LANswitch 3LS Application Guide
Madge LANswitch 3LS Application GuideRonald Bartels
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersRyousei Takano
 
Autonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with CassandraAutonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with CassandraEmiliano
 
Storage networks
Storage networksStorage networks
Storage networksAhmed Nour
 
Exploration lan switching_chapter1
Exploration lan switching_chapter1Exploration lan switching_chapter1
Exploration lan switching_chapter1nixon
 
Technology Brief: Flexible Blade Server IO
Technology Brief: Flexible Blade Server IOTechnology Brief: Flexible Blade Server IO
Technology Brief: Flexible Blade Server IOIT Brand Pulse
 
CloudSmartz Layer 2 Direct Connect [Factsheet] | Smarter Transformation
CloudSmartz Layer 2 Direct Connect [Factsheet] | Smarter TransformationCloudSmartz Layer 2 Direct Connect [Factsheet] | Smarter Transformation
CloudSmartz Layer 2 Direct Connect [Factsheet] | Smarter TransformationCloudSmartz
 
Lecture notes - Data Centers________.pptx
Lecture notes - Data Centers________.pptxLecture notes - Data Centers________.pptx
Lecture notes - Data Centers________.pptxSandeepGupta229023
 
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...OpenStack Korea Community
 

Similar to Warehouse scale computer (20)

Deco1
Deco1Deco1
Deco1
 
Elastistore flexible elastic buffering for virtual-channel-based networks on...
Elastistore  flexible elastic buffering for virtual-channel-based networks on...Elastistore  flexible elastic buffering for virtual-channel-based networks on...
Elastistore flexible elastic buffering for virtual-channel-based networks on...
 
Cluster computing
Cluster computingCluster computing
Cluster computing
 
Challenges in Managing IT Infrastructure
Challenges in Managing IT InfrastructureChallenges in Managing IT Infrastructure
Challenges in Managing IT Infrastructure
 
SAN overview.pptx
SAN overview.pptxSAN overview.pptx
SAN overview.pptx
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3R
 
Data Center for Cloud Computing - DC3X
Data Center for Cloud Computing - DC3XData Center for Cloud Computing - DC3X
Data Center for Cloud Computing - DC3X
 
Why new hardware may not make Oracle databases faster
Why new hardware may not make Oracle databases fasterWhy new hardware may not make Oracle databases faster
Why new hardware may not make Oracle databases faster
 
Madge LANswitch 3LS Application Guide
Madge LANswitch 3LS Application GuideMadge LANswitch 3LS Application Guide
Madge LANswitch 3LS Application Guide
 
cluster computing
cluster computingcluster computing
cluster computing
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
Autonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with CassandraAutonomous control in Big Data platforms: and experience with Cassandra
Autonomous control in Big Data platforms: and experience with Cassandra
 
Storage networks
Storage networksStorage networks
Storage networks
 
Exploration lan switching_chapter1
Exploration lan switching_chapter1Exploration lan switching_chapter1
Exploration lan switching_chapter1
 
What is 3d torus
What is 3d torusWhat is 3d torus
What is 3d torus
 
Technology Brief: Flexible Blade Server IO
Technology Brief: Flexible Blade Server IOTechnology Brief: Flexible Blade Server IO
Technology Brief: Flexible Blade Server IO
 
CloudSmartz Layer 2 Direct Connect [Factsheet] | Smarter Transformation
CloudSmartz Layer 2 Direct Connect [Factsheet] | Smarter TransformationCloudSmartz Layer 2 Direct Connect [Factsheet] | Smarter Transformation
CloudSmartz Layer 2 Direct Connect [Factsheet] | Smarter Transformation
 
Lecture notes - Data Centers________.pptx
Lecture notes - Data Centers________.pptxLecture notes - Data Centers________.pptx
Lecture notes - Data Centers________.pptx
 
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
 
Storage area network
Storage area networkStorage area network
Storage area network
 

More from Hassan A-j

IOS Swift Language 4th tutorial
IOS Swift Language 4th tutorialIOS Swift Language 4th tutorial
IOS Swift Language 4th tutorialHassan A-j
 
IOS Swift Language 3rd tutorial
IOS Swift Language 3rd tutorialIOS Swift Language 3rd tutorial
IOS Swift Language 3rd tutorialHassan A-j
 
IOS Swift language 2nd tutorial
IOS Swift language 2nd tutorialIOS Swift language 2nd tutorial
IOS Swift language 2nd tutorialHassan A-j
 
IOS Swift language 1st Tutorial
IOS Swift language 1st TutorialIOS Swift language 1st Tutorial
IOS Swift language 1st TutorialHassan A-j
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process ModelsHassan A-j
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduceHassan A-j
 

More from Hassan A-j (6)

IOS Swift Language 4th tutorial
IOS Swift Language 4th tutorialIOS Swift Language 4th tutorial
IOS Swift Language 4th tutorial
 
IOS Swift Language 3rd tutorial
IOS Swift Language 3rd tutorialIOS Swift Language 3rd tutorial
IOS Swift Language 3rd tutorial
 
IOS Swift language 2nd tutorial
IOS Swift language 2nd tutorialIOS Swift language 2nd tutorial
IOS Swift language 2nd tutorial
 
IOS Swift language 1st Tutorial
IOS Swift language 1st TutorialIOS Swift language 1st Tutorial
IOS Swift language 1st Tutorial
 
Software Process Models
Software Process ModelsSoftware Process Models
Software Process Models
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 

Recently uploaded

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 

Recently uploaded (20)

Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 

Warehouse scale computer

  • 10. ARCHITECTURAL OVERVIEW OF WSCS Sketch of the typical elements in warehouse-scale systems: 1U server (left), 7’ rack with Ethernet switch (middle), and diagram of a small cluster with a cluster-level Ethernet switch/router (right).
  • 11. ARCHITECTURAL OVERVIEW OF WSCS • Previous Figure depicts the high-level building blocks for WSCs. • A set of low-end servers, typically in a 1U or blade enclosure format, are mounted within a rack and interconnected using a local Ethernet switch. • These rack-level switches, which can use 1- or 10-Gbps links, have a number of uplink connections to one or more cluster-level (or datacenter-level) Ethernet switches. • This second-level switching domain can potentially span more than ten thousand individual servers.
  • 12. ARCHITECTURAL OVERVIEW OF WSCS • In the case of a blade enclosure there is an additional first level of networking aggregation within the enclosure where multiple processing blades connect to a small number of networking blades through an I/O bus such as PCIe.
  • 13. ARCHITECTURAL OVERVIEW OF WSCS • A 7-foot (213.36-cm) rack offers 48 U, so it’s not a coincidence that the most popular switch for a rack is a 48-port Ethernet switch. • This product has become a commodity that costs as little as $30 per port for a 1 Gbit/sec Ethernet link in 2011. • Note that the bandwidth within the rack is the same for each server, so it does not matter where the software places the sender and the receiver as long as they are within the same rack.
  • 14. ARCHITECTURAL OVERVIEW OF WSCS • This flexibility is ideal from a software perspective. • These switches typically offer two to eight uplinks, which leave the rack to go to the next higher switch in the network hierarchy. • Thus, the bandwidth leaving the rack is 6 to 24 times smaller (48/8 to 48/2) than the bandwidth within the rack. This ratio is called oversubscription. • Each uplink therefore has 48/n times lower bandwidth than the aggregate intra-rack bandwidth, where n is the number of uplink ports.
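To make the oversubscription arithmetic concrete, the following minimal sketch (Python, not from the slides) computes the ratio for a 48-port rack switch with 2, 4, or 8 uplinks, assuming all ports run at the same line rate.

```python
# Minimal sketch: oversubscription of a rack switch, per the arithmetic above.
# Assumes every server port and uplink port runs at the same line rate (e.g., 1 Gbit/s).

def oversubscription(server_ports: int, uplink_ports: int) -> float:
    """Ratio of aggregate intra-rack bandwidth to bandwidth leaving the rack."""
    return server_ports / uplink_ports

for uplinks in (2, 4, 8):
    ratio = oversubscription(server_ports=48, uplink_ports=uplinks)
    print(f"48 server ports, {uplinks} uplinks -> {ratio:.0f}:1 oversubscription")
# Prints 24:1, 12:1, and 6:1, matching the 6-to-24x range quoted above.
```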
  • 15. ARCHITECTURAL OVERVIEW OF WSCS • Alas, large oversubscription means programmers must be aware of the performance consequences when placing senders and receivers in different racks. • This increased software-scheduling burden is another argument for network switches designed specifically for the datacenter.
  • 16. ARCHITECTURAL OVERVIEW OF WSCS Picture of a row of servers in a Google WSC, 2012.
  • 17. ARCHITECTURAL OVERVIEW OF WSCS • Array Switch • Switch that connects an array of racks. • Array switch should have 10X the bisection bandwidth of a rack switch • Cost of an n-port switch grows as n² • Often utilize content addressable memory chips and FPGAs to support high-speed packet inspection.
  • 18. ARCHITECTURAL OVERVIEW OF WSCS • WSC Memory Hierarchy
  • 19. ARCHITECTURAL OVERVIEW OF WSCS • WSC Memory Hierarchy
  • 20. ARCHITECTURAL OVERVIEW OF WSCS • WSC Memory Hierarchy • Previous Figures shows the latency, bandwidth, and capacity of memory hierarchy inside a WSC, and also shows the same data visually. • Each server contains: 16 GBytes of memory with a 100-nanosecond access time and transfers at 20 GBytes/sec and 2 terabytes of disk that offers a 10-millisecond access time and transfers at 200 MBytes/sec. • There are two sockets per board, and they share one 1 Gbit/sec Ethernet port.
  • 21. ARCHITECTURAL OVERVIEW OF WSCS • WSC Memory Hierarchy • Every pair of racks includes one rack switch and holds 80 2U servers. • Networking software plus switch overhead increases the latency to DRAM to 100 microseconds and the disk access latency to 11 milliseconds. • Thus, the total storage capacity of a rack is roughly 1 terabyte of DRAM and 160 terabytes of disk storage. • The 1 Gbit/sec Ethernet limits the remote bandwidth to DRAM or disk within the rack to 100 MBytes/sec.
  • 22. ARCHITECTURAL OVERVIEW OF WSCS • WSC Memory Hierarchy • The array switch can handle 30 racks, so storage capacity of an array goes up by a factor of 30: 30 terabytes of DRAM and 4.8 petabytes of disk. • The array switch hardware and software increases latency to DRAM within an array to 500 microseconds and disk latency to 12 milliseconds. • The bandwidth of the array switch limits the remote bandwidth to either array DRAM or array disk to 10 MBytes/sec.
  • 23. ARCHITECTURAL OVERVIEW OF WSCS • WSC Memory Hierarchy • Previous figures show that network overhead dramatically increases latency from local DRAM to rack DRAM and array DRAM, but both still have more than 10 times better latency than the local disk. • The network collapses the difference in bandwidth between rack DRAM and rack disk and between array DRAM and array disk.
  • 24. ARCHITECTURAL OVERVIEW OF WSCS • WSC Memory Hierarchy • What is the average latency assuming that 90% of accesses are local to the server, 9% are outside the server but local to the rack, and 1% are outside the rack but within the array? • (90% × 0.1 μs) + (9% × 100 μs) + (1% × 300 μs) = 0.09 + 9 + 3 = 12.09 microseconds
  • 25. ARCHITECTURAL OVERVIEW OF WSCS • WSC Memory Hierarchy • How long does it take to transfer 1000MB between disks within the server, between servers in the rack, and between servers in different racks of an array? • Within server: 1000/200=5 sec • Within rack: 1000/100=10 sec • Within array: 1000/10= 100 sec
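Both worked examples above can be reproduced with a short calculation; the sketch below (Python, using only the latency and bandwidth figures quoted in these slides) prints the 12.09-microsecond average latency and the 5/10/100-second transfer times.

```python
# Sketch reproducing the two worked examples above (values taken from the slides).

# Average DRAM access latency, in microseconds
local_us, rack_us, array_us = 0.1, 100, 300
avg_latency_us = 0.90 * local_us + 0.09 * rack_us + 0.01 * array_us
print(f"Average latency: {avg_latency_us:.2f} microseconds")        # 12.09

# Time to move 1000 MB at the effective bandwidths (MB/s) at each level
for scope, mb_per_s in (("within server", 200), ("within rack", 100), ("within array", 10)):
    print(f"1000 MB {scope}: {1000 / mb_per_s:.0f} s")               # 5, 10, 100
```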
  • 28. ARCHITECTURAL OVERVIEW OF WSCS • The WSC needs 20 arrays to reach 50,000 servers, so there is one more level of the networking hierarchy. • Next Figure shows the conventional Layer 3 routers to connect the arrays together and to the Internet.
  • 29. ARCHITECTURAL OVERVIEW OF WSCS The Layer 3 network used to link arrays together and to the Internet [Greenberg et al. 2009]. Some WSCs use a separate border router to connect the Internet to the datacenter Layer 3 switches.
  • 30. ARCHITECTURAL OVERVIEW OF WSCS Sample three-stage fat tree topology.
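As a hedged illustration of why fat trees scale, the sketch below computes the element counts of the classic k-ary fat tree built from identical k-port switches; the formulas are the standard textbook ones for that topology, not parameters taken from the figure.

```python
# Hedged illustration (not from the slides): element counts for a classic
# three-stage k-ary fat tree built entirely from identical k-port switches.

def fat_tree(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    edge = agg = k * (k // 2)          # k pods, each with k/2 edge and k/2 aggregation switches
    core = (k // 2) ** 2               # core layer
    hosts = k ** 3 // 4                # k/2 hosts per edge switch
    return {"edge": edge, "aggregation": agg, "core": core, "hosts": hosts}

print(fat_tree(48))   # 48-port switches support 27,648 hosts at full bisection bandwidth
```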
  • 31. ARCHITECTURAL OVERVIEW OF WSCS • Another way to tackle network scalability is to offload some traffic to a special-purpose network. • For example, if storage traffic is a big component of overall traffic, we could build a separate network to connect servers to storage units. • If that traffic is more localized (not all servers need to be attached to all storage units) we can build smaller-scale networks, thus reducing costs.
  • 32. ARCHITECTURAL OVERVIEW OF WSCS • Historically, that’s how all storage was networked: a SAN (storage area network) connected servers to disks, typically using FibreChannel networks rather than Ethernet. • Today, Ethernet is becoming more common since it offers comparable speeds, and protocols such as FCoE (FibreChannel over Ethernet) and iSCSI (SCSI over IP) allow Ethernet networks to integrate well with traditional SANs.
  • 33. ARCHITECTURAL OVERVIEW OF WSCS • WSCs using VMs (or, more generally, task migration) pose further challenges to networks since connection endpoints (i.e., IP address/port combinations) can move from one physical machine to another. • Typical networking hardware as well as network management software doesn’t anticipate such moves and in fact often explicitly assumes that they’re not possible.
  • 34. ARCHITECTURAL OVERVIEW OF WSCS • For example, network designs often assume that all machines in a given rack have IP addresses in a common subnet, which simplifies administration and minimizes the number of required forwarding table entries in routing tables. • More importantly, frequent migration makes it impossible to manage the network manually; programming network elements needs to be automated, so the same cluster manager that decides the placement of computations also needs to update the network state.
  • 35. ARCHITECTURAL OVERVIEW OF WSCS • The Need of SDN • The need for a programmable network has led to much interest in OpenFlow [http://www.openflow.org/] and software- defined networking (SDN), which moves the network control plane out of the individual switches into a logically centralized controller.
  • 37. ARCHITECTURAL OVERVIEW OF WSCS • The Need of SDN • Controlling a network from a logically centralized server offers many advantages; in particular, common networking algorithms such as computing reachability, shortest paths, or max-flow traffic placement become much simpler to solve than in current networks. • In a traditional design, each individual router must solve the same problem while dealing with limited visibility (direct neighbors only), inconsistent network state (routers that are out of sync with the current network state), and many independent and concurrent actors (routers).
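A toy sketch of the centralized-controller idea follows: given a global view of the topology, path computation reduces to an ordinary graph search. The switch names and topology are hypothetical, and this is not an OpenFlow or SDN controller API, just an illustration of the simplification.

```python
# Toy sketch of the centralized-controller idea: with a global view of the
# topology, a shortest path is a simple graph computation rather than a
# distributed protocol. (Illustrative only; not an OpenFlow/SDN API.)
from collections import deque

topology = {            # adjacency list: switch -> neighbors (hypothetical names)
    "tor1": ["agg1", "agg2"], "tor2": ["agg1", "agg2"],
    "agg1": ["tor1", "tor2", "core"], "agg2": ["tor1", "tor2", "core"],
    "core": ["agg1", "agg2"],
}

def shortest_path(src, dst):
    """Breadth-first search over the controller's global topology view."""
    frontier, seen = deque([[src]]), {src}
    while frontier:
        path = frontier.popleft()
        if path[-1] == dst:
            return path
        for nxt in topology[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

print(shortest_path("tor1", "tor2"))   # e.g. ['tor1', 'agg1', 'tor2']
```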
  • 38. ARCHITECTURAL OVERVIEW OF WSCS • STORAGE • Disk drives or Flash devices are connected directly to each individual server and managed by a global distributed file system (such as Google’s GFS) or they can be part of Network Attached Storage (NAS) devices directly connected to the cluster-level switching fabric. • A NAS tends to be a simpler solution to deploy initially because it allows some of the data management responsibilities to be outsourced to a NAS appliance vendor.
  • 39. ARCHITECTURAL OVERVIEW OF WSCS • STORAGE • Keeping storage separate from computing nodes also makes it easier to enforce quality of service guarantees since the NAS runs no compute jobs besides the storage server. • In contrast, attaching disks directly to compute nodes can reduce hardware costs (the disks leverage the existing server enclosure) and improve networking fabric utilization (each server network port is effectively dynamically shared between the computing tasks and the file system).
  • 40. ARCHITECTURAL OVERVIEW OF WSCS • STORAGE • The replication model between these two approaches is also fundamentally different. • A NAS tends to provide high availability through replication or error correction capabilities within each appliance, whereas systems like GFS implement replication across different machines and consequently will use more networking bandwidth to complete write operations.
  • 41. ARCHITECTURAL OVERVIEW OF WSCS • STORAGE • However, GFS-like systems are able to keep data available even after the loss of an entire server enclosure or rack and may allow higher aggregate read bandwidth because the same data can be sourced from multiple replicas. • Trading off higher write overheads for lower cost, higher availability, and increased read bandwidth was the right solution for many of Google’s early workloads.
  • 42. ARCHITECTURAL OVERVIEW OF WSCS • STORAGE • An additional advantage of having disks co-located with compute servers is that it enables distributed system software to exploit data locality. • Given how networking performance has outpaced disk performance for the last decades, such locality advantages are less useful for disks but may remain beneficial to faster modern storage devices such as those using Flash storage.
  • 43. ARCHITECTURAL OVERVIEW OF WSCS • STORAGE • NAND Flash technology has made Solid State Drives (SSDs) affordable for a growing class of storage needs in WSCs. • While the cost per byte stored in SSDs will remain much higher than in disks for the foreseeable future, many Web services have I/O rates that cannot be easily achieved with disk-based systems. • Since SSDs can deliver I/O rates many orders of magnitude higher than disks, they are increasingly displacing disk drives as the repository of choice for databases in Web services.
  • 44. ARCHITECTURAL OVERVIEW OF WSCS Figure captions: HDD interiors almost resemble a high-tech record player; OCZ’s Vector SSD is one of the fastest around; the OCZ RevoDrive Hybrid.
  • 45. ARCHITECTURAL OVERVIEW OF WSCS • STORAGE • Types of NAND Flash • There are primarily two types of NAND Flash widely used today, Single-Level Cell (SLC) and Multi-Level Cell (MLC). NAND Flash stores data in a large array of cells. • Each cell can store data — one bit per cell for SLC NAND, and two bits per cell for MLC. So, SLC NAND would store a “0” or “1” in each cell, and MLC NAND would store “00”, “01”, “10”, or “11” in each cell. • SLC and MLC NAND offer different levels of performance and endurance characteristics at different price points, with SLC being the higher performing and more costly of the two.
  • 46. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • The data manipulated by WSC workloads tends to fall into two categories: • data that is private to individual running tasks and data that is part of the shared state of the distributed workload. • Private data tends to reside in local DRAM or disk, is rarely replicated, and its management is simplified by virtue of its single user semantics. • In contrast, shared data must be much more durable and is accessed by a large number of clients, and thus requires a much more sophisticated distributed storage system.
  • 47. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • UNSTRUCTURED WSC STORAGE • Google’s GFS is an example of a storage system with a simple file-like abstraction (Google’s Colossus system has since replaced GFS, but follows a similar architectural philosophy so we choose to describe the better known GFS here). • GFS was designed to support the Web search indexing system (the system that turned crawled Web pages into index files for use in Web search), and therefore focuses on high throughput for thousands of concurrent readers/writers and robust performance under high hardware failure rates.
  • 48. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • UNSTRUCTURED WSC STORAGE • GFS users typically manipulate large quantities of data, and thus GFS is further optimized for large operations. • The system architecture consists of a master, which handles metadata operations, and thousands of chunk server (slave) processes running on every server with a disk drive, to manage the data chunks on those drives.
  • 49. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • UNSTRUCTURED WSC STORAGE • In GFS, fault tolerance is provided by replication across machines instead of within them, as is the case in RAID systems. • Cross-machine replication allows the system to tolerate machine and network failures and enables fast recovery, since replicas for a given disk or machine can be spread across thousands of other machines.
  • 50. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • UNSTRUCTURED WSC STORAGE • Although the initial version of GFS only supported simple replication, today’s version (Colossus) has added support for more space-efficient Reed-Solomon codes, which tend to reduce the space overhead of replication by roughly a factor of two over simple replication for the same level of availability.
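A back-of-the-envelope comparison shows where the roughly 2x savings comes from; the (6 data + 3 parity) Reed-Solomon parameters below are hypothetical, chosen only to illustrate the overhead arithmetic against 3-way replication.

```python
# Sketch of the space-overhead comparison implied above (parameters hypothetical).

def replication_overhead(copies: int) -> float:
    """Bytes stored per user byte under simple replication."""
    return float(copies)

def reed_solomon_overhead(data: int, parity: int) -> float:
    """Bytes stored per user byte under an RS(data + parity) code."""
    return (data + parity) / data

print(replication_overhead(3))                   # 3.0x for triple replication
print(reed_solomon_overhead(data=6, parity=3))   # 1.5x for RS(6+3), roughly half
```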
  • 51. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • UNSTRUCTURED WSC STORAGE • An important factor in maintaining high availability is distributing file chunks across the whole cluster in such a way that a small number of correlated failures is extremely unlikely to lead to data loss. • GFS takes advantage of knowledge about the known possible correlated fault scenarios and attempts to distribute replicas in a way that avoids their co-location in a single fault domain. • Wide distribution of chunks across disks over a whole cluster is also key for speeding up recovery. • Since replicas of chunks in a given disk are spread across possibly all machines in a storage cluster, reconstruction of lost data chunks is performed in parallel at high speed. • Quick recovery is important since long recovery time windows leave under-replicated chunks vulnerable to data loss should additional faults hit the cluster.
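The placement policy can be sketched in a few lines: pick each replica from a different fault domain (here, a rack) so that no single correlated failure holds all copies. This is an illustrative policy only, not GFS or Colossus code, and the cluster layout is made up.

```python
# Hedged sketch of fault-domain-aware replica placement: spread each chunk's
# replicas across distinct racks so one rack failure cannot take out all copies.
import random

def place_replicas(servers_by_rack, copies=3):
    """Pick `copies` servers, each in a different rack (fault domain)."""
    racks = random.sample(list(servers_by_rack), k=copies)
    return [random.choice(servers_by_rack[rack]) for rack in racks]

cluster = {f"rack{r}": [f"rack{r}-srv{s}" for s in range(40)] for r in range(8)}
print(place_replicas(cluster))   # three servers, each in a different rack
```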
  • 52. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • STRUCTURED WSC STORAGE • The simple file abstraction of GFS and Colossus may suffice for systems that manipulate large blobs of data, but application developers also need the WSC equivalent of database-like functionality, where data sets can be structured and indexed for easy small updates or complex queries. • A blob (binary large object, basic large object, BLOB, or BLOb) is a collection of binary data stored as a single entity in a database management system. Blobs are typically images, audio, or other multimedia objects, though sometimes binary executable code is stored as a blob.
  • 53. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • STRUCTURED WSC STORAGE • Structured distributed storage systems such as Google’s BigTable and Amazon’s Dynamo were designed to fulfill those needs. • Compared to traditional database systems, BigTable and Dynamo sacrifice some features, such as richness of schema representation and strong consistency, in favor of higher performance and availability at massive scales. • BigTable, for example, presents a simple multi-dimensional sorted map consisting of row keys (strings) associated with multiple values organized in columns, forming a distributed sparse table space. Column values are associated with timestamps in order to support versioning and time-series.
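A toy version of that data model is sketched below: a sparse map sorted by (row key, column, timestamp), where a read returns the most recent version. Class and method names are illustrative, not the BigTable API.

```python
# Toy sketch of the BigTable data model described above: a sparse, sorted map
# keyed by (row, column, timestamp). (Names are illustrative, not an actual API.)
import bisect, time

class SparseTable:
    def __init__(self):
        self._keys, self._vals = [], []        # kept sorted by (row, column, -timestamp)

    def put(self, row, column, value, ts=None):
        key = (row, column, -(ts if ts is not None else time.time()))
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._vals.insert(i, value)

    def get(self, row, column):
        """Return the most recent value for (row, column), if any."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._vals[i]

t = SparseTable()
t.put("com.example/index.html", "contents:", "<html>...</html>")
print(t.get("com.example/index.html", "contents:"))
```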
  • 54. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • STRUCTURED WSC STORAGE • The choice of eventual consistency in BigTable and Dynamo shifts the burden of resolving temporary inconsistencies to the applications using these systems. • A number of application developers within Google have found it inconvenient to deal with weak consistency models and the limitations of the simple data schemes in BigTable. • Second-generation structured storage systems such as MegaStore and subsequently Spanner have been designed to address such concerns. • Both MegaStore and Spanner provide richer schemas and SQL-like functionality while providing simpler, stronger consistency models.
  • 55. ARCHITECTURAL OVERVIEW OF WSCS Weak Consistency • The protocol is said to support weak consistency if: • All accesses to synchronization variables are seen by all processes (or nodes, processors) in the same order (sequentially) - these are synchronization operations. • Accesses to critical sections are seen sequentially. • All other accesses may be seen in different order on different processes (or nodes, processors). • The set of both read and write operations in between different synchronization operations is the same in each process. Strong Consistency • The protocol is said to support strong consistency if: • All accesses are seen by all parallel processes (or nodes, processors etc.) in the same order (sequentially) • Therefore only one consistent state can be observed, as opposed to weak consistency, where different parallel processes (or nodes etc.) can perceive variables in different states.
  • 56. ARCHITECTURAL OVERVIEW OF WSCS • WSC STORAGE • INTERPLAY OF STORAGE AND NETWORKING TECHNOLOGY • The success of WSC distributed storage systems can be partially attributed to the evolution of datacenter networking fabrics. • The key insight is that the gap between networking and disk performance has widened to the point that disk locality is no longer relevant in intra-datacenter computations. • This observation enables dramatic simplifications in the design of distributed disk-based storage systems as well as utilization improvements, since any disk byte in a WSC facility can in principle be utilized by any task regardless of their relative locality.
  • 57. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER TIER CLASSIFICATIONS AND SPECIFICATIONS • The design of a datacenter is often classified as belonging to “Tier I–IV”. • The Uptime Institute, a professional services organization specializing in datacenters, and the Telecommunications Industry Association (TIA), an industry group accredited by ANSI and made up of approximately 400 member companies, both advocate a 4-tier classification loosely based on the power distribution, uninterruptible power supply (UPS), cooling delivery and redundancy of the datacenter.
  • 58. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER TIER CLASSIFICATIONS AND SPECIFICATIONS • Tier I datacenters have a single path for power distribution, UPS, and cooling distribution, without redundant components. • Tier II adds redundant components to this design (N + 1), improving availability. • Tier III datacenters have one active and one alternate distribution path for utilities. Each path has redundant components, and the paths are concurrently maintainable, that is, they provide redundancy even during maintenance. • Tier IV datacenters have two simultaneously active power and cooling distribution paths, redundant components in each path, and are supposed to tolerate any single equipment failure without impacting the load.
  • 59. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER TIER CLASSIFICATIONS AND SPECIFICATIONS • The Uptime Institute’s specification is generally performance-based (with notable exceptions for the amount of backup diesel fuel, water storage, and ASHRAE temperature design points). • The specification describes topology rather than prescribing a specific list of components to meet the requirements, so there are many architectures that can achieve a given tier classification. • In contrast, TIA-942 is very prescriptive and specifies a variety of implementation details such as building construction, ceiling height, voltage levels, types of racks, and patch cord labeling, for example.
  • 60. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER TIER CLASSIFICATIONS AND SPECIFICATIONS • Formally achieving tier classification certification is difficult and requires a full review from one of the granting bodies, and most datacenters are not formally rated. • Most commercial datacenters fall somewhere between tiers III and IV, choosing a balance between construction cost and reliability. • Generally, the lowest of the individual subsystem ratings (cooling, power, etc.) determines the overall tier classification of the datacenter.
  • 61. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER TIER CLASSIFICATIONS AND SPECIFICATIONS • Real-world datacenter reliability is strongly influenced by the quality of the organization running the datacenter, not just by the design. • The Uptime Institute reports that over 70% of datacenter outages are the result of human error, including management decisions on staffing, maintenance, and training. • Theoretical availability estimates used in the industry range from 99.7% for tier II datacenters to 99.98% and 99.995% for tiers III and IV, respectively.
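Translating those availability percentages into downtime makes them easier to compare; the short check below converts each figure into hours of downtime per year.

```python
# Quick check of the availability figures quoted above: annual downtime implied
# by each theoretical availability level.
HOURS_PER_YEAR = 24 * 365

for tier, availability in (("Tier II", 0.997), ("Tier III", 0.9998), ("Tier IV", 0.99995)):
    downtime_h = (1 - availability) * HOURS_PER_YEAR
    print(f"{tier}: {availability:.3%} -> about {downtime_h:.1f} hours of downtime per year")
# Roughly 26 h, 1.8 h, and 0.4 h per year, respectively.
```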
  • 62. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • The broadest definition of WSC energy efficiency would measure the energy used to run a particular workload (say, to sort a petabyte of data). • Unfortunately, no two companies run the same workload and real-world application mixes change all the time so it is hard to benchmark real-world WSCs this way. • Thus, even though such benchmarks have been contemplated as far back as 2008 they haven’t yet been found and we doubt they ever will.
  • 63. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • However, it is useful to view energy efficiency as the product of three factors we can independently measure and optimize: Efficiency = Computation / Total Energy = (1/PUE) × (1/SPUE) × (Computation / Energy to electronic components). • The first term (a) measures facility efficiency, the second (b) measures server power conversion efficiency, and the third (c) measures the server’s architectural efficiency.
  • 64. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • THE PUE METRIC • Power usage effectiveness (PUE) reflects the quality of the datacenter building infrastructure itself, and captures the ratio of total building power to IT power (the power consumed by the actual computing and network equipment, etc.). (Sometimes IT power is also referred to as “critical power.”) • PUE = (Facility power) / (IT Equipment power)
  • 65. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • THE PUE METRIC • PUE has gained a lot of traction as a datacenter efficiency metric since widespread reporting started around 2009. • We can easily measure PUE by adding electrical meters to the lines powering the various parts of a datacenter, thus determining how much power is used by chillers or a UPS.
  • 66. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • THE PUE METRIC • Historically, the PUE for the average datacenter has been embarrassingly poor. • According to a 2006 study, 85% of current datacenters were estimated to have a PUE of greater than 3.0.
  • 67. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • THE PUE METRIC • In other words, the building’s mechanical and electrical systems consumed twice as much power as the actual computing load! Only 5% had a PUE of 2.0 or better. • A subsequent EPA survey of over 100 datacenters reported an average PUE value of 1.91, and a 2012 Uptime Institute survey of over 1100 datacenters covering a range of geographies and datacenter sizes reported an average PUE value between 1.8 and 1.89.
  • 68. ARCHITECTURAL OVERVIEW OF WSCS Uptime Institute survey of PUE for 1100+ datacenters.
  • 69. ARCHITECTURAL OVERVIEW OF WSCS • SOURCES OF EFFICIENCY LOSSES IN DATACENTERS • For illustration, let us walk through the losses in a typical datacenter.
  • 70. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • The second term (b) accounts for overheads inside servers or other IT equipment using a metric analogous to PUE, server PUE (SPUE). • SPUE consists of the ratio of total server input power to its useful power, where useful power includes only the power consumed by the electronic components directly involved in the computation: motherboard, disks, CPUs, DRAM, I/O cards, and so on. • Substantial amounts of power may be lost in the server’s power supply, voltage regulator modules (VRMs), and cooling fans.
  • 71. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • The product of PUE and SPUE constitutes an accurate assessment of the end-to-end electromechanical efficiency of a WSC. Such a true (or total) PUE metric (TPUE) is defined as PUE × SPUE.
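A quick numeric sketch of TPUE follows; the PUE and SPUE values are example numbers, not measurements from any particular facility.

```python
# Hedged numeric sketch of the efficiency factors above (example values only).
pue = 1.2    # facility overhead: total building power / IT power
spue = 1.2   # server overhead: total server input power / useful electronics power

tpue = pue * spue
print(f"TPUE = {tpue:.2f}")
# 1.44: of every watt delivered to the building, only about 1/1.44 (~69%)
# reaches the electronic components that actually compute.
```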
  • 72. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • MEASURING ENERGY EFFICIENCY • Similarly, server-level benchmarks such as Joulesort and SPECpower characterize other aspects of computing efficiency. • Joulesort measures the total system energy to perform an out-of-core sort and derives a metric that enables the comparison of systems ranging from embedded devices to supercomputers. • SPECpower focuses on server-class systems and computes the performance-to-power ratio of a system running a typical business application on an enterprise Java platform.
  • 73. ARCHITECTURAL OVERVIEW OF WSCS • DATACENTER ENERGY EFFICIENCY • MEASURING ENERGY EFFICIENCY • Two separate benchmarking efforts aim to characterize the efficiency of storage systems: the Emerald Program by the Storage Networking Industry Association (SNIA) and the SPC-2/E by the Storage Performance Council. • Both benchmarks measure storage servers under different kinds of request activity and report ratios of transaction throughput per Watt.
  • 74. ARCHITECTURAL OVERVIEW OF WSCS • Cost of a WSC • To better understand the potential impact of energy- related optimizations, let us examine the total cost of ownership (TCO) of a datacenter. • At the top level, costs split up into capital expenses (Capex) and operational expenses (Opex). • Capex refers to investments that must be made upfront and that are then depreciated over a certain time frame; examples are the construction cost of a datacenter or the purchase price of a server.
  • 75. ARCHITECTURAL OVERVIEW OF WSCS • Cost of a WSC • Opex refers to the recurring monthly costs of actually running the equipment, excluding depreciation: electricity costs, repairs and maintenance, salaries of on-site personnel, and so on. • Thus, we have: TCO = datacenter depreciation + datacenter Opex + server depreciation + server Opex
  • 76. ARCHITECTURAL OVERVIEW OF WSCS • Cost of a WSC
  • 77. ARCHITECTURAL OVERVIEW OF WSCS • Cost of a WSC • The monthly depreciation cost (or amortization cost) that results from the initial construction expense depends on the duration over which the investment is amortized (which is related to its expected lifetime) and the assumed interest rate. • Typically, datacenters are depreciated over periods of 10–15 years. • Under U.S. accounting rules, it is common to use straight-line depreciation where the value of the asset declines by a fixed amount each month.
  • 78. ARCHITECTURAL OVERVIEW OF WSCS • Cost of a WSC • For example, if we depreciate a $12/W datacenter over 12 years, the depreciation cost is $0.08/W per month. • If we had to take out a loan to finance construction at an interest rate of 8%, the associated monthly interest payments add an additional cost of $0.05/W, for a total of $0.13/W per month. • Typical interest rates vary over time, but many companies will pay interest in the 7–12% range.
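The per-watt figures above can be reproduced with straight-line depreciation and a standard loan-amortization formula, as sketched below.

```python
# Sketch reproducing the $/W-per-month arithmetic above.
capex_per_watt = 12.0          # datacenter construction cost, $/W
years, annual_rate = 12, 0.08  # depreciation period and interest rate from the example
months, r = years * 12, annual_rate / 12

straight_line = capex_per_watt / months                        # ~$0.08/W per month
loan_payment = capex_per_watt * r / (1 - (1 + r) ** -months)   # ~$0.13/W per month
print(f"depreciation ~${straight_line:.2f}/W per month, "
      f"with 8% financing ~${loan_payment:.2f}/W per month")
```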
  • 79. ARCHITECTURAL OVERVIEW OF WSCS • Cost of a WSC To put the cost of energy into perspective, Hamilton did a case study to estimate the costs of a WSC. He determined that the CAPEX of this 8 MW facility was $88M, and that the roughly 46,000 servers and corresponding networking equipment added another $79M to the CAPEX for the WSC.
  • 80. ARCHITECTURAL OVERVIEW OF WSCS • Cost of a WSC • We can now price the total cost of energy, since U.S. accounting rules allow us to convert CAPEX into OPEX. • We can just amortize CAPEX as a fixed amount each month for the effective life of the equipment. • Note that the amortization rates differ significantly, from 10 years for the facility to 4 years for the networking equipment and 3 years for the servers. • Hence, the WSC facility lasts a decade, but you need to replace the servers every 3 years and the networking equipment every 4 years. • By amortizing the CAPEX, Hamilton came up with a monthly OPEX, including accounting for the cost of borrowing money (5% annually) to pay for the WSC. • At $3.8M, the monthly OPEX is about 2% of the CAPEX.
  • 81. ARCHITECTURAL OVERVIEW OF WSCS • A Google Warehouse-Scale Computer • Since many companies with WSCs are competing vigorously in the marketplace, up until recently, they have been reluctant to share their latest innovations with the public (and each other). • In 2009, Google described a state-of-the-art WSC as of 2005. • Google graciously provided an update of the 2007 status of their WSC, making this section the most up-to-date description of a Google WSC. • Even more recently, Facebook described their latest datacenter as part of http://opencompute.org.
  • 82. ARCHITECTURAL OVERVIEW OF WSCS • A Google Warehouse-Scale Computer • Containers • Both Google and Microsoft have built WSCs using shipping containers. • The idea of building a WSC from containers is to make WSC design modular. • Each container is independent, and the only external connections are networking, power, and water. • The containers in turn supply networking, power, and cooling to the servers placed inside them, so the job of the WSC is to supply networking, power, and cold water to the containers and to pump the resulting warm water to external cooling towers and chillers.
  • 84. ARCHITECTURAL OVERVIEW OF WSCS • A Google Warehouse-Scale Computer • Containers • Diagram is a cutaway drawing of a Google container. • A container holds up to 1160 servers, so 45 containers have space for 52,200 servers. (This WSC has about 40,000 servers.) • The servers are stacked 20 high in racks that form two long rows of 29 racks (also called bays) each, with one row on each side of the container. • The rack switches are 48-port, 1 Gbit/sec Ethernet switches, which are placed in every other rack.
  • 85. ARCHITECTURAL OVERVIEW OF WSCS • A Google Warehouse-Scale Computer • Containers • The Google WSC that we are looking at contains 45 40-foot-long containers in a 300-foot by 250-foot space, or 75,000 square feet (about 7000 square meters). • To fit in the warehouse, 30 of the containers are stacked two high, or 15 pairs of stacked containers. • Although the location was not revealed, it was built at the time that Google developed WSCs in The Dalles, Oregon, which provides a moderate climate and is near cheap hydroelectric power and Internet backbone fiber.
  • 86. ARCHITECTURAL OVERVIEW OF WSCS • A Google Warehouse-Scale Computer • Containers • This WSC offers 10 megawatts with a PUE of 1.23 over the prior 12 months. • Of that 0.230 of PUE overhead, 85% goes to cooling losses (0.195 PUE) and 15% (0.035) goes to power losses. • The system went live in November 2005, and this section describes its state as of 2007. • A Google container can handle up to 250 kilowatts. That means the container can handle 780 watts per square foot (0.09 square meters), or 133 watts per square foot across the entire 75,000-square-foot space with 40 containers. • However, the containers in this WSC average just 222 kilowatts.
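The power-density figures can be checked with simple arithmetic; the sketch below assumes a standard 8-foot container width (the slides give only the 40-foot length), which yields roughly the 780 and 133 watts per square foot quoted above.

```python
# Hedged check of the power-density figures above. The 8 ft container width is
# an assumption; the slides state only the 40 ft length.
container_kw, containers = 250, 40
container_sqft = 40 * 8            # ~320 sq ft of floor per container
facility_sqft = 75_000

print(f"{container_kw * 1000 / container_sqft:.0f} W/sq ft inside a container")    # ~781
print(f"{container_kw * containers * 1000 / facility_sqft:.0f} W/sq ft overall")   # ~133
```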
  • 87. ARCHITECTURAL OVERVIEW OF WSCS • A Google Warehouse-Scale Computer • Containers
  • 88. ARCHITECTURAL OVERVIEW OF WSCS • A Google Warehouse-Scale Computer • Containers • Servers In A Google WSC • The server in Figure 6.21 has two sockets, each containing a dual-core AMD Opteron processor running at 2.2 GHz. The photo shows eight DIMMs, and these servers are typically deployed with 8 GB of DDR2 DRAM. • A novel feature is that the memory bus is downclocked to 533 MHz from the standard 666 MHz since the slower bus has little impact on performance but a significant impact on power. • The baseline design has a single network interface card (NIC) for a 1 Gbit/sec Ethernet link.
  • 89. ARCHITECTURAL OVERVIEW OF WSCS • A Google Warehouse-Scale Computer • Containers • Servers In A Google WSC • Although the photo in Figure 6.21 shows two SATA disk drives, the baseline server has just one. • The peak power of the baseline is about 160 watts, and idle power is 85 watts. • This baseline node is supplemented to offer a storage (or “diskfull”) node. • First, a second tray containing 10 SATA disks is connected to the server. • To get one more disk, a second disk is placed into the empty spot on the motherboard, giving the storage node 12 SATA disks. • Finally, since a storage node could saturate a single 1 Gbit/sec Ethernet link, a second Ethernet NIC was added. • Peak power for a storage node is about 300 watts, and it idles at 198 watts.