An Introduction to the Design of Warehouse-Scale Computers
Alessio Villardita
A brief overview of the main factors involved in the design of Warehouse-Scale Computers (WSCs): the hardware, the cooling system, and overall plant energy efficiency, always keeping in mind the costs of such a large architecture.
Co-Author: Pietro Piscione (https://www.linkedin.com/pub/pietro-piscione/84/b37/926)
A work based on:
"The Datacenter as a Computer, An Introduction to the Design of Warehouse-Scale Machines, Second Edition"
by
Luiz André Barroso
Jimmy Clidaras
Urs Hölzle
2. Introduction
• Had scale been the only distinguishing feature
of these systems we might simply refer to
them as datacenters.
• Datacenters are buildings where multiple
servers and communication gear are
co-located because of their common
environmental requirements and physical
security needs, and for ease of maintenance.
• In that sense, a WSC is a type of datacenter.
3. Introduction
• Traditional datacenters, however, typically host a large
number of relatively small- or medium-sized
applications, each running on a dedicated hardware
infrastructure that is de-coupled and protected from
other systems in the same facility.
• Those datacenters host hardware and software for
multiple organizational units or even different
companies.
• Different computing systems within such a datacenter
often have little in common in terms of hardware,
software, or maintenance infrastructure, and tend not
to communicate with each other at all.
4. Introduction
• WSCs currently power the services offered by
companies such as Google, Amazon, Facebook, and
Microsoft’s online services division.
• They differ significantly from traditional datacenters:
1. They belong to a single organization.
2. Use a relatively homogeneous hardware and system
software platform
3. Share a common systems management layer.
• Often, much of the application, middleware, and
system software is built in-house compared to the
predominance of third-party software running in
conventional datacenters.
5. Introduction
• Most importantly, WSCs run a smaller number of
very large applications (or Internet services), and
the common resource management
infrastructure allows significant deployment
flexibility.
• The requirements of homogeneity, single-
organization control, and enhanced focus on cost
efficiency motivate designers to take new
approaches in constructing and operating these
systems.
6. Introduction
• Internet services must achieve high
availability, typically aiming for at least 99.99%
uptime (“four nines”, about an hour of
downtime per year).
• Achieving fault-free operation on a large
collection of hardware and system software is
hard and is made more difficult by the large
number of servers involved.
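As a quick check on the figure above, the downtime budget implied by a given availability target can be computed directly (a minimal sketch):

```python
def max_downtime_minutes_per_year(availability: float) -> float:
    """Downtime budget per year, in minutes, for a given availability."""
    minutes_per_year = 365 * 24 * 60
    return (1 - availability) * minutes_per_year

# "Four nines" (99.99%) allows roughly 52.6 minutes of downtime per year,
# i.e., about an hour, matching the figure quoted above.
print(round(max_downtime_minutes_per_year(0.9999), 1))
```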
7. Introduction
• Although it might be theoretically possible to
prevent hardware failures in a collection of
10,000 servers, it would surely be extremely
expensive.
• Consequently, WSC workloads must be
designed to gracefully tolerate large numbers
of component faults with little or no impact
on service level performance and availability.
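To see why fault tolerance must come from software, consider the expected failure count at this scale. The sketch below assumes a hypothetical 4% annual server failure rate (an illustrative number, not one from the text):

```python
def expected_failures_per_day(num_servers: int, annual_failure_rate: float) -> float:
    """Expected server failures per day, assuming independent failures."""
    return num_servers * annual_failure_rate / 365

# With 10,000 servers and a hypothetical 4% annual failure rate,
# more than one server fails every day on average.
print(round(expected_failures_per_day(10_000, 0.04), 2))
```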
8. ARCHITECTURAL OVERVIEW OF WSCS
• The hardware implementation of a WSC will differ
significantly from one installation to the next.
• Even within a single organization such as Google,
systems deployed in different years use different
basic elements, reflecting the hardware
improvements provided by the industry.
• However, the architectural organization of these
systems has been relatively stable over the last few
years.
• Therefore, it is useful to describe this general
architecture at a high level as it sets the background for
subsequent discussions.
9. ARCHITECTURAL OVERVIEW OF WSCS
• Being satisfied with neither the metric nor the
US system, rack designers use “rack units” to
measure the height of servers.
• 1U is 1.75 inches or 44.45 mm; a typical rack is
42U high.
• The 19-inch (48.26-cm) rack is still the
standard framework to hold servers, despite
this standard going back to railroad hardware
from the 1930s.
10. ARCHITECTURAL OVERVIEW OF WSCS
Sketch of the typical elements in warehouse-scale systems: 1U server (left), 7’ rack with
Ethernet switch (middle), and diagram of a small cluster with a cluster-level Ethernet switch/
router (right).
11. ARCHITECTURAL OVERVIEW OF WSCS
• Previous Figure depicts the high-level building blocks
for WSCs.
• A set of low-end servers, typically in a 1U or blade
enclosure format, are mounted within a rack and
interconnected using a local Ethernet switch.
• These rack-level switches, which can use 1- or 10-Gbps
links, have a number of uplink connections to one or
more cluster-level (or datacenter-level) Ethernet
switches.
• This second-level switching domain can potentially
span more than ten thousand individual servers.
12. ARCHITECTURAL OVERVIEW OF WSCS
• In the case of a blade enclosure there is an
additional first level of networking
aggregation within the enclosure where
multiple processing blades connect to a small
number of networking blades through an I/O
bus such as PCIe.
13. ARCHITECTURAL OVERVIEW OF WSCS
• A 7-foot (213.36-cm) rack offers 48 U, so it’s not a
coincidence that the most popular switch for a
rack is a 48-port Ethernet switch.
• This product has become a commodity that costs
as little as $30 per port for a 1 Gbit/sec Ethernet
link in 2011.
• Note that the bandwidth within the rack is the
same for each server, so it does not matter where
the software places the sender and the receiver
as long as they are within the same rack.
14. ARCHITECTURAL OVERVIEW OF WSCS
• This flexibility is ideal from a software
perspective.
• These switches typically offer two to eight
uplinks, which leave the rack to go to the next
higher switch in the network hierarchy.
• Thus, the bandwidth leaving the rack is 6 to 24
times smaller—48/8 to 48/2—than the
bandwidth within the rack. This ratio is called
oversubscription.
• The uplink bandwidth is 48/n times lower, where
n = the number of uplink ports.
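The oversubscription ratio described above follows directly from the port counts (a minimal sketch):

```python
def oversubscription(ports: int = 48, uplinks: int = 2) -> float:
    """Ratio of intra-rack bandwidth to bandwidth leaving the rack."""
    return ports / uplinks

# A 48-port rack switch with 2 to 8 uplinks gives ratios between 24 and 6.
print(oversubscription(48, 8))   # 6.0
print(oversubscription(48, 2))   # 24.0
```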
15. ARCHITECTURAL OVERVIEW OF WSCS
• Alas, large oversubscription means
programmers must be aware of the
performance consequences when placing
senders and receivers in different racks.
• This increased software-scheduling burden is
another argument for network switches
designed specifically for the datacenter.
17. ARCHITECTURAL OVERVIEW OF WSCS
• Array Switch
• Switch that connects an array of racks.
• An array switch should have 10× the bisection
bandwidth of a rack switch.
• The cost of an n-port switch grows as n^2.
• Array switches often utilize content-addressable
memory chips and FPGAs to support high-speed
packet inspection.
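Taking the quadratic cost growth above at face value, the cost penalty of a larger switch can be estimated (a rough sketch built only on the n^2 claim):

```python
def relative_switch_cost(n_ports: int, base_ports: int = 48) -> float:
    """Cost of an n-port switch relative to a base switch, assuming cost ~ n^2."""
    return (n_ports / base_ports) ** 2

# Ten times the ports costs roughly a hundred times as much under this
# model, which is why large array switches are so expensive.
print(relative_switch_cost(480))  # 100.0
```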
20. ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• Previous figures show the latency, bandwidth, and
capacity of the memory hierarchy inside a WSC, and
also present the same data visually.
• Each server contains:
16 GBytes of memory with a 100-nanosecond access
time and transfers at 20 GBytes/sec and
2 terabytes of disk that offers a 10-millisecond access
time and transfers at 200 MBytes/sec.
• There are two sockets per board, and they share one
1 Gbit/sec Ethernet port.
21. ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• Every pair of racks includes one rack switch and holds
80 2U servers.
• Networking software plus switch overhead increases
the latency to DRAM to 100 microseconds and the disk
access latency to 11 milliseconds.
• Thus, the total storage capacity of a rack is roughly 1
terabyte of DRAM and 160 terabytes of disk storage.
• The 1 Gbit/sec Ethernet limits the remote bandwidth
to DRAM or disk within the rack to 100 MBytes/sec.
22. ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• The array switch can handle 30 racks, so storage
capacity of an array goes up by a factor of 30: 30
terabytes of DRAM and 4.8 petabytes of disk.
• The array switch hardware and software
increases latency to DRAM within an array to 500
microseconds and disk latency to 12 milliseconds.
• The bandwidth of the array switch limits the
remote bandwidth to either array DRAM or array
disk to 10 MBytes/sec.
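The rack and array capacities quoted in the preceding slides can be rolled up from the per-server figures (a minimal sketch using the numbers above):

```python
SERVER_DRAM_GB = 16        # per-server DRAM
SERVER_DISK_TB = 2         # per-server disk
SERVERS_PER_RACK_PAIR = 80
RACKS_PER_ARRAY = 30

rack_dram_tb = SERVERS_PER_RACK_PAIR * SERVER_DRAM_GB / 1000  # 1.28, "roughly 1 TB"
rack_disk_tb = SERVERS_PER_RACK_PAIR * SERVER_DISK_TB         # 160 TB
array_dram_tb = RACKS_PER_ARRAY * round(rack_dram_tb)         # 30 TB (rack DRAM rounded to 1 TB)
array_disk_pb = RACKS_PER_ARRAY * rack_disk_tb / 1000         # 4.8 PB

print(rack_dram_tb, rack_disk_tb, array_dram_tb, array_disk_pb)
```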
23. ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• Previous figures show that network overhead
dramatically increases latency from local
DRAM to rack DRAM and array DRAM, but
both still have more than 10 times better
latency than the local disk.
• The network collapses the difference in
bandwidth between rack DRAM and rack disk
and between array DRAM and array disk.
24. ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• What is the average DRAM access latency, assuming
that 90% of accesses are local to the server, 9% are
outside the server but local to the rack, and
1% are outside the rack but within the array?
• (90% × 0.1) + (9% × 100) + (1% × 300) = 12.09 microseconds
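The weighted average above can be reproduced in a couple of lines, using the latencies that appear in the formula (in microseconds):

```python
def avg_dram_latency_us(p_local=0.90, p_rack=0.09, p_array=0.01,
                        local_us=0.1, rack_us=100, array_us=300):
    """Weighted-average DRAM access latency in microseconds."""
    return p_local * local_us + p_rack * rack_us + p_array * array_us

# 0.09 + 9 + 3 = 12.09 microseconds
print(round(avg_dram_latency_us(), 2))
```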
25. ARCHITECTURAL OVERVIEW OF WSCS
• WSC Memory Hierarchy
• How long does it take to transfer 1000MB between disks
within the server, between servers in the rack, and
between servers in different racks of an array?
• Within server: 1000/200=5 sec
• Within rack: 1000/100=10 sec
• Within array: 1000/10= 100 sec
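These transfer times are just size divided by the bandwidth available at each level of the hierarchy (a minimal sketch):

```python
def transfer_seconds(size_mb: float, bandwidth_mb_per_s: float) -> float:
    """Time to move size_mb at the given sustained bandwidth."""
    return size_mb / bandwidth_mb_per_s

# Bandwidths from the hierarchy above: 200 MB/s local disk,
# 100 MB/s within the rack, 10 MB/s within the array.
for scope, bw in [("within server", 200), ("within rack", 100), ("within array", 10)]:
    print(scope, transfer_seconds(1000, bw), "sec")
```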
28. ARCHITECTURAL OVERVIEW OF WSCS
• The WSC needs 20 arrays to reach 50,000
servers, so there is one more level of the
networking hierarchy.
• Next Figure shows the conventional Layer 3
routers to connect the arrays together and to
the Internet.
29. ARCHITECTURAL OVERVIEW OF WSCS
The Layer 3 network used to link arrays together and to the Internet
[Greenberg et al. 2009].
Some WSCs use a separate border router to connect the Internet to the
datacenter Layer 3 switches.
31. ARCHITECTURAL OVERVIEW OF WSCS
• Another way to tackle network scalability is to
offload some traffic to a special-purpose
network.
• For example, if storage traffic is a big component
of overall traffic, we could build a separate
network to connect servers to storage units.
• If that traffic is more localized (not all servers
need to be attached to all storage units) we can
build smaller-scale networks, thus reducing costs.
32. ARCHITECTURAL OVERVIEW OF WSCS
• Historically, that’s how all storage was networked:
a SAN (storage area network) connected servers
to disks, typically using FibreChannel networks
rather than Ethernet.
• Today, Ethernet is becoming more common since
it offers comparable speeds, and protocols such
as FCoE (FibreChannel over Ethernet) and iSCSI
(SCSI over IP) allow Ethernet networks to
integrate well with traditional SANs.
33. ARCHITECTURAL OVERVIEW OF WSCS
• WSCs using VMs (or, more generally, task
migration) pose further challenges to
networks since connection endpoints (i.e., IP
address/port combinations) can move from
one physical machine to another.
• Typical networking hardware as well as
network management software doesn't
anticipate such moves and in fact often
explicitly assumes that they're not possible.
34. ARCHITECTURAL OVERVIEW OF WSCS
• For example, network designs often assume that
all machines in a given rack have IP addresses in a
common subnet, which simplifies administration
and minimizes the number of required
forwarding-table entries in routing tables.
• More importantly, frequent migration makes it
impossible to manage the network manually;
programming network elements needs to be
automated, so the same cluster manager that
decides the placement of computations also
needs to update the network state.
35. ARCHITECTURAL OVERVIEW OF WSCS
• The Need of SDN
• The need for a programmable network has led
to much interest in OpenFlow
[http://www.openflow.org/] and software-
defined networking (SDN), which moves the
network control plane out of the individual
switches into a logically centralized controller.
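To illustrate why centralizing the control plane simplifies these algorithms, the sketch below computes a route with plain BFS over a complete topology view, which is exactly what an SDN controller can do and an individual switch, seeing only its neighbors, cannot. The topology and names are hypothetical:

```python
from collections import deque

def shortest_path(graph, src, dst):
    """BFS shortest path: trivial when the controller sees the whole topology."""
    prev, frontier = {src: None}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in prev:
                prev[nbr] = node
                frontier.append(nbr)
    return None

# Hypothetical two-rack topology: servers -> rack switches -> cluster switch.
topology = {
    "s1": ["rack1"], "s2": ["rack2"],
    "rack1": ["s1", "cluster"], "rack2": ["s2", "cluster"],
    "cluster": ["rack1", "rack2"],
}
print(shortest_path(topology, "s1", "s2"))
```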
37. ARCHITECTURAL OVERVIEW OF WSCS
• The Need of SDN
• Controlling a network from a logically centralized
server offers many advantages; in particular, common
networking algorithms such as computing reachability,
shortest paths, or max-flow traffic placement become
much simpler to solve, compared to their
implementation in current networks where each
individual router must solve the same problem while
dealing with limited visibility (direct neighbors only),
inconsistent network state (routers that are out of
synch with the current network state), and many
independent and concurrent actors (routers).
38. ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• Disk drives or Flash devices can be connected
directly to each individual server and managed by
a global distributed file system (such as Google’s
GFS), or they can be part of Network Attached
Storage (NAS) devices directly connected to the
cluster-level switching fabric.
• A NAS tends to be a simpler solution to deploy
initially because it allows some of the data
management responsibilities to be outsourced to
a NAS appliance vendor.
39. ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• Keeping storage separate from computing nodes also
makes it easier to enforce quality of service guarantees
since the NAS runs no compute jobs besides the
storage server.
• In contrast, attaching disks directly to compute nodes
can reduce hardware costs (the disks leverage the
existing server enclosure) and improve networking
fabric utilization (each server network port is
effectively dynamically shared between the computing
tasks and the file system).
40. ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• The replication model between these two
approaches is also fundamentally different.
• A NAS tends to provide high availability through
replication or error correction capabilities within
each appliance, whereas systems like GFS
implement replication across different machines
and consequently will use more networking
bandwidth to complete write operations.
41. ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• However, GFS-like systems are able to keep data
available even after the loss of an entire server
enclosure or rack and may allow higher aggregate
read bandwidth because the same data can be
sourced from multiple replicas.
• Trading off higher write overheads for lower cost,
higher availability, and increased read bandwidth
was the right solution for many of Google’s early
workloads.
42. ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• An additional advantage of having disks co-
located with compute servers is that it enables
distributed system software to exploit data
locality.
• Given how networking performance has
outpaced disk performance over the last decades,
such locality advantages are less useful for disks,
but they may remain beneficial to faster modern
storage devices such as those using Flash storage.
43. ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• NAND Flash technology has made Solid State Drives
(SSDs) affordable for a growing class of storage needs
in WSCs.
• While the cost per byte stored in SSDs will remain
much higher than in disks for the foreseeable future,
many Web services have I/O rates that cannot be easily
achieved with disk based systems.
• Since SSDs can deliver IO rates many orders of
magnitude higher than disks, they are increasingly
displacing disk drives as the repository of choice for
databases in Web services.
44. ARCHITECTURAL OVERVIEW OF WSCS
HDD interiors almost resemble a high-tech record player.
OCZ's Vector SSD is one of the fastest around
The OCZ RevoDrive Hybrid.
45. ARCHITECTURAL OVERVIEW OF WSCS
• STORAGE
• Types of NAND Flash
• There are primarily two types of NAND Flash widely used
today, Single-Level Cell (SLC) and Multi-Level Cell (MLC).
NAND Flash stores data in a large array of cells.
• Each cell stores data: one bit per cell for SLC NAND,
and two bits per cell for MLC. So, SLC NAND would store a
“0” or “1” in each cell, and MLC NAND would store “00”,
“01”, “10”, or “11” in each cell.
• SLC and MLC NAND offer different levels of performance
and endurance characteristics at different price points, with
SLC being the higher performing and more costly of the
two.
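The cell-level difference boils down to the number of charge states a cell must distinguish, which doubles with every extra bit (a minimal sketch):

```python
def nand_states(bits_per_cell: int) -> int:
    """Number of distinguishable charge levels a NAND cell must hold."""
    return 2 ** bits_per_cell

# SLC: 2 states per cell; MLC: 4 states per cell. More states mean finer
# voltage margins, hence MLC's lower endurance and performance.
print(nand_states(1), nand_states(2))
```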
46. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• The data manipulated by WSC workloads tends to fall into
two categories:
• data that is private to individual running tasks and data that
is part of the shared state of the distributed workload.
• Private data tends to reside in local DRAM or disk, is rarely
replicated, and its management is simplified by virtue of its
single user semantics.
• In contrast, shared data must be much more durable and is
accessed by a large number of clients, and thus requires a
much more sophisticated distributed storage system.
47. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• Google’s GFS is an example of a storage system with a
simple file-like abstraction (Google’s Colossus system has
since replaced GFS, but follows a similar architectural
philosophy so we choose to describe the better known GFS
here).
• GFS was designed to support the Web search indexing
system (the system that turned crawled Web pages into
index files for use in Web search), and therefore focuses on
high throughput for thousands of concurrent
readers/writers and robust performance under high
hardware failure rates.
48. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• GFS users typically manipulate large quantities of
data, and thus GFS is further optimized for large
operations.
• The system architecture consists of a master,
which handles metadata operations, and
thousands of chunk server (slave) processes
running on every server with a disk drive, to
manage the data chunks on those drives.
49. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• In GFS, fault tolerance is provided by replication
across machines instead of within them, as is the
case in RAID systems.
• Cross-machine replication allows the system to
tolerate machine and network failures and
enables fast recovery, since replicas for a given
disk or machine can be spread across thousands
of other machines.
50. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• Although the initial version of GFS only
supported simple replication, today’s version
(Colossus) has added support for more space-
efficient Reed-Solomon codes, which tend to
reduce the space overhead of replication by
roughly a factor of two over simple replication
for the same level of availability.
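The roughly factor-of-two savings can be checked with illustrative parameters; the RS(6, 3) layout below (6 data chunks plus 3 parity chunks) is an assumed example, not necessarily the encoding Colossus actually uses:

```python
def replication_overhead(copies: int) -> float:
    """Total bytes stored per useful byte with n-way replication."""
    return float(copies)

def reed_solomon_overhead(data_chunks: int, parity_chunks: int) -> float:
    """Total bytes stored per useful byte with an RS(data, parity) code."""
    return (data_chunks + parity_chunks) / data_chunks

r3 = replication_overhead(3)      # 3-way replication: 3.0x raw storage
rs = reed_solomon_overhead(6, 3)  # RS with 6 data + 3 parity chunks: 1.5x
print(r3 / rs)                    # roughly the factor-of-two savings
```

Both layouts tolerate the loss of any three chunks, but the coded layout stores half as many raw bytes per useful byte.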
51. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• UNSTRUCTURED WSC STORAGE
• An important factor in maintaining high availability is distributing file
chunks across the whole cluster in such a way that a small number of
correlated failures is extremely unlikely to lead to data loss.
• GFS takes advantage of knowledge about the known possible correlated
fault scenarios and attempts to distribute replicas in a way that avoids
their co-location in a single fault domain.
• Wide distribution of chunks across disks over a whole cluster is also key for
speeding up recovery.
• Since replicas of chunks in a given disk are spread across possibly all
machines in a storage cluster, reconstruction of lost data chunks is
performed in parallel at high speed.
• Quick recovery is important since long recovery time windows leave
under-replicated chunks vulnerable to data loss should additional faults
hit the cluster.
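The parallel-recovery argument can be sketched with back-of-the-envelope numbers; the disk size, peer count, and per-peer rebuild bandwidth below are all assumptions:

```python
def recovery_hours(disk_tb: float, peers: int, mb_per_s_per_peer: float) -> float:
    """Hours to re-replicate a failed disk's data, split evenly across peers."""
    total_mb = disk_tb * 1e6
    return total_mb / (peers * mb_per_s_per_peer) / 3600.0

serial = recovery_hours(2.0, 1, 40.0)       # one machine rebuilds alone
parallel = recovery_hours(2.0, 1000, 40.0)  # 1000 peers each rebuild a share
print(round(serial, 1), round(parallel * 3600))  # hours vs. seconds
```

Under these assumed numbers, spreading replicas across a thousand machines turns a rebuild that takes over half a day into one that completes in under a minute, shrinking the vulnerability window accordingly.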
52. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• STRUCTURED WSC STORAGE
• The simple file abstraction of GFS and Colossus may suffice
for systems that manipulate large blobs of data, but
application developers also need the WSC equivalent of
database-like functionality, where data sets can be
structured and indexed for easy small updates or complex
queries.
• A blob (binary large object, basic large object, BLOB, or
BLOb) is a collection of binary data stored as a single entity
in a database management system. Blobs are typically
images, audio, or other multimedia objects, though
sometimes binary executable code is stored as a blob.
53. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• STRUCTURED WSC STORAGE
• Structured distributed storage systems such as Google’s BigTable
and Amazon’s Dynamo were designed to fulfill those needs.
• Compared to traditional database systems, BigTable and Dynamo
sacrifice some features, such as richness of schema representation
and strong consistency, in favor of higher performance and
availability at massive scales.
• BigTable, for example, presents a simple multi-dimensional sorted
map consisting of row keys (strings) associated with multiple values
organized in columns, forming a distributed sparse table space.
Column values are associated with timestamps in order to support
versioning and time-series.
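The data model described above can be sketched as a toy in-memory structure; this illustrates only the abstraction (sparse sorted map of row key, column, and timestamp), not BigTable's actual API or implementation:

```python
class SparseTable:
    """Toy model of the BigTable data model: a sparse map of
    row key -> column -> {timestamp: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row, column, timestamp, value):
        self.rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Most recent value for (row, column), or None if absent."""
        cells = self.rows.get(row, {}).get(column)
        return cells[max(cells)] if cells else None

    def scan(self):
        """Rows are served in sorted key order, as in BigTable."""
        return sorted(self.rows)

t = SparseTable()
t.put("com.example/index", "contents", 1, "v1")
t.put("com.example/index", "contents", 2, "v2")   # newer timestamp
print(t.get("com.example/index", "contents"))     # latest version wins
```

Keeping every timestamped value per cell is what makes versioning and time-series queries natural in this model.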
54. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• STRUCTURED WSC STORAGE
• The choice of eventual consistency in BigTable and Dynamo shifts
the burden of resolving temporary inconsistencies to the
applications using these systems.
• A number of application developers within Google have found it
inconvenient to deal with weak consistency models and the
limitations of the simple data schemes in BigTable.
• Second-generation structured storage systems such as MegaStore
and subsequently Spanner have been designed to address such
concerns.
• Both MegaStore and Spanner provide richer schemas and SQL-like
functionality while providing simpler, stronger consistency models.
55. ARCHITECTURAL OVERVIEW OF WSCS
Weak Consistency
• The protocol is said to support weak
consistency if:
• All accesses to synchronization
variables are seen by all processes (or
nodes, processors) in the same order
(sequentially) - these are
synchronization operations.
• Accesses to critical sections are seen
sequentially.
• All other accesses may be seen in
different order on different processes
(or nodes, processors).
• The set of both read and write
operations in between different
synchronization operations is the same
in each process.
Strong Consistency
• The protocol is said to support
strong consistency if:
• All accesses are seen by all
parallel processes (or nodes,
processors etc.) in the same
order (sequentially)
• Therefore only one consistent
state can be observed, as
opposed to weak consistency,
where different parallel
processes (or nodes etc.) can
perceive variables in different
states.
56. ARCHITECTURAL OVERVIEW OF WSCS
• WSC STORAGE
• INTERPLAY OF STORAGE AND NETWORKING TECHNOLOGY
• The success of WSC distributed storage systems can be
partially attributed to the evolution of datacenter
networking fabrics.
• The observation is that the gap between networking and disk
performance has widened to the point that disk locality is
no longer relevant in intra-datacenter computations.
• This observation enables dramatic simplifications in the
design of distributed disk-based storage systems as well as
utilization improvements since any disk byte in a WSC
facility can in principle be utilized by any task regardless of
its relative locality.
57. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND
SPECIFICATIONS
• The design of a datacenter is often classified as
belonging to “Tier I–IV”.
• The Uptime Institute, a professional services
organization specializing in datacenters, and the
Telecommunications Industry Association (TIA), an
industry group accredited by ANSI and made up of
approximately 400 member companies, both advocate
a 4-tier classification loosely based on the power
distribution, uninterruptible power supply (UPS),
cooling delivery and redundancy of the datacenter.
58. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND SPECIFICATIONS
• Tier I datacenters have a single path for power distribution, UPS,
and cooling distribution, without redundant components.
• Tier II adds redundant components to this design (N + 1), improving
availability.
• Tier III datacenters have one active and one alternate distribution
path for utilities. Each path has redundant components, and the
paths are concurrently maintainable; that is, they provide
redundancy even during maintenance.
• Tier IV datacenters have two simultaneously active power and
cooling distribution paths, redundant components in each path, and
are supposed to tolerate any single equipment failure without
impacting the load.
59. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND SPECIFICATIONS
• The Uptime Institute’s specification is generally
performance-based (with notable exceptions for the
amount of backup diesel fuel, water storage, and ASHRAE
temperature design points).
• The specification describes topology rather than
prescribing a specific list of components to meet the
requirements, so there are many architectures that can
achieve a given tier classification.
• In contrast, TIA-942 is very prescriptive and specifies a
variety of implementation details, such as building
construction, ceiling height, voltage levels, types of racks,
and patch cord labeling.
60. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND
SPECIFICATIONS
• Formally achieving tier classification certification is
difficult and requires a full review from one of the
granting bodies, and most datacenters are not formally
rated.
• Most commercial datacenters fall somewhere between
tiers III and IV, choosing a balance between
construction cost and reliability.
• Generally, the lowest of the individual subsystem
ratings (cooling, power, etc.) determines the overall
tier classification of the datacenter.
61. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER TIER CLASSIFICATIONS AND
SPECIFICATIONS
• Real-world datacenter reliability is strongly influenced
by the quality of the organization running the
datacenter, not just by the design.
• The Uptime Institute reports that over 70% of
datacenter outages are the result of human error,
including management decisions on staffing,
maintenance, and training.
• Theoretical availability estimates used in the industry
range from 99.7% for tier II datacenters to 99.98% and
99.995% for tiers III and IV, respectively.
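Those availability figures translate directly into expected downtime per year:

```python
HOURS_PER_YEAR = 24 * 365

def downtime_hours_per_year(availability_pct: float) -> float:
    """Expected unavailable hours per year for a given availability."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

# Tier II ~26 h/year, Tier III ~1.8 h/year, Tier IV ~0.4 h/year
for tier, avail in [("II", 99.7), ("III", 99.98), ("IV", 99.995)]:
    print(f"Tier {tier}: {downtime_hours_per_year(avail):.2f} h/year")
```

The jump from Tier II to Tier III is more than an order of magnitude in expected downtime, which is why most commercial operators target that range.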
62. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• The broadest definition of WSC energy efficiency would
measure the energy used to run a particular workload
(say, to sort a petabyte of data).
• Unfortunately, no two companies run the same
workload, and real-world application mixes change all
the time, so it is hard to benchmark real-world WSCs
this way.
• Thus, even though such benchmarks have been
contemplated as far back as 2008, none has yet
emerged, and we doubt one ever will.
63. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• However, it is useful to view energy efficiency
as the product of three factors we can
independently measure and optimize:
• Efficiency = (1 / PUE) × (1 / SPUE) ×
(Computation / Total Energy to Electronic Components)
• The first term (a) measures facility efficiency,
the second (b) server power conversion efficiency,
and the third (c) measures the server’s
architectural efficiency.
64. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• THE PUE METRIC
• Power usage effectiveness (PUE) reflects the
quality of the datacenter building infrastructure
itself, and captures the ratio of total building
power to IT power (the power consumed by the
actual computing and network equipment, etc.).
(Sometimes IT power is also referred to as
“critical power.”)
• PUE = (Facility power) / (IT Equipment power)
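The formula is straightforward to apply to metered readings; the 10 MW / 8 MW figures below are hypothetical:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total building power over IT power."""
    return total_facility_kw / it_equipment_kw

# Hypothetical meter readings: 10 MW at the utility feed, 8 MW reaching
# the IT (critical) load -> PUE of 1.25, i.e. 25% overhead for cooling,
# power conversion, lighting, and so on.
print(pue(10_000, 8_000))
```

A PUE of 1.0 would mean every watt drawn from the utility reaches the computing equipment; real facilities are always above that.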
65. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• THE PUE METRIC
• PUE has gained a lot of traction as a datacenter
efficiency metric since widespread reporting
started around 2009.
• We can easily measure PUE by adding electrical
meters to the lines powering the various parts of
a datacenter, thus determining how much power
is used by chillers or a UPS.
66. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• THE PUE METRIC
• Historically, the PUE for the average
datacenter has been embarrassingly poor.
• According to a 2006 study, 85% of the datacenters
surveyed were estimated to have a PUE
greater than 3.0.
67. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• THE PUE METRIC
• In other words, the building’s mechanical and electrical
systems consumed twice as much power as the actual
computing load! Only 5% had a PUE of 2.0 or better.
• A subsequent EPA survey of over 100 datacenters
reported an average PUE value of 1.91, and a 2012
Uptime Institute survey of over 1100 datacenters
covering a range of geographies and datacenter sizes
reported an average PUE value between 1.8 and 1.89.
69. ARCHITECTURAL OVERVIEW OF WSCS
• SOURCES OF EFFICIENCY LOSSES IN
DATACENTERS
• For illustration, let us walk through the losses
in a typical datacenter.
70. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• The second term (b) accounts for overheads inside
servers or other IT equipment using a metric analogous
to PUE, server PUE (SPUE).
• SPUE consists of the ratio of total server input power to
its useful power, where useful power includes only the
power consumed by the electronic components
directly involved in the computation: motherboard,
disks, CPUs, DRAM, I/O cards, and so on.
• Substantial amounts of power may be lost in the
server’s power supply, voltage regulator modules
(VRMs), and cooling fans.
71. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• The product of PUE and SPUE constitutes an
accurate assessment of the end-to-end
electromechanical efficiency of a WSC. This
true (or total) PUE metric (TPUE) is defined as
TPUE = PUE × SPUE.
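A quick numeric illustration of TPUE, with assumed PUE and SPUE values:

```python
def tpue(pue: float, spue: float) -> float:
    """True PUE: end-to-end overhead from utility feed to electronics."""
    return pue * spue

# Illustrative values: a facility PUE of 1.2 and a server SPUE of 1.25
# mean 1.5 W must be drawn from the utility per watt of power reaching
# the electronic components doing useful work.
print(tpue(1.2, 1.25))
```
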
72. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• MEASURING ENERGY EFFICIENCY
• Similarly, server-level benchmarks such as Joulesort and
SPECpower characterize other aspects of computing
efficiency.
• Joulesort measures the total system energy to perform an
out-of-core sort and derives a metric that enables the
comparison of systems ranging from embedded devices to
supercomputers.
• SPECpower focuses on server-class systems and computes
the performance-to-power ratio of a system running a
typical business application on an enterprise Java platform.
73. ARCHITECTURAL OVERVIEW OF WSCS
• DATACENTER ENERGY EFFICIENCY
• MEASURING ENERGY EFFICIENCY
• Two separate benchmarking efforts aim to
characterize the efficiency of storage systems: the
Emerald Program by the Storage Networking
Industry Association (SNIA) and the SPC-2/E by
the Storage Performance Council.
• Both benchmarks measure storage servers under
different kinds of request activity and report
ratios of transaction throughput per Watt.
74. ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• To better understand the potential impact of energy-
related optimizations, let us examine the total cost of
ownership (TCO) of a datacenter.
• At the top level, costs split up into capital expenses
(Capex) and operational expenses (Opex).
• Capex refers to investments that must be made
upfront and that are then depreciated over a certain
time frame; examples are the construction cost of a
datacenter or the purchase price of a server.
75. ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• Opex refers to the recurring monthly costs of
actually running the equipment, excluding
depreciation: electricity costs, repairs and
maintenance, salaries of on-site personnel,
and so on.
• Thus, we have:
TCO = datacenter depreciation + datacenter Opex + server
depreciation + server Opex
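The TCO equation above can be sketched directly; the monthly dollar figures below are purely hypothetical:

```python
def monthly_tco(dc_depreciation: float, dc_opex: float,
                server_depreciation: float, server_opex: float) -> float:
    """TCO = datacenter depreciation + datacenter Opex
           + server depreciation + server Opex (all monthly figures)."""
    return dc_depreciation + dc_opex + server_depreciation + server_opex

# Hypothetical monthly figures, in millions of dollars:
print(monthly_tco(0.7, 0.4, 2.2, 0.5))
```

Note that server depreciation typically dominates, since servers are both expensive and amortized over a much shorter lifetime than the building.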
77. ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• The monthly depreciation cost (or amortization
cost) that results from the initial construction
expense depends on the duration over which the
investment is amortized (which is related to its
expected lifetime) and the assumed interest rate.
• Typically, datacenters are depreciated over
periods of 10–15 years.
• Under U.S. accounting rules, it is common to use
straight-line depreciation where the value of the
asset declines by a fixed amount each month.
78. ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
• For example, if we depreciate a $12/W
datacenter over 12 years, the depreciation cost is
$0.08/W per month.
• If we had to take out a loan to finance
construction at an interest rate of 8%, the
associated monthly interest payments add an
additional cost of $0.05/W, for a total of $0.13/W
per month.
• Typical interest rates vary over time, but many
companies will pay interest in the 7–12% range.
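The $12/W example works out as follows, modeling the loan as a level-payment mortgage (an assumption, but one consistent with the quoted $0.13/W total):

```python
def monthly_payment_per_watt(capex_per_watt: float, years: int,
                             annual_rate: float) -> float:
    """Level monthly payment on a loan covering the construction capex."""
    n = years * 12
    r = annual_rate / 12
    return capex_per_watt * r / (1 - (1 + r) ** -n)

straight_line = 12 / (12 * 12)        # depreciation alone: ~$0.083/W/month
with_interest = monthly_payment_per_watt(12, 12, 0.08)  # ~$0.13/W/month
print(round(straight_line, 2), round(with_interest, 2))
```

The difference between the two figures, roughly $0.05/W per month, is the interest cost quoted on the slide.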
79. ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
To put the cost of energy into
perspective, Hamilton did a case
study to estimate the costs of a WSC.
He determined that the CAPEX of this
8 MW facility was $88M, and
that the roughly 46,000 servers and
corresponding networking
equipment added another
$79M to the CAPEX for the WSC.
80. ARCHITECTURAL OVERVIEW OF WSCS
• Cost of a WSC
•We can now price the total cost of energy, since U.S. accounting rules allow us to
convert CAPEX into OPEX.
•We can just amortize CAPEX as a fixed amount each month for the effective life of the
equipment.
•Note that the amortization rates differ significantly, from 10 years for the facility to 4
years for the networking equipment and 3 years for the servers.
•Hence, the WSC facility lasts a decade, but you need to replace the servers every 3
years and the networking equipment every 4 years.
•By amortizing the CAPEX, Hamilton came up with a monthly OPEX, including accounting
for the cost of borrowing money (5% annually) to pay for the WSC.
•At $3.8M, the monthly OPEX is about 2% of the CAPEX.
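Hamilton's headline ratio can be double-checked from the CAPEX figures quoted above:

```python
facility_capex = 88e6    # Hamilton's facility construction estimate
it_capex = 79e6          # servers plus networking equipment
monthly_opex = 3.8e6     # amortized CAPEX plus recurring costs

ratio = monthly_opex / (facility_capex + it_capex)
print(f"{ratio:.1%}")    # about 2% of total CAPEX per month
```
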
81. ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Since many companies with WSCs are competing vigorously
in the marketplace, up until recently, they have been
reluctant to share their latest innovations with the public
(and each other).
• In 2009, Google described a state-of-the-art WSC as of
2005.
• Google graciously provided an update of the 2007 status of
their WSC, making this section the most up-to-date
description of a Google WSC.
• Even more recently, Facebook described their latest
datacenter as part of http://opencompute.org.
82. ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• Both Google and Microsoft have built WSCs using shipping
containers.
• The idea of building a WSC from containers is to make WSC
design modular.
• Each container is independent, and the only external
connections are networking, power, and water.
• The containers in turn supply networking, power, and
cooling to the servers placed inside them, so the job of the
WSC is to supply networking, power, and cold water to the
containers and to pump the resulting warm water to
external cooling towers and chillers.
84. ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
The diagram is a cutaway drawing of a Google container.
• A container holds up to 1160 servers, so 45 containers
have space for 52,200 servers. (This WSC has about
40,000 servers.)
• The servers are stacked 20 high in racks that form two
long rows of 29 racks (also called bays) each, with one
row on each side of the container.
• The rack switches are 48-port, 1 Gbit/sec Ethernet
switches, which are placed in every other rack.
85. ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• The Google WSC that we are looking at contains 45
40-foot-long containers in a 300-foot by 250-foot space,
or 75,000 square feet (about 7000 square meters).
• To fit in the warehouse, 30 of the containers are
stacked two high, or 15 pairs of stacked containers.
• Although the location was not revealed, it was built at
the time that Google developed WSCs in The Dalles,
Oregon, which provides a moderate climate and is near
cheap hydroelectric power and Internet backbone
fiber.
86. ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• This WSC offers 10 megawatts with a PUE of 1.23 over the prior 12
months.
• Of that 0.230 of PUE overhead, 85% goes to cooling losses (0.195
PUE) and 15% (0.035) goes to power losses.
• The system went live in November 2005, and this section describes
its state as of 2007.
• A Google container can handle up to 250 kilowatts. That means the
container can handle 780 watts per square foot (0.09 square
meters), or 133 watts per square foot across the entire 75,000-
square-foot space with 40 containers.
• However, the containers in this WSC average just 222 kilowatts.
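The power-density arithmetic can be verified; the 8-foot container width below is an assumption (standard shipping containers are 8 ft wide):

```python
container_kw = 250
container_sqft = 40 * 8   # assumed 40 ft x 8 ft footprint per container

# Density inside one container vs. averaged across the whole floor:
watts_per_sqft_in_container = container_kw * 1000 / container_sqft
watts_per_sqft_facility = 40 * container_kw * 1000 / 75_000

print(round(watts_per_sqft_in_container), round(watts_per_sqft_facility))
```

The roughly 780 W/sq ft inside a container versus 133 W/sq ft across the facility shows how much of the floor space serves power, cooling, and access rather than compute.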
88. ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• Servers In A Google WSC
• The server in Figure 6.21 has two sockets, each containing a
dual-core AMD Opteron processor running at 2.2 GHz. The
photo shows eight DIMMs, and these servers are typically
deployed with 8 GB of DDR2 DRAM.
• A novel feature is that the memory bus is downclocked to
533 MHz from the standard 666 MHz, since the slower bus
has little impact on performance but a significant impact on
power.
• The baseline design has a single network interface card
(NIC) for a 1 Gbit/sec Ethernet link.
89. ARCHITECTURAL OVERVIEW OF WSCS
• A Google Warehouse-Scale Computer
• Containers
• Servers In A Google WSC
• Although the photo in Figure 6.21 shows two SATA disk drives, the
baseline server has just one.
• The peak power of the baseline is about 160 watts, and idle power is 85
watts.
• This baseline node is supplemented to offer a storage (or “diskfull”) node.
• First, a second tray containing 10 SATA disks is connected to the server.
• To get one more disk, a second disk is placed into the empty spot on the
motherboard, giving the storage node 12 SATA disks.
• Finally, since a storage node could saturate a single 1 Gbit/sec Ethernet
link, a second Ethernet NIC was added.
• Peak power for a storage node is about 300 watts, and it idles at 198
watts.