Presented by:
-Ankita Duggal
-Gurkamal Deep Singh Rakhra
-Keerthana Muniraj
-Preeti Sawant
Data Center Workload Measurement and Analysis
What is a data center?
• A large group of networked computer servers typically used by organizations for the remote storage, processing, or distribution of large amounts of data.
• It houses not only servers but also backup power supplies, communication connections, air conditioning, fire suppression systems, etc.
•“A data center is a factory that transforms and stores bits”
A few glimpses of the data centers of a few organizations:
Rackspace – Richardson, TX; Facebook – Lulea, Sweden
Google – Douglas County, Georgia; Amazon – Virginia, outside Washington, D.C.
Google’s floating data center; Aliyun (Alibaba) – Hangzhou, China
Data Center workload
• The amount of processing that the computer has been given to do at a given time.
• Workload, in the form of web requests, data analysis, multimedia rendering, or other applications, is placed in the data center.
Ref: http://searchdatacenter.techtarget.com/definition/workload
Classification of workloads based on time criticality
• Critical workloads: “cannot tolerate even a few minutes of downtime”
• Non-critical workloads: can tolerate a wide range of outage times
Ways to improve data protection
• Prevent downtime by reducing resource contention:
Managers accommodate drastically changing workload demands by allowing additional workloads to be created easily, without changing or customizing the applications.
• Replicate workloads into the cloud to create asymmetric “hot backups”:
Clone the complete workload stack and import it into a public or private cloud.
• Use dissimilar infrastructure for off-premises redundancy:
Workloads are replicated off-site to different cloud providers.
• Reserve “failover and failback” for critical workloads only:
Automate the switching of users or processes from production to recovery instances.
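To make the failover/failback idea concrete, here is a minimal Python sketch of a controller that switches traffic between a production and a recovery instance based on health checks. The class name, the thresholds, and the use of consecutive-check counters are assumptions made for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    """Routes traffic to "production" or "recovery" based on health checks.

    Fails over after `fail_threshold` consecutive failed checks of production,
    and fails back after `recover_threshold` consecutive successful checks.
    """
    fail_threshold: int = 3
    recover_threshold: int = 5
    active: str = "production"
    _fails: int = 0  # consecutive failed health checks of production
    _oks: int = 0    # consecutive successful health checks of production

    def observe(self, production_healthy: bool) -> str:
        """Feed one health-check result and return the instance that should serve."""
        if production_healthy:
            self._fails = 0
            self._oks += 1
            if self.active == "recovery" and self._oks >= self.recover_threshold:
                self.active = "production"  # failback
        else:
            self._oks = 0
            self._fails += 1
            if self.active == "production" and self._fails >= self.fail_threshold:
                self.active = "recovery"  # failover
        return self.active


ctl = FailoverController(fail_threshold=3, recover_threshold=5)
checks = [True, False, False, False, True, True, True, True, True]
history = [ctl.observe(ok) for ok in checks]
print(history[3], history[-1])  # recovery production
```

Requiring several consecutive successes before failing back avoids flapping between instances when production recovers only briefly.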
Characterizing Data Analysis workloads in Data Centers
• Data analysis is important for improving the future performance of data centers.
• Data center workloads fall into two classes: services workloads (web search, media streaming) and data analysis workloads (business intelligence, machine learning).
• We concentrate on workloads from internet services here.
• Data analysis workloads are diverse in speedup performance and micro-architectural characteristics. Therefore, many applications need to be analyzed.
• Three important application domains in internet services are: 1) search engines, 2) social networks, and 3) electronic commerce.
Workload requirements:
1) Cover the most important application domains
2) Data is distributed and cannot be processed on a single node
3) Consider recently used data
Breakdown of Executed Instructions
DCBench :
• Benchmarks are used to evaluate the benefits of new designs and systems.
• DCBench is a benchmark suite for data center computing, with an open source license.
• Includes online and offline workloads.
• Includes different programming models, such as MPI and MapReduce.
• Helpful for architecture research and small- to medium-scale systems research in data center computing.
Methodologies
Workflow Phases
• Extract: looks for raw data and generates a stream of data
• Partition: divides the stream into buckets
• Aggregate: combines/reduces
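The three phases can be sketched as a small word-count pipeline in Python. The record format (words), the bucket count, and the CRC32-based partitioning are illustrative assumptions; the point is only that extract produces a stream, partition routes every record with the same key to the same bucket, and aggregate reduces the buckets.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List


def extract(raw_lines: Iterable[str]) -> Iterator[str]:
    """Extract: scan raw data and emit a stream of records (here, lowercase words)."""
    for line in raw_lines:
        for word in line.split():
            yield word.lower()


def partition(stream: Iterable[str], num_buckets: int) -> Dict[int, List[str]]:
    """Partition: route each record to a bucket using a stable hash of its key."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for record in stream:
        buckets[zlib.crc32(record.encode()) % num_buckets].append(record)
    return buckets


def aggregate(buckets: Dict[int, List[str]]) -> Counter:
    """Aggregate: reduce each bucket (here, count occurrences) and combine the results."""
    total: Counter = Counter()
    for records in buckets.values():
        total.update(records)
    return total


raw = ["the quick brown fox", "The lazy dog", "the end"]
counts = aggregate(partition(extract(raw), num_buckets=4))
print(counts["the"])  # 3
```

A stable hash such as CRC32 is used instead of Python's built-in `hash()` for strings, which is randomized per process and so would not route keys consistently across machines.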
Patterns comprising traffic in a data center
• Work-seeks-bandwidth
• Scatter-gather pattern
Work-seeks-bandwidth
• Chip designers prefer placing components that interact often (e.g., CPU and L1 cache, multiple CPU cores) close together to get high-bandwidth interconnections cheaply.
• Similarly, jobs that rely on heavy traffic exchanges with each other are placed in areas of the data center where high network bandwidth is available.
Contd.
This translates into the engineering decision of placing such jobs within the same server, within servers on the same rack, or within servers in the same VLAN, and so on, in decreasing order of preference; hence the work-seeks-bandwidth pattern.
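The preference order (same server, then same rack, then same VLAN) can be expressed as a ranking function. In this hypothetical Python sketch, the `Server` fields and the `free_slots` capacity model are assumptions; a real scheduler would also weigh load, failure domains, and other constraints.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Server:
    name: str
    rack: str
    vlan: str
    free_slots: int


def locality_tier(candidate: Server, peer: Server) -> int:
    """Lower is better: 0 = same server, 1 = same rack, 2 = same VLAN, 3 = elsewhere."""
    if candidate.name == peer.name:
        return 0
    if candidate.rack == peer.rack:
        return 1
    if candidate.vlan == peer.vlan:
        return 2
    return 3


def place_near(peer: Server, servers: List[Server]) -> Optional[Server]:
    """Pick a server with free capacity that is as close as possible to the peer job."""
    candidates = [s for s in servers if s.free_slots > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda s: locality_tier(s, peer))


servers = [
    Server("s1", rack="r1", vlan="v1", free_slots=0),  # the peer's own server is full
    Server("s2", rack="r1", vlan="v1", free_slots=2),
    Server("s3", rack="r2", vlan="v1", free_slots=4),
    Server("s4", rack="r9", vlan="v7", free_slots=8),
]
print(place_near(servers[0], servers).name)  # s2
```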
Scatter gather pattern
• Data is partitioned into small chunks, each of which is worked on by a different server, and the resulting answers are later aggregated.
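A minimal scatter-gather sketch in Python: the input is split into chunks, each chunk is processed by a separate worker (threads stand in for servers here), and the partial answers are combined. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunk(data: Sequence[T], n_chunks: int) -> List[Sequence[T]]:
    """Scatter step: split data into at most n_chunks contiguous chunks."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    if not data:
        return []
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def scatter_gather(
    data: Sequence[T],
    worker: Callable[[Sequence[T]], R],
    combine: Callable[[List[R]], R],
    n_workers: int = 4,
) -> R:
    """Process each chunk on a separate worker, then aggregate the partial answers."""
    chunks = chunk(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))  # one chunk per worker
    return combine(partials)  # gather step


total = scatter_gather(list(range(1, 101)), worker=sum, combine=sum)
print(total)  # 5050
```

In a data center the workers would be separate servers and the combine step would run on an aggregator; threads are used here only to keep the sketch self-contained.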
Congestion
• Periods of low network utilization indicate either:
- an application that demands more of other resources (CPU, disk) than of the network, or
- an application that could be rewritten to make better use of the available bandwidth.
Evacuation event (congestion)
• When a server repeatedly experiences problems, the automated
management system in the cluster evacuates all the usable blocks on
that server prior to alerting a human that the server is ready to be re-
imaged.
Read failure
• When a job makes no progress (e.g., it is unable to find its input data or unable to connect to a machine), it is killed.
Contd.
• To attribute network traffic to the applications that generate it, the network event logs were merged with application-level logs describing which job and phase (e.g., map, reduce) were active at each time. The results showed that jobs in the reduce phase are responsible for a fair amount of the network traffic.
• Note that in the reduce phase of a map-reduce job, the data in each partition (e.g., all personnel records that start with ‘A’) is spread across multiple servers in the cluster and has to be pulled to the server that handles the reduce for that partition.
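The following Python sketch illustrates why the reduce phase generates network traffic: records for a partition (here, the first letter of a name, following the ‘A’ example) are spread across map servers and must be pulled to the partition's reducer. The round-robin assignment of partitions to reducers and the server names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shuffle_plan(
    map_outputs: Dict[str, List[str]], reducers: List[str]
) -> Tuple[Dict[str, List[str]], int]:
    """Pull every record of a partition to the server that reduces it.

    Partition key = first letter of the record (records are assumed non-empty);
    partition letters are assigned to reducer servers round-robin in alphabetical
    order. Returns the reducer inputs and how many records crossed the network.
    """
    letters = sorted({rec[0].upper() for recs in map_outputs.values() for rec in recs})
    owner = {letter: reducers[i % len(reducers)] for i, letter in enumerate(letters)}
    reducer_input: Dict[str, List[str]] = defaultdict(list)
    remote_transfers = 0
    for server, records in map_outputs.items():
        for rec in records:
            target = owner[rec[0].upper()]
            reducer_input[target].append(rec)
            if target != server:
                remote_transfers += 1  # record is pulled over the network
    return dict(reducer_input), remote_transfers


map_outputs = {
    "srv1": ["Alice", "Bob", "Anna"],
    "srv2": ["Aaron", "Beth"],
}
inputs, moved = shuffle_plan(map_outputs, reducers=["srv1", "srv2"])
print(moved)  # 2
```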
Monitoring Data Center Workload
• For coordinated monitoring and control of data centers, the most common approaches are based on Monitor, Analyze, Plan, and Execute (MAPE) control loops.
Overview
Modern Data Center Operation
• Workload in the form of web requests, data analysis, etc., is placed in the data center.
• An instrumentation infrastructure logs sensor readings.
• The results are fed into a policy engine that creates a plan for utilizing resources.
• External interfaces, or actuators, implement the plan.
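A single iteration of a MAPE loop might look like the hypothetical Python sketch below, which maps the pieces above onto functions: readings are monitored, analyzed into one utilization figure, turned into a plan by a threshold policy, and executed by an actuator. The thresholds (0.8 and 0.3) and the scale-by-one-server policy are assumptions, not part of the original material.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Cluster:
    active_servers: int
    cpu_readings: List[float] = field(default_factory=list)  # per-server utilization, 0..1


def monitor(cluster: Cluster) -> List[float]:
    """Monitor: collect sensor readings (here, CPU utilization per active server)."""
    return list(cluster.cpu_readings)


def analyze(readings: List[float]) -> float:
    """Analyze: summarize the readings into a single utilization figure."""
    return mean(readings) if readings else 0.0


def plan(utilization: float, high: float = 0.8, low: float = 0.3) -> int:
    """Plan: the policy engine decides a change in server count (+1, -1 or 0)."""
    if utilization > high:
        return 1
    if utilization < low:
        return -1
    return 0


def execute(cluster: Cluster, delta: int) -> None:
    """Execute: the actuator applies the plan, never going below one server."""
    cluster.active_servers = max(1, cluster.active_servers + delta)


def mape_iteration(cluster: Cluster) -> int:
    delta = plan(analyze(monitor(cluster)))
    execute(cluster, delta)
    return delta


c = Cluster(active_servers=4, cpu_readings=[0.9, 0.85, 0.95, 0.8])
mape_iteration(c)
print(c.active_servers)  # 5
```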
Workload Monitoring using Splice
• Splice aggregates sensor and performance data in a relational
database.
• It also gathers data from many sources through different interfaces
with different formats.
• Splice uses change of value filter that retains only those values that
differ significantly from the previously logged values.
• It reduces the volume of stored data with minimal loss of information.
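A change-of-value filter of the kind described for Splice can be sketched as follows. This is not Splice's code: the tuple layout and the fixed absolute `threshold` are assumptions. Each reading is compared against the last value that was actually logged for the same sensor, so slow drift is still captured once it accumulates past the threshold.

```python
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[int, str, float]  # (timestamp, sensor_id, value)


def change_of_value_filter(readings: Iterable[Reading], threshold: float) -> List[Reading]:
    """Keep a reading only if it differs from the last *logged* value of the same
    sensor by more than `threshold`; the first reading of each sensor is always kept."""
    last_logged: Dict[str, float] = {}
    kept: List[Reading] = []
    for ts, sensor, value in readings:
        prev = last_logged.get(sensor)
        if prev is None or abs(value - prev) > threshold:
            kept.append((ts, sensor, value))
            last_logged[sensor] = value
    return kept


temps = [(0, "rack1", 22.0), (1, "rack1", 22.1), (2, "rack1", 22.4),
         (3, "rack1", 23.0), (4, "rack1", 23.1)]
kept = change_of_value_filter(temps, threshold=0.5)
print(kept)  # [(0, 'rack1', 22.0), (3, 'rack1', 23.0)]
```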
Database Schema Of Splice
Implementation
• Splice uses change of value filter that retains only those values that
differ significantly from the previously logged values.
• It reduces the volume of stored data with minimal loss of information.
Analysis
• Data analysis methods fall into two main classes: attribute behavior and correlation.
• Attribute behavior describes the values of the observed readings and how those values change over time.
• Data correlation methods determine the strength of the correlations among attributes that affect each other.
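Both classes of analysis can be illustrated with a short Python sketch: `attribute_behavior` summarizes one attribute's values and their average change over time, and `pearson` measures how strongly two attributes move together. The choice of Pearson correlation and the example attributes (CPU utilization and inlet temperature) are assumptions; the slides do not name a specific correlation method.

```python
from math import sqrt
from statistics import mean
from typing import Dict, Sequence


def attribute_behavior(values: Sequence[float]) -> Dict[str, float]:
    """Describe the observed values of one attribute and how they change over time."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "mean_change": mean(deltas) if deltas else 0.0,
    }


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient in [-1, 1] between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


cpu_util = [0.2, 0.4, 0.6, 0.8]
inlet_temp_c = [21.0, 22.0, 23.0, 24.0]
print(attribute_behavior(inlet_temp_c)["mean_change"])  # 1.0
print(round(pearson(cpu_util, inlet_temp_c), 6))  # 1.0
```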
Virtualization in Data Centers
• Virtualization is a combination of software and hardware features that creates
virtual CPUs (vCPU) or virtual systems-on-chip (vSoC).
• Virtualization provides the required level of isolation and partitioning of resources.
• Each VM is protected from interference from another VM.
Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah, Nikolay Guenov
Why Virtualization
• To reduce power consumption and building space, provide high availability for critical applications, and streamline application deployment and migration.
• To support multiple operating systems and consolidate services on a single server by defining multiple VMs.
• Because multiple VMs can run on a single server, server inventory is reduced and server utilization improves.
Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah,
Nikolay Guenov
Benefits Of Virtualization
Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah,
Nikolay Guenov
Multi Core Processing
• A multi-core processor is a single computing component with two or more
independent actual processing units (called "cores"), which are the units that read
and execute program instructions.
Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah,
Nikolay Guenov
Virtualization and Multicore Processing
• With multicore SoCs, given enough processing capacity and virtualization, control
plane applications and data plane applications can be run without one affecting the
other.
• Data or control traffic that is relevant to the customized application and operating
system (OS) can be directed to the appropriate virtualized core without impacting
or compromising the rest of the system.
Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah,
Nikolay Guenov
Control and Data Plane Application Consolidation in
virtualized Multicore SoC
• Functions that were previously implemented on different boards now can be
consolidated onto a single card and a single multicore SoC.
Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah,
Nikolay Guenov
Data Center Reliability
Network reliability:
• Characterizing the most failure-prone network elements
• Estimating the impact of failures
• Analyzing the effectiveness of network redundancy
Reference: Understanding Network Failures in Data Centers: Measurement,
Analysis, and Implications By: Phillipa Gill, Navendu Jain, Microsoft Research
Key Observations
• Data center networks are reliable
• Low-cost, commodity switches are highly reliable
• Load balancers experience a high number of software faults
• Failures potentially cause loss of a large number of small packets.
• Network redundancy helps, but it is not entirely effective
Reference: Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications By: Phillipa Gill, Navendu Jain, Microsoft Research
Reasons to change from traditional
Significant changes in computing power, network bandwidth, and
network file system usage
• Network file system workloads
• No CIFS protocol studies
• Limited file system workloads
Reference: Measurement and Analysis of Large-Scale Network File System Workloads by Andrew W. Leung, Shankar
Pasupathy, Garth Goodson, Ethan L. Miller
Access Pattern (figure; categories: read-only, write-only, read-write)
Analysis
• File Access Patterns:
Reference: Measurement and Analysis of Large-Scale Network File System Workloads by Andrew W. Leung, Shankar
Pasupathy, Garth Goodson, Ethan L. Miller
Sequential Access (figure; categories: entire, partial)
• Sequentiality Analysis:
Reference: Measurement and Analysis of Large-Scale Network File System Workloads
by Andrew W. Leung, Shankar Pasupathy, Garth Goodson, Ethan L. Miller
File Lifetime
• In CIFS, files can be deleted either through an explicit delete request, which frees the entire file and its name, or through truncation, which frees only the data.
• CIFS users begin a connection to the file server by creating an
authenticated user session and end by eventually logging off.
Reference: Measurement and Analysis of Large-Scale Network File System Workloads by Andrew W. Leung,
Shankar Pasupathy, Garth Goodson, Ethan L. M.
Architecture
Load Balancer
• The IP address to which requests are sent is called a virtual IP address (VIP).
• The IP addresses of the servers over which the requests are spread are known as direct IP addresses (DIPs).
• Inside the data center, requests are spread among a pool of front-end servers that process them. This spreading is typically performed by a specialized load balancer.
Reference: Towards a Next Generation Data Center Architecture: Scalability and
Commoditization By Albert Greenberg, David A. Maltz Microsoft Research, WA, USA
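As a sketch of how a load balancer maps one VIP onto a pool of DIPs, the Python example below hashes each client flow so that all requests in a flow reach the same DIP while distinct flows are spread over the pool. The class, the flow key `(src_ip, src_port)`, and the CRC32 hash are assumptions for illustration; the addresses come from documentation and private ranges.

```python
import zlib
from typing import Dict, List


class LoadBalancer:
    """Maps a virtual IP (VIP) to a pool of direct IPs (DIPs).

    A client flow is identified by (src_ip, src_port); hashing it keeps every
    request of one flow on the same DIP while spreading distinct flows over the pool.
    """

    def __init__(self, vip: str, dips: List[str]) -> None:
        if not dips:
            raise ValueError("DIP pool must not be empty")
        self.vip = vip
        self.dips = list(dips)

    def pick_dip(self, src_ip: str, src_port: int) -> str:
        key = f"{src_ip}:{src_port}->{self.vip}".encode()
        return self.dips[zlib.crc32(key) % len(self.dips)]


lb = LoadBalancer("203.0.113.10", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
counts: Dict[str, int] = {dip: 0 for dip in lb.dips}
for port in range(30000, 33000):
    counts[lb.pick_dip("198.51.100.7", port)] += 1
print(counts)  # each DIP receives a share of the 3000 flows
```

Plain modulo hashing remaps most flows when the DIP pool changes size; production load balancers often use consistent hashing or per-flow state to avoid that.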
Challenges and Requirements
Challenges
• Fragmentation of resources
• Poor server to server connectivity
• Proprietary hardware that scales up, not out
Requirements:
• Placement anywhere
• Server to server bandwidth
• Commodity hardware that scales out
• Support 100,000 servers
Reference: Towards a Next Generation Data Center Architecture: Scalability and
Commoditization By Albert Greenberg, David A. Maltz Microsoft Research, WA, USA
Load Balancing
• Load spreading: requests are spread evenly over a pool of servers.
• Load balancing: load balancers are placed in front of the actual servers.
Reference: Towards a Next Generation Data Center Architecture: Scalability and
Commoditization By Albert Greenberg, David A. Maltz Microsoft Research, WA, USA
Case studies – a few real-world scenarios
Why build a data center in Virginia when there is one in California?
• Reduce the time to send a page to users on the East Coast.
• California – running out of space; Virginia – lots of room to grow.
• Restricting Facebook to one data center meant that, in the event of a disaster (earthquake, power failure, Godzilla), Facebook could be unusable for an extended amount of time.
The hardware and network were set up quickly, but how should cache consistency be handled?
[Figure: Master DB diagram]
Facebook’s Scheduling with Corona
• With Facebook’s user base expanding at an enormous rate, Facebook developed a new scheduling framework called Corona.
• Initially, the MapReduce implementation of Apache Hadoop served as the foundation of the infrastructure, but over the years this system developed several issues:
- Scheduling overhead
- A pull-based scheduling model
- A static, slot-based resource management model
Facebook’s Solution
• Corona introduces a cluster manager whose only purpose is to track the nodes in the cluster and the amount of free resources.
• Corona uses push-based scheduling, which reduces scheduling latency.
• This separation of duties allows Corona to manage many more jobs and achieve better cluster utilization.
• The cluster manager also implements fair-share scheduling.
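Fair-share scheduling with push-based assignment can be sketched as follows: whenever slots are free, the cluster manager pushes each one to the pool that is furthest below its weighted share. This is a generic illustration under assumed names (`Pool`, `fair_share_assign`), not Corona's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pool:
    name: str
    pending: int       # tasks waiting for a slot
    running: int = 0   # tasks currently holding a slot
    weight: float = 1.0


def fair_share_assign(pools: List[Pool], free_slots: int) -> Dict[str, int]:
    """Push free slots to pools one at a time, always choosing the pool that is
    furthest below its fair share (lowest running/weight among pools with demand)."""
    granted = {p.name: 0 for p in pools}
    for _ in range(free_slots):
        hungry = [p for p in pools if p.pending > 0]
        if not hungry:
            break  # no remaining demand; leave the rest of the slots free
        p = min(hungry, key=lambda q: q.running / q.weight)
        p.pending -= 1
        p.running += 1
        granted[p.name] += 1
    return granted


pools = [Pool("ads", pending=10, running=4), Pool("search", pending=10), Pool("etl", pending=1)]
granted = fair_share_assign(pools, free_slots=6)
print(granted)  # {'ads': 1, 'search': 4, 'etl': 1}
```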
Future of Corona
• New features such as:
- Resource-based scheduling rather than the slot-based model
- Online upgrades to the cluster manager
- Expansion of the user base by scheduling applications such as Peregrine
Characterizing backend workload (at Google)
Ref: Towards Characterizing Cloud Backend Workloads: Insights from Google Compute Clusters (Asit K. Mishra, Joseph L. Hellerstein, Walfredo Cirne, Chita R. Das)
Pre-requisites
• Capacity planning, to determine which machine resources must grow and by how much
• Task scheduling, to achieve high machine utilization and to meet service level objectives
• Both of these require a good understanding of task resource consumption, i.e., CPU and memory usage.
The approaches
1. Make each task its own workload.
This scales poorly, since tens of thousands of tasks execute daily on Google compute clusters.
2. View all tasks as belonging to a single workload.
This results in large variances in predicted resource consumption.
The proposed methodology
• Identifying the workload dimensions
• Constructing task classes using an off-the-shelf algorithm such as k-means
• Determining the break points for qualitative coordinates within the workload dimensions
• Merging adjacent task classes to reduce the number of workloads
Based on the observations that:
• the duration of task executions is bimodal in that tasks either have a
short duration or a long duration
• most tasks have short durations
• Most resources are consumed by a few tasks with long duration that
have large demands for CPU and memory
Objective
• construct a small number of task classes such that tasks within each
class have similar resource usage.
• We use qualitative coordinates to distinguish workloads: small (s), medium (m), large (l).
First step
• Identify the workload dimensions.
• For example, in analysis of the Google Cloud Backend, the workload
dimensions are task duration, average core usage, and average
memory usage
Second step
• Construct preliminary task classes that have fairly homogeneous resource usage. This is done by using the workload dimensions as a feature vector and applying an off-the-shelf clustering algorithm such as k-means.
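The second step can be sketched with a plain implementation of Lloyd's k-means in Python. The feature vectors below (duration in hours, average cores, average memory in GB) are made-up example values; in practice the dimensions would normally be scaled first so that no single dimension dominates the distance, and a library implementation would typically be used.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's k-means: returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k of the input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each task joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


tasks = [
    (0.1, 0.1, 0.2), (0.2, 0.1, 0.3), (0.15, 0.2, 0.25),  # short, small tasks
    (20.0, 4.0, 8.0), (22.0, 3.5, 7.5),                   # long, large tasks
]
centroids, labels = kmeans(tasks, k=2)
print(labels)
```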
Third step
• Determine the break points for the qualitative coordinates of the workload dimensions. There are two considerations. First, break points must be consistent across workloads; for example, the qualitative coordinate small for duration must have the same break point (e.g., 2 hours) for all workloads. Second, the result should produce low within-class variability.
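The third step amounts to fixing one set of break points per dimension and mapping tasks to a label such as `smm`. In this sketch only the 2-hour duration break point comes from the text above; the core and memory break points are hypothetical values chosen for illustration, and tasks are labeled directly rather than via their k-means classes.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# Break points per dimension, shared by all workloads. Only the 2-hour duration
# cut comes from the text; the core and memory cuts are hypothetical.
BREAK_POINTS: Dict[str, Tuple[List[float], str]] = {
    "duration_h": ([2.0], "sl"),       # small < 2 h <= large
    "cores": ([0.5, 2.0], "sml"),      # small < 0.5 <= medium < 2.0 <= large
    "memory_gb": ([1.0, 4.0], "sml"),  # small < 1 GB <= medium < 4 GB <= large
}
DIMENSIONS = ("duration_h", "cores", "memory_gb")


def qualitative_coords(task: Sequence[float]) -> str:
    """Map a task's (duration, cores, memory) to a label such as 'smm'."""
    label = ""
    for dim, value in zip(DIMENSIONS, task):
        cuts, symbols = BREAK_POINTS[dim]
        label += symbols[bisect_right(cuts, value)]  # index of the bin the value falls in
    return label


def group_by_class(tasks: List[Sequence[float]]) -> Dict[str, List[Sequence[float]]]:
    """Group tasks by their qualitative coordinates (preliminary task classes)."""
    classes: Dict[str, List[Sequence[float]]] = {}
    for task in tasks:
        classes.setdefault(qualitative_coords(task), []).append(task)
    return classes


print(qualitative_coords((0.5, 1.0, 2.0)))  # smm
print(qualitative_coords((5.0, 3.0, 8.0)))  # lll
```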
Fourth step
Merge classes to form the final set of task classes. These classes define our workloads. This involves combining “adjacent” preliminary task classes, where adjacency is based on the qualitative coordinates of the class. For example, in the Google data, duration has the qualitative coordinates small and large; for cores and memory, the qualitative coordinates are small, medium, and large. Thus, the workload smm is adjacent to sms and sml in the third dimension. Two preliminary classes are merged if the CV (coefficient of variation) of the merged class does not differ much from the CVs of each of the preliminary classes. Merged classes are denoted by the wildcard “*”. For example, merging the classes sms, smm, and sml yields the class sm*.
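The merge test of the fourth step can be sketched with the coefficient of variation (standard deviation divided by mean). As a simplification, this hypothetical function merges all classes that differ only in one chosen dimension in a single step, which matches the sms + smm + sml → sm* example but does not check pairwise adjacency; the `tolerance` parameter stands in for “does not differ much”, and the per-class values are made-up observations of one resource metric.

```python
from statistics import mean, pstdev
from typing import Dict, List


def cv(values: List[float]) -> float:
    """Coefficient of variation: population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else float("inf")


def merge_along_dimension(
    classes: Dict[str, List[float]], dim: int, tolerance: float
) -> Dict[str, List[float]]:
    """Merge classes whose labels differ only at position `dim` when the merged
    class's CV stays within `tolerance` of every constituent class's CV."""
    groups: Dict[str, List[str]] = {}
    for label in classes:
        key = label[:dim] + "*" + label[dim + 1:]  # wildcard in the merged dimension
        groups.setdefault(key, []).append(label)
    result: Dict[str, List[float]] = {}
    for key, labels in groups.items():
        merged = [v for lab in labels for v in classes[lab]]
        if len(labels) > 1 and all(abs(cv(merged) - cv(classes[lab])) <= tolerance for lab in labels):
            result[key] = merged
        else:
            for lab in labels:
                result[lab] = classes[lab]  # keep the preliminary classes unchanged
    return result


classes = {
    "sms": [1.0, 1.1, 0.9],
    "smm": [1.0, 1.2, 0.8],
    "sml": [1.1, 0.9, 1.0],
    "lll": [8.0, 9.0],
}
merged = merge_along_dimension(classes, dim=2, tolerance=0.1)
print(sorted(merged))  # ['lll', 'sm*']
```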
Questions?
62

More Related Content

Recently uploaded

PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxeditsforyah
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationMarko4394
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 

Recently uploaded (17)

PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
Q4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptxQ4-1-Illustrating-Hypothesis-Testing.pptx
Q4-1-Illustrating-Hypothesis-Testing.pptx
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentation
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 

Featured

AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 

Featured (20)

AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 

Data Center Workload Measurement and Analysis

  • 1. Presented by: -Ankita Duggal -Gurkamal Deep Singh Rakhra -Keerthana Muniraj -Preeti Sawant Data Center Workload Measurement and Analysis 1
  • 2. What is a Data center ? •A large group of networked computer servers typically used by organizations for the remote storage, processing, or distribution of large amounts of data. •It doesn’t house only servers but also contains backup power supplies, communication connections, air conditioning, fire supplies etc. •“A data center is a factory that transforms and stores bits”
  • 3. A few glimpses of Data Center of a few organizations … Rackspace - Richardson,TX Facebook – Lulea, Sweden Google- Douglas County, Georgia Amazon – Virginia, outside Washington D.C 3
  • 4. Google’s floating data center Aliyun (Alibaba) – Hangshou, China
  • 5. Data Center workload • Amount of processing that the computer has been given to do at a given time. • Workload — in the form of web requests, data analysis, multimedia rendering, or other applications – is placed in the data center Ref: http://searchdatacenter.techtarget.com/definition/workload 5
  • 6. Classification of workloads based on time criticality Critical Workloads Non-critical Workloads “Cannot tolerate even a few minutes of downtime” can tolerate a wide range of outage times
  • 7. Ways to improve data protection • Prevent downtime by reducing resource contention : Managers accommodates drastically changing demands on workloads by allowing easy creation of additional workloads without changing or customizing its applications. • Replicate workloads into cloud to create asymmetric “Hot back-ups”: Clone the complete workload stack. Import into public/private cloud • Using dissimilar infrastructure for off-premises redundancies: Workloads are replicated off-site to different cloud providers. • Concept of “Failures or Failback”  reserved only for critical workloads: Automating the switching of users or processes from production to recovery instances
  • 8. Characterizing Data Analysis workloads in Data Centers • Data analysis is important for improving the future performance of data centers. • Data center workloads fall into two classes: services workloads (web search, media streaming) and data analysis workloads (business intelligence, machine learning). • We concentrate here on workloads from internet services. • Data analysis workloads are diverse in speedup performance and micro-architectural characteristics, so many applications need to be analyzed. • Three important application domains in internet services are: 1) search engines, 2) social networks, and 3) electronic commerce.
  • 9. Workload requirements: 1) drawn from the most important application domains; 2) data is distributed and cannot be processed on a single node; 3) recently used data is considered.
  • 10. Breakdown of Executed Instructions (figure)
  • 11. DCBench • Benchmarks are used to evaluate the benefits of new designs and systems. • DCBench is a benchmark suite for data center computing, released under an open source license. • It includes both online and offline workloads. • It includes different programming models, such as MPI and MapReduce. • It is helpful for architecture research and for small- to medium-scale systems research in data center computing.
  • 13. Workflow Phases • Extract: looks for raw data and generates a stream of data • Partition: divides the stream into buckets • Aggregate: combines/reduces
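As a rough illustration of the three phases, the sketch below runs a word count through extract, partition, and aggregate steps. It assumes the raw data is lines of text, that partitioning hashes each key into one of a fixed number of buckets, and that aggregation sums the values per key; the function names are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]


def extract(lines: Iterable[str]) -> Iterator[Record]:
    """Extract: scan the raw data and generate a stream of (key, value) records."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def partition(stream: Iterable[Record], num_buckets: int) -> List[List[Record]]:
    """Partition: divide the stream into buckets by hashing each record's key."""
    buckets: List[List[Record]] = [[] for _ in range(num_buckets)]
    for key, value in stream:
        buckets[hash(key) % num_buckets].append((key, value))
    return buckets


def aggregate(bucket: List[Record]) -> Dict[str, int]:
    """Aggregate: combine (reduce) all values that share a key within one bucket."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in bucket:
        totals[key] += value
    return dict(totals)


def run_pipeline(lines: Iterable[str], num_buckets: int = 4) -> Dict[str, int]:
    result: Dict[str, int] = {}
    # Every key lands in exactly one bucket, so per-bucket results never collide.
    for bucket in partition(extract(lines), num_buckets):
        result.update(aggregate(bucket))
    return result


print(sorted(run_pipeline(["a b a", "b c"]).items()))  # [('a', 2), ('b', 2), ('c', 1)]
```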
  • 14. Patterns comprising traffic in the data center • Work-seeks-bandwidth • Scatter-gather pattern
  • 15. Work-seeks-bandwidth • Chip designers prefer placing components that interact often (e.g., CPU and L1 cache, multiple CPU cores) close together to get high-bandwidth interconnections cheaply. • Similarly, jobs that rely on heavy traffic exchanges with each other are placed in areas of the data center where high network bandwidth is available.
  • 16. Contd. This translates into the engineering decision of placing communicating jobs within the same server, then within servers on the same rack, then within servers in the same VLAN, and so on, in decreasing order of preference; hence the work-seeks-bandwidth pattern.
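The placement preference can be expressed as a ranking function over candidate servers. This is a minimal sketch, assuming each server is described only by hypothetical `name`, `rack`, `vlan`, and `free_slots` fields; real schedulers weigh many more constraints.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Server:
    name: str
    rack: str
    vlan: str
    free_slots: int


def locality_rank(peer: Server, candidate: Server) -> int:
    """Lower is better: same server < same rack < same VLAN < anywhere else."""
    if candidate.name == peer.name:
        return 0
    if candidate.rack == peer.rack:
        return 1
    if candidate.vlan == peer.vlan:
        return 2
    return 3


def place_near(peer: Server, servers: list[Server]) -> Optional[Server]:
    """Pick a server with free capacity that is as close as possible to `peer`."""
    candidates = [s for s in servers if s.free_slots > 0]
    if not candidates:
        return None
    # min() keeps the first server among equally ranked candidates.
    return min(candidates, key=lambda s: locality_rank(peer, s))
```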
  • 17. Scatter-gather pattern • Data is partitioned into small chunks, each of which is worked on by a different server, and the resulting answers are later aggregated.
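A minimal scatter-gather sketch, assuming threads in a `ThreadPoolExecutor` stand in for the separate servers and that the caller supplies hypothetical `work` and `combine` functions. Here each chunk computes a partial sum of squares and the partial answers are summed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")
A = TypeVar("A")


def scatter_gather(data: Sequence[T], chunk_size: int,
                   work: Callable[[Sequence[T]], R],
                   combine: Callable[[List[R]], A],
                   workers: int = 4) -> A:
    # Scatter: split the input into small chunks.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Each chunk is handled by a different worker (threads stand in for servers).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_answers = list(pool.map(work, chunks))  # map() preserves chunk order
    # Gather: aggregate the partial answers into one result.
    return combine(partial_answers)


def sum_of_squares(chunk: Sequence[int]) -> int:
    return sum(x * x for x in chunk)


print(scatter_gather(list(range(10)), 3, sum_of_squares, sum))  # 285
```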
  • 18. Congestion • Periods of low network utilization indicate either (1) an application that demands more of other resources, such as CPU and disk, than of the network, or (2) an application that can be rewritten to make better use of the available bandwidth.
  • 19. Evacuation event (congestion) • When a server repeatedly experiences problems, the automated management system in the cluster evacuates all the usable blocks on that server before alerting a human that the server is ready to be re-imaged.
  • 20. Read failure • When a job makes no progress (for example, because it is unable to find its input data or unable to connect to a machine), it is killed.
  • 21. Contd. • To attribute network traffic to the applications that generate it, the network event logs were merged with application-level logs that describe which job and phase (e.g., map, reduce) were active at each time. The results showed that jobs in the reduce phase are responsible for a fair amount of the network traffic. • Note that in the reduce phase of a map-reduce job, the data in each partition that is present at multiple servers in the cluster (e.g., all personnel records that start with ‘A’) has to be pulled to the server that handles the reduce for that partition.
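The log merge described above is essentially a time-interval join. The sketch below assumes, for simplicity, a single server on which at most one job phase is active at a time, with phases given as hypothetical `(start, end, job, phase)` records and network events as `(timestamp, bytes)` pairs; traffic that falls outside every phase is counted as unattributed.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Phase = Tuple[float, float, str, str]   # (start, end, job, phase); end is exclusive
Traffic = Tuple[float, int]             # (timestamp, bytes)


def attribute_traffic(phases: List[Phase],
                      traffic: List[Traffic]) -> Dict[Tuple[str, Optional[str]], int]:
    """Join each network event to the job phase active at its timestamp.

    Assumes `phases` are sorted by start time and do not overlap.
    """
    starts = [p[0] for p in phases]
    totals: Dict[Tuple[str, Optional[str]], int] = defaultdict(int)
    for ts, nbytes in traffic:
        i = bisect.bisect_right(starts, ts) - 1   # last phase starting at or before ts
        if i >= 0 and ts < phases[i][1]:
            _, _, job, phase = phases[i]
            totals[(job, phase)] += nbytes
        else:
            totals[("unattributed", None)] += nbytes
    return dict(totals)
```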
  • 22. Monitoring Data Center Workload • For coordinated monitoring and control of data centers, the most common approaches are based on Monitor, Analyze, Plan and Execute (MAPE) control loops. (Figure: Overview)
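One iteration of a MAPE loop can be sketched as four small functions. The example assumes hypothetical rack temperature sensors, a fixed temperature limit, and an `increase_cooling` action handed to an actuator callback; a real controller would run this loop repeatedly and use far richer analysis and planning.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]


def monitor(sensors: Dict[str, float]) -> Dict[str, float]:
    """Monitor: take a snapshot of the current sensor readings."""
    return dict(sensors)


def analyze(readings: Dict[str, float], limit: float) -> List[str]:
    """Analyze: find the racks whose reading exceeds the limit."""
    return sorted(rack for rack, value in readings.items() if value > limit)


def plan(hot_racks: List[str]) -> List[Action]:
    """Plan: decide on one corrective action per hot rack."""
    return [("increase_cooling", rack) for rack in hot_racks]


def execute(actions: List[Action], actuator: Callable[[Action], None]) -> None:
    """Execute: hand each planned action to an actuator."""
    for action in actions:
        actuator(action)


def mape_iteration(sensors: Dict[str, float], limit: float,
                   actuator: Callable[[Action], None]) -> List[Action]:
    actions = plan(analyze(monitor(sensors), limit))
    execute(actions, actuator)
    return actions
```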
  • 23. Modern Data Center Operation • Workload in the form of web requests, data analysis, etc., is placed in the data center. • An instrumentation infrastructure logs sensor readings. • The results are fed into a policy engine that creates a plan to utilize resources. • External interfaces or actuators implement the plan.
  • 24. Workload Monitoring using Splice • Splice aggregates sensor and performance data in a relational database. • It gathers data from many sources through different interfaces with different formats. • Splice uses a change-of-value filter that retains only those values that differ significantly from the previously logged values. • This reduces the amount of logged data with minimal loss of information.
  • 25. Database Schema of Splice (figure)
  • 26. Implementation • Splice uses a change-of-value filter that retains only those values that differ significantly from the previously logged values. • This reduces the amount of logged data with minimal loss of information.
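A minimal change-of-value filter, assuming each reading is a `(timestamp, value)` pair and that "differ significantly" means an absolute difference greater than a fixed, hypothetical threshold. This illustrates the technique and is not Splice's actual code.

```python
from typing import List, Optional, Tuple

Reading = Tuple[float, float]  # (timestamp, value)


def change_of_value_filter(readings: List[Reading], threshold: float) -> List[Reading]:
    """Keep a reading only if it differs from the last *logged* value by more than threshold."""
    logged: List[Reading] = []
    last: Optional[float] = None
    for ts, value in readings:
        if last is None or abs(value - last) > threshold:
            logged.append((ts, value))
            last = value  # compare against what was stored, not the previous raw sample
    return logged


samples = [(0, 20.0), (1, 20.2), (2, 20.4), (3, 21.5), (4, 21.6), (5, 19.0)]
print(change_of_value_filter(samples, 1.0))  # [(0, 20.0), (3, 21.5), (5, 19.0)]
```

Comparing against the last logged value, rather than the previous raw sample, means a slow drift is still recorded once its accumulated change exceeds the threshold.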
  • 27. Analysis • Data analysis methods fall into two main classes: attribute behavior and correlation. • Attribute behavior describes the values of the observed readings and how those values change over time. • Data correlation methods determine the strength of the correlations among attributes that affect each other.
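The two classes of analysis can be sketched as follows. `attribute_behavior` summarizes one attribute's readings and how much they change between samples, and `pearson` computes a Pearson correlation coefficient between two attributes (for example, a hypothetical pair such as rack inlet temperature and CPU utilization). The slide does not name a specific correlation measure, so Pearson is an assumption.

```python
import math
from typing import Dict, Sequence


def attribute_behavior(values: Sequence[float]) -> Dict[str, float]:
    """Summarize an attribute's readings and how much they change between samples."""
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "mean_abs_change": sum(steps) / len(steps) if steps else 0.0,
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: +1 perfectly correlated, -1 perfectly anti-correlated."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```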
  • 28. Virtualization in Data Centers • Virtualization is a combination of software and hardware features that creates virtual CPUs (vCPUs) or virtual systems-on-chip (vSoCs). • Virtualization provides the required level of isolation and partitioning of resources. • Each VM is protected from interference by other VMs. Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah, Nikolay Guenov
  • 29. Why Virtualization • It reduces power consumption and building space, provides high availability for critical applications, and streamlines application deployment and migration. • It supports multiple operating systems and the consolidation of services on a single server by defining multiple VMs. • Because multiple VMs can run on a single server, server inventory is reduced and server utilization improves. Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah, Nikolay Guenov
  • 30. Benefits of Virtualization (figure) Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah, Nikolay Guenov
  • 31. Multi Core Processing • A multi-core processor is a single computing component with two or more independent actual processing units (called "cores"), which are the units that read and execute program instructions. Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah, Nikolay Guenov
  • 32. Virtualization and Multicore Processing • With multicore SoCs, given enough processing capacity and virtualization, control plane applications and data plane applications can be run without one affecting the other. • Data or control traffic that is relevant to the customized application and operating system (OS) can be directed to the appropriate virtualized core without impacting or compromising the rest of the system. Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah, Nikolay Guenov
  • 33. Control and Data Plane Application Consolidation in a Virtualized Multicore SoC
  • 34. Functions that were previously implemented on different boards can now be consolidated onto a single card and a single multicore SoC. Reference: Multicore Processing: Virtualization and Data Center By: Syed Shah, Nikolay Guenov
  • 35. Data Center Reliability: Network Reliability • Characterizing the most failure-prone network elements • Estimating the impact of failures • Analyzing the effectiveness of network redundancy Reference: Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications By: Phillipa Gill, Navendu Jain, Microsoft Research
  • 36. Key Observations • Data center networks are reliable • Low-cost, commodity switches are highly reliable • Load balancers experience a high number of software faults • Failures potentially cause the loss of a large number of small packets • Network redundancy helps, but it is not entirely effective Reference: Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications By: Phillipa Gill, Navendu Jain, Microsoft Research
  • 37. Reasons to move beyond traditional workload studies • Significant changes in computing power, network bandwidth, and network file system usage • Network file system workloads: no CIFS protocol studies, and only limited file system workload studies Reference: Measurement and Analysis of Large-Scale Network File System Workloads by Andrew W. Leung, Shankar Pasupathy, Garth Goodson, Ethan L. Miller
  • 38. File Access Patterns analysis (figure: read-only, write-only, and read-write access patterns) Reference: Measurement and Analysis of Large-Scale Network File System Workloads by Andrew W. Leung, Shankar Pasupathy, Garth Goodson, Ethan L. Miller
  • 39. Sequentiality Analysis (figure: sequential access, entire vs. partial) Reference: Measurement and Analysis of Large-Scale Network File System Workloads by Andrew W. Leung, Shankar Pasupathy, Garth Goodson, Ethan L. Miller
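The access-pattern and sequentiality categories can be illustrated with a small classifier for one open-to-close session. This sketch assumes each operation is a hypothetical `(kind, offset, length)` tuple. It labels a session "entire" if it reads or writes the file contiguously from byte 0 to the end, "partial" if it is contiguous but does not cover the whole file, and uses a third label, "random", for anything else, which is an assumption beyond the two categories shown on the slide.

```python
from typing import List, Tuple

Op = Tuple[str, int, int]  # (kind, offset, length) with kind "read" or "write"


def classify_session(ops: List[Op], file_size: int) -> Tuple[str, str]:
    """Classify one open-to-close session by access pattern and sequentiality."""
    if not ops:
        raise ValueError("session has no operations")
    kinds = {kind for kind, _, _ in ops}
    if kinds == {"read"}:
        pattern = "read-only"
    elif kinds == {"write"}:
        pattern = "write-only"
    else:
        pattern = "read-write"

    # Sequential: every operation starts exactly where the previous one ended.
    position = ops[0][1]
    for _, offset, length in ops:
        if offset != position:
            return pattern, "random"
        position = offset + length

    # Entire: the sequential run covers the file from byte 0 to the end.
    if ops[0][1] == 0 and position >= file_size:
        return pattern, "entire"
    return pattern, "partial"
```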
  • 40. File Lifetime • In CIFS, files can be deleted either through an explicit delete request, which frees the entire file and its name, or through truncation, which frees only the data. • CIFS users begin a connection to the file server by creating an authenticated user session and end it by eventually logging off. Reference: Measurement and Analysis of Large-Scale Network File System Workloads by Andrew W. Leung, Shankar Pasupathy, Garth Goodson, Ethan L. M.
  • 41. Architecture: Load Balancer • The IP address to which requests are sent is called a virtual IP address (VIP); the IP addresses of the servers over which the requests are spread are known as direct IP addresses (DIPs). • Inside the data center, requests are spread among a pool of front-end servers that process them. This spreading is typically performed by a specialized load balancer. Reference: Towards a Next Generation Data Center Architecture: Scalability and Commoditization By Albert Greenberg, David A. Maltz Microsoft Research, WA, USA
  • 42. Challenges and Requirements • Challenges: fragmentation of resources; poor server-to-server connectivity; proprietary hardware that scales up, not out • Requirements: placement anywhere; server-to-server bandwidth; commodity hardware that scales out; support for 100,000 servers Reference: Towards a Next Generation Data Center Architecture: Scalability and Commoditization By Albert Greenberg, David A. Maltz Microsoft Research, WA, USA
  • 43. Load Balancing • Load spreading: requests are spread evenly over a pool of servers • Load balancing: load balancers are placed in front of the actual servers Reference: Towards a Next Generation Data Center Architecture: Scalability and Commoditization By Albert Greenberg, David A. Maltz Microsoft Research, WA, USA
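A minimal sketch of spreading requests for one VIP over its DIPs. It assumes a hypothetical VIP and DIP list and shows two common policies: round-robin, which spreads requests evenly, and hashing the client address and port with `zlib.crc32` so that every packet of one flow reaches the same DIP. Neither is claimed to be the policy used in the referenced architecture.

```python
import itertools
import zlib
from typing import List


class LoadSpreader:
    """Spread requests sent to one virtual IP (VIP) over a pool of direct IPs (DIPs)."""

    def __init__(self, vip: str, dips: List[str]) -> None:
        if not dips:
            raise ValueError("a VIP needs at least one DIP")
        self.vip = vip
        self.dips = list(dips)
        self._cycle = itertools.cycle(self.dips)

    def next_round_robin(self) -> str:
        """Even spreading: each new request goes to the next DIP in turn."""
        return next(self._cycle)

    def for_flow(self, client_ip: str, client_port: int) -> str:
        """Hash-based spreading: packets of the same client flow always reach the same DIP."""
        key = f"{client_ip}:{client_port}".encode()
        return self.dips[zlib.crc32(key) % len(self.dips)]


lb = LoadSpreader("10.0.0.100", ["10.0.1.1", "10.0.1.2", "10.0.1.3"])
print([lb.next_round_robin() for _ in range(4)])  # ['10.0.1.1', '10.0.1.2', '10.0.1.3', '10.0.1.1']
```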
  • 45. A few real-world scenarios • Why build a data center in Virginia when there is one in California? • To reduce the time to send a page to users on the East Coast • California was running out of space, while Virginia had lots of room to grow • Restricting Facebook to one data center meant that in the event of a disaster (earthquake, power failure, Godzilla), Facebook could be unusable for an extended amount of time.
  • 46. The hardware and network were set up soon, but how should cache consistency be handled? Master DB Sl
  • 47. Facebook’s Scheduling with Corona • With Facebook’s user base expanding at an enormous rate, Facebook developed a new scheduling framework called Corona. • Initially, a MapReduce implementation in Apache Hadoop served as the foundation of the infrastructure, but over the years this system developed several issues: scheduling overhead; a pull-based scheduling model; a static, slot-based resource management model.
  • 48. Facebook’s Solution • Corona introduces a cluster manager whose only purpose is to track the nodes in the cluster and the amount of free resources. • Corona uses push-based scheduling, which reduces scheduling latency. • This separation of duties allows Corona to manage many more jobs and achieve better cluster utilization. • The cluster manager also implements fair-share scheduling.
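The separation of duties and the fair-share policy can be sketched as a toy cluster manager. This is a simplification under stated assumptions, not Corona's code: resources are counted in hypothetical uniform slots, the manager pushes grants as soon as it sees free capacity, the pool with the fewest slots in use is served next, and each grant goes to the node with the most free slots.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ClusterManager:
    """Tracks free slots per node and pushes grants to pools using fair share."""

    def __init__(self, free_slots: Dict[str, int]) -> None:
        self.free = dict(free_slots)                    # node -> free slots
        self.in_use: Dict[str, int] = defaultdict(int)  # pool -> slots granted so far

    def push_grants(self, demands: Dict[str, int]) -> List[Tuple[str, str]]:
        """Grant slots one at a time, always to the pool currently using the least."""
        pending = {pool: n for pool, n in demands.items() if n > 0}
        grants: List[Tuple[str, str]] = []
        while pending and any(n > 0 for n in self.free.values()):
            # Fair share: the pool with the smallest allocation goes next (name breaks ties).
            pool = min(pending, key=lambda p: (self.in_use[p], p))
            # Place the grant on the node with the most free slots.
            node = max(self.free, key=lambda n: self.free[n])
            self.free[node] -= 1
            self.in_use[pool] += 1
            grants.append((pool, node))
            pending[pool] -= 1
            if pending[pool] == 0:
                del pending[pool]
        return grants
```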
  • 49. Future of Corona • New features such as: resource-based scheduling rather than the slot-based model; online upgrades to the cluster manager; expansion of the user base by scheduling applications such as Peregrine
  • 50. Characterizing backend workload (at Google) Ref: Towards Characterizing Cloud Backend Workloads: Insights from Google Compute Clusters (Asit K. Mishra, Joseph L. Hellerstein, Walfredo Cirne, Chita R. Das)
  • 51. Pre-requisites • Capacity planning, to determine which machine resources must grow and by how much, and • Task scheduling, to achieve high machine utilization and to meet service level objectives • Both of these require a good understanding of task resource consumption, i.e., CPU and memory usage.
  • 52. The approaches 1. Make each task its own workload: this scales poorly, since tens of thousands of tasks execute daily on Google compute clusters. 2. View all tasks as belonging to a single workload: this results in large variances in predicted resource consumption.
  • 53. The proposed methodology • identifying the workload dimensions • constructing task classes using an off-the-shelf algorithm such as k-means • determining the break points for qualitative coordinates within the workload dimensions • merging adjacent task classes to reduce the number of workloads
  • 54. Based on the following observations • the duration of task executions is bimodal: tasks have either a short duration or a long duration • most tasks have short durations • most resources are consumed by a few long-duration tasks that have large demands for CPU and memory
  • 55. Objective • Construct a small number of task classes such that tasks within each class have similar resource usage. • We use qualitative coordinates to distinguish workloads: small (s), medium (m), large (l).
  • 57. First step • Identify the workload dimensions. • For example, in the analysis of the Google cloud backend, the workload dimensions are task duration, average core usage, and average memory usage.
  • 58. Second step • Construct preliminary task classes that have fairly homogeneous resource usage. This is done by using the workload dimensions as a feature vector and applying an off-the-shelf clustering algorithm such as k-means.
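The second step can be sketched with a plain k-means implementation. The example assumes each task is a hypothetical `(duration, cores, memory)` feature vector already scaled to comparable units, and it uses the first k points as initial centroids for determinism; a library implementation with better initialization would normally be preferred.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]  # (duration, avg cores, avg memory), assumed already scaled


def kmeans(points: List[Point], k: int,
           max_iters: int = 100) -> Tuple[Optional[List[int]], List[List[float]]]:
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Naive deterministic initialization: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iters):
        # Assignment step: each task joins the class with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no task changed class
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```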
  • 59. Third step • Determine the break points for the qualitative coordinates of the workload dimensions. There are two considerations. First, break points must be consistent across workloads; for example, the qualitative coordinate small for duration must have the same break point (e.g., 2 hours) for all workloads. Second, the result should produce low within-class variability.
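Mapping raw values to qualitative coordinates amounts to comparing each value against shared break points. In this sketch the 2-hour duration break point follows the slide's example, while the core and memory break points and units are hypothetical; the same table is applied to every task, which satisfies the first consideration. Checking within-class variability (the second consideration) is not shown.

```python
import bisect
from typing import Dict, Mapping, Sequence, Tuple

# Hypothetical break points, shared by every workload so "small" means the same thing everywhere.
# Duration is in hours (2 h is the slide's example); core and memory thresholds are made up.
BREAK_POINTS: Dict[str, Tuple[Sequence[float], Sequence[str]]] = {
    "duration": ([2.0], ("s", "l")),
    "cores": ([0.5, 2.0], ("s", "m", "l")),
    "memory": ([0.25, 1.0], ("s", "m", "l")),
}
DIMENSIONS = ("duration", "cores", "memory")


def qualitative(value: float, breaks: Sequence[float], labels: Sequence[str]) -> str:
    """Map a value to s/m/l; a value equal to a break point falls in the upper class."""
    return labels[bisect.bisect_right(breaks, value)]


def task_class(task: Mapping[str, float]) -> str:
    """Concatenate the qualitative coordinates, e.g. 'sml' = short, medium cores, large memory."""
    return "".join(qualitative(task[d], *BREAK_POINTS[d]) for d in DIMENSIONS)


print(task_class({"duration": 0.5, "cores": 1.0, "memory": 2.0}))  # sml
```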
  • 60. Fourth step • Merge classes to form the final set of task classes; these classes define our workloads. This involves combining “adjacent” preliminary task classes, where adjacency is based on the qualitative coordinates of the class. For example, in the Google data, duration has the qualitative coordinates small and large; for cores and memory, the qualitative coordinates are small, medium, and large. Thus, the workload smm is adjacent to sms and sml in the third dimension. Two preliminary classes are merged if the CV (coefficient of variation) of the merged class does not differ much from the CVs of each of the preliminary classes. Merged classes are denoted by the wildcard “*”; for example, merging the classes sms, smm, and sml yields the class sm*.
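The merge test can be sketched as three helpers: an adjacency check on class names, a CV-based merge test, and the wildcard naming rule. For brevity, each class is represented by a list of samples of a single usage metric, and the 0.1 tolerance is a hypothetical stand-in for "does not differ much", which the slide does not quantify.

```python
import statistics
from typing import List, Optional, Sequence

# Order of qualitative coordinates per dimension: duration has s/l, cores and memory s/m/l.
ORDERS = ("sl", "sml", "sml")


def is_adjacent(a: str, b: str) -> bool:
    """True if two class names differ in exactly one dimension, by one step."""
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 1:
        return False
    i = diffs[0]
    return abs(ORDERS[i].index(a[i]) - ORDERS[i].index(b[i])) == 1


def cv(values: Sequence[float]) -> float:
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def try_merge(a: List[float], b: List[float], tolerance: float = 0.1) -> Optional[List[float]]:
    """Merge two classes' usage samples if the merged CV stays close to both originals."""
    merged = a + b
    cv_merged = cv(merged)
    if abs(cv_merged - cv(a)) <= tolerance and abs(cv_merged - cv(b)) <= tolerance:
        return merged
    return None


def merged_name(names: Sequence[str]) -> str:
    """Replace every coordinate that differs among the merged classes with '*'."""
    return "".join(chars[0] if len(set(chars)) == 1 else "*" for chars in zip(*names))


print(merged_name(["sms", "smm", "sml"]))  # sm*
```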