Evaluating and reducing cloud waste and
cost—A data-driven case study from Azure
workloads
Brad Everman, Maxim Gao, Ziliang Zong
Sustainable Computing: Informatics and Systems 35 (2022)
Mehedi Hasan Raju
Outline
 Introduction
 Related Work
 Workload Analysis
 Cloud waste and cost analysis
 Metrics: CWP, CWI, CUS
 Limitations
 Conclusions
Introduction
 Cloud waste is common when users provision more resources than they need.
 Studying user behavior in the cloud can point to viable ways to reduce cloud cost and waste.
 This paper addresses these concerns through a comprehensive analysis of the Microsoft Azure 2019 traces.
 A large portion of VMs are under-utilized or over-provisioned.
 Cloud Waste Points (CWP) are introduced to quantify the waste of each VM.
 VMs are categorized based on their cloud resource utilization.
Introduction (contd.)
 The Cloud Waste Indicator (CWI) classifies Azure users as red, green, or normal users, depending on how efficiently they utilize cloud resources.
 In addition, we introduce the Cloud Utilization Score (CUS) to rank the relative performance of Azure users in terms of cloud waste.
 Lastly, we propose an algorithm to identify red VMs and recommend lower-priced VMs that can help users reduce cost without compromising quality of service (QoS).
Related work
 Studies related to cost reduction of large-scale cloud systems
 A survey of techniques that can reduce the energy usage of MapReduce in Hadoop systems [1]
 An analysis tool to measure energy consumption in cloud environments based on different runtime tasks [2]
 A genetic-based optimization algorithm to reduce the energy cost of cloud systems with phase-change memory [3]
 An intra- and inter-server smart task scheduling algorithm, which can jointly optimize profit and energy when allocating jobs to datacenters [4]
Related work (contd.)
 Three papers considered the Microsoft Azure cloud platform directly.
 Hadary et al. describe the design and implementation of Protean, which allocates VMs on the Azure platform [5]
 Shahrad et al. focused specifically on Function as a Service (FaaS) in the Azure system [6]
 The most relevant work was published by Cortez et al. [7]
 It used the 2017 Azure data traces.
 It discussed certain behaviors that can be used to predict the future behavior of VM workloads.
 However, its predictive model aimed to increase the utilization of Azure from the system side.
Workload Analysis
 Microsoft Azure is a public cloud computing platform.
 We analyze the 2019 trace
 subset of applications running on Azure during July of 2019
 235 GB of data contained within 198 files
 30 consecutive days of VM readings
 ~2.7 M total VMs
 6,687 individual users
Descriptive analysis
 How users utilize Azure cloud
Cloud waste and cost analysis
 Azure Pricing Model
 A VM is priced based on its requested core count and memory size.
 Other pricing factors include the choice of operating system, cloud service region, and server type.
 The information needed to apply this more complex pricing is missing from the Azure traces.
Cloud waste and cost analysis
 Assumptions
 Deployed VMs run on Linux (CentOS or Ubuntu).
 VMs are deployed to the US-West (California) region.
 VMs are general purpose, not CPU- or memory-optimized.
 Minimal storage is available for each VM.
 Users "pay as they go" and receive no discounts for pre-paying or volume purchasing.
 An Interactive VM costs 3.33x the price of a Delay-Insensitive VM
 based on information from Google Cloud [8]
 VMs in the Unknown category are treated as Delay-Insensitive when calculating price.
Cloud waste and cost analysis
 VM Cost Calculation
VM Cost = VM Lifetime * VM Price
• VM cost is not provided in the Azure traces.
• To calculate the cost of each VM, we need to know the price of each VM and its corresponding lifetime.
• VM Price
• VM Lifetime: the length of time, in hours, that a VM exists, calculated as the difference between its creation and deletion timestamps.
• Core Hours = VM Lifetime * number of cores of that VM.
• Core hours indicate the computation resources utilized by a VM.
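The three formulas on this slide can be sketched in a few lines of Python. This is only an illustration: the timestamp format, the example VM, and the hourly price of $0.25 are assumptions made for the example and do not come from the Azure traces or the paper's pricing table.

```python
from datetime import datetime


def vm_lifetime_hours(created: datetime, deleted: datetime) -> float:
    """Lifetime in hours: deletion timestamp minus creation timestamp."""
    return (deleted - created).total_seconds() / 3600.0


def vm_cost(lifetime_hours: float, price_per_hour: float) -> float:
    """VM Cost = VM Lifetime * VM Price."""
    return lifetime_hours * price_per_hour


def core_hours(lifetime_hours: float, cores: int) -> float:
    """Core Hours = VM Lifetime * number of cores."""
    return lifetime_hours * cores


# Hypothetical VM: 4 cores, alive for exactly two days, assumed price $0.25/hour.
created = datetime(2019, 7, 1, 0, 0)
deleted = datetime(2019, 7, 3, 0, 0)
hours = vm_lifetime_hours(created, deleted)
cost = vm_cost(hours, 0.25)
usage = core_hours(hours, 4)
print(hours, cost, usage)
```

With these assumed inputs the VM lives for 48 hours, so its cost and core hours follow directly from the two multiplications.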
Cloud waste and cost analysis
 Green and red VMs
 10% is a very conservative threshold
 higher threshold will yield more cost savings
Cloud waste and cost analysis
Why is the utilization so low?
 Not enough work
 Lack of parallel computing
• Sequential applications cannot leverage multiple virtual cores.
• Requesting more cores for such applications decreases overall CPU utilization.
 Improvement of hardware
Cloud waste and cost analysis
 VM cloud waste points (CWP)
• CWP = VM lifetime * corresponding waste factor
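As a small worked example of the CWP formula, the sketch below multiplies each VM's lifetime by a waste factor. The slide does not say how waste factors are assigned, so the values 0.5 and 0.0 and the VM names are purely hypothetical placeholders.

```python
def cloud_waste_points(lifetime_hours: float, waste_factor: float) -> float:
    """CWP = VM lifetime * corresponding waste factor."""
    return lifetime_hours * waste_factor


# Hypothetical VMs; the waste factors are illustrative, not taken from the paper.
vms = [
    {"vm_id": "vm-a", "lifetime_hours": 100.0, "waste_factor": 0.5},
    {"vm_id": "vm-b", "lifetime_hours": 24.0, "waste_factor": 0.0},
]

cwp_by_vm = {
    vm["vm_id"]: cloud_waste_points(vm["lifetime_hours"], vm["waste_factor"])
    for vm in vms
}
print(cwp_by_vm)
```

A long-lived VM with a non-zero waste factor accumulates more points than a short-lived or well-utilized one, which is what makes CWP usable for ranking.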
Cloud waste and cost analysis
 VM cloud waste points (CWP) distribution
Cloud waste and cost analysis
 Cloud Waste Indicator (CWI)
• CWI is the average CWP of all VMs deployed by an individual user.
• Users are categorized into three groups: green, normal, and red users.
 To determine the delineation
• Remove users with fewer than 200 total core hours.
• This drops 6,121 users (i.e. ∼92% of all users).
• Calculate the CWI of each remaining user, then normalize it.
• CWInorm = (CWI_i − min(CWI)) / (max(CWI) − min(CWI))
• CWInorm is a value between 0 and 1.
• CWInorm thresholds
• green users: 0.01
• red users: 0.05
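The filtering, averaging, min-max normalization, and thresholding steps can be sketched as follows. The slide gives only the two threshold values, so the direction of the comparisons (at or below 0.01 is green, at or above 0.05 is red, anything between is normal) is an assumption, as are the input layout and the example users.

```python
from statistics import mean

MIN_CORE_HOURS = 200.0   # users below this are removed (from the slide)
GREEN_THRESHOLD = 0.01   # from the slide; "<=" is an assumption
RED_THRESHOLD = 0.05     # from the slide; ">=" is an assumption


def classify_users(users):
    """users maps user_id -> {"core_hours": float, "cwps": [float, ...]}.

    Returns user_id -> (cwi_norm, label) for users that pass the filter.
    """
    kept = {u: d for u, d in users.items() if d["core_hours"] >= MIN_CORE_HOURS}
    cwi = {u: mean(d["cwps"]) for u, d in kept.items()}  # CWI = average CWP
    if not cwi:
        return {}
    lo, hi = min(cwi.values()), max(cwi.values())
    span = hi - lo
    result = {}
    for user, value in cwi.items():
        # Min-max normalization; if all users share one CWI, map them to 0.
        norm = (value - lo) / span if span > 0 else 0.0
        if norm <= GREEN_THRESHOLD:
            label = "green"
        elif norm >= RED_THRESHOLD:
            label = "red"
        else:
            label = "normal"
        result[user] = (norm, label)
    return result


# Hypothetical users for illustration only.
example = {
    "u1": {"core_hours": 500.0, "cwps": [0.0, 0.0]},
    "u2": {"core_hours": 1000.0, "cwps": [100.0, 300.0]},
    "u3": {"core_hours": 300.0, "cwps": [4.0, 8.0]},
    "u4": {"core_hours": 50.0, "cwps": [900.0]},  # removed by the filter
}
labels = classify_users(example)
print(labels)
```

Because min-max normalization is relative to the most wasteful user in the population, a single extreme user compresses everyone else toward 0, which may be why the thresholds on the slide are so small.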
Cloud waste and cost analysis
 Cloud utilization score (CUS)
 A user's CUS is the percentile rank of their CWInorm in comparison with other users in the cloud.
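A percentile rank can be computed in several ways; the sketch below uses one common definition (the percentage of users whose value is less than or equal to the given one). The slide does not specify tie handling or whether the score is inverted so that a higher CUS means less waste, so both choices here are assumptions, and the CWInorm values are made up.

```python
def percentile_rank(values, x):
    """Percentage of entries in `values` that are less than or equal to x."""
    return 100.0 * sum(1 for v in values if v <= x) / len(values)


# Hypothetical normalized CWI values for four users.
cwi_norm = {"u1": 0.0, "u2": 1.0, "u3": 0.03, "u4": 0.5}
all_values = list(cwi_norm.values())

scores = {user: percentile_rank(all_values, v) for user, v in cwi_norm.items()}
print(scores)
```

Under this definition the most wasteful user lands at 100 and the least wasteful at the bottom; an implementation that wants "higher is better" would report 100 minus this value.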
Cloud waste and cost analysis
 Recommendation algorithm
• First, it creates a list of red VMs for each user.
• Second, it calculates the CWP of all VMs in the list and sorts the list in descending order of CWP.
• Third, it sends a recommendation to that user to migrate the top n VMs in the list down by one level.
• The algorithm repeats for all users.
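The steps above can be sketched as a short function. Several details are assumptions for the sake of a runnable example: "one level down" is modeled as the next smaller entry in a hypothetical core-count ladder, each red VM is assumed to arrive with its CWP already computed, and VMs already at the smallest size are skipped.

```python
# Hypothetical size ladder (core counts); the real VM size tiers are not in the slides.
SIZE_LADDER = [1, 2, 4, 8, 16, 32]


def one_level_down(cores):
    """Return the next smaller size, or None if there is none."""
    if cores not in SIZE_LADDER:
        return None
    idx = SIZE_LADDER.index(cores)
    return SIZE_LADDER[idx - 1] if idx > 0 else None


def recommend(red_vms_by_user, n):
    """For each user, suggest downsizing the n red VMs with the highest CWP."""
    recommendations = {}
    for user, red_vms in red_vms_by_user.items():
        # Sort the user's red VMs by CWP, most wasteful first.
        ranked = sorted(red_vms, key=lambda vm: vm["cwp"], reverse=True)
        recs = []
        for vm in ranked[:n]:
            target = one_level_down(vm["cores"])
            if target is not None:
                recs.append((vm["vm_id"], vm["cores"], target))
        recommendations[user] = recs
    return recommendations


# Hypothetical input: one user with three red VMs.
red_vms = {
    "u1": [
        {"vm_id": "a", "cwp": 10.0, "cores": 4},
        {"vm_id": "b", "cwp": 50.0, "cores": 8},
        {"vm_id": "c", "cwp": 30.0, "cores": 1},
    ]
}
recs = recommend(red_vms, n=2)
print(recs)
```

In this example the top two VMs by CWP are "b" and "c"; "b" is moved from 8 to 4 cores, while "c" is already at the smallest size and gets no recommendation.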
Cloud waste and cost analysis
 Cost savings (from all red VMs)
• Original Cost: $61,595,170.23
• Total VMs: 2,695,548
• VMs with Savings: 1,369,364
• Percent of VMs with Savings: 51%
• New Cost: $39,341,202.17
• Total Savings: $22,253,968.06
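The figures above can be cross-checked with a few lines of arithmetic. The sketch uses only the numbers reported on this slide; the overall savings percentage it derives is not stated in the slides and is computed here for context.

```python
from decimal import Decimal

# Values as reported on the slide.
original_cost = Decimal("61595170.23")
new_cost = Decimal("39341202.17")
total_vms = 2_695_548
vms_with_savings = 1_369_364

total_savings = original_cost - new_cost
savings_share = total_savings / original_cost * 100          # percent of original cost
vm_share = Decimal(vms_with_savings) / Decimal(total_vms) * 100  # percent of VMs

print(total_savings, round(savings_share, 1), round(vm_share, 1))
```

The difference reproduces the reported total savings exactly, the VM share rounds to the reported 51%, and the savings amount to roughly 36% of the original cost.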
Limitations
 Lack of information regarding the nature of jobs and applications
running on each VM
• could affect the quality of our recommendations.
 Lack of information about memory usage of each VM (in traces).
 Over-provisioning of memory
 Multiple assumptions
• assumptions made in pricing model
• assumptions on user behavior
Conclusions
 Studying user behavior in the cloud provides viable solutions to reduce cloud cost and waste
 Comprehensive analysis of the Microsoft Azure 2019 traces
• A large portion of VMs are underutilized or over-provisioned
 The proposed metrics (CWP, CWI, CUS) help mitigate the cloud waste problem and save cost
 Experimental results show that over $22 million in savings can be achieved.
References
1. M. Alalawi, H. Daly, A survey on Hadoop MapReduce energy efficient techniques for intensive workload, in: Proceedings of
the International Conference on Big Data and Internet of Thing, in: BDIOT2017, Association for Computing Machinery, New
York, NY, USA, 2017, pp. 62–66
2. F. Chen, J.-G. Schneider, Y. Yang, J. Grundy, Q. He, An energy consumption model and analysis tool for cloud computing
environments, in: 2012 First International Workshop on Green and Sustainable Software (GREENS), 2012, pp. 45–50
3. M. Qiu, Z. Ming, J. Li, K. Gai, Z. Zong, Phase-change memory optimization for green cloud with genetic algorithm, IEEE Trans.
Comput. 64 (12) (2015) 3528–3540
4. S. Mamun, A. Gilday, A. Singh, A. Ganguly, G. Merrett, X. Wang, B. Al-Hashimi, Intra- and inter-server smart task scheduling
for profit and energy optimization of HPC data centers, J. Low Power Electron. Appl. 10 (2020) 32
5. O. Hadary, L. Marshall, I. Menache, A. Pan, E.E. Greeff, D. Dion, S. Dorminey, S. Joshi, Y. Chen, M. Russinovich, T. Moscibroda,
Protean: VM allocation service at scale, in: 14th USENIX Symposium on Operating Systems Design and Implementation
(OSDI 20), USENIX Association, 2020, pp. 845–861
6. M. Shahrad, R. Fonseca, I. Goiri, G. Chaudhry, P. Batum, J. Cooke, E. Laureano, C. Tresness, M. Russinovich, R. Bianchini,
Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider, in: 2020 USENIX
Annual Technical Conference (USENIX ATC 20), USENIX Association, 2020, pp. 205–218
7. E. Cortez, A. Bonde, A. Muzio, M. Russinovich, M. Fontoura, R. Bianchini, Resource central: Understanding and predicting
workloads for improved resource management in large cloud platforms, in: Proceedings of the 26th Symposium on
Operating Systems Principles, in: SOSP ’17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 153–167
8. Google, Google cloud pricing, 2021, https://cloud.google.com/pricing
Questions?
25

More Related Content

Similar to Evaluating and reducing cloud waste and cost—A data-driven case study from Azure workloads

Energy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computingEnergy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computingDivaynshu Totla
 
IRJET- Time and Resource Efficient Task Scheduling in Cloud Computing Environ...
IRJET- Time and Resource Efficient Task Scheduling in Cloud Computing Environ...IRJET- Time and Resource Efficient Task Scheduling in Cloud Computing Environ...
IRJET- Time and Resource Efficient Task Scheduling in Cloud Computing Environ...IRJET Journal
 
Task Performance Analysis in Virtual Cloud Environment
Task Performance Analysis in Virtual Cloud EnvironmentTask Performance Analysis in Virtual Cloud Environment
Task Performance Analysis in Virtual Cloud EnvironmentRSIS International
 
IRJET- Cloud Cost Analyzer and Optimizer
IRJET- Cloud Cost Analyzer and OptimizerIRJET- Cloud Cost Analyzer and Optimizer
IRJET- Cloud Cost Analyzer and OptimizerIRJET Journal
 
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...IRJET Journal
 
ANALYSIS ON LOAD BALANCING ALGORITHMS IMPLEMENTATION ON CLOUD COMPUTING ENVIR...
ANALYSIS ON LOAD BALANCING ALGORITHMS IMPLEMENTATION ON CLOUD COMPUTING ENVIR...ANALYSIS ON LOAD BALANCING ALGORITHMS IMPLEMENTATION ON CLOUD COMPUTING ENVIR...
ANALYSIS ON LOAD BALANCING ALGORITHMS IMPLEMENTATION ON CLOUD COMPUTING ENVIR...AM Publications
 
Survey: An Optimized Energy Consumption of Resources in Cloud Data Centers
Survey: An Optimized Energy Consumption of Resources in Cloud Data CentersSurvey: An Optimized Energy Consumption of Resources in Cloud Data Centers
Survey: An Optimized Energy Consumption of Resources in Cloud Data CentersIJCSIS Research Publications
 
G-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUD
G-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUDG-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUD
G-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUDAlfiya Mahmood
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
排隊理論_An Exploration of The Optimization of Executive Scheduling in The Cloud ...
排隊理論_An Exploration of The Optimization of Executive Scheduling in The Cloud ...排隊理論_An Exploration of The Optimization of Executive Scheduling in The Cloud ...
排隊理論_An Exploration of The Optimization of Executive Scheduling in The Cloud ...婉萍 蔡
 
Power consumption prediction in cloud data center using machine learning
Power consumption prediction in cloud data center using machine learningPower consumption prediction in cloud data center using machine learning
Power consumption prediction in cloud data center using machine learningIJECEIAES
 
Green cloud computing
Green cloud computingGreen cloud computing
Green cloud computingNalini Mehta
 
Green cloud computing
Green cloud computing Green cloud computing
Green cloud computing JauwadSyed
 
Performance analysis of an energy efficient virtual machine consolidation alg...
Performance analysis of an energy efficient virtual machine consolidation alg...Performance analysis of an energy efficient virtual machine consolidation alg...
Performance analysis of an energy efficient virtual machine consolidation alg...IAEME Publication
 
IRJET- Efficient Resource Allocation for Heterogeneous Workloads in Iaas Clouds
IRJET- Efficient Resource Allocation for Heterogeneous Workloads in Iaas CloudsIRJET- Efficient Resource Allocation for Heterogeneous Workloads in Iaas Clouds
IRJET- Efficient Resource Allocation for Heterogeneous Workloads in Iaas CloudsIRJET Journal
 
IRJET- An Energy-Saving Task Scheduling Strategy based on Vacation Queuing & ...
IRJET- An Energy-Saving Task Scheduling Strategy based on Vacation Queuing & ...IRJET- An Energy-Saving Task Scheduling Strategy based on Vacation Queuing & ...
IRJET- An Energy-Saving Task Scheduling Strategy based on Vacation Queuing & ...IRJET Journal
 

Similar to Evaluating and reducing cloud waste and cost—A data-driven case study from Azure workloads (20)

Energy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computingEnergy efficient resource allocation in cloud computing
Energy efficient resource allocation in cloud computing
 
IRJET- Time and Resource Efficient Task Scheduling in Cloud Computing Environ...
IRJET- Time and Resource Efficient Task Scheduling in Cloud Computing Environ...IRJET- Time and Resource Efficient Task Scheduling in Cloud Computing Environ...
IRJET- Time and Resource Efficient Task Scheduling in Cloud Computing Environ...
 
Task Performance Analysis in Virtual Cloud Environment
Task Performance Analysis in Virtual Cloud EnvironmentTask Performance Analysis in Virtual Cloud Environment
Task Performance Analysis in Virtual Cloud Environment
 
IRJET- Cloud Cost Analyzer and Optimizer
IRJET- Cloud Cost Analyzer and OptimizerIRJET- Cloud Cost Analyzer and Optimizer
IRJET- Cloud Cost Analyzer and Optimizer
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
 
ANALYSIS ON LOAD BALANCING ALGORITHMS IMPLEMENTATION ON CLOUD COMPUTING ENVIR...
ANALYSIS ON LOAD BALANCING ALGORITHMS IMPLEMENTATION ON CLOUD COMPUTING ENVIR...ANALYSIS ON LOAD BALANCING ALGORITHMS IMPLEMENTATION ON CLOUD COMPUTING ENVIR...
ANALYSIS ON LOAD BALANCING ALGORITHMS IMPLEMENTATION ON CLOUD COMPUTING ENVIR...
 
Survey: An Optimized Energy Consumption of Resources in Cloud Data Centers
Survey: An Optimized Energy Consumption of Resources in Cloud Data CentersSurvey: An Optimized Energy Consumption of Resources in Cloud Data Centers
Survey: An Optimized Energy Consumption of Resources in Cloud Data Centers
 
G-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUD
G-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUDG-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUD
G-SLAM:OPTIMIZING ENERGY EFFIIENCY IN CLOUD
 
Optimize Virtual Machine Placement in Banker Algorithm for Energy Efficient C...
Optimize Virtual Machine Placement in Banker Algorithm for Energy Efficient C...Optimize Virtual Machine Placement in Banker Algorithm for Energy Efficient C...
Optimize Virtual Machine Placement in Banker Algorithm for Energy Efficient C...
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
排隊理論_An Exploration of The Optimization of Executive Scheduling in The Cloud ...
排隊理論_An Exploration of The Optimization of Executive Scheduling in The Cloud ...排隊理論_An Exploration of The Optimization of Executive Scheduling in The Cloud ...
排隊理論_An Exploration of The Optimization of Executive Scheduling in The Cloud ...
 
Power consumption prediction in cloud data center using machine learning
Power consumption prediction in cloud data center using machine learningPower consumption prediction in cloud data center using machine learning
Power consumption prediction in cloud data center using machine learning
 
Green cloud computing
Green cloud computingGreen cloud computing
Green cloud computing
 
Green cloud computing
Green cloud computing Green cloud computing
Green cloud computing
 
Performance analysis of an energy efficient virtual machine consolidation alg...
Performance analysis of an energy efficient virtual machine consolidation alg...Performance analysis of an energy efficient virtual machine consolidation alg...
Performance analysis of an energy efficient virtual machine consolidation alg...
 
IRJET- Efficient Resource Allocation for Heterogeneous Workloads in Iaas Clouds
IRJET- Efficient Resource Allocation for Heterogeneous Workloads in Iaas CloudsIRJET- Efficient Resource Allocation for Heterogeneous Workloads in Iaas Clouds
IRJET- Efficient Resource Allocation for Heterogeneous Workloads in Iaas Clouds
 
IRJET- An Energy-Saving Task Scheduling Strategy based on Vacation Queuing & ...
IRJET- An Energy-Saving Task Scheduling Strategy based on Vacation Queuing & ...IRJET- An Energy-Saving Task Scheduling Strategy based on Vacation Queuing & ...
IRJET- An Energy-Saving Task Scheduling Strategy based on Vacation Queuing & ...
 
D04573033
D04573033D04573033
D04573033
 
Cloud sim report
Cloud sim reportCloud sim report
Cloud sim report
 

More from Mehedi Hasan Raju (8)

FPGAs versus GPUs in Data centers
FPGAs versus GPUs in Data centersFPGAs versus GPUs in Data centers
FPGAs versus GPUs in Data centers
 
2D arrays
2D arrays2D arrays
2D arrays
 
Result Management System
Result Management SystemResult Management System
Result Management System
 
Waveguide
WaveguideWaveguide
Waveguide
 
Representation of signals
Representation of signalsRepresentation of signals
Representation of signals
 
Bit error rate
Bit error rateBit error rate
Bit error rate
 
Vector space
Vector spaceVector space
Vector space
 
Inverse function
Inverse functionInverse function
Inverse function
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 

Evaluating and reducing cloud waste and cost—A data-driven case study from Azure workloads

  • 1. Evaluating and reducing cloud waste and cost—A data-driven case study from Azure workloads Brad Everman, Maxim Gao, Ziliang Zong Sustainable Computing: Informatics and Systems 35 (2022) Mehedi Hasan Raju
  • 2. Outline  Introduction  Related Work  Workload Analysis  Cloud waste and cost analysis  Metrics- CWP, CWI, CUS  Limitations  Conclusions 2
  • 3. Introduction  Cloud waste is common when users provision resources beyond what they need.  The user behaviors in the cloud could provide viable solutions to reduce cloud cost and waste.  This paper addresses these concerns by conducting a comprehensive analysis of the Microsoft Azure 2019 traces.  A large portion of VMs are under-utilized or over-provisioned for resources.  Cloud Waste Points (CWP) for quantitatively evaluating the waste of each VM.  Categorizing VMs based on cloud resource utilization. 3
  • 4. Introduction (contd.)  Cloud Waste Indicator (CWI) to classify Azure users as red, green, and normal users, depending on their efficiency in utilizing cloud resources.  In addition, we introduce Cloud Utilization Score (CUS) to rank the relative performance of Azure users in term of cloud waste.  Lastly, we propose an algorithm to identify red VMs and recommend lower priced VMs that can help users reduce cost without compromising quality of service (QoS). 4
  • 5. Related works  Studies related to cost reduction of large-scale cloud systems  Survey on various techniques that can reduce the energy usage of MapReduce in Hadoop systems [1]  Analysis tool to measure energy consumption in cloud environments based on different runtime tasks [2]  a genetic-based optimization algorithm to reduce the energy cost of cloud systems with phase change memory [3]  an intra and inter-server smart task scheduling algorithm, which can jointly optimize profit and energy when allocating jobs to datacenters [4] 5
  • 6. Related works (contd.)  Three papers considered the Microsoft Azure cloud platform directly.  Design and implementation of Protean, which allocates VMs on the Azure platform [5]  Shahrad et al. focused specifically on Function as a Service (FaaS) in the Azure system [6]  The most relevant work was published by Cortez[7]  It utilized the 2017 Azure data traces.  discussed certain behaviors that can be used to predict future behavior of VM workloads.  However, their predictive model aimed to increase the utilization of Azure from the system. 6
  • 7. Workload Analysis  Microsoft Azure is a public cloud computing platform.  We analyze the 2019 trace  subset of applications running on Azure during July of 2019  235 GB of data contained within 198 files  30 consecutive days of VM readings  ~2.7 M total VMs  6,687 individual users 7
  • 8. Descriptive analysis  How users utilize Azure cloud 8
  • 11. Cloud waste and cost analysis  Azure Pricing Model  VM is priced based on requested core count and memory size  Factors – choice of operating systems, cloud services region, server types etc.  Information to apply complex pricing is missing in the Azure traces 11
  • 12. Cloud waste and cost analysis
    - Assumptions
      - Deployed VMs run on Linux (CentOS or Ubuntu).
      - VMs are deployed to the US-West (California) region.
      - VMs are general purpose, not CPU- or memory-optimized.
      - Minimal storage is available for each VM.
      - Users "pay as they go" and do not receive any discounts for pre-paying or volume purchasing.
      - An Interactive VM costs 3.33x the price of a Delay-Insensitive VM, based on information from Google Cloud [8].
      - VMs with an Unknown category are treated as Delay-Insensitive when calculating price.
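The category-based pricing assumption above can be sketched in a few lines of Python. This is only an illustration: the function name `hourly_price` and the base price passed to it are hypothetical, and only the 3.33x multiplier and the handling of Unknown VMs come from the slide.

```python
# Multiplier taken from the slide: Interactive = 3.33x Delay-Insensitive.
INTERACTIVE_MULTIPLIER = 3.33

# Category labels as they appear on the slide.
KNOWN_CATEGORIES = {"Interactive", "Delay-Insensitive", "Unknown"}


def hourly_price(base_price: float, category: str) -> float:
    """Return the assumed hourly price for a VM category.

    base_price is a hypothetical Delay-Insensitive price per hour;
    the real price table is not part of this transcript.
    """
    if category not in KNOWN_CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")
    if category == "Interactive":
        return base_price * INTERACTIVE_MULTIPLIER
    # Delay-Insensitive and Unknown VMs are both priced at the base rate.
    return base_price
```

For example, with a hypothetical base price of 0.10 per hour, an Interactive VM would be priced at about 0.333 per hour and an Unknown VM at 0.10.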
  • 13. Cloud waste and cost analysis
    - VM Cost Calculation
      - VM cost is not provided in the Azure traces.
      - To calculate the cost of each VM, we need to know the price of each VM and its corresponding lifetime.
      - VM Cost = VM Lifetime * VM Price
      - VM Price: the price per hour of the VM.
      - VM Lifetime: the length of time in hours that a VM exists, calculated as the difference between its creation and deletion timestamps.
      - Core Hours = VM Lifetime * number of cores of that VM.
      - Core hours indicate the computation resources utilized by a VM.
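As a minimal sketch of the two formulas on this slide, the helpers below compute lifetime, cost, and core hours. The function names are hypothetical, and the timestamps are assumed to be given in seconds, which the slide does not state.

```python
def vm_lifetime_hours(created_s: float, deleted_s: float) -> float:
    """Lifetime in hours from creation/deletion timestamps (assumed seconds)."""
    if deleted_s < created_s:
        raise ValueError("deletion timestamp precedes creation timestamp")
    return (deleted_s - created_s) / 3600.0


def vm_cost(lifetime_hours: float, price_per_hour: float) -> float:
    """VM Cost = VM Lifetime * VM Price."""
    return lifetime_hours * price_per_hour


def core_hours(lifetime_hours: float, cores: int) -> float:
    """Core Hours = VM Lifetime * number of cores."""
    return lifetime_hours * cores
```

A VM that exists for 7,200 seconds has a lifetime of 2 hours. With 4 cores it accounts for 8 core hours, and at a hypothetical price of 0.50 per hour it costs 1.00.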
  • 14. Cloud waste and cost analysis
    - Green and red VMs
      - 10% is a very conservative threshold.
      - A higher threshold will yield more cost savings.
  • 15. Cloud waste and cost analysis
    - Why is the utilization so low?
      - Not enough work
      - Lack of parallel computing
        - A sequential application cannot leverage multiple virtual cores.
        - Requesting more cores for such applications decreases overall CPU utilization.
      - Improvement of hardware
  • 16. Cloud waste and cost analysis
    - VM cloud waste points (CWP)
      - CWP = VM lifetime * corresponding waste factor
  • 17. Cloud waste and cost analysis
    - VM cloud waste points (CWP) distribution
  • 18. Cloud waste and cost analysis
    - Cloud Waste Indicator (CWI)
      - The average CWP of all VMs deployed by an individual user
      - Categorizes users into three groups: green, normal, and red users
    - To determine the delineation:
      - Remove users with fewer than 200 total core hours.
      - This drops 6,121 users (i.e., ~92% of all users).
      - Calculate the CWI of each remaining user, then normalize it.
      - $CWI_{norm} = \frac{CWI_i - \min(CWI)}{\max(CWI) - \min(CWI)}$
      - CWInorm is a value between 0 and 1.
      - CWInorm thresholds:
        - Green users: 0.01
        - Red users: 0.05
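The CWI steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The input layout, the waste factor values, and the function name `label_users` are hypothetical. The slide also does not say which side of each threshold counts, so the sketch assumes that green means CWInorm at or below 0.01 and red means CWInorm at or above 0.05.

```python
from statistics import mean

MIN_CORE_HOURS = 200     # users below this total are removed (from the slide)
GREEN_THRESHOLD = 0.01   # from the slide; "at or below" is an assumption
RED_THRESHOLD = 0.05     # from the slide; "at or above" is an assumption


def label_users(vms_by_user):
    """Map each retained user to (CWI_norm, label).

    vms_by_user: {user: [(lifetime_hours, cores, waste_factor), ...]}
    The waste factors are placeholders; the real values are not in
    this transcript.
    """
    # Keep only users with at least MIN_CORE_HOURS total core hours.
    kept = {
        user: vms
        for user, vms in vms_by_user.items()
        if sum(hours * cores for hours, cores, _ in vms) >= MIN_CORE_HOURS
    }
    if not kept:
        return {}

    # CWI = average CWP over a user's VMs, with CWP = lifetime * waste factor.
    cwi = {
        user: mean([hours * wf for hours, _, wf in vms])
        for user, vms in kept.items()
    }

    # Min-max normalization; a zero span maps every user to 0.0.
    lo, hi = min(cwi.values()), max(cwi.values())
    span = hi - lo
    result = {}
    for user, value in cwi.items():
        norm = (value - lo) / span if span else 0.0
        if norm <= GREEN_THRESHOLD:
            label = "green"
        elif norm >= RED_THRESHOLD:
            label = "red"
        else:
            label = "normal"
        result[user] = (norm, label)
    return result


example = {
    "a": [(100, 4, 0.0)],                  # CWI 0
    "b": [(100, 4, 1.0), (100, 4, 0.0)],   # CWI 50
    "c": [(300, 2, 0.005)],                # CWI about 1.5
    "d": [(10, 2, 1.0)],                   # 20 core hours, dropped
}
labels = {user: label for user, (_, label) in label_users(example).items()}
```

In this toy input, user "d" is removed by the core-hour filter before normalization, so it does not influence the minimum or maximum CWI.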
  • 19. Cloud waste and cost analysis
    - Cloud utilization score (CUS)
      - Calculated as the percentile rank of a user's CWInorm in comparison with other users in the cloud
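One possible reading of the percentile rank is sketched below. The slide does not give the exact formula or the direction of the score, so this sketch assumes that a higher CUS means less waste: each user's score is the percentage of other users whose CWInorm is strictly higher. The function name is hypothetical.

```python
def cloud_utilization_scores(cwi_norm_by_user):
    """Percentile rank of each user's CWI_norm among the other users.

    Assumption: score = share of other users with a strictly higher
    (more wasteful) CWI_norm, scaled to 0-100.
    """
    values = list(cwi_norm_by_user.values())
    others = len(values) - 1
    scores = {}
    for user, value in cwi_norm_by_user.items():
        if others == 0:
            # A single user has nobody to be compared against.
            scores[user] = 100.0
            continue
        more_wasteful = sum(1 for v in values if v > value)
        scores[user] = 100.0 * more_wasteful / others
    return scores


scores = cloud_utilization_scores({"a": 0.0, "c": 0.03, "b": 1.0})
```

Under this assumption, the least wasteful user in the toy input scores 100, the middle user 50, and the most wasteful user 0.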
  • 20. Cloud waste and cost analysis
    - Recommendation algorithm
      - First, it creates a list of red VMs for each user.
      - Second, it calculates the CWP of all VMs in the list and sorts the list in descending order of CWP.
      - Third, it sends a recommendation to that user, asking them to migrate the top n VMs in the list down by one level.
      - The algorithm repeats these steps for all users.
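The steps above can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: the `VM` record, the `is_red` flag, and the integer `size_level` (where a lower number stands for a smaller, cheaper VM) are hypothetical stand-ins for details that the slide does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VM:
    vm_id: str
    user: str
    is_red: bool           # assumed to be determined beforehand
    lifetime_hours: float
    waste_factor: float    # placeholder value
    size_level: int        # hypothetical size ladder; 0 is the smallest


def recommend(vms, n):
    """Return {user: [(vm_id, current_level, recommended_level), ...]}."""
    # Step 1: build the red VM list of each user.
    red_by_user = defaultdict(list)
    for vm in vms:
        if vm.is_red:
            red_by_user[vm.user].append(vm)

    recommendations = {}
    for user, red_vms in red_by_user.items():
        # Step 2: sort by CWP (lifetime * waste factor), highest first.
        ranked = sorted(
            red_vms,
            key=lambda vm: vm.lifetime_hours * vm.waste_factor,
            reverse=True,
        )
        # Step 3: recommend moving the top n VMs down one level,
        # skipping any of them already at the smallest level.
        recommendations[user] = [
            (vm.vm_id, vm.size_level, vm.size_level - 1)
            for vm in ranked[:n]
            if vm.size_level > 0
        ]
    return recommendations


fleet = [
    VM("v1", "u1", True, 10, 1.0, 2),
    VM("v2", "u1", True, 100, 1.0, 3),
    VM("v3", "u1", False, 1000, 1.0, 3),
    VM("v4", "u2", True, 5, 0.5, 0),
]
recs = recommend(fleet, n=1)
```

With n = 1, user "u1" is advised to move "v2" from level 3 to level 2, while "u2" receives no recommendation because its only red VM is already at the smallest level.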
  • 21. Cloud waste and cost analysis
    - Cost savings (from all red VMs)
      - Original Cost: $61,595,170.23
      - Total VMs: 2,695,548
      - VMs with Savings: 1,369,364
      - Percent of VMs with Savings: 51%
      - New Cost: $39,341,202.17
      - Total Savings: $22,253,968.06
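The figures on this slide are internally consistent, which a short check can confirm. The variable names are illustrative. The cost-savings percentage is derived here from the slide's numbers and is not itself stated on the slide.

```python
from decimal import Decimal

# Figures copied from the slide.
original_cost = Decimal("61595170.23")
new_cost = Decimal("39341202.17")
total_vms = 2_695_548
vms_with_savings = 1_369_364

# Total Savings = Original Cost - New Cost.
total_savings = original_cost - new_cost

# Share of VMs with savings, in percent (rounds to the slide's 51%).
pct_vms_with_savings = 100 * vms_with_savings / total_vms

# Derived: savings as a percentage of the original cost.
pct_cost_saved = total_savings / original_cost * 100
```

The difference matches the reported total savings of $22,253,968.06, which works out to roughly 36% of the original cost.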
  • 22. Limitations
    - Lack of information regarding the nature of the jobs and applications running on each VM
      - This could affect the quality of our recommendations.
    - Lack of information about the memory usage of each VM in the traces
    - Over-provisioning of memory
    - Multiple assumptions
      - Assumptions made in the pricing model
      - Assumptions about user behavior
  • 23. Conclusions
    - Studying user behaviors in the cloud can provide viable solutions to reduce cloud cost and waste.
    - Comprehensive analysis of the Microsoft Azure 2019 traces
      - A large portion of VMs are underutilized or over-provisioned for resources.
    - To mitigate the cloud waste problem and save cost, the paper proposes three metrics: CWP, CWI, and CUS.
    - Experimental results show that savings of over $22 million can be achieved.
  • 24. References
    1. M. Alalawi, H. Daly, A survey on Hadoop MapReduce energy efficient techniques for intensive workload, in: Proceedings of the International Conference on Big Data and Internet of Thing, in: BDIOT2017, Association for Computing Machinery, New York, NY, USA, 2017, pp. 62–66.
    2. F. Chen, J.-G. Schneider, Y. Yang, J. Grundy, Q. He, An energy consumption model and analysis tool for cloud computing environments, in: 2012 First International Workshop on Green and Sustainable Software (GREENS), 2012, pp. 45–50.
    3. M. Qiu, Z. Ming, J. Li, K. Gai, Z. Zong, Phase-change memory optimization for green cloud with genetic algorithm, IEEE Trans. Comput. 64 (12) (2015) 3528–3540.
    4. S. Mamun, A. Gilday, A. Singh, A. Ganguly, G. Merrett, X. Wang, B. Al-Hashimi, Intra- and inter-server smart task scheduling for profit and energy optimization of HPC data centers, J. Low Power Electron. Appl. 10 (2020) 32.
    5. O. Hadary, L. Marshall, I. Menache, A. Pan, E.E. Greeff, D. Dion, S. Dorminey, S. Joshi, Y. Chen, M. Russinovich, T. Moscibroda, Protean: VM allocation service at scale, in: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), USENIX Association, 2020, pp. 845–861.
    6. M. Shahrad, R. Fonseca, I. Goiri, G. Chaudhry, P. Batum, J. Cooke, E. Laureano, C. Tresness, M. Russinovich, R. Bianchini, Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider, in: 2020 USENIX Annual Technical Conference (USENIX ATC 20), USENIX Association, 2020, pp. 205–218.
    7. E. Cortez, A. Bonde, A. Muzio, M. Russinovich, M. Fontoura, R. Bianchini, Resource central: Understanding and predicting workloads for improved resource management in large cloud platforms, in: Proceedings of the 26th Symposium on Operating Systems Principles, in: SOSP '17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 153–167.
    8. Google, Google cloud pricing, 2021, https://cloud.google.com/pricing