AI Sustainability
Dr. Tamar Eilam
IBM Fellow, Chief Scientist Sustainable Computing, IBM Research
Approximate and partial list of contributors, in arbitrary order
• Energy modeling and quantification: Marcelo Amaral, Huamin Chen, Tatsuhiro Chiba, Rina Nakazawa, Sunyanan Choochotkaew, Eun K Lee, Umamaheswari Devi, Aanchal Goyal
• Workload Classification: Xi Yang, Rohan R Arora, Chandra Narayanaswami, Cheuk Lam, Jerrold Leichter, Yu Deng, Daby Sow
• Energy Aware Optimization: Tatebeh Bahreini, Asser Tantawi, Alaa Youssef, Chen Wang
• AI System: Jeffrey Burns, Leland Chang, Ankur Agrawal, Kailash Gopalakrishnan, Pradip Bose
• AI Quantification and Metric: Pedro Bello-Maldonado, Bishwaranjan Bhattacharjee, Carlos Costa
• AI Infrastructure Innovation: Seelam Seetharami
• Model Architecture Innovation: David Cox, Rameswar Panda, Rogerio Feris, Leonid Karlinsky
The Climate Impact Chain
Human activity → Increased greenhouse gas (GHG) in the atmosphere → Global warming → Global climate change → Physical & biological impact → Human socio-economic impact
• $150 billion: average cost in damages per year
• 100M+: increase in population facing hunger
IBM Research | © 2022 IBM Corporation
Mitigation
• Carbon capture
• Geo-engineering
• Reduce carbon emission: Sustainable Computing

Part 1: Sustainable Computing
What is Sustainable Computing?
The ability to measure, quantify, and ultimately reduce the carbon footprint at every layer of the computing stack, within and across data centers, and across the entire life cycle.
The Computer Energy Problem
We are at an inflection point:
1. Demand is growing at an exponential scale.
   "How to stop data centers from gobbling up the world's electricity", https://www.nature.com/articles/d41586-018-06610-y
2. The emergence of energy-demanding workloads (AI). AI power consumption doubles every 3-4 months.*
3. The end of Dennard scaling means we can't keep up.
• Some predict that the electricity consumed by data centers will increase to 8% by 2030.
• Golden era for chip design.
Ever-rising energy demands for computing versus global energy production are creating new risk, and new opportunities for radically different computing to drastically improve efficiency.
• 31% a year: the energy consumption growth trend for hyperscalers in North America
• >10% of the world's power will be consumed by hyperscalers by 2030
* Green AI, R. Schwartz, J. Dodge, N. A. Smith, O. Etzioni, 2019
Sustainable Computing epochs
Making the current state more sustainable
• Understanding the as-is
• Hot spot detection
• Remediation and optimization
• Coupling power and cloud
• Cooling, data center planning, etc.
• Storage auto-tiering
Introducing accelerators (digital) & hardware and software co-design / co-optimization
• HW and SW co-design (scalable approach)
• Reduced-precision chips: 8-bit precision, approximate computing
• Voltage scaling with error correction
• Runtime management of disaggregated & composable heterogeneous DC
New computational models (beyond digital)
• New computational models that completely break the relationship between energy & computation: neuromorphic, analog AI, data-centric, quantum, etc.
https://research.ibm.com/blog/telum-processor
https://www.esp.cs.columbia.edu
https://research.ibm.com/blog/the-hardware-behind-analog-ai
https://www.zurich.ibm.com/sto/memory/
Carbon Intensity
The emission rate: grams of carbon dioxide released per megajoule of energy produced.
• With coal power stations, the carbon intensity is high because CO2 is produced as part of the power generation process. Carbon intensity is >1 kg/kWh for coal.
• Renewable energy sources such as hydro or solar produce almost no emissions, so their carbon intensity is very low. Carbon intensity is ~0 for solar/wind.
Modeling the Data Center Carbon Footprint
Power usage effectiveness (PUE)
A predominant metric used to measure the energy efficiency of a data center.
PUE = (Total Facility Energy) / (IT Equipment Energy)
Efficiency improves as the quotient decreases towards 1. 1 is optimal, 2 is very bad.
Total Carbon Footprint (CFP)
The total amount of carbon dioxide (CO2) and equivalent greenhouse gas emissions associated with powering a data center. CFP >= 0.
Carbon Footprint = IT Equipment Energy x Power Usage Effectiveness x Carbon Intensity
CFP = E_IT × PUE × CI
[Figure: an example DC energy breakdown, marking E_IT and PUE]
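The footprint formula CFP = E_IT × PUE × CI can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names, the choice of kWh and kg units, and the sample numbers (1000 kWh, PUE 1.5, 0.4 kg/kWh) are illustrative assumptions. The helper converts a carbon intensity given in g/MJ (the unit used in the definition) into kg/kWh (the unit used for the coal figure).

```python
MJ_PER_KWH = 3.6  # 1 kWh = 3.6 MJ


def g_per_mj_to_kg_per_kwh(ci_g_per_mj: float) -> float:
    """Convert a carbon intensity from g CO2 per MJ to kg CO2 per kWh."""
    return ci_g_per_mj * MJ_PER_KWH / 1000.0


def carbon_footprint_kg(it_energy_kwh: float, pue: float, ci_kg_per_kwh: float) -> float:
    """CFP = E_IT x PUE x CI, returned in kg of CO2-equivalent."""
    if pue < 1.0:
        # Total facility energy can never be less than IT equipment energy.
        raise ValueError("PUE cannot be below 1")
    if it_energy_kwh < 0 or ci_kg_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    return it_energy_kwh * pue * ci_kg_per_kwh


if __name__ == "__main__":
    # Hypothetical inputs: 1000 kWh of IT energy, PUE 1.5, grid at 0.4 kg/kWh.
    print(carbon_footprint_kg(1000.0, 1.5, 0.4))
```

Each factor can be reduced independently: less IT energy, a PUE closer to 1, or a cleaner energy mix all lower the product.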
Reducing the Data Center Carbon Footprint: Research Opportunities
Carbon Footprint = IT Equipment Energy x Power Usage Effectiveness x Carbon Intensity
CFP = E_IT × ERE × CI
• Data center design, cooling, and heat reuse
• Rack design to optimize power conversion, and direct liquid cooling
• Improving power conversion in the data center
• Energy-aware scheduling, vertical scaling, dispatching
• Power management
• Accelerators for Green AI: tradeoffs between accuracy and efficiency
• Chip design
• Dispatching of batch workloads such as AI training jobs across time and space to maximize renewable energy use
• Forecasting of renewable energy (time series composition)
• Can the cloud sense renewable energy and adapt?
https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu
https://www.zurich.ibm.com/st/energy_efficiency/zeroemission.html
Carbon Assessment & Reduction Framework
An Approach for Sustainable Computing
• Estimate: energy and CFP per workload, tenant, VM, container, service, etc.
• Assess: identify hotspots and applicable strategies. Calculate potential savings.
• Act: a set of controllers to dynamically optimize the carbon footprint at operation. Design efficient systems.
• Report: report CFP across your entire organization in a consistent fashion, factoring in requirements.
Energy Quantification Challenge
• How do you estimate the power consumption of applications running on shared servers?
  => ratio-based approach
• How do you do that when you do not have on-line power measurement at the server level?
  => power modeling
• How do you do that if you do not know what else is running on the machine?
  => dynamic power estimation only
• How do you scale the approach to developing power models (combinatorial explosion problem)?
The Kepler Project
https://github.com/sustainable-computing-io/kepler
Kepler Architecture
• eBPF metrics: hardware counters, CPU time, and soft IRQ
• System power metrics from BMs and VMs
• Ratio power model for containers
• Trained power model to estimate the VM's component power consumption
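Kepler Deployment Approaches
- Ratio Power Model for Dynamic CPU Power
With hardware counter:
$$\mathrm{DynPower}_{\mathrm{process}\,i} = \frac{\mathrm{CPU\ cycles}_{\mathrm{process}\,i}}{\sum \mathrm{CPU\ cycles}} \times \mathrm{DynPower}_{\mathrm{host\_CPU}}$$
Without hardware counter:
$$\mathrm{DynPower}_{\mathrm{process}\,i} = \frac{\mathrm{BPF\ CPU\ time}_{\mathrm{process}\,i}}{\sum \mathrm{BPF\ CPU\ time}} \times \mathrm{DynPower}_{\mathrm{host\_CPU}}$$
Per container:
$$\mathrm{DynPower}_{\mathrm{container}\,j} = \sum_{i \in j} \mathrm{DynPower}_{\mathrm{process}\,i}$$
- Even distribution of Idle Power
$$\mathrm{Power}_{\mathrm{container}\,j} = \mathrm{IdlePower}_{\mathrm{host\_CPU}} / \mathrm{numContainers}$$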
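The ratio power model above can be sketched in a few lines of Python. This is an illustration of the formulas only, not Kepler's actual code: the function name, the dictionary-based inputs, and the sample numbers are assumptions. The per-process activity value can be either CPU cycles (with hardware counters) or BPF CPU time (without); the arithmetic is the same.

```python
from collections import defaultdict


def container_power(proc_activity, proc_container, host_dyn_power_w, host_idle_power_w):
    """Split host CPU power across containers with the ratio model.

    proc_activity:  {pid: CPU cycles or BPF CPU time of that process}
    proc_container: {pid: container name the process belongs to}
    Returns {container: {"dynamic_w": ..., "idle_w": ...}}.
    """
    total_activity = sum(proc_activity.values())
    dynamic = defaultdict(float)
    for pid, activity in proc_activity.items():
        # DynPower_process_i = activity_i / sum(activity) * DynPower_host_CPU
        share = activity / total_activity if total_activity > 0 else 0.0
        # DynPower_container_j = sum over the processes i in container j
        dynamic[proc_container[pid]] += share * host_dyn_power_w

    containers = set(proc_container.values())
    # Idle power is divided evenly across containers.
    idle_share = host_idle_power_w / len(containers) if containers else 0.0
    return {c: {"dynamic_w": dynamic[c], "idle_w": idle_share} for c in containers}


if __name__ == "__main__":
    # Hypothetical sample: three processes in two containers.
    cycles = {101: 300, 102: 100, 201: 600}
    owner = {101: "a", 102: "a", 201: "b"}
    print(container_power(cycles, owner, host_dyn_power_w=50.0, host_idle_power_w=20.0))
```

Splitting idle power evenly is the simplest policy; it does not depend on knowing what each container is doing, which is why it works even when activity counters are sparse.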
Kepler Model Server Project
Facilitates training power models for servers without a power meter.
[Figure: Kepler on a server with a power meter (bare metal; RAPL, ACPI/sensors, Redfish/IPMI, GPU via NVML; ratio power model → process/container power consumption) versus a server without a power meter (virtual machine; trained power model → estimated system power metrics → ratio power model → process/container power consumption)]
Kepler Model Server
Motivation:
• No power measurement exposed or instrumented in some running systems
Challenges:
• No data, or not enough data, to train a power model specific to all available metrics and emerging system platforms and settings (e.g., variety of CPU architectures, frequency governors)
• Dynamicity of control plane processes
Core of the Kepler model server: Collect Data → Train Model → Export Model → Serve Model → Estimate Power
Pipeline framework (one extractor, one isolator, multiple trainers):
Prometheus query result → Extract → extracted data (energy metric and energy-related metrics, with background power) → Isolate → isolated data (without background power) → Train (node-level, container-level) → power models
https://www.cncf.io/blog/2023/10/11/exploring-keplers-potentials-unveiling-cloud-application-power-consumption/
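The extract, isolate, and train stages can be illustrated with a toy example. This is a sketch under strong assumptions and not the Kepler model server's implementation: it assumes a single energy-related metric, a known constant background power, and an ordinary least-squares line as the "trainer". All names and numbers are hypothetical.

```python
def isolate(samples, background_power_w):
    """Remove background power from (metric, measured_power_w) samples."""
    return [(x, max(p - background_power_w, 0.0)) for x, p in samples]


def train_linear(samples):
    """Fit power = slope * metric + intercept by ordinary least squares."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("metric values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_power(model, metric_value):
    """Apply a trained (slope, intercept) model to a new metric value."""
    slope, intercept = model
    return slope * metric_value + intercept


if __name__ == "__main__":
    # Hypothetical extracted data: (CPU utilization %, node power in W).
    extracted = [(10, 60.0), (20, 80.0), (40, 120.0), (80, 200.0)]
    isolated = isolate(extracted, background_power_w=40.0)
    model = train_linear(isolated)
    print(model, estimate_power(model, 50))
```

Training on isolated data gives a model of dynamic power only, which matches the "dynamic power estimation only" answer to not knowing what else runs on the machine.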
The Issue with Third-Party Clouds
• No server power metric available
• No knowledge of what else is running on my machine
  – How to split idle power?
• Limited knowledge of the architecture and configuration of the bare metal servers
  – A challenge for applying separately trained power models
• All cloud native calculators are too coarse-grained to be useful for optimization
Adrian Cockcroft: https://adrianco.medium.com/proposal-for-a-realtime-carbon-footprint-standard-60b71c269948
How can we get to real-time monitoring of application carbon consumption in third-party clouds?
• Consistent
• Trustworthy
• Transparent
• Explainable
Can Kepler help? What else do we need?
WIP: reference implementation to be open sourced.
Workload Classification: Motivation
Detect non-productive workloads
• Virtual machines
• Cloud-native deployments
• Cloud services
Can schedules be drawn up for a few (if not all) productive workloads?

Workload Classification: Methodology
Metrics → Phase abstraction (inactive/active phases) → Workload classification → Recommendation
• Non-productive (remaining in the inactive phase): candidate for termination
• Constantly productive (remaining in the active phase): no action
• Alternating (switching between the two phases):
  – Repeatable: workload timetabling
  – Non-repeatable: candidate for parking
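The three classes defined above follow directly from a sequence of active/inactive phases. The sketch below is illustrative only: the utilization threshold used for phase abstraction (5%) and the function names are assumptions, not values from the slides, and it does not attempt the repeatable versus non-repeatable distinction.

```python
def to_phases(utilization, active_threshold=0.05):
    """Phase abstraction: True = active phase, False = inactive phase.

    The threshold is a hypothetical value chosen for illustration.
    """
    return [u > active_threshold for u in utilization]


def classify(phases):
    """Classify a workload from its phases over the observation window."""
    if not phases:
        raise ValueError("need at least one observation")
    if not any(phases):
        return "non-productive"          # remained in the inactive phase
    if all(phases):
        return "constantly productive"   # remained in the active phase
    return "alternating"                 # switched between the two phases


if __name__ == "__main__":
    # Hypothetical hourly CPU utilization for three VMs.
    vms = {
        "vm1": [0.01, 0.00, 0.02, 0.01],
        "vm2": [0.60, 0.75, 0.55, 0.80],
        "vm3": [0.70, 0.01, 0.65, 0.02],
    }
    for name, util in vms.items():
        print(name, classify(to_phases(util)))
```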
[Figure: metrics for VM1, VM2, ..., VM𝑁 over an observation window of 7/14/21 days, from 𝑇 − 𝑤𝑐 to 𝑇]
CARE: Carbon Quantification & Reduction
A coordinated set of controllers to dynamically quantify and optimize the carbon footprint at every level of the hybrid cloud stack, within and across on- and off-premises data centers.
CFP = E_IT × PUE × CI
• Dynamic dispatching: leverage renewable energy when and where it is available across data centers.
• Container right-sizing and energy-aware scheduler: efficiency with container resource consumption within a data center.
• VM placement and power management: efficient infrastructure with VM and power management.
Part 2: AI Sustainability
The energy cost of AI
• Deep learning is computationally intensive
• Time-consuming even with high-performance computing resources
Take for example: training an image recognition model
Dataset: ImageNet-22K
Network: ResNet-101
• 256 GPUs, 7 hours: ~450 kWh
• 4 GPUs, 16 days: ~385 kWh
One model training run is ~2 weeks of home energy consumption.
https://arxiv.org/abs/1708.02188
AI demand keeps surging
Training requirements are doubling every 3.5 months.
Source: Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and Policy Considerations for Deep Learning in NLP. CoRR abs/1906.02243 (2019). arXiv:1906.02243
Source: Roy Schwartz, Jesse Dodge, Noah A. Smith, and Oren Etzioni. 2019. Green AI. arXiv:1907.10597 [cs.CY]
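Two quick sanity checks on the numbers in this section, written as a short Python sketch. The function names are illustrative; the inputs are the figures quoted in the slides (256 GPUs for 7 hours at ~450 kWh, 4 GPUs for 16 days at ~385 kWh, and a 3.5-month doubling time). The first function derives the average power per GPU implied by each configuration; the second converts a doubling time into a yearly growth factor.

```python
def implied_avg_power_kw(energy_kwh: float, num_gpus: int, hours: float) -> float:
    """Average power per GPU (kW) implied by a total energy figure."""
    return energy_kwh / (num_gpus * hours)


def annual_growth_factor(doubling_months: float) -> float:
    """Growth over 12 months for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (12.0 / doubling_months)


if __name__ == "__main__":
    print(implied_avg_power_kw(450.0, 256, 7.0))       # 256 GPUs, 7 hours
    print(implied_avg_power_kw(385.0, 4, 16 * 24.0))   # 4 GPUs, 16 days
    print(annual_growth_factor(3.5))                   # doubling every 3.5 months
```

Both configurations imply roughly the same average draw per GPU (about a quarter of a kilowatt), so the two energy figures are mutually consistent; a 3.5-month doubling time corresponds to growth of more than tenfold per year.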
The emergence of foundation models
Homogenization: a broad foundation model is adapted to perform specific tasks.
Almost all state-of-the-art NLP models are now adapted from one of a few foundation models, such as BERT, RoBERTa, BART, T5, etc.
Multi-modal and cross-domain models are next.
Source: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, et al. 2022. On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs.LG]
Large language models are getting larger
[Figures: sizes of language models; training cost of language models]
GPT-3 needs 1024 A100 GPUs for 34 days for training!
Some say that this is okay, because they are re-used for multiple tasks.*
This claim is yet to be substantiated by a sound analysis.
* E.g., David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean. 2021. Carbon Emissions and Large Neural Network Training.
Data Scientist Dilemma: to adapt or not to adapt
• To adapt from a broad model, or to train a smaller model on a more specific data set?
• How much data to use?
• Can I synthesize a few smaller models?
• Neural architecture search? Hyperparameter optimization? Is it worth the cost? Well, it depends...
• What is the optimal frequency of re-training? Daily? Weekly? What data shall I use for re-training? Incremental? Complete?
Sustainable AI platform principles
• Transparency: dynamically track energy and carbon across the data and model life cycle
• Traceability and Governance: track the ‘supply chain’ of models and data sets and the associated energy and carbon
• Energy Efficiency: innovation across all layers of the stack

Meaningful Metrics
Meaningful Metrics categories
[Figure: products (data set, model) against metric categories (core metric, life-cycle, efficiency); phases: construction (pre-training) and operation (re-training, inference)]
Life-cycle: factor in the provenance of models and data sets and their associated energy and carbon footprint (Life-Cycle-Assessment principles).
[Figure: D, FM, M]
Efficiency: efficiency = cost / work produced
What goes into ‘cost’?
• compute for inference
• + training
• + bill of material ‘tax’
Holistic approach to Sustainable AI
• Factor in the entire life cycle of models
• Sustainable strategy exploration and what-if analysis
• Provenance, governance, and reporting
• Holistic impact analysis and tradeoff-based planning
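The efficiency metric above (cost divided by work produced, where cost adds training and a bill-of-material 'tax' to inference compute) can be written out as a small sketch. The function name, the kWh unit, the use of "inferences served" as the unit of work, and the sample numbers are all assumptions made for illustration.

```python
def lifecycle_efficiency(inference_kwh: float, training_kwh: float,
                         bom_tax_kwh: float, work_units: float) -> float:
    """efficiency = cost / work produced, with cost summed over the life cycle.

    Returns energy per unit of work (lower is better).
    """
    if work_units <= 0:
        raise ValueError("work produced must be positive")
    cost = inference_kwh + training_kwh + bom_tax_kwh
    return cost / work_units


if __name__ == "__main__":
    # Hypothetical model: 200 kWh of inference, 500 kWh of training,
    # 300 kWh inherited from upstream models and data sets,
    # amortized over one million inferences.
    print(lifecycle_efficiency(200.0, 500.0, 300.0, 1_000_000))
    # Counting inference compute alone understates the per-inference cost:
    print(lifecycle_efficiency(200.0, 0.0, 0.0, 1_000_000))
```

Comparing the two calls shows why the life-cycle view matters: in this made-up example the full cost per inference is five times the inference-only figure.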
AI Sustainability Metrics

Transparency

The life-cycle of a model as a state machine
Each ‘state transition’ is associated with a significant energy/carbon cost and involves critical decisions that will affect the cost of this and downstream tasks.
• Tradeoffs between accuracy, time-to-value, and energy/carbon
• The cost of one phase may depend on decisions taken at a prior stage: save now, pay later...
• The particulars of the target task are important to factor in early on.
On-Line Fine-Grained Monitoring of Energy and Carbon with Kepler
• An open-source project pioneered by Red Hat and IBM Research to quantify the energy/carbon of cloud native applications.
• On the road map to deliver in OCP and integrate in ROSA
• Adrian Cockcroft is advocating the use of Kepler across all cloud providers: “Real Time Energy and Carbon Standard for Cloud Providers”
SusQL: Context-aware aggregation and energy accounting
Infrastructure: a Kubernetes controller with its own CRD that gets data from Kepler for aggregation.
[Figure: susql-controller maintaining a map[labels] -> energy table, steps 1-4]
apiVersion: …
kind: LabelGroup
metadata: …
spec:
  labels:
    - <label-1>
    - <label-2>
    - <label-3>
    - <label-4>
status:
  totalEnergy: <total energy>
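The map[labels] -> energy idea can be sketched independently of Kubernetes. The snippet below is a conceptual illustration, not SusQL's controller code: it assumes each pod carries a set of labels and an energy reading (as Kepler might report), and that a label group matches a pod when the pod carries every label in the group. The function name and sample data are hypothetical.

```python
def aggregate_energy(pod_samples, label_groups):
    """Build a {label group: total energy} table.

    pod_samples:  iterable of (set of labels, energy in joules)
    label_groups: iterable of label tuples, one per LabelGroup resource
    """
    totals = {tuple(group): 0.0 for group in label_groups}
    for pod_labels, energy_j in pod_samples:
        for group in totals:
            # A pod contributes to a group only if it has all of its labels.
            if set(group) <= set(pod_labels):
                totals[group] += energy_j
    return totals


if __name__ == "__main__":
    pods = [
        ({"team-a", "train"}, 100.0),
        ({"team-a", "serve"}, 40.0),
        ({"team-b", "train"}, 25.0),
    ]
    groups = [("team-a",), ("train",), ("team-a", "train")]
    for group, total in aggregate_energy(pods, groups).items():
        print(group, total)
```

Keying the table by label sets lets the same pod count toward several contexts at once (a team, a job type, or both), which is what makes the accounting context-aware.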
Governance

A ‘Supply Chain’ of models
Models are created (‘manufactured’), distilled, fine-tuned, and re-used (adapted) to create new models.
Deployment is just the beginning of the journey.
How do we reason about the life-cycle cost of models?
Product Life Cycle Assessment Principles for Sustainable AI:
Products = data-set | model
We need to factor in the cost of the bill of material used in the creation of a new model.
If B (a product or a service) is used in the process of creating A1, A2, …, An, then the carbon cost of B is inherited by A1, A2, …, An in proportion to their use.
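The inheritance rule can be expressed as a proportional split. This is a minimal sketch of the stated principle: the function name, the notion of a numeric "usage" per derived product, and the sample numbers are assumptions for illustration.

```python
def inherit_cost(cost_of_b_kg: float, usage: dict) -> dict:
    """Split the carbon cost of B across A1..An in proportion to their use of B.

    usage: {derived product name: amount of use of B}, in any consistent unit.
    """
    total_use = sum(usage.values())
    if total_use <= 0:
        raise ValueError("total use must be positive")
    return {name: cost_of_b_kg * use / total_use for name, use in usage.items()}


if __name__ == "__main__":
    # Hypothetical: a foundation model B cost 1000 kg CO2e to create and was
    # used three times as heavily to build A1 as to build A2.
    print(inherit_cost(1000.0, {"A1": 3.0, "A2": 1.0}))
```

The shares always sum to the cost of B, so nothing is double-counted or lost as costs propagate down the supply chain.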
The Governance Chain

Efficiency at Every Layer
Efficiency at every layer of the AI Stack
• Every layer of the FM stack offers opportunities for efficiency gains:
  – Model: quantization, architecture innovation
  – Tools: dynamic batching
  – Platform: multiplexing, dispatching
  – Infrastructure: DVFS, power parameter optimization, caching
  – Systems: approximate computing and other system innovations
• Empower the data scientist to make choices and explore tradeoffs between accuracy, performance, and energy
• Empower the data scientist to reason about life-cycle strategies: e.g., if/what/when to re-use, and how much to retrain
Systems innovation

IBM Research’s Artificial Intelligence Unit (AIU)
• Chip architecture optimized for enterprise AI workloads
• Enabled for foundation models
• Enabled in the Red Hat software stack
• Supports multi-precision inference (& training): FP16, FP8, INT8, INT4, INT2
• Implemented in leading-edge 5nm technology
• SoC implements IBM’s leadership innovations in low-precision AI arithmetic and algorithms
https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu
IBM Research AI Hardware Center / © 2023 IBM Corporation
Vision for AI Performance Scaling
Digital AI Cores: scaling precision for quadratic gains in performance with iso-accuracy
• Applying approximate computing techniques to AI compute
• Critical requirement: maintain model accuracy
• Advantage: quadratic improvement in performance
• IBM Research has been at the forefront of every major technical advancement on bit-precision scaling
  – 16-bit training (2015): ICML 2015
  – 8-bit training (2018, 2019): NeurIPS 2018, 2019
  – 4-bit training (2020): X. Sun et al., NeurIPS 2020
  – 2/4-bit inference (2018-2020)
    4-bit inference ASICs: J. Choi et al., https://arxiv.org/pdf/1805.06085.pdf; J. McKinstry et al., https://arxiv.org/abs/1809.04191
    2-bit inference ASICs: J. Choi et al., SysML 2019
• Complemented by
  – Sparsity support
  – Analog computing
  – 3D stacking
[Figure: bit precision of training and inference over time, 2012-2024, from 32-bit down to 2-bit]
https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu
https://research.ibm.com/blog/ai-chip-precision-scaling
Infrastructure innovation

Vela: A Cloud Native Supercomputer for the Foundation Model Age (Kepler inside)
System specifications
– Nodes with 8 x A100 GPUs (80GB)
– GPUs interconnected with NVLink, NVSwitch
– Cascade Lake CPUs, 1.5TB of DRAM
– Four 3.2TB NVMe drives
– Redundant connections between nodes, TORs, and spines
– 2 x 100G NICs from each node; NCCL benchmarks show we drive close to line rate
https://research.ibm.com/blog/AI-supercomputer-Vela-GPU-cluster
– Configure resources through software (APIs)
– Broad ecosystem of available cloud services
– Leverage data sets on Cloud Object Store
– Standard, flexible, scalable infrastructure design (vs traditional HPC)
– Near bare metal performance (within 5%, single node)
How do you evolve from a specialized (monolithic), costly, and inflexible HPC stack to a cloud native stack without compromising efficiency?
- Programmability
- Scalability
- Re-use
- Observability
- Agility
- Democratization
Platform Innovation

Dispatching of jobs based on renewable energy
Motivation:
• The carbon intensity of the energy mix in different regions of IBM data centers varies over time.
• Renewable energy is not available all the time and in all places.
Workload optimization: placement and scheduling of workloads based on carbon-free energy availability.
Ideal dispatching: high CPU utilization when carbon intensity is low and low CPU utilization when carbon intensity is high.
Challenge: ideal dispatching might be practically infeasible.
• Short jobs may have short deadlines.
• Some jobs are not interruptible.
• Jobs have heterogeneous resource demands. Obtaining the optimal packing is intractable.
In a single data center: how to order batch jobs for minimum carbon while meeting deadlines.
Polynomial approximation algorithms that work across data centers (space x time).
T. Bahreini, A. Tantawi and A. Youssef, "An Approximation Algorithm for Minimizing the Cloud Carbon Footprint through Workload Scheduling," 2022 IEEE 15th International Conference on Cloud Computing (CLOUD), 2022, pp. 522-531.
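To make the single-data-center problem concrete, here is a simple greedy heuristic in Python. It is explicitly not the approximation algorithm from the cited paper: it is a toy sketch that assumes discrete time slots with a known carbon-intensity forecast, non-interruptible jobs that each occupy one unit of capacity, and earliest-deadline-first ordering. All names and numbers are hypothetical.

```python
def schedule(jobs, carbon_intensity, capacity):
    """Greedily place non-interruptible jobs in low-carbon windows.

    jobs:             list of (name, duration_slots, deadline_slot); a job must
                      finish before `deadline_slot` (exclusive index).
    carbon_intensity: forecast carbon intensity per time slot.
    capacity:         maximum number of jobs running in any one slot.
    Returns {job name: start slot}.
    """
    load = [0] * len(carbon_intensity)
    plan = {}
    # Earliest deadline first, so tightly constrained jobs pick their slots first.
    for name, duration, deadline in sorted(jobs, key=lambda j: j[2]):
        best_start, best_cost = None, None
        last_start = min(deadline, len(carbon_intensity)) - duration
        for start in range(0, last_start + 1):
            window = range(start, start + duration)
            if any(load[t] >= capacity for t in window):
                continue  # no free capacity somewhere in this window
            cost = sum(carbon_intensity[t] for t in window)
            if best_cost is None or cost < best_cost:
                best_start, best_cost = start, cost
        if best_start is None:
            raise ValueError(f"cannot place job {name!r} before its deadline")
        for t in range(best_start, best_start + duration):
            load[t] += 1
        plan[name] = best_start
    return plan


if __name__ == "__main__":
    forecast = [500, 300, 100, 100, 400, 600]  # hypothetical g CO2/kWh per slot
    jobs = [("long-batch", 2, 6), ("short-urgent", 1, 2)]
    print(schedule(jobs, forecast, capacity=1))
```

The heuristic shows the tension listed under the challenge: the urgent job cannot wait for the cleanest slots, and the contiguous-window constraint stands in for non-interruptible jobs. It offers no optimality guarantee.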
Models & Tools Innovation

Call to action:
AI platform providers:
- Build in transparency and governance
- Incorporate platform and system innovation for efficiency
Academia & industry:
- Focus your research on efficiency, not just accuracy
Data scientists / practitioners:
- Develop a sustainability mind-set
- Re-use where it makes sense
- Domain-specific, smaller models are better!
- Explore tradeoffs (accuracy vs. cost)
IBM Research
Tokyo, Shin-Kawasaki, Delhi, Bangalore, Singapore, Nairobi, Haifa, Zurich, Warrington, Dublin, Cambridge, Albany, Yorktown, Almaden, Rio de Janeiro, Sao Paulo, Johannesburg
6 Nobel Laureates, 10 Medals of Technology, 5 National Medals of Science, 6 Turing Awards
Questions?
AI Sustainability: Measuring and Reducing the Carbon Footprint of Computing

 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravitySubhadipsau21168
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 

Recently uploaded (20)

Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified Gravity
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 

AI Sustainability: Measuring and Reducing the Carbon Footprint of Computing

  • 1. AI Sustainability. Dr. Tamar Eilam, IBM Fellow, Chief Scientist for Sustainable Computing, IBM Research
  • 3. Approximate and partial list of contributors, in arbitrary order. Energy modeling and quantification: Marcelo Amaral, Huamin Chen, Tatsuhiro Chiba, Rina Nakazawa, Sunyanan Choochotkaew, Eun K Lee, Umamaheswari Devi, Aanchal Goyal. Workload classification: Xi Yang, Rohan R Arora, Chandra Narayanaswami, Cheuk Lam, Jerrold Leichter, Yu Deng, Daby Sow. Energy-aware optimization: Tatebeh Bahreini, Asser Tantawi, Alaa Youssef, Chen Wang. AI system: Jeffrey Burns, Leland Chang, Ankur Agrawal, Kailash Gopalakrishnan, Pradip Bose. AI quantification and metric: Pedro Bello-Maldonado, Bishwaranjan Bhattacharjee, Carlos Costa. AI infrastructure innovation: Seelam Seetharami. Model architecture innovation: David Cox, Rameswar Panda, Rogerio Feris, Leonid Karlinsky.
  • 4. The Climate Impact Chain: human activity → increased greenhouse gas (GHG) in the atmosphere → global warming → global climate change → physical and biological impact → human socio-economic impact. $150 billion: average cost in damages per year. 100M+: increase in population facing hunger. IBM Research | © 2022 IBM Corporation
  • 6. Part 1: Sustainable Computing
  • 7. What is Sustainable Computing? The ability to measure, quantify, and ultimately reduce the carbon footprint at every layer of the computing stack, within and across data centers, and across the entire life cycle.
  • 8. The Computer Energy Problem. We are at an inflection point: 1. Demand is growing at exponential scale. Some predict that electricity consumed by data centers will increase to 8% of the global total by 2030 (How to stop data centers from gobbling up the world’s electricity, https://www.nature.com/articles/d41586-018-06610-y). 2. The emergence of energy-demanding workloads (AI): AI power consumption doubles every 3-4 months (Green AI, R. Schwartz, J. Dodge, N. A. Smith, O. Etzioni, 2019). 3. The end of Dennard scaling means we can’t keep up; this is a golden era for chip design.
  • 9. Ever-rising energy demand for computing, set against global energy production, is creating new risks and new opportunities for radically different computing that drastically improves efficiency. 31% a year: the growth trend in energy consumption for hyperscalers in North America. >10%: the share of the world's power that hyperscalers will consume by 2030.
  • 10. Sustainable Computing epochs. (1) Making the current state more sustainable: understanding the as-is; hot spot detection; remediation and optimization; coupling power and cloud; cooling, data center planning, etc.; storage auto-tiering. (2) Introducing accelerators (digital) and hardware/software co-design and co-optimization: HW and SW co-design (scalable approach); reduced-precision chips (8-bit precision, approximate computing); voltage scaling with error correction; runtime management of disaggregated and composable heterogeneous data centers. (3) New computational models (beyond digital): models that completely break the relationship between energy and computation, such as neuromorphic, analog AI, data-centric, and quantum. https://research.ibm.com/blog/telum-processor https://www.esp.cs.columbia.edu https://research.ibm.com/blog/the-hardware-behind-analog-ai https://www.zurich.ibm.com/sto/memory/
  • 11. Modeling the Data Center Carbon Footprint. Carbon intensity (CI): the emission rate, in grams of carbon dioxide released per megajoule of energy produced. With coal power stations, the carbon intensity is high because CO2 is produced as part of the power generation process: >1 kg/kWh for coal. Renewable sources such as hydro or solar produce almost no emissions, so their carbon intensity is very low: ~0 for solar/wind. Power usage effectiveness (PUE): a predominant metric for the energy efficiency of a data center. PUE = (Total Facility Energy) / (IT Equipment Energy). Efficiency improves as the quotient decreases towards 1; 1 is optimal, 2 is very bad. Total carbon footprint (CFP): the total amount of carbon dioxide (CO2) and equivalent greenhouse gas emissions associated with powering a data center; CFP >= 0. Carbon Footprint = IT Equipment Energy × Power Usage Effectiveness × Carbon Intensity, that is, CFP = E_IT × PUE × CI. [Figure: an example data center energy breakdown.]
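The formula on this slide, CFP = E_IT × PUE × CI, can be worked through numerically. The sketch below is illustrative only: the function name and the input values (1000 kWh of IT energy, the PUE figures, and the 0.05 kg/kWh low-carbon grid) are assumptions chosen for the example, not numbers from the deck, apart from the "about 1 kg/kWh for coal" and "PUE of 2 is very bad" reference points.

```python
def carbon_footprint_kg(it_energy_kwh, pue, carbon_intensity_kg_per_kwh):
    """Return CFP = E_IT x PUE x CI in kg of CO2-equivalent.

    it_energy_kwh: energy drawn by the IT equipment (E_IT), in kWh.
    pue: power usage effectiveness (total facility energy / IT energy), >= 1.
    carbon_intensity_kg_per_kwh: emissions per kWh of the energy mix (CI).
    """
    if it_energy_kwh < 0 or carbon_intensity_kg_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1 by definition")
    return it_energy_kwh * pue * carbon_intensity_kg_per_kwh


# Hypothetical inputs: the same 1000 kWh of IT work in two settings.
coal_inefficient = carbon_footprint_kg(1000.0, 2.0, 1.0)    # coal grid, PUE 2
green_efficient = carbon_footprint_kg(1000.0, 1.1, 0.05)    # low-carbon grid, PUE 1.1
print(coal_inefficient, green_efficient)
```

Because the three factors multiply, improving any one of them (less IT energy, a PUE closer to 1, or a cleaner energy mix) scales the whole footprint down proportionally.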
  • 12. Reducing the Data Center Carbon Footprint: Research Opportunities. Carbon Footprint = IT Equipment Energy × Power Usage Effectiveness × Carbon Intensity, that is, CFP = E_IT × PUE × CI. • Data center design, cooling, and heat reuse • Rack design to optimize power conversion, and direct liquid cooling • Improving power conversion in the data center • Energy-aware scheduling, vertical scaling, dispatching • Power management • Accelerators for green AI: tradeoffs between accuracy and efficiency • Chip design • Dispatching of batch workloads, such as AI training jobs, across time and space to maximize renewable energy use • Forecasting of renewable energy (time series composition) • Can the cloud sense renewable energy and adapt? https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu https://www.zurich.ibm.com/st/energy_efficiency/zeroemission.html
  • 13. Carbon Assessment & Reduction Framework: An Approach for Sustainable Computing. Estimate: energy and CFP per workload, tenant, VM, container, service, etc. Assess: identify hotspots and applicable strategies; calculate potential savings. Act: a set of controllers to dynamically optimize the carbon footprint at operation; design efficient systems. Report: report CFP across your entire organization in a consistent fashion, factoring in requirements.
  • 14. Energy Quantification Challenge • How do you estimate the power consumption of applications running on shared servers? • How do you do that when you do not have on-line power measurement at the server level? • How do you do that if you do not know what else is running on the machine?
  • 15. Energy Quantification Challenge • How do you estimate the power consumption of applications running on shared servers? => ratio-based approach • How do you do that when you do not have on-line power measurement at the server level? => power modeling • How do you do that if you do not know what else is running on the machine? => dynamic power estimation only • How do you scale the approach to developing power models (combinatorial explosion problem)? The Kepler Project: https://github.com/sustainable-computing-io/kepler
  • 16. Kepler Architecture [1] • eBPF metrics: hardware counters, CPU time, and soft IRQ • System power metrics from BMs and VMs • Ratio power model for containers • Trained power model to estimate the VM’s component power consumption. [1] https://github.com/sustainable-computing-io/kepler
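  • 17. Kepler Deployment Approaches: Ratio Power Model for Dynamic CPU Power. With hardware counters: $\mathrm{DynPower}_{\mathrm{process}\,i} = \frac{\mathrm{CPUcycles}_{\mathrm{process}\,i}}{\sum \mathrm{CPUcycles}} \times \mathrm{DynPower}_{\mathrm{host\_CPU}}$. Without hardware counters: $\mathrm{DynPower}_{\mathrm{process}\,i} = \frac{\mathrm{BPFCPUtime}_{\mathrm{process}\,i}}{\sum \mathrm{BPFCPUtime}} \times \mathrm{DynPower}_{\mathrm{host\_CPU}}$. Per container: $\mathrm{DynPower}_{\mathrm{container}\,j} = \sum_{i \in j} \mathrm{DynPower}_{\mathrm{process}\,i}$. Even distribution of idle power: $\mathrm{IdlePower}_{\mathrm{container}\,j} = \mathrm{IdlePower}_{\mathrm{host\_CPU}} / \mathrm{numContainers}$.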
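The ratio power model on this slide can be sketched in a few lines. This is a minimal illustration of the arithmetic only, not Kepler's implementation: the function name, the dictionary-based inputs, and the sample numbers are all hypothetical. The usage value per process stands for either CPU cycles or BPF CPU time, depending on whether hardware counters are available.

```python
from collections import defaultdict


def attribute_power(host_dyn_power_w, host_idle_power_w, usage_by_process, container_of):
    """Split host CPU power across containers with a ratio model.

    usage_by_process: process id -> CPU cycles (or BPF CPU time) in the interval.
    container_of: process id -> name of the container the process belongs to.
    Dynamic power is split in proportion to usage; idle power is split evenly.
    """
    total_usage = sum(usage_by_process.values())
    containers = sorted(set(container_of.values()))
    dynamic = defaultdict(float)
    for pid, usage in usage_by_process.items():
        share = usage / total_usage if total_usage > 0 else 0.0
        # A container's dynamic power is the sum over its processes.
        dynamic[container_of[pid]] += share * host_dyn_power_w
    idle_each = host_idle_power_w / len(containers) if containers else 0.0
    return {c: {"dynamic_w": dynamic[c], "idle_w": idle_each} for c in containers}


# Hypothetical sample: three processes in two containers.
usage = {101: 600, 102: 200, 103: 200}
owner = {101: "web", 102: "web", 103: "db"}
result = attribute_power(50.0, 20.0, usage, owner)
print(result)
```

In this sample the "web" container holds 80% of the observed usage, so it is charged 80% of the 50 W of dynamic power, while the 20 W of idle power is split equally between the two containers.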
  • 18. Kepler Model Server Project: facilitates training power models for servers without a power meter. Motivation: • No power measurement is exposed or instrumented in some running systems. Challenges: • No data, or not enough data, to train a power model specific to all available metrics and to emerging system platforms and settings (e.g., variety of CPU architectures, frequency governors) • Dynamicity of control plane processes. Workflow: Collect Data → Train Model → Export Model → Serve Model → Estimate Power. [Diagram: on a server with a power meter (bare metal; RAPL, ACPI/sensors, Redfish/IPMI, GPU via nvml), Kepler applies the ratio power model to obtain process/container power consumption; on a server without a power meter (VM), a trained power model from the Kepler Model Server supplies estimated system power metrics, which then feed the ratio power model.]
  • 19. Core of the Kepler Model Server: a pipeline framework (one extractor, one isolator, multiple trainers). Extract turns the Prometheus query result into extracted data (the energy metric plus energy-related metrics, with background power). Isolate produces isolated data (without background power). Node-level and container-level trainers then produce the power models. https://www.cncf.io/blog/2023/10/11/exploring-keplers-potentials-unveiling-cloud-application-power-consumption/
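The extract, isolate, and train stages named on this slide can be illustrated with a toy pipeline. This is a sketch under strong assumptions, not the Kepler Model Server code: the sample record layout, the fixed background power value, and the single-feature least-squares fit through the origin are all hypothetical simplifications.

```python
def extract(samples):
    """Extract (feature, power) pairs from raw query-like records."""
    return [(float(s["cpu_seconds"]), float(s["power_w"])) for s in samples]


def isolate(rows, background_power_w):
    """Remove an assumed constant background power from each power reading."""
    return [(x, max(y - background_power_w, 0.0)) for x, y in rows]


def train(rows):
    """Fit power = slope * feature by least squares through the origin."""
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    if sxx == 0:
        raise ValueError("need at least one sample with non-zero usage")
    return sxy / sxx


# Hypothetical samples: CPU seconds used per interval and measured power.
samples = [
    {"cpu_seconds": 1.0, "power_w": 30.0},
    {"cpu_seconds": 2.0, "power_w": 40.0},
    {"cpu_seconds": 4.0, "power_w": 60.0},
]
rows = isolate(extract(samples), background_power_w=20.0)
slope = train(rows)  # watts of dynamic power per CPU second in the interval
print(slope)
```

A model trained this way could then be served to a machine that exposes no power meter, where only the usage feature is observable.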
  • 20. The Issue with Third-Party Clouds • No server power metric is available. • No knowledge of what else is running on my machine: how to split idle power? • Limited knowledge of the architecture and configuration of the bare metal servers: a challenge for applying separately trained power models. • All cloud-native calculators are too coarse-grained to be useful for optimization. (Slide image generated with DALL-E.)
  • 21. How can we get to real-time monitoring of application carbon consumption in third-party clouds? It should be consistent, trustworthy, transparent, and explainable. Can Kepler help? What else do we need? See Adrian Cockcroft, https://adrianco.medium.com/proposal-for-a-realtime-carbon-footprint-standard-60b71c269948. WIP: reference implementation to be open sourced.
  • 22. Carbon Assessment & Reduction Framework: An Approach for Sustainable Computing. Estimate: energy and CFP per workload, tenant, VM, container, service, etc. Assess: identify hotspots and applicable strategies; calculate potential savings. Act: a set of controllers to dynamically optimize the carbon footprint at operation; design efficient systems. Report: report CFP across your entire organization in a consistent fashion, factoring in requirements.
  • 23. Workload Classification: Motivation. Detect non-productive workloads: • Virtual machines • Cloud-native deployments • Cloud services. Can schedules be drawn up for a few (if not all) productive workloads?
  • 24. Methodology. • Non-productive: remaining in the inactive phase • Constantly productive: remaining in the active phase • Alternating: switching between the two phases. [Flowchart: metrics → phase abstraction (inactive/active phases) → workload classification (non-productive, constantly productive, alternating; alternating workloads are further split into repeatable and non-repeatable) → recommendation (candidate for termination, candidate for parking, workload timetabling, no action). A side panel shows VM1 … VMN observed over a window of T days.]
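The three workload classes defined on this slide follow directly from the phase abstraction. The sketch below is a simplified illustration: the function name, the use of CPU utilization as the metric, and the 5% activity threshold are assumptions for the example, not details given in the deck.

```python
def classify_workload(utilization, active_threshold=0.05):
    """Classify a workload from a series of utilization samples (0.0 to 1.0).

    Each sample is mapped to a phase (active if at or above the threshold,
    inactive otherwise); the class depends on which phases were observed.
    """
    if not utilization:
        raise ValueError("need at least one utilization sample")
    phases = [u >= active_threshold for u in utilization]
    if not any(phases):
        return "non-productive"          # remained in the inactive phase
    if all(phases):
        return "constantly productive"   # remained in the active phase
    return "alternating"                 # switched between the two phases


# Hypothetical daily average CPU utilization for three VMs.
print(classify_workload([0.01, 0.00, 0.02, 0.01]))
print(classify_workload([0.40, 0.55, 0.35, 0.60]))
print(classify_workload([0.50, 0.01, 0.45, 0.02]))
```

An alternating workload would still need a separate repeatability check before a timetable could be proposed; that step is not sketched here.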
  • 25. Carbon Assessment & Reduction Framework: An Approach for Sustainable Computing. Estimate: energy and CFP per workload, tenant, VM, container, service, etc. Assess: identify hotspots and applicable strategies; calculate potential savings. Act: a set of controllers to dynamically optimize the carbon footprint at operation; design efficient systems. Report: report CFP across your entire organization in a consistent fashion, factoring in requirements.
  • 26. CARE: Carbon Quantification & Reduction. A coordinated set of controllers to dynamically quantify and optimize the carbon footprint at every level of the hybrid cloud stack, within and across on- and off-prem data centers. CFP = E_IT × PUE × CI. Across data centers: dynamic dispatching leverages renewable energy when and where it is available. Within each data center: container right-sizing and an energy-aware scheduler for efficient container resource consumption; VM placement and power management for efficient infrastructure.
  • 27. Part 2: AI Sustainability
  • 29. The energy cost of AI. Deep learning is computationally intensive and time consuming even with high-performance computing resources. Take, for example, training an image recognition model (dataset: ImageNet-22K; network: ResNet-101): 256 GPUs, 7 hours, ~450 kWh; 4 GPUs, 16 days, ~385 kWh. One model training run is ~2 weeks of home energy consumption. https://arxiv.org/abs/1708.02188
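The two training configurations on this slide can be compared with simple arithmetic. The sketch below uses only the figures quoted on the slide (GPU counts, durations, and approximate energy); the function names are hypothetical, and the derived per-GPU power is an average implied by those rounded figures rather than a measured value.

```python
def gpu_hours(num_gpus, hours):
    """Total GPU-hours consumed by a run."""
    return num_gpus * hours


def average_kw_per_gpu(energy_kwh, num_gpus, hours):
    """Average power per GPU implied by a run's total energy."""
    return energy_kwh / gpu_hours(num_gpus, hours)


# Figures from the slide: 256 GPUs for 7 hours vs. 4 GPUs for 16 days.
large_run_hours = gpu_hours(256, 7)          # 1792 GPU-hours
small_run_hours = gpu_hours(4, 16 * 24)      # 1536 GPU-hours
large_run_kw = average_kw_per_gpu(450.0, 256, 7)
small_run_kw = average_kw_per_gpu(385.0, 4, 16 * 24)
extra_energy = 450.0 / 385.0 - 1.0           # relative energy cost of scaling out
print(large_run_hours, small_run_hours)
print(round(large_run_kw, 3), round(small_run_kw, 3), round(extra_energy, 3))
```

Both runs imply roughly a quarter of a kilowatt per GPU on average; the 256-GPU run finishes far sooner in wall-clock time but uses about 17% more energy in total.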
  • 30. AI demand keeps surging: training requirements are doubling every 3.5 months. Sources: Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and Policy Considerations for Deep Learning in NLP. CoRR abs/1906.02243 (2019). arXiv:1906.02243. Roy Schwartz, Jesse Dodge, Noah A. Smith, and Oren Etzioni. 2019. Green AI. arXiv:1907.10597 [cs.CY]
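To put the doubling time on this slide into annual terms, assume (as a simplification) that the 3.5-month doubling holds steadily over a full year. With a doubling time of $T_d$ months, the growth factor over $t$ months is:

```latex
\[
  g(t) = 2^{\,t / T_d},
  \qquad
  g(12) = 2^{\,12 / 3.5} \approx 2^{3.43} \approx 10.8
\]
```

Under that assumption, training requirements grow by roughly an order of magnitude per year.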
  • 31. The emergence of foundation models. Homogenization: a broad foundation model is adapted to perform specific tasks. Almost all state-of-the-art NLP models are now adapted from one of a few foundation models, such as BERT, RoBERTa, BART, T5, etc. Multi-modal and cross-domain models are next. Source: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, et al. 2022. On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs.LG]
  • 32. Sizes of Language Models; Training Cost of Language Models. Large language models are getting larger: GPT-3 needs 1024 A100 GPUs for 34 days of training! Some say that this is okay because the models are re-used for multiple tasks*. This claim has yet to be substantiated by a sound analysis. *E.g., David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean. 2021. Carbon Emissions and Large Neural Network Training.
  • 33. The Data Scientist's Dilemma: to adapt or not to adapt • Adapt from a broad model, or train a smaller model on a more specific data set? • How much data to use? • Can I synthesize a few smaller models? • Neural architecture search? Hyperparameter optimization? Is it worth the cost? Well, it depends. • What is the optimal frequency of re-training: daily or weekly? What data should I use for re-training: incremental or complete?
  • 34. Sustainable AI platform principles. Transparency: dynamically track energy and carbon across the data and model life cycle. Traceability and governance: track the ‘supply chain’ of models and data-sets and the associated energy and carbon. Energy efficiency: innovation across all layers of the stack. Meaningful metrics.
  • 35. Meaningful Metrics categories. Products: data-set, model. Core metric: construction (pre-training) and operation (re-training, inference). Life-cycle: factor in the provenance of models and data-sets and their associated energy and carbon footprint (Life-Cycle-Assessment principles). Efficiency: efficiency = cost / work produced. What goes into ‘cost’? Compute for inference, + training, + bill of material ‘tax’. [Diagram labels: D, FM, M.]
  • 36. A holistic approach to Sustainable AI: factor in the entire life cycle of models; sustainable strategy exploration and what-if analysis; provenance, governance, and reporting; holistic impact analysis and tradeoff-based planning; AI sustainability metrics.
  • 38. The life cycle of a model as a state machine. Each ‘state transition’ is associated with a significant energy/carbon cost and involves critical decisions that will affect the cost of this and downstream tasks. • Tradeoffs between accuracy, time-to-value, and energy/carbon • The cost of one phase may depend on decisions taken at a prior stage: save now, pay later. • The particulars of the target task are important to factor in early on.
  • 39. On-line, fine-grained monitoring of energy and carbon with Kepler • An open-source project pioneered by Red Hat and IBM Research to quantify the energy and carbon of cloud-native applications. • On the roadmap to deliver in OCP and integrate into ROSA. • Adrian Cockcroft is advocating the use of Kepler across all cloud providers: “Real Time Energy and Carbon Standard for Cloud Providers”.
  • 40. SusQL: context-aware aggregation and energy accounting. Infrastructure: a Kubernetes controller with its own CRD that gets data from Kepler for aggregation. The susql-controller maintains a map[labels] -> energy table. LabelGroup resource: apiVersion: …; kind: LabelGroup; metadata: …; spec: labels: [<label-1>, <label-2>, <label-3>, <label-4>]; status: totalEnergy: <total energy>
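The label-based aggregation described on this slide amounts to summing energy over every workload that carries all labels in a group. The sketch below illustrates that idea only; it is not SusQL's controller logic. The function name, the representation of labels as plain strings, and the sample pods and joule values are all hypothetical.

```python
def aggregate_energy(pod_samples, label_group):
    """Sum the energy of pods whose labels include every label in the group.

    pod_samples: list of (labels, energy_joules) pairs, one per pod.
    label_group: the labels that define the group (all must match).
    """
    wanted = set(label_group)
    return sum(energy for labels, energy in pod_samples if wanted <= set(labels))


# Hypothetical per-pod energy readings, as might be derived from Kepler metrics.
pods = [
    (["team=nlp", "job=train", "model=a"], 1200.0),
    (["team=nlp", "job=train", "model=b"], 800.0),
    (["team=nlp", "job=serve", "model=a"], 300.0),
    (["team=vision", "job=train", "model=c"], 500.0),
]
total_nlp_training = aggregate_energy(pods, ["team=nlp", "job=train"])
total_model_a = aggregate_energy(pods, ["model=a"])
print(total_nlp_training, total_model_a)
```

The same pod can count toward several groups, which is what makes the aggregation context-aware: the grouping follows whichever labels matter for a given question, such as a team, a model, or a training job.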
  • 42. A ‘Supply Chain’ of models. Models are created (‘manufactured’), distilled, fine-tuned, and re-used (adapted) to create new models. Deployment is just the beginning of the journey. How do we reason about the life-cycle cost of models?
  • 43. Product Life Cycle Assessment principles for Sustainable AI. Products = data-set | model. We need to factor in the cost of the bill of materials used in the creation of a new model: if B (a product or a service) is used in the process of creating A1, A2, …, An, then the carbon cost of B is inherited by A1, A2, …, An in proportion to their use.
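The inheritance rule on this slide can be made concrete with a small allocation function. This is a sketch under assumptions: the function name is hypothetical, the sample carbon cost is invented for illustration, and how "use" is measured (for example, number of adaptation runs or requests) is left open by the deck.

```python
def inherited_carbon(base_cost_kg, uses):
    """Allocate the carbon cost of a shared product B to its derived products.

    base_cost_kg: carbon cost of B (for example, a pre-trained base model).
    uses: derived product name -> amount of use of B, in any consistent unit.
    Returns each derived product's inherited share, proportional to its use.
    """
    total_use = sum(uses.values())
    if total_use <= 0:
        raise ValueError("total use must be positive")
    return {name: base_cost_kg * use / total_use for name, use in uses.items()}


# Hypothetical: a base model costing 500 kg CO2e, used by two derived models.
shares = inherited_carbon(500.0, {"A1": 3, "A2": 1})
print(shares)
```

A derived model's full life-cycle cost would then be its own construction and operation cost plus this inherited share, which is the bill-of-material ‘tax’ idea from the metrics discussion.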
  • 46. Efficiency at every layer of the AI stack • Every layer of the FM stack offers opportunities for efficiency gains. Model: quantization, architecture innovation. Tools: dynamic batching. Platform: multiplexing, dispatching. Infrastructure: DVFS, power parameter optimization, caching. Systems: approximate computing and other system innovations. • Empower the data scientist to make choices and explore tradeoffs between accuracy, performance, and energy. • Empower the data scientist to reason about life-cycle strategies, e.g., if/what/when to re-use, and how much to retrain.
  • 48. IBM Research’s Artificial Intelligence Unit (AIU): a chip architecture optimized for enterprise AI workloads. Enabled for foundation models. Enabled in the Red Hat software stack. Supports multi-precision inference (and training): FP16, FP8, INT8, INT4, INT2. Implemented in leading-edge 5nm technology. The SoC implements IBM’s leadership innovations in low-precision AI arithmetic and algorithms. https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu IBM Research AI Hardware Center / © 2023 IBM Corporation
  • 49. Vision for AI Performance Scaling: scaling precision for quadratic gains in performance with iso-accuracy. • Applying approximate computing techniques to AI compute • Critical requirement: maintain model accuracy • Advantage: quadratic improvement in performance • IBM Research has been at the forefront of every major technical advancement in bit-precision scaling: 16-bit training (ICML 2015); 8-bit training (NeurIPS 2018, 2019); 4-bit training (X. Sun et al., NeurIPS 2020); 2/4-bit inference (2018-2020), with 4-bit inference in J. Choi et al., https://arxiv.org/pdf/1805.06085.pdf and J. McKinstry et al., https://arxiv.org/abs/1809.04191, and 2-bit inference in J. Choi et al., SysML 2019 • Complemented by sparsity support, analog computing, and 3D stacking. [Chart: bit precision of digital AI cores for training and inference, 2012 to 2024, stepping down from 32-bit to 2-bit.] https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu https://research.ibm.com/blog/ai-chip-precision-scaling
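To make the idea of reduced precision concrete, the sketch below shows generic symmetric 8-bit quantization of a list of values. It is a textbook-style illustration, not the method used in IBM's chips or in the papers cited on the slide; the function names and sample values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in quantized]


# Hypothetical weights: each now needs 8 bits instead of 32.
weights = [0.0, 0.31, -1.0, 1.0, 0.77]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, worst_error)
```

The rounding error per value is bounded by half the scale step, which is the kind of small, controlled approximation that lower-precision arithmetic trades for less data movement and cheaper computation.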
  • 51. Vela: A Cloud Native Supercomputer for the Foundation Model Age (Kepler inside). How do you evolve from a specialized (monolithic), costly, and inflexible HPC stack to a cloud-native stack without compromising efficiency? Programmability, scalability, re-use, observability, agility, democratization. System specifications: nodes with 8 x A100 GPUs (80GB); GPUs interconnected with NVLink and NVSwitch; Cascade Lake CPUs and 1.5TB of DRAM; four 3.2TB NVMe drives; redundant connections between nodes, TORs, and spines; 2 x 100G NICs from each node; NCCL benchmarks show we drive close to line rate. Cloud-native design: configure resources through software (APIs); broad ecosystem of available cloud services; leverage data sets on Cloud Object Store; standard, flexible, scalable infrastructure design (vs. traditional HPC); near bare-metal performance (within 5%, single node). https://research.ibm.com/blog/AI-supercomputer-Vela-GPU-cluster
  • 53. Dispatching of jobs based on renewable energy. Motivation: • The carbon intensity of the energy mix in different regions of IBM data centers varies over time. • Renewable energy is not available all the time and in all places. Workload optimization: placement and scheduling of workloads based on carbon-free energy availability. Ideal dispatching: high CPU utilization when carbon intensity is low and low CPU utilization when carbon intensity is high. Challenge: ideal dispatching might be practically infeasible. • Short jobs may have short deadlines. • Some jobs are not interruptible. • Jobs have heterogeneous resource demands. Obtaining the optimal packing is intractable. Reference: T. Bahreini, A. Tantawi and A. Youssef, "An Approximation Algorithm for Minimizing the Cloud Carbon Footprint through Workload Scheduling," 2022 IEEE 15th International Conference on Cloud Computing (CLOUD), 2022, pp. 522-531.
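The simplest version of the scheduling problem on this slide is a single non-interruptible job that must finish by a deadline. The sketch below searches every feasible start slot and picks the window with the lowest total carbon intensity. It is a toy illustration, not the approximation algorithm from the cited paper; the function name, the hourly forecast values, and the single-job, single-data-center setting are assumptions.

```python
def best_start_slot(carbon_intensity, duration, deadline):
    """Choose the start slot that minimizes carbon for one uninterruptible job.

    carbon_intensity: forecast carbon intensity per time slot (e.g. g CO2/kWh).
    duration: number of consecutive slots the job needs.
    deadline: the job must finish by the end of slot index deadline - 1.
    Returns (start_slot, summed_intensity_over_the_window).
    """
    if duration <= 0 or duration > deadline or deadline > len(carbon_intensity):
        raise ValueError("infeasible duration or deadline")
    best_start, best_cost = 0, None
    for start in range(deadline - duration + 1):
        cost = sum(carbon_intensity[start:start + duration])
        if best_cost is None or cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost


# Hypothetical hourly forecast; the job needs 2 hours and must end within 5.
forecast = [400, 300, 100, 120, 350, 500]
start, cost = best_start_slot(forecast, duration=2, deadline=5)
print(start, cost)
```

With many jobs, heterogeneous resource demands, and several data centers, exhaustive search stops being practical, which is the intractability the slide points to.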
  • 54. In a single data center: how to order batch jobs for minimum carbon while meeting deadlines. Polynomial approximation algorithms that work across data centers (space × time).
  • 58. Call to action. AI platform providers: build in transparency and governance; incorporate platform and system innovation for efficiency. Academia and industry: focus your research on efficiency, not just accuracy. Data scientists and practitioners: develop a sustainability mind-set; re-use where it makes sense; domain-specific, smaller models are better; explore tradeoffs (accuracy vs. cost).
  • 59. IBM Research. Locations: Tokyo, Shin-Kawasaki, Delhi, Bangalore, Singapore, Nairobi, Haifa, Zurich, Warrington, Dublin, Cambridge, Albany, Yorktown, Almaden, Rio de Janeiro, Sao Paulo, Johannesburg. 6 Nobel Laureates, 10 Medals of Technology, 5 National Medals of Science, 6 Turing Awards.

Editor's Notes

  1. energy-efficiency throughout the memory hierarchy. Since different tasks require a different system composition for best utilization, data centers need to be rearchitected in the future using disaggregation and composability. This allows flexible composition and system configuration to optimally serve a particular task. Considering that the various technical components (CPUs, GPUs, memory, and storage) have different lifecycles, disaggregation additionally improves system performance and reduces cost, as they can be replaced separately. Common memory systems for AI/ML applications include on-chip memory, high bandwidth memory (HBM), and GDDR, and all have different architectural implications. A universal goal is to realize memory technology with much higher bandwidth and lower latency, while consuming less energy. While HBM DRAMs are already very power-efficient, roughly 2/3 of the power budget is still spent moving data between an SoC and the DRAM (Figure 2.4). Reducing the volume of data moved provides an opportunity for large improvement, but this requires further research. Different concepts for disaggregation of memory and storage have already been proposed, but more research is needed to identify the best way to use disaggregation to achieve TCO benefits at scale and improve latency. To generate these benefits, a multi-tiered memory approach that includes the use of storage-class memories is needed. The new architectures can pose a challenge but can also provide an opportunity for application development. The impact on legacy code needs to be understood and mitigated.
  2. Foundation models have led to an unprecedented level of homogenization: Almost all state-of- the-art NLP models are now adapted from one of a few foundation models, such as BERT, RoBERTa, BART, T5, etc.
  3. Training GPT-3, which is a single general-purpose AI program that can generate language and has many different uses, took 1.287 gigawatt hours, according to a research paper published in 2021, or about as much electricity as 120 US homes would consume in a year. That training generated 502 tons of carbon emissions, according to the same paper, or about as much as 110 US cars emit in a year. That’s for just one program, or “model.” While training a model has a huge upfront power cost, researchers found in some cases it’s only about 40% of the power burned by the actual use of the model, with billions of requests pouring in for popular programs. Plus, the models are getting bigger. OpenAI’s GPT-3 uses 175 billion parameters, or variables, that the AI system has learned through its training and retraining. Its predecessor used just 1.5 billion. Another relative measure comes from Google, where researchers found that artificial intelligence made up 10 to 15% of the company’s total electricity consumption, which was 18.3 terawatt hours in 2021. That would mean that Google’s AI burns around 2.3 terawatt hours annually, about as much electricity each year as all the homes in a city the size of Atlanta. https://www.bloomberg.com/news/articles/2023-03-09/how-much-energy-do-ai-and-chatgpt-use-no-one-knows-for-sure
  4. Packaged as a PCIe card, for ease of integration into virtually any on-premises or cloud system. Integration into the IBM Watson software stack is underway, to power the AI inference infrastructure of IBM Research’s Foundation Model Big Bet. Enabled in the Red Hat software stack, including PyTorch and TensorFlow integration.
  5. we can drop from 32-bit floating point arithmetic to bit formats holding a quarter as much information. This simplified format dramatically cuts the amount of number crunching needed to train and run an AI model, without sacrificing accuracy. We leverage key IBM breakthroughs from the last five years to find the best tradeoff between speed and accuracy. This is not a chip we designed entirely from scratch. Rather, it’s the scaled version of an already proven AI accelerator built into our Telum chip that powers the IBM z16 system.
  6. So, we asked ourselves: how do we deliver bare-metal performance inside of a VM? Following a significant amount of research and discovery, we devised a way to expose all of the capabilities on the node (GPUs, CPUs, networking, and storage) into the VM so that the virtualization overhead is less than 5%, which is the lowest overhead in the industry that we’re aware of. This work includes configuring the bare-metal host for virtualization with support for Virtual Machine Extensions (VMX), single-root IO virtualization (SR-IOV), and huge pages. We also needed to faithfully represent all devices and their connectivity inside the VM, such as which network cards are connected to which CPUs and GPUs, how GPUs are connected to the CPU sockets, and how GPUs are connected to each other. These, along with other hardware and software configurations, enabled our system to achieve close to bare-metal performance. Bare metal vs. VMs || Ethernet vs. InfiniBand || OpenShift scheduling (MCAD) vs. LSF. Enabling SR-IOV for our network interface cards on each node, thereby exposing each 100G link directly into the VMs via virtual functions. We can hide the communication time over the network behind compute time occurring on the GPUs. This approach is aided by our choice of GPUs with 80 GB of memory (discussed above), which allows us to use bigger batch sizes (compared to the 40 GB model), and leverage the Fully Sharded Data Parallel (FSDP) training strategy more efficiently. Next we’ll be rolling out an implementation of remote direct memory access (RDMA) over converged Ethernet (RoCE) at scale and GPU Direct RDMA (GDR), to deliver the performance benefits of RDMA and GDR while minimizing adverse impact to other traffic. Our lab measurements indicate that this will cut latency in half.
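
The idea in the note above, hiding network communication behind GPU compute, can be illustrated with a small timing model. This is a minimal sketch under stated assumptions, not the scheduling logic of the system described: the function names, the per-chunk timings, and the single serial network channel are all hypothetical. A training step is treated as a sequence of chunks, each with a compute time and a communication time, and a chunk's communication may start only once its own compute has finished.

```python
def step_time_serial(compute_ms, comm_ms):
    """Step time when every transfer blocks the GPU (no overlap)."""
    return sum(compute_ms) + sum(comm_ms)


def step_time_overlapped(compute_ms, comm_ms):
    """Step time when transfers run in the background on one network channel.

    Assumption: chunk i's transfer can start only after chunk i's compute
    is done and after the previous transfer has released the channel.
    """
    compute_done = 0.0   # time at which the GPU finishes the current chunk
    network_free = 0.0   # time at which the network channel becomes idle
    for compute, comm in zip(compute_ms, comm_ms):
        compute_done += compute
        start = max(compute_done, network_free)
        network_free = start + comm
    # The step ends when both the GPU and the network are finished.
    return max(compute_done, network_free)


if __name__ == "__main__":
    # Hypothetical timings in milliseconds for four chunks of one step.
    small_batch = ([30, 30, 30, 30], [20, 20, 20, 20])
    # Assumption: doubling the batch doubles compute per chunk while the
    # amount of data exchanged per chunk stays the same.
    large_batch = ([60, 60, 60, 60], [20, 20, 20, 20])

    for name, (compute, comm) in (("small", small_batch), ("large", large_batch)):
        serial = step_time_serial(compute, comm)
        overlapped = step_time_overlapped(compute, comm)
        exposed = overlapped - sum(compute)  # communication the GPU waits for
        print(f"{name} batch: serial={serial} ms, "
              f"overlapped={overlapped} ms, exposed comm={exposed} ms")
```

In this toy model only the final chunk's transfer remains exposed once compute per chunk is at least as long as communication per chunk. A larger batch lengthens compute while leaving the transfer size unchanged, which is one way to read the note's remark that larger GPU memory makes the overlap easier to achieve.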