AI-Sustainability.pptx

AI & Sustainability
Dr. Tamar Eilam
IBM Fellow, Chief Scientist Sustainable Computing, IBM Research
Generated by Dall-E

Approximate and partial list of contributors, in arbitrary order
Energy modeling and quantification: Marcelo Amaral, Huamin Chen, Tatsuhiro Chiba, Rina Nakazawa, Sunyanan Choochotkaew, Eun K Lee, Umamaheswari Devi, Aanchal Goyal
Workload Classification: Xi Yang, Rohan R Arora, Chandra Narayanaswami, Cheuk Lam, Jerrold Leichter, Yu Deng, Daby Sow
Energy Aware Optimization: Tatebeh Bahreini, Asser Tantawi, Alaa Youssef, Chen Wang
AI System: Jeffrey Burns, Leland Chang, Ankur Agrawal, Kailash Gopalakrishnan, Pradip Bose
AI Quantification and Metric: Pedro Bello-Maldonado, Bishwaranjan Bhattacharjee, Carlos Costa
AI Infrastructure Innovation: Seelam Seetharami
Model Architecture Innovation: David Cox, Rameswar Panda, Rogerio Feris, Leonid Karlinsky

The Climate Impact Chain
Human activity → Increased greenhouse gas (GHG) in atmosphere → Global warming → Global climate change → Physical & biological impact → Human socio-economic impact
$150 billion
Average cost in damages per year
100M+
Increase in population facing hunger
IBM Research | © 2022 IBM Corporation

Mitigation
• Carbon Capture
• Geo-engineering
• Reduce Carbon Emission
  • Sustainable Computing
Adaptation
AI
Harness the power of AI to fight climate change
• material discovery
• climate and risk monitoring
We also have to mitigate its effect on the environment

Part 1: Sustainable Computing

What is Sustainable Computing?
The ability to measure, quantify, and ultimately reduce the carbon footprint at every layer of the computing stack, within and across data centers, and across the entire life cycle.

The Computer Energy Problem
We are at an inflection point:
1. Demand is growing at exponential scale
   • Some predict that electricity consumed by data centers will increase to 8% by 2030
   • "How to stop data centers from gobbling up the world's electricity": https://www.nature.com/articles/d41586-018-06610-y
2. The emergence of energy-demanding workloads (AI)
   • AI power consumption doubles every 3-4 months*
3. The end of Dennard scaling means we can't keep up
   • Golden Era for Chip Design
* Green AI, R. Schwartz, J. Dodge, N. A. Smith, O. Etzioni, 2019

Ever-rising energy demand for computing, set against global energy production, is creating new risks and new opportunities for radically different computing that drastically improves efficiency.
31% a year: the energy consumption growth trend for hyperscalers in North America
>10% of the world's power will be consumed by hyperscalers by 2030

Sustainable Computing epochs

Making the Current State More Sustainable
• Understanding the As-Is
• Hot Spot Detection
• Remediation and Optimization
• Coupling Power and Cloud
• Cooling, Data Center Planning, etc.
• Storage AutoTiering

Introducing Accelerators (Digital) & Hardware and Software co-design / co-optimization
• HW and SW co-design (scalable approach)
• Reduced precision chips: 8-bit precision approximate computing
• Voltage scaling with error correction
• Runtime management of disaggregated & composable heterogeneous DC

New Computational Models (beyond digital)
• New computational models that completely break the relationship between energy & computation: neuromorphic, analog AI, data-centric, quantum, etc.

https://research.ibm.com/blog/telum-processor
https://www.esp.cs.columbia.edu
https://research.ibm.com/blog/the-hardware-behind-analog-ai
https://www.zurich.ibm.com/sto/memory/

Modeling the Data Center Carbon Footprint

Carbon Footprint = IT Equipment Energy x Power Usage Effectiveness x Carbon Intensity
$\mathrm{CFP} = E_{\mathrm{IT}} \times \mathrm{PUE} \times \mathrm{CI}$

IT Equipment Energy ($E_{\mathrm{IT}}$)
[Figure: an example DC energy breakdown]

Power usage effectiveness (PUE)
A predominant metric used to measure the energy efficiency of a data center.
PUE = (Total Facility Energy) / (IT Equipment Energy)
Efficiency improves as the quotient decreases towards 1.
1 is optimal, 2 is very bad.

Carbon Intensity (CI)
The emission rate: grams of carbon dioxide released per unit of energy produced (for example per megajoule or per kWh).
• With coal power stations, the carbon intensity is high, as CO2 is produced as part of the power generation process. Carbon intensity is >1 kg/kWh for coal.
• Renewable energy sources such as hydro or solar produce almost no emissions, so their carbon intensity is very low. Carbon intensity is ~0 for solar/wind.

Total Carbon Footprint (CFP)
The total amount of carbon dioxide (CO2) and equivalent greenhouse gas emissions associated with powering a data center.
CFP >= 0.
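The model above is a product of three factors, so a short worked example may help. The sketch below is illustrative only: the function name, the 1000 kWh of IT energy, and the PUE of 1.5 are assumed values, not figures from the slides; the carbon intensities follow the slide's rough numbers for coal and solar/wind.

```python
def carbon_footprint_kg(it_energy_kwh: float, pue: float, ci_kg_per_kwh: float) -> float:
    """CFP = E_IT x PUE x CI, returned in kg of CO2-equivalent."""
    if it_energy_kwh < 0 or ci_kg_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1 (facility energy includes IT energy)")
    return it_energy_kwh * pue * ci_kg_per_kwh


# Assumed example: 1000 kWh of IT energy in a facility with PUE 1.5.
coal_cfp = carbon_footprint_kg(1000.0, 1.5, 1.0)   # coal, about 1 kg/kWh
solar_cfp = carbon_footprint_kg(1000.0, 1.5, 0.0)  # solar/wind, about 0
print(coal_cfp, solar_cfp)
```

The same IT load moves from roughly 1500 kg to roughly zero when only the carbon intensity changes, which is why the three factors can be attacked independently.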

Reducing the Data Center Carbon Footprint: Research Opportunities

Carbon Footprint = IT Equipment Energy x Power Usage Effectiveness x Carbon Intensity
$\mathrm{CFP} = E_{\mathrm{IT}} \times \mathrm{PUE} \times \mathrm{CI}$
(when heat reuse is counted, Energy Reuse Effectiveness, ERE, takes the place of PUE)

• Data Center Design, Cooling and Heat-Reuse
• Rack Design to optimize power conversion, and direct liquid cooling
• Improving power conversion in the data center
• Energy Aware Scheduling, Vertical Scaling, Dispatching
• Power Management
• Chip Design
• Dispatching of batch workloads such as AI Training Jobs across time and space to maximize renewable energy use
• Forecasting of renewable energy (time series composition)
• Can the cloud sense renewable energy and adapt?

https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu
https://www.zurich.ibm.com/st/energy_efficiency/zeroemission.html
https://research.ibm.com/blog/northpole-ibm-ai-chip

Carbon Assessment & Reduction Framework
An Approach for Sustainable Computing
• Estimate: Energy and CFP per workload, tenant, VM, container, service, etc.
• Assess: Identify hotspots and applicable strategies. Calculate potential savings.
• Act: A set of controllers to dynamically optimize the carbon footprint at operation. Design efficient systems.
• Report: Report CFP across your entire organization in a consistent fashion, factoring in requirements.

Energy Quantification Challenge
• How do you estimate the power consumption of applications running on shared servers?
  => ratio based approach
• How do you do that when you do not have on-line power measurement at the server level?
  => power modeling
• How do you do that if you do not know what else is running on the machine?
  => dynamic power estimation only
• How do you scale the approach to developing power models (combinatorial explosion problem)?
The Kepler Project
https://github.com/sustainable-computing-io/kepler

Kepler Architecture
• eBPF metrics: hardware counters, CPU time and soft IRQ
• System Power metrics from BMs and VMs
• Ratio Power Model for containers
• Trained Power Model to estimate the VM's component power consumption
[1] https://github.com/sustainable-computing-io/kepler

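Kepler Deployment Approaches
- Ratio Power Model for Dynamic CPU Power
With hardware counters:
$\mathrm{DynPower}_{\mathrm{process}\,i} = \dfrac{\text{CPU cycles}_{\mathrm{process}\,i}}{\sum \text{CPU cycles}} \times \mathrm{DynPower}_{\mathrm{host\_CPU}}$
Without hardware counters:
$\mathrm{DynPower}_{\mathrm{process}\,i} = \dfrac{\text{BPF CPU time}_{\mathrm{process}\,i}}{\sum \text{BPF CPU time}} \times \mathrm{DynPower}_{\mathrm{host\_CPU}}$
Per container:
$\mathrm{DynPower}_{\mathrm{container}\,j} = \sum_{i \in j} \mathrm{DynPower}_{\mathrm{process}\,i}$
- Even distribution of Idle Power
$\mathrm{IdlePower}_{\mathrm{container}\,j} = \dfrac{\mathrm{IdlePower}_{\mathrm{host\_CPU}}}{\mathrm{numContainers}}$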
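The ratio model above can be sketched in a few lines. This is an illustration of the formulas on the slide, not Kepler's actual code: the function name, the dictionary layout, and the sample numbers are assumptions made for the example. The usage values can be CPU cycles or BPF CPU time, since only the ratio matters.

```python
from collections import defaultdict


def container_power(process_usage, process_container, host_dyn_power_w, host_idle_power_w):
    """Split host CPU power across containers.

    process_usage: pid -> CPU cycles (or BPF CPU time) in the sampling window.
    process_container: pid -> container name.
    Dynamic power is split by usage ratio; idle power is split evenly.
    """
    total_usage = sum(process_usage.values())
    dynamic = defaultdict(float)
    for pid, usage in process_usage.items():
        share = usage / total_usage if total_usage > 0 else 0.0
        # A container's dynamic power is the sum over its processes.
        dynamic[process_container[pid]] += share * host_dyn_power_w

    containers = set(process_container.values())
    idle_each = host_idle_power_w / len(containers) if containers else 0.0
    return {c: {"dynamic_w": dynamic[c], "idle_w": idle_each} for c in containers}


# Hypothetical sample: two containers, three processes.
usage = {101: 600, 102: 200, 201: 200}
owner = {101: "web", 102: "web", 201: "db"}
result = container_power(usage, owner, host_dyn_power_w=100.0, host_idle_power_w=40.0)
print(result)
```

Here "web" receives 80% of the dynamic power because its processes account for 80% of the counted usage, while idle power is shared equally regardless of activity.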

Kepler Model Server Project
Facilitates training a power model for servers without a power meter

[Diagram: server with power meter] Bare-metal (BM): RAPL, ACPI/Sensors, Redfish/IPMI, GPU (nvml) → Kepler → Ratio Power Model → Process/Container Power Consumption
[Diagram: server without power meter] Bare-metal (BM) or Virtual Machine (VM): Kepler Model Server → Trained Power Model → Estimated System Power Metrics → Kepler → Ratio Power Model → Process/Container Power Consumption

Motivation:
• No power measurement exposed or instrumented in some running systems
Challenges:
• No data, or not enough data, to train a power model specific to all available metrics and emerging system platforms and settings (e.g., variety of CPU architecture, frequency governor)
• Dynamicity of control plane processes

Collect Data → Train Model → Export Model → Serve Model → Estimate Power

Core of the Kepler model server
Pipeline Framework (one extractor, one isolator, multiple trainers)
Prometheus query result → Extract → Extracted data → Isolate → Isolated data → Train → Power models
• Inputs: energy metric, energy-related metric(s)
• Node-level Train: with background power
• Container-level Train: without background power
https://www.cncf.io/blog/2023/10/11/exploring-keplers-potentials-unveiling-cloud-application-power-consumption/
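A toy version of the extract, isolate, train flow may make the difference between the two trainers concrete. Everything below is an assumption for illustration: the one-feature least-squares fit stands in for whatever trainers the model server actually uses, the background power is taken as a known constant of 50 W, and the sample readings are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def isolate(power_w, background_w):
    """Remove the background (idle and control plane) share from each power sample."""
    return [p - background_w for p in power_w]


# "Extracted" data: hypothetical CPU time per interval and measured node power.
cpu_time = [0.0, 1.0, 2.0, 3.0]
node_power = [50.0, 60.0, 70.0, 80.0]

# Node-level model keeps the background power, so it shows up as the intercept.
node_model = fit_line(cpu_time, node_power)

# Container-level model is trained on isolated data, without background power.
container_model = fit_line(cpu_time, isolate(node_power, background_w=50.0))
print(node_model, container_model)
```

Both models learn the same slope; only the node-level model carries the background power, which is what lets the container-level model attribute dynamic power alone.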

The Issue with Third-Party Clouds
• No server power metric available
• No knowledge of what else is running on my machine
  • How to split idle power?
• Limited knowledge of the architecture and configuration of the bare metal servers
  • Challenge for applying separately trained power models
• All Cloud Native calculators are too coarse-grained to be useful for optimization
Generated with Dall-E

How can we get to real-time monitoring of application carbon consumption in third-party clouds?
Adrian Cockcroft: https://adrianco.medium.com/proposal-for-a-realtime-carbon-footprint-standard-60b71c269948
Consistent, Trustworthy, Transparent, Explainable
Can Kepler help?
What else do we need?
WIP: Reference Implementation to be Open Sourced.


Workload Classification: Motivation
Detect non-productive workloads
• Virtual Machines
• Cloud-native deployments
• Cloud services
Can schedules be drawn up for a few (if not all) productive workloads?

Methodology
Metrics → Phase Abstraction (inactive/active phases) → Workload Classification → Recommendation
Input: metrics for VM1, VM2, …, VMN over an observation window from T − w_c to T (7/14/21 days)
Workload classes:
• Non-productive: Remaining in the Inactive Phase
• Constantly Productive: Remaining in the Active Phase
• Alternating: Switching between the two Phases (repeatable or non-repeatable)
Recommendations: Candidate for Termination, Candidate for Parking, Workload Timetabling, No Action
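The three classes above follow directly from the phase abstraction, which the sketch below illustrates. The 5% utilization threshold, the function name, and the sample series are assumptions for the example; the slides do not say how the active and inactive phases are detected.

```python
def classify_workload(utilization, active_threshold=0.05):
    """Classify a workload from its utilization samples over the observation window.

    A sample above the threshold counts as the active phase, otherwise inactive.
    """
    if not utilization:
        raise ValueError("need at least one sample")
    phases = [u > active_threshold for u in utilization]
    if not any(phases):
        return "non-productive"         # remained in the inactive phase
    if all(phases):
        return "constantly productive"  # remained in the active phase
    return "alternating"                # switched between the two phases


idle_vm = classify_workload([0.01, 0.02, 0.01, 0.00])
busy_vm = classify_workload([0.60, 0.75, 0.55, 0.80])
office_hours_vm = classify_workload([0.01, 0.70, 0.65, 0.02])
print(idle_vm, busy_vm, office_hours_vm)
```

Deciding whether an alternating workload is repeatable would need an additional periodicity check on the phase sequence, which is left out here.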


CARE: Carbon Quantification & Reduction
A coordinated set of controllers to dynamically quantify and optimize the carbon footprint at every level of the hybrid cloud stack, within and across on- and off-premises data centers.
Controllers: Dynamic dispatching, Container Right-Sizing, Energy aware scheduler, VM placement, Power management
$\mathrm{CFP} = E_{\mathrm{IT}} \times \mathrm{PUE} \times \mathrm{CI}$
• Leverage renewable energy when and where it is available across datacenters.
• Efficiency with container resource consumption within a datacenter.
• Efficient infrastructure with VM and power management.

Part 2: AI Sustainability
Generated by Dall-E

The energy cost of AI
• Deep learning is computationally intensive
• Time consuming even with high-performance computing resources
Take for example: training an image recognition model
Dataset: ImageNet-22K
Network: ResNet-101
• 256 GPUs, 7 hours: ~450 kWh
• 4 GPUs, 16 days: ~385 kWh
1 model training run is ~2 weeks of home energy consumption
https://arxiv.org/abs/1708.02188
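A quick sanity check of the two training runs above: dividing each energy figure by its GPU-hours gives the implied average draw per GPU. The helper name is made up for this example, and the result includes whatever share of system power was counted in the reported kWh, so it is an implied average rather than a measured GPU figure.

```python
def avg_power_per_gpu_kw(energy_kwh: float, gpus: int, hours: float) -> float:
    """Implied average power per GPU: energy divided by GPU-hours."""
    return energy_kwh / (gpus * hours)


large_run = avg_power_per_gpu_kw(450.0, gpus=256, hours=7.0)       # 1792 GPU-hours
small_run = avg_power_per_gpu_kw(385.0, gpus=4, hours=16 * 24.0)   # 1536 GPU-hours
print(round(large_run, 3), round(small_run, 3))
```

Both runs come out at roughly 0.25 kW per GPU, so scaling out shortens the wall-clock time from 16 days to 7 hours but does not reduce the energy; the larger run uses somewhat more in total.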

AI demand keeps surging
Training requirements are doubling every 3.5 months
Source: Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and
Policy Considerations for Deep Learning in NLP. CoRR abs/1906.02243 (2019).
arXiv:1906.02243
Source: Roy Schwartz, Jesse Dodge, Noah A. Smith, and Oren Etzioni. 2019. Green AI.
arXiv:1907.10597 [cs.CY]

The emergence of foundation models
Homogenization: a broad foundation model is adapted to perform specific tasks.
Almost all state-of-the-art NLP models are now adapted from one of a few foundation models, such as BERT, RoBERTa, BART, T5, etc.
Multi-modal and cross-domain models are next.
Source: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, et al. 2022. On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs.LG]

Sizes of Language Models / Training Cost of Language Models
Large language models are getting larger
GPT-3 needs 1024 A100 GPUs for 34 days for training!
Some say that this is okay, because they are re-used for multiple tasks*
This claim is yet to be substantiated by a sound analysis
* E.g., David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean. 2021. Carbon Emissions and Large Neural Network Training.

Data Scientist Dilemma: to adapt or not to adapt
• To adapt from a broad model, or to train a smaller model on a more specific data set?
• How much data to use?
• Can I synthesize a few smaller models?
• Neural Architecture Search? Hyper Parameter Optimization? Is it worth the cost? Well, it depends…
• What is the optimal frequency of re-training? Daily? Weekly?
  What data shall I use for re-training? Incremental? Complete?

Sustainable AI platform principles
• Transparency: dynamically track energy and carbon across the data and model life cycle
• Traceability and Governance: track the ‘supply chain’ of models and data-sets and associated energy and carbon
• Energy Efficiency: innovation across all layers of the stack
• Meaningful Metrics

Meaningful Metrics categories
Products: data-set, model
Phases: Construction (pre-training for a model), Operation (re-training, inference)
Categories: Core Metric, Life-cycle, Efficiency

Life-cycle
Factor in the provenance of models and data-sets and their associated energy and carbon footprint (Life-Cycle-Assessment principles)
D → FM → M

Efficiency
$\text{efficiency} = \dfrac{\text{cost}}{\text{work produced}}$
What goes into ‘cost’?
• compute for inference
• + training
• + bill of material ‘tax’
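The cost terms listed above can be combined into a single per-unit figure. The sketch below is one possible reading, with energy in kWh as the cost unit and inference requests as the unit of work; the function name and all numbers are assumptions for illustration.

```python
def cost_per_unit_work(inference_kwh, training_kwh, bom_tax_kwh, work_units):
    """efficiency = cost / work produced, with cost = inference + training + bill-of-material tax."""
    if work_units <= 0:
        raise ValueError("work_units must be positive")
    return (inference_kwh + training_kwh + bom_tax_kwh) / work_units


# Hypothetical model: 1,000,000 requests served over its lifetime.
inference_only = cost_per_unit_work(200.0, 0.0, 0.0, 1_000_000)
full_life_cycle = cost_per_unit_work(200.0, 450.0, 350.0, 1_000_000)
print(inference_only, full_life_cycle)
```

With these made-up numbers, counting training and the inherited bill-of-material share makes each request five times as expensive as the inference-only view suggests.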

Holistic approach to Sustainable AI
• Factor in the entire life cycle of models
• Sustainable strategy exploration and what-if analysis
• Provenance, governance, and reporting
• Holistic impact analysis and tradeoff-based planning
AI Sustainability Metrics

The life-cycle of a model as a state machine
Each ‘state transition’ is associated with a significant energy/carbon cost and involves critical decisions that will affect the cost of this and downstream tasks.
• Tradeoffs between accuracy, time-to-value, and energy/carbon
• The cost of one phase may depend on decisions taken at a prior stage: save now, pay later…
• The particulars of the target task are important to factor in early on.

On-Line Fine-Grained Monitoring of Energy and Carbon with Kepler
• An open-source project pioneered by Red Hat and IBM Research to quantify the energy and carbon of cloud native applications.
• On the roadmap to deliver in OCP and integrate in ROSA
• Adrian Cockcroft is advocating the use of Kepler across all cloud providers: “Real Time Energy and Carbon Standard for Cloud Providers”

SusQL: Context aware aggregation and energy accounting
Infrastructure: Kubernetes controller with its own CRD that gets data from Kepler for aggregation
susql-controller: map[labels]->energy table

apiVersion: …
kind: LabelGroup
metadata: …
spec:
  labels:
    - <label-1>
    - <label-2>
    - <label-3>
    - <label-4>
status:
  totalEnergy: <total energy>
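The map[labels]->energy idea can be illustrated without Kubernetes. The sketch below is not SusQL's implementation: the function name, the representation of a sample as a set of label strings plus joules, and the label values are all assumptions. It only shows the aggregation rule that a LabelGroup's total is the sum over workloads carrying every label in the group.

```python
def total_energy(samples, group_labels):
    """Sum energy (joules) over samples whose label set contains all labels of the group."""
    wanted = set(group_labels)
    return sum(joules for labels, joules in samples if wanted.issubset(labels))


# Hypothetical per-pod energy readings, as they might be derived from Kepler metrics.
samples = [
    ({"team-a", "llm-finetune", "run-7"}, 1200.0),
    ({"team-a", "llm-finetune", "run-8"}, 800.0),
    ({"team-b", "batch-etl"}, 500.0),
]

finetune_total = total_energy(samples, ["team-a", "llm-finetune"])
run7_total = total_energy(samples, ["team-a", "llm-finetune", "run-7"])
print(finetune_total, run7_total)
```

Fewer labels give a coarser group (the whole project), more labels a finer one (a single run), which is what makes the accounting context aware.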

Can we connect the dots? Kepler + Kubeflow
[Figure: Kubeflow Pipeline example and its associated metadata]
Can we leverage Kepler to add energy data?
Source: https://cloud.google.com/blog/topics/developers-practitioners/scalable-ml-workflows-using-pytorch-kubeflow-pipelines-and-vertex-pipelines

A ‘Supply Chain’ of models
Models are created (‘manufactured’), distilled, fine-tuned, and re-used (adapted) to create new models.
Deployment is just the beginning of the journey.
How do we reason about the Life-Cycle Cost of models?

Product Life Cycle Assessment Principles
for Sustainable AI:
Products = data-set | model
We need to factor in the cost of the Bill of Material used in the creation of a new model
If B (a product or a service) is used in the process of creation of A1, A2, … An, then the carbon cost of B
is inherited by A1, A2, …, An in proportion to their use.
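The inheritance rule above can be written down directly. In this sketch the function name, the notion of "use" as a numeric share (for example GPU-hours of fine-tuning), and the figures are assumptions for illustration.

```python
def inherit_carbon(cost_b_kg, usage):
    """Split the carbon cost of B across the products built from it, in proportion to their use."""
    total_use = sum(usage.values())
    if total_use <= 0:
        raise ValueError("total use must be positive")
    return {name: cost_b_kg * use / total_use for name, use in usage.items()}


# Hypothetical: foundation model B cost 100 kg CO2e; A1 used it three times as much as A2.
inherited = inherit_carbon(100.0, {"A1": 3, "A2": 1})
print(inherited)
```

The shares always add up to the cost of B, so the bill-of-material cost is neither lost nor double counted as more derived models appear.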

Efficiency at
Every Layer

Efficiency at every layer of the AI Stack
• Every layer of the FM stack offers opportunities for efficiency gains
  Model: quantization, architecture innovation
  Tools: dynamic batching
  Platform: multiplexing, dispatching
  Infrastructure: DVFS, power parameter optimization, caching
  Systems: approximate computing and other system innovations
• Empower the data scientist to make choices and explore tradeoffs between accuracy, performance, and energy
• Empower the data scientist to reason about life-cycle strategies: e.g., if/what/when to re-use, and how much to retrain

Systems innovation

IBM Research’s Artificial Intelligence Unit (AIU)
• Chip architecture optimized for enterprise AI workloads
• Enabled for Foundation Models
• Enabled in the Red Hat software stack
• Supports multi-precision inference (& training): FP16, FP8, INT8, INT4, INT2
• Implemented in leading edge 5nm technology
• SoC implements IBM’s leadership innovations in low-precision AI arithmetic and algorithms
https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu
IBM Research AI Hardware Center / © 2023 IBM Corporation

Vision for AI Performance Scaling
Digital AI Cores: scaling precision for quadratic gains in performance with iso-accuracy
• Applying Approximate Computing techniques to AI compute
• Critical requirement: maintain model accuracy
• Advantage: Quadratic improvement in performance
• IBM Research has been at the forefront of every major technical advancement on bit-precision scaling
  • 16-bit training (2015)
  • 8-bit training (2018, 2019)
  • 4-bit training (2020)
  • 2/4-bit Inference (2018-2020)
• Complemented by
  • Sparsity support
  • Analog Computing
  • 3D Stacking
[Figure: bit precision for training and inference over time (2012 to 2024), scaling from 32-bit and 16-bit down to 8-bit, 4-bit, and 2-bit]
• 16-bit Training: ICML 2015
• 8-bit Training: NeurIPS 2018, 2019
• 4-bit Training: X. Sun et al., NeurIPS 2020
• 4-bit Inference ASICs: J. Choi et al., https://arxiv.org/pdf/1805.06085.pdf; J. McKinstry et al., https://arxiv.org/abs/1809.04191
• 2-bit Inference ASICs: J. Choi et al., SysML 2019
https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu
https://research.ibm.com/blog/ai-chip-precision-scaling

NorthPole: Neural-inspired memory-on-chip architecture to overcome the von Neumann bottleneck
NorthPole is 25 times more energy efficient, measured as the number of frames interpreted per joule of energy.

Infrastructure innovation

Vela: A Cloud Native Supercomputer for the Foundation Model Age (Kepler inside)
System specifications
– Nodes with 8 x A100 GPUs (80GB)
– GPUs interconnected with NVLink, NVSwitch
– Cascade Lake CPUs, 1.5TB of DRAM
– Four 3.2TB NVMe drives
– Redundant connections between nodes, TORs and spines
– 2 x 100G NICs from each node; NCCL benchmarks show we drive close to line rate
https://research.ibm.com/blog/AI-supercomputer-Vela-GPU-cluster
– Configure resources through software (APIs)
– Broad ecosystem of available cloud services
– Leverage data sets on Cloud Object Store
– Standard, flexible, scalable infrastructure design (vs traditional HPC)
– Near bare metal performance (within 5%, single node)
How do you evolve from a specialized (monolithic), costly, and inflexible HPC stack to a Cloud Native stack without compromising efficiency?
- Programmability
- Scalability
- Re-use
- Observability
- Agility
- Democratization

Platform Innovation

Dispatching of jobs based on renewable energy
Motivation:
• The carbon intensity of the energy mix in different regions of IBM data centers varies over time.
• Renewable energy is not available all the time and in all places.
Workload Optimization: Placement and scheduling of workloads based on carbon-free energy availability.
Ideal dispatching: High CPU utilization when carbon intensity is low and low CPU utilization when carbon intensity is high.
T. Bahreini, A. Tantawi and A. Youssef, "An Approximation Algorithm for Minimizing the Cloud Carbon Footprint through Workload Scheduling," 2022 IEEE 15th International Conference on Cloud Computing (CLOUD), 2022, pp. 522-531.
Challenge: Ideal dispatching might be practically infeasible.
• Short jobs may have short deadlines.
• Some jobs are not interruptible.
• Jobs have heterogeneous resource demands.
Obtaining the optimal packing is intractable.
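Because optimal packing is intractable, practical dispatchers fall back on heuristics or approximation algorithms. The sketch below is a simple greedy heuristic written for illustration; it is not the approximation algorithm from the cited paper. The job fields, the slot-based time model, and the forecast values are all assumptions. Each job is non-interruptible, needs a fixed number of CPUs for a number of consecutive slots, and must finish by its deadline slot.

```python
def greedy_schedule(jobs, carbon_intensity, capacity):
    """Place each job in the feasible contiguous window with the lowest summed carbon intensity.

    jobs: list of dicts with name, duration (slots), deadline (slot index by which
          the job must have finished), and cpus.
    carbon_intensity: forecast per time slot. capacity: CPUs available in each slot.
    Returns name -> start slot, or None when no feasible window exists.
    """
    free = [capacity] * len(carbon_intensity)
    plan = {}
    # Earliest deadline first, so tightly constrained jobs choose before flexible ones.
    for job in sorted(jobs, key=lambda j: j["deadline"]):
        best_start, best_cost = None, None
        last_start = min(job["deadline"], len(carbon_intensity)) - job["duration"]
        for start in range(0, last_start + 1):
            window = range(start, start + job["duration"])
            if any(free[t] < job["cpus"] for t in window):
                continue  # not enough CPUs in some slot of this window
            cost = sum(carbon_intensity[t] for t in window)
            if best_cost is None or cost < best_cost:
                best_start, best_cost = start, cost
        plan[job["name"]] = best_start
        if best_start is not None:
            for t in range(best_start, best_start + job["duration"]):
                free[t] -= job["cpus"]
    return plan


forecast = [500, 300, 100, 100, 400, 600]  # hypothetical gCO2/kWh per slot
jobs = [
    {"name": "A", "duration": 2, "deadline": 6, "cpus": 3},
    {"name": "B", "duration": 1, "deadline": 2, "cpus": 2},
    {"name": "C", "duration": 2, "deadline": 6, "cpus": 2},
]
plan = greedy_schedule(jobs, forecast, capacity=4)
print(plan)
```

Job A takes the two greenest slots, B takes the greener of the two slots its deadline allows, and C is pushed to a dirtier window because the green slots are full, which shows how deadlines and capacity keep the schedule away from the ideal.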

IEEE Cloud 2022: polynomial approximation algorithms for scheduling in a single data center to minimize carbon.
IEEE Cloud 2023: dispatching (placement & scheduling) across data centers to minimize carbon.

Dispatch Workloads onto Clusters
[Diagram: Hub Cluster (MCAD Dispatcher, placement engine) and Spoke Clusters 1, 2, and 3 (MCAD Runner), connected through KubeStellar]
MCAD Dispatcher
• queue & dispatch jobs
• resource allocation
• quota management
• requeue & retry jobs
MCAD Runner
• run & monitor jobs
• monitor cluster
KubeStellar
• downsync job spec
• upsync job status
• upsync cluster status

KubeStellar https://github.com/kubestellar/kubestellar

MCAD – Multi Cluster Application Dispatcher
https://github.com/project-codeflare/multi-cluster-app-dispatcher
• Hard constraints
  • filter clusters
  • filter workloads
• Soft constraints
  • score clusters
  • score workloads
• Space dimension
  • global vs. individual decisions
  • aggregates
• Time dimension
  • in-time vs. ahead-of-time
  • delayed start, suspend/resume
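The split between hard and soft constraints is the familiar filter-then-score pattern. The sketch below illustrates that pattern only; it does not use MCAD's or KubeStellar's API, and the cluster fields, constraint functions, and values are hypothetical.

```python
def place(workload, clusters, hard_constraints, soft_scores):
    """Filter clusters with hard constraints, then pick the one with the best total soft score."""
    feasible = [c for c in clusters if all(check(workload, c) for check in hard_constraints)]
    if not feasible:
        return None  # nothing satisfies the hard constraints; the job stays queued
    return max(feasible, key=lambda c: sum(score(workload, c) for score in soft_scores))


def enough_gpus(workload, cluster):
    """Hard constraint: the cluster must have the GPUs the workload asks for."""
    return cluster["free_gpus"] >= workload["gpus"]


def low_carbon(workload, cluster):
    """Soft constraint: prefer clusters with lower carbon intensity."""
    return -cluster["carbon_intensity"]


clusters = [
    {"name": "spoke-1", "free_gpus": 8, "carbon_intensity": 400},
    {"name": "spoke-2", "free_gpus": 2, "carbon_intensity": 50},
    {"name": "spoke-3", "free_gpus": 8, "carbon_intensity": 120},
]
job = {"name": "train", "gpus": 4}
chosen = place(job, clusters, [enough_gpus], [low_carbon])
print(chosen["name"])
```

The greenest cluster is filtered out because it lacks GPUs, so the job lands on the greenest cluster among those that can actually run it.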

Models & Tools Innovation

Call to action:
AI Platform providers:
- Build in transparency and governance
- Incorporate platform and system innovation for efficiency
Academia & Industry: Focus your research on efficiency, not just accuracy
Data Scientists / Practitioners: Develop a sustainability mind-set
- Re-use where it makes sense
- Domain specific, smaller models are better!
- Explore tradeoffs (accuracy vs cost)

IBM Research
Tokyo, Shin-Kawasaki, Delhi, Bangalore, Singapore, Nairobi, Haifa, Zurich, Warrington, Dublin, Cambridge, Albany, Yorktown, Almaden, Rio de Janeiro, Sao Paulo, Johannesburg
6 Nobel Laureates, 10 Medals of Technology, 5 National Medals of Science, 6 Turing Awards
