Workload Transformation &
Innovation in POWER
Architecture
16th July 2021
IIT Roorkee – Invited Talk
Satish Kumar Sadasivam
satsadas@in.ibm.com
Senior Performance Architect, Master Inventor
IBM Systems
Agenda
• New Era of IT - Pillars of change
• Workload Transformation
• Building blocks of Compute Infrastructure
• AI & Hardware
• Introduction to POWER ISA & MMA
• POWER10 & its AI acceleration capabilities
New Era of IT is defined and driven by AI &
Cloud
AI helps enrich the user
experience
Cloud defines/drives the next-generation
software design & development architecture
and deployment infrastructure
Enterprise Workload
Transformation (AI & Cloud)
AI Workflow
Monolithic vs Microservice
https://docs.oracle.com/en/solutions/learn-architect-microservice/index.html#GUID-1A9ECC2B-F7E6-430F-8EDA-911712467953
AI as a service
Many cloud companies are adopting AI as a Service
solutions through their cloud offerings
• IBM Watson Cloud
• Amazon Web Services (AWS)
• Microsoft Azure
• Google Cloud
• Advantages
• Advanced infrastructure with minimal cost
• Pay for what you use
• Ease of use
• Options for Scalability
• Disadvantages
• Reduced Security
• Increasing reliance on Third-Parties
• Long-Term Costs
• Reduced Transparency
https://www.researchgate.net/figure/AI-Platform-Architecture-21_fig7_332555792
https://www.geeksforgeeks.org/what-is-artificial-intelligence-as-a-service-aiaas-in-the-tech-industry/
Classical Enterprise Workloads
Domain:
• Three-tier enterprise database architecture
Characteristics:
• Highly data-centric with less compute demand.
• System performance dominated by throughput and bandwidth.
Applications & Languages:
• C, C++ and Java
• Applications were highly monolithic.
• Traditional SQL databases & structured data.

Newer Generation Enterprise Workloads
Domain:
• Cloud-native architecture
• Infusion of compute-centric AI algorithms into the enterprise workflow.
Characteristics:
• Highly data-centric with huge compute demand.
• Throughput and bandwidth are still important.
• AI compute takes center stage.
• High sensitivity to network performance due to the cloud architecture.
Applications & Languages:
• Traditional C, C++ and Java still have importance.
• Heavy use of newer interpreted languages (Python, Ruby, Golang, etc.).
• Applications are no longer monolithic; moving towards microservices.
• NoSQL databases take dominance, along with unstructured data.
What is the foundation to make
this transformation successful?
7 Pillars of Compute Infrastructure
• Semiconductor Technology Process
• CPU – Architecture & Microarchitecture
• Accelerators (GPUs, DSAs, FPGAs)
• Memory
• Interconnect
• Storage
• Network
Performance at Scale
Reliability & Availability
Security
AI & Impact on Systems
AI Machine Learning/Deep Learning &
Workload Motivation
AI is impacting the entire computing stack
Although AI computing is associated with traditional computation, it also has new computational characteristics, including:
1. The content being processed is often unstructured data, such as video, images, voice and text.
2. Processing usually requires a large amount of computation. The basic calculations are mainly linear algebraic operations, such as tensor processing, while the
control flow is relatively simple, so massively parallel compute hardware is more suitable.
3. Parameter counts are large, requiring huge storage capacity and high-bandwidth, low-latency memory access, along with rich and flexible connections between
computing units and memory devices. Data locality is prevalent, making the workloads suitable for data reuse and near-memory computation.
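To make points 2 and 3 concrete, here is a minimal sketch (not from the talk) of a blocked matrix multiply in C: every element of A and B is reused many times, so keeping small tiles resident near the compute units cuts memory traffic, which is exactly the data-reuse property that matrix engines and near-memory designs exploit. The size N, the tile size T and the function name are illustrative choices, not values from the slides.

/* Blocked (tiled) matrix multiply: C += A * B, with C assumed zero-initialized.
 * Each T x T tile of A and B stays hot in cache and is reused T times,
 * illustrating the high data reuse typical of AI linear-algebra kernels. */
#define N 512
#define T 32                                   /* tile sized to fit in cache */

void matmul_blocked(const float A[N][N], const float B[N][N], float C[N][N])
{
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T)
                for (int i = ii; i < ii + T; i++)
                    for (int k = kk; k < kk + T; k++)
                        for (int j = jj; j < jj + T; j++)
                            C[i][j] += A[i][k] * B[k][j];
}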
AI acceleration – Hardware Design approaches
• CPU / on-chip hardware acceleration for AI (towards the flexibility end of the spectrum)
• GPU/DSA/FPGA off-chip hardware acceleration for AI (towards the efficiency end of the spectrum)
POWER ISA & Matrix Math Assist
(MMA)
AI Acceleration on CPU (POWER Architecture)
ISA Examples
• PowerPC
• RISC-V
• SPARC
• IA-32
• IA-64 (Itanium)
• ARM
POWER
ISA
Timeline
MMA Architecture
• MMA architecture support is introduced in POWER ISA V3.1.
• The MMA architecture introduces a new set of instructions to support dense matrix math operations,
along with the required changes for register handling and management.
• These Matrix-Multiply Assist instructions lead to very efficient implementations of key algorithms
in technical computing, machine learning, deep learning and business analytics; they are a natural
match for implementing dense numerical linear algebra computations.
• We have also shown application to other computations such as convolution.
• There is scope for future research to support other computations in areas such as arbitrary-precision
arithmetic and the discrete Fourier transform.
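As a concrete illustration of how these instructions are used from C, below is a minimal sketch of a 4x4 FP32 tile computed as a sequence of outer-product updates. It assumes GCC's MMA built-ins (__builtin_mma_xxsetaccz, __builtin_mma_xvf32gerpp, __builtin_mma_disassemble_acc) as documented for -mcpu=power10 -mmma; the function name, argument layout and row ordering are illustrative assumptions, not code from the talk.

#include <altivec.h>

/* c = sum over k of (a[k] as a 4x1 column) x (b[k] as a 1x4 row).
 * a[k] and b[k] each hold 4 floats, i.e. one 128-bit VSR load per operand. */
void sgemm_4x4_outer(const float a[][4], const float b[][4],
                     float c[4][4], int K)
{
    __vector_quad acc;                       /* 512-bit accumulator: 4x4 FP32 tile */
    __builtin_mma_xxsetaccz(&acc);           /* zero the accumulator */

    for (int k = 0; k < K; k++) {
        vector float va = vec_xl(0, a[k]);   /* column k of A (4 floats) */
        vector float vb = vec_xl(0, b[k]);   /* row k of B    (4 floats) */
        /* acc += va (outer product) vb : one xvf32gerpp instruction */
        __builtin_mma_xvf32gerpp(&acc,
                                 (vector unsigned char) va,
                                 (vector unsigned char) vb);
    }

    vector float rows[4];                    /* copy accumulator rows back to memory */
    __builtin_mma_disassemble_acc(rows, &acc);
    for (int i = 0; i < 4; i++)
        vec_xst(rows[i], 0, c[i]);
}

Compiled for POWER10 (for example, gcc -O2 -mcpu=power10), the loop body maps to two vector loads and one xvf32gerpp per iteration.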
Data Types & Compute Instructions
https://ibm.ent.box.com/s/hhjfw0x0lrbtyzmiaffnbxh2fuo0fog0
Data types:
The MMA architecture supports 4 floating-point data types: FP32 (IEEE single precision), FP64 (IEEE
double precision), FP16 (IEEE half precision) and bfloat16. In addition to floating-point data types, the MMA
architecture also supports integer operations of various types: INT16, INT8 and INT4 (signed 16-, 8- and
4-bit integers, respectively). These data types support the diverse needs of various AI models.
Compute instructions:
Outer-product (xv<type>ger<rank-k>) instructions.
Prefix MMA instructions:
Lane masking is one of the advanced features available with the MMA architecture. Its purpose is to
perform an operation on lower-sized inputs or to skip certain elements.
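Of the reduced-precision types above, bfloat16 is particularly convenient for AI because it keeps FP32's 8-bit exponent (and thus its range) while reducing precision to 7 fraction bits. The plain-C sketch below (not from the talk; the helper names are hypothetical) shows the conversion, which is essentially taking the top 16 bits of the FP32 encoding.

#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;                 /* raw bfloat16 bit pattern */

/* FP32 -> bfloat16 with round-to-nearest-even on the 16 discarded bits.
 * NaN handling is omitted for brevity. */
static bf16_t fp32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return (bf16_t)((bits + rounding) >> 16);
}

/* bfloat16 -> FP32: place the 16 bits in the high half of the word. */
static float bf16_to_fp32(bf16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}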
Data Types & Compute Instructions
MMA Instructions
MMA xvf32gerpp instruction operation
MMA xvi8ger4pp instruction operation
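As a plain-C reference for what the two pictured instructions compute (a sketch based on their descriptions; register lane/byte ordering and the exact signed/unsigned treatment of the INT8 operands are simplified here, see the ISA for the precise definition):

/* xvf32gerpp: rank-1 update.  a and b each hold 4 FP32 values (one VSR each);
 * every element of the 4x4 FP32 accumulator gets one fused multiply-add. */
void ref_xvf32gerpp(float acc[4][4], const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            acc[i][j] += a[i] * b[j];
}

/* xvi8ger4pp: rank-4 update.  a and b each hold 4 groups of 4 INT8 values
 * (16 bytes = one VSR each); every element of the 4x4 INT32 accumulator
 * gets a 4-element dot product. */
void ref_xvi8ger4pp(int acc[4][4], const signed char a[4][4],
                    const signed char b[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int l = 0; l < 4; l++)
                acc[i][j] += (int)a[i][l] * (int)b[j][l];
}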
Introduction
to
Matrix
Multiplication
Outer
Product
Operation
Outer
Product
using
MMA
• 1 accumulator (4×4 result)
• 2 VSR loads / 1 xv_ger instruction (worked out below)
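Working out that ratio (a back-of-the-envelope note, not from the slides): with one accumulator holding a 4×4 FP32 tile, each xvf32gerpp consumes two 128-bit VSR loads (4 floats of A and 4 floats of B, 32 bytes in total) and performs 16 fused multiply-adds, i.e. 32 floating-point operations per 32 bytes loaded. Accumulating over K such steps amortizes the single write-back of the accumulator, so the compute-to-memory-traffic ratio stays high as K grows.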
IBM POWER10
Thank You
