HSA Powers the Holodeck: Heterogeneous Computing Enables Immersive Virtual Environments

HETEROGENEOUS SYSTEMS ARCHITECTURE:
THE NEXT AREA OF COMPUTING INNOVATION
CASE STUDY: THE HOLODECK
Dr. Lisa Su
Senior Vice President and GM, Global Business Units,
AMD

ISSCC Conference
February 18, 2013

CHALLENGES TO MOORE’S LAW SCALING

[Figure: two charts across technology nodes (45nm, 40nm, 32nm, 28nm, 20nm, 20nm FinFET). Left: Area Scaling by Technology Generation (normalized area, 0.0 to 1.0). Right: Cost Per Transistor Scaling (normalized cost/transistor, 0.0 to 1.0).]

• Lithography challenges begin severely limiting area scaling at the 20nm node
– Fewer 1X metals due to cost
– Less aggressive feature scaling due to lithography challenges

• Compounded by rapidly increasing lithography costs
– 28nm → 20nm transition is the inflection point, with dual exposure
– No cost/transistor crossover, for the first time, at the 28nm → 20nm transition

ISSCC Keynote | February 18th, 2013

A PARADIGM SHIFT…

[Figure: microprocessor advancement (CPU) plotted against throughput performance and programmability (GPU advancement). CPU eras: Single-Core Era, Multi-Core Era, Heterogeneous Systems Era. GPU programmability steps: graphics driver-based programs, OpenCL/DX driver-based programs, high-level programmable. Homogeneous computing gives way to heterogeneous computing.]


HETEROGENEOUS SYSTEMS ARCHITECTURE MEMORY MODEL
Yesterday: from 32 bit

Today: to 64 bit


ARCHITECTURES – A HISTORICAL PERSPECTIVE

[Figure: timeline from 1981 through the 1990s, 2000s, and 2010s, moving from the Legacy Processing Era to the Surround Computing Era. Labels: Single Core CPUs; Traditionally Optimized Platforms; Multi-Core CPUs/GPUs; APUs and legacy SOC; Heterogeneous Architectures.]


CHANGING THE THINKING, CHANGING THE GAME

HSA is designed to make the GPU hardware directly accessible to software, using the high-level languages programmers already use on the CPU
• C, C++, Java, Python…even JavaScript, HTML5
• ISA agnostic – e.g., x86, 64-bit ARM, Radeon, Mali

GPU becomes a peer processor to the CPU in terms of system integration
• Full programming language features
• Shared virtual memory: a pointer is a pointer
• Coherency
• Context switching

HSA Foundation – an industry-wide initiative

BENEFITS OF HETEROGENEOUS SYSTEM ARCHITECTURE


EFFECTIVE COMPUTE OFFLOAD

[Figure: HSA-accelerated software applications on an APU (Accelerated Processing Unit), divided into data parallel workloads and serial and task parallel workloads.]

Made easy by HSA
Unleash the best compute elements depending on task


BRINGING IT ALL TOGETHER
MOTION DSP 720P

[Figure: two bar charts comparing CPU and CPU+GPU. Power (0 W to 35 W), broken down into CPU cores, NB+GPU, and DRAM; performance (0 fps to 25 fps).]

Synergistic use of GPU compute + shared memory = lower power and higher performance

>4.0X better energy efficiency [1]

[1] AMD internal testing: AMD E2-3200 APU (2 cores @ 2400 MHz, GPU: 2 CU @ 444 MHz), Windows 7 OS, MotionDSP vReveal application, 720P MP4 input (http://www.vreveal.com/stabilization)


TODAY’S DISCUSSION: FROM SURROUND COMPUTING TO
ENABLING THE HOLODECK

1. A fully featured Holodeck is still many years away

2. Today our discussion will:
• Establish a Holodeck framework
• Identify Holodeck enabling technologies
• Discuss how Heterogeneous Systems Architecture (HSA) accelerates these technologies
• Undertake an HSA deep dive on one of these enabling technologies
• Look at how new dedicated processors will enable Holodeck functionality


WHAT IS A HOLODECK?


THE HOLODECK FRAMEWORK:
AN EVOLUTION OF SURROUND COMPUTING

• Natural User Interfaces
• Context Computing
• 360 Degree Virtual Environments


HOLODECK ENABLING TECHNOLOGIES:
PROFOUND IMPLICATIONS FOR COMPUTER ARCHITECTURE

Computational Photography
• Delivering seamless and immersive video environments

Directional Audio
• Using audio to enhance immersion and realism of our environments

Natural User Interfaces
• Enabling realistic, natural human communication

Context Computing
• Delivering an intuitive understanding of the user’s needs in real time

Augmented Reality
• Bringing it all together – combining the real and the virtual


COMPUTATIONAL PHOTOGRAPHY
360 DEGREE VISUAL ENVIRONMENTS, PHOTOSTITCHING, PERIPHERAL VISION AND HSA

• Mapping real life scenes through finite images
• Photo stitching of tiled environments and perceptual correction
• Detect interest points & match features
• Projecting geometry with point features using algorithms like RANSAC
• Image processing to account for curved screen surfaces
• Modulate brightness to account for peripheral vision

HSA presents a unified view of the system with shared memory, so CPU and GPU acceleration can be applied throughout the entire process
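
The slide names RANSAC without showing it, so here is a minimal sketch of the idea in Python. It is not taken from the presentation: it assumes a pure 2D translation as the motion model (real photo stitching usually fits a homography), and the function names, threshold, and demo data are all hypothetical.

```python
import random

def ransac_translation(matches, threshold=2.0, iterations=200, seed=0):
    """Estimate a 2D translation (dx, dy) from point matches with RANSAC.

    matches: non-empty list of ((x1, y1), (x2, y2)) pairs from a feature matcher.
    A pure translation is an assumed, simplified motion model.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single match fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Consensus set: matches whose offset agrees with this hypothesis.
        inliers = [
            m for m in matches
            if abs(m[1][0] - m[0][0] - dx) <= threshold
            and abs(m[1][1] - m[0][1] - dy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine: average the offsets over the largest consensus set.
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (dx, dy), best_inliers

def make_demo_matches(seed=1):
    """Synthetic data: 40 matches shifted by about (30, -12) plus 20 false matches."""
    rng = random.Random(seed)
    matches = []
    for _ in range(40):
        x, y = rng.uniform(0, 500), rng.uniform(0, 500)
        shifted = (x + 30 + rng.uniform(-0.5, 0.5), y - 12 + rng.uniform(-0.5, 0.5))
        matches.append(((x, y), shifted))
    for _ in range(20):
        a = (rng.uniform(0, 500), rng.uniform(0, 500))
        b = (rng.uniform(0, 500), rng.uniform(0, 500))
        matches.append((a, b))
    return matches

if __name__ == "__main__":
    (dx, dy), inliers = ransac_translation(make_demo_matches())
    print(f"translation = ({dx:.1f}, {dy:.1f}), inliers = {len(inliers)}")
```

Each hypothesis is scored independently of the others, which is why this step can be spread across many parallel work items, while the final refinement is a small serial step.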


DIRECTIONAL AUDIO

• Couples computationally demanding 3D audio and spatialization effects with "always on" background processing like Voice Activity Detection (VAD)
– Voice activity detection is best implemented with special audio processors and acceleration techniques
– Spatialization effects such as “Convolution Reverb” are best done with GPU acceleration

HSA enables seamless integration of CPU and GPU acceleration with other independent accelerators
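
To make "convolution reverb" concrete, the sketch below convolves a dry signal with a room impulse response using the direct form. It is an illustration only, not the presenter's implementation; the sample values are invented, and production reverbs usually use FFT-based convolution rather than this quadratic loop.

```python
def convolve(dry, impulse_response):
    """Direct-form convolution.

    Every input sample adds a scaled, delayed copy of the impulse response
    to the output, which is what gives the dry signal its reverberant tail.
    """
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, x in enumerate(dry):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out

# Hypothetical toy data: two clicks played into a room with two echoes.
dry = [1.0, 0.0, 0.0, 0.5]
room = [1.0, 0.0, 0.5, 0.25]
wet = convolve(dry, room)
print(wet)  # [1.0, 0.0, 0.5, 0.75, 0.0, 0.25, 0.125]
```

Each output sample is a sum of products that does not depend on any other output sample, so the work maps naturally onto many parallel GPU work items.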


NATURAL USER INTERFACES

• Speech Recognition:
– Background processing – echo cancellation & noise suppression
– Audio feature extraction
– Voice pattern recognition through Markov model or similar algorithm
• Gesture Recognition:
– Frame preprocessing & filtering
– Optical flow or object tracking
– Sophisticated computer vision algorithms to delineate the hand or body parts from the background

NUI algorithms all benefit from CPU/GPU and audio processors to efficiently perform these functions at the lowest power

CONTEXT COMPUTING
BIOMETRICS EXAMPLE

• Facial Recognition:
• Face detection (is there a face) –
GPU acceleration
• Face identification (pattern
matching through algorithms like
Haar face detection) – CPU and
GPU acceleration
• Validation through blink detection
(make sure it is a real face) –
GPU acceleration

HSA enables mix and match of the best
acceleration for each phase of the
process


AUGMENTED REALITY

• Image Registration:
• Relies on robust and fast feature
detection – benefits from
CPU/GPU acceleration
• Object Tracking:
• Relies on “optical flow” algorithm
– benefits from CPU/GPU
acceleration
• Image Composition:
• Once information exists from the
above, becomes a classic
graphics rendering use case

The building blocks of HSA enable the
augmented reality world.


THE WAY FORWARD

• Many technologies are required to enable our vision
– Heterogeneous engines that accelerate key client and server workloads
– Datacenters optimized for latency, scalability, and efficiency
– Processors optimized for new and emerging workloads
– Active research into new algorithms


ENABLING TECHNOLOGY DEEP DIVE:
ACCELERATING NATURAL USER INTERFACES (HAAR
FACE DETECTION) WITH HETEROGENEOUS
SYSTEMS ARCHITECTURE

LOOKING FOR FACES IN ALL THE RIGHT PLACES


Quick HD Calculations
Search square = 21 x 21
Pixels = 1920 x 1080 = 2,073,600
Search squares = 1900 x 1060 = ~2 Million
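
The search-square count follows from sliding a 21 x 21 window one pixel at a time across the frame. A small sketch of that arithmetic (the function name is hypothetical, and the one-pixel stride is an assumption that matches the 1900 x 1060 figure):

```python
def search_squares(width, height, window=21):
    """Number of window x window squares in a frame at a one-pixel stride."""
    if width < window or height < window:
        return 0
    return (width - window + 1) * (height - window + 1)

print(search_squares(1920, 1080))  # 1900 * 1060 = 2014000
```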


LOOKING FOR DIFFERENT SIZE FACES
BY SCALING THE VIDEO FRAME


More HD Calculations
70% scaling in H and V
Total Pixels = 4.07 Million
Search squares = 3.8 Million
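
These totals can be approximated by summing over an image pyramid in which each level is 70% of the previous one in both dimensions. The sketch below is one way to do that; the rounding to whole pixels and the stopping rule (stop once a 21 x 21 window no longer fits) are assumptions, since the slide does not state them.

```python
def pyramid_totals(width=1920, height=1080, scale=0.7, window=21):
    """Sum pixels and search squares over all pyramid levels.

    Level i has dimensions round(width * scale**i) x round(height * scale**i).
    Returns (total_pixels, total_search_squares, level_count).
    """
    total_pixels = 0
    total_squares = 0
    level = 0
    while True:
        w = round(width * scale ** level)
        h = round(height * scale ** level)
        if w < window or h < window:
            break  # the search window no longer fits in the scaled frame
        total_pixels += w * h
        total_squares += (w - window + 1) * (h - window + 1)
        level += 1
    return total_pixels, total_squares, level

if __name__ == "__main__":
    pixels, squares, levels = pyramid_totals()
    print(f"{levels} levels, {pixels / 1e6:.2f} M pixels, {squares / 1e6:.2f} M search squares")
```

Under these assumptions the sums land close to the slide's figures of roughly 4.07 million pixels and 3.8 million search squares. The pixel total is essentially the geometric series 2,073,600 / (1 - 0.49).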


HAAR CASCADE STAGES

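[Figure: each cascade stage evaluates a group of features (Stage N: features k, l, m; Stage N+1: features p, q, r). After each stage the question is "Face still possible?". Yes: continue to the next stage. No: reject frame.]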
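
The stage-by-stage early out can be sketched as follows. This is a toy model rather than the presenter's code: real Haar features are rectangle sums over an integral image, whereas the features here are arbitrary functions of a hypothetical window, and the thresholds are made up.

```python
def run_cascade(window, stages):
    """Evaluate one search window against a cascade.

    stages: list of (features, threshold) pairs, where each feature is a
    function window -> float. A stage passes when the sum of its feature
    responses reaches the threshold.
    Returns (is_face, stages_evaluated).
    """
    for depth, (features, threshold) in enumerate(stages, start=1):
        score = sum(feature(window) for feature in features)
        if score < threshold:
            return False, depth  # early out: later stages never run
    return True, len(stages)

# Hypothetical two-stage cascade over a three-value "window".
demo_cascade = [
    ([lambda w: w[0], lambda w: w[1]], 1.0),  # stage 1: two features
    ([lambda w: w[2]], 0.5),                  # stage 2: one feature
]

print(run_cascade([0.2, 0.3, 0.9], demo_cascade))  # (False, 1)
print(run_cascade([0.6, 0.6, 0.1], demo_cascade))  # (False, 2)
print(run_cascade([0.6, 0.6, 0.9], demo_cascade))  # (True, 2)
```

Most windows contain no face and leave after the first stage or two, so the average cost per window is far below the cost of the full cascade.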


22 CASCADE STAGES, EARLY OUT BETWEEN EACH

[Figure: STAGE 1 → STAGE 2 → … → STAGE 21 → STAGE 22 → FACE CONFIRMED, with an early exit to NO FACE after each stage.]

Final HD Calculations
Search squares = 3.8 million
Average features per square = 124
Calculations per feature = 100
Calculations per frame = 47 GCalcs

Calculation Rate
30 frames/sec = 1.4 TCalcs/second
60 frames/sec = 2.8 TCalcs/second

…and this only gets front-facing faces
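
The per-frame and per-second figures are a straightforward product of the numbers on the slide. A short check (the variable names are my own):

```python
search_squares = 3.8e6        # search squares per HD frame, all scales
features_per_square = 124     # average features evaluated per square
calcs_per_feature = 100       # calculations per feature

calcs_per_frame = search_squares * features_per_square * calcs_per_feature

def calc_rate(frames_per_second):
    """Calculations per second at a given frame rate."""
    return calcs_per_frame * frames_per_second

print(f"{calcs_per_frame / 1e9:.0f} GCalcs per frame")     # 47
print(f"{calc_rate(30) / 1e12:.1f} TCalcs/second at 30 fps")  # 1.4
print(f"{calc_rate(60) / 1e12:.1f} TCalcs/second at 60 fps")  # 2.8
```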


CASCADE DEPTH ANALYSIS
[Figure: cascade depth analysis. Cascade depth scale from 0 to 25, with legend bands 0-5, 5-10, 10-15, 15-20, and 20-25.]


UNBALANCING DUE TO EXITS IN EARLIER CASCADE STAGES

[Figure legend: Live, Dead]

• When running on the GPU, we run each search rectangle on a separate work item
• Early out algorithms, like Haar, exhibit divergence between work items
– Some work items exit early
– Their neighbors continue
– SIMD packing suffers as a result
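
One way to put a number on the packing loss is to model a group of work items that executes in lockstep: the whole group stays occupied until its deepest work item exits. The sketch below is a simplified model, not a measurement; the group width of 64 is an assumption, and the exit depths are invented.

```python
def simd_utilization(exit_stages, wavefront_size=64):
    """Fraction of lane-stage slots that do useful work.

    exit_stages[i] is the number of cascade stages work item i executes
    before it exits. Work items are packed in order into groups of
    wavefront_size lanes; a group runs for as many stages as its deepest
    member, and a partially filled final group still occupies all lanes.
    """
    useful = 0
    occupied = 0
    for start in range(0, len(exit_stages), wavefront_size):
        group = exit_stages[start:start + wavefront_size]
        useful += sum(group)
        occupied += max(group) * wavefront_size
    return useful / occupied

# Hypothetical worst case: 63 rectangles exit at stage 1, one runs all 22.
diverged = [1] * 63 + [22]
print(f"{simd_utilization(diverged):.1%}")   # about 6%
print(f"{simd_utilization([3] * 64):.1%}")  # 100.0%
```

In this model a single surviving rectangle keeps 63 idle lanes waiting, which is the effect the slide describes as SIMD packing suffering.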


PROCESSING TIME/STAGE
A10-4600M (6 CU @ 497 MHz, 4 cores @ 2700 MHz)

[Figure: processing time in ms (0 to 100) per cascade stage (1 through 8, then 9-22), for GPU and CPU.]

AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)


PERFORMANCE CPU-VS-GPU
AMD A10-4600M APU (6 CU @ 497 MHz, 4 cores @ 2700 MHz)

[Figure: images/sec (0 to 12) versus number of cascade stages run on the GPU (0 through 8, then 22), for CPU, HSA, and GPU.]

AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)


HAAR SOLUTION
RUN DIFFERENT CASCADES ON GPU AND CPU

By seamlessly sharing data between CPU and GPU,
HSA allows the right processor to handle its appropriate
workload

+2.5x increased performance

-2.5x decreased energy per frame
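
The split can be pictured as two passes over shared data: the first few stages run over every search window (wide and uniform, which suits the GPU), and only the survivors go through the remaining stages (sparse and branchy, which suits the CPU). The sketch below simulates both passes in plain Python; it is not HSA code, and the stage definitions, window data, and split point are hypothetical.

```python
def passes(window, stage_list):
    """True if the window passes every stage in stage_list (stops at the first failure)."""
    return all(
        sum(feature(window) for feature in features) >= threshold
        for features, threshold in stage_list
    )

def split_cascade(windows, stages, gpu_stages):
    """Run the first gpu_stages stages on all windows, the rest on survivors.

    Returns (faces, survivor_count). Both passes read the same window list,
    standing in for the shared memory the slide refers to.
    """
    survivors = [w for w in windows if passes(w, stages[:gpu_stages])]  # stand-in for the GPU pass
    faces = [w for w in survivors if passes(w, stages[gpu_stages:])]    # stand-in for the CPU pass
    return faces, len(survivors)

# Hypothetical three-stage cascade, one feature per stage.
demo_stages = [
    ([lambda w: w[0]], 0.5),
    ([lambda w: w[1]], 0.5),
    ([lambda w: w[2]], 0.5),
]
demo_windows = [
    [0.1, 0.9, 0.9],
    [0.9, 0.9, 0.1],
    [0.9, 0.9, 0.9],
    [0.9, 0.2, 0.9],
]

faces, survivor_count = split_cascade(demo_windows, demo_stages, gpu_stages=1)
print(faces, survivor_count)  # [[0.9, 0.9, 0.9]] 3
```

Moving the split point changes how much work each pass receives but not the detected faces.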


APPLICATION ACCELERATION USING HSA

Acceleration vs. CPU (AMD estimates):
Gesture recognition 12x
Photo indexing 10x
Voice recognition 10x
Visual search 9x
Audio search 5x
Stereo vision 4x
Video stabilization 4x
Face detect 2x

Source: AMD Whitepaper, Accelerating Consumer/Prosumer Multimedia with HSA, June 2012


HSA EVOLUTION

Llano: Physical Integration
– Integrate CPU & GPU in silicon
– Unified Memory Controller
– Common Manufacturing Technology

Trinity: Optimized Platforms
– GPU Compute C++ support
– User mode scheduling
– Bi-Directional Power Mgmt between CPU and GPU

Kaveri: Architectural Integration
– Unified Address Space for CPU and GPU
– GPU uses pageable system memory via CPU pointers
– Fully coherent memory between CPU & GPU

Next Gen: System Integration
– GPU compute context switch
– GPU graphics pre-emption
– Quality of Service


HSA PROGRAMMABILITY ADVANTAGE

[Figure: HSA Foundation software stack. Top layer: unified programming models (C, C++, Java …; OpenCL, C++ AMP, Java8 …; DX11, OpenGL …) and domain-specific extensions / APIs. Middle layer: HSA Intermediate Language (HSAIL). Bottom layer: compute acceleration and graphics acceleration.]

• Works with today’s programming models and languages

• Architected to enable CPU-like programmability

• Promotes development and adoption of extended standards
• Write Once Run Anywhere – with Performance


CONCLUSION

• The age of traditional computing is dead.
• A paradigm shift in processing has brought about the Heterogeneous Systems Era

• HSA will enable us to dramatically scale processing power while increasing power efficiency
• The Holodeck is still years away, but HSA and dedicated hardware blocks will accelerate and enable technologies as they emerge


ACKNOWLEDGEMENTS

• Bill Herz
• Phil Rogers
• Marty Johnson
• Chris Hook
• Sumant Subramanian


DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and
typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to
product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences
between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or
otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to
time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO
RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN
NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES
ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Radeon, and combinations thereof
are trademarks of Advanced Micro Devices, Inc. Other names and logos are used for informational purposes only and may
be trademarks of their respective owners.

