Heterogeneous Systems Architecture: The Next Area of Computing Innovation

Keynote by Dr. Lisa Su, Senior Vice President and GM, Global Business Units, AMD, delivered at ISSCC: "Heterogeneous Systems Architecture: The Next Area of Computing Innovation. Case Study: The Holodeck."

  1. HETEROGENEOUS SYSTEMS ARCHITECTURE: THE NEXT AREA OF COMPUTING INNOVATION
     CASE STUDY: THE HOLODECK
     Dr. Lisa Su, Senior Vice President and GM, Global Business Units, AMD
     ISSCC Conference, February 18, 2013
  2. CHALLENGES TO MOORE’S LAW SCALING
     [Charts: "Area Scaling by Technology Generation" (normalized area, 0.0 to 1.0) and "Cost Per Transistor Scaling" (normalized cost/transistor, 0.0 to 1.0), each plotted for the 45nm, 40nm, 32nm, 28nm, 20nm and 20 FinFET nodes]
     • Lithography challenges begin severely limiting area scaling at the 20nm node
       – Fewer 1X metals due to cost
       – Less aggressive feature scaling due to lithography challenges
     • Compounded by rapidly increasing lithography costs
       – The 28nm → 20nm transition is an inflection point with dual exposure
       – No cost/transistor crossover for the first time at the 28nm → 20nm transition
     ISSCC Keynote | February 18th, 2013
  3. A PARADIGM SHIFT…
     [Diagram labels. CPU, Microprocessor Advancement: Single-Core Era, Multi-Core Era, Heterogeneous Systems Era; Homogeneous Computing; Heterogeneous Computing. GPU, Programmability Advancement: Graphics driver-based programs, OpenCL/DX driver-based programs, High-level programmable; Throughput Performance; Accelerator]
  4. HETEROGENEOUS SYSTEMS ARCHITECTURE MEMORY MODEL
     • Yesterday: from 32 bit
     • Today: to 64 bit
  5. ARCHITECTURES – A HISTORICAL PERSPECTIVE
     [Timeline from 1981 through the 1990s and 2000s to the 2010s, spanning the Legacy Processing Era and the Surround Computing Era. Labels: Single Core CPUs; Traditionally Optimized Platforms; Multi-Core CPUs/GPUs; APUs and legacy SOC; Heterogeneous Architectures]
  6. CHANGING THE THINKING, CHANGING THE GAME
     • HSA is designed to make the GPU hardware directly accessible to the software, using the high-level languages programmers already use on the CPU
       – C, C++, Java, Python… even JavaScript, HTML5
       – ISA agnostic, e.g., x86, 64-bit ARM, Radeon, Mali
     • GPU becomes a peer processor to the CPU in terms of system integration
       – Full programming language features
       – Shared virtual memory: pointer is a pointer
       – Coherency
       – Context switching
     • HSA Foundation – an industry-wide initiative
  7. BENEFITS OF HETEROGENEOUS SYSTEM ARCHITECTURE
  8. EFFECTIVE COMPUTE OFFLOAD
     [Diagram labels: APU (Accelerated Processing Unit); HSA Accelerated Software Applications; Data Parallel Workloads; Serial and Task Parallel Workloads]
     • Made easy by HSA
     • Unleash the best compute elements depending on task
  9. BRINGING IT ALL TOGETHER
     [Charts: MotionDSP 720p. "Power" (0 W to 35 W, stacked by CPU cores, NB+GPU and DRAM) and "Performance" (0 fps to 25 fps), each comparing CPU against CPU+GPU]
     • Synergistic use of GPU compute + shared memory = lower power and higher performance
     • >4.0X better energy efficiency (1)
     (1) AMD internal testing: AMD E2-3200 APU (2 cores @ 2400 MHz, GPU: 2 CU @ 444 MHz), Windows 7 OS, MotionDSP vReveal application, 720p MP4 input (http://www.vreveal.com/stabilization)
  10. TODAY’S DISCUSSION: FROM SURROUND COMPUTING TO ENABLING THE HOLODECK
      1. A fully featured Holodeck is still many years away
      2. Today our discussion will:
         • Establish a Holodeck framework
         • Identify Holodeck enabling technologies
         • Discuss how Heterogeneous Systems Architecture (HSA) accelerates these technologies
         • Undertake an HSA deep dive on one of these enabling technologies
         • Look at how new dedicated processors will enable Holodeck functionality
  11. WHAT IS A HOLODECK?
  12. THE HOLODECK FRAMEWORK: AN EVOLUTION OF SURROUND COMPUTING
      • Natural User Interfaces
      • Context Computing
      • 360 Degree Virtual Environments
  13. HOLODECK ENABLING TECHNOLOGIES: PROFOUND IMPLICATIONS FOR COMPUTER ARCHITECTURE
      • Computational Photography: delivering seamless and immersive video environments
      • Directional Audio: using audio to enhance immersion and realism of our environments
      • Natural User Interfaces: enabling realistic, natural human communication
      • Context Computing: delivering an intuitive understanding of the user’s needs in real time
      • Augmented Reality: bringing it all together – combining the real and the virtual
  14. COMPUTATIONAL PHOTOGRAPHY: 360 DEGREE VISUAL ENVIRONMENTS, PHOTOSTITCHING, PERIPHERAL VISION AND HSA
      • Mapping real life scenes through finite images
        – Photo stitching of tiled environments and perceptual correction
        – Detect interest points & match features
        – Projecting geometry with point features using algorithms like RANSAC
      • Image processing to account for curved screen surfaces
      • Modulate brightness to account for peripheral vision
      • HSA presents a unified view of the system with shared memory, so CPU and GPU acceleration can be used throughout the entire process
  15. DIRECTIONAL AUDIO
      • Couples computationally demanding 3D audio and spatialization effects with "always on" background processing like Voice Activity Detection (VAD)
        – Voice activity detection is best implemented with special audio processors and acceleration techniques
        – Spatialization effects such as “Convolution Reverb” are best done with GPU acceleration
      • HSA enables seamless integration of CPU and GPU acceleration with other independent accelerators
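Convolution reverb, named on the slide above, convolves a dry audio signal with a room's impulse response. The following is a minimal pure-Python sketch of that operation; the function name `convolution_reverb` and the tiny sample values are illustrative assumptions, not taken from the talk. Each output sample is computed independently of the others, which is the property that makes the workload a natural fit for data-parallel GPU execution.

```python
def convolution_reverb(dry, impulse_response):
    """Convolve a dry signal with a room impulse response (direct form)."""
    n_out = len(dry) + len(impulse_response) - 1
    wet = []
    for n in range(n_out):
        # Each output sample depends only on the inputs, never on other
        # outputs, so on a GPU every n could be its own work item.
        acc = 0.0
        for k, h in enumerate(impulse_response):
            i = n - k
            if 0 <= i < len(dry):
                acc += dry[i] * h
        wet.append(acc)
    return wet


# Hypothetical data: two clicks played into a room with a decaying echo.
dry = [1.0, 0.0, 0.0, 0.5]
room = [1.0, 0.5, 0.25]
wet = convolution_reverb(dry, room)
print(wet)
```

Real implementations typically use FFT-based convolution for long impulse responses; the direct form is shown only because it makes the per-sample independence easy to see.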
  16. NATURAL USER INTERFACES
      • Speech Recognition:
        – Background processing – echo cancellation & noise suppression
        – Audio feature extraction
        – Voice pattern recognition through Markov model or similar algorithm
      • Gesture Recognition:
        – Frame preprocessing & filtering
        – Optical flow or object tracking
        – Sophisticated computer vision algorithms to delineate the hand or body parts from the background
      • NUI algorithms all benefit from CPU/GPU and audio processors to efficiently perform these functions at the lowest power
  17. CONTEXT COMPUTING: BIOMETRICS EXAMPLE
      • Facial Recognition:
        – Face detection (is there a face) – GPU acceleration
        – Face identification (pattern matching through algorithms like Haar face detection) – CPU and GPU acceleration
        – Validation through blink detection (make sure it is a real face) – GPU acceleration
      • HSA enables mix and match of the best acceleration for each phase of the process
  18. AUGMENTED REALITY
      • Image Registration:
        – Relies on robust and fast feature detection – benefits from CPU/GPU acceleration
      • Object Tracking:
        – Relies on “optical flow” algorithm – benefits from CPU/GPU acceleration
      • Image Composition:
        – Once information exists from the above, becomes a classic graphics rendering use case
      • The building blocks of HSA enable the augmented reality world.
  19. THE WAY FORWARD
      • Many technologies required to enable our vision
        – Heterogeneous engines that accelerate key client and server workloads
        – Datacenters optimized for latency, scalability, and efficiency
        – Processors optimized for new and emerging workloads
        – Active research into new algorithms
  20. ENABLING TECHNOLOGY DEEP DIVE: ACCELERATING NATURAL USER INTERFACES (HAAR FACE DETECTION) WITH HETEROGENEOUS SYSTEMS ARCHITECTURE
  21. LOOKING FOR FACES IN ALL THE RIGHT PLACES
  22. LOOKING FOR FACES IN ALL THE RIGHT PLACES
      Quick HD Calculations
      • Search square = 21 x 21
      • Pixels = 1920 x 1080 = 2,073,600
      • Search squares = 1900 x 1060 = ~2 million
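The search-square count on this slide follows from sliding a 21 x 21 window across the frame one pixel at a time. The sketch below reproduces the arithmetic; the one-pixel stride is an assumption inferred from the 1900 x 1060 figure, and the function name `count_search_squares` is illustrative.

```python
def count_search_squares(width, height, window=21):
    """Number of window positions when sliding one pixel at a time."""
    if width < window or height < window:
        return 0
    # The top-left corner of the window can sit in (width - window + 1)
    # columns and (height - window + 1) rows.
    return (width - window + 1) * (height - window + 1)


pixels = 1920 * 1080
squares = count_search_squares(1920, 1080)
print(pixels, squares)
```

For a 1920 x 1080 frame this gives 1900 x 1060 positions, matching the slide's figure of roughly 2 million.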
  23. LOOKING FOR DIFFERENT SIZE FACES BY SCALING THE VIDEO FRAME
  24. LOOKING FOR DIFFERENT SIZE FACES BY SCALING THE VIDEO FRAME
      More HD Calculations
      • 70% scaling in H and V
      • Total pixels = 4.07 million
      • Search squares = 3.8 million
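The totals on this slide can be approximated by repeatedly scaling the 1920 x 1080 frame to 70% in each dimension and summing over all levels. The sketch below is one way to do that; the rounding rule (truncation) and the stopping rule (stop when a level can no longer hold a 21 x 21 window) are assumptions, since the slide does not state them, and the name `pyramid_totals` is illustrative.

```python
def pyramid_totals(width, height, num=7, den=10, window=21):
    """Sum pixels and search squares over a num/den image pyramid."""
    total_pixels = 0
    total_squares = 0
    level = 0
    while True:
        # Integer arithmetic keeps the 70% scaling exact (no float drift).
        w = width * num**level // den**level
        h = height * num**level // den**level
        if w < window or h < window:
            break  # this level is too small to hold a single window
        total_pixels += w * h
        total_squares += (w - window + 1) * (h - window + 1)
        level += 1
    return level, total_pixels, total_squares


levels, total_pixels, total_squares = pyramid_totals(1920, 1080)
print(levels, total_pixels, total_squares)
```

Because each level has 0.7 x 0.7 = 49% of the previous level's pixels, the pixel total approaches 2,073,600 / (1 - 0.49), about 4.07 million, in line with the slide. The square count from this sketch lands slightly above the slide's 3.8 million; the exact value depends on the rounding and stopping assumptions.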
  25. HAAR CASCADE STAGES
      [Flowchart: Stage N evaluates Feature k, Feature l and Feature m, then asks "Face still possible?" Yes: continue to Stage N+1 (Feature p, Feature q, Feature r). No: REJECT FRAME]
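The stage-by-stage flow on this slide can be sketched as a loop that sums a stage's feature scores and exits as soon as one stage falls below its threshold. This is a toy illustration under stated assumptions: the feature functions `right_minus_left` and `contrast`, the thresholds, and the four-value "windows" are all hypothetical stand-ins for real Haar features over image rectangles.

```python
from typing import Callable, List, Sequence, Tuple

Window = Sequence[float]
Feature = Callable[[Window], float]
Stage = Tuple[List[Feature], float]  # (features, pass threshold)


def run_cascade(window: Window, stages: List[Stage]) -> Tuple[bool, int]:
    """Return (face_still_possible, number_of_stages_passed)."""
    for depth, (features, threshold) in enumerate(stages):
        score = sum(feature(window) for feature in features)
        if score < threshold:
            return False, depth  # early out: reject this window
    return True, len(stages)


# Hypothetical toy features standing in for Haar rectangle features.
def right_minus_left(w: Window) -> float:
    half = len(w) // 2
    return sum(w[half:]) - sum(w[:half])


def contrast(w: Window) -> float:
    return max(w) - min(w)


stages: List[Stage] = [([right_minus_left], 1.0), ([contrast], 3.0)]

print(run_cascade([1, 1, 5, 5], stages))  # passes both stages
print(run_cascade([1, 1, 2, 2], stages))  # rejected at the second stage
print(run_cascade([5, 5, 1, 1], stages))  # rejected at the first stage
```

The returned depth is the quantity that matters later in the talk: windows that exit early cost very little, while windows that survive many stages dominate the work.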
  26. 22 CASCADE STAGES, EARLY OUT BETWEEN EACH
      [Diagram: STAGE 1 → STAGE 2 → … → STAGE 21 → STAGE 22 → FACE CONFIRMED, with an early exit to NO FACE after each stage]
      Final HD Calculations
      • Search squares = 3.8 million
      • Average features per square = 124
      • Calculations per feature = 100
      • Calculations per frame = 47 GCalcs
      Calculation Rate
      • 30 frames/sec = 1.4 TCalcs/second
      • 60 frames/sec = 2.8 TCalcs/second
      …and this only gets front-facing faces
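The per-frame and per-second figures on this slide are a straight product of the three quantities it lists. A short check of that arithmetic, using only the slide's own numbers (the variable names are illustrative):

```python
search_squares = 3_800_000      # from the scaled-frame slide
features_per_square = 124       # average, per the slide
calcs_per_feature = 100         # per the slide

calcs_per_frame = search_squares * features_per_square * calcs_per_feature
rate_30fps = calcs_per_frame * 30
rate_60fps = calcs_per_frame * 60

print(f"{calcs_per_frame / 1e9:.1f} GCalcs per frame")
print(f"{rate_30fps / 1e12:.2f} TCalcs/s at 30 frames/sec")
print(f"{rate_60fps / 1e12:.2f} TCalcs/s at 60 frames/sec")
```

The product is about 47.1 GCalcs per frame, which the slide rounds to 47 GCalcs, 1.4 TCalcs/second and 2.8 TCalcs/second.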
  27. CASCADE DEPTH ANALYSIS
      [Chart: cascade depth on an axis from 0 to 25, with legend bins 0-5, 5-10, 10-15, 15-20 and 20-25]
  28. UNBALANCING DUE TO EXITS IN EARLIER CASCADE STAGES
      [Diagram: work items marked Live or Dead]
      • When running on the GPU, we run each search rectangle on a separate work item
      • Early out algorithms, like HAAR, exhibit divergence between work items
        – Some work items exit early
        – Their neighbors continue
        – SIMD packing suffers as a result
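The cost of this divergence can be estimated with a simple model: work items are grouped into fixed-width SIMD groups, and each group keeps issuing instructions until its deepest work item finishes. The sketch below is a simplified model, not a measurement; the group width of 64, the function name `simd_utilization`, and the sample exit depths are assumptions for illustration.

```python
def simd_utilization(stages_run, group_size=64):
    """Fraction of issued SIMD lane-stages that did useful work.

    stages_run[i] is how many cascade stages work item i executed
    before exiting (or completing).
    """
    useful = 0
    issued = 0
    for start in range(0, len(stages_run), group_size):
        group = stages_run[start:start + group_size]
        useful += sum(group)
        # The whole group stays occupied until its deepest item is done.
        issued += max(group) * group_size
    return useful / issued if issued else 1.0


# Hypothetical group: 63 windows exit after stage 1, one runs all 22 stages.
diverged = [1] * 63 + [22]
uniform = [5] * 64

print(f"diverged: {simd_utilization(diverged):.3f}")
print(f"uniform:  {simd_utilization(uniform):.3f}")
```

In the diverged case a single long-running work item keeps 63 idle lanes occupied for 21 extra stages, so only a few percent of the issued lane-stages are useful.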
  29. PROCESSING TIME/STAGE
      [Chart: processing time in ms (0 to 100) per cascade stage (1 to 8, then 9-22), GPU vs. CPU, on an A10-4600M (6 CU @ 497 MHz, 4 cores @ 2700 MHz)]
      AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685 MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
  30. PERFORMANCE CPU-VS-GPU
      [Chart: images/sec (0 to 12) against the number of cascade stages run on the GPU (0 to 8, and 22), with series labeled CPU, HSA and GPU, on an AMD A10-4600M APU (6 CU @ 497 MHz, 4 cores @ 2700 MHz)]
      AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685 MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
  31. HAAR SOLUTION: RUN DIFFERENT CASCADES ON GPU AND CPU
      • By seamlessly sharing data between CPU and GPU, HSA allows the right processor to handle its appropriate workload
      • +2.5x increased performance
      • -2.5x decreased energy per frame
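The division of work described on this slide can be sketched as two phases: the early, uniform stages run over every search window (the GPU-friendly part), and only the survivors continue through the remaining, divergent stages (the CPU-friendly part). The code below is plain Python that illustrates only the split, not actual GPU execution or HSA shared memory; `split_cascade`, the predicate-style stages and the integer "windows" are hypothetical.

```python
def split_cascade(windows, stages, gpu_stages):
    """Run the first gpu_stages on all windows, the rest on survivors.

    Each stage is a predicate: True means "face still possible".
    """
    # Phase 1 (GPU-like): every window runs the same few early stages,
    # so there is little divergence between neighboring work items.
    survivors = [w for w in windows
                 if all(stage(w) for stage in stages[:gpu_stages])]
    # Phase 2 (CPU-like): few windows remain and they exit at different
    # depths, which suits latency-oriented cores better than wide SIMD.
    faces = [w for w in survivors
             if all(stage(w) for stage in stages[gpu_stages:])]
    return survivors, faces


# Hypothetical stand-in stages operating on integers instead of pixels.
stages = [
    lambda w: w % 2 == 0,
    lambda w: w % 3 == 0,
    lambda w: w > 10,
]
survivors, faces = split_cascade(range(20), stages, gpu_stages=2)
print(survivors, faces)
```

With shared virtual memory as described in the talk, the survivor list could be handed from one processor to the other by pointer rather than copied; where to place the split (the `gpu_stages` parameter here) is the tuning knob that the performance chart on the previous slide explores.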
  32. APPLICATION ACCELERATION USING HSA
      Acceleration vs. CPU (AMD estimates):
      • Gesture recognition: 12x
      • Photo indexing: 10x
      • Voice recognition: 10x
      • Visual search: 9x
      • Audio search: 5x
      • Stereo vision: 4x
      • Video stabilization: 4x
      • Face detect: 2x
      Source: AMD Whitepaper, Accelerating Consumer/Prosumer Multimedia with HSA, June 2012
  33. HSA EVOLUTION
      • Llano – Physical Integration
        – Integrate CPU & GPU in silicon
        – Unified Memory Controller
        – Common Manufacturing Technology
      • Trinity – Optimized Platforms
        – GPU Compute C++ support
        – User mode scheduling
        – Bi-Directional Power Mgmt between CPU and GPU
      • Kaveri – Architectural Integration
        – Unified Address Space for CPU and GPU
        – GPU uses pageable system memory via CPU pointers
        – Fully coherent memory between CPU & GPU
      • Next Gen – System Integration
        – GPU compute context switch
        – GPU graphics pre-emption
        – Quality of Service
  34. HSA PROGRAMMABILITY ADVANTAGE
      [Diagram labels: Unified Programming Models; HSA Foundation; C, C++, Java …; OpenCL, C++ AMP, Java8 …; DX11, OpenGL …; Domain-Specific Ext / APIs; HSA Intermediate Language (HSAIL); Compute Acceleration; Graphics Acceleration]
      • Works with today’s programming models and languages
      • Architected to enable CPU like programmability
      • Promotes development and adoption of extended standards
      • Write Once Run Anywhere – with Performance
  35. CONCLUSION
      • The age of traditional computing is dead. A paradigm shift in processing has brought about the Heterogeneous Systems Era
      • HSA will enable us to dramatically scale processing power while increasing power efficiency
      • The Holodeck is still years away, but HSA and dedicated hardware blocks will accelerate and enable technologies as they emerge
  36. ACKNOWLEDGEMENTS
      • Bill Herz
      • Phil Rogers
      • Marty Johnson
      • Chris Hook
      • Sumant Subramanian
  37. THANK YOU
  38. DISCLAIMER
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
      AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION
      © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names and logos are used for informational purposes only and may be trademarks of their respective owners.
