HETEROGENEOUS SYSTEMS ARCHITECTURE:
THE NEXT AREA OF COMPUTING INNOVATION
            CASE STUDY: THE HOLODECK
                                                   Dr. Lisa Su
               Senior Vice President and GM, Global Business Units,
                                                              AMD

                                                 ISSCC Conference
                                                  February 18, 2013
CHALLENGES TO MOORE’S LAW SCALING

[Figure: two charts, "Area Scaling by Technology Generation" (normalized area, 0.0 to 1.0) and "Cost Per Transistor Scaling" (normalized cost/transistor, 0.0 to 1.0), each across the 45nm, 40nm, 32nm, 28nm, 20nm and 20nm FinFET nodes]




  Lithography challenges begin severely limiting area scaling at 20nm node
                    – Fewer 1X metals due to cost
                    – Less aggressive feature scaling due to lithography challenges

  Compounded by rapidly increasing lithography costs
                    – 28nm → 20nm transition is the inflection point, with dual exposure
                    – No cost/transistor crossover for the first time at the 28nm → 20nm transition


2 | ISSCC Keynote | February 18th, 2013
A PARADIGM SHIFT…

[Figure: programmability plotted against throughput performance. Microprocessor (CPU) advancement: Single-Core Era, Multi-Core Era (homogeneous computing), Heterogeneous Systems Era (heterogeneous computing). GPU (accelerator) advancement: graphics driver-based programs, OpenCL/DX driver-based programs, high-level programmable.]
HETEROGENEOUS SYSTEMS ARCHITECTURE MEMORY MODEL
[Figure: a "Yesterday" panel labeled "From 32 bit" and a "Today" panel labeled "To 64 bit"]
ARCHITECTURES – A HISTORICAL PERSPECTIVE

[Figure: timeline from the Legacy Processing Era to the Surround Computing Era, with axis marks at 1981, 1990s, 2000s and 2010s. Milestones in order: Single Core CPUs; Traditionally Optimized Platforms; Multi-Core CPUs/GPUs; APUs and legacy SOC; Heterogeneous Architectures.]
CHANGING THE THINKING, CHANGING THE GAME

HSA is designed to make the GPU hardware
directly accessible to software, using the
high-level languages programmers already use
on the CPU
 C, C++, Java, Python…even JavaScript, HTML5
 ISA agnostic – e.g., x86, 64-bit ARM, Radeon, Mali

GPU becomes a peer processor to the CPU in
terms of system integration
 Full programming language features
 Shared virtual memory: pointer is a pointer
 Coherency
 Context switching


  HSA Foundation – an
  industry-wide initiative
BENEFITS OF HETEROGENEOUS SYSTEM ARCHITECTURE




EFFECTIVE COMPUTE OFFLOAD

[Figure: APU-accelerated software applications mapped onto an HSA Accelerated Processing Unit, split into data parallel workloads and serial and task parallel workloads]

Made easy by HSA
Unleash the best compute elements depending on task


BRINGING IT ALL TOGETHER
MOTION DSP 720P

[Figure: two charts comparing a CPU-only run with a CPU+GPU run. Power (0 W to 35 W), stacked by CPU Cores, NB+GPU and DRAM; Performance (0 fps to 25 fps).]

Synergistic use of GPU compute + shared memory = lower power and higher performance

>4.0X Better Energy Efficiency (1)

(1) AMD internal testing: AMD E2-3200 APU (2 cores @ 2400 MHz, GPU: 2 CU @ 444 MHz),
Windows 7 OS, MotionDSP vReveal Applications 720P MP4 input
(http://www.vreveal.com/stabilization)


TODAY’S DISCUSSION: FROM SURROUND COMPUTING TO
ENABLING THE HOLODECK

1. A fully featured Holodeck is
   still many years away

2. Today our discussion will:
 Establish a Holodeck framework
 Identify Holodeck enabling technologies
 Discuss how Heterogeneous Systems
  Architecture (HSA) accelerates these
  technologies
 Undertake an HSA deep dive on one of
  these enabling technologies
 Look at how new dedicated processors
  will enable Holodeck functionality


WHAT IS A HOLODECK?




THE HOLODECK FRAMEWORK:
AN EVOLUTION OF SURROUND COMPUTING


 Natural User Interfaces
 Context Computing
 360 Degree Virtual
  Environments




HOLODECK ENABLING TECHNOLOGIES:
PROFOUND IMPLICATIONS FOR COMPUTER ARCHITECTURE

Computational Photography
 Delivering seamless and immersive video environments

Directional Audio
 Using audio to enhance immersion and realism of our environments

Natural User Interfaces
 Enabling realistic, natural human
  communication

Context Computing
 Delivering an intuitive understanding
  of the user’s needs in real time

Augmented Reality
 Bringing it all together – combining the
  real and the virtual

COMPUTATIONAL PHOTOGRAPHY
360 DEGREE VISUAL ENVIRONMENTS, PHOTOSTITCHING, PERIPHERAL VISION AND HSA

 Mapping real life scenes through finite images
   Photo stitching of tiled environments and
    perceptual correction
   Detect interest points & match features
   Projecting geometry with point features
    using algorithms like RANSAC
 Image processing to account for
  curved screen surfaces
 Modulate brightness to account for
  peripheral vision

 HSA presents a unified view of the
 system with shared memory, so CPU and
 GPU acceleration can be used throughout
 the entire process
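
The RANSAC step named above can be illustrated on a deliberately simplified problem. The sketch below fits a 2D line rather than the projective geometry a photo stitcher would estimate, and every name and parameter in it (`ransac_line`, `tolerance`, the sample data) is an assumption made for illustration, not something taken from the talk. The loop structure is the part that carries over: sample a minimal set of points, fit a model, count inliers, keep the best.

```python
import random

def fit_line(p, q):
    """Line y = slope * x + intercept through two points with distinct x."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, iterations=200, tolerance=0.1, seed=0):
    """Return the (slope, intercept) supported by the most points, plus its inliers."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    best_model, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample: two points define a line
        if p[0] == q[0]:
            continue  # vertical pair; this simplified model cannot represent it
        slope, intercept = fit_line(p, q)
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers

# Ten points on y = 2x + 1 plus three gross outliers (hypothetical data).
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 9.0), (7.0, -3.0), (4.0, 20.0)]
model, inliers = ransac_line(points)
print(model, len(inliers))
```

Each iteration scores one candidate model against every point independently, which is the kind of data-parallel inner loop that maps naturally onto a GPU.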



DIRECTIONAL AUDIO

 Couples computationally demanding 3D
  audio and spatialization effects with
  "always on" background processing like
  Voice Activity Detection (VAD)
    Voice activity detection is best
     implemented with special audio
     processors and acceleration
     techniques
    Spatialization effects such as
     “Convolution Reverb” are best
     done with GPU acceleration



      HSA enables seamless
      integration of CPU and GPU
      acceleration with other
      independent accelerators
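
As a rough illustration of why convolution reverb suits GPU acceleration, the sketch below convolves a dry signal with a room impulse response using the direct (time-domain) form. The function name and the tiny sample values are assumptions for illustration only. Production reverbs with long impulse responses typically use FFT-based convolution instead.

```python
def convolve(dry, impulse_response):
    """Direct convolution: every input sample excites a scaled copy of the impulse response."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, sample in enumerate(dry):
        for k, tap in enumerate(impulse_response):
            out[n + k] += sample * tap
    return out

dry = [1.0, 0.0, 0.0, 0.5]   # hypothetical dry signal
room = [1.0, 0.5, 0.25]      # hypothetical decaying impulse response
wet = convolve(dry, room)
print(wet)  # [1.0, 0.5, 0.25, 0.5, 0.25, 0.125]
```

Each output sample is a sum of products that does not depend on any other output sample, so the work can be spread across many GPU lanes with no coordination between them.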


NATURAL USER INTERFACES

  Speech Recognition:
       Background processing – echo
        cancellation & noise suppression
       Audio feature extraction
       Voice pattern recognition through
        Markov model or similar algorithm
   Gesture Recognition:
       Frame preprocessing & filtering
       Optical flow or object tracking
       Sophisticated computer vision
        algorithms to delineate the hand or
        body parts from the background

    NUI algorithms all benefit from
    CPU/GPU and audio processors to
    efficiently perform these functions at
    the lowest power
CONTEXT COMPUTING
BIOMETRICS EXAMPLE

   • Facial Recognition:
         • Face detection (is there a face) –
           GPU acceleration
         • Face identification (pattern
           matching through algorithms like
           Haar face detection) – CPU and
           GPU acceleration
         • Validation through blink detection
           (make sure it is a real face) –
           GPU acceleration



   HSA enables mix and match of the best
   acceleration for each phase of the
   process




AUGMENTED REALITY

 • Image Registration:
       • Relies on robust and fast feature
         detection – benefits from
         CPU/GPU acceleration
  • Object Tracking:
       • Relies on “optical flow” algorithm
         – benefits from CPU/GPU
         acceleration
  • Image Composition:
       • Once information exists from the
         above, becomes a classic
         graphics rendering use case


   The building blocks of HSA enable the
   augmented reality world.


THE WAY FORWARD

 Many technologies required to
  enable our vision
    – Heterogeneous engines that
      accelerate key client and server
      workloads
    – Datacenters optimized for
      latency, scalability, and
      efficiency
    – Processors optimized for new
      and emerging workloads
    – Active research into new
      algorithms




ENABLING TECHNOLOGY DEEP DIVE:
ACCELERATING NATURAL USER INTERFACES (HAAR
      FACE DETECTION) WITH HETEROGENEOUS
                    SYSTEMS ARCHITECTURE
LOOKING FOR FACES IN ALL THE RIGHT PLACES








 Quick HD Calculations
 Search square = 21 x 21
 Pixels = 1920 x 1080 = 2,073,600
 Search squares = 1900 x 1060 = ~2 Million
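
The arithmetic above can be reproduced in a few lines. This sketch assumes the 21 x 21 search square is slid one pixel at a time and must fit entirely inside the frame; the function name is hypothetical.

```python
def count_search_squares(width: int, height: int, window: int) -> int:
    """Number of window x window squares that fit in the frame at a one-pixel stride."""
    if width < window or height < window:
        return 0
    return (width - window + 1) * (height - window + 1)

hd_pixels = 1920 * 1080
hd_squares = count_search_squares(1920, 1080, 21)
print(hd_pixels, hd_squares)  # 2073600 2014000
```

The 1900 x 1060 on the slide is (1920 - 21 + 1) x (1080 - 21 + 1), which gives 2,014,000, or about 2 million.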




LOOKING FOR DIFFERENT SIZE FACES
BY SCALING THE VIDEO FRAME








   More HD Calculations
   70% scaling in H and V
   Total Pixels = 4.07 Million
   Search squares = 3.8 Million
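
A minimal sketch of where these totals come from, assuming the frame is repeatedly scaled to 70% in each dimension until a 21 x 21 square no longer fits. The rounding rule and stopping condition are assumptions, so the output lands close to the slide's figures rather than matching them digit for digit.

```python
def pyramid_totals(width, height, window=21, scale=0.7):
    """Sum pixels and search squares over every level of the scaled-frame pyramid."""
    total_pixels = 0
    total_squares = 0
    level = 0
    w, h = width, height
    while w >= window and h >= window:
        total_pixels += w * h
        total_squares += (w - window + 1) * (h - window + 1)
        level += 1
        # Assumed rounding: nearest integer at each level.
        w = round(width * scale ** level)
        h = round(height * scale ** level)
    return total_pixels, total_squares

pixels, squares = pyramid_totals(1920, 1080)
print(f"{pixels / 1e6:.2f} million pixels, {squares / 1e6:.2f} million search squares")
```

As a cross-check, each level has 0.7 x 0.7 = 0.49 times the pixels of the one before, so the total approaches 2,073,600 / (1 - 0.49), which is roughly 4.07 million.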




HAAR CASCADE STAGES



[Figure: flowchart of two consecutive cascade stages. Stage N evaluates features k, l and m, then asks "Face still possible?". Yes: continue to Stage N+1 (features p, r and q). No: reject frame.]
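
The stage-by-stage flow in the figure can be sketched as follows. This is an illustrative toy, not the detector used in the talk: the `Stage` class, the two "features" and their thresholds are all hypothetical stand-ins for real Haar features.

```python
from typing import Callable, List, Sequence, Tuple

Feature = Callable[[Sequence[float]], float]

class Stage:
    """A cascade stage: sum its feature responses and compare against a threshold."""
    def __init__(self, features: List[Feature], threshold: float):
        self.features = features
        self.threshold = threshold

    def passes(self, window: Sequence[float]) -> bool:
        return sum(f(window) for f in self.features) >= self.threshold

def run_cascade(stages: List[Stage], window: Sequence[float]) -> Tuple[bool, int]:
    """Return (face_still_possible, number_of_stages_evaluated)."""
    for depth, stage in enumerate(stages, start=1):
        if not stage.passes(window):
            return False, depth  # early out: later stages are never evaluated
    return True, len(stages)

# Toy features standing in for Haar features (assumptions for illustration).
def mean(window):
    return sum(window) / len(window)

def half_contrast(window):
    mid = len(window) // 2
    return abs(mean(window[:mid]) - mean(window[mid:]))

stages = [Stage([mean], 0.2), Stage([half_contrast], 0.3)]
print(run_cascade(stages, [0.0] * 8))              # rejected at stage 1
print(run_cascade(stages, [0.5] * 8))              # rejected at stage 2
print(run_cascade(stages, [0.1] * 4 + [0.9] * 4))  # survives both stages
```

The second value returned is the cascade depth a window reached, which is the quantity that varies from window to window and drives the cost of the whole search.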




22 CASCADE STAGES, EARLY OUT BETWEEN EACH



[Figure: chain of stages (Stage 1, Stage 2, … Stage 21, Stage 22) ending in "face confirmed", with an early exit to "no face" between each stage]


            Final HD Calculations
            Search squares = 3.8 million
            Average features per square = 124
            Calculations per feature = 100
            Calculations per frame = 47 GCalcs

            Calculation Rate
            30 frames/sec = 1.4 TCalcs/second
            60 frames/sec = 2.8 TCalcs/second
            …and this only gets front-facing faces
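
The figures above multiply out as shown in this short check. All inputs are the slide's own numbers; only the variable names are added.

```python
search_squares = 3_800_000
features_per_square = 124
calcs_per_feature = 100

calcs_per_frame = search_squares * features_per_square * calcs_per_feature
print(f"{calcs_per_frame / 1e9:.0f} GCalcs per frame")

for fps in (30, 60):
    rate = calcs_per_frame * fps
    print(f"{fps} frames/sec = {rate / 1e12:.1f} TCalcs/second")
```

The product is 47.12 billion calculations per frame, which the slide rounds to 47 GCalcs.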




CASCADE DEPTH ANALYSIS
[Figure: cascade depth reached across the frame, on a depth axis from 0 to 25, with legend bins 20-25, 15-20, 10-15, 5-10 and 0-5]




UNBALANCING DUE TO EXITS IN EARLIER CASCADE STAGES




[Figure: work items marked "Live" or "Dead"]




        When running on the GPU, we run each search rectangle on a separate
         work item
        Early out algorithms, like HAAR, exhibit divergence between work items
            – Some work items exit early
            – Their neighbors continue
            – SIMD packing suffers as a result
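
The cost of this divergence can be estimated with a small model. The sketch below assumes work items are packed into groups of 64 lanes (an assumed SIMD width, chosen to match a typical AMD wavefront) and that a group stays occupied until its longest-running work item finishes. The function name and the example exit depths are hypothetical.

```python
def simd_utilization(exit_stages, simd_width=64):
    """Fraction of issued lane-stage slots that did useful work.

    exit_stages[i] is the number of cascade stages work item i evaluated.
    """
    if not exit_stages:
        return 1.0
    useful = 0
    issued = 0
    for start in range(0, len(exit_stages), simd_width):
        group = exit_stages[start:start + simd_width]
        useful += sum(group)
        # Idle lanes still occupy the SIMD until the slowest neighbor exits.
        issued += max(group) * simd_width
    return useful / issued

balanced = simd_utilization([5] * 64)            # every lane exits together
unbalanced = simd_utilization([1] * 63 + [22])   # one lane runs all 22 stages
print(f"balanced: {balanced:.2f}, unbalanced: {unbalanced:.2f}")
```

In the unbalanced case a single surviving work item keeps 63 dead lanes waiting, so only about 6% of the issued slots do useful work.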

PROCESSING TIME/STAGE
[Figure: processing time per cascade stage, GPU vs. CPU. Vertical axis: time (ms), 0 to 100. Horizontal axis: cascade stage, 1 through 8, then 9-22. Chart label: A10-4600M (6CU@497MHz, 4 cores@2700MHz).]

    AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
    6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)



PERFORMANCE CPU-VS-GPU
[Figure: images per second for CPU, HSA and GPU. Vertical axis: images/sec, 0 to 12. Horizontal axis: number of cascade stages on GPU, 0 through 8, then 22. Chart label: AMD A10-4600M APU (6CU@497MHz, 4 cores@2700MHz).]

    AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
    6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)



HAAR SOLUTION
RUN DIFFERENT CASCADES ON GPU AND CPU


By seamlessly sharing data between CPU and GPU, HSA allows the right processor to handle its appropriate workload

 +2.5x: increased performance
 -2.5x: decreased energy per frame
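
One way to picture the split is sketched below. No GPU is involved: both phases are plain Python, the stage predicates are toy stand-ins, and `hybrid_detect` and `gpu_stages` are hypothetical names. The point is only the shape of the pipeline, in which the early stages run over every window in bulk and the later, branch-heavy stages run over the few survivors.

```python
def hybrid_detect(windows, stages, gpu_stages):
    """Run the first gpu_stages stages in bulk, then finish survivors one by one.

    Returns (indices surviving the bulk phase, indices accepted as faces).
    """
    # Phase 1 (stand-in for the GPU): uniform work over all windows,
    # compacting the survivor list after each stage.
    survivors = list(range(len(windows)))
    for stage in stages[:gpu_stages]:
        survivors = [i for i in survivors if stage(windows[i])]

    # Phase 2 (stand-in for the CPU): few windows remain, each with its own
    # early out; all() stops at the first failing stage.
    faces = [i for i in survivors
             if all(stage(windows[i]) for stage in stages[gpu_stages:])]
    return survivors, faces

# Toy data: each "window" is a number, each stage a simple predicate.
windows = [0, 3, 5, 8, 10]
stages = [lambda x: x >= 3, lambda x: x >= 5, lambda x: x % 2 == 0]

survivors, faces = hybrid_detect(windows, stages, gpu_stages=1)
print(survivors, faces)  # [1, 2, 3, 4] [3, 4]
```

Because both processors see the same data, moving the split point changes where the work runs but not which windows are accepted, so the split can be tuned for throughput alone.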




APPLICATION ACCELERATION USING HSA




  Acceleration vs. CPU

  Gesture recognition    12x
  Photo indexing         10x
  Voice recognition      10x
  Visual search           9x
  Audio search            5x
  Stereo vision           4x
  Video stabilization     4x
  Face detect             2x

    AMD estimates. Source: AMD Whitepaper, Accelerating Consumer/Prosumer Multimedia with HSA, June 2012



HSA EVOLUTION

Llano: Physical Integration
   • Integrate CPU & GPU in silicon
   • Unified Memory Controller
   • Common Manufacturing Technology

Trinity: Optimized Platforms
   • GPU Compute C++ support
   • User mode scheduling
   • Bi-Directional Power Mgmt between CPU and GPU

Kaveri: Architectural Integration
   • Unified Address Space for CPU and GPU
   • GPU uses pageable system memory via CPU pointers
   • Fully coherent memory between CPU & GPU

Next Gen: System Integration
   • GPU compute context switch
   • GPU graphics pre-emption
   • Quality of Service




HSA PROGRAMMABILITY ADVANTAGE

[Figure: HSA Foundation software stack. Labels: Unified Programming Models (C, C++, Java …; OpenCL, C++ AMP, Java8 …; DX11, OpenGL …); Domain-Specific Ext / APIs; HSA Intermediate Language (HSAIL); Compute Acceleration; Graphics Acceleration.]




          • Works with today’s programming models and languages

          • Architected to enable CPU like programmability

          • Promotes development and adoption of extended standards
             • Write Once Run Anywhere – with Performance


CONCLUSION


 The age of traditional computing is
  dead.
 A paradigm shift in processing has
  brought about the Heterogeneous
  Systems Era

 HSA will enable us to dramatically
  scale processing power while
  increasing power efficiency
 The Holodeck is still years away, but
  HSA and dedicated hardware
  blocks will accelerate and enable
  technologies as they emerge




ACKNOWLEDGEMENTS


 Bill Herz
 Phil Rogers

 Marty Johnson
 Chris Hook
 Sumant Subramanian




THANK YOU
DISCLAIMER
 The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and
 typographical errors.

 The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to
 product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences
 between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or
 otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to
 time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

 AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO
 RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

 AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN
 NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES
 ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGES.

 ATTRIBUTION
 © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Radeon, and combinations thereof
 are trademarks of Advanced Micro Devices, Inc. Other names and logos are used for informational purposes only and may
 be trademarks of their respective owners.




38 | ISSCC Keynote | February 18th, 2013

More Related Content

What's hot

Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APUHot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APUAMD
 
Introduction to Google App Engine
Introduction to Google App EngineIntroduction to Google App Engine
Introduction to Google App Enginerajdeep
 
Embedded system custom single purpose processors
Embedded system custom single  purpose processorsEmbedded system custom single  purpose processors
Embedded system custom single purpose processorsAiswaryadevi Jaganmohan
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Community
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bitsChiou-Nan Chen
 
SPINS: Security Protocols for Sensor Networks
SPINS: Security Protocols for Sensor NetworksSPINS: Security Protocols for Sensor Networks
SPINS: Security Protocols for Sensor NetworksAbhijeet Awade
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSKathirvel Ayyaswamy
 
CNIT 141: 5. Stream Ciphers
CNIT 141: 5. Stream CiphersCNIT 141: 5. Stream Ciphers
CNIT 141: 5. Stream CiphersSam Bowne
 
High Bandwidth Memory(HBM)
High Bandwidth Memory(HBM)High Bandwidth Memory(HBM)
High Bandwidth Memory(HBM)HARINATH REDDY
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architectureKhanh Le
 
system interconnect architectures in ACA
system interconnect architectures in ACAsystem interconnect architectures in ACA
system interconnect architectures in ACAPankaj Kumar Jain
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for researchEsteban Hernandez
 

What's hot (20)

Cuda
CudaCuda
Cuda
 
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APUHot Chips: AMD Next Gen 7nm Ryzen 4000 APU
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
 
Introduction to Google App Engine
Introduction to Google App EngineIntroduction to Google App Engine
Introduction to Google App Engine
 
Embedded system custom single purpose processors
Embedded system custom single  purpose processorsEmbedded system custom single  purpose processors
Embedded system custom single purpose processors
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
Moving NEON to 64 bits
Moving NEON to 64 bitsMoving NEON to 64 bits
Moving NEON to 64 bits
 
SPINS: Security Protocols for Sensor Networks
SPINS: Security Protocols for Sensor NetworksSPINS: Security Protocols for Sensor Networks
SPINS: Security Protocols for Sensor Networks
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
Cuda
CudaCuda
Cuda
 
Cuda Architecture
Cuda ArchitectureCuda Architecture
Cuda Architecture
 
Cuda tutorial
Cuda tutorialCuda tutorial
Cuda tutorial
 
CNIT 141: 5. Stream Ciphers
CNIT 141: 5. Stream CiphersCNIT 141: 5. Stream Ciphers
CNIT 141: 5. Stream Ciphers
 
GFS & HDFS Introduction
GFS & HDFS IntroductionGFS & HDFS Introduction
GFS & HDFS Introduction
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
High Bandwidth Memory(HBM)
High Bandwidth Memory(HBM)High Bandwidth Memory(HBM)
High Bandwidth Memory(HBM)
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architecture
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
 
system interconnect architectures in ACA
system interconnect architectures in ACAsystem interconnect architectures in ACA
system interconnect architectures in ACA
 
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
 
Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)
 

Viewers also liked

AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD
 
NUMA Performance Considerations in VMware vSphere
NUMA Performance Considerations in VMware vSphereNUMA Performance Considerations in VMware vSphere
NUMA Performance Considerations in VMware vSphereAMD
 
Open compute technology
Open compute technologyOpen compute technology
Open compute technologyAMD
 
AMD - Why, What and How
AMD - Why, What and HowAMD - Why, What and How
AMD - Why, What and HowMike Wilcox
 
AMD Radeon Technology Group Summit
AMD Radeon Technology Group SummitAMD Radeon Technology Group Summit
AMD Radeon Technology Group SummitLow Hong Chuan
 
AMD 2014 Performance Mobile APUs
AMD 2014 Performance Mobile APUsAMD 2014 Performance Mobile APUs
AMD 2014 Performance Mobile APUsAMD
 
2014 AMD Low-Power Mobile APUs
2014 AMD Low-Power Mobile APUs2014 AMD Low-Power Mobile APUs
2014 AMD Low-Power Mobile APUsAMD
 
AMD 2014 Mobility APU Lineup Announcement
AMD 2014 Mobility APU Lineup AnnouncementAMD 2014 Mobility APU Lineup Announcement
AMD 2014 Mobility APU Lineup AnnouncementAMD
 
Wps104 direct x 12 a new meaning for efficiency and performance (presented ...
Wps104   direct x 12 a new meaning for efficiency and performance (presented ...Wps104   direct x 12 a new meaning for efficiency and performance (presented ...
Wps104 direct x 12 a new meaning for efficiency and performance (presented ...Jose Fajardo
 
Apu14 beijing final for show english press
Apu14 beijing final for show english pressApu14 beijing final for show english press
Apu14 beijing final for show english pressLow Hong Chuan
 
Progress Toward Topical Therapy of AMD
Progress Toward Topical Therapy of AMDProgress Toward Topical Therapy of AMD
Progress Toward Topical Therapy of AMDRick Trevino
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 
Radeon Software Crimson ReLive
Radeon Software Crimson ReLive Radeon Software Crimson ReLive
Radeon Software Crimson ReLive Low Hong Chuan
 
AMD Hot Chips Bulldozer & Bobcat Presentation
AMD Hot Chips Bulldozer & Bobcat PresentationAMD Hot Chips Bulldozer & Bobcat Presentation
AMD Hot Chips Bulldozer & Bobcat PresentationAMD
 
Age Related Macular Degeneration
Age Related Macular DegenerationAge Related Macular Degeneration
Age Related Macular DegenerationJody Abrams
 
Whats New in AMD - 2015
Whats New in AMD - 2015Whats New in AMD - 2015
Whats New in AMD - 2015Rick Trevino
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
Amd Ryzen December 2016 Update
Amd Ryzen December 2016 Update Amd Ryzen December 2016 Update
Amd Ryzen December 2016 Update Low Hong Chuan
 
AMD Analyst Day 2009: Rick Bergman
AMD Analyst Day 2009: Rick BergmanAMD Analyst Day 2009: Rick Bergman
AMD Analyst Day 2009: Rick BergmanAMD
 

HSA Powers the Holodeck: Heterogeneous Computing Enables Immersive Virtual Environments

  • 1. HETEROGENEOUS SYSTEMS ARCHITECTURE: THE NEXT AREA OF COMPUTING INNOVATION. CASE STUDY: THE HOLODECK
      Dr. Lisa Su, Senior Vice President and GM, Global Business Units, AMD
      ISSCC Conference, February 18, 2013
  • 2. CHALLENGES TO MOORE’S LAW SCALING
      [Charts: "Area Scaling by Technology Generation" (normalized area) and "Cost Per Transistor Scaling" (normalized cost/transistor), each plotted for the 45nm, 40nm, 32nm, 28nm, 20nm, and 20 FinFET nodes]
      Lithography challenges begin severely limiting area scaling at the 20nm node
        - Fewer 1X metals due to cost
        - Less aggressive feature scaling due to lithography challenges
      Compounded by rapidly increasing lithography costs
        - The 28nm → 20nm transition is the inflection point, with dual exposure
        - No cost-per-transistor crossover, for the first time, at the 28nm → 20nm transition
      2 | ISSCC Keynote | February 18th, 2013
  • 3. A PARADIGM SHIFT…
      [Diagram labels: Microprocessor Advancement (CPU): Single-Core Era, Multi-Core Era, Heterogeneous Systems Era; Homogeneous Computing, Heterogeneous Computing; GPU Advancement and Programmability: graphics driver-based programs, OpenCL/DX driver-based programs, high-level programmable; Throughput Performance; Accelerator]
  • 4. HETEROGENEOUS SYSTEMS ARCHITECTURE MEMORY MODEL
      [Diagram labels: Today; To 64 bit; Yesterday; From 32 bit]
  • 5. ARCHITECTURES: A HISTORICAL PERSPECTIVE
      [Timeline: 1981, 1990s, 2000s, 2010s. Labels: Legacy Processing Era; Surround Computing Era; Single Core CPUs; Traditionally Optimized Platforms; Multi-Core CPUs/GPUs; APUs and legacy SOC; Heterogeneous Architectures]
  • 6. CHANGING THE THINKING, CHANGING THE GAME
      HSA is designed to make the GPU hardware directly accessible to software, using the high-level languages that programmers already use on the CPU
        - C, C++, Java, Python…even JavaScript, HTML5
        - ISA agnostic, e.g., x86, 64-bit ARM, Radeon, Mali
      The GPU becomes a peer processor to the CPU in terms of system integration
        - Full programming language features
        - Shared virtual memory: a pointer is a pointer
        - Coherency
        - Context switching
      HSA Foundation: an industry-wide initiative
  • 7. BENEFITS OF HETEROGENEOUS SYSTEM ARCHITECTURE
  • 8. EFFECTIVE COMPUTE OFFLOAD
      [Diagram labels: APU (Accelerated Processing Unit); HSA Accelerated Software Applications; Serial and Task Parallel Workloads; Data Parallel Workloads]
      Made easy by HSA
      Unleash the best compute elements depending on the task
  • 9. BRINGING IT ALL TOGETHER
      [Charts: MotionDSP 720P, "Power" (0 W to 35 W, stacked by CPU Cores, NB+GPU, and DRAM) and "Performance" (0 fps to 25 fps), each comparing CPU against CPU+GPU]
      Synergistic use of GPU compute + shared memory = lower power and higher performance
      >4.0X better energy efficiency (1)
      (1) AMD internal testing: AMD E2-3200 APU (2 cores @ 2400 MHz, GPU: 2 CU @ 444 MHz), Windows 7 OS, MotionDSP vReveal application, 720P MP4 input (http://www.vreveal.com/stabilization)
  • 10. TODAY’S DISCUSSION: FROM SURROUND COMPUTING TO ENABLING THE HOLODECK
      1. A fully featured Holodeck is still many years away
      2. Today our discussion will:
        - Establish a Holodeck framework
        - Identify Holodeck enabling technologies
        - Discuss how Heterogeneous Systems Architecture (HSA) accelerates these technologies
        - Undertake an HSA deep dive on one of these enabling technologies
        - Look at how new dedicated processors will enable Holodeck functionality
  • 11. WHAT IS A HOLODECK?
  • 12. THE HOLODECK FRAMEWORK: AN EVOLUTION OF SURROUND COMPUTING
        - Natural User Interfaces
        - Context Computing
        - 360 Degree Virtual Environments
  • 13. HOLODECK ENABLING TECHNOLOGIES: PROFOUND IMPLICATIONS FOR COMPUTER ARCHITECTURE
      Computational Photography
        - Delivering seamless and immersive video environments
      Directional Audio
        - Using audio to enhance immersion and realism of our environments
      Natural User Interfaces
        - Enabling realistic, natural human communication
      Context Computing
        - Delivering an intuitive understanding of the user’s needs in real time
      Augmented Reality
        - Bringing it all together: combining the real and the virtual
  • 14. COMPUTATIONAL PHOTOGRAPHY: 360 DEGREE VISUAL ENVIRONMENTS, PHOTOSTITCHING, PERIPHERAL VISION AND HSA
        - Mapping real-life scenes through finite images
        - Photo stitching of tiled environments and perceptual correction
        - Detect interest points and match features
        - Projecting geometry with point features using algorithms like RANSAC
        - Image processing to account for curved screen surfaces
        - Modulate brightness to account for peripheral vision
      HSA presents a unified view of the system with shared memory, so CPU and GPU acceleration can be applied across the entire process
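As a rough illustration of the RANSAC step named on slide 14, the sketch below estimates a pure 2D translation between matched interest points while ignoring bad matches. The translation-only model, the name `ransac_translation`, and the tolerance, iteration, and seed values are illustrative assumptions and not taken from the talk; real photo stitching typically fits a richer model such as a homography.

```python
import random

def ransac_translation(matches, iterations=50, tol=1.0, seed=0):
    """Estimate a 2D translation from point matches with RANSAC.

    matches: list of ((x1, y1), (x2, y2)) pairs, some of which may be wrong.
    Returns ((dx, dy), inlier_count) for the best-supported hypothesis.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    best_model, best_inliers = None, -1
    for _ in range(iterations):
        # A translation is fully determined by one sampled match.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Count the matches that agree with this hypothesis within tol.
        inliers = sum(
            1
            for (ax, ay), (bx, by) in matches
            if abs((bx - ax) - dx) <= tol and abs((by - ay) - dy) <= tol
        )
        if inliers > best_inliers:
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers


# Eight correct matches shifted by (5, -3) plus two bad matches (made-up data).
good = [((x, y), (x + 5, y - 3)) for x, y in
        [(0, 0), (10, 4), (7, 9), (3, 12), (20, 1), (15, 15), (8, 2), (1, 6)]]
bad = [((2, 2), (40, 40)), ((9, 9), (-30, 12))]
model, support = ransac_translation(good + bad)
```

Each hypothesis is scored independently of the others, which is one reason this kind of step is a candidate for parallel execution.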
  • 15. DIRECTIONAL AUDIO
        - Couples computationally demanding 3D audio and spatialization effects with "always on" background processing like Voice Activity Detection (VAD)
        - Voice activity detection is best implemented with special audio processors and acceleration techniques
        - Spatialization effects such as “Convolution Reverb” are best done with GPU acceleration
      HSA enables seamless integration of CPU and GPU acceleration with other independent accelerators
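To make "Convolution Reverb" concrete, here is a minimal sketch of the underlying operation: convolving a dry signal with a room impulse response. The function name `convolve` and the sample values are made up for illustration, and the plain time-domain loop is only the simplest formulation; production implementations commonly use FFT-based methods.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) convolution of a signal with an impulse response."""
    n, m = len(signal), len(impulse_response)
    out = [0.0] * (n + m - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample excites a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


dry = [1.0, 0.0, 0.5]       # made-up dry samples
room = [1.0, 0.5, 0.25]     # made-up impulse response
wet = convolve(dry, room)
```

Every output sample is a sum of products that does not depend on any other output sample, so the outputs can be computed in parallel, which is consistent with the slide's point about GPU acceleration.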
  • 16. NATURAL USER INTERFACES
      Speech Recognition:
        - Background processing: echo cancellation and noise suppression
        - Audio feature extraction
        - Voice pattern recognition through a Markov model or similar algorithm
      Gesture Recognition:
        - Frame preprocessing and filtering
        - Optical flow or object tracking
        - Sophisticated computer vision algorithms to delineate the hand or body parts from the background
      NUI algorithms all benefit from CPU/GPU and audio processors to efficiently perform these functions at the lowest power
  • 17. CONTEXT COMPUTING: BIOMETRICS EXAMPLE
      Facial Recognition:
        - Face detection (is there a face?): GPU acceleration
        - Face identification (pattern matching through algorithms like Haar face detection): CPU and GPU acceleration
        - Validation through blink detection (make sure it is a real face): GPU acceleration
      HSA enables mix and match of the best acceleration for each phase of the process
  • 18. AUGMENTED REALITY
      Image Registration:
        - Relies on robust and fast feature detection; benefits from CPU/GPU acceleration
      Object Tracking:
        - Relies on an “optical flow” algorithm; benefits from CPU/GPU acceleration
      Image Composition:
        - Once information exists from the above, this becomes a classic graphics rendering use case
      The building blocks of HSA enable the augmented reality world.
  • 19. THE WAY FORWARD • Many technologies required to enable our vision – Heterogeneous engines that accelerate key client and server workloads – Datacenters optimized for latency, scalability, and efficiency – Processors optimized for new and emerging workloads – Active research into new algorithms
  • 20. ENABLING TECHNOLOGY DEEP DIVE: ACCELERATING NATURAL USER INTERFACES (HAAR FACE DETECTION) WITH HETEROGENEOUS SYSTEMS ARCHITECTURE
  • 21. LOOKING FOR FACES IN ALL THE RIGHT PLACES
  • 22. LOOKING FOR FACES IN ALL THE RIGHT PLACES. Quick HD Calculations: Search square = 21 x 21; Pixels = 1920 x 1080 = 2,073,600; Search squares = 1900 x 1060 = ~2 Million
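The arithmetic on this slide can be checked with a short sketch. It assumes the 21 x 21 search square is moved one pixel at a time, which is consistent with the 1900 x 1060 figure; the function name `count_search_squares` is illustrative and not from the talk.

```python
def count_search_squares(width, height, window=21):
    """Count window x window squares that fit in a frame at one-pixel steps.

    Assumption: a stride of one pixel in both directions.
    """
    if width < window or height < window:
        return 0
    return (width - window + 1) * (height - window + 1)


hd_pixels = 1920 * 1080
hd_squares = count_search_squares(1920, 1080)
print(f"pixels={hd_pixels:,} search squares={hd_squares:,}")
```

The count is (1920 - 21 + 1) x (1080 - 21 + 1), which is where the 1900 x 1060 on the slide comes from.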
  • 23. LOOKING FOR DIFFERENT SIZE FACES BY SCALING THE VIDEO FRAME
  • 24. LOOKING FOR DIFFERENT SIZE FACES BY SCALING THE VIDEO FRAME. More HD Calculations: 70% scaling in H and V; Total Pixels = 4.07 Million; Search squares = 3.8 Million
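A rough sketch of how totals of this size arise from repeatedly shrinking the frame to 70% in each dimension. The stopping rule (stop once a 21 x 21 square no longer fits) and the integer rounding are assumptions, since the slide states neither, so the output lands close to, but not exactly on, the slide's figures.

```python
def pyramid_totals(width=1920, height=1080, window=21, num=7, den=10):
    """Sum pixels and search squares over a pyramid of scaled frames.

    Assumptions (not stated on the slide): each level is num/den of the
    previous one in both dimensions, sizes are rounded down, and scaling
    stops when the window no longer fits.
    """
    total_pixels = 0
    total_squares = 0
    levels = 0
    while width >= window and height >= window:
        total_pixels += width * height
        total_squares += (width - window + 1) * (height - window + 1)
        levels += 1
        width = width * num // den
        height = height * num // den
    return total_pixels, total_squares, levels


pixels, squares, levels = pyramid_totals()
print(f"levels={levels} pixels={pixels:,} search squares={squares:,}")
```

Because each level has 0.7 x 0.7 = 0.49 of the pixels of the previous one, the pixel total approaches 1 / (1 - 0.49), or about 1.96 times the full-resolution frame.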
  • 25. HAAR CASCADE STAGES [Diagram: Stage N evaluates Feature k, Feature l and Feature m, then asks "Face still possible?" If yes, Stage N+1 evaluates Feature p, Feature q and Feature r; if no, REJECT FRAME]
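The stage-by-stage flow in the diagram can be sketched as a loop with an early exit. This is a toy illustration, not the detector used in the talk: the feature functions, thresholds and the `run_cascade` name are all hypothetical, and real Haar features are computed from rectangle sums rather than single pixels.

```python
def run_cascade(window, stages):
    """Evaluate stages in order; stop at the first stage that rejects.

    `stages` is a list of (features, threshold) pairs. Each feature is a
    callable taking the window and returning a number (hypothetical).
    Returns (face_still_possible, stages_evaluated).
    """
    for depth, (features, threshold) in enumerate(stages, start=1):
        score = sum(feature(window) for feature in features)
        if score < threshold:
            return False, depth  # early out: later stages are skipped
    return True, len(stages)


# Toy two-stage cascade over a four-value "window".
toy_stages = [
    ([lambda w: w[0] - w[1]], 0.0),
    ([lambda w: w[2] - w[3], lambda w: w[0] - w[3]], 1.0),
]

accepted = run_cascade([5, 1, 4, 2], toy_stages)
rejected = run_cascade([1, 5, 4, 2], toy_stages)
print(accepted, rejected)
```

The second value in each result is the depth reached, which is the quantity that varies from one search square to the next.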
  • 26. 22 CASCADE STAGES, EARLY OUT BETWEEN EACH [Diagram: STAGE 1 → STAGE 2 → … → STAGE 21 → STAGE 22 → FACE CONFIRMED; an early out after any stage → NO FACE]
      Final HD Calculations
        – Search squares = 3.8 million
        – Average features per square = 124
        – Calculations per feature = 100
        – Calculations per frame = 47 GCalcs
      Calculation Rate
        – 30 frames/sec = 1.4 TCalcs/second
        – 60 frames/sec = 2.8 TCalcs/second
      …and this only gets front-facing faces
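The per-frame and per-second figures follow directly from the three inputs on the slide. A minimal sketch of the multiplication, with variable names chosen here for illustration:

```python
# Inputs as given on the slide.
search_squares = 3_800_000
features_per_square = 124   # average
calcs_per_feature = 100

calcs_per_frame = search_squares * features_per_square * calcs_per_feature


def calcs_per_second(frames_per_second):
    """Scale the per-frame cost by the frame rate."""
    return calcs_per_frame * frames_per_second


print(f"per frame: {calcs_per_frame / 1e9:.1f} GCalcs")
for fps in (30, 60):
    print(f"{fps} frames/sec: {calcs_per_second(fps) / 1e12:.1f} TCalcs/second")
```

The product is about 47.1 billion calculations per frame, which the slide rounds to 47 GCalcs.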
  • 27. CASCADE DEPTH ANALYSIS [Chart: cascade depth, binned as 0-5, 5-10, 10-15, 15-20 and 20-25; depth axis from 0 to 25]
  • 28. UNBALANCING DUE TO EXITS IN EARLIER CASCADE STAGES [Figure legend: Live, Dead] • When running on the GPU, we run each search rectangle on a separate work item • Early out algorithms, like HAAR, exhibit divergence between work items – Some work items exit early – Their neighbors continue – SIMD packing suffers as a result
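The cost of this divergence can be estimated with a simple model. The sketch below assumes that work items are packed in groups of 64 lanes and that a group stays occupied until its deepest work item finishes; both the group width and the `simd_utilization` helper are assumptions for illustration, not measurements from the talk.

```python
def simd_utilization(exit_depths, simd_width=64):
    """Fraction of issued lane-stages that do useful work.

    exit_depths[i] is the number of cascade stages work item i evaluates.
    Assumption: each group of simd_width items runs for as many stages
    as its deepest member, with exited lanes sitting idle ("dead").
    """
    useful = 0
    issued = 0
    for start in range(0, len(exit_depths), simd_width):
        group = exit_depths[start:start + simd_width]
        useful += sum(group)
        issued += max(group) * simd_width
    return useful / issued if issued else 1.0


# 63 work items exit after stage 1; one neighbor runs all 22 stages.
divergent = simd_utilization([1] * 63 + [22])
uniform = simd_utilization([5] * 64)
print(f"divergent: {divergent:.1%}  uniform: {uniform:.1%}")
```

In the divergent case a single deep work item keeps all 64 lanes issued for 22 stages, so most of the lane time is idle.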
  • 29. PROCESSING TIME/STAGE, A10-4600M (6CU@497MHz, 4 cores@2700MHz) [Chart: GPU and CPU processing time in ms (axis 0 to 100) per cascade stage (1 to 8, then 9-22)] Test system: AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
  • 30. PERFORMANCE CPU-VS-GPU, AMD A10-4600M APU (6CU@497MHz, 4 cores@2700MHz) [Chart: images/sec (axis 0 to 12) for CPU, HSA and GPU versus number of cascade stages run on the GPU (0 to 8, and 22)] Test system: AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
  • 31. HAAR SOLUTION: RUN DIFFERENT CASCADES ON GPU AND CPU. By seamlessly sharing data between CPU and GPU, HSA allows the right processor to handle its appropriate workload. +2.5x INCREASED PERFORMANCE; -2.5x DECREASED ENERGY PER FRAME
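One way to picture the split is to run the first few stages on the GPU for every search square and hand only the survivors to the CPU. The sketch below counts stage evaluations on each side for a given split point; the `split_work` helper and the sample depths are hypothetical, and the model ignores the differing per-stage costs of the two processors.

```python
def split_work(exit_depths, gpu_stages):
    """Divide stage evaluations between GPU and CPU at a split point.

    exit_depths[i] is the number of stages search square i evaluates.
    The GPU runs up to gpu_stages stages for every square; squares that
    need more continue on the CPU (assumed split policy).
    Returns (gpu_stage_evals, cpu_stage_evals, squares_passed_to_cpu).
    """
    gpu_work = sum(min(depth, gpu_stages) for depth in exit_depths)
    survivors = [depth for depth in exit_depths if depth > gpu_stages]
    cpu_work = sum(depth - gpu_stages for depth in survivors)
    return gpu_work, cpu_work, len(survivors)


sample_depths = [1, 1, 2, 3, 22]
gpu_work, cpu_work, handed_over = split_work(sample_depths, gpu_stages=2)
print(gpu_work, cpu_work, handed_over)
```

With shared memory the survivors do not have to be copied between processors, which is the property of HSA that the slide credits for making this split practical.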
  • 32. APPLICATION ACCELERATION USING HSA. Acceleration vs. CPU (AMD estimates): Gesture recognition 12x; Photo indexing 10x; Voice recognition 10x; Visual search 9x; Audio search 5x; Stereo vision 4x; Video stabilization 4x; Face detect 2x. Source: AMD Whitepaper, Accelerating Consumer/Prosumer Multimedia with HSA, June 2012
  • 33. HSA EVOLUTION
      Physical Integration (Llano)
        – Integrate CPU & GPU in silicon
        – Unified Memory Controller
        – Common Manufacturing Technology
      Optimized Platforms (Trinity)
        – GPU Compute C++ support
        – User mode scheduling
        – Bi-Directional Power Mgmt between CPU and GPU
      Architectural Integration (Kaveri)
        – Unified Address Space for CPU and GPU
        – GPU uses pageable system memory via CPU pointers
        – Fully coherent memory between CPU & GPU
      System Integration (Next Gen)
        – GPU compute context switch
        – GPU graphics pre-emption
        – Quality of Service
  • 34. HSA PROGRAMMABILITY ADVANTAGE [Diagram: Unified Programming Models: C, C++, Java …; HSA Domain-Specific Ext / APIs; OpenCL, C++ AMP, Java8 …; DX11, OpenGL …. Foundation: HSA Intermediate Language (HSAIL), over Compute Acceleration and Graphics Acceleration] • Works with today’s programming models and languages • Architected to enable CPU like programmability • Promotes development and adoption of extended standards • Write Once Run Anywhere – with Performance
  • 35. CONCLUSION • The age of traditional computing is dead. • A paradigm shift in processing has brought about the Heterogeneous Systems Era. • HSA will enable us to dramatically scale processing power while increasing power efficiency. • The Holodeck is still years away, but HSA and dedicated hardware blocks will accelerate and enable technologies as they emerge.
  • 36. ACKNOWLEDGEMENTS • Bill Herz • Phil Rogers • Marty Johnson • Chris Hook • Sumant Subramanian
  • 38. DISCLAIMER The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names and logos are used for informational purposes only and may be trademarks of their respective owners.