SlideShare a Scribd company logo
1 of 41
HETEROGENEOUS SYSTEM
ARCHITECTURE OVERVIEW
HOT CHIPS TUTORIAL - AUGUST 2013
PHIL ROGERS
HSA FOUNDATION PRESIDENT
AMD CORPORATE FELLOW
HSA FOUNDATION
 Founded in June 2012
 Developing a new platform for
heterogeneous systems
 www.hsafoundation.com
 Specifications under development
in working groups
 Our first specification, HSA
Programmers Reference Manual
is already published and available
on our web site
 Additional specifications for
System Architecture, Runtime
Software and Tools are in process
© Copyright 2012 HSA Foundation. All Rights Reserved. 2
HSA FOUNDATION MEMBERSHIP —
AUGUST 2013
© Copyright 2012 HSA Foundation. All Rights Reserved. 3
Founders
Promoters
Supporters
Contributors
Academic
Associates
SOCS HAVE PROLIFERATED —
MAKE THEM BETTER
 SOCs have arrived and are a tremendous
advance over previous platforms
 SOCs combine CPU cores, GPU cores and
other accelerators, with high bandwidth access
to memory
 How do we make them even better?
 Easier to program
 Easier to optimize
 Higher performance
 Lower power
 HSA unites accelerators architecturally
 Early focus on the GPU compute accelerator,
but HSA goes well beyond the GPU
© Copyright 2012 HSA Foundation. All Rights Reserved. 4
INFLECTIONS IN PROCESSOR DESIGN
© Copyright 2012 HSA Foundation. All Rights Reserved. 5
?
Single-thread
Performance
Time
we are
here
Enabled by:
 Moore‘s
Law
 Voltage
Scaling
Constrained by:
Power
Complexity
Single-Core Era
ModernApplication
Performance
Time (Data-parallel exploitation)
we are
here
Heterogeneous
Systems Era
Enabled by:
 Abundant data
parallelism
 Power efficient
GPUs
Temporarily
Constrained by:
Programming
models
Comm.overhead
Throughput
Performance
Time (# of processors)
we are
here
Enabled by:
 Moore‘s Law
 SMP
architecture
Constrained by:
Power
Parallel SW
Scalability
Multi-Core Era
Assembly  C/C++  Java … pthreads  OpenMP / TBB …
Shader  CUDA OpenCL
 C++ and Java
HIGH LEVEL FEATURES OF HSA
 Features currently being defined in the HSA Working Groups**
 Unified addressing across all processors
 Operation into pageable system memory
 Full memory coherency
 User mode dispatch
 Architected queuing language
 High level language support for GPU compute processors
 Preemption and context switching
© Copyright 2012 HSA Foundation. All Rights Reserved. 6
** All features subject to change, pending completion and ratification of specifications in the HSA Working Groups
HSA — AN OPEN PLATFORM
 Open Architecture, membership open to all
 HSA Programmers Reference Manual
 HSA System Architecture
 HSA Runtime
 Delivered via royalty free standards
 Royalty Free IP, Specifications and APIs
 ISA agnostic for both CPU and GPU
 Membership from all areas of computing
 Hardware companies
 Operating Systems
 Tools and Middleware
© Copyright 2012 HSA Foundation. All Rights Reserved. 7
HSA INTERMEDIATE LAYER — HSAIL
 HSAIL is a virtual ISA for parallel programs
 Finalized to ISA by a JIT compiler or ―Finalizer‖
 ISA independent by design for CPU & GPU
 Explicitly parallel
 Designed for data parallel programming
 Support for exceptions, virtual functions,
and other high level language features
 Lower level than OpenCL SPIR
 Fits naturally in the OpenCL compilation stack
 Suitable to support additional high level languages and programming models:
 Java, C++, OpenMP, etc
© Copyright 2012 HSA Foundation. All Rights Reserved. 8
HSA MEMORY MODEL
 Defines visibility ordering between all threads
in the HSA System
 Designed to be compatible with
C++11, Java, OpenCL and .NET Memory
Models
 Relaxed consistency memory model for
parallel compute performance
 Visibility controlled by:
 Load.Acquire
 Store.Release
 Barriers
© Copyright 2012 HSA Foundation. All Rights Reserved. 9
HSA QUEUING MODEL
 User mode queuing for low latency dispatch
 Application dispatches directly
 No OS or driver in the dispatch path
 Architected Queuing Layer
 Single compute dispatch path for all hardware
 No driver translation, direct to hardware
 Allows for dispatch to queue from any agent
 CPU or GPU
 GPU self enqueue enables lots of solutions
 Recursion
 Tree traversal
 Wavefront reforming
© Copyright 2012 HSA Foundation. All Rights Reserved. 10
HSA SOFTWARE
Hardware - APUs, CPUs, GPUs
Driver Stack
Domain Libraries
OpenCL™, DX Runtimes,
User Mode Drivers
Graphics Kernel Mode Driver
Apps
Apps
Apps
Apps
Apps
Apps
HSA Software Stack
Task Queuing
Libraries
HSA Domain Libraries,
OpenCL ™ 2.x Runtime
HSA Kernel
Mode Driver
HSA Runtime
HSA JIT
Apps
Apps
Apps
Apps
Apps
Apps
User mode component Kernel mode component Components contributed by third parties
TITLE
© Copyright 2012 HSA Foundation. All Rights Reserved. 12
OPENCL™ AND HSA
 HSA is an optimized platform architecture
for OpenCL™
 Not an alternative to OpenCL™
 OpenCL™ on HSA will benefit from
 Avoidance of wasteful copies
 Low latency dispatch
 Improved memory model
 Pointers shared between CPU and GPU
 OpenCL™ 2.0 shows considerable alignment
with HSA
 Many HSA member companies are also active
with Khronos in the OpenCL™ working group
© Copyright 2012 HSA Foundation. All Rights Reserved. 13
BOLT — PARALLEL PRIMITIVES
LIBRARY FOR HSA
 Easily leverage the inherent power efficiency of GPU computing
 Common routines such as scan, sort, reduce, transform
 More advanced routines like heterogeneous pipelines
 Bolt library works with OpenCL and C++ AMP
 Enjoy the unique advantages of the HSA platform
 Move the computation not the data
 Finally a single source code base for the CPU and GPU!
 Developers can focus on core algorithms
 Bolt version 1.0 for OpenCL and C++ AMP is available now at
https://github.com/HSA-Libraries/Bolt
© Copyright 2012 HSA Foundation. All Rights Reserved. 14
HSA OPEN SOURCE SOFTWARE
 HSA will feature an open source linux execution and compilation stack
 Allows a single shared implementation for many components
 Enables university research and collaboration in all areas
 Because it‘s the right thing to do
© Copyright 2012 HSA Foundation. All Rights Reserved. 15
Component Name IHV or Common Rationale
HSA Bolt Library Common Enable understanding and debug
HSAIL Code Generator Common Enable research
LLVM Contributions Common Industry and academic collaboration
HSAIL Assembler Common Enable understanding and debug
HSA Runtime Common Standardize on a single runtime
HSA Finalizer IHV Enable research and debug
HSA Kernel Driver IHV For inclusion in linux distros
ACCELERATING JAVA
GOING BEYOND NATIVE LANGUAGES
JAVA ENABLEMENT BY APARAPI
© Copyright 2012 HSA Foundation. All Rights Reserved. 17
Developer creates
Java™ source
Source compiled to class files
(bytecode) using standard compiler
Aparapi = Runtime capable of converting Java™ bytecode to OpenCL™
For execution on any
OpenCL™ 1.1+ capable device
OR execute via a thread pool if
OpenCL™ is not available
JAVA HETEROGENEOUS
ENABLEMENT ROADMAP
CPU ISA GPU ISA
JVM
Application
APARAPI
GPUCPU
OpenCL™
© Copyright 2012 HSA Foundation. All Rights Reserved. 18
CPU ISA GPU ISA
JVM
Application
APARAPI
HSA CPUHSA CPU
HSA Finalizer
HSAIL
CPU ISA GPU ISA
JVM
Application
APARAPI
HSA CPUHSA CPU
HSA Finalizer
HSAIL
HSA Runtime
LLVM Optimizer
IR
CPU ISA GPU ISA
Sumatra Enabled JVM
Application
HSA CPUHSA CPU
HSA Finalizer
HSAIL
SUMATRA PROJECT OVERVIEW
 AMD/Oracle sponsored Open Source (OpenJDK) project
 Targeted at Java 9 (2015 release)
 Allows developers to efficiently represent data parallel
algorithms in Java
 Sumatra ‗repurposes‘ Java 8‘s multi-core Stream/Lambda
API‘s to enable both CPU or GPU computing
 At runtime, Sumatra enabled Java Virtual Machine (JVM)
will dispatch ‗selected‘ constructs to available HSA
enabled devices
 Developers of Java libraries are already refactoring their
library code to use these same constructs
 So developers using existing libraries should see GPU
acceleration without any code changes
 http://openjdk.java.net/projects/sumatra/
 https://wikis.oracle.com/display/HotSpotInternals/Sumatra
 http://mail.openjdk.java.net/pipermail/sumatra-dev/
© Copyright 2012 HSA Foundation. All Rights Reserved. 19
Application.java
Java Compiler
GPUCPU
Sumatra Enabled JVM
Application
GPU ISA
Lambda/Stream API
CPU ISA
Application.class
Development
Runtime
HSA Finalizer
EXAMPLE WORKLOADS
HAAR FACE DETECTION
CORNERSTONE TECHNOLOGY
FOR COMPUTERVISION
LOOKING FOR FACES IN ALL
THE RIGHT PLACES
Quick HD Calculations
Search square = 21 x 21
Pixels = 1920 x 1080 = 2,073,600
Search squares = 1900 x 1060 = ~2 Million
© Copyright 2012 HSA Foundation. All Rights Reserved. 22
LOOKING FOR DIFFERENT SIZE FACES
— BY SCALING THE VIDEO FRAME
© Copyright 2012 HSA Foundation. All Rights Reserved. 23
More HD Calculations
70% scaling in H and V
Total Pixels = 4.07 Million
Search squares = 3.8 Million
HAAR CASCADE STAGES
© Copyright 2012 HSA Foundation. All Rights Reserved. 24
Feature l
Feature m
Feature p
Feature r
Feature q
Feature k
Stage N
Stage N+1
Face still
possible?Yes
No
REJECT
FRAME
22 CASCADE STAGES, EARLY OUT
BETWEEN EACH
© Copyright 2012 HSA Foundation. All Rights Reserved. 25
STAGE 22STAGE 21STAGE 2STAGE 1
NO FACE
FACE
CONFIRMED
Final HD Calculations
Search squares = 3.8 million
Average features per square = 124
Calculations per feature = 100
Calculations per frame = 47 GCalcs
Calculation Rate
30 frames/sec = 1.4TCalcs/second
60 frames/sec = 2.8TCalcs/second
… and this only gets front-facing faces
UNBALANCING DUE TO EARLY EXITS
 When running on the GPU, we run each search rectangle on a separate
work item
 Early out algorithms, like HAAR, exhibit divergence between work items
 Some work items exit early
 Their neighbors continue
 SIMD packing suffers as a result
© Copyright 2012 HSA Foundation. All Rights Reserved. 26
Live
Dead
CASCADE DEPTH ANALYSIS
© Copyright 2012 HSA Foundation. All Rights Reserved. 27
0
5
10
15
20
25
Cascade Depth
20-25
15-20
10-15
5-10
0-5
PROCESSING TIME/STAGE
© Copyright 2012 HSA Foundation. All Rights Reserved. 28
AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
0
10
20
30
40
50
60
70
80
90
100
1 2 3 4 5 6 7 8 9-22
Time(ms)
“Trinity” A10-4600M (6CU@497Mhz, 4 cores@2700Mhz)
GPU
CPU
PERFORMANCE CPU-VS-GPU
0
2
4
6
8
10
12
0 1 2 3 4 5 6 7 8 22
Images/Sec
Number of Cascade Stages on GPU
“Trinity” A10-4600M (6CU@497Mhz, 4 cores@2700Mhz)
CPU
HSA
GPU
© Copyright 2012 HSA Foundation. All Rights Reserved. 29
AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,
6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
HAAR SOLUTION — RUN DIFFERENT
CASCADES ON GPU AND CPU
© Copyright 2012 HSA Foundation. All Rights Reserved. 30
+2.5x
-2.5x
INCREASED
PERFORMANCE
DECREASED ENERGY
PER FRAME
By seamlessly sharing data between CPU and GPU,
allows the right processor to handle its appropriate workload
ACCELERATING SUFFIX ARRAY
CONSTRUCTION
CLOUD SERVER WORKLOAD
SUFFIX ARRAYS
 Suffix Arrays are a fundamental data structure
 Designed for efficient searching of a large text
 Quickly locate every occurrence of a substring S in a text T
 Suffix Arrays are used to accelerate in-memory cloud workloads
 Full text index search
 Lossless data compression
 Bio-informatics
© Copyright 2012 HSA Foundation. All Rights Reserved. 32
ACCELERATED SUFFIX ARRAY
CONSTRUCTION ON HSA
© Copyright 2012 HSA Foundation. All Rights Reserved. 33
M. Deo, ―Parallel Suffix Array Construction and Least Common Prefix for the GPU‖, Submitted to ‖Principles and Practice of Parallel Programming, (PPoPP‘13)‖ February 2013.
AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM
By offloading data parallel computations to
GPU, HSA increases performance and
reduces energy for Suffix Array Construction
versus Single Threaded CPU.
By efficiently sharing data between CPU and
GPU, HSA lets us move compute to data
without penalty of intermediate copies.
+5.8x
-5x
INCREASED
PERFORMANCE
DECREASED
ENERGYMerge Sort::GPU
Radix Sort::GPU
Compute SA::CPU
Lexical Rank::CPU
Radix Sort::GPU
Skew Algorithm for Compute SA
GAMEPLAY RIGID BODY PHYSICS
RIGID BODY PHYSICS SIMULATION
 Rigid-Body Physics Simulation is:
 A way to animate and interact with objects, widely used in games and movie production
 Used to drive game play and for visual effects (eye candy)
 Physics Simulation is used in many of today‘s software:
 Middleware Physics engines such as Bullet, Havok, PhysX
 Games ranging from Angry Birds and Cut the Rope to Tomb Raider and Crysis 3
 3D authoring tools such as Autodesk Maya, Unity 3D, Houdini, Cinema 4D, Lightwave
 Industrial applications such as Siemens NX8 Mechatronics Concept Design
 Medical applications such as surgery trainers
 Robotics simulation
 But GPU-accelerated rigid-body physics is not used in game play —
only in effects
© Copyright 2012 HSA Foundation. All Rights Reserved. 35
RIGID BODY PHYSICS — ALGORITHM
 Find potential interacting object ―pairs‖ using bounding shape approximations.
 Perform full overlap testing between potentially interacting pairs
 Compute exact contact information for a various shape types
 Compute constraint forces for natural motion and stable stacking
© Copyright 2012 HSA Foundation. All Rights Reserved.
Broad-Phase
Collision
Detection
Setup
constraints
Solve
constraints
Compute
contact
points
A B0 B1 C0 C1 D1 D1 A
1 1 2 2 3 3 4 4
B D
A
1
2 3
4
Mid-Phase
Collision
Detection
Narrow-Phase
Collision
Detection
36
RIGID BODY PHYSICS —
CHALLENGES & SOLUTIONS
Implementation Challenges
 Game engine and Physics engine
need to interact synchronously
during simulation
 The set of pairs can be huge and
changes from frame to frame
 Thousands to Millions for any
given frame
 Narrow-phase algorithms cause
thread divergence
Benefits of HSA
 Fast CPU round-trips
 User mode dispatch
 Unified Addressing,
 Pageable memory,
 Coherency
 Supports as large a pair list as CPU
 Entire memory space
 Dynamic memory allocation
 Improved handling of divergence
 GPU enqueue
© Copyright 2012 HSA Foundation. All Rights Reserved. 37
EASE OF PROGRAMMING
CODE COMPLEXITY VS. PERFORMANCE
LINES-OF-CODE AND PERFORMANCE FOR
DIFFERENT PROGRAMMING MODELS
AMD A10-5800K APU with Radeon™ HD Graphics – CPU: 4 cores, 3800MHz (4200MHz Turbo); GPU: AMD Radeon HD 7660D, 6 compute units, 800MHz; 4GB RAM.
Software – Windows 7 Professional SP1 (64-bit OS); AMD OpenCL™ 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta
0
50
100
150
200
250
300
350
LOC
Copy-back Algorithm Launch Copy Compile Init Performance
Serial CPU TBB Intrinsics+TBB OpenCL™-C OpenCL™ -C++ C++ AMP HSA Bolt
Performance
35.00
30.00
25.00
20.00
15.00
10.00
5.00
0Copy-back
Algorithm
Launch
Copy
Compile
Init.
Copy-back
Algorithm
Launch
Copy
Compile
Copy-back
Algorithm
Launch
Algorithm
Launch
Algorithm
Launch
Algorithm
Launch
Algorithm
Launch
(Exemplary ISV ―Hessian‖ Kernel)
© Copyright 2012 HSA Foundation. All Rights Reserved. 39
© Copyright 2012 HSA Foundation. All Rights Reserved. 40
THE HSA FUTURE
 Architected heterogeneous processing on the SOC
 Programming of accelerators becomes much easier
 Accelerated software that runs across multiple hardware vendors
 Scalability from smart phones to super computers on a common architecture
 GPU acceleration of parallel processing is the initial target, with DSPs
and other accelerators coming to the HSA system architecture model
 Heterogeneous software ecosystem evolves at a much faster pace
 Lower power, more capable devices in your hand, on the wall, in the cloud
THANK YOU

More Related Content

What's hot

KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUHSA Foundation
 
ISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsHSA Foundation
 
ISCA final presentation - Memory Model
ISCA final presentation - Memory ModelISCA final presentation - Memory Model
ISCA final presentation - Memory ModelHSA Foundation
 
ISCA Final Presentation - Intro
ISCA Final Presentation - IntroISCA Final Presentation - Intro
ISCA Final Presentation - IntroHSA Foundation
 
ISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILHSA Foundation
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime HSA Foundation
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Foundation
 
ISCA final presentation - Runtime
ISCA final presentation - RuntimeISCA final presentation - Runtime
ISCA final presentation - RuntimeHSA Foundation
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013 HSA Foundation
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelHSA Foundation
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...HSA Foundation
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...AMD Developer Central
 
AMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory AccessAMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory AccessAMD
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overviewinside-BigData.com
 
Apu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareApu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareHSA Foundation
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerAMD Developer Central
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)dibyendu.das
 
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul BlinzerHC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul BlinzerAMD Developer Central
 

What's hot (20)

KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
 
HSA Introduction
HSA IntroductionHSA Introduction
HSA Introduction
 
ISCA Final Presentation - Applications
ISCA Final Presentation - ApplicationsISCA Final Presentation - Applications
ISCA Final Presentation - Applications
 
ISCA final presentation - Memory Model
ISCA final presentation - Memory ModelISCA final presentation - Memory Model
ISCA final presentation - Memory Model
 
ISCA Final Presentation - Intro
ISCA Final Presentation - IntroISCA Final Presentation - Intro
ISCA Final Presentation - Intro
 
ISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAILISCA Final Presentation - HSAIL
ISCA Final Presentation - HSAIL
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013
 
ISCA final presentation - Runtime
ISCA final presentation - RuntimeISCA final presentation - Runtime
ISCA final presentation - Runtime
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013
 
ISCA final presentation - Queuing Model
ISCA final presentation - Queuing ModelISCA final presentation - Queuing Model
ISCA final presentation - Queuing Model
 
HSA Features
HSA FeaturesHSA Features
HSA Features
 
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorit...
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
 
AMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory AccessAMD Heterogeneous Uniform Memory Access
AMD Heterogeneous Uniform Memory Access
 
Heterogeneous System Architecture Overview
Heterogeneous System Architecture OverviewHeterogeneous System Architecture Overview
Heterogeneous System Architecture Overview
 
Apu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareApu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshare
 
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor MillerPL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
PL-4043, Accelerating OpenVL for Heterogeneous Platforms, by Gregor Miller
 
Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)Guide to heterogeneous system architecture (hsa)
Guide to heterogeneous system architecture (hsa)
 
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul BlinzerHC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer
 

Similar to HSA Introduction Hot Chips 2013

"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...Edge AI and Vision Alliance
 
Cloudera - Amr Awadallah - Hadoop World 2010
Cloudera - Amr Awadallah - Hadoop World 2010Cloudera - Amr Awadallah - Hadoop World 2010
Cloudera - Amr Awadallah - Hadoop World 2010Cloudera, Inc.
 
HSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattHSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattAMD Developer Central
 
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated SystemsPetapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systemsdairsie
 
SAP HANA Native Application Development
SAP HANA Native Application DevelopmentSAP HANA Native Application Development
SAP HANA Native Application DevelopmentSAP Technology
 
PaaS on Openstack
PaaS on OpenstackPaaS on Openstack
PaaS on OpenstackOpen Stack
 
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data WarehousingApache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousingc-bslim
 
getting started with e2studio
getting started with e2studiogetting started with e2studio
getting started with e2studioZiyuan Chen
 
H2O platform workshop
H2O platform workshopH2O platform workshop
H2O platform workshopShareThis
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)DataWorks Summit
 
Ceph Performance on OpenStack - Barcelona Summit
Ceph Performance on OpenStack - Barcelona SummitCeph Performance on OpenStack - Barcelona Summit
Ceph Performance on OpenStack - Barcelona SummitTakehiro Kudou
 
HANA SPS07 Extended Application Service
HANA SPS07 Extended Application ServiceHANA SPS07 Extended Application Service
HANA SPS07 Extended Application ServiceSAP Technology
 
Solution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataSolution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataInfiniteGraph
 
GlassFish in Production Environments
GlassFish in Production EnvironmentsGlassFish in Production Environments
GlassFish in Production EnvironmentsBruno Borges
 
Oracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewOracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewPaulo Fagundes
 
Rational Team Concertfor Power Customer Presentation02 09 10
Rational Team Concertfor Power Customer Presentation02 09 10Rational Team Concertfor Power Customer Presentation02 09 10
Rational Team Concertfor Power Customer Presentation02 09 10Strongback Consulting
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasAMD Developer Central
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 

Similar to HSA Introduction Hot Chips 2013 (20)

"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
 
Cloudera - Amr Awadallah - Hadoop World 2010
Cloudera - Amr Awadallah - Hadoop World 2010Cloudera - Amr Awadallah - Hadoop World 2010
Cloudera - Amr Awadallah - Hadoop World 2010
 
HSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattHSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian Bratt
 
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated SystemsPetapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
 
SAP HANA Native Application Development
SAP HANA Native Application DevelopmentSAP HANA Native Application Development
SAP HANA Native Application Development
 
PaaS on Openstack
PaaS on OpenstackPaaS on Openstack
PaaS on Openstack
 
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data WarehousingApache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
 
getting started with e2studio
getting started with e2studiogetting started with e2studio
getting started with e2studio
 
H2O platform workshop
H2O platform workshopH2O platform workshop
H2O platform workshop
 
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)SAS on Your (Apache) Cluster, Serving your Data (Analysts)
SAS on Your (Apache) Cluster, Serving your Data (Analysts)
 
Ceph Performance on OpenStack - Barcelona Summit
Ceph Performance on OpenStack - Barcelona SummitCeph Performance on OpenStack - Barcelona Summit
Ceph Performance on OpenStack - Barcelona Summit
 
HANA SPS07 Extended Application Service
HANA SPS07 Extended Application ServiceHANA SPS07 Extended Application Service
HANA SPS07 Extended Application Service
 
Solution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataSolution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big Data
 
Openshift Enterprise
Openshift EnterpriseOpenshift Enterprise
Openshift Enterprise
 
GlassFish in Production Environments
GlassFish in Production EnvironmentsGlassFish in Production Environments
GlassFish in Production Environments
 
Oracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewOracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overview
 
SAP HANA Cloud – Virtual Bootcamp: How to use the HANA Persistence Se…
SAP HANA Cloud – Virtual Bootcamp: How to use the HANA Persistence Se…SAP HANA Cloud – Virtual Bootcamp: How to use the HANA Persistence Se…
SAP HANA Cloud – Virtual Bootcamp: How to use the HANA Persistence Se…
 
Rational Team Concertfor Power Customer Presentation02 09 10
Rational Team Concertfor Power Customer Presentation02 09 10Rational Team Concertfor Power Customer Presentation02 09 10
Rational Team Concertfor Power Customer Presentation02 09 10
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu Das
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 

More from HSA Foundation

Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 ProvisionalHSA Foundation
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)HSA Foundation
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - CompilationsHSA Foundation
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed HSA Foundation
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Foundation
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...HSA Foundation
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012HSA Foundation
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMDHSA Foundation
 
Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.HSA Foundation
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAHSA Foundation
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is InvaluableHSA Foundation
 

More from HSA Foundation (12)

Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 Provisional
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - Compilations
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
 
Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSA
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is Invaluable
 

Recently uploaded

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 

Recently uploaded (20)

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 

HSA Introduction Hot Chips 2013

  • 1. HETEROGENEOUS SYSTEM ARCHITECTURE OVERVIEW HOT CHIPS TUTORIAL - AUGUST 2013 PHIL ROGERS HSA FOUNDATION PRESIDENT AMD CORPORATE FELLOW
  • 2. HSA FOUNDATION  Founded in June 2012  Developing a new platform for heterogeneous systems  www.hsafoundation.com  Specifications under development in working groups  Our first specification, HSA Programmers Reference Manual is already published and available on our web site  Additional specifications for System Architecture, Runtime Software and Tools are in process © Copyright 2012 HSA Foundation. All Rights Reserved. 2
  • 3. HSA FOUNDATION MEMBERSHIP — AUGUST 2013 © Copyright 2012 HSA Foundation. All Rights Reserved. 3 Founders Promoters Supporters Contributors Academic Associates
  • 4. SOCS HAVE PROLIFERATED — MAKE THEM BETTER  SOCs have arrived and are a tremendous advance over previous platforms  SOCs combine CPU cores, GPU cores and other accelerators, with high bandwidth access to memory  How do we make them even better?  Easier to program  Easier to optimize  Higher performance  Lower power  HSA unites accelerators architecturally  Early focus on the GPU compute accelerator, but HSA goes well beyond the GPU © Copyright 2012 HSA Foundation. All Rights Reserved. 4
  • 5. INFLECTIONS IN PROCESSOR DESIGN © Copyright 2012 HSA Foundation. All Rights Reserved. 5 ? Single-thread Performance Time we are here Enabled by:  Moore‘s Law  Voltage Scaling Constrained by: Power Complexity Single-Core Era ModernApplication Performance Time (Data-parallel exploitation) we are here Heterogeneous Systems Era Enabled by:  Abundant data parallelism  Power efficient GPUs Temporarily Constrained by: Programming models Comm.overhead Throughput Performance Time (# of processors) we are here Enabled by:  Moore‘s Law  SMP architecture Constrained by: Power Parallel SW Scalability Multi-Core Era Assembly  C/C++  Java … pthreads  OpenMP / TBB … Shader  CUDA OpenCL  C++ and Java
  • 6. HIGH LEVEL FEATURES OF HSA  Features currently being defined in the HSA Working Groups**  Unified addressing across all processors  Operation into pageable system memory  Full memory coherency  User mode dispatch  Architected queuing language  High level language support for GPU compute processors  Preemption and context switching © Copyright 2012 HSA Foundation. All Rights Reserved. 6 ** All features subject to change, pending completion and ratification of specifications in the HSA Working Groups
  • 7. HSA — AN OPEN PLATFORM  Open Architecture, membership open to all  HSA Programmers Reference Manual  HSA System Architecture  HSA Runtime  Delivered via royalty free standards  Royalty Free IP, Specifications and APIs  ISA agnostic for both CPU and GPU  Membership from all areas of computing  Hardware companies  Operating Systems  Tools and Middleware © Copyright 2012 HSA Foundation. All Rights Reserved. 7
  • 8. HSA INTERMEDIATE LAYER — HSAIL  HSAIL is a virtual ISA for parallel programs  Finalized to ISA by a JIT compiler or ―Finalizer‖  ISA independent by design for CPU & GPU  Explicitly parallel  Designed for data parallel programming  Support for exceptions, virtual functions, and other high level language features  Lower level than OpenCL SPIR  Fits naturally in the OpenCL compilation stack  Suitable to support additional high level languages and programming models:  Java, C++, OpenMP, etc © Copyright 2012 HSA Foundation. All Rights Reserved. 8
  • 9. HSA MEMORY MODEL  Defines visibility ordering between all threads in the HSA System  Designed to be compatible with C++11, Java, OpenCL and .NET Memory Models  Relaxed consistency memory model for parallel compute performance  Visibility controlled by:  Load.Acquire  Store.Release  Barriers © Copyright 2012 HSA Foundation. All Rights Reserved. 9
  • 10. HSA QUEUING MODEL  User mode queuing for low latency dispatch  Application dispatches directly  No OS or driver in the dispatch path  Architected Queuing Layer  Single compute dispatch path for all hardware  No driver translation, direct to hardware  Allows for dispatch to queue from any agent  CPU or GPU  GPU self enqueue enables lots of solutions  Recursion  Tree traversal  Wavefront reforming © Copyright 2012 HSA Foundation. All Rights Reserved. 10
  • 12. Hardware - APUs, CPUs, GPUs Driver Stack Domain Libraries OpenCL™, DX Runtimes, User Mode Drivers Graphics Kernel Mode Driver Apps Apps Apps Apps Apps Apps HSA Software Stack Task Queuing Libraries HSA Domain Libraries, OpenCL ™ 2.x Runtime HSA Kernel Mode Driver HSA Runtime HSA JIT Apps Apps Apps Apps Apps Apps User mode component Kernel mode component Components contributed by third parties TITLE © Copyright 2012 HSA Foundation. All Rights Reserved. 12
  • 13. OPENCL™ AND HSA  HSA is an optimized platform architecture for OpenCL™  Not an alternative to OpenCL™  OpenCL™ on HSA will benefit from  Avoidance of wasteful copies  Low latency dispatch  Improved memory model  Pointers shared between CPU and GPU  OpenCL™ 2.0 shows considerable alignment with HSA  Many HSA member companies are also active with Khronos in the OpenCL™ working group © Copyright 2012 HSA Foundation. All Rights Reserved. 13
  • 14. BOLT — PARALLEL PRIMITIVES LIBRARY FOR HSA  Easily leverage the inherent power efficiency of GPU computing  Common routines such as scan, sort, reduce, transform  More advanced routines like heterogeneous pipelines  Bolt library works with OpenCL and C++ AMP  Enjoy the unique advantages of the HSA platform  Move the computation not the data  Finally a single source code base for the CPU and GPU!  Developers can focus on core algorithms  Bolt version 1.0 for OpenCL and C++ AMP is available now at https://github.com/HSA-Libraries/Bolt © Copyright 2012 HSA Foundation. All Rights Reserved. 14
  • 15. HSA OPEN SOURCE SOFTWARE  HSA will feature an open source linux execution and compilation stack  Allows a single shared implementation for many components  Enables university research and collaboration in all areas  Because it‘s the right thing to do © Copyright 2012 HSA Foundation. All Rights Reserved. 15 Component Name IHV or Common Rationale HSA Bolt Library Common Enable understanding and debug HSAIL Code Generator Common Enable research LLVM Contributions Common Industry and academic collaboration HSAIL Assembler Common Enable understanding and debug HSA Runtime Common Standardize on a single runtime HSA Finalizer IHV Enable research and debug HSA Kernel Driver IHV For inclusion in linux distros
  • 16. ACCELERATING JAVA GOING BEYOND NATIVE LANGUAGES
  • 17. JAVA ENABLEMENT BY APARAPI © Copyright 2012 HSA Foundation. All Rights Reserved. 17 Developer creates Java™ source Source compiled to class files (bytecode) using standard compiler Aparapi = Runtime capable of converting Java™ bytecode to OpenCL™ For execution on any OpenCL™ 1.1+ capable device OR execute via a thread pool if OpenCL™ is not available
  • 18. JAVA HETEROGENEOUS ENABLEMENT ROADMAP CPU ISA GPU ISA JVM Application APARAPI GPUCPU OpenCL™ © Copyright 2012 HSA Foundation. All Rights Reserved. 18 CPU ISA GPU ISA JVM Application APARAPI HSA CPUHSA CPU HSA Finalizer HSAIL CPU ISA GPU ISA JVM Application APARAPI HSA CPUHSA CPU HSA Finalizer HSAIL HSA Runtime LLVM Optimizer IR CPU ISA GPU ISA Sumatra Enabled JVM Application HSA CPUHSA CPU HSA Finalizer HSAIL
  • 19. SUMATRA PROJECT OVERVIEW  AMD/Oracle sponsored Open Source (OpenJDK) project  Targeted at Java 9 (2015 release)  Allows developers to efficiently represent data parallel algorithms in Java  Sumatra ‗repurposes‘ Java 8‘s multi-core Stream/Lambda API‘s to enable both CPU or GPU computing  At runtime, Sumatra enabled Java Virtual Machine (JVM) will dispatch ‗selected‘ constructs to available HSA enabled devices  Developers of Java libraries are already refactoring their library code to use these same constructs  So developers using existing libraries should see GPU acceleration without any code changes  http://openjdk.java.net/projects/sumatra/  https://wikis.oracle.com/display/HotSpotInternals/Sumatra  http://mail.openjdk.java.net/pipermail/sumatra-dev/ © Copyright 2012 HSA Foundation. All Rights Reserved. 19 Application.java Java Compiler GPUCPU Sumatra Enabled JVM Application GPU ISA Lambda/Stream API CPU ISA Application.class Development Runtime HSA Finalizer
  • 21. HAAR FACE DETECTION CORNERSTONE TECHNOLOGY FOR COMPUTERVISION
  • 22. LOOKING FOR FACES IN ALL THE RIGHT PLACES Quick HD Calculations Search square = 21 x 21 Pixels = 1920 x 1080 = 2,073,600 Search squares = 1900 x 1060 = ~2 Million © Copyright 2012 HSA Foundation. All Rights Reserved. 22
  • 23. LOOKING FOR DIFFERENT SIZE FACES — BY SCALING THE VIDEO FRAME © Copyright 2012 HSA Foundation. All Rights Reserved. 23 More HD Calculations 70% scaling in H and V Total Pixels = 4.07 Million Search squares = 3.8 Million
  • 24. HAAR CASCADE STAGES © Copyright 2012 HSA Foundation. All Rights Reserved. 24 Feature l Feature m Feature p Feature r Feature q Feature k Stage N Stage N+1 Face still possible?Yes No REJECT FRAME
  • 25. 22 CASCADE STAGES, EARLY OUT BETWEEN EACH © Copyright 2012 HSA Foundation. All Rights Reserved. 25 STAGE 22STAGE 21STAGE 2STAGE 1 NO FACE FACE CONFIRMED Final HD Calculations Search squares = 3.8 million Average features per square = 124 Calculations per feature = 100 Calculations per frame = 47 GCalcs Calculation Rate 30 frames/sec = 1.4TCalcs/second 60 frames/sec = 2.8TCalcs/second … and this only gets front-facing faces
  • 26. UNBALANCING DUE TO EARLY EXITS  When running on the GPU, we run each search rectangle on a separate work item  Early out algorithms, like HAAR, exhibit divergence between work items  Some work items exit early  Their neighbors continue  SIMD packing suffers as a result © Copyright 2012 HSA Foundation. All Rights Reserved. 26 Live Dead
  • 27. CASCADE DEPTH ANALYSIS © Copyright 2012 HSA Foundation. All Rights Reserved. 27 0 5 10 15 20 25 Cascade Depth 20-25 15-20 10-15 5-10 0-5
  • 28. PROCESSING TIME/STAGE © Copyright 2012 HSA Foundation. All Rights Reserved. 28 AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1) 0 10 20 30 40 50 60 70 80 90 100 1 2 3 4 5 6 7 8 9-22 Time(ms) “Trinity” A10-4600M (6CU@497Mhz, 4 cores@2700Mhz) GPU CPU
  • 29. PERFORMANCE CPU-VS-GPU 0 2 4 6 8 10 12 0 1 2 3 4 5 6 7 8 22 Images/Sec Number of Cascade Stages on GPU “Trinity” A10-4600M (6CU@497Mhz, 4 cores@2700Mhz) CPU HSA GPU © Copyright 2012 HSA Foundation. All Rights Reserved. 29 AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
  • 30. HAAR SOLUTION — RUN DIFFERENT CASCADES ON GPU AND CPU © Copyright 2012 HSA Foundation. All Rights Reserved. 30 +2.5x -2.5x INCREASED PERFORMANCE DECREASED ENERGY PER FRAME By seamlessly sharing data between CPU and GPU, allows the right processor to handle its appropriate workload
  • 32. SUFFIX ARRAYS  Suffix Arrays are a fundamental data structure  Designed for efficient searching of a large text  Quickly locate every occurrence of a substring S in a text T  Suffix Arrays are used to accelerate in-memory cloud workloads  Full text index search  Lossless data compression  Bio-informatics © Copyright 2012 HSA Foundation. All Rights Reserved. 32
  • 33. ACCELERATED SUFFIX ARRAY CONSTRUCTION ON HSA © Copyright 2012 HSA Foundation. All Rights Reserved. 33 M. Deo, ―Parallel Suffix Array Construction and Least Common Prefix for the GPU‖, Submitted to ‖Principles and Practice of Parallel Programming, (PPoPP‘13)‖ February 2013. AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM By offloading data parallel computations to GPU, HSA increases performance and reduces energy for Suffix Array Construction versus Single Threaded CPU. By efficiently sharing data between CPU and GPU, HSA lets us move compute to data without penalty of intermediate copies. +5.8x -5x INCREASED PERFORMANCE DECREASED ENERGYMerge Sort::GPU Radix Sort::GPU Compute SA::CPU Lexical Rank::CPU Radix Sort::GPU Skew Algorithm for Compute SA
  • 35. RIGID BODY PHYSICS SIMULATION  Rigid-Body Physics Simulation is:  A way to animate and interact with objects, widely used in games and movie production  Used to drive game play and for visual effects (eye candy)  Physics Simulation is used in many of today‘s software:  Middleware Physics engines such as Bullet, Havok, PhysX  Games ranging from Angry Birds and Cut the Rope to Tomb Raider and Crysis 3  3D authoring tools such as Autodesk Maya, Unity 3D, Houdini, Cinema 4D, Lightwave  Industrial applications such as Siemens NX8 Mechatronics Concept Design  Medical applications such as surgery trainers  Robotics simulation  But GPU-accelerated rigid-body physics is not used in game play — only in effects © Copyright 2012 HSA Foundation. All Rights Reserved. 35
  • 36. RIGID BODY PHYSICS — ALGORITHM  Find potential interacting object ―pairs‖ using bounding shape approximations.  Perform full overlap testing between potentially interacting pairs  Compute exact contact information for a various shape types  Compute constraint forces for natural motion and stable stacking © Copyright 2012 HSA Foundation. All Rights Reserved. Broad-Phase Collision Detection Setup constraints Solve constraints Compute contact points A B0 B1 C0 C1 D1 D1 A 1 1 2 2 3 3 4 4 B D A 1 2 3 4 Mid-Phase Collision Detection Narrow-Phase Collision Detection 36
  • 37. RIGID BODY PHYSICS — CHALLENGES & SOLUTIONS Implementation Challenges  Game engine and Physics engine need to interact synchronously during simulation  The set of pairs can be huge and changes from frame to frame  Thousands to Millions for any given frame  Narrow-phase algorithms cause thread divergence Benefits of HSA  Fast CPU round-trips  User mode dispatch  Unified Addressing,  Pageable memory,  Coherency  Supports as large a pair list as CPU  Entire memory space  Dynamic memory allocation  Improved handling of divergence  GPU enqueue © Copyright 2012 HSA Foundation. All Rights Reserved. 37
  • 38. EASE OF PROGRAMMING CODE COMPLEXITY VS. PERFORMANCE
  • 39. LINES-OF-CODE AND PERFORMANCE FOR DIFFERENT PROGRAMMING MODELS AMD A10-5800K APU with Radeon™ HD Graphics – CPU: 4 cores, 3800MHz (4200MHz Turbo); GPU: AMD Radeon HD 7660D, 6 compute units, 800MHz; 4GB RAM. Software – Windows 7 Professional SP1 (64-bit OS); AMD OpenCL™ 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta 0 50 100 150 200 250 300 350 LOC Copy-back Algorithm Launch Copy Compile Init Performance Serial CPU TBB Intrinsics+TBB OpenCL™-C OpenCL™ -C++ C++ AMP HSA Bolt Performance 35.00 30.00 25.00 20.00 15.00 10.00 5.00 0Copy-back Algorithm Launch Copy Compile Init. Copy-back Algorithm Launch Copy Compile Copy-back Algorithm Launch Algorithm Launch Algorithm Launch Algorithm Launch Algorithm Launch (Exemplary ISV ―Hessian‖ Kernel) © Copyright 2012 HSA Foundation. All Rights Reserved. 39
  • 40. © Copyright 2012 HSA Foundation. All Rights Reserved. 40 THE HSA FUTURE  Architected heterogeneous processing on the SOC  Programming of accelerators becomes much easier  Accelerated software that runs across multiple hardware vendors  Scalability from smart phones to super computers on a common architecture  GPU acceleration of parallel processing is the initial target, with DSPs and other accelerators coming to the HSA system architecture model  Heterogeneous software ecosystem evolves at a much faster pace  Lower power, more capable devices in your hand, on the wall, in the cloud

Editor's Notes

  1. We will be open on this.We will reach out to partners and collaborate to bring this to market in the right form
  2. Lets take a deeper dive here into the details of the architecture …
  3. The memory model for a new architecture is key
  4. The memory model for a new architecture is key
  5. Easily leverage the inherent power efficiency of GPU computingCommon routines such as scan, sort, reduce, transformMore advanced routines like heterogeneous pipelinesBolt library works with OpenCL or C++ AMPEnjoy the unique advantages of the HSA platformHigh-performance shared virtual memoryTightly integrated CPU and GPUMove the computation not the dataFinally a single source code base for the CPU and GPU!Developers can focus on core algorithmsLibrary handles the details for different processorsHeterogeneous pipeline allows different stages targeted to CPU or GPU, or tasks that the runtime can put on eitherParallel_filter designed to solve problems like Haar face-detection — early stages on GPU, later stages on CPU.Flexible customization provided by C++ templatesEnjoy the unique advantages of the HSA platformHigh-performance shared virtual memoryDirectly interface with host data structures (e.g., std::vector) Tightly integrated CPU and GPUMove computation not the dataUse appropriate computation resource (CPU or GPU)Ben Sander Bolt talk at 5:15PM–6:00PM, Wednesday, June 13. Hyatt:Evergreen H.
  6. We have already released APARAPI, an acceleration library for Java, layered on OpenCL.The idea behind APARAPI is to allow a write once, run anywhere, solution for Java that does not require the programmer to learn a new language.We supply a library with a kernel class, that the Java programmer can override with their own Java code.Then at runtime, the APARAPI library converts that bytecode to OpenCL, or to a thread pool if an OpenCL device is not available.This has proven popular with the Java community.
  7. Now lets analyze how it works, in the HAAR algorithm.
  8. Now, just what do we do in each of those search boxes?
  9. Each successive cascade stage searches for more features.- Initial cascade stage is cheap, each successive cascade stage is more expensiveExit from a cascade stage means a face is not present in the rectangleSurviving all the cascades confirms a face is presentThis is a nested data parallel opportunity:Significant chunks of parallel code in control flow
  10. The GPU runs at its peak efficiency when its SIMD engines are fully loaded with live work items.Lets see how this looks relative to the CPU at each stage.
  11. Here is a 3D graph of the cascade depth used when searching for the face in the Mona LisaNote that most of the searches exit in 5 cascade stages or fewerA considerable number run between 5–10.And we see the broad pillar that represents the face, where all 22 stages returned a positive results.This is not a classically data parallel workload. Far from it.
  12. Here we see what happens as more and more work items exit on each cascade stage.In the first stage, the GPU is fully loaded and more than 5 times as fast as the CPU.For the next few stages, it is close, and the CPU is clearly better at the latter stages.How best to run this workload?Run early stages on the GPU, and later stages on the CPURun each workload where it is most power efficient!HSA allows this since we are in shared coherent memory
  13. In this graph we show the range of performance and power savings possible by running the first N stages on the GPU, followed by the rest on the CPU.Note that running all on the CPU or all on the GPU, gets a similar bad result.If we run the first 3 stages on the GPU, then switch to the CPU, we get the best result.Note even though this is running fewer stages on the GPU, most of the search squares have exited by the time we switch to the CPU.
  14. And here is the result.Performance more than doubles, and we process each frame with 40% of the power on average.HSA allows the right processor …
  15. Broad Phase: Finding out which objects to collide against other objects. The simplest method is to check collisions with every object against every other object. This is very fast when you have small groups of objects that you want to collide against each other (< 10 we'll say), but extremely slow when you have a lot of objects (< 1000). The reason is that for 10 objects you would need 45 mid-phase checks and for 1000 you would need 499500. It doesn't scale to large numbers very well, and when you have a lot of objects you'll spend more time doing mid-phase checks than anything else. To speed up the broad phase, you'll need to use a smarter algorithm such as sort and sweep, spatial hashing, quadtrees, or some other spatial index that only tries combinations of objects that are likely to be colliding.Mid Phase: This is where you normally do a simple collision check such as a bounding box or bounding sphere check because they would be faster than your narrow phase. Most spatial index structures are going to return some false positives, and it's best to catch them early instead of calling the more expensive narrow phase collision routines.Narrow Phase: The narrow phase is the most specific and most expensive collision detection check such as a polygon to polygon check or a pixel to pixel check. A lot of games do just fine with bounding spheres or bounding boxes and don't need a more specific narrow phase check.
  16. Key Points:Writing optimal CPU implementations requires complex development too. Programmers have to use both intrinsics for vector parallelism, and TBB for multicore parallelism. OpenCL C OpenCL-C is widely known fairly verbose C-based API, and it shows in the boilerplate initialization code, runtime-compile code, and kernel launch. OpenCL C++ : Removes initialization code by providing sensible defaults for platform, context, device, command-queue. No need to set these up, and no need to save them and drag them around for later OCL API calls. Reduce code to compile by using C++ exceptions for error-checking, automatic memory allocation (rather than calling API to determine size of return args) Default arguments, type-checking — code focuses on relevant parameters. The host-side support for C++ is available in a “cl.hpp” file which runs on any OpenCL implementation (including NV, Intel, etc). In addition, AMD OpenCL implementation supports “static” C++ kernel language — classes, namespaces, templates. (Not used in the this implementation). C++ AMP Initialization is handled through sensible defaults. C++AMP eliminates platform, context, accelerator_view combines device and queue. Single-source model : Eliminates run-time compile code, this is done at compile-time with the host code. Single-source model : streamlined kernel call convention (eliminate clSetKernelArg) The implementation here uses C++11 lambda to reduce boilerplate code for functor construction (kernel can directly access local vars) Data xfer reduced by implicit transfers performed by array_view BOLT Moves reduction code into library — what’s left is reduction operator. Removes data xfer and copy-back — interface is directly with host data structures. Bolt-for-C++AMP uses lambda syntax; Bolt-For-OCL does not (not supported) Bolt-for-OCL implementation relies on C++ static kernel language — recently introduced in AMD APP SDK 2.6 (beta) and 2.7 (production?) Other notes: Serial CPU integrates algorithm and reduction and we call it just “algorithm” ; later implementations separate these for performance. Launch is argument setup and calling of the kernel or library routine. Copy-back includes code to copy data back to host and run a host-side final reduction step. LOC includes appropriate spaces and comments. We attempted to use similar coding style across all implementations.Tbb init is 1-line to initialize the scheduler. (tbb::task_scheduler_init)