FUSION APU AND TRENDS/CHALLENGES IN FUTURE SOC (PROCESSOR) DESIGN
Denis Foley, Sr. Fellow, AMD
9th International SoC Conference
2nd & 3rd November 2011
2 | 9th Intl. SoC Conference| Nov 2nd,3rd, 2011
Trends:
– Three Eras of Processor Performance
– Evolution of Heterogeneous Computing
FSA and Open Standard:
– Why Fusion?
– Open Standards, OpenCL
High-Speed, Scalable Interconnect: NoCs
SoC Trends & Challenges:
– Verification Effort
– IP Integration
– TLM/RTL Co-simulation Challenges
TRENDS: THREE ERAS OF PROCESSOR PERFORMANCE
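(Chart axis: # of processors)
Multi-Core Era
– Enabled by: desire for throughput; 20 years of SMP architecture
– Constrained by: parallel SW availability
Heterogeneous Systems Era
– Enabled by: abundant data parallelism; power-efficient GPUs
– Currently constrained by: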
TRENDS: EVOLUTION OF HETEROGENEOUS COMPUTING
(Chart axis: architecture maturity & programmer accessibility)
Proprietary Drivers Era (2002 - 2008)
– Graphics & proprietary driver-based APIs
– Exploit early programmable “shader cores” in the GPU
– Make your program look like “graphics” to the GPU
– CUDA™, Brook+, etc.
Standards Drivers Era (2009 - 2011)
– C and C++ subsets
– Compute-centric APIs, data types
– Multiple address spaces with explicit data movement
– Specialized work-queue-based structures
– Kernel-mode dispatch
Fusion™ System Architecture: GPU Peer Processor (2012 - 2020)
– GPU as a co-processor
– Unified coherent address space
– Task-parallel runtimes
– Nested data-parallel programs
– User-mode dispatch
– Pre-emption and context switching
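User-mode dispatch replaces a driver call per kernel launch with the application writing a work packet directly into a queue that the GPU reads. The Python sketch below models that idea single-threaded; the class names, the `DispatchPacket` fields and the power-of-two ring size are hypothetical, and a real CPU/GPU queue would additionally need atomic index updates and memory fences. It is not the FSA programming interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DispatchPacket:
    """Hypothetical kernel-dispatch packet (illustrative fields only)."""
    kernel_name: str
    grid_size: int


class UserModeQueue:
    """Ring buffer shared by an application (producer) and a GPU (consumer).

    The application writes packets into memory the device can see and
    advances a write index; no kernel-mode driver call is on this path.
    """

    def __init__(self, size: int) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots: List[Optional[DispatchPacket]] = [None] * size
        self.write_index = 0  # advanced only by the producer
        self.read_index = 0   # advanced only by the consumer

    def submit(self, packet: DispatchPacket) -> bool:
        """Producer side: returns False if the queue is full."""
        if self.write_index - self.read_index == self.size:
            return False
        self.slots[self.write_index & (self.size - 1)] = packet
        self.write_index += 1  # publishing the index makes the packet visible
        return True

    def consume(self) -> Optional[DispatchPacket]:
        """Consumer side (stands in for the GPU's packet processor)."""
        if self.read_index == self.write_index:
            return None
        slot = self.read_index & (self.size - 1)
        packet = self.slots[slot]
        self.slots[slot] = None
        self.read_index += 1
        return packet
```

Keeping one index per side is the usual single-producer, single-consumer ring design, which allows submission without taking a lock.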
More up-to-date information on FSA:
FSA & OPEN STANDARD: ENTER FUSION
Dual-core CPU + Northbridge + DirectX®11 GPU
APU (Accelerated Processing Unit):
Heterogeneous compute engine combining the x86 compute capabilities of the CPU and the parallel processing capabilities of the GPU on a single die
FSA & OPEN STANDARD: WHY FUSION?
Integrating the CPUs, Northbridge and GPU enables:
– Unified memory
– High-bandwidth, low-latency memory access by the GPU
– Savings in interface power and PHY area
– Shared power control and TDP envelope
Potential bandwidth bottleneck
Relatively long memory latency
COMMITTED TO OPEN STANDARDS
AMD drives open and de-facto standards
– Compete on the best
Open standards are the basis for
Open standards always win over
– SW developers want their applications to run on multiple platforms from multiple hardware vendors
OPENCL™ AND FSA
FSA is an optimized platform
architecture for OpenCL™
– Not an alternative to OpenCL™
OpenCL™ on FSA will benefit from
– Avoidance of wasteful copies
– Low latency dispatch
– Improved memory model
– Shared pointers
FSA also exposes a lower-level programming interface for those who want the ultimate in control
Optimized libraries may choose the lower-level interface
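The “avoidance of wasteful copies” and “shared pointers” points can be illustrated by contrasting a copy-in/copy-out flow with one where the kernel works directly on the host buffer. In this Python sketch, a `bytearray` stands in for host memory, a `memoryview` stands in for a shared pointer, and `scale_kernel` is a hypothetical stand-in for a GPU kernel; no OpenCL API is used or implied.

```python
from typing import Tuple


def scale_kernel(buf: memoryview, factor: int) -> None:
    """Hypothetical stand-in for a GPU kernel: scales each byte in place (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * factor) % 256


def run_with_copies(host: bytearray, factor: int) -> Tuple[bytearray, int]:
    """Separate-address-space model: copy in, run the kernel, copy back."""
    device = bytearray(host)                  # host -> device copy
    scale_kernel(memoryview(device), factor)
    host[:] = device                          # device -> host copy
    return host, 2 * len(host)                # bytes moved by the two copies


def run_shared(host: bytearray, factor: int) -> Tuple[bytearray, int]:
    """Unified-address-space model: the kernel gets a view of the same memory."""
    scale_kernel(memoryview(host), factor)    # no copy; a shared "pointer"
    return host, 0
```

Both functions compute the same result; only the copy-based path moves `2 * len(host)` bytes, which on a discrete GPU would cross the PCIe bus.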
POWER-THERMAL EFFECTS IN SYSTEMS ON CHIPS
[Thermal map callouts: “Local failures!”, “Part not working”]
Complex SoCs: high power density
Non-uniform power dissipation: hotspots
Spatial gradients: cause malfunctions
High on-chip temperatures cause malfunctions that affect reliability.
Power consumption depends on
Setting frequencies to control power and
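Why frequency (and the voltage that frequency allows) is the main control knob follows from the standard first-order CMOS switching-power relation, P = αCV²f. The Python sketch below evaluates it; the activity factor, the capacitance, and the assumption that a 20% frequency reduction permits a 10% voltage reduction are illustrative numbers, not AMD data, and leakage (static) power is ignored.

```python
def dynamic_power(activity: float, capacitance_f: float,
                  voltage_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power, P = alpha * C * V^2 * f, in watts.
    Static (leakage) power is not modelled."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def relative_power(voltage_scale: float, freq_scale: float) -> float:
    """Dynamic power after scaling V and f, relative to the starting point."""
    return voltage_scale ** 2 * freq_scale


# Hypothetical block: 20% activity, 1 nF switched capacitance, 1.0 V, 1 GHz.
p_nominal = dynamic_power(0.2, 1e-9, 1.0, 1e9)    # about 0.2 W
# Assumed: lowering frequency by 20% lets the voltage drop by 10%.
p_scaled = p_nominal * relative_power(0.9, 0.8)   # about 0.13 W
```

Because voltage enters squared, a modest voltage reduction enabled by a lower frequency cuts power by more than the frequency reduction alone (here about 35% for a 20% frequency drop).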
OPTIONS FOR POWER SAVINGS
Convergence of Performance and Low Power
– Notebook -> Netbook -> Tablet
PERFORMANCE AND POWER
[Chart: APU power vs. use case; labels include S3, idle and static power]
Performance versus Power Efficiency
Power Management versus Power reduction
Performance & Thermal Design Power
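One way to picture performance under a shared TDP envelope is a controller that gives the CPU the fastest state whose power still fits next to the GPU's current draw, which is the general idea behind boost schemes such as Turbo Core. The Python sketch below is a hypothetical policy with made-up power numbers; it is not AMD's Turbo Core algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PState:
    name: str
    freq_mhz: int
    cpu_power_w: float  # hypothetical CPU power at this state


def pick_cpu_pstate(states: List[PState], tdp_w: float,
                    gpu_power_w: float) -> Optional[PState]:
    """Fastest CPU state that keeps CPU + GPU power inside the shared TDP.
    Returns None if even the slowest state does not fit."""
    headroom_w = tdp_w - gpu_power_w
    fitting = [s for s in states if s.cpu_power_w <= headroom_w]
    return max(fitting, key=lambda s: s.freq_mhz) if fitting else None


# Hypothetical table for an 18 W APU; "boost" stands in for a Turbo Core-like state.
STATES = [
    PState("boost", 2400, 12.0),
    PState("P0", 1800, 8.0),
    PState("P1", 1400, 5.5),
    PState("P2", 800, 3.0),
]
```

With a light GPU load the CPU can run in the boost state; as GPU power rises, the same TDP forces the CPU down to P0 or P1, and beyond that no CPU state fits.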
NOCS: FROM BUSES TO NETWORKS
Note: This slide presents industry-specific information and does not relate to AMD’s NoC status.
NOC CHALLENGES: CAD TOOLS
Capturing application traffic.
Which topology?
Mapping? Routes to use?
Architecture: parameters.
Verification for correctness and performance.
QoS under unreliable conditions.
Key to success: Automate & integrate the steps.
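The traffic-capture and mapping steps are often formulated as minimizing bandwidth × hop count over candidate placements of cores on the topology. The Python sketch below does this by exhaustive search on a 2×2 mesh; the core names, the traffic values (arbitrary bandwidth units) and the cost function are assumptions for illustration, and real NoC tools use heuristics because exhaustive search grows factorially.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]
Flow = Tuple[str, str]


def hops(a: Coord, b: Coord) -> int:
    """Hop count on a 2-D mesh with minimal routing (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(traffic: Dict[Flow, float], placement: Dict[str, Coord]) -> float:
    """Sum of bandwidth x hops over all flows: a common proxy for NoC load and power."""
    return sum(bw * hops(placement[s], placement[d]) for (s, d), bw in traffic.items())


def best_mapping(traffic: Dict[Flow, float], cores: List[str],
                 tiles: List[Coord]) -> Optional[Tuple[float, Dict[str, Coord]]]:
    """Exhaustive search over placements; only feasible for a few cores."""
    best: Optional[Tuple[float, Dict[str, Coord]]] = None
    for perm in permutations(tiles, len(cores)):
        placement = dict(zip(cores, perm))
        cost = mapping_cost(traffic, placement)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


# Hypothetical traffic (arbitrary bandwidth units) and a 2x2 mesh.
TRAFFIC = {("gpu", "mem"): 800, ("cpu", "mem"): 200, ("dsp", "mem"): 100, ("cpu", "gpu"): 50}
CORES = ["cpu", "gpu", "mem", "dsp"]
TILES = [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With `gpu` generating the heaviest traffic to `mem`, the search places them on adjacent tiles and pushes the lightest flow (`dsp` to `mem`) onto the diagonal.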
Homogeneous systems, with
Heterogeneous systems, with different cores & irregular floorplans (FP)
Mapping, QoS, middleware...
Packeting, buffering, flow control...
Synchronization, wires, power...
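Packeting, buffering and flow control can be illustrated with credit-based flow control, a common NoC scheme (one option among several, such as on/off back-pressure). In the Python sketch below, a packet is split into fixed-size flits and the sender spends one credit per flit, regaining it when the downstream buffer frees a slot; the names, the flit size and the zero-latency link are assumptions.

```python
from collections import deque
from typing import Deque, List


def packetize(payload: bytes, flit_bytes: int) -> List[bytes]:
    """Split a packet payload into fixed-size flits (the last one may be short)."""
    return [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]


class CreditLink:
    """One link with credit-based flow control.

    The sender holds one credit per free slot in the receiver's input
    buffer and may only send while it has a credit, so the buffer never
    overflows and no flit is dropped.
    """

    def __init__(self, buffer_depth: int) -> None:
        self.buffer_depth = buffer_depth
        self.credits = buffer_depth          # receiver buffer starts empty
        self.rx_buffer: Deque[bytes] = deque()

    def try_send(self, flit: bytes) -> bool:
        if self.credits == 0:
            return False                     # back-pressure: sender stalls
        self.credits -= 1
        self.rx_buffer.append(flit)          # zero-latency link in this model
        return True

    def receiver_drain(self) -> bytes:
        flit = self.rx_buffer.popleft()      # downstream consumes one flit
        self.credits += 1                    # credit flows back to the sender
        return flit
```

The buffer depth bounds the number of flits in flight, which is why buffer sizing and flow control are usually decided together.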
NOC CHALLENGES: BEYOND GLOBAL SYNCHRONY
A complete spectrum of approaches to system timing exists.
[Spectrum figure, from Synchronous to Delay Insensitive; intermediate labels: local clocks, interaction with data (becoming aperiodic)]
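Moving away from a single global clock means data must cross between clock domains. A common technique in GALS designs is a dual-clock FIFO whose read and write pointers are passed between domains in Gray code, so a pointer sampled mid-transition is off by at most one position. The Python sketch below shows only the encoding; the synchronizer flip-flops are hardware, and this is a generic illustration rather than any specific GALS framework.

```python
def bin_to_gray(n: int) -> int:
    """Gray encoding: consecutive values differ in exactly one bit."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Inverse of bin_to_gray: XOR of all right-shifts of the Gray value."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

Because only one bit of the pointer changes per increment, including the wrap from the last FIFO slot back to slot 0 for power-of-two depths, a synchronizer can never capture a wildly wrong pointer value.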
Supporting heterogeneous computing: high density, high performance, high memory-bandwidth requirements.
3-D NoCs as an option
Note: This slide presents industry-specific information and does not relate to AMD’s 3-D stacking status.
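One reason 3-D NoCs help with density and bandwidth is shorter average paths. The Python sketch below compares the mean hop count of 64 nodes arranged as an 8×8 2-D mesh and as a 4×4×4 3-D mesh, assuming uniform random traffic and minimal routing; it ignores the different latency and bandwidth of vertical (through-silicon) links.

```python
from itertools import product
from typing import Tuple


def average_hops(dims: Tuple[int, ...]) -> float:
    """Mean hop count between two uniformly random nodes (self-pairs included)
    of a mesh with the given dimensions, assuming minimal routing."""
    nodes = list(product(*(range(d) for d in dims)))
    total = sum(sum(abs(a - b) for a, b in zip(p, q)) for p in nodes for q in nodes)
    return total / (len(nodes) ** 2)
```

For these shapes the averages come out to 5.25 hops (8×8) and 3.75 hops (4×4×4), roughly a 29% reduction for the same node count.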
WHAT’S NEW IN SOC DESIGN?
Larger and more complex chips with heavy use of pre-existing cores.
Heavy use of multi-core processors and DSPs.
Shorter time to market and smaller design teams.
… and software.
– Increased verification effort: Debugging is harder.
– Integration is more difficult.
– Need for scalable, high-speed interconnect.
– SW/HW co-simulation is a major issue.
– Power-performance challenge.
– How do we treat the system software?
– Seamless debug across hardware and software (especially software)
– Several methodologies
– UCIS, UVM, TLM 2.0
– Coverage trend
Address gaps in VHDL and SystemC coverage
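Closing coverage gaps in SystemC or VHDL environments often comes down to giving those testbenches a functional-coverage model similar to SystemVerilog covergroups. The Python sketch below is a minimal, language-neutral version: value ranges (bins), hit counts, a coverage percentage and a list of holes. The class and the burst-length bins are hypothetical; this is not the UCIS API.

```python
from typing import Dict, List, Tuple


class Coverpoint:
    """Minimal functional-coverage point: named value ranges ("bins") plus
    hit counts, loosely modelled on SystemVerilog covergroup bins."""

    def __init__(self, name: str, bins: Dict[str, Tuple[int, int]]) -> None:
        self.name = name
        self.bins = bins                        # bin name -> inclusive (lo, hi)
        self.hits = {b: 0 for b in bins}

    def sample(self, value: int) -> None:
        """Record one observed value against every bin whose range contains it."""
        for bin_name, (lo, hi) in self.bins.items():
            if lo <= value <= hi:
                self.hits[bin_name] += 1

    def coverage(self) -> float:
        """Percentage of bins hit at least once."""
        covered = sum(1 for h in self.hits.values() if h > 0)
        return 100.0 * covered / len(self.bins)

    def holes(self) -> List[str]:
        """Bins never hit: the targets for the next tests."""
        return [b for b, h in self.hits.items() if h == 0]
```

For example, sampling burst lengths 1, 3 and 3 against bins `single` (1), `short` (2 to 4) and `long` (5 to 16) gives two of three bins covered, with `long` reported as the hole.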
– Directed & random
– Run-time improvement
Verification cycles per second instead of cycles per second:
Configuring the environment to dynamically select relevant
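“Verification cycles per second” can be read as rewarding only the simulation cycles that add new coverage. A common way to act on that is coverage-based regression ranking: repeatedly run the test that adds the most uncovered bins per simulated cycle and drop tests that add nothing. The Python sketch below is a greedy version of that idea; the test names, bins and cycle counts are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_tests(tests: Dict[str, Tuple[Set[str], float]]) -> List[str]:
    """Greedy regression ranking: repeatedly pick the test adding the most
    not-yet-covered bins per simulated cycle; stop when nothing new is gained."""
    covered: Set[str] = set()
    remaining = dict(tests)
    order: List[str] = []
    while remaining:
        name, (bins, _cycles) = max(
            remaining.items(),
            key=lambda item: len(item[1][0] - covered) / item[1][1],
        )
        if not bins - covered:
            break                      # no remaining test adds coverage
        order.append(name)
        covered |= bins
        del remaining[name]
    return order


# Hypothetical tests: (bins hit, simulated cycles).
TESTS = {
    "smoke": ({"a", "b"}, 1e3),
    "random_long": ({"a", "b", "c", "d"}, 1e6),
    "directed_c": ({"c"}, 1e4),
    "directed_d": ({"d"}, 2e4),
}
```

Here the long random test is never selected: the three short tests reach the same bins in about 3% of its cycles.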
Emulation Focus Areas:
1. Tests/regressions with long run times
2. Corner-case bugs that may escape traditional verification
3. Replicating system-level scenarios
1. Seamless support for assertions.
2. Improve portability between simulation & emulation.
3. Common model from TLM to HDL to emulation
IP INTEGRATION CHALLENGE
Integration of IP :
– Multiple IPs in various configurations and design languages
– IPs must be kept in sync: macros, libraries
– Complexity increases with mixed-language designs
Diversity of Design
IP INTEGRATION CHALLENGE COMPARISON OF CHOICES
                 Yes        Yes        Yes      Yes      Yes
                 Yes        Yes        Yes      Yes      Yes
                 No         Yes        No       No       No
Performance      ++++ (3)   +++ (2)    + (1)    + (1)    +++++ (4)
Delta delay      Yes        Yes        No       No       No
SC + SV/VHDL
SC + SV
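The “Delta delay” row refers to the zero-time delta-cycle semantics of VHDL and SystemC. The Python sketch below illustrates only that mechanism, not the compared integration choices: processes read current signal values and schedule new ones, and scheduled values become visible only in the update phase, so the result does not depend on process order. The scheduler class and the process functions are hypothetical.

```python
from typing import Callable, Dict, List


class DeltaKernel:
    """Tiny evaluate/update scheduler in the spirit of VHDL/SystemC delta cycles."""

    def __init__(self, signals: Dict[str, int]) -> None:
        self.current = dict(signals)        # values visible to processes
        self.pending: Dict[str, int] = {}   # values scheduled for the next delta

    def write(self, name: str, value: int) -> None:
        self.pending[name] = value

    def delta_cycle(self, processes: List[Callable[["DeltaKernel"], None]]) -> None:
        for proc in processes:              # evaluate phase: read current, schedule new
            proc(self)
        self.current.update(self.pending)   # update phase: new values become visible
        self.pending.clear()


def drive_a(k: DeltaKernel) -> None:
    k.write("a", k.current["b"])


def drive_b(k: DeltaKernel) -> None:
    k.write("b", k.current["a"])
```

Running `drive_a` and `drive_b` in one delta swaps `a` and `b` whichever runs first; an integration path that lost this two-phase behaviour at a language boundary would make the result order-dependent.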
Gap: No standardized automated methodology for integration.
• Understand IP blocks: language, source code availability.
• Understand connections: one-to-one, distributed, method port
• Option for an optimized solution to quickly build a system
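Until there is a standardized methodology, a first automated step is to describe each IP block's language and ports and check the connections between them. The Python sketch below does this for one-to-one connections only: it flags direction and width mismatches and marks language boundaries that will need an adapter or wrapper. The data model and block names are hypothetical; distributed and method-port connections are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Port:
    width: int
    direction: str  # "in" or "out"


@dataclass
class IPBlock:
    name: str
    language: str            # e.g. "SystemC", "SystemVerilog", "VHDL"
    ports: Dict[str, Port]


def check_connections(blocks: Dict[str, IPBlock],
                      connections: List[Tuple[str, str, str, str]]) -> List[str]:
    """Check 1-1 connections given as (src_block, src_port, dst_block, dst_port)."""
    issues: List[str] = []
    for sb, sp, db, dp in connections:
        src, dst = blocks[sb].ports[sp], blocks[db].ports[dp]
        tag = f"{sb}.{sp} -> {db}.{dp}"
        if src.direction != "out" or dst.direction != "in":
            issues.append(f"{tag}: direction mismatch")
        if src.width != dst.width:
            issues.append(f"{tag}: width {src.width} vs {dst.width}")
        if blocks[sb].language != blocks[db].language:
            issues.append(f"{tag}: {blocks[sb].language}/{blocks[db].language} "
                          "boundary needs an adapter")
    return issues


# Hypothetical system: a SystemC CPU and bus plus a VHDL UART.
BLOCKS = {
    "cpu": IPBlock("cpu", "SystemC", {"req": Port(32, "out")}),
    "bus": IPBlock("bus", "SystemC", {"req": Port(32, "in"), "irq": Port(1, "in")}),
    "uart": IPBlock("uart", "VHDL", {"irq": Port(2, "out")}),
}
CONNECTIONS = [("cpu", "req", "bus", "req"), ("uart", "irq", "bus", "irq")]
```

Here the CPU-to-bus connection is clean, while the UART interrupt is reported twice: once for the width mismatch and once for the VHDL/SystemC boundary.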
IP INTEGRATION CHALLENGE: GAPS WITH ANALOG IP INTEGRATION IN SOC
Table 1. Gaps with Analog IP Integration in SoC
Gap                    Root cause
                       - Testchip scenario is different
                       - Tester used for testchip differs
                       - Incomplete built-in SoC test/debug capability or de-risk option for basic functionality such as PLL clock
IP I/F verification    - Incomplete test setup
                       - No common detailed review process between IP and SoC teams; incorrect assumptions based on past analog IP working silicon
                       - Version mismatch between IP simulation model and SPICE netlist
                       - Limitations of the behavioral model in replicating actual analog IP functionality
EDA tools              - Gaps in analog and digital simulation environments
IP INTEGRATION CHALLENGE
Verification environment bring-up:
– Automated assertions for early checks
– Review forces, tie-offs and relevant checkers from IP to SoC
– Bottleneck for the SoC team in getting started with verification: option to use a fake model for initial bring-up; usage of a system model
– Super Block concept: pre-verified IP blocks at similar frequency &
Current solution: in-house methodology and process; no clear solution from EDA vendors.
TLM, RTL CO-SIMULATION
Traditional use of system-level models: architecture profiling &
Increasing demand for co-simulation: trade-off between accuracy and speed
Different levels of abstraction.
Need for improvement in integration methodology and testbench
Seamless debug and coverage methodology.
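Co-simulating a transaction-level model with RTL needs a transactor that turns one function-call-level transaction into cycle-by-cycle pin activity. The Python sketch below expands a write transaction into per-cycle signal values using a generic valid/ready handshake with 4-byte word addresses; the protocol, the field names and the `ready` stimulus are assumptions, not a specific bus standard.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WriteTxn:
    """Transaction-level view: one call carries the whole burst."""
    addr: int
    data: List[int]


def txn_to_pin_cycles(txn: WriteTxn, ready: List[bool]) -> List[Dict[str, int]]:
    """Expand one transaction into per-clock signal values. A beat completes
    only when valid and ready are both high in the same cycle; `ready` is the
    receiver's ready signal for each cycle (hypothetical stimulus)."""
    cycles: List[Dict[str, int]] = []
    beat = 0
    for cycle, rdy in enumerate(ready):
        if beat == len(txn.data):
            break
        cycles.append({"cycle": cycle, "valid": 1, "ready": int(rdy),
                       "addr": txn.addr + 4 * beat, "data": txn.data[beat]})
        if rdy:
            beat += 1                       # handshake completed: next beat
    if beat < len(txn.data):
        raise RuntimeError("ready pattern too short to finish the transaction")
    return cycles
```

One TLM call becomes three RTL cycles when the receiver stalls for the first cycle, which is the accuracy-versus-speed trade-off in miniature: the TLM side never simulates that stall.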
Using a system-level model for HDL generation:
– Legacy system models not written with conversion in mind
– Current limitation: incomplete translation
– Lack of a reliable equivalence-check tool
Need: merge top-down (SystemC) and bottom-up (SystemVerilog) flows
Gaps/work to do: how to do power analysis
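Without a reliable equivalence checker between a system-level model and generated HDL, teams often fall back on simulation-based comparison on random stimulus. The Python sketch below compares an untimed reference adder with a bit-level ripple-carry model standing in for generated RTL; both models are hypothetical stand-ins, and passing the check only means no counterexample was found, not that the designs are formally equivalent.

```python
import random
from typing import Callable, Optional, Tuple

WIDTH = 8
MASK = (1 << WIDTH) - 1


def add_reference(a: int, b: int) -> int:
    """System-level (untimed) model: integer addition truncated to WIDTH bits."""
    return (a + b) & MASK


def add_ripple_carry(a: int, b: int) -> int:
    """Bit-level model of what generated RTL might implement: a ripple-carry adder."""
    result, carry = 0, 0
    for i in range(WIDTH):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result


def random_equivalence_check(ref: Callable[[int, int], int],
                             impl: Callable[[int, int], int],
                             trials: int, seed: int = 0) -> Optional[Tuple[int, int]]:
    """Compare two models on random stimulus; return the first mismatching
    input pair, or None if none was found (which is not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randrange(1 << WIDTH), rng.randrange(1 << WIDTH)
        if ref(a, b) != impl(a, b):
            return (a, b)
    return None
```

At 8 bits the input space is small enough to check exhaustively, but at realistic widths random comparison can miss corner cases, which is why a formal equivalence tool is the real need.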
• Wilson Research Group / MGC study, blog 2011.
• AMD Cool Chips 2011 presentation. Denis Foley, AMD Sr. Fellow.
• Fusion Processors and HPC, 2011. Chuck Moore, AMD Corporate Fellow & Technology Group CTO.
• AMD Fusion Developer Summit 2011. Phil Rogers, AMD Corporate Fellow.
• Fully Asynchronous Framework for GALS Network on Chip. Friedman H
• Future of EE, NoC’s presentation. Dr. Srinivasan Murali.
• Analog IP Integration in SoC. IP-Reuse ’09.
• Mixed-Language IP Integration. DVCon 2010.
• Extending Functional Coverage to SystemC, VHDL. IP ’10.
GPU: Graphics Processing Unit
APU: Accelerated Processing Unit
OpenCL: Open Computing Language
TDP: Thermal Design Power, a measure of a design infrastructure’s ability to cool a device
NoC: Network on Chip
TLM: Transaction Level Modeling
Turbo Core: AMD boost mechanism
QoS: Quality of Service
UVM: Universal Verification Methodology
UCIS: Unified Coverage Interoperability Standard