FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
9th International SoC Conference 2011

Published in: Technology

Transcript

  • 1. FUSION APU AND TRENDS/CHALLENGES IN FUTURE SOC (PROCESSOR) DESIGN
       Pankaj Singh. Acknowledgement: Denis Foley, Sr. Fellow, AMD.
       9th International SoC Conference, 2nd & 3rd November 2011
  • 2. 2 | 9th Intl. SoC Conference| Nov 2nd,3rd, 2011
       TODAY’S TOPICS
       - Trends:
         - Three Eras of Processor Performance
         - Evolution of Heterogeneous Computing
       - FSA and Open Standard:
         - Why Fusion?
         - Open Standard, OpenCL
       - Power, Performance
       - High Speed, Scalable Interconnect: NoCs
       - 3-D Stacking
       - SoC Trends & Challenges
         - Verification Effort
         - IP Integration
         - TLM, RTL Co-simulation challenges
  • 3. TRENDS: THREE ERAS OF PROCESSOR PERFORMANCE
       Single-Core Era [chart: single-thread performance vs. time, marked "?" and "we are here"]
       - Enabled by: Moore’s Law; voltage scaling; microarchitecture
       - Constrained by: power; complexity
       Multi-Core Era [chart: throughput performance vs. time (# of processors), marked "we are here"]
       - Enabled by: Moore’s Law; desire for throughput; 20 years of SMP architecture
       - Constrained by: power; parallel SW availability; scalability
       Heterogeneous Systems Era [chart: targeted application performance vs. time (data-parallel exploitation), marked "we are here"]
       - Enabled by: Moore’s Law; abundant data parallelism; power-efficient GPUs
       - Currently constrained by: programming models; communication overheads
  • 4. TRENDS: EVOLUTION OF HETEROGENEOUS COMPUTING
       [chart: architecture maturity & programmer accessibility (poor to excellent) over time: 2002-2008, 2009-2011, 2012-2020]
       Proprietary Drivers Era (2002-2008): graphics & proprietary driver-based APIs
       - "Adventurous" programmers
       - Exploit early programmable "shader cores" in the GPU
       - Make your program look like "graphics" to the GPU
       - CUDA™, Brook+, etc.
       Standards Drivers Era (2009-2011): OpenCL™, DirectCompute driver-based APIs
       - Expert programmers
       - C and C++ subsets
       - Compute-centric APIs, data types
       - Multiple address spaces with explicit data movement
       - Specialized work-queue-based structures
       - Kernel mode dispatch
       Architected Era (2012-2020): Fusion™ System Architecture, GPU as peer processor
       - Mainstream programmers
       - Full C++
       - GPU as a co-processor
       - Unified coherent address space
       - Task parallel runtimes
       - Nested data parallel programs
       - User mode dispatch
       - Pre-emption and context switching
       More up-to-date information on FSA: http://developer.amd.com/afds/pages/keynote.aspx#/Dev_AFDS_Reb_2
  • 5. FSA & OPEN STANDARD: ENTER FUSION
       [diagram: Dual Core CPU, Northbridge, DirectX®11 GPU]
       FUSION APU (Accelerated Processing Unit): heterogeneous compute engine combining x86 compute and the parallel processing capabilities of the GPU on a single die
  • 6. FSA & OPEN STANDARD: WHY FUSION?
       - Integrating CPUs, Northbridge and GPU enables:
         - Unified memory
         - High-bandwidth, low-latency access by GPU
         - Savings on interface power and PHY area
         - Shared power control and TDP envelope
       [diagram annotations: potential bandwidth bottleneck; relatively long memory latency]
  • 7. COMMITTED TO OPEN STANDARDS
       - AMD drives open and de-facto standards
         - Compete on the best implementation
       - Open standards are the basis for large ecosystems
       - Open standards always win over time
         - SW developers want their applications to run on multiple platforms from multiple hardware vendors
  • 8. OPENCL™ AND FSA
       - FSA is an optimized platform architecture for OpenCL™
         - Not an alternative to OpenCL™
       - OpenCL™ on FSA will benefit from:
         - Avoidance of wasteful copies
         - Low-latency dispatch
         - Improved memory model
         - Shared pointers
       - FSA also exposes a lower-level programming interface for those who want the ultimate in control and performance
       - Optimized libraries may choose the lower-level interface
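The "avoidance of wasteful copies" and "low latency dispatch" points on slide 8 can be illustrated with a toy cost model. This is a minimal sketch, not an AMD model: the function names, the parameters, and every number in the usage note are hypothetical, and it assumes the input and output buffers are the same size.

```python
# Toy cost model for offloading one kernel to the GPU.
# All names and numbers are illustrative assumptions, not measurements.

def offload_time_with_copies(n_bytes, compute_s, bus_bytes_per_s, dispatch_s):
    """Explicit data movement: copy in, dispatch, compute, copy out."""
    copy_in = n_bytes / bus_bytes_per_s
    copy_out = n_bytes / bus_bytes_per_s  # assumes output is as large as input
    return copy_in + dispatch_s + compute_s + copy_out


def offload_time_shared(compute_s, dispatch_s):
    """Shared pointers in a unified address space: no staging copies."""
    return dispatch_s + compute_s


def copy_overhead_fraction(n_bytes, compute_s, bus_bytes_per_s, dispatch_s):
    """Fraction of the copy-based offload time spent only on moving data."""
    total = offload_time_with_copies(n_bytes, compute_s, bus_bytes_per_s, dispatch_s)
    return (2 * n_bytes / bus_bytes_per_s) / total
```

For example, with a hypothetical 1 MB buffer, a 1 GB/s link, 2 ms of compute and 0.1 ms of dispatch, the copy-based path takes about 4.1 ms against about 2.1 ms for the shared path, so roughly half of the copy-based time is data movement. Short kernels are the ones that gain most from removing copies.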
  • 9. POWER & PERFORMANCE
  • 10. POWER-THERMAL EFFECTS IN SYSTEMS ON CHIPS
       - Complex SoCs: high power density
       - Non-uniform power dissipation: hotspots
       - Spatial gradients: cause malfunctions
       - High on-chip temperatures cause malfunctions that affect reliability
       - Local failures -> part not working
       - Power consumption depends on frequency
       - Setting frequencies to control power and temperature
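The last two bullets of slide 10 (power depends on frequency, and frequency is the control knob) can be made concrete with the standard first-order CMOS dynamic power relation P = a * C * V^2 * f. The sketch below is an illustration only: the slide does not give a formula, and the function names, the (frequency, voltage) operating points, and the capacitance value are hypothetical. Leakage power is ignored.

```python
# First-order dynamic power model: P = activity * C * V^2 * f.
# Operating points and capacitance are made-up illustrative values.

def dynamic_power_w(switched_cap_f, voltage_v, freq_hz, activity=1.0):
    """Dynamic (switching) power in watts; leakage is not modeled."""
    return activity * switched_cap_f * voltage_v ** 2 * freq_hz


def highest_freq_within_budget(p_states, switched_cap_f, budget_w):
    """Pick the fastest (freq_hz, voltage_v) point whose power fits the budget.

    Returns None when no operating point fits.
    """
    feasible = [
        (f, v) for f, v in p_states
        if dynamic_power_w(switched_cap_f, v, f) <= budget_w
    ]
    # Tuples compare by frequency first, so max() returns the fastest point.
    return max(feasible, default=None)


if __name__ == "__main__":
    states = [(1e9, 0.9), (2e9, 1.0), (3e9, 1.2)]  # hypothetical P-states
    print(highest_freq_within_budget(states, 1e-9, budget_w=2.5))
```

Because voltage usually has to rise with frequency, power grows faster than linearly in frequency, which is why lowering the operating point is an effective way to pull a hotspot back inside its thermal limit.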
  • 11. OPTIONS FOR POWER SAVINGS
       - Convergence of performance and low power
         - Notebook -> Netbook -> Tablet <- Smartphone
  • 12. PERFORMANCE AND POWER
       [chart: APU Power vs. Use Case; use cases: S3, idle, static screen, MM07, media playback, full compute; axes: performance, power]
       - Performance versus power efficiency
       - Power management versus power reduction
       - Performance & Thermal Design Power
  • 13. HIGH SPEED, SCALABLE INTERCONNECT
  • 14. NOCS: FROM BUSES TO NETWORKS [Friedman Harel:10]
       Note: This slide presents industry-specific information and does not relate to AMD NoC status.
  • 15. NOC CHALLENGES: CAD TOOLS
       - Capturing application traffic
       - Which topology?
       - Mapping? Routes to use?
       - Fixing communication architecture parameters
       - Verification for correctness, performance
       - Build models
       - QoS under unreliable conditions
       Key to success: automate & integrate the steps.
       Topologies:
       - Mesh topology: homogeneous systems, with regular tiles
       - Customized topology: heterogeneous systems, with different cores & irregular FP
       CAD tool layers:
       - Software services: mapping, QoS, middleware...
       - Architecture: packeting, buffering, flow control...
       - Physical implementation: synchronization, wires, power...
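The "routes to use" and "capturing application traffic" questions on slide 15 can be sketched for the mesh case. The example below uses dimension-ordered (XY) routing, a common deterministic scheme for 2-D meshes; the slide does not name a routing algorithm, so this choice, the function names, and the traffic figures in the usage note are assumptions made for illustration.

```python
from collections import Counter

# Dimension-ordered (XY) routing on a 2-D mesh: travel along X first, then Y.
# Tiles are (x, y) coordinates; flows and bandwidths are hypothetical.

def xy_route(src, dst):
    """Return the list of tiles visited from src to dst, inclusive."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def link_loads(flows):
    """Sum the bandwidth of every flow crossing each directed link.

    flows maps (src, dst) tile pairs to a bandwidth demand.
    """
    loads = Counter()
    for (src, dst), bandwidth in flows.items():
        path = xy_route(src, dst)
        for a, b in zip(path, path[1:]):
            loads[(a, b)] += bandwidth
    return loads
```

Given a captured traffic table such as `{((0, 0), (2, 0)): 10, ((1, 0), (2, 1)): 5}`, `link_loads` shows that both flows share the link from tile (1, 0) to tile (2, 0). An automated flow would use this kind of per-link figure to revisit the core-to-tile mapping or the topology.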
  • 16. NOC CHALLENGES: BEYOND GLOBAL SYNCHRONY
       - A complete spectrum of approaches to system timing exists [Mullins06-07]
       [diagram: spectrum from Synchronous to Delay Insensitive; timing assumptions range from global to none; labels: "Less Detection", "Local Clocks, Interaction with data (becoming aperiodic)"]
  • 17. 3-D STACKING
  • 18. 3-D STACKING
       - Supporting heterogeneous computing: high density, high performance, high memory bandwidth requirement
       - 3-D NoCs option
       - Futuristic view: integrating bio-sensors
       Note: This slide presents industry-specific information and does not relate to AMD 3-D stacking status.
  • 19. SOC TRENDS & CHALLENGES: 1. VERIFICATION EFFORT 2. IP INTEGRATION 3. TLM-RTL CO-SIMULATION
  • 20. WHAT’S NEW IN SOC DESIGN?
       - Larger and more complex chips with heavy use of pre-existing cores
       - Heavy use of multi-core processors and DSPs
       - Complex interconnect
       - Shorter time to market and smaller design teams
       - … and software
       - Leads to:
         - Increased verification effort: debugging is harder
         - Integration is more difficult
         - Need for scalable and high-speed interconnect
         - SW/HW co-simulation is a major issue
         - Power-performance challenge
         - How do we treat the system software?
  • 21. VERIFICATION EFFORT
       - Debugging
         - Seamless debug across hardware and software [especially SW]
       - Testbench development:
         - Several methodologies: VMM, OVM, UVM
         - New developments [unified strategy]: UCIS, UVM TLM 2.0
         - Coverage trend: address gaps in VHDL and SystemC coverage
  • 22. VERIFICATION EFFORT
       - Creating/running testcases:
         - Directed & random
         - Run-time improvement: save-restore
         - "Verification cycles per second" instead of "cycles per second": configuring the environment to dynamically select the relevant design/core
         - Alternate options
  • 23. VERIFICATION EFFORT: ALTERNATE OPTIONS
       Emulation
       Focus areas:
       1. Tests/regressions with long run time
       2. Corner-case bugs that may escape traditional verification
       3. Replicating system-level scenarios
       Ongoing initiatives/needs:
       1. Seamless support for assertions
       2. Improve portability between simulation & emulation
       3. Common model from TLM-HDL-Emulation
  • 24. IP INTEGRATION CHALLENGE
       - Integration of IP:
         - Multiple IPs, various configurations, design languages
         - IPs to be in sync: macros, libraries
         - Complexity increases with mixed-language designs
       [diagram: SystemC, SVLOG, Verilog, VHDL; drivers of mixed-language designs: unique strengths of languages, diversity of design teams, importing existing IP, legacy testbench environment]
  • 25. IP INTEGRATION CHALLENGE: COMPARISON OF CHOICES
                               Direct          SV Bind        SystemC           SCV-           SC-DPI
                               Instantiation   Construct      Control/Observe   Connect()
       Source Code Available   Yes             Yes            Yes               Yes            Yes
       One IP Compiled         Yes             Yes            Yes               Yes            Yes
       Both IP Compiled        No              Yes            No                No             No
       Performance             ++++ (3)        +++ (2)        + (1)             + (1)          +++++ (4)
       Delta Delay             Yes             Yes            No                No             No
       Languages Supported     SV, SC, VHDL    SV, SC, VHDL   SC + SV/VHDL      SC + SV/VHDL   SC + SV
       Gap: no standardized automated methodology for integration.
       Recommended approach:
       - Understand IP blocks: language, source code availability
       - Understand connection: 1-1, distributed, method port
       - Option for optimized solution to quickly build a system
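The recommended approach on slide 25 (understand the IP blocks first, then pick a connection method) can be sketched as a lookup over the comparison table. The rows below are transcribed from the slide; the selection logic, the function name, and the reading of "SC + SV/VHDL" as "SystemC together with either SV or VHDL" are assumptions made for illustration. The "Delta Delay" row is left out because the slide does not say how it should influence the choice.

```python
# Integration options transcribed from the comparison table on slide 25.
# "perf" is the parenthesised rank from the slide (higher is faster).
OPTIONS = {
    "Direct Instantiation":    {"both_compiled": False, "perf": 3, "langs": {"SV", "SC", "VHDL"}},
    "SV Bind Construct":       {"both_compiled": True,  "perf": 2, "langs": {"SV", "SC", "VHDL"}},
    "SystemC Control/Observe": {"both_compiled": False, "perf": 1, "langs": {"SV", "SC", "VHDL"}},
    "SCV-Connect()":           {"both_compiled": False, "perf": 1, "langs": {"SV", "SC", "VHDL"}},
    "SC-DPI":                  {"both_compiled": False, "perf": 4, "langs": {"SV", "SC"}},
}


def candidate_options(languages, both_ip_compiled=False):
    """Return option names that fit the constraints, fastest first.

    languages: set of languages used by the IP blocks, e.g. {"SC", "SV"}.
    both_ip_compiled: True when neither IP is available as source.
    """
    fits = [
        name for name, row in OPTIONS.items()
        if languages <= row["langs"]
        and (row["both_compiled"] or not both_ip_compiled)
    ]
    # sorted() is stable, so ties keep the table's column order.
    return sorted(fits, key=lambda name: -OPTIONS[name]["perf"])
```

Under these assumptions, `candidate_options({"SC", "SV"})` ranks SC-DPI first, while `candidate_options({"SC", "SV"}, both_ip_compiled=True)` leaves only the SV bind construct.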
  • 26. IP INTEGRATION CHALLENGE: GAPS WITH ANALOG IP INTEGRATION IN SOC
       Table 1. Gaps with Analog IP Integration in SoC
       Gap: Testchip setup
         Root cause: testchip scenario is different; tester used for the testchip differs
       Gap: Inbuilt debug
         Root cause: incomplete inbuilt SoC test/debug capability or derisk option for basic functionality such as PLL clock
       Gap: IP I/F verification
         Root cause: incomplete test setup
       Gap: Review process
         Root cause: no common detailed review process between IP and SoC teams; incorrect assumptions based on past working silicon of the analog IP
       Gap: IP modelling
         Root cause: mismatch in version between the IP simulation model and the SPICE netlist; limitations of the behavioral model in replicating actual analog IP functionality; timing issues; DFT issues
       Gap: EDA tools
         Root cause: gaps in analog and digital simulation environments
  • 27. IP INTEGRATION CHALLENGE
       - Verification environment bring-up
         - Automated assertions for early checks
         - Review forces, tie-offs and relevant checkers from IP to SoC
         - Bottleneck for the SoC team to get started with verification: option to use a fake model for initial bring-up; usage of a system model
         - Super Block concept: pre-verified IP blocks at similar frequency & interface
       - Requirement: [diagram: IP Block1 and IP Block2, hookup using ICU with minimum manual effort, "No BUGS!"]
       - Current solution: in-house methodology and process. No clear solution from EDA vendors.
  • 28. TLM, RTL Co-simulation
       - Traditional use of system-level models: architecture profiling & performance analysis
       - Increasing demand for co-simulation: tradeoff between accuracy and performance
       - Open challenges
         - Different levels of abstraction
         - Need for improvement in integration methodology and testbench development
         - Seamless debug and coverage methodology
       - Using system-level models for HDL generation
         - Legacy system models not written with conversion in mind
         - Current limitation: incomplete translation
         - Lack of a reliable equivalence-check tool
       - Need: merge top-down (SystemC) and bottom-up (SystemVerilog) methodology/flow
       - Gaps/work to do: how to do power analysis
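The accuracy-versus-performance tradeoff and the "different level of abstraction" challenge on slide 28 can be illustrated in miniature. The sketch below is plain Python, not SystemC or SystemVerilog; the class names, the two-cycle write latency, and the handshake are all invented for illustration. It shows a transaction-level memory, a cycle-level stand-in for an RTL block, and a transactor that bridges a transaction call to cycle-by-cycle signalling.

```python
class TlmMemory:
    """Transaction-level model: one call per write, no notion of clock cycles."""

    def __init__(self):
        self.mem = {}

    def write(self, addr, data):
        self.mem[addr] = data


class CycleMemory:
    """Cycle-level stand-in for RTL: a write commits LATENCY ticks after acceptance."""

    LATENCY = 2  # hypothetical write latency in clock cycles

    def __init__(self):
        self.mem = {}
        self.busy = 0
        self.pending = None

    def tick(self, valid=False, addr=0, data=0):
        """Advance one clock. Returns True if a request was accepted this cycle."""
        if self.busy:
            self.busy -= 1
            if self.busy == 0:
                pending_addr, pending_data = self.pending
                self.mem[pending_addr] = pending_data
                self.pending = None
            return False
        if valid:
            self.pending = (addr, data)
            self.busy = self.LATENCY
            return True
        return False


class WriteTransactor:
    """Bridges a transaction-level write() call to the cycle-level interface."""

    def __init__(self, dut):
        self.dut = dut
        self.cycles = 0

    def write(self, addr, data):
        accepted = False
        while not accepted:          # drive valid until the request is accepted
            accepted = self.dut.tick(True, addr, data)
            self.cycles += 1
        while self.dut.busy:         # idle cycles until the write commits
            self.dut.tick()
            self.cycles += 1
```

Driving the same two writes through `TlmMemory` and through `WriteTransactor(CycleMemory())` leaves identical memory contents, but only the cycle-level path reports timing (six cycles here), and it pays for that with several simulation steps per transaction.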
  • 29. THANK YOU!
  • 30. REFERENCES
       [1] Wilson Research Group-MGC study blog, 2011.
       [2] AMD Coolchip2011 presentation. Denis Foley, AMD Sr. Fellow.
       [3] Fusion Processors and HPC-2011. Chuck Moore, AMD Corporate Fellow & Technology Group CTO.
       [4] AMD Fusion Developer Summit 2011. Phil Rogers, AMD Corporate Fellow.
       [5] Fully Asynchronous Framework for GALS Network on Chip. Friedman H.
       [6] Future of EE, NoCs presentation. Dr. Srinivasan Murali.
       [7] Analog IP Integration in SoC, IP Reuse ’09; Mixed Language IP Integration, DVCon 2010; Extending Functional Coverage to SystemC, VHDL, IP ’10. Pankaj S.
  • 31. GLOSSARY
       - GPU: Graphics Processing Unit
       - APU: Accelerated Processing Unit
       - OpenCL: Open Computing Language
       - TDP: Thermal Design Power, a measure of a design infrastructure’s ability to cool a device
       - NoC: Network on Chip
       - TLM: Transaction Level Modeling
       - Turbo Core: AMD boost mechanism
       - QoS: Quality of Service
       - UVM: Universal Verification Methodology
       - UCIS: Unified Coverage Interoperability Standard
  • 32. BACKUP
  • 33. Disclaimer
       The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
       AMD makes no representations or warranties with respect to the contents hereof and assumes no responsibility for any inaccuracies, errors or omissions that appear in this information. AMD specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. In no event will AMD be liable to any person for any direct, indirect, special or other consequential damages arising from the use of any information contained herein, even if AMD is expressly advised of the possibility of such damages.
       Trademark Attribution
       AMD, the AMD Arrow logo, AMD Athlon, AMD Phenom, AMD Turion, AMD Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Microsoft, Windows and DirectX are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. PCIe is a registered trademark of PCI-SIG. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners. ©2011 Advanced Micro Devices, Inc. All rights reserved.