AMD Hot Chips Bulldozer & Bobcat Presentation

AMD will be revealing details of two new core architectures - Bulldozer and Bobcat - at Hot Chips 22


  • Before we start: There is a lot of technical detail available below what we are about to show you; this presentation is intended to give you a high-level overview of both designs and AMD’s expectations for each. The engineering detail will be presented by the two chief architects for the designs at the upcoming Hot Chips conference on the Stanford campus next week. Please feel free to ask detailed questions along the way if you would like to hear more about a specific feature or operation. At a higher level, this shows innovation at AMD remains alive and well. Please think of these core architectures within the context of the new, revitalized AMD built around our focus as a design company since the spin-off of GlobalFoundries, our new VISION platforms and marketing program, and our Fusion APU strategy. “Bobcat” and “Bulldozer” are the latest chapters in that story and form a solid foundation for AMD products for years to come.
  • The two cores, although both x86 compatible, are completely different for a reason. The workloads, end equipment markets and usage scenarios require different approaches, and that’s what AMD recognized at the outset of this effort. Think of “Bulldozer”, just as the name implies, as the heavy lifter. It will appear in server, as well as mainstream and high performance client products. “Bobcat” is small and highly efficient. It utilizes those characteristics to address the highly portable netbook / notebook markets. So, two different designs, with different goals in mind.
  • So starting with Bulldozer, here’s a block diagram that shows its distinguishing features. We are taking two of the most frequently used parts of the processor, the integer cores, and adding a hefty, shared floating point capability to deliver two robust threads much more efficiently than Hyper-threading, where a single integer core is used. We have also added a number of instruction set extensions to increase the design’s capabilities and done extensive work on power management to improve performance per watt even further. The 32nm process technology delivers additional savings in terms of area and power consumption; this is our first process technology to utilize high-K metal gate.
  • The previous slide hinted at a key differentiator of Bulldozer that bears more explanation. A big conversation in the industry these last few years is how to continue to increase processing performance as we reach plateaus in clock speed. Essentially there have been two approaches used: SMT, which stands for Simultaneous Multi-Threading, and CMP, which stands for Chip Multi-Processing. CMP is probably the easiest to understand, because it can be described as “if one core is good, two must be better”, and it is. So CMP architectures take a complete core and replicate it. SMT is a little more complex to picture, but because of the way instructions are decoded and executed, it’s possible to have two tasks running concurrently on a single core. Bulldozer takes a third approach.
  • On the first Bulldozer slide we mentioned “true core functionality”, so what exactly does that mean? There are two complete integer units in the Bulldozer design for the most common type of compute tasks, so it functions like a dual-core design, allowing maximum performance rather than pushing two threads through a single core. However, we don’t replicate everything on the core like a CMP either. Floating point operations on Bulldozer use a shared scheduler and two 128-bit Multiply and Accumulate Units. Extensive research went into analyzing workloads ahead of this design, so we feel the division between shared and discrete components is the right one. And by the way, the idea of sharing hardware is hardly new, right? Shared cache, the Northbridge, etc. have been shared across multi-core designs for years already.
  • You can see that larger view of shared hardware components here as we raise our view up to the chip level. On an 8-core Bulldozer design you can see how Bulldozer “modules” are grouped together to share L3 cache and Northbridge, and combined with a memory controller and Northbridge controller to form the major components of the chip. And again, the OS and applications see true cores; the shared floating point components and L2 cache are transparent to the code.
  • So that covers Bulldozer. Now let’s cover AMD’s new core, designed specifically for the low-power x86 market. “Bobcat” is small and highly efficient. It utilizes those characteristics to address the highly portable netbook / notebook markets.
  • Bobcat is a little more straightforward to understand than Bulldozer, but it too has some highly differentiated features. These were stated as goals from the very beginning because of AMD’s understanding of the final products’ requirements.
  • So those were the goals. Where did we end up? Bobcat can operate below one watt (with a resulting reduction in performance). That’s not a statement about any resulting products, but it does give you some sense of the core’s power envelope. The next bullets here are critical: out-of-order execution means higher performance than an in-order execution core like Atom, pure and simple. Synthesizable means it uses few custom logic arrays, which are more dependent on the specifics of the underlying manufacturing technology for optimal performance, and that it can be more easily integrated into SoC designs for faster turnaround of new variations. No limitations on the instruction set either, including support for virtualization. AMD estimates 90% of today’s mainstream CPU performance in less than half the silicon area and a fraction of the power. It will appear early next year in “Ontario”, which is ahead of schedule.
  • Technical details if needed.
  • The need for an optimal, energy-efficient balance of CPU and GPU represents the beginning of a new era of computing in 2011, the era of the accelerated processing unit or APU, which combines both on a single piece of silicon. The Fusion of CPU and GPU compute power is what the next chapter in visual computing requires: a powerful visual computing experience at home or on the go without compromise. Our AMD Fusion™ design is driven by mobility and is based on a low-power visual compute architecture that will enhance active and resting battery life while increasing both CPU and GPU performance. This is the culmination of the vision of ‘One AMD’, and only AMD can deliver the GPU and CPU combination that will be the future of computing.
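
As a rough illustration of the sharing scheme described in the Bulldozer notes above, the following Python sketch models a chip as a list of modules, each with two dedicated integer cores and a shared front end, FP unit and L2. The names Module, Chip and fp_elements_per_pass are hypothetical and are not AMD terminology. The four-module count is inferred from the 8-core example at two integer cores per module, and treating a 128-bit FMAC as holding four 32-bit or two 64-bit elements is plain width arithmetic, not a figure taken from the slides.

```python
from dataclasses import dataclass, field

FMAC_WIDTH_BITS = 128  # from the notes: two 128-bit FMAC units per module


@dataclass
class Module:
    """Hypothetical model of one Bulldozer module."""
    integer_cores: int = 2  # dedicated: each has its own scheduler, pipelines, L1 DCache
    fmac_units: int = 2     # shared by both cores through a single FP scheduler
    shared: tuple = ("fetch", "decode", "fp_scheduler", "l2_cache")


@dataclass
class Chip:
    """Hypothetical model of a chip built from several modules."""
    modules: list = field(default_factory=list)
    shared: tuple = ("l3_cache", "northbridge")

    def visible_cores(self) -> int:
        # The OS and applications see every integer core as a core.
        return sum(m.integer_cores for m in self.modules)


def fp_elements_per_pass(module: Module, element_bits: int) -> int:
    """Elements of the given width that fit across all FMAC units of one module."""
    return module.fmac_units * (FMAC_WIDTH_BITS // element_bits)


# Assumption: an 8-core design is four modules of two integer cores each.
chip = Chip(modules=[Module() for _ in range(4)])
print(chip.visible_cores())                      # 8
print(fp_elements_per_pass(chip.modules[0], 32))  # 8
print(fp_elements_per_pass(chip.modules[0], 64))  # 4
```

The model only counts resources; it says nothing about scheduling, latency or how the two cores arbitrate for the shared FP unit.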

Transcript

  • 1. “Bulldozer” and “Bobcat”
    AMD’s Latest x86 Core Innovations
    Hot Chips 22
  • 2. Two x86 Cores Tuned for Target Markets
    “Bulldozer”: Performance & Scalability
    Mainstream Client and Server Markets
    “Bobcat”: Flexible, Low Power & Small
    Low Power Markets; Small Die Area; Cloud Clients Optimized
  • 3. The Bulldozer Architecture
    “Bulldozer”
    An innovative design that delivers true core functionality by pairing two integer execution cores with components that can be shared as needed
    Instruction Set extensions to increase capability of the design
    Extensive new power efficiency innovations
    Manufactured on the latest 32nm SOI technology
    [Block diagram labels: Fetch, Decode, two Integer Schedulers, FP Scheduler, eight Pipelines, two 128-bit FMACs, two L1 DCaches, Shared L2 Cache]
  • 4. Approaches for Supporting Multiple Threads
    SMT
    • Force two threads into one core
    • Threads compete for resources
    • Relies on under-utilization
    CMP
    • Dedicated cores for each thread
    • Traditional brute force approach
    • Each core is over-provisioned
    However, there is another way . . .
  • 9. Bulldozer: Two Strong Threads
    [Two block diagrams side by side. Hyperthreaded, single-core chip (CORE 1): Fetch, Decode, Integer Scheduler, FP Scheduler, four Pipelines, two 128-bit FMACs, L1 DCache, L2 Cache. “Bulldozer”: Fetch, Decode, two Integer Schedulers, FP Scheduler, eight Pipelines, two 128-bit FMACs, two L1 DCaches, Shared L2 Cache]
  • 10. Sharing Resources
    The Bulldozer architecture has shared and dedicated components
    The shared components:
    • Help reduce power consumption
    • Help reduce die space (cost)
    The dedicated components:
    • Help increase performance and scalability
    Bulldozer dynamically switches between shared and dedicated components to maximize performance per watt
    [Block diagram with legend: Dedicated Components; Shared at the module level; Shared at the chip level. Labels: Fetch, Decode, FP Scheduler, two Int Schedulers (Core 1, Core 2), two L1 DCaches, two 128-bit FMACs, eight Pipelines, Shared L2 Cache, Shared L3 Cache and NB]
  • 11. Building a Bulldozer-Based Chip
    [Chip diagram labels: Bulldozer modules (Fetch, Decode, two Int Schedulers, FP Scheduler), Shared L3 Cache and NB, Integrated Memory Controller, Integrated Northbridge Controller]
    Each chip is composed of multiple Bulldozer modules
    Module divisions are transparent to shared hardware, operating system or application
    The modular architecture speeds chip development and increases product flexibility
  • 12. Bulldozer Summary
    “Bulldozer”
    Bulldozer is the next generation of AMD high-performance processor core technology
    This new core is a completely new design from the ground up
    Bulldozer will be utilized in client and server designs in 2011
    AMD delivers 33% more cores and an estimated 50% increase in throughput in the same power envelope as Magny-Cours*
    [Block diagram labels: Fetch, Decode, two Integer Schedulers, FP Scheduler, eight Pipelines, two 128-bit FMACs, two L1 DCaches, Shared L2 Cache]
    *Based on internal AMD modeling using benchmark simulations
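
    The arithmetic behind the “33% more cores” and “50% increase in throughput” estimates on this slide can be checked with a short Python sketch. The core counts used here (12 for Magny-Cours, 16 for the Bulldozer-based part) are assumptions chosen because they produce a 33% increase; the slide itself does not state them, and the per-core figure is simply the ratio of AMD’s two estimates, not a measured result.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


magny_cours_cores = 12  # assumption: not stated on the slide
bulldozer_cores = 16    # assumption: not stated on the slide

core_gain = percent_increase(magny_cours_cores, bulldozer_cores)

# AMD's estimate from the slide: 50% more throughput in the same power envelope.
throughput_ratio = 1.50

# If throughput grows 1.5x while the core count grows by 16/12,
# the implied throughput per core is the quotient of the two ratios.
per_core_ratio = throughput_ratio / (bulldozer_cores / magny_cours_cores)

print(round(core_gain, 1))       # 33.3
print(round(per_core_ratio, 3))  # 1.125
```

    Under these assumptions, the estimate implies roughly 12.5% more throughput per core on top of the extra cores.
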
  • 13. Two x86 Cores Tuned for Target Markets
    “Bulldozer”: Performance & Scalability
    Mainstream Client and Server Markets
    “Bobcat”: Flexible, Low Power & Small
    Low Power Markets; Small Die Area; Cloud Clients Optimized
  • 14. Bobcat Design Goals
    A small, efficient, low power x86 core
    Excellent performance
    Synthesizable with small number of custom arrays
    Easily Portable across process technologies
  • 15. “Bobcat” x86 Core: Small, Efficient and Strong
    “Bobcat” Core
    • Sub one-watt capable core
    • Out-of-order execution engine
    • Synthesizable / Easy to Reuse
    • Complete ISA support
    • SSE1-3 and virtualization
    • Estimated 90% of today’s mainstream performance in less than half of the silicon area*
    • 2011 / notebook APU / “Ontario”
    [Block diagram labels: L1 Icache, Fetch, Decode, Int Scheduler, FP Scheduler, two I-Pipes, Ld-Pipe, St-Pipe, A-Pipe, M-Pipe, L1 DCache, L2 Cache]
    *Based on internal AMD modeling using benchmark simulations
  • 22. Bobcat Core Overview
    Advanced Micro-architecture
    • Dual x86 Decode
    • Advanced Branch Predictor
    • Full OOO instruction execution
    • Full OOO load/store engine
    • High Performance Floating Point
    • AMD64 64-bit ISA
    • SSE1,2,3, SSSE3 ISA
    • Secure Virtualization
    • 32KB L1s
    Low Power Design
    • Power Optimized Execution
    • Micro-architecture that minimizes data movement and unnecessary reads
    • Clock gating, Power gating
    • System Low Power States
    Small Core
    • Area efficient balance of high performance and low power
    [Block diagram of the Bobcat Low Power Core. Labels: ICACHE, L2, Fetch, BU, Decode, FP Scheduler, Address Scheduler, Integer Scheduler, A Pipe, M Pipe, two I Pipes, Store Pipe, Load Pipe, DCACHE]
  • 23. Entering the AMD Fusion Processor Era
    • Bobcat is the CPU on “Ontario”, AMD’s first APU
    APU:
    • Combination of CPU and programmable GPU architectures for high-performance heterogeneous compute capability
    • High-speed bus architecture
    • Shared, low-latency memory model
    • Single die design
    [Block diagram labels: System Memory, SIMD Engine Array, x86 CPU Cores, High Performance Bus & Memory Controller, Unified Video Decoder, Platform Interfaces]
  • 27. Bobcat Summary
    Bobcat is the CPU engine for AMD’s first APU
    Estimated 90% of the performance of AMD’s current mainstream notebook CPU in less than half the area and a fraction of the power*
    Highly portable across designs and manufacturing technologies
    Sub-one watt capable core
    *Based on internal AMD modeling using benchmark simulations
  • 28. Disclaimer & Attribution
    DISCLAIMER
    The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
    The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to changes to the AMD Fusion Partner Program. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
    AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
    AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
    ATTRIBUTION
    © 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.