AMD Hot Chips Bulldozer & Bobcat Presentation

AMD will be revealing details of two new core architectures - Bulldozer and Bobcat - at Hot Chips 22

  • Before we start: There is a lot of technical detail beneath what we are about to show you. This presentation is intended to give you a high-level overview of both designs and AMD’s expectations for each. The engineering detail will be presented by the two chief architects for the designs at the upcoming Hot Chips conference on the Stanford campus next week. Please feel free to ask detailed questions along the way if you would like to hear more about a specific feature or operation. At a higher level, this shows that innovation at AMD remains alive and well. Please think of these core architectures within the context of the new, revitalized AMD built around our focus as a design company since the spin-off of GlobalFoundries, our new VISION platforms and marketing program, and our Fusion APU strategy. “Bobcat” and “Bulldozer” are the latest chapters in that story and form a solid foundation for AMD products for years to come.
  • The two cores, although both x86 compatible, are completely different for a reason. The workloads, end-equipment markets and usage scenarios require different approaches, and that is what AMD recognized at the outset of this effort. Think of “Bulldozer”, just as the name implies, as the heavy lifter. It will appear in server, as well as mainstream and high-performance client products. “Bobcat” is small and highly efficient. It uses those characteristics to address the highly portable netbook / notebook markets. So: two different designs, with different goals in mind.
  • So starting with Bulldozer, here’s a block diagram that shows its distinguishing features. We are taking two of the most frequently used parts of a processor, the integer cores, and adding a hefty, shared floating point capability to deliver two robust threads much more efficiently than Hyper-Threading, where a single integer core is used. We have also added a number of instruction set extensions to increase the design’s capabilities and done extensive work on power management to improve performance per watt even further. The 32nm process technology delivers additional savings in terms of area and power consumption; this is our first process technology to use high-k metal gate.
  • The previous slide hinted at a key differentiator of Bulldozer that bears more explanation. A big conversation in the industry these last few years is how to continue to increase processing performance as we reach plateaus in clock speed. Essentially there have been two approaches used: SMT, which stands for Simultaneous Multi-Threading, and CMP, which stands for Chip Multi-Processing. CMP is probably the easiest to understand, because it can be described as “if one core is good, two must be better”, and it is. So CMP architectures take a complete core and replicate it. SMT is a little more complex to picture, but because of the way instructions are decoded and executed, it’s possible to have two tasks running concurrently on a single core. Bulldozer takes a third approach.
  • On the first Bulldozer slide we mentioned “true core functionality”, so what exactly does that mean? There are two complete integer units in the Bulldozer design for the most common type of compute tasks, so it functions like a dual-core design, allowing maximum performance rather than pushing two threads through a single core. However, we don’t replicate everything on the core as a CMP design does, either. Floating point operations on Bulldozer use a shared scheduler and two 128-bit multiply-accumulate (FMAC) units. Extensive research went into analyzing workloads ahead of this design, so we feel the division between shared and discrete components is the right one. And by the way, the idea of sharing hardware is hardly new, right? Cache, the Northbridge, etc. have been shared across multi-core designs for years already.
  • You can see that larger view of shared hardware components here as we raise our view up to the chip level. On an 8-core Bulldozer design you can see how Bulldozer “modules” are grouped together to share the L3 cache and Northbridge, and are combined with a memory controller and Northbridge controller to form the major components of the chip. And again, the OS and applications see true cores; the shared floating point components and L2 cache are transparent to the code.
  • So that covers Bulldozer. Now let’s turn to AMD’s new core, designed specifically for the low-power x86 market. “Bobcat” is small and highly efficient. It uses those characteristics to address the highly portable netbook / notebook markets.
  • Bobcat is a little more straightforward to understand than Bulldozer, but it, too, has some highly differentiated features. These were goals from the very beginning, driven by AMD’s understanding of the final products’ requirements.
  • So those were the goals. Where did we end up? Bobcat can operate below one watt (with a resulting reduction in performance). That’s not a statement about any resulting products, but it does give you some sense of the core’s power envelope. The next bullets here are critical: out-of-order execution means higher performance than an in-order execution core like Atom, pure and simple. Synthesizable means it uses few custom logic arrays, which depend more heavily on the specifics of the underlying manufacturing technology for optimal performance, and that it can be more easily integrated into SoC designs for faster turnaround of new variations. There are no limitations on the instruction set either, including support for virtualization. AMD estimates 90% of today’s mainstream CPU performance in less than half the silicon area and a fraction of the power. It will appear early next year in Ontario, which is ahead of schedule.
  • Technical details if needed.
  • The need for an optimal, energy-efficient balance of CPU and GPU represents the beginning of a new era of computing in 2011: the era of the accelerated processing unit, or APU, which combines both on a single piece of silicon. The Fusion of CPU and GPU compute power is what the next chapter in visual computing requires: a powerful visual computing experience at home or on the go without compromise. Our AMD Fusion™ design is driven by mobility and is based on a low-power visual compute architecture that will enhance active and resting battery life while increasing both CPU and GPU performance. This is the culmination of the vision of ‘One AMD’, and only AMD can deliver the GPU and CPU combination that will be the future of computing.
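As a rough illustration of the shared-versus-dedicated split described in the Bulldozer notes above, the sketch below models a module as two integer cores in front of one shared FP scheduler and two 128-bit FMAC units, and a chip as a group of such modules. The class and field names are hypothetical and are not AMD terminology, and the four-module count is only inferred from the "8 core" design mentioned in the notes (two integer cores per module). It is a counting model under those assumptions, not a description of the actual hardware.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BulldozerModule:
    """Illustrative model of one module; all field names are hypothetical."""
    integer_cores: int = 2      # dedicated: the notes describe two complete integer units
    fp_schedulers: int = 1      # shared by both integer cores
    fmac_units: int = 2         # shared 128-bit multiply-accumulate units
    fmac_width_bits: int = 128
    l2_caches: int = 1          # shared at the module level

    def fp_lanes(self, element_bits: int = 32) -> int:
        """Elements of the given width that fit across all FMAC units in one module."""
        return self.fmac_units * (self.fmac_width_bits // element_bits)


@dataclass(frozen=True)
class BulldozerChip:
    """A chip as a group of modules (L3 cache and Northbridge are not modeled)."""
    modules: int
    module: BulldozerModule = field(default_factory=BulldozerModule)

    @property
    def cores_seen_by_os(self) -> int:
        # The OS and applications see every integer core as a core.
        return self.modules * self.module.integer_cores

    @property
    def shared_fp_schedulers(self) -> int:
        # One FP scheduler per module, regardless of how many cores share it.
        return self.modules * self.module.fp_schedulers


# Assumption: an 8-core design corresponds to four two-core modules.
chip = BulldozerChip(modules=4)
print(chip.cores_seen_by_os)      # 8
print(chip.shared_fp_schedulers)  # 4
print(chip.module.fp_lanes(32))   # 8 (two 128-bit units, 32-bit elements)
```

The point of the model is only the ratio: core count scales with the dedicated integer units, while the floating point scheduler count scales with the number of modules.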

    1. “Bulldozer” and “Bobcat”: AMD’s Latest x86 Core Innovations (Hot Chips 22)

    2. Two x86 Cores Tuned for Target Markets
         • “Bulldozer”: Performance & Scalability. Mainstream Client and Server Markets.
         • “Bobcat”: Flexible, Low Power & Small. Low Power Markets; Small Die Area; Cloud Clients Optimized.

    3. The Bulldozer Architecture
         • An innovative design that delivers true core functionality by pairing two integer execution cores with components that can be shared as needed
         • Instruction Set extensions to increase capability of the design
         • Extensive new power efficiency innovations
         • Manufactured on the latest 32nm SOI technology
         [Block diagram labels: Fetch, Decode, two Integer Schedulers, FP Scheduler, eight Pipelines, two 128-bit FMACs, two L1 DCaches, Shared L2 Cache]

    4. Approaches for Supporting Multiple Threads
         SMT:
           • Force two threads into one core
           • Threads compete for resources
           • Relies on under-utilization
         CMP:
           • Dedicated cores for each thread
           • Traditional brute force approach
           • Each core is over-provisioned
         However, there is another way . . .

    5. Bulldozer: Two Strong Threads
         [Side-by-side block diagrams: a hyperthreaded, single-core chip (labeled CORE 1, with L2 Cache) and a “Bulldozer” module (with Shared L2 Cache). Labels across both: Fetch, Decode, Integer Scheduler, FP Scheduler, Pipelines, 128-bit FMACs, L1 DCache]

    6. Sharing Resources
         • The Bulldozer architecture has shared and dedicated components
         • The shared components:
             • Help reduce power consumption
             • Help reduce die space (cost)
         • The dedicated components:
             • Help increase performance and scalability
         • Bulldozer dynamically switches between shared and dedicated components to maximize performance per watt
         [Block diagram legend: Dedicated Components; Shared at the module level; Shared at the chip level. Labels: Fetch, Decode, FP Scheduler, two Int Schedulers, Core 1, Core 2, two L1 DCaches, two 128-bit FMACs, eight Pipelines, Shared L2 Cache, Shared L3 Cache and NB]

    7. Building a Bulldozer-Based Chip
         • Each chip is composed of multiple Bulldozer modules
         • Module divisions are transparent to shared hardware, operating system or application
         • The modular architecture speeds chip development and increases product flexibility
         [Block diagram labels: Fetch, Decode, two Int Schedulers, FP Scheduler, Shared L3 Cache and NB, Integrated Memory Controller, Integrated Northbridge Controller]

    8. Bulldozer Summary
         • Bulldozer is the next generation of AMD high-performance processor core technology
         • This new core is a completely new design from the ground up
         • Bulldozer will be utilized in client and server designs in 2011
         • AMD delivers 33% more cores and an estimated 50% increase in throughput in the same power envelope as Magny-Cours*
         [Block diagram labels: Fetch, Decode, two Integer Schedulers, FP Scheduler, eight Pipelines, two 128-bit FMACs, two L1 DCaches, Shared L2 Cache]
         *Based on internal AMD modeling using benchmark simulations

    9. Two x86 Cores Tuned for Target Markets
         • “Bulldozer”: Performance & Scalability. Mainstream Client and Server Markets.
         • “Bobcat”: Flexible, Low Power & Small. Low Power Markets; Small Die Area; Cloud Clients Optimized.

    10. Bobcat Design Goals
         • A small, efficient, low power x86 core
         • Excellent performance
         • Synthesizable with small number of custom arrays
         • Easily portable across process technologies

    11. “Bobcat” x86 Core: Small, Efficient and Strong
         • Sub one-watt capable core
         • Out-of-order execution engine
         • Synthesizable / Easy to Reuse
         • Complete ISA support
         • SSE1-3 and virtualization
         • Estimated 90% of today’s mainstream performance in less than half of the silicon area*
         • 2011 / notebook APU / “Ontario”
         [Block diagram labels: L1 ICache, Fetch, Decode, Int Scheduler, FP Scheduler, two I-Pipes, Ld-Pipe, St-Pipe, A-Pipe, M-Pipe, L1 DCache, L2 Cache]
         *Based on internal AMD modeling using benchmark simulations

    12. Bobcat Core Overview
         Advanced Micro-architecture:
           • Dual x86 Decode
           • Advanced Branch Predictor
           • Full OOO instruction execution
           • Full OOO load/store engine
           • High Performance Floating Point
           • AMD64 64-bit ISA
           • SSE1,2,3, SSSE3 ISA
           • Secure Virtualization
           • 32KB L1s
         Low Power Design:
           • Power Optimized Execution
           • Micro-architecture that minimizes data movement and unnecessary reads
           • Clock gating, Power gating
           • System Low Power States
         Small Core:
           • Area efficient balance of high performance and low power
         [Block diagram “Bobcat Low Power Core”, labels: ICACHE, Fetch, BU, Decode, FP Scheduler, Address Scheduler, Integer Scheduler, A Pipe, M Pipe, two I Pipes, Store Pipe, Load Pipe, DCACHE, L2]

    13. Entering the AMD Fusion Processor Era
         • Bobcat is the CPU on “Ontario”, AMD’s first APU
         APU:
           • Combination of CPU and programmable GPU architectures for high-performance heterogeneous compute capability
           • High-speed bus architecture
           • Shared, low-latency memory model
           • Single die design
         [Block diagram labels: System Memory, SIMD Engine Array, x86 CPU Cores, High Performance Bus & Memory Controller, Unified Video Decoder, Platform Interfaces]

    14. Bobcat Summary
         • Bobcat is the CPU engine for AMD’s first APU
         • Estimate 90% of the performance of AMD’s current mainstream notebook CPU in less than half the area and a fraction of the power*
         • Highly portable across designs and manufacturing technologies
         • Sub-one watt capable core
         *Based on internal AMD modeling using benchmark simulations

    15. Disclaimer & Attribution
         DISCLAIMER
         The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
         The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to changes to the AMD Fusion Partner Program. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
         AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
         AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
         ATTRIBUTION
         © 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
