This document provides an overview of the AMD EPYC™ microprocessor architecture. It discusses the key tenets of the EPYC processor design, including the "Zen" CPU core, virtualization and security features, high per-socket capability through a multi-chip module (MCM) design, a high-bandwidth fabric interconnect, large memory capacity, and disruptive I/O capabilities. It also details the microarchitecture of the "Zen" core and how it was designed and optimized for data center workloads.
Evaluating GPU Programming Models for the LUMI Supercomputer (George Markomanolis)
It is common in the HPC community that the performance achieved with CPUs alone is limited for many computational cases. The EuroHPC pre-exascale and the coming exascale systems are mainly focused on accelerators, and some of the largest upcoming supercomputers, such as LUMI and Frontier, will be powered by AMD Instinct accelerators. However, these new systems create many challenges for developers who are not familiar with the new ecosystem or with the programming models required for heterogeneous architectures. In this paper, we present some of the better-known programming models for current and future GPU systems. We then measure the performance of each approach using a benchmark and a mini-app, test with various compilers, and tune the codes where necessary. Finally, we compare the performance, where possible, between the NVIDIA Volta (V100) and Ampere (A100) GPUs and the AMD MI100 GPU.
Presentation of a paper accepted at Supercomputing Frontiers Asia 2022
Shared Memory Centric Computing with CXL & OMI (Allan Cantle)
Discusses how CXL can be better utilized as a Fabric Cache domain separate from a processor's own Local Cache domain. This is done by leveraging a Shared Memory Centric architecture that uses both the Open Memory Interface (OMI) and Compute Express Link (CXL) for the memory ports.
If AMD Adopted OMI in their EPYC Architecture (Allan Cantle)
AMD's EPYC architecture has paved the way toward heterogeneous data-centric computing, but it is still limited by its parallel DDR interfaces. This presentation shows the potential of the EPYC architecture if it adopted the Open Memory Interface (OMI) for its near-memory interface.
eMMC 5.0 is the latest generation of embedded NAND Flash IP. Arasan provides a complete solution including digital controllers for host and device, the mixed PHY I/O and pads, software drivers, hardware validation and support.
AMD has been away from the HPC space for a while, but now they are coming back in a big way with an open software approach to GPU computing. The Radeon Open Compute Platform (ROCm) was born from the Boltzmann Initiative announced last year at SC15. Now available on GitHub, the ROCm platform brings a rich foundation to advanced computing by better integrating the CPU and GPU to solve real-world problems.
"We are excited to present ROCm, the first open-source HPC/ultrascale-class platform for GPU computing that’s also programming-language independent. We are bringing the UNIX philosophy of choice, minimalism and modular software development to GPU computing. The new ROCm foundation lets you choose or even develop tools and a language run time for your application."
Watch the video presentation: http://wp.me/p3RLHQ-fJT
Learn more: https://radeonopencompute.github.io/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Universal Flash Storage (UFS) is an upcoming memory specification for use in mobile phones, tablets, and other consumer electronics devices.
It is the successor to the embedded MultiMediaCard (eMMC) standard that currently prevails, and will be available both as on-chip storage and in expandable form (memory cards).
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World (Allan Cantle)
These slides are part of a "Trends in Memory Disaggregation" webinar published in March 2021. You can watch the webinar recording here: https://youtu.be/g0QEX5qE8kE.
The presentation slides show how the Open Memory Interface (OMI) is a critical system-architecture building block toward our industry being able to easily build the Domain Specific Architectures of the future, as defined by the gods of computer architecture, John Hennessy and David Patterson.
MIPI DevCon 2021: Meeting the Needs of Next-Generation Displays with a High-P... (MIPI Alliance)
Presented by Alain Legault, Hardent Inc.; Joe Rodriguez, Rambus Inc.; and Justin Endo, Mixel, Inc.
Next-generation display applications have an insatiable appetite for bandwidth. Using a combination of VESA Display Stream Compression (DSC) and MIPI DSI-2℠ technology, designers can achieve display resolutions up to 8K without compromise to video quality, battery life or cost. This presentation discusses a fully integrated, off-the-shelf display IP subsystem solution, consisting of Mixel (MIPI C-PHY℠/D-PHY℠ combo), Rambus (MIPI DSI-2® controller) and Hardent (VESA DSC) IP, that can deliver this state-of-the-art performance in a power-efficient and compact footprint.
During the CXL Forum at the OCP Global Summit, SMART Modular Director of Product Marketing Arthur Sainio provides an overview of the company's CXL memory cards and modules.
High Performance Object Storage in 30 Minutes with Supermicro and MinIO (Rebekah Rodriguez)
The Supermicro Cloud DC is the perfect combination of performance, reliability, craftsmanship and flexibility for deploying MinIO object storage. MinIO on the Cloud DC platform outperforms and is more cost-effective than equivalently-sized hardware from other manufacturers. We recently benchmarked a cluster of four Cloud DC servers with NVMe drives and measured an impressive 42.57 GB/s average read (GET) throughput and 24.69 GB/s average write (PUT) throughput. This first class performance demonstrates that MinIO on Supermicro Cloud DC is a compelling solution for object storage intensive workloads such as advanced analytics, AI/ML and other modern, cloud-native applications.
In this webinar, you will learn:
Best use cases and deployment considerations for MinIO object storage
How to design and size a MinIO object storage cluster on Supermicro Cloud DC
How to deploy a distributed MinIO cluster onto a Cloud DC server cluster
Watch the Webinar: https://www.brighttalk.com/webcast/17278/519401
XPDS16: High-Performance Virtualization for HPC Cloud on Xen - Jun Nakajima &... (The Linux Foundation)
We have been working to get Xen up and running on self-boot Intel® Xeon Phi processors to build HPC clouds. We see several challenges because of the unique (but not unusual for HPC) hardware technologies and performance requirements. For example, such hardware technologies include 1) more than 256 CPUs, 2) MCDRAM (high-bandwidth memory), and 3) integrated fabric (i.e. Intel® Omni-Path). Unlike the "coprocessor" model, supporting self-boot with more than 256 CPUs has various implications for Xen, including scheduling and scalability. We need to allow user applications to use MCDRAM directly to perform optimally. Also, we need to enable the integrated HPC fabric for the VM to use via direct I/O assignment.
In addition, we have only a single VM on each node to meet the high-performance requirements of HPC clouds. This (i.e. non-shared) model allowed us to optimize Xen further. In this talk, we share our design and lessons, and discuss the options we considered to achieve high-performance virtualization for HPC.
High-Density Top-Loading Storage for Cloud Scale Applications (Rebekah Rodriguez)
In this webinar, we will discuss how high-capacity Top-Loading Storage systems are being used for enterprise and cloud scale applications and will identify the key features of the modular architecture for use in today’s software defined storage (SDS) environments. - https://www.brighttalk.com/webcast/17278/527798
POLYTEDA LLC, a provider of semiconductor design software and PV services, announced the general availability of PowerDRC/LVS version 2.0.1. This release delivers further significant improvements for multi-CPU mode and some new LVS functionality. The XOR operation now supports multi-CPU mode to dramatically increase performance.
TitanIC presented "ODSA Use Case - SmartNIC" at the ODSA Workshop. The charter of the ODSA (Open Domain-Specific Architecture) Workgroup is to define an open specification that enables building Domain Specific Accelerator silicon using best-of-breed components from across the industry, made available as chiplet dies that can be integrated like Lego blocks on an organic substrate packaging layer. The resulting multi-chip module (MCM) silicon can be produced at significantly lower development and manufacturing costs, and will deliver much-needed performance-per-watt and performance-per-dollar efficiencies in networking, security, machine learning, and other applications. The ODSA Workgroup also intends to deliver implementations of the specification as board-level prototypes, RTL code, and libraries.
The Power of HPC with Next Generation Supermicro Systems (Rebekah Rodriguez)
Witness the astonishing improvement in performance and security with the next new generation of Supermicro platforms. New Supermicro systems deliver unprecedented levels of compute power for the most challenging high-performance workloads. In this Supercomputing roundtable, learn how the new Supermicro products provide a differentiated advantage for early adopters of the most advanced accelerated computing infrastructure in the world.
C++ Programming and the Persistent Memory Development Kit (Intel® Software)
Topics
Introduction to Persistent Memory
Introduction to the Persistent Memory Development Kit (PMDK)
Working with PMDK
Persistent Memory Programming with PMDK C++ Bindings
Heterogeneous Computing: The Future of Systems (Anand Haridass)
Charts from the NITK-IBM Computer Systems Research Group (NCSRG) covering Dennard scaling, Moore's Law, OpenPOWER, storage class memory, FPGAs, GPUs, CAPI, OpenCAPI, NVIDIA NVLink, and heterogeneous system usage at Google and Microsoft.
The Supermicro X12 product line, powered by 3rd Gen Intel® Xeon® Scalable processors, contains many innovations that give organizations more performance for a variety of workloads.
Join this webinar to learn more about the outstanding performance you can get by using Supermicro X12 servers and storage systems using the latest technologies from Intel®.
Watch the webinar: https://www.brighttalk.com/webcast/17278/514618
Modular by Design: Supermicro's New Standards-Based Universal GPU Server (Rebekah Rodriguez)
In this webinar, members of the Server Solution Team as well as a member of Supermicro's Product Office will discuss Supermicro's Universal GPU Server: the server's modular, standards-based design, the important roles of the OCP Accelerator Module (OAM) form factor and the Universal Baseboard (UBB) in the system, as well as AMD's next-generation HPC accelerator. In addition, we will get some insights into trends in the HPC and AI/machine learning space, including the software platforms and best practices that are driving innovation in our industry and daily lives. In particular:
Tools to enable use of the high-performance hardware for HPC and deep learning applications
Tools to enable use of multiple GPUs, including RDMA, to solve highly demanding HPC and deep learning models, such as BERT
Running applications in containers with AMD's next-generation GPU system
Big Data Uses with Distributed Asynchronous Object Storage (Intel® Software)
Learn about the architecture and features of Distributed Asynchronous Object Storage (DAOS). This open source object store is based on the Persistent Memory Development Kit (PMDK) for massively distributed non-volatile memory applications.
Similar to AMD EPYC™ Microprocessor Architecture
Race to Reality: The Next Billion-People Market Opportunity (AMD)
On September 3rd, 2016 at IFA Berlin, Mark Papermaster, Chief Technology Officer of AMD, provided unique insights into the new era of virtual reality: "Race to Reality - The Next Billion-People Market Opportunity".
Until recently, GPU compute leveraged discrete GPUs for a fairly limited set of academic and supercomputing workloads. With the increase in performance of the integrated GPU inside an Accelerated Processing Unit (APU), the introduction of Heterogeneous System Architecture (HSA) devices, and the proliferation of programming tools, we are seeing GPU compute make its way into mainstream applications. In this presentation we cover GPU compute and HSA, focusing on the application of GPU compute in the medical and print imaging segments. Examples of performance data are reviewed, and the case is made for how GPU compute can deliver tangible benefits.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 4. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the use of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the threat research team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
AMD EPYC™ Microprocessor Architecture
1. AMD EPYC™ Microprocessor Architecture (COOLCHIPS 21)
JAY FLEISCHMAN, NOAH BECK, KEVIN LEPAK, MICHAEL CLARK
PRESENTED BY: JAY FLEISCHMAN | SENIOR FELLOW
2. CAUTIONARY STATEMENT
This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to, the timing, features, functionality, availability, expectations, performance, and benefits of AMD's EPYC™ products, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD's Securities and Exchange Commission filings, including but not limited to AMD's Annual Report on Form 10-K for the fiscal year ended December 30, 2017.
3. IMPROVING TCO FOR ENTERPRISE AND MEGA DATACENTER
Key Tenets of EPYC™ Processor Design
CPU Performance
▪ "Zen": high-performance x86 core with server features
Virtualization, RAS, and Security
▪ ISA, scale-out microarchitecture, RAS
▪ Memory encryption, no application mods: security you can use
Per-socket Capability
▪ Maximize customer value of platform investment
▪ MCM allows silicon area > reticle area; enables the feature set
▪ MCM provides a full product stack with a single design, smaller dies, better yield
Fabric Interconnect
▪ Cache coherent, high bandwidth, low latency
▪ Balanced bandwidth within the socket and socket-to-socket
Memory
▪ High memory bandwidth for throughput
▪ 16 DIMM slots/socket for capacity
I/O
▪ Disruptive 1-processor (1P), 2P, accelerator + storage capability
▪ Integrated Server Controller Hub (SCH)
Power
▪ Performance or Power Determinism, Configurable TDP
▪ Precision Boost, throughput workload power management
4. "ZEN" MICROARCHITECTURE
[Block diagram: branch prediction feeding a 64K 4-way I-cache and an op cache; decode at 4 instructions/cycle into a micro-op queue (6 ops dispatched, 8 micro-ops/cycle); an integer side with 4 ALUs, 2 AGUs, and load/store queues (2 loads + 1 store per cycle) backed by a 32K 8-way D-cache; a floating-point side with 2 MUL and 2 ADD pipes; each side with its own rename and schedulers; a shared 512K 8-way L2 (I+D) cache.]
▪ Fetch Four x86 instructions
▪ Op Cache 2K instructions
▪ 4 Integer units
‒ Large rename space – 168 Registers
‒ 192 instructions in flight/8 wide retire
▪ 2 Load/Store units
‒ 72 Out-of-Order Loads supported
‒ Smart prefetch
▪ 2 x 128-bit FMAC floating-point units
‒ built as 4 pipes: 2 FADD, 2 FMUL
‒ 2 AES units
‒ SHA instruction support
▪ I-Cache 64K, 4-way
▪ D-Cache 32K, 8-way
▪ L2 Cache 512K, 8-way
▪ Large shared L3 cache
▪ 2 threads per core
5. SMT OVERVIEW
[Block diagram: the "Zen" pipeline annotated for vertical threading, with a legend marking each structure as competitively shared, competitively shared with algorithmic priority, competitively shared and SMT tagged, or statically partitioned.]
▪ All structures fully available in 1T mode
▪ Front-end queues are round-robin with priority overrides
▪ Increased throughput from SMT for data center workloads
6. "ZEN" CACHE HIERARCHY
[Diagram: per-core 64K 4-way I-cache (32B fetch) and 32K 8-way D-cache (2 x 16B loads + 1 x 16B store per cycle), a 512K 8-way private L2, and an 8M 16-way shared L3, with 32B/cycle datapaths between levels.]
▪ Fast private L2 cache, 12 cycles
▪ Fast shared L3 cache, 35 cycles
▪ L3 filled with victims of the L2
▪ L2 tags duplicated in L3 for probe filtering and fast cache transfers
▪ Multiple smart prefetchers
▪ 50 outstanding misses supported from L2 to L3 per core
▪ 96 outstanding misses supported from L3 to memory
HIGH BANDWIDTH, LARGE QUEUES FOR THE DATA CENTER
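A back-of-envelope check on why those large miss queues matter: by Little's law, the sustained bandwidth one core can extract is bounded by the number of outstanding misses times the line size divided by the miss latency. Assuming 64-byte cache lines and the 2.2 GHz clock listed on slide 24 (both assumptions, not stated on this slide):

$$\mathrm{BW} \le \frac{\text{outstanding misses} \times \text{line size}}{\text{miss latency}} = \frac{50 \times 64\,\mathrm{B}}{35\ \text{cycles} / 2.2\,\mathrm{GHz}} \approx 200\,\mathrm{GB/s}$$

so the 50-miss L2-to-L3 queue is deep enough that per-core bandwidth is limited by the datapath rather than by a lack of memory-level parallelism.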
7. CPU COMPLEX
▪ A CPU complex (CCX) is four cores connected to an L3 cache
▪ The L3 cache is 16-way associative, 8MB, mostly exclusive of the L2
▪ The L3 cache is made of 4 slices, interleaved by low-order address
▪ Every core can access every cache with the same average latency
▪ Upper-level metals are utilized for high-frequency data transfer
▪ Multiple CCXs are instantiated at the SoC level
[Diagram: four cores at the corners of the CCX, each with its 512K L2 macro and L2/L3 control logic, surrounding eight 1MB L3 macros.]
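To make "low-order address interleave" concrete, here is a minimal C sketch of slice selection. The 64-byte line size and the use of physical-address bits [7:6] are illustrative assumptions; the deck states only that the four slices are interleaved by low-order address bits.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: select one of the 4 L3 slices from the
 * low-order bits of a physical address. Assumes 64-byte cache
 * lines and that bits [7:6] pick the slice; the actual "Zen"
 * mapping is not given in this deck. */
static unsigned l3_slice(uint64_t paddr)
{
    return (unsigned)((paddr >> 6) & 0x3); /* line address mod 4 */
}

int main(void)
{
    /* Consecutive lines rotate across slices, spreading accesses
     * so every core sees the same average latency. */
    for (uint64_t addr = 0; addr < 8 * 64; addr += 64)
        printf("address 0x%03llx -> slice %u\n",
               (unsigned long long)addr, l3_slice(addr));
    return 0;
}
```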
9. "ZEN" DELIVERS GREATER THAN 52% IPC* (vs. prior AMD processors)
Designed from the ground up for an optimal balance of performance and power for the datacenter:
▪ Totally new high-performance core design
▪ New high-bandwidth, low-latency cache system
▪ Simultaneous Multithreading (SMT) for high throughput
▪ Energy-efficient FinFET design for enterprise-class products
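The 52% figure is an IPC (instructions per clock) comparison at a fixed frequency; per the endnotes for this slide, it comes from estimated SPECint_base2006 scores at 3.4 GHz:

$$\frac{31.5}{20.7} \approx 1.52 \quad\Rightarrow\quad +52\%\ \text{IPC vs. "Piledriver"}$$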
10. VIRTUALIZATION ISA AND MICROARCHITECTURE ENHANCEMENTS

| ISA Feature | Notes | "Bulldozer" | "Zen" |
| Data Poisoning | Better handling of memory errors | | ✓ |
| AVIC | Advanced Virtual Interrupt Controller | | ✓ |
| Nested Virtualization | Nested VMLOAD, VMSAVE, STGI, CLGI | | ✓ |
| Secure Memory Encryption (SME) | Encrypted memory support | | ✓ |
| Secure Encrypted Virtualization (SEV) | Encrypted guest support | | ✓ |
| PTE Coalescing | Combines 4K page tables into 32K page size | | ✓ |

"Zen" core built for the data center; microarchitecture features:
▪ Reduction in World Switch latency
▪ Dual translation table engines for TLBs
▪ Tightly coupled L2/L3 cache for scale-out: private L2, 12 cycles; shared L3, 35 cycles
▪ Cache + memory topology reporting for Hypervisor/OS scheduling
11. HARDWARE MEMORY ENCRYPTION: SECURE MEMORY ENCRYPTION (SME)
Defense against unauthorized access to memory
▪ Protects against physical memory attacks
▪ A single key is used for encryption of system memory
‒ Can be used on systems with Virtual Machines (VMs) or containers
▪ The OS/Hypervisor chooses pages to encrypt via page tables, or encryption is enabled transparently
▪ Support for hardware devices (network, storage, graphics cards) to access encrypted pages seamlessly through DMA
▪ No application modifications required
[Diagram: an AES-128 engine in the memory path between DRAM and the OS/hypervisor, with one key covering VMs, sandboxes, containers, and applications.]
12. HARDWARE MEMORY ENCRYPTION: SECURE ENCRYPTED VIRTUALIZATION (SEV)
▪ Protects VMs/containers from each other, from administrator tampering, and from an untrusted hypervisor
▪ One key for the hypervisor and one key per VM, per group of VMs, or per VM/sandbox with multiple containers
▪ Cryptographically isolates the hypervisor from the guest VMs
▪ Integrates with existing AMD-V technology
▪ The system can also run unsecure VMs
▪ No application modifications required
[Diagram: the AES-128 engine between DRAM and the OS/hypervisor, now with a separate key per VM, sandbox, or container group.]
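Software can discover SME/SEV support through CPUID leaf 0x8000001F, AMD's encrypted-memory capabilities leaf. Below is a minimal user-space detection sketch (GCC/Clang on x86); treat the exact bit layout as something to verify against the AMD manuals rather than as guaranteed by this deck.

```c
#include <stdio.h>
#include <cpuid.h>

/* Minimal sketch: query AMD's encrypted-memory capabilities leaf
 * (CPUID 0x8000001F). EAX bit 0 reports SME, bit 1 reports SEV,
 * and EBX[5:0] gives the page-table C-bit position used to mark
 * a page as encrypted. */
int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x8000001F, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 0x8000001F not available");
        return 1;
    }
    printf("SME supported:  %s\n", (eax & (1u << 0)) ? "yes" : "no");
    printf("SEV supported:  %s\n", (eax & (1u << 1)) ? "yes" : "no");
    printf("C-bit position: %u\n", ebx & 0x3F);
    return 0;
}
```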
13. BUILT FOR SERVER: RAS (RELIABILITY, AVAILABILITY, SERVICEABILITY)
[Diagram, Data Poisoning: an uncorrectable DRAM error is marked as poisoned in the cache and faults only the consuming process on core A (case A), while core B's normal accesses to good data proceed (case B).]
Enterprise RAS Feature Set
▪ Caches
‒ L1 data cache SEC-DED ECC
‒ L2 and L3 caches with DEC-TED ECC
▪ DRAM
‒ DRAM ECC with x4 DRAM device failure correction
‒ DRAM address/command parity and write CRC, with replay
‒ Patrol and demand scrubbing
‒ DDR4 post-package repair
▪ Data Poisoning and Machine Check Recovery
▪ Platform First Error Handling
▪ Infinity Fabric link packet CRC with retry
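As a sanity check on the ECC widths above: a SEC-DED (single-error-correct, double-error-detect) code over 64 data bits needs $r$ check bits with $2^r \ge 64 + r + 1$, giving $r = 7$, plus one extra bit for double-error detection:

$$64 + 7 + 1 = 72\ \text{bits}$$

which is exactly why the DRAM channels on slide 16 are listed as x72b (64 data + 8 check bits per beat). This is the standard Hamming construction, not something spelled out in the deck.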
14. MCM VS. MONOLITHIC
MCM Delivers Higher Yields, Increases Peak Compute, Maximizes Platform Value
▪ The MCM approach has many advantages
‒ Higher yield; enables an increased feature set
‒ Multi-product leverage
▪ EPYC processors: 4 x 213mm² die/package = 852mm² of silicon per package*
▪ Hypothetical EPYC monolithic die: ~777mm²*
‒ Removes the die-to-die Infinity Fabric On-Package (IFOP) PHYs and logic (4/die), duplicated logic, etc.
‒ 852mm² / 777mm² = ~10% MCM area overhead
▪ Inverse-exponential reduction in yield with die size
‒ Multiple small dies have an inherent yield advantage
[Diagram: four-die MCM (each die: 2 CCXs, DDR, I/O) vs. a hypothetical monolithic 32-core die; 32C die cost 0.59X¹ for the MCM vs. 1.0X for the monolithic die.]
* Die sizes as used for gross die per wafer
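The "inverse-exponential reduction in yield with die size" can be illustrated with the simple Poisson yield model $Y \approx e^{-A D}$ for die area $A$ and defect density $D$. With an assumed, purely illustrative $D = 0.2\ \mathrm{defects/cm^2}$:

$$Y_{213\,\mathrm{mm^2}} \approx e^{-2.13 \times 0.2} \approx 0.65, \qquad Y_{777\,\mathrm{mm^2}} \approx e^{-7.77 \times 0.2} \approx 0.21$$

so each small die yields roughly three times better than the hypothetical monolithic die. The deck's 0.59X cost figure comes from AMD's internal yield model (see the slide 14 endnote), which this toy calculation does not attempt to reproduce.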
15. CHIP FLOORPLANNING FOR PACKAGE
▪ DDR placement on one die edge
▪ Chips in the 4-die MCM are rotated 180°, with DDR facing the package's left/right edges
▪ The package-top Infinity Fabric pinout requires diagonal placement of IFIS
▪ A 4th IFOP enables routing of high-speed I/O in only four package substrate layers
[Die floorplan: two 4-core "Zen" CCXs with their L3s in the center; IFOP links on two edges, IFIS/PCIe on one edge and IFIS/PCIe/SATA on the other, with DDR and I/O along the remaining edges.]
16. ADVANCED PACKAGING
Purpose-built MCM Architecture
▪ 58mm x 75mm organic package
▪ 4 die-to-die IFOP links per die
‒ 3 connected per die
▪ 2 SERDES per die to pins
‒ 1 per die to the "top" (G) and "bottom" (P) links
‒ Balanced I/O for 1P and 2P systems
▪ 2 DRAM channels per die to pins
Die-to-Package Topology Mapping legend:
G[0-3], P[0-3]: 16-lane high-speed SERDES links
M[A-H]: 8 x72b DRAM channels
CCX: 4 cores + L2/core + shared L3 complex
[Diagram: the four dies (each: 2 CCXs, DDR, I/O) with their die-to-die Infinity Fabric On-Package links, mapped to package links P0-P3 and G0-G3 and DRAM channels MA-MH.]
18. INFINITY FABRIC: MEMORY SYSTEM
Optimized Memory Bandwidth and Capacity per Socket
▪ 8 DDR4 channels per socket
▪ Up to 2 DIMMs per channel, 16 DIMMs per socket
▪ Up to 2667 MT/s: 21.3 GB/s per channel, 171 GB/s per socket (worked arithmetic after the list below)
‒ Achieves 145 GB/s per socket, 85% efficiency*
▪ Up to 2TB capacity per socket (8Gb DRAMs)
▪ Supported memory
‒ RDIMM
‒ LRDIMM
‒ 3DS DIMM
‒ NVDIMM-N
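The per-socket bandwidth figure follows directly from the channel math (8 bytes per transfer on a 64-bit DDR4 channel):

$$2667\ \mathrm{MT/s} \times 8\ \mathrm{B} \approx 21.3\ \mathrm{GB/s\ per\ channel}, \qquad 8 \times 21.3 \approx 171\ \mathrm{GB/s\ per\ socket}$$

and the quoted 145 GB/s achieved is $145 / 171 \approx 85\%$ of that peak.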
19. I/O SUBSYSTEM
Architected for Massive I/O Systems with a Leading-edge Feature Set and Platform Flexibility
▪ 8 x16 links available in 1- and 2-socket systems
‒ Link bifurcation support; maximum of 8 PCIe® devices per x16 link
‒ Socket-to-socket Infinity Fabric (IFIS), PCIe, and SATA support
▪ 32GB/s bidirectional bandwidth per link, 256GB/s per socket
▪ PCIe features*
‒ Advanced Error Reporting (AER)
‒ Enhanced Downstream Port Containment (eDPC)
‒ Non-Transparent Bridging (NTB)
‒ Separate RefClk with Independent Spread support (SRIS)
‒ NVMe and hot-plug support
‒ Peer-to-peer support
▪ Integrated SATA support
[Diagram: combo link Type "A" and Type "B" bifurcation trees across PHYs A0-A4 and B0-B4, splitting each x16 high-speed SERDES link down through x8/x4/x2 to x1 ports, each configurable as a PCIe port, SATA port, or xGMI link. Restrictions: 1) ports from a single PHY must be of the same type (one of xGMI, PCIe, SATA); 2) there are a maximum of 8 PCIe ports in any 16-lane combo link; 3) ports from PHY A0 and PHY A1 must all be of the same type.]
* See PCI-SIG for details
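The per-link number is consistent with PCIe Gen3 signaling (an assumption; the deck does not name the generation): 16 lanes at 8 GT/s with 128b/130b encoding give roughly

$$16 \times 8\ \mathrm{GT/s} \times \tfrac{128}{130} \div 8 \approx 15.8\ \mathrm{GB/s\ per\ direction} \approx 16\ \mathrm{GB/s}$$

or about 32 GB/s bidirectional per x16 link, and $8 \times 32 = 256\ \mathrm{GB/s}$ per socket.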
20. EPYC™ 1P I/O PERFORMANCE
High Performance I/O and Storage
▪ High-bandwidth, high-efficiency PCIe®
‒ Local DMA and peer-to-peer (P2P)

Achieved 1x16 PCIe bandwidth* (efficiency vs. 16GB/s per direction):

| | Local DRAM (GB/s / efficiency¹) | Die-to-die Infinity Fabric, 1 hop (GB/s / efficiency¹) |
| DMA Read | 12.2 / 76% | 12.1 / 76% |
| DMA Write | 13.0 / 81% | 14.2 / 88% |
| DMA Read + Write | 22.8 / 71% | 22.6 / 70% |
| P2P Write | NA | 13.6 / 85% |

1: Peak efficiency of PCIe is ~85%-95% due to protocol overheads

▪ High-performance storage: 9.2M IOPs*
‒ ~285K IOPs/core with 32 cores
‒ 1P direct-connect, 24 drives at x4 (no PCIe switches)
‒ EPYC™ 7601 1P, 16 x Samsung pm1725a NVMe, Ubuntu 17.04, FIO v2.16, 4KB 100% read results
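The ~285K IOPs/core figure is straightforward division of the measured result given in the endnotes:

$$9{,}178{,}000\ \mathrm{IOPs} \div 32\ \mathrm{cores} \approx 287\mathrm{K\ IOPs/core}$$

(the slide rounds down to ~285K).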
21. POWER MANAGING I/O: DYNAMIC LINK WIDTH MODULATION
▪ The unprecedented connectivity that EPYC™ provides can consume additional power
▪ Full bandwidth is not required for all workloads
▪ Dynamic Link Width Modulation tracks the bandwidth demand of the Infinity Fabric connections between sockets and adapts link width to the need
▪ Up to 8% performance/Watt gains at the CPU socket*
[Diagram: two sockets joined by the fabric, each with 64 lanes of high-speed I/O.]
*See Endnotes
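The 8% figure is derived in the endnotes as an iso-performance power comparison: with the feature off, the socket TDP had to rise from 180W to 195W to match the feature-on score of 1321, so

$$\frac{195\ \mathrm{W}}{180\ \mathrm{W}} \approx 1.08 \quad\Rightarrow\quad \sim 8\%\ \text{performance/Watt at the socket}$$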
22. POWER: PERFORMANCE DETERMINISM, POWER DETERMINISM, CONFIGURABLE TDP
▪ Server systems and environments vary
▪ Silicon varies
‒ Faster, higher-leakage parts and slower, lower-leakage parts
▪ Some customers demand repeatable, deterministic performance across all environments
‒ In this mode, power consumption will vary
▪ Others may want maximum performance within a fixed power consumption
‒ In this mode, performance will vary
▪ EPYC™ CPUs allow a boot-time choice of either mode
[Charts: normalized performance and power across the range of system and silicon margin, comparing performance-determinism mode (flat performance, varying power) with power-determinism mode (flat power, varying performance).]
▪ EPYC allows configurable TDP for TCO, trading peak performance against performance/Watt:

| Product TDP | Low range | High range |
| 180W | 165W | 200W |
| 155W | 135W | 170W |
| 120W | 105W | 120W |
23. POWER: PER-CORE LINEAR VOLTAGE REGULATION
▪ Running all dies/cores at the voltage required by the slowest wastes power
▪ Per-core voltage regulator capabilities enable each core's voltage to be optimized
‒ Significant core power savings and variation reduction
[Diagram: per-core LDO control, with the VRM output RVDD feeding individually regulated supplies VDD0-VDD31 across the 32 cores of the four dies.]
24. DELIVERED CPU PERFORMANCE
[Bar chart: STREAM Triad memory bandwidth in GB/s for EPYC 7601, 1P vs. 2P.]
▪ Memory bandwidth and scaling
‒ 97% bandwidth scaling from 1P to 2P
‒ Bandwidth for disruptive 1P and 2P performance
▪ INT and FP performance
‒ Memory bandwidth for demanding FP applications
‒ Competitive INT and FP performance
‒ Leadership TCO

Optimizing compiler comparison (see endnotes for calculations):

| 2P | AMD EPYC 7601 (64 cores, 2.2 GHz, HP DL385 Gen10, AOCC) | Intel Xeon 8176 (56 cores, 2.1 GHz, HP DL380 Gen10, ICC) | EPYC uplift |
| SPECrate®2017_int_base | 272 | 254 | +7% |
| SPECrate®2017_fp_base | 259 | 225 | +15% |
| Peak memory BW | 8 x 2666 | 6 x 2666 | +33% |
| MSRP | $4200 | $8719 | Better value |

EPYC processors deliver compelling performance, memory bandwidth, and value.
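One illustrative way to read the value row (my arithmetic from the table's own numbers, not a figure AMD quotes): dividing the integer-rate scores by MSRP gives $272 / \$4200 \approx 0.065$ versus $254 / \$8719 \approx 0.029$ SPECrate points per dollar, roughly a 2.2x advantage on list price.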
25. AMD EPYC™ 7000 SERIES PROCESSOR
Architected for Enterprise and Datacenter Server TCO
▪ Up to 32 high-performance "Zen" cores
▪ 8 DDR4 channels per CPU
▪ Up to 2TB memory per CPU
▪ 128 PCIe® lanes
▪ Dedicated security subsystem
▪ Integrated chipset for a 1-chip solution
▪ Socket-compatible with next-generation EPYC processors
26. GLOSSARY OF TERMS
“Bulldozer” = AMD’s previous family of high performance CPU cores
CAKE = Coherent AMD SocKet Extender
CCX = Tightly Coupled Core + Cache Complex: 4 cores + L2 cache/core + shared L3 cache complex
CLGI = Clear Global Interrupt
DIMM = Dual Inline Memory Module
ECC = Error Checking and Correcting
SEC-DED = Single Error Correct, Double Error Detect
DEC-TED = Double Error Correct, Triple Error Detect
GPU = Graphics Processing Unit
IPC = Instructions Per Clock
IFIS = Infinity Fabric Inter-Socket
IFOP = Infinity Fabric On-Package
LDO = Low Drop Out, a linear voltage regulator
MCM = Multi-Chip Module
NUMA = Non-Uniform Memory Access
OS = Operating System
Post Package Repair = Utilize internal DRAM array redundancy to map out errant locations
PSP = Platform Security Processor
RAS = Reliability, Availability, Serviceability
SATA = Serial ATA device connection
SCH = Server Controller Hub
SCM = Single Chip Module
Scrubbing = Searching for, and proactively correcting, ECC errors
SERDES = Serializer/Deserializer
STGI = Set Global Interrupt
TCO = Total Cost of Ownership
TDP = Thermal Design Power
Topology Reporting = Describing cache and memory layout via ACPI/software structures for OS/Hypervisor task scheduling optimization
VM = Virtual Machine
VRM = Voltage Regulator Module
World Switch = Switch between Guest Operating System and Hypervisor
27. ENDNOTES
SLIDE 9
Updated Feb 28, 2017: Generational IPC uplift for the "Zen" architecture vs. the "Piledriver" architecture is +52% with an estimated SPECint_base2006 score compiled with GCC 4.6 -O2 at a fixed 3.4GHz. Generational IPC uplift for the "Zen" architecture vs. the "Excavator" architecture is +64% as measured with Cinebench R15 1T, and also +64% with an estimated SPECint_base2006 score compiled with GCC 4.6 -O2, at a fixed 3.4GHz. System configs: AMD reference motherboard(s), AMD Radeon™ R9 290X GPU, 8GB DDR4-2667 ("Zen") / 8GB DDR3-2133 ("Excavator") / 8GB DDR3-1866 ("Piledriver"), Ubuntu Linux 16.x (SPECint_base2006 estimate) and Windows® 10 x64 RS1 (Cinebench R15). SPECint_base2006 estimates: "Zen" vs. "Piledriver" (31.5 vs. 20.7 | +52%), "Zen" vs. "Excavator" (31.5 vs. 19.2 | +64%). Cinebench R15 1T scores: "Zen" vs. "Piledriver" (139 vs. 79, both at 3.4GHz | +76%), "Zen" vs. "Excavator" (160 vs. 97.5, both at 4.0GHz | +64%). GD-108
SLIDE 14
Based on AMD internal yield model using historical defect density data for mature technologies.
SLIDE 18
In AMD internal testing on STREAM Triad, 2 x EPYC 7601 CPU in AMD "Ethanol" reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 512 GB (16 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 290. NAP-22
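For readers unfamiliar with the benchmark: STREAM Triad measures sustained memory bandwidth with the kernel a[i] = b[i] + q*c[i], counting 24 bytes of traffic per iteration (two reads and one write of 8-byte doubles). A minimal single-threaded C sketch is below; the official benchmark adds OpenMP threading, repetitions, and validation, all of which matter for results in the class quoted here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Minimal STREAM-style Triad sketch: a[i] = b[i] + q * c[i].
 * Traffic per iteration: 2 reads + 1 write of 8-byte doubles
 * = 24 bytes. Single-threaded, so it will not approach the
 * multi-socket figures quoted in these endnotes. */
#define N (1L << 26) /* 2^26 doubles = 512 MiB per array */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    const double q = 3.0;
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + q * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("Triad: %.1f GB/s\n", 24.0 * N / sec / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```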
SLIDE 20
In AMD internal testing on I/O bandwidth benchmarks: 2 x EPYC 7601 in AMD "Ethanol" reference systems, BIOS "TDL0080F" (default settings), Ubuntu 15.04.2, 512GB (16 x 32GB 2Rx4 PC4-2666) memory. Read and Write tests used AMD FirePro W7100 GPUs, 256B payload size, driver "amdgpu-pro version 17.10-450821". Read + Write tests used a Mellanox IB 100Gbps adapter, 512B payload size, driver "OFED 3.3-1.04". Read and Write DMA tests measured performance using AMD SST (System Stress Tool). Write peer-to-peer (P2P) tests measured performance using the AMD P2P (Peer to Peer) tool. Read + Write DMA tests measured performance using the OFED qperf tool.
1 x EPYC 7601 CPU in HPE Cloudline CL3150, Ubuntu 17.04, 4.10 kernel (scheduler changed to NOOP, CPU governor set to performance), 256 GB (8 x 32GB 2Rx4 PC4-2666) memory, 24 x Samsung pm1725a NVMe drives (with only 16 enabled): FIO v2.16 (4 jobs per drive, IO depth of 32, 4K block size) average read IOPs of 9,178,000 on 100% read test (average BW 35.85 GB/s); FIO (4 jobs per drive, IO depth of 10, 4K block size) average write IOPs of 7,111,000 on 100% write test (average BW 27.78 GB/s). Each run was done for 30 seconds with a 10-second ramp-up using 16 NVMe drives. NAP-24
SLIDE 21
EPYC 7601 DVT part on AMD Ethanol 2P system, 32GB x16 (512GB) DR-2667 full pop 1of1, 1 x 500GB SSD, 1 x 1Gbit NIC, Ubuntu 16.04 (no GUI). With the feature off, the system gets an estimated score of 1297; with the feature on, an estimated score of 1321. Using selectable power to increase the TDP from 180W to 195W with the feature off yields the same estimated score of 1321 as feature-on, indicating an effective 8% perf/Watt improvement at the socket level for feature-on (195W/180W).
28. ENDNOTES (continued)
SLIDE 24
MSRP for AMD EPYC 7601 of $4200 and for Intel Platinum 8176 of $8719 based on AnandTech: https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd-epyc-7000-cpu-battle-of-the-decade/10
AMD EPYC 7601 1-socket to 2-socket memory bandwidth scaling achieves up to 1.97x. In AMD internal testing on STREAM Triad, 1 x EPYC 7601 CPU in AMD "Ethanol" reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 256 GB (8 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 147; 2 x EPYC 7601 CPU in AMD "Ethanol" reference system, Ubuntu 16.04, Open64 v4.5.2.1 compiler suite, 512 GB (16 x 32GB 2Rx4 PC4-2666) memory, 1 x 500 GB SSD scored 290.
AMD EPYC 7601 CPU-based 2-socket system scored 272 on SPECrate®2017_int_base, 7% higher than an Intel Platinum 8176-based 2-socket system which scored 254, based on publications on www.spec.org as of March 7, 2018. 2 x EPYC 7601 CPU in HPE ProLiant DL385 Gen10 scored 272 on SPECrate®2017_int_base, published November 2017 at www.spec.org, versus 2 x Intel Xeon Platinum 8176 CPU in HPE ProLiant DL380 Gen10 with a score of 254, published October 2017 at www.spec.org. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171114-00833.pdf
https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171003-00078.pdf
AMD EPYC 7601 CPU-based 2-socket system scored 259 on SPECrate®2017_fp_base, 15% higher than an Intel Platinum 8176-based 2-socket system which scored 225, based on publications on www.spec.org as of March 7, 2018. 2 x EPYC 7601 CPU in HPE ProLiant DL385 Gen10 scored 259 on SPECrate®2017_fp_base, published November 2017 at www.spec.org, versus 2 x Intel Xeon Platinum 8176 CPU in HPE ProLiant DL380 Gen10 with a score of 225, published October 2017 at www.spec.org. SPEC and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171114-00845.pdf
https://www.spec.org/cpu2017/results/res2017q4/cpu2017-20171003-00077.pdf