Shared Memory Centric Computing with CXL & OMI (Allan Cantle)
Discusses how CXL can be better utilized as a Fabric Cache domain separate from a processor's own Local Cache Domain. This is done by leveraging Shared Memory Centric architectures that use both the Open Memory Interface (OMI) and Compute Express Link (CXL) for the memory ports.
Evaluating GPU Programming Models for the LUMI Supercomputer (George Markomanolis)
It is common in the HPC community that the achieved performance with just CPUs is limited for many computational cases. The EuroHPC pre-exascale and the coming exascale systems are mainly focused on accelerators, and some of the largest upcoming supercomputers such as LUMI and Frontier will be powered by AMD Instinct accelerators. However, these new systems create many challenges for developers who are not familiar with the new ecosystem or with the required programming models that can be used to program for heterogeneous architectures. In this paper, we present some of the more well-known programming models to program for current and future GPU systems. We then measure the performance of each approach using a benchmark and a mini-app, test with various compilers, and tune the codes where necessary. Finally, we compare the performance, where possible, between the NVIDIA Volta (V100), Ampere (A100) GPUs, and the AMD MI100 GPU.
Presentation of a paper accepted in Supercomputing Frontiers Asia 2022
OMI - The Missing Piece of a Modular, Flexible and Composable Computing World (Allan Cantle)
These slides are part of the "Trends in Memory Disaggregation" webinar published in March 2021. The webinar recording is available at https://youtu.be/g0QEX5qE8kE.
The presentation slides show how the Open Memory Interface (OMI) is a critical system architecture building block that will let our industry easily build the Domain Specific Architectures of the future, as defined by the gods of computer architecture, John Hennessy and David Patterson.
If AMD Adopted OMI in their EPYC Architecture (Allan Cantle)
AMD's EPYC Architecture has paved the way toward Heterogeneous Data Centric Computing, but it is still limited by its parallel DDR interfaces. This presentation shows the potential of the EPYC architecture if it adopted the Open Memory Interface (OMI) for its Near Memory interface.
eMMC 5.0 is the latest generation of embedded NAND Flash IP. Arasan provides a complete solution including digital controllers for host and device, the mixed PHY I/O and pads, software drivers, hardware validation and support.
STT MRAM for Artificial Intelligence Applications (Danny Sabour)
The rise of artificial intelligence (AI) has been making a huge impact on our daily lives, especially in the fields of recommendation systems, image recognition, natural language processing, and autonomous driving. As the amount of input data, weight parameters and intermediate data in the machine learning process grows exponentially, memory becomes a critical bottleneck, which requires a high density, low power and high speed non-volatile memory (NVM) solution. Among emerging NVM technologies, spin-transfer torque magnetoresistive random access memory (STT-MRAM) based on perpendicular magnetic tunnel junctions (pMTJ) shows distinct advantages:
- High speed (comparable to DRAM and LLC), low standby power (vs eSRAM) and low active power deliver high tera operations per second (TOPS) per watt during the training and inference process.
- Practically unlimited endurance allows large amounts of data to be intensively processed.
- High data retention allows neural network weight parameters to be stored directly in NVM without extra power consumption (DRAM refresh or SRAM leakage), especially for edge computing.
- Excellent scalability to advanced technology nodes (beyond 7 nm) allows ASIC design with high density (vs eSRAM) for specific AI applications.
Facebook presented "Chiplets in Data Centers" at the ODSA Workshop. The charter of the ODSA (Open Domain-Specific Architecture) Workgroup is to define an open specification that enables building Domain Specific Accelerator silicon from best-of-breed industry components. These components are made available as chiplet dies that can be integrated together like Lego blocks on an organic substrate packaging layer. The resulting multi-chip module (MCM) silicon can be produced at significantly lower development and manufacturing costs, and will deliver much-needed performance-per-watt and performance-per-dollar efficiencies in networking, security, machine learning and other applications. The ODSA Workgroup also intends to deliver implementations of the specification as board-level prototypes, RTL code and libraries.
During the CXL Forum at OCP Global Summit 23, Rick Kutcipal and Sreeni Bagalkote of Broadcom presented their PCIe/CXL Roadmap and announced their Atlas 4 CXL switch.
AMD has been away from the HPC space for a while, but now they are coming back in a big way with an open software approach to GPU computing. The Radeon Open Compute Platform (ROCm) was born from the Boltzmann Initiative announced last year at SC15. Now available on GitHub, the ROCm Platform brings a rich foundation to advanced computing by better integrating the CPU and GPU to solve real-world problems.
"We are excited to present ROCm, the first open-source HPC/ultrascale-class platform for GPU computing that’s also programming-language independent. We are bringing the UNIX philosophy of choice, minimalism and modular software development to GPU computing. The new ROCm foundation lets you choose or even develop tools and a language run time for your application."
Watch the video presentation: http://wp.me/p3RLHQ-fJT
Learn more: https://radeonopencompute.github.io/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Vulkan and DirectX12 share many common concepts, but differ vastly from the APIs most game developers are used to. As a result, developing for DX12 or Vulkan requires a new approach to graphics programming and in many cases a redesign of the Game Engine. This lecture will teach the basic concepts common to Vulkan and DX12 and help developers overcome the main problems that often appear when switching to one of the new APIs. It will explain how those new concepts will help games utilize the hardware more efficiently and discuss best practices for game engine development.
For more, visit http://developer.amd.com/
This session provides an architectural introduction of Intel’s enthusiast system solutions, with an emphasis on performance tuning for gaming and content creation. The discussion will include key overclocking ecosystem ingredients such as Intel® Extreme Memory Profile (Intel® XMP) technology. Live demos will accompany our discussion. Attendees will leave with a good understanding of the overclocking capabilities of Intel’s latest processors.
http://myeventagenda.com/sessions/0B9F4191-1C29-408A-8B61-65D7520025A8/7/5
GPU compute has leveraged discrete GPUs for a fairly limited set of academic and supercomputing system workloads until recently. With the increase in performance of integrated GPU inside an Accelerated Processing Unit (APU), introduction of Heterogeneous System Architecture (HSA) devices, and proliferation of programming tools, we are seeing GPU compute make its way into mainstream applications. In this presentation we cover GPU compute and HSA, focusing on the application of GPU compute in the Medical and Print Imaging segments. Examples of performance data are reviewed and the case is made for how GPU compute can deliver tangible benefits.
Race to Reality: The Next Billion-People Market Opportunity (AMD)
On September 3rd, 2016 at IFA Berlin, Mark Papermaster, Chief Technology Officer of AMD, provided unique insights into the new era of Virtual Reality: “Race to Reality - The Next Billion-People Market Opportunity”.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools, like the ChatGPT plugin and the Azure OpenAI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, which are spread across more than 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I have been wondering, as an “infrastructure container kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need in order to apply it to our own infrastructure and get it working from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
Delivering a new level of visual performance in an SoC AMD "Raven Ridge" APU
1. DELIVERING A NEW LEVEL OF VISUAL PERFORMANCE IN AN SOC: AMD “RAVEN RIDGE” APU
AMD CONFIDENTIAL
Dan Bouvier, Jim Gibney, Alex Branover, Sonu Arora
Presented by: Dan Bouvier, Corporate VP, Client Products Chief Architect
2. AMD Ryzen™ Processors with Radeon™ Vega Graphics - Hot Chips 30
RAISING THE BAR FOR THE APU VISUAL EXPERIENCE
FIRST “Zen”-based APU
HIGH-PERFORMANCE on-die “Vega”-based graphics
LONG BATTERY LIFE in premium form factors
Mobile APU generational performance gains, AMD Ryzen™ 7 2700U vs. 7th Gen AMD A-Series APU:
▪ Up to 200% more CPU performance
▪ Up to 128% more GPU performance
▪ Up to 58% less power
* See footnotes for details.
▪ Scaled GPU and CPU up to reach target frame rate
▪ Managed power delivery and thermal dissipation
▪ Improved memory bandwidth efficiency
▪ Upgraded display experience
▪ Increased package performance density
3. “RAVEN RIDGE” APU
▪ AMD “ZEN” x86 CPU CORES: “Zen” CPU (4 core | 8 thread), CPU 0 to CPU 3, 4MB L3 cache
▪ AMD “VEGA” GPU: AMD GFX+ (11 compute units), 1MB L2 cache
▪ HIGH BANDWIDTH SOC FABRIC & MEMORY SYSTEM: Infinity Fabric, two x64 DDR4 channels
▪ FULL SYSTEM CONNECTIVITY: USB 3.1 / USB 2.0, PCIe GPP, NVMe / SATA, PCIe for discrete GFX
▪ UPGRADED DISPLAY ENGINE: Display Controller Next
▪ ACCELERATED MULTIMEDIA EXPERIENCE: multimedia engines, Video Codec Next, audio (ACP)
▪ INTEGRATED SENSOR FUSION HUB
▪ Platform Security Processor and System Management Unit
4. SIGNIFICANT DENSITY INCREASE
“Raven Ridge” die:
▪ Technology: GLOBALFOUNDRIES 14nm, 11-layer metal
▪ Transistor count: 4.94B
▪ Die size: 209.78 mm²
▪ BGA package: 25 x 35 x 1.38 mm
Compared with the prior-generation “Bristol Ridge” APU:
▪ 59% more transistors
▪ 16% smaller die
* See footnotes for details.
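The density claim can be checked with simple arithmetic. The sketch below uses the transistor count, die size, and the two percentages from the slide, and backs out the implied “Bristol Ridge” figures. Those prior-generation values are derived estimates, not numbers quoted on the slide.

```python
# Values stated on the "Raven Ridge" slide.
RAVEN_TRANSISTORS = 4.94e9   # transistor count
RAVEN_DIE_MM2 = 209.78       # die size, mm^2
MORE_TRANSISTORS = 0.59      # 59% more transistors than "Bristol Ridge"
SMALLER_DIE = 0.16           # 16% smaller die than "Bristol Ridge"


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


# Implied prior-generation figures (derived from the percentages, not quoted).
bristol_transistors = RAVEN_TRANSISTORS / (1 + MORE_TRANSISTORS)
bristol_die_mm2 = RAVEN_DIE_MM2 / (1 - SMALLER_DIE)

raven_density = density_mtr_per_mm2(RAVEN_TRANSISTORS, RAVEN_DIE_MM2)
bristol_density = density_mtr_per_mm2(bristol_transistors, bristol_die_mm2)
density_gain = raven_density / bristol_density  # algebraically 1.59 / 0.84

if __name__ == "__main__":
    print(f"Implied Bristol Ridge: {bristol_transistors / 1e9:.2f}B transistors, "
          f"{bristol_die_mm2:.0f} mm^2")
    print(f"Density: {bristol_density:.1f} -> {raven_density:.1f} MTr/mm^2 "
          f"({density_gain:.2f}x)")
```

"59% more transistors in a 16% smaller die" works out to roughly 1.9x the transistor density (1.59 / 0.84), or about 23.5 million transistors per mm² for “Raven Ridge”.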
5. INTEGRATED “VEGA” GRAPHICS
Graphics Engine
▪ Up to 11 Next Gen Compute Units (NCU)
▪ 1 MB L2
▪ Flexible Geometry Engine
▪ 1 Draw Stream Binning Rasterizer
▪ 16 Pixel Units (32bpp)
▪ 44 Texture Units
DirectX® 12.1 Features
▪ Conservative Rasterization
▪ Raster Ordered Views
▪ Standard Swizzle
▪ Axis Aligned Rectangular Primitives
Throughput at 11 NCU
▪ 1200 MTri/sec @ 1200 MHz
▪ Rendering 19.2 GPix/sec @ 1200 MHz
▪ 1690 FP32 GFLOPS / 3379 FP16 GFLOPS @ 1200 MHz
▪ 52.8 GTex/sec @ 1200 MHz
[Block diagram: command processor (CP), four ACEs, sDMA, Shader System (SS) workload manager, Geometry/Raster/RB+, NCU array, core fabric, four L2 slices, Infinity Fabric]
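The throughput figures follow from the unit counts and the 1200 MHz clock. The sketch below reproduces that arithmetic under three assumptions that the slide does not state: each NCU has 64 shader lanes, a fused multiply-add counts as two FLOPs, and packed FP16 runs at twice the FP32 rate. All three are standard for GCN-family GPUs.

```python
NCU = 11                 # compute units, from the slide
LANES_PER_NCU = 64       # assumption: 64 stream processors per NCU
CLOCK_GHZ = 1.2          # 1200 MHz, from the slide
PIXEL_UNITS = 16         # from the slide
TEXTURE_UNITS = 44       # from the slide (4 per NCU)
TRIS_PER_CLOCK = 1       # implied by 1200 MTri/sec at 1200 MHz
FLOPS_PER_FMA = 2        # one multiply plus one add

fp32_gflops = NCU * LANES_PER_NCU * FLOPS_PER_FMA * CLOCK_GHZ
fp16_gflops = 2 * fp32_gflops            # assumption: packed FP16 doubles the rate
gpix_per_s = PIXEL_UNITS * CLOCK_GHZ     # billions of pixels per second
gtex_per_s = TEXTURE_UNITS * CLOCK_GHZ   # billions of texels per second
mtri_per_s = TRIS_PER_CLOCK * CLOCK_GHZ * 1000

if __name__ == "__main__":
    print(f"FP32: {fp32_gflops:.0f} GFLOPS, FP16: {fp16_gflops:.0f} GFLOPS")
    print(f"Fill: {gpix_per_s:.1f} GPix/s, texture: {gtex_per_s:.1f} GTex/s, "
          f"geometry: {mtri_per_s:.0f} MTri/s")
```

The same arithmetic gives a texture rate of 52.8 billion texels per second (44 units × 1.2 GHz).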
6. “ZEN” CPU IMPROVES VISUAL FRAME RATE
High performance “Zen” core
▪ Frees up more power for the GPU
▪ Up to 4 “Zen” CPU cores
▪ 512KB L2 cache per core
▪ 4MB shared L3 cache
“Zen” core pipeline (diagram):
▪ Front end: 64K 4-way I-cache, branch prediction, op cache, decode of 4 instructions/cycle, micro-op queue, 6 ops dispatched
▪ Integer: integer rename, schedulers, integer physical register file, 4 ALUs, 2 AGUs
▪ Floating point: floating-point rename, scheduler, FP register file, 2 ADD and 2 MUL pipes
▪ Load/store: load/store queues, 2 loads + 1 store per cycle, 32K 8-way D-cache
▪ 512K 8-way L2 (I+D) cache
Core complex (diagram): CORE 0 to CORE 3, each with a 512K L2 (L2M) and L2 controller (L2CTL), plus two 512KB L3 slices (L3M) and an L3 controller (L3CTL)
7. “ZEN” WITH PRECISION BOOST 2
▪ Governed by CPU temperature, current, and load
▪ Seeks the highest possible frequency from environmental inputs, with graceful roll-off
▪ Opens new boost opportunities for real-world nT (multi-threaded) workloads (e.g., games)
▪ 25MHz granularity
* See footnotes for details.
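The behavior described above can be illustrated with a toy governor. Each input caps the frequency, the governor takes the tightest cap, and the result is rounded down to a 25 MHz step. This is a hypothetical model, not AMD's algorithm. The limit values, the linear current model, and the names `BoostLimits` and `pick_frequency` are all made up for illustration.

```python
from dataclasses import dataclass

STEP_MHZ = 25  # boost granularity stated on the slide


@dataclass(frozen=True)
class BoostLimits:
    # Hypothetical placeholder values; not AMD's actual parameters.
    fmin_mhz: float = 1600.0
    fmax_mhz: float = 3800.0
    temp_limit_c: float = 95.0
    rolloff_window_c: float = 10.0      # start backing off this far below the limit
    current_limit_a: float = 60.0
    amps_per_core_per_ghz: float = 6.0  # crude linear current model


DEFAULT_LIMITS = BoostLimits()


def quantize_down(freq_mhz: float, step: int = STEP_MHZ) -> int:
    """Round a frequency down to a multiple of the boost step."""
    return int(freq_mhz // step) * step


def pick_frequency(temp_c: float, active_cores: int,
                   lim: BoostLimits = DEFAULT_LIMITS) -> int:
    """Highest frequency allowed by every input, in 25 MHz steps (never below fmin)."""
    # Load/current: current is modeled as proportional to active cores x GHz.
    current_cap = lim.current_limit_a / (max(active_cores, 1) * lim.amps_per_core_per_ghz) * 1000
    # Temperature: linear roll-off across the window below the limit.
    headroom = min(max((lim.temp_limit_c - temp_c) / lim.rolloff_window_c, 0.0), 1.0)
    temp_cap = lim.fmin_mhz + headroom * (lim.fmax_mhz - lim.fmin_mhz)
    target = min(lim.fmax_mhz, current_cap, temp_cap)
    return max(int(lim.fmin_mhz), quantize_down(target))


if __name__ == "__main__":
    for cores in range(1, 5):
        print(f"{cores} active core(s) at 50 C: {pick_frequency(50.0, cores)} MHz")
```

In this model the frequency does not drop to a fixed all-core bin. It steps down gradually as more cores become active, and it rolls off smoothly as temperature approaches the limit.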
8. TUNE FOR THE PHASES OF VISUAL WORKLOADS
STEER POWER WHERE IT’S BEST USED
▪ Trade power/current based on dynamic utilization:
− Core ↔ Core
− CPU ↔ GPU
▪ On-die regulation and fine-grained frequency control enable fast, accurate frequency and voltage changes
▪ Fine-grained p-states (FGPS) across the IPs provide continuous frequency control
* See footnotes for details.
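The following toy model illustrates trading power between the CPU and GPU. It assumes a fixed package budget (15 W here, a hypothetical figure) that is split in proportion to each block's utilization, with a small floor reserved for each block. The real SoC relies on on-die regulation and fine-grained p-states. This sketch shows only the policy idea, and `steer_power` and its parameters are hypothetical.

```python
def steer_power(budget_w: float, cpu_util: float, gpu_util: float,
                floor_w: float = 1.0) -> tuple[float, float]:
    """Split a shared SoC power budget between CPU and GPU.

    Each block keeps a small floor; the rest follows relative utilization.
    Utilizations are clamped to [0, 1]. All numbers are illustrative.
    """
    if budget_w < 2 * floor_w:
        raise ValueError("budget must cover both floors")
    cpu_util = min(max(cpu_util, 0.0), 1.0)
    gpu_util = min(max(gpu_util, 0.0), 1.0)
    spare = budget_w - 2 * floor_w
    total = cpu_util + gpu_util
    cpu_share = 0.5 if total == 0 else cpu_util / total  # even split when both idle
    cpu_w = floor_w + spare * cpu_share
    return cpu_w, budget_w - cpu_w


if __name__ == "__main__":
    # Hypothetical frame phases: the CPU builds and submits work, then the GPU renders.
    phases = [("submit", 0.9, 0.2), ("render", 0.2, 0.95), ("idle", 0.0, 0.0)]
    for name, cpu, gpu in phases:
        cpu_w, gpu_w = steer_power(15.0, cpu, gpu)
        print(f"{name:>6}: CPU {cpu_w:5.2f} W, GPU {gpu_w:5.2f} W")
```

Because the budget is shared, whatever the idle block does not need flows to the busy one, and the total never exceeds the package limit.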
9. “ZEN” CPU AND “VEGA” GFX CO-MANAGEMENT WITH INFINITY FABRIC
▪ CPU threads feed major GPU resources: 3D engine, compute engine, and DMA engine (data fetch and writeback)
▪ CPU “submits” tasks, GFX “renders” or “computes”
▪ One coherent control and data interface to integrate and manage the full SoC
▪ Power budgeting based on activity and efficiency
▪ Enhanced flow for quiescing/powering off the CPU-GFX component
[Diagram: the “Zen” core complex (four “Zen” cores, L3 cache) and “Vega” graphics (graphics pipeline, compute engine, pixel engines, L2 cache) connect through Infinity Fabric to the multimedia engines, display engine, I/O and system hub, and DDR4 memory controllers]
10. FAST DEPLOYMENT OF NEW ARCHITECTURE: MODULAR AMD INFINITY FABRIC
▪ Standard port definition for IP connections (SDP = Scalable Data Port)
− Common interface definition used for CPU, GPU, I/O, multimedia hubs, display, and memory controller
▪ Coherent HyperTransport™ transport layer
− Builds upon generations of coherent fabric development
− Flexible topology to adapt to diverse SoC configurations
▪ SDP hides complexities of the coherence protocol from connected IP
[Diagram: engines, memory controllers, I/O subsystem, and accelerators attach through SDP interface modules to the transport layer]
11. “RAVEN RIDGE” INFINITY FABRIC
“Raven Ridge” Optimizations
▪ 32-byte internal datapath width
▪ Up to 1.6GHz, for bandwidth exceeding 50GB/s
▪ Up to 5 transfers/clock per switch
▪ Improved CPU latency under load, while maintaining DRAM efficiency
▪ Structured for multi-region power gating
▪ Floorplan-aware, optimized display-to-memory routing
[Diagram: four transport layer switches interconnect the CPU core complex, graphics container, display controller, multimedia hub, I/O subsystem, and memory controllers; endpoints are labeled coherent master, coherent slave, non-coherent master, or IO master/slave, and one power region is labeled Region A]
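The "exceeding 50GB/s" figure is the datapath width multiplied by the clock. The sketch below computes it and sets it beside a DRAM peak for context. The two x64 DDR4 channels come from the block diagram. The DDR4-2400 speed grade is an assumption chosen only for illustration.

```python
def peak_gb_per_s(bytes_per_transfer: float, transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_transfer * transfers_per_s / 1e9


# Infinity Fabric datapath: 32 bytes per clock at up to 1.6 GHz (from the slide).
fabric_gb_s = peak_gb_per_s(32, 1.6e9)

# Two x64 DDR4 channels, 8 bytes per transfer each.
# DDR4-2400 (2400 MT/s) is an assumed speed grade for illustration.
dram_gb_s = 2 * peak_gb_per_s(8, 2400e6)

if __name__ == "__main__":
    print(f"Fabric: {fabric_gb_s:.1f} GB/s, dual-channel DDR4-2400: {dram_gb_s:.1f} GB/s")
```

With that assumed memory speed, the fabric peak (51.2 GB/s) sits above the DRAM peak (38.4 GB/s).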
12. QUALITY OF SERVICE FOR SMOOTH VISUAL EXPERIENCE
Three Request Classes
▪ Hard real time:
− High BW (e.g., display surface refresh)
− Low BW (e.g., audio)
▪ Soft real time (e.g., video playback)
▪ Non real time (e.g., typical CPU/GPU/IO requests)
Architectural Mechanisms
▪ Multiple virtual channels (VCs)
▪ Priority classes (Low/Medium/High/Urgent)
▪ End-to-end priority escalation by VC for out-of-bounds conditions
[Switch-level view of QoS architecture: transport request, response, probe, and data queues, each served by pickers; VC buffers hold dedicated tokens and shared-pool tokens]
Picker arbitration is generally age ordered, except when a younger request passes an older one due to:
1) priority
2) VC resource availability
3) another resource, such as an output port, being busy
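The picker rule above can be sketched as a small arbiter. Requests are served oldest first, but a younger request can win when it has higher priority, when older requests have no VC token (dedicated or from the shared pool), or when an older request's output port is busy. This is a hypothetical model: the class names, VC labels, and token counts are illustrative, and end-to-end priority escalation is not modeled.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    URGENT = 3


@dataclass
class Req:
    seq: int                       # arrival order; lower means older
    vc: str                        # virtual channel (hypothetical names)
    out_port: int
    priority: Priority = Priority.LOW


@dataclass
class VcTokens:
    dedicated: dict[str, int]      # per-VC dedicated tokens
    shared: int                    # tokens any VC may borrow

    def available(self, vc: str) -> bool:
        return self.dedicated.get(vc, 0) > 0 or self.shared > 0

    def take(self, vc: str) -> None:
        # Prefer the VC's own tokens before drawing from the shared pool.
        if self.dedicated.get(vc, 0) > 0:
            self.dedicated[vc] -= 1
        else:
            self.shared -= 1


def pick(queue: list[Req], tokens: VcTokens, busy_ports: set[int]) -> Optional[Req]:
    """Oldest first, unless priority, VC tokens, or a busy port favor a younger request."""
    eligible = [r for r in queue
                if tokens.available(r.vc) and r.out_port not in busy_ports]
    if not eligible:
        return None
    # Highest priority wins; among equal priorities, the oldest (smallest seq) wins.
    winner = max(eligible, key=lambda r: (r.priority, -r.seq))
    tokens.take(winner.vc)
    queue.remove(winner)
    return winner


if __name__ == "__main__":
    q = [Req(0, "cpu", 0), Req(1, "display", 1, Priority.URGENT), Req(2, "gpu", 2)]
    t = VcTokens({"cpu": 0, "display": 1, "gpu": 1}, shared=0)
    while (r := pick(q, t, busy_ports=set())) is not None:
        print(f"picked seq={r.seq} vc={r.vc}")
```

In the usage example, the urgent display request passes the older CPU request, and the CPU request stays blocked because its VC has no tokens left.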
13. MEMORY BOUND PERFORMANCE OPTIMIZATION
New features and an optimized SoC configuration contribute to improved memory-limited performance:
▪ Caching and algorithms to reduce memory requests
▪ Improved lossless compression usage (DCC)
▪ Better request ordering to reduce DRAM page conflicts and read/write turnarounds
[Diagram: the “Vega” GFX engine, “Zen” CPU core complex (4MB L3 cache), display controller, and multimedia hub connect through the fabric transport layer to the memory controllers. Callouts: 1MB shared L2 cache; larger dedicated GPU TLB cache; deferred primitive batch binning; multi-level DRAM-aware reordering; deeper arbitration queues; direct reads of compressed memory; memory-efficient quality of service]
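The slide does not describe AMD's "multi-level DRAM-aware reordering" in detail. The sketch below substitutes a well-known textbook policy, FR-FCFS (first-ready, first-come-first-served), and extends it with a preference for the current read/write direction. It shows how reordering reduces page conflicts and turnarounds. The request fields and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemReq:
    seq: int        # arrival order; lower means older
    bank: int
    row: int
    is_write: bool


def _rank(r: MemReq, open_rows: dict[int, int], last_was_write: bool) -> tuple:
    row_hit = open_rows.get(r.bank) == r.row     # avoids a page conflict
    same_dir = r.is_write == last_was_write      # avoids a read/write turnaround
    return (row_hit, same_dir, -r.seq)           # then oldest first


def schedule(reqs: list[MemReq], open_rows: Optional[dict[int, int]] = None,
             last_was_write: bool = False) -> list[int]:
    """Return the issue order (by seq) chosen by the reordering policy."""
    pending = list(reqs)
    rows = dict(open_rows or {})
    order = []
    while pending:
        nxt = max(pending, key=lambda r: _rank(r, rows, last_was_write))
        pending.remove(nxt)
        rows[nxt.bank] = nxt.row       # issuing a request opens its row
        last_was_write = nxt.is_write
        order.append(nxt.seq)
    return order


def row_conflicts(order: list[int], reqs: list[MemReq],
                  open_rows: Optional[dict[int, int]] = None) -> int:
    """Count accesses that hit a bank with a different row already open."""
    by_seq = {r.seq: r for r in reqs}
    rows = dict(open_rows or {})
    conflicts = 0
    for s in order:
        r = by_seq[s]
        if r.bank in rows and rows[r.bank] != r.row:
            conflicts += 1
        rows[r.bank] = r.row
    return conflicts


if __name__ == "__main__":
    reqs = [MemReq(0, 0, 5, False), MemReq(1, 0, 7, False),
            MemReq(2, 0, 5, False), MemReq(3, 1, 1, True)]
    fcfs = [r.seq for r in reqs]
    reordered = schedule(reqs, {0: 5})
    print("FCFS", fcfs, "conflicts:", row_conflicts(fcfs, reqs, {0: 5}))
    print("Reordered", reordered, "conflicts:", row_conflicts(reordered, reqs, {0: 5}))
```

In the example, request 2 hits the row that is already open, so it moves ahead of request 1. This cuts the page conflicts from two to one.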
14. “RAVEN RIDGE” GRAPHICS SCALING
GENERATIONAL IMPROVEMENTS FOR MEMORY BOUND GAMING PERFORMANCE
Gaming performance scaling uplift due to new AMD “Vega” GPU features:
▪ 4x larger GFX L2 cache, unified across all graphics clients
▪ DSBR (Draw Stream Binning Rasterizer) feature reduces bandwidth
▪ Improved lossless DCC memory compression
* See footnotes for details.
15. | AMD Ryzen™ Processors with Radeon™ Vega Graphics - Hot Chips 30 |15
NEW GENERATION DISPLAY AND VIDEO CODEC ENGINE
Display Engine (DCN)
▪ Flexible display pipe architecture
− Up to four 4kp60 displays
▪ Low power display engine with DCC, 4K2K@60hz @Vmin
▪ HDR support
− From 32bpp to 64bpp surfaces
− From sRGB to BT2020
▪ Higher bandwidth interfaces - HDMI 2.1, DP 1.4, HBR3
▪ USB-Type C with display alt-mode
Video Codec (VCN)
▪ Unified encode and decode engine
− Up to 4kp60 HEVC 10b decode
− Up to 4kp30 HEVC 8b encode
▪ Low power video playback – 4kp30 @Vmin
▪ HEVC 10b decode
▪ HEVC encode for superior quality skype
▪ VP9 decode for efficient YouTube playback
(Figure: display engine block diagram with a Memory Interface Hub, four Input Processing blocks, four Output Pipes, an Infinity Fabric connection, and a USB Type-C display path with AltMode control and a USB/DP mux.)
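A rough scan-out budget shows why display bandwidth and compression matter on an APU that shares memory. This back-of-envelope sketch makes three assumptions: "4k" means 3840x2160, blanking intervals and compression are ignored, and the comparison point is the theoretical peak of dual-channel DDR4-2400, the memory used in several of the footnoted test systems.

```python
def scanout_gb_per_s(width: int, height: int, refresh_hz: int,
                     bits_per_pixel: int, displays: int = 1) -> float:
    """Uncompressed scan-out read bandwidth in GB/s (1 GB = 1e9 bytes), no blanking."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return displays * bytes_per_frame * refresh_hz / 1e9


def ddr4_peak_gb_per_s(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth: transfers/s x 64-bit channel width x channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


four_sdr = scanout_gb_per_s(3840, 2160, 60, 32, displays=4)
four_hdr = scanout_gb_per_s(3840, 2160, 60, 64, displays=4)
peak = ddr4_peak_gb_per_s(2400, channels=2)

print(f"4 x 4kp60 @32bpp: {four_sdr:.2f} GB/s")
print(f"4 x 4kp60 @64bpp: {four_hdr:.2f} GB/s ({four_hdr / peak:.0%} of {peak:.1f} GB/s)")
```

Four uncompressed 64bpp 4kp60 surfaces need about 15.9 GB/s. That is roughly 41% of the 38.4 GB/s theoretical peak, before the CPU or GPU issues a single request.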
16.
EFFICIENT POWER DELIVERY WITH DIGITAL LOW-DROPOUT REGULATORS
▪ Current delivery is overprovisioned for the worst-case overlap between CPU and GPU
▪ Fine-grain LDO control allows efficient tracking of the CPU and GFX phases, powered by a unified VDD power rail
▪ Two-stage regulation
− 1st stage: off-chip motherboard voltage regulator
− 2nd stage: on-chip voltage regulation with digital LDOs
▪ Multiple digital LDO regions for the CPU cores, graphics core, and sub-regions
− An idle engine is powered off
▪ Allows more peak CPU/GPU current to improve boost performance
(Figure: floorplan of the “Zen” core complex (CPU 0 to CPU 3, L3) and the “Vega” graphics complex, marking the LDO-regulated/power-gating regions (CPU region, GFX region, GFX compute region) inside the VDD region, which the system voltage regulator feeds through the VDD package rail.)
* See footnotes for details.
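Fine-grain LDO regions rely on simple linear-regulator physics. An LDO drops the difference between its input and output voltages across a pass device, so its losses grow with that gap. The voltages and currents below are hypothetical. The assumption that the shared rail sits near the voltage of the most demanding region is also illustrative and does not come from the slide.

```python
def ldo_efficiency(v_in: float, v_out: float) -> float:
    """Ideal linear-regulator efficiency, ignoring quiescent current.

    An LDO passes the load current straight through and drops the excess
    voltage across its pass device, so efficiency is roughly v_out / v_in.
    """
    if not 0 < v_out <= v_in:
        raise ValueError("an LDO can only regulate down: need 0 < v_out <= v_in")
    return v_out / v_in


def ldo_loss_w(v_in: float, v_out: float, i_load_a: float) -> float:
    """Power dissipated in the pass device: (v_in - v_out) * i_load."""
    ldo_efficiency(v_in, v_out)  # reuse the range check
    return (v_in - v_out) * i_load_a


# Hypothetical operating points: the shared VDD rail sits at the voltage the
# most demanding region needs, and a lighter region regulates down from it.
rail_v = 1.00
for region, v_out, amps in [("CPU (near rail)", 0.98, 20.0),
                            ("GFX (lighter phase)", 0.85, 10.0)]:
    print(f"{region}: {ldo_efficiency(rail_v, v_out):.0%} efficient, "
          f"{ldo_loss_w(rail_v, v_out, amps):.2f} W lost")
```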
17.
SYNERGISTIC POWER RAIL SHARING WITH DIGITAL LDO REGULATORS
▪ A shared regulator reduces the total regulator current requirement
▪ Smaller motherboard power-supply footprint
▪ More peak CPU/GPU current to improve boost performance
* See footnotes for details.
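Rail sharing pays off because the peak of a combined current is never larger than the sum of the individual peaks. This sketch compares the two sizing rules on a hypothetical current trace. It then applies the same comparison to the EDC limits quoted in the Slide 17 footnote.

```python
from typing import List


def separate_rails_a(cpu_trace: List[float], gfx_trace: List[float]) -> float:
    """Two dedicated rails: each must be sized for its own peak current."""
    return max(cpu_trace) + max(gfx_trace)


def shared_rail_a(cpu_trace: List[float], gfx_trace: List[float]) -> float:
    """One shared rail: sized for the peak of the combined current."""
    return max(c + g for c, g in zip(cpu_trace, gfx_trace))


# Hypothetical per-interval currents (A) for a workload whose CPU and GPU
# peaks do not coincide.
cpu = [30, 12, 8, 25]
gfx = [6, 28, 30, 10]
print(separate_rails_a(cpu, gfx), "A vs", shared_rail_a(cpu, gfx), "A")

# EDC limits quoted in the Slide 17 footnote (15W TDP parts).
bristol_ridge_a = 35 + 35   # VDDCR_CPU + VDDCR_GFX
raven_ridge_a = 45          # unified VDDCR_VDD
print(f"{bristol_ridge_a - raven_ridge_a} A less "
      f"({1 - raven_ridge_a / bristol_ridge_a:.0%})")
```

When the CPU and GPU peaks do not coincide, a shared rail needs less headroom. On the hypothetical trace, separate rails need 60 A and a shared rail needs 40 A. The footnote's limits (70 A across two rails versus 45 A on one) correspond to a 25 A reduction, about 36%.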
18.
ENHANCED POWER-OFF STATE FOR CPU AND GPU
For CPU cores
▪ Each core can enter CC6 power gating
▪ CPUOFF can lower L3 cache power when all cores are in CC6
For graphics
▪ Gating can power down up to 95% of the GPU
▪ GFXOFF can further power down the GPU un-core (aka GPU monitor logic)
GFXOFF + CPUOFF = VDDOFF; halts the system VDD regulator
▪ Up to 99% residency in Windows static-screen idle*
(Figure: power-state diagram labeled "Deeper low-power states" and "Faster entry/exit latencies". Labeled tiers: region power gating by LDO PG headers, latencies of 100 µs or less; multiple LDO regions gated, latencies of 1.5 ms or less; input VDD rail off. Starting from active, deep-sleep, and clock-gated states, each core enters CC6 on meeting the CC6 entry timer, and graphics enters power gating on meeting the GFX idle entry timer. CPUOFF is entered when all cores are in CC6 and the CPUOFF entry timer is met. GFXOFF is entered on meeting the GFXOFF entry timer. VDDOFF is entered on simultaneous CPUOFF and GFXOFF.)
* See footnotes for details.
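The entry rules in the state diagram can be written as plain predicates. In this minimal sketch the field names are hypothetical. It encodes only the conditions shown on the slide (CC6 status, power gating, and entry timers) and none of the firmware's actual sequencing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IdleInputs:
    # Hypothetical inputs; names are illustrative, not AMD firmware fields.
    cores_in_cc6: List[bool]   # per-core CC6 status
    cpuoff_timer_met: bool     # CPUOFF entry timer expired
    gfx_power_gated: bool      # graphics already power gated (GFX idle timer met)
    gfxoff_timer_met: bool     # GFXOFF entry timer expired


def cpuoff(s: IdleInputs) -> bool:
    return all(s.cores_in_cc6) and s.cpuoff_timer_met


def gfxoff(s: IdleInputs) -> bool:
    return s.gfx_power_gated and s.gfxoff_timer_met


def deepest_state(s: IdleInputs) -> str:
    """Name the deepest platform state these simplified rules allow."""
    if cpuoff(s) and gfxoff(s):
        return "VDDOFF"        # both off at once: halt the system VDD regulator
    if cpuoff(s):
        return "CPUOFF"
    if gfxoff(s):
        return "GFXOFF"
    return "SHALLOWER"         # active, clock-gated, CC6, or GPU power gating only


static_screen = IdleInputs([True] * 4, True, True, True)
print(deepest_state(static_screen))
print(deepest_state(IdleInputs([True, True, True, False], True, True, True)))
```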
19.
MORE THERMAL COMPUTE HEADROOM IN NOTEBOOKS
SKIN TEMPERATURE AWARE POWER MANAGEMENT (STAPM)
Before STAPM: the APU was guard-banded to Tj ≈ 60 °C to meet Tskin requirements.
After STAPM: the delta between ambient and Tskin is calculated from the power/activity of system components.
(Figure: conceptual example of behavior.)
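STAPM rests on the idea that skin temperature responds slowly to power. The APU can therefore run hotter for a while, as long as the estimated Tskin stays under its limit. The slide does not give AMD's model. This sketch assumes a single-pole thermal model with hypothetical parameters (25 °C ambient, 1.0 °C/W, 100 s time constant) purely to show the effect.

```python
from typing import List


def skin_temp_estimates(power_w: List[float], t_ambient_c: float,
                        r_c_per_w: float, tau_s: float, dt_s: float) -> List[float]:
    """First-order estimate of Tskin from a power history.

    The steady state is t_ambient + R * P; the chassis' thermal time constant
    tau_s makes Tskin lag the power, which is what leaves room to boost.
    """
    alpha = dt_s / (tau_s + dt_s)      # discrete single-pole low-pass weight
    t = t_ambient_c
    temps = []
    for p in power_w:
        target = t_ambient_c + r_c_per_w * p
        t += alpha * (target - t)
        temps.append(t)
    return temps


# Hypothetical parameters: 25 C ambient, 1.0 C/W, 100 s time constant, 1 s steps.
dt_s = 1.0
burst = skin_temp_estimates([25.0] * 1000, 25.0, 1.0, 100.0, dt_s)
limit_c = 45.0
steps = next(i + 1 for i, t in enumerate(burst) if t >= limit_c)
print(f"A 25 W burst reaches the {limit_c:.0f} C estimate after about "
      f"{steps * dt_s:.0f} s; the steady state is {burst[-1]:.1f} C")
```

With these numbers, a 25 W burst takes about 162 s to push the estimate to 45 °C, even though its steady state is 50 °C. A fixed Tj guard band cannot exploit that lag.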
20.
* See footnotes for details.
3DMARK® TIME SPY
(Chart: 3DMark® Time Spy scores for the AMD Ryzen™ 7 2700U, Core i7-8550U, Core i7-7500U, and AMD FX™ 9800P (notebook), and for the AMD Ryzen™ 5 2400G, Core i5-8400, and Core i5-7400 (desktop).)
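The percentages in the Slide 20 footnote can be reproduced from the Time Spy scores it lists. This short check assumes the footnote truncates rather than rounds. Truncation reproduces all four published values, whereas conventional rounding would report 108% for the Core i7-7500U.

```python
# Time Spy graphics scores quoted in the Slide 20 footnote (notebook systems).
scores = {
    "Core i7-8550U": 350,        # baseline
    "Core i7-7500U": 377,
    "AMD FX-9800P": 400,
    "AMD Ryzen 7 2700U": 915,
}


def percent_of_baseline(scores: dict, baseline: str) -> dict:
    """Integer percent of the baseline score, rounded down."""
    base = scores[baseline]
    return {name: 100 * s // base for name, s in scores.items()}


print(percent_of_baseline(scores, "Core i7-8550U"))
# Ratio between the two AMD generations on the same benchmark.
print(f"{scores['AMD Ryzen 7 2700U'] / scores['AMD FX-9800P']:.1f}x")
```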
21.
GAMING ON THE GO IN AN ULTRATHIN
(Chart: average FPS against a 30 FPS “good visual threshold” for League of Legends™ (1080p, DirectX® 9, Medium), DOTA™ 2 (1080p, DirectX® 11, Fastest+), CS:GO™ (1080p, DirectX® 9, Medium, no MSAA), and Quake® Champions (1280x720, DirectX® 11, High).)
* See footnotes for details.
AMD RYZEN™ 5 2400G DESKTOP PROCESSOR
TRUE HIGH-DEFINITION 1080P GAME PERFORMANCE
(Chart: average FPS against a 30 FPS “good visual threshold” for Battlefield 1 (1080p, Low, DX12), Overwatch™ (1080p, Medium), Rocket League (1080p, Medium), Skyrim (1080p, Medium), and Witcher 3 (1080p, Low, HairWorks off).)
(Chart label: Overwatch™, 1280x720, DirectX® 11, Low, 79% render scale.)
22.
AMD ACCELERATING ENERGY EFFICIENCY
ON TRACK TO ACHIEVE OUR GOAL
Developing energy-efficient processors has long been a design focus at AMD. In 2014, AMD set a bold “25x20” goal to deliver at least 25X more energy efficiency in our mobile processors by 2020. Visit AMD.com/25x20.
(Chart: “25X additional energy efficiency by 2020 (2014–2020)”; series: energy efficiency of AMD APUs* and the “25x20” goal; 2017 marked.)
* See footnotes for details.
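The 25x20 target is easier to read as a compound rate. Assuming a constant year-over-year improvement over the slide's 2014–2020 window, the required annual factor is 25^(1/6). This is arithmetic only. It says nothing about AMD's actual yearly progress or about how AMD defines its efficiency metric.

```python
goal = 25.0
years = 2020 - 2014                # the slide's 2014-2020 window
annual = goal ** (1 / years)       # constant yearly factor that compounds to 25x
print(f"{annual:.2f}x per year for {years} years")

# Cumulative factor that this constant rate would imply at the end of each year.
for year in range(2015, 2021):
    print(year, f"{annual ** (year - 2014):.1f}x")
```

A constant rate of about 1.71x per year would put 2017, the year marked on the chart, at 25^(3/6) = 5x.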
23.
▪ The true potential of the APU realized by combining “Zen” CPU with “Vega” Graphics
▪ Advances in power and thermal management provide more headroom for visual throughput
▪ Data movement improvements at all levels to reduce bandwidth bottlenecks
24.
FOOTNOTES
Slide 2: Based on AMD testing as of 9/28/2017. System configuration(s): AMD Reference Motherboard (2700U), HP ENVY X360 (FX-9800P/“7th Gen APU”), Samsung 850 Pro SSD, Windows 10 x64 1703, 1920x1080. AMD Ryzen™ 7 2700U Graphics Driver: 23.20.768.9. AMD FX-9800P Graphics Driver: 22.19.662.4. 1x8GB DDR4-2133 (AMD FX-9800P). 2x4GB DDR4-2400 (AMD Ryzen™ 7 2700U). Power consumption defined as the joules consumed during a complete run of Cinebench R15 nT: AMD FX™ 9800P = 3782 joules (100%) vs. AMD Ryzen™ 7 2700U = 1594 joules (58% less). Different configurations may yield different results.
Slide 4: Based on “Bristol Ridge” die size of 250.04 mm² and transistor count of 3.1 billion.
Slide 7: Based on AMD testing as of 9/25/2017. System configuration(s): AMD Reference Platform, AMD Ryzen™ 7 2700U APU, 2x4GB DDR4-2400, graphics driver 17.30.2015. AMD SenseMI technology is built into all Ryzen processors, but specific features and their enablement may vary by product and platform. Learn more at http://www.amd.com/en/technologies/sense-mi.
Slide 8: Based on AMD testing as of 10/11/2017. The clock speed plot is a snapshot of 8 seconds of 3DMark Fire Strike. “Effective frequency” is the product of the reported clock speed and the percentage of time spent in the active-workload C0 C-state.
Slide 14: Based on AMD testing as of 6/11/2018. System configuration(s): AMD “Bristol Ridge” Mobile APU reference platform, AMD FX-9800P, 2x8GB DDR4-2400, Crucial BX100 SSD, Windows 10 x64 Build 16299, Graphics Driver: 21.19.384.20, BIOS: TMY130BA; AMD Ryzen™ Mobile APU reference platform, AMD Ryzen™ 7 2700U, 2x8GB DDR4-2400, WD7500BPKX, Windows 10 x64 Build 16299, Graphics Driver: 24.20.154.6220, BIOS: WGV8215N.
Slide 17: Based on AMD infrastructure requirements for “Bristol Ridge” 15W TDP (VDDCR_CPU supply EDC limit is 35A, VDDCR_GFX supply EDC limit is 35A), and AMD infrastructure requirements for “Raven Ridge” 15W TDP (VDDCR_VDD supply EDC limit is 45A).
Slide 18: Based on AMD internal data of an optimized AMD Ryzen™ Mobile APU reference platform as of 9/25/2017. PC manufacturers may vary configuration yielding different results.
Slide 20: Notebook: Based on AMD testing as of 9/25/2017. Common system configurations: Samsung 850 Pro SSD, Windows 10 x64 1703, 1920x1080; Intel Graphics Driver: 22.20.16.4691; AMD Ryzen™ mobile APU Graphics Driver: 23.20.768.9; AMD FX-9800P Graphics Driver: 22.19.662.4; AMD FX-9800P configured in HP ENVY X360 (1x8GB DDR4-2133). AMD Ryzen™ 7 2700U configured in AMD reference platform (2x4GB DDR4-2400). Core i7-8550U configured in Acer Swift 3 (2x4GB DDR4-2400). Core i7-7500U configured in HP ENVY X360 (2x4GB DDR4-2400). Graphics results measured with 3DMark® Time Spy. Core i7-8550U score (350) is baseline 100%. Core i7-7500U score (377) is 107% of baseline. AMD FX-9800P score (400) is 114% of baseline. AMD Ryzen™ 7 2700U score (915) is 261% of baseline. Different configurations may yield different results.
Desktop: Common system configurations: Samsung 850 Pro SSD, Windows 10 x64 Pro RS3, 1920x1080; Intel Core i5-8400 Graphics Driver: 15.47.02.4815; Intel Core i5-7400 Graphics Driver: 15.46.05.4771; AMD Ryzen™ mobile APU Graphics Driver: CL1491290-171206a-321461E 2.1.1 RC5 17.40 RC19; AMD Ryzen™ 5 2400G configured in AMD reference platform (2x8GB DDR4-2667). Core i5-8400 configured in Z370 Aorus Gaming 5 (2x8GB DDR4-2667). Core i5-7400 configured in B250 Gaming M3 (2x8GB DDR4-2400).
Slide 21: Based on AMD testing as of 9/25/2017. System configuration(s): HP ENVY X360, AMD Ryzen™ 7 2700U, 2x4GB DDR4-2400, Samsung 850 Pro SSD, Windows 10 x64 1703, Graphics Driver: 17.30.1025, BIOS F11.
Desktop: Testing by AMD Performance Labs as of 01/02/2018 on the following systems. PC manufacturers may vary configurations, yielding different results. Results may vary based on driver versions used. System configs: All systems equipped with 16GB dual-channel DDR4 @ 2666 MHz, Samsung 850 PRO 512GB SSD, Windows 10 RS2 operating system. Socket AM4 System: AMD Ryzen 5 2400G, AMD Ryzen 3 2200G, Myrtle RV motherboard. Graphics driver 23.20.768.0 (17.40).
Slide 22: Data source: AMD confidential based on internal test results of upcoming “Raven Ridge” APU.
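Two footnote definitions lend themselves to a quick check: the Slide 2 energy comparison and the Slide 8 definition of effective frequency. The energy figures come from the footnote. The clock speed and C0 residency in the second example are hypothetical.

```python
def percent_less(before: float, after: float) -> float:
    """Percentage reduction from before to after."""
    return 100 * (before - after) / before


# Slide 2 footnote: energy for a complete Cinebench R15 nT run.
fx_9800p_j = 3782
ryzen_2700u_j = 1594
print(f"{percent_less(fx_9800p_j, ryzen_2700u_j):.1f}% less energy")


def effective_frequency_ghz(reported_ghz: float, c0_fraction: float) -> float:
    """Slide 8 footnote: reported clock speed times the fraction of time in C0."""
    return reported_ghz * c0_fraction


# Hypothetical sample: 3.0 GHz reported while the core is in C0 80% of the time.
print(f"{effective_frequency_ghz(3.0, 0.80):.2f} GHz effective")
```

3782 J versus 1594 J is a 57.9% reduction, which the footnote rounds to 58%.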