HSA is a new heterogeneous programming model, created to lower the learning curve of heterogeneous computing. These slides introduce HSA and its advanced features.
Disclaimer: Unless otherwise noted, the content of this course material is licensed under a Creative Commons Attribution 3.0 License.
You assume all responsibility for use and potential liability associated with any use of the material.
2. DISCLAIMER:
This presentation is not an official HSA Foundation presentation.
Most of the material is taken from the HSA presentations at Hot Chips 2013.
Some slides contain my own insights/opinions.
5. HISTORIC PERSPECTIVE
Accelerated System
Program runs on CPU
API to access accelerators
ASIC or firmware
Configurable, but the operation is fixed
Heterogeneous System
Program runs on CPU
Offloads work to accelerators
GPU, DSP, etc.
Offloaded work is JITed (compiled at runtime)
Distributed SoC based
6. HSA FOUNDATION
Originated from AMD’s FSA (Fusion System Architecture)
HSA Foundation founded in June 2012
8. WHAT IS HSA ALL ABOUT?
(MY TAKE)
“Bring accelerators forward as first-class processors”
Unified address space, pageable memory, coherency
Eliminate drivers from the dispatch path (user-mode queues)
Standardized SW stack built on top of a set of HW requirements
Improve interoperability between IP vendors
Unified architecture for accelerators
Start from the GPU, extend to DSP / FPGA / fixed-function accelerators, etc.
SoC-centric
Major features are optimal for an SoC environment (same memory/die)
Support for distributed systems is possible, yet inefficient (PCIe atomics, others)
Slide taken from Phil Rogers’ HSA Overview, Hot Chips 2013
9. HSA WORKING GROUPS
HSA Systems Architecture
hUMA – Unified Memory Model
hQ – HSA Queuing Model
HSA Programmer Reference Specification
HSAIL – HSA Intermediate Language
HSA System Runtime
HSA Compliance
HSA Tools
Source: http://hsafoundation.com/standards/
10. OPENCL™ AND HSA
HSA is an optimized platform architecture for OpenCL™
Not an alternative to OpenCL™
OpenCL™ on HSA will benefit from
Avoidance of wasteful copies
Low-latency dispatch
Improved memory model
Pointers shared between CPU and GPU
OpenCL™ 2.0 shows considerable alignment with HSA
Many HSA member companies are also active with Khronos in the OpenCL™ working group
Slide taken from Phil Rogers’ HSA Overview, Hot Chips 2013
12. hUMA
HSA Unified Memory Architecture
Evolution of CPU / GPU memory systems:
1. CPU uses virtual addresses, GPU uses physical addresses
Memory had to be pinned
GPU can access only a limited area of CPU memory (the aperture)
Requires a copy from system memory to GPU-visible memory
Pointer-based data structures can’t be shared
2. CPU uses virtual addresses, GPU uses virtual addresses (but not the same ones)
Memory still had to be pinned
GPU can access the entire system memory
Copy is not required
Pointer-based data structures still can’t be shared
3. hUMA
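To make the “pointer-based data structures can’t be shared” point concrete, here is a minimal C sketch. It is not from the original slides, and all names are hypothetical. Without a shared virtual address space, a linked list built from CPU pointers has to be flattened into an index-linked array before it is copied to GPU-visible memory; with hUMA-style shared virtual memory, the consumer can simply walk the original list from its head pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side linked list: "next" is a CPU virtual address. */
struct node { int value; struct node *next; };

/* Position-independent copy: "next" is an array index (-1 marks the end),
   so it stays valid after being copied into another address space. */
struct flat_node { int value; int next; };

/* Pre-hUMA path: flatten the list into a contiguous buffer that can be copied
   to GPU-visible memory. Writes at most cap nodes and returns the count. */
size_t flatten_list(const struct node *head, struct flat_node *out, size_t cap) {
    assert(out != NULL || cap == 0);
    size_t n = 0;
    for (const struct node *p = head; p != NULL && n < cap; p = p->next, n++) {
        out[n].value = p->value;
        out[n].next = (p->next != NULL && n + 1 < cap) ? (int)(n + 1) : -1;
    }
    return n;
}

/* Consumer of the flattened copy: follows indices, never host pointers. */
long sum_flat(const struct flat_node *nodes, size_t n) {
    long sum = 0;
    for (int i = (n > 0) ? 0 : -1; i >= 0; i = nodes[i].next)
        sum += nodes[i].value;
    return sum;
}

/* hUMA path: CPU and GPU share virtual addresses, so the head pointer itself
   can be handed to the consumer and the original list walked in place. */
long sum_list(const struct node *head) {
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```

The flattening step, and the copy that follows it, is the work that shared virtual memory removes: only the pointer `head` needs to cross between agents.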
13. hUMA
HSA Unified Memory Architecture
Shared Virtual Memory
CPU & GPU see the same addresses
Pageable Memory
GPU can (somehow) initiate a page fault
Cache coherency
14. SHARED VIRTUAL MEMORY
Advantages
No mapping tricks, no copying back and forth between different physical address ranges
Send pointers (not data) back and forth between HSA agents.
Note the Hardware Implications …
Common page tables (and a common interpretation of architectural semantics such as shareability, protection, etc.)
Common mechanisms for address translation (and for servicing address translation faults)
Concept of a process address space ID (PASID) to allow multiple per-process virtual address spaces within the system
Slide taken from Ian Bratt’s HSA Queueing, Hot Chips 2013
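A minimal C sketch of what common page tables plus a PASID provide, under simplifying assumptions that are not from the slides: 4 KiB pages, one tiny flat table per address space, and hypothetical names throughout. Any agent, CPU or GPU, resolves a (PASID, virtual address) pair through the same tables, so a pointer means the same thing to all agents, while different PASIDs keep processes isolated. A `false` return stands in for a translation fault that the system must be able to service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT      12u  /* assumed 4 KiB pages */
#define MAX_PASIDS      4u   /* hypothetical, tiny sizes for illustration */
#define PAGES_PER_SPACE 16u

/* One flat page table per process address space (indexed by PASID). CPU and
   GPU agents both walk these same tables, so a VA means the same to both. */
typedef struct {
    bool     present[MAX_PASIDS][PAGES_PER_SPACE];
    uint64_t frame[MAX_PASIDS][PAGES_PER_SPACE];  /* physical frame number */
} shared_page_tables;

/* OS side: map virtual page vpn of process pasid to physical frame pfn. */
void map_page(shared_page_tables *pt, unsigned pasid, uint64_t vpn, uint64_t pfn) {
    assert(pasid < MAX_PASIDS && vpn < PAGES_PER_SPACE);
    pt->present[pasid][vpn] = true;
    pt->frame[pasid][vpn] = pfn;
}

/* Agent side (any agent): translate (pasid, va) to a physical address.
   Returns false for a translation fault, which must then be serviced. */
bool translate(const shared_page_tables *pt, unsigned pasid, uint64_t va, uint64_t *pa) {
    uint64_t vpn = va >> PAGE_SHIFT;
    if (pasid >= MAX_PASIDS || vpn >= PAGES_PER_SPACE || !pt->present[pasid][vpn])
        return false;
    uint64_t offset = va & ((UINT64_C(1) << PAGE_SHIFT) - 1);
    *pa = (pt->frame[pasid][vpn] << PAGE_SHIFT) | offset;
    return true;
}
```

For example, after `map_page(&pt, 1, 2, 7)`, both a CPU and a GPU agent translating VA `0x2010` for PASID 1 obtain PA `0x7010`, while the same VA under PASID 2 faults.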
15. CACHE COHERENCY DOMAINS
Advantages
Composability
Reduced SW complexity when communicating between agents
Lower barrier to entry when porting software
Note the Hardware Implications …
Hardware coherency support between all HSA agents
Can take many forms
Stand-alone snoop filters / directories
Combined L3/filters
Snoop-based systems (no filter)
Etc …
Slide taken from Ian Bratt’s HSA Queueing, Hot Chips 2013
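One of the forms listed above, a directory (or snoop filter), can be sketched in a few lines of C. This is a simplified illustration with hypothetical names. It assumes 64-byte lines, at most 32 agents, and a directory that covers a single small memory region with one entry per line. The directory records which agents may hold a copy of each line, so a write only needs to snoop those agents instead of broadcasting to every agent.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SHIFT 6u    /* assumed 64-byte cache lines */
#define NUM_LINES  64u   /* hypothetical: directory covers 64 lines = 4 KiB */
#define MAX_AGENTS 32u   /* one bit per agent in the sharer mask */

/* One entry per line: bit a set => agent a may hold a copy of that line. */
typedef struct { uint32_t sharers[NUM_LINES]; } directory;

static unsigned line_of(uint64_t addr) {
    uint64_t line = addr >> LINE_SHIFT;
    assert(line < NUM_LINES);          /* sketch covers a single small region */
    return (unsigned)line;
}

/* A read miss: record the reader as a sharer. */
void dir_read(directory *d, uint64_t addr, unsigned agent) {
    assert(agent < MAX_AGENTS);
    d->sharers[line_of(addr)] |= (uint32_t)1 << agent;
}

/* A write: return the agents that must be snooped (invalidated) and make the
   writer the only holder. An empty mask means no snoop traffic is needed. */
uint32_t dir_write(directory *d, uint64_t addr, unsigned agent) {
    assert(agent < MAX_AGENTS);
    unsigned i = line_of(addr);
    uint32_t self = (uint32_t)1 << agent;
    uint32_t to_invalidate = d->sharers[i] & ~self;
    d->sharers[i] = self;
    return to_invalidate;
}
```

In a snoop-based system with no filter, every write would instead be broadcast to all agents. The filter trades storage for less snoop traffic.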
17. hQ Motivation
1. GPU Dispatch has a lot of overhead
SW/Driver stack overhead
User mode to Kernel mode switch
18. hQ Motivation
2. Master/Slave pattern is limiting (and has a lot of overhead)
CPU schedules work to the GPU
Communication overhead (the GPU reports results back so the CPU can choose the next kernel’s grid size)
Slide from “Introduction to Dynamic Parallelism”, Stephen Jones, NVIDIA Corporation
19. hQ
HSA QUEUING MODEL
User mode queuing for low latency dispatch
Application dispatches directly
No OS or driver in the dispatch path
Architected Queuing Layer
Single compute dispatch path for all hardware
No driver translation, direct to hardware
Allows dispatch to a queue from any agent
CPU or GPU
GPU can spawn its own work
Picture from AMD blog: “hQ: From Master/Slave to Masterpiece”
20. ARCHITECTED QUEUEING LANGUAGE
HSA Queues look just like standard shared memory queues, supporting multi-producer, single-consumer
Support is allowed for single-producer, single-consumer
Queues consist of storage, read/write indices, ID, etc.
Queues are created/destroyed via calls to the HSA runtime
“Packets” are placed in queues directly from user mode, via an architected protocol
Packet format is architected
[Figure: several producers write packets into a queue whose storage is in coherent, shared memory; a single consumer reads them; the queue tracks a read index and a write index.]
Slide Taken from Ian Bratt, HSA QUEUEING, HotChips 2013
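The queue structure described above can be sketched as a ring buffer: storage, a write index that multiple producers advance, and a read index owned by the single consumer. This toy is hypothetical throughout. Packet, UserModeQueue, enqueue and process_one are illustrative names, not the HSA runtime API. A lock stands in for the atomic operations a real producer would use to reserve a slot. An empty slot (None) stands in for an invalid packet header, and doorbell signaling is not modeled.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Packet:
    kernel: Callable[[int], None]  # hypothetical stand-in for a kernel object
    grid_size: int                 # number of work-items to launch


class UserModeQueue:
    """Toy multi-producer, single-consumer ring buffer: storage + read/write indices."""

    def __init__(self, size: int) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots: List[Optional[Packet]] = [None] * size  # None == invalid slot
        self.write_index = 0
        self.read_index = 0
        self._reserve = threading.Lock()  # stands in for an atomic fetch-add/CAS

    def enqueue(self, pkt: Packet) -> bool:
        """Producer side (any agent). Returns False if the queue is full."""
        with self._reserve:
            idx = self.write_index
            if idx - self.read_index >= self.size:
                return False
            self.write_index = idx + 1  # slot idx now belongs to this producer
        self.slots[idx & (self.size - 1)] = pkt  # publish the packet last
        return True

    def process_one(self) -> bool:
        """Consumer side (single packet processor). Returns False if nothing is ready."""
        slot = self.read_index & (self.size - 1)
        pkt = self.slots[slot]
        if pkt is None:  # empty, or reserved but not yet published
            return False
        for work_item_id in range(pkt.grid_size):
            pkt.kernel(work_item_id)
        self.slots[slot] = None  # release the slot before advancing the read index
        self.read_index += 1
        return True


q = UserModeQueue(4)
hits: List[int] = []
q.enqueue(Packet(hits.append, 3))
q.process_one()
print(hits)  # work-item IDs 0, 1, 2
```

Publishing the packet only after its slot is reserved, and clearing the slot before advancing the read index, is what lets producers and the consumer share the buffer with no driver in between.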
22. WHAT IS HSAIL?
HSAIL is the intermediate language for parallel compute in HSA
Generated by a high-level compiler (LLVM, gcc, Java VM, etc.)
Low-level IR, close to machine ISA level
Compiled down to target ISA by an IHV “Finalizer”
Finalizer may execute at run time, install time, or build time
Example: OpenCL™ Compilation Stack using HSAIL
[Figure: high-level compiler flow (developer): OpenCL™ kernel → EDG or CLANG → SPIR → LLVM → HSAIL; finalizer flow (runtime): HSAIL → Finalizer → hardware ISA.]
Slide Taken from Ben Sander’s HSAIL: Portable Compiler IR FOR HSA, HotChips 2013
23. HSAIL INSTRUCTION SET HIGHLIGHTS
“SIMT” – Single Instruction, Multiple Threads
ISA is Scalar, describes one serial thread – Parallelism is done by HW
RISC-Like
Load-store architecture
136 opcodes
Fixed number of registers
1-bit control registers ($c)
Pool of 512 bytes, shared by:
Single (32-bit, $s)
Double (64-bit, $d)
Quad (128-bit, $q)
7 segments of memory
global, read only, group, spill, private, arg, kernarg
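Assume the 512-byte pool is shared by 32-bit single ($s), 64-bit double ($d) and 128-bit quad ($q) registers. Then a kernel's register usage must satisfy 4·s + 8·d + 16·q ≤ 512 bytes. The helper below checks that budget; its names are hypothetical. Even counting every register up to the highest index used in the Sumatra kernel on this slide ($s17 and $d6), that kernel needs 18 × 4 + 7 × 8 = 128 bytes, a quarter of the pool.

```python
POOL_BYTES = 512                       # from the slide: one pool for $s, $d and $q
REG_BYTES = {"s": 4, "d": 8, "q": 16}  # 32-, 64- and 128-bit register widths


def pool_bytes_used(n_s: int, n_d: int, n_q: int) -> int:
    """Bytes of the shared pool consumed by the given register counts."""
    return n_s * REG_BYTES["s"] + n_d * REG_BYTES["d"] + n_q * REG_BYTES["q"]


def fits_in_pool(n_s: int, n_d: int, n_q: int) -> bool:
    """True if the register counts stay within the 512-byte pool."""
    return pool_bytes_used(n_s, n_d, n_q) <= POOL_BYTES


# Upper bound for the example kernel: $s0..$s17 and $d0..$d6.
print(pool_bytes_used(18, 7, 0), fits_in_pool(18, 7, 0))
```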
version 0:95: $full : $large;
// static method HotSpotMethod<Main.lambda$2(Player)>
kernel &run (
    kernarg_u64 %_arg0              // Kernel signature for lambda method
) {
    ld_kernarg_u64 $d6, [%_arg0];   // Move arg to an HSAIL register
    workitemabsid_u32 $s2, 0;       // Read the work-item global "X" coord

    cvt_u64_s32 $d2, $s2;           // Convert X gid to long
    mul_u64 $d2, $d2, 8;            // Adjust index for sizeof ref
    add_u64 $d2, $d2, 24;           // Adjust for actual elements start
    add_u64 $d2, $d2, $d6;          // Add to array ref ptr
    ld_global_u64 $d6, [$d2];       // Load from array element into reg
@L0:
    ld_global_u64 $d0, [$d6 + 120]; // p.getTeam()
    mov_b64 $d3, $d0;
    ld_global_s32 $s3, [$d6 + 40];  // p.getScores()
    cvt_f32_s32 $s16, $s3;
    ld_global_s32 $s0, [$d0 + 24];  // team.getScores()
    cvt_f32_s32 $s17, $s0;
    div_f32 $s16, $s16, $s17;       // p.getScores() / teamScores
    st_global_f32 $s16, [$d6 + 100]; // p.setPctOfTeamScores()
    ret;
};
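The listing above is HSAIL generated by a Sumatra-enabled JVM for the Java lambda named in its comment, Main.lambda$2(Player). The following is a minimal sketch of what each work-item computes. The Player and Team classes and their field names are hypothetical reconstructions from the listing's comments. The constants 24 (offset of the first array element) and 8 (size of one reference) are read off the listing's address arithmetic. The loop in dispatch is a serial stand-in for the hardware launching one work-item per array element. Python floats are 64-bit, so results can differ from the kernel's f32 arithmetic in the low bits, and the sketch assumes a nonzero team score.

```python
from dataclasses import dataclass
from typing import List

ARRAY_HEADER_BYTES = 24  # "add_u64 $d2, $d2, 24": offset of the first element
REF_BYTES = 8            # "mul_u64 $d2, $d2, 8": size of one object reference


def element_address(array_base: int, gid: int) -> int:
    """Byte address of element gid, as computed by the cvt/mul/add/add sequence."""
    return array_base + ARRAY_HEADER_BYTES + gid * REF_BYTES


@dataclass
class Team:    # hypothetical: only the field the kernel reads
    scores: int


@dataclass
class Player:  # hypothetical: the fields the kernel reads or writes
    team: Team
    scores: int
    pct_of_team_scores: float = 0.0


def run_work_item(players: List[Player], gid: int) -> None:
    """Body of one work-item; gid plays the role of workitemabsid_u32."""
    p = players[gid]                                      # ld_global_u64 $d6, [$d2]
    team_scores = float(p.team.scores)                    # p.getTeam(), then team.getScores()
    p.pct_of_team_scores = float(p.scores) / team_scores  # div_f32, then st_global_f32


def dispatch(players: List[Player]) -> None:
    """Serial stand-in for launching one work-item per array element."""
    for gid in range(len(players)):
        run_work_item(players, gid)


team = Team(scores=10)
players = [Player(team, 5), Player(team, 2)]
dispatch(players)
print([p.pct_of_team_scores for p in players])  # [0.5, 0.2]
```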
25. HIGH-LEVEL SOFTWARE STACK
Programming Languages
OpenCL 2.0
C++ AMP
Java (Aparapi/Sumatra)
HSA Runtime (User Mode Driver)
System Query
Access to JIT Compilers
Access to Queues
JIT Compilers
Offline or online (JIT)
LLVM Compiler (LLVM HSAIL)
HSAIL Finalizer (HSAIL BIN)
Kernel Mode Driver
http://www.hsafoundation.com/hsa-developer-tools/
26. HSA OPEN SOURCE SOFTWARE
HSA will feature an open source Linux execution and compilation stack
Allows a single shared implementation for many components
Enables university research and collaboration in all areas
Because it’s the right thing to do
Component (IHV or Common): Rationale
HSA Bolt Library (Common): Enable understanding and debug
HSAIL Code Generator (Common): Enable research
LLVM Contributions (Common): Industry and academic collaboration
HSAIL Assembler (Common): Enable understanding and debug
HSA Runtime (Common): Standardize on a single runtime
HSA Finalizer (IHV): Enable research and debug
HSA Kernel Driver (IHV): For inclusion in Linux distros
Slide Taken from Phil Rogers “Heterogeneous System Architecture Overview”, HotChips 2013
27. JAVA HETEROGENEOUS ENABLEMENT ROADMAP
[Figure: four stages of Java enablement. (1) Application → Aparapi → JVM → OpenCL™ → CPU ISA / GPU ISA. (2) Application → Aparapi → JVM → HSAIL → HSA Finalizer → CPU ISA / GPU ISA on HSA hardware. (3) As in (2), adding an LLVM Optimizer working on IR and the HSA Runtime. (4) Application → Sumatra-enabled JVM → HSAIL → HSA Finalizer → CPU ISA / GPU ISA.]
Slide Taken from Phil Rogers “Heterogeneous System Architecture Overview”, HotChips 2013
29. HSA CHALLENGES – VENDOR SUPPORT
[Figure: HSA Foundation members by tier: Founders, Promoters, Supporters, Contributors, Academic.]
Slide Taken from Phil Rogers HSA Overview, HotChips 2013
Missing some key players:
Intel, NVIDIA, Apple, Microsoft, Google, …
30. HSA CHALLENGES – LANGUAGE SUPPORT
HSAIL (or LLVM) is not an attractive level to code at…
Leverage existing parallel languages/paradigms to exploit HSA features:
C++ AMP
OpenCL 2.0 (done!)
OpenMP
Add your favorite …
Extend popular languages to exploit HSA:
Scripting languages: Python
Web languages: HTML5, RoR, JavaScript, …
Domain-specific languages (DSLs)
31. HSA CHALLENGES – SECURITY
HSA design had some security measures in mind:
Accelerator supports privilege levels, with user and privileged memory
Execute, read and write access are protected by page table entries
Support for fixed-time context scheduling (DoS protection)
But:
Advanced features such as hUMA and uQ are potential back doors
The OS and security applications currently do not monitor the accelerators
Monitoring may require OS changes
The detailed specification can be used to find attack vectors
Some accelerator architectures may introduce security flaws
Example: local memory on the GPU
34. HSA AVAILABILITY
Simulators:
HSAEMU – A full system emulator for HSA platforms
Work done by System SW Lab at NTHU (National Tsing Hua University)
http://hsaemu.org/
Code available on GitHub - https://github.com/SSLAB-HSA/HSAemu
HSAIL Simulator
Code available on GitHub - https://github.com/HSAFoundation/HSAIL-Instruction-Set-Simulator
38. hUMA & Discrete GPUs
hUMA can be extended beyond the SoC to discrete GPUs, given the proper hardware (such as the Hawaii GPU…)
Slide from “IOMMUv2: the Ins and Outs of Heterogeneous GPU use”, AFDS 2012
39. HSAIL AND SPIR
Intended users: HSAIL: compiler developers who want to control their own code generation; SPIR: compiler developers who want a fast path to acceleration across a wide variety of devices.
IR level: HSAIL: low-level, just above the machine instruction set; SPIR: high-level, just below LLVM-IR.
Back-end code generation: HSAIL: thin, fast, robust; SPIR: flexible, can include many optimizations and compiler transformations, including register allocation.
Where are compiler optimizations performed? HSAIL: mostly in the high-level compiler, before HSAIL generation; SPIR: mostly in the back-end code generator, between SPIR and the device machine instruction set.
Registers: HSAIL: fixed-size register pool; SPIR: infinite.
SSA form: HSAIL: no; SPIR: yes.
Binary format: HSAIL: yes; SPIR: yes.
Code generator for LLVM: HSAIL: yes; SPIR: yes.
Back-end device targets: HSAIL: modern GPU architectures supported by members of the HSA Foundation; SPIR: any OpenCL device, including GPUs, CPUs, FPGAs.
Memory model: HSAIL: relaxed consistency with acquire/release, barriers, and fine-grained barriers; SPIR: flexible, can support the OpenCL 1.2 memory model.
Slide Taken from Ben Sander’s HSAIL: Portable Compiler IR FOR HSA, HotChips 2013
41. OPENCL™ AND HSA
HSA is an optimized platform architecture for OpenCL™
Not an alternative to OpenCL™
OpenCL™ on HSA will benefit from
Avoidance of wasteful copies
Low latency dispatch
Improved memory model
Pointers shared between CPU and GPU
OpenCL™ 2.0 shows considerable alignment with HSA
Many HSA member companies are also active with Khronos in the OpenCL™ working group
Slide Taken from Phil Rogers HSA Overview, HotChips 2013
42. BOLT – PARALLEL PRIMITIVES LIBRARY FOR HSA
Easily leverage the inherent power efficiency of GPU computing
Common routines such as scan, sort, reduce, transform
More advanced routines like heterogeneous pipelines
Bolt library works with OpenCL and C++ AMP
Enjoy the unique advantages of the HSA platform
Move the computation, not the data
Finally, a single source code base for the CPU and GPU!
Developers can focus on core algorithms
Bolt version 1.0 for OpenCL and C++ AMP is available now at
https://github.com/HSA-Libraries/Bolt
Slide Taken from Phil Rogers HSA Overview, HotChips 2013
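Of the routines listed, scan is the least obvious to parallelize. The sketch below is not Bolt's implementation. It is a Hillis-Steele inclusive scan, a classic data-parallel formulation in which all updates within one round are independent and could each run as a separate work-item. It performs O(n log n) work, more than a serial scan's O(n), and it assumes that op is associative.

```python
import itertools
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def inclusive_scan(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Hillis-Steele inclusive scan: out[i] = xs[0] op xs[1] op ... op xs[i]."""
    out = list(xs)
    offset = 1
    while offset < len(out):
        prev = out[:]  # each round reads only the previous round's values
        for i in range(offset, len(out)):  # independent updates: one work-item each
            out[i] = op(prev[i - offset], prev[i])
        offset *= 2
    return out


data = [3, 1, 7, 0, 4, 1, 6, 3]
print(inclusive_scan(data, operator.add))  # [3, 4, 11, 11, 15, 16, 22, 25]
# Same result as a serial running sum.
print(inclusive_scan(data, operator.add) == list(itertools.accumulate(data)))  # True
```

Keeping the left operand as the earlier element (prev[i - offset]) preserves the order of operands, so the scan stays correct for associative but non-commutative operators such as string concatenation.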
44. LINES-OF-CODE AND PERFORMANCE FOR DIFFERENT PROGRAMMING MODELS
AMD A10-5800K APU with Radeon™ HD Graphics – CPU: 4 cores, 3800MHz (4200MHz Turbo); GPU: AMD Radeon HD 7660D, 6 compute units, 800MHz; 4GB RAM.
Software – Windows 7 Professional SP1 (64-bit OS); AMD OpenCL™ 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta
[Chart: for an exemplary ISV “Hessian” kernel, lines of code (LOC, 0 to 350, broken down into Init., Compile, Copy, Launch, Algorithm and Copy-back) and performance (0 to 35) for Serial CPU, TBB, Intrinsics+TBB, OpenCL™-C, OpenCL™-C++, C++ AMP and HSA Bolt.]