Heterogeneous Systems Architecture (HSA) can optimize algorithms in video games by leveraging general purpose processing on graphics processing units (GPUs). HSA features like a unified address space and low latency will reduce data copying overhead and latency between the CPU and GPU. This will enable more widespread use of GPU processing for tasks beyond graphics like physics simulation, frustum culling, sorting, and asset decompression. HSA may allow dedicating the on-die GPU in accelerated processing units (APUs) to general purpose algorithms to further optimize game performance.
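Frustum culling, one of the tasks named above, is a good example of why these workloads map to GPUs: every object is tested against the view frustum independently. The sketch below is illustrative only, using NumPy's array style as a stand-in for GPU data parallelism; the function and its plane convention are assumptions, not part of any HSA API.

```python
# Illustrative only: data-parallel frustum culling of bounding spheres.
# On an HSA system this per-object test could run on the GPU without a
# copy, since CPU and GPU share one address space; NumPy stands in here
# for the data-parallel style. All names below are hypothetical.
import numpy as np

def cull_spheres(centers, radii, planes):
    """Boolean mask: True where a sphere is at least partly inside every
    plane. Each plane is (nx, ny, nz, d) with the normal pointing in."""
    normals = planes[:, :3]            # (P, 3)
    d = planes[:, 3]                   # (P,)
    # Signed distance of every center to every plane: shape (N, P).
    dist = centers @ normals.T + d
    # A sphere is visible unless it lies fully behind some plane.
    return np.all(dist >= -radii[:, None], axis=1)

centers = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, -5.0]])
radii = np.array([1.0, 1.0])
# A single "near" plane facing +z at z = 0.
planes = np.array([[0.0, 0.0, 1.0, 0.0]])
print(cull_spheres(centers, radii, planes))  # first sphere visible, second culled
```

Because every sphere's test reads shared, read-only plane data and writes one independent result, a GPU can assign one thread per object with no synchronization.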
GPU databases utilize GPUs to accelerate database operations like analytics. In contrast to CPUs which have few cores for serial tasks, GPUs have thousands of cores that enable massively parallel processing better suited for analytics. Popular GPU databases include Kinetica, SqreamDB, and BlazingDB. They can handle larger datasets and more complex queries faster than traditional CPU-based databases by distributing operations across GPU cores. Common uses of GPU databases involve fast data processing, stream analytics, graph processing, and extreme analytics.
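The "distributing operations across GPU cores" claim is essentially a map-reduce over column partitions. The following sketch, with Python threads standing in for GPU cores (an assumption for illustration; real GPU databases use native kernels), shows how a filtered aggregate query can be split and recombined.

```python
# Illustrative sketch of how a GPU database parallelizes a query such as
# SELECT SUM(amount) WHERE amount > 100: partition the column across
# many workers (the GPU cores), reduce each partition independently,
# then combine the partial results. Names here are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk, threshold):
    # Each "core" filters and aggregates only its own partition.
    return sum(x for x in chunk if x > threshold)

def parallel_filtered_sum(column, threshold, workers=4):
    size = (len(column) + workers - 1) // workers
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks, [threshold] * len(chunks))
    return sum(partials)  # final combine step

column = list(range(200))                  # amounts 0..199
print(parallel_filtered_sum(column, 100))  # sums 101..199
```

The same partition-reduce-combine shape underlies most analytics primitives (COUNT, GROUP BY, JOIN probes), which is why they scale across thousands of cores.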
The document discusses parallel computing on the GPU. It outlines the goals of achieving high performance, energy efficiency, functionality, and scalability. It then covers the tentative schedule, which includes introductions to GPU computing, CUDA, threading and memory models, performance, and floating point considerations. It recommends textbooks and notes for further reading. It discusses key concepts like parallelism, latency vs throughput, bandwidth, and how GPUs were designed for throughput rather than latency like CPUs. Winning applications are said to use both CPUs and GPUs, with CPUs for sequential parts and GPUs for parallel parts.
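The closing point, CPUs for sequential parts and GPUs for parallel parts, is quantified by Amdahl's law. A quick calculation (the parallel fractions below are made-up illustrative values, not figures from the document):

```python
# Amdahl's law: if a fraction p of a program parallelizes across n GPU
# cores while the rest stays sequential on the CPU, the overall speedup
# is 1 / ((1 - p) + p / n). The sequential part quickly dominates.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.5, 0.9, 0.99):
    print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 1000):.1f}x")
```

Even with 1000 cores, a program that is only 50% parallel speeds up by barely 2x, which is exactly why "winning applications" keep the CPU busy on the sequential portion rather than relying on the GPU alone.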
A brief technical overview about GPU power consumption and performance, with references to the latest architecture developed by Nvidia: Maxwell and Tegra X1.
Co-Author: Pietro Piscione (https://www.linkedin.com/pub/pietro-piscione/84/b37/926)
The document provides an introduction to GPU computing. It begins with a simple OpenGL program to demonstrate parallelism on the GPU. It then discusses shader programs and lighting effects. Next, it covers using the OpenGL pipeline and GLSL to perform computations by rendering to a framebuffer object. The document introduces compute shaders and compares CUDA and OpenCL programming models. It provides a code example and discusses resources for learning more about GPU computing.
HC-4018, How to make the most of GPU accessible memory, by Paul Blinzer (AMD Developer Central)
The document discusses the challenges of memory access when using a GPU. It describes the programmer's view of memory as a flat address space and how GPUs complicate this model. GPUs have their own memory hierarchies with local memory caches and different types of memory. GPU memory is accessed through specialized APIs that allocate objects like buffers and textures instead of regular malloc memory. This introduces complexity in ensuring coherency between CPU and GPU memory views. The talk will address these memory challenges and how solutions like HSA and hUMA aim to provide a more unified memory model.
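The map/write/unmap discipline that buffer APIs impose, and that HSA and hUMA aim to eliminate, can be caricatured in a few lines. This is a toy model, not any real graphics API: the class and method names are invented for illustration.

```python
# Toy model (NOT a real API) of why discrete-GPU buffers break the flat
# memory view: the CPU must map a buffer, write through the mapping, and
# unmap before the GPU sees a coherent copy. Under HSA/hUMA the CPU and
# GPU share one coherent address space and these copies disappear.
class DeviceBuffer:
    def __init__(self, size):
        self._device = bytearray(size)   # the "GPU-side" storage
        self._staging = None             # CPU-visible mapping, if any

    def map(self):
        # Copy device -> host so the CPU can read/write a snapshot.
        self._staging = bytearray(self._device)
        return self._staging

    def unmap(self):
        # Copy host -> device; only now is the GPU view coherent.
        self._device[:] = self._staging
        self._staging = None

buf = DeviceBuffer(4)
view = buf.map()
view[:] = b"data"
buf.unmap()                    # forget this and the GPU sees stale bytes
print(bytes(buf._device))
```

The two hidden copies in map/unmap are precisely the overhead a unified memory model removes: with coherent shared memory, a plain pointer write is enough.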
A graphics processing unit (GPU) is a microprocessor designed specifically to process graphics. It performs millions of math-intensive operations per second for tasks like 3D rendering. This allows real-time 3D graphics on PCs and game consoles that were previously only available on high-end workstations. A GPU takes the computationally intensive graphics tasks off the CPU to improve performance. It has integrated components like transform and lighting engines that handle 3D graphics processing more efficiently than a general-purpose CPU.
HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael Wootton (AMD Developer Central)
Presentation HC-4020, Enhancing OpenCL performance in AfterShot Pro with HSA, by Michael Wootton at the AMD Developer Summit (APU13), November 11-13, 2013.
This document discusses graphics processing units (GPUs) and GPU rendering. It describes how GPUs assist CPUs in performing complex rendering calculations more quickly. The document outlines some of the latest and most powerful GPU models from AMD and Nvidia, such as the Radeon R9 295X2 and GeForce GTX Titan Z. It also discusses challenges in GPU rendering like realistic lighting simulations and high power consumption. GPU-accelerated computing is increasing performance across many applications by offloading compute-intensive tasks to thousands of GPU cores.
This document provides an overview of graphics processing units (GPUs). It discusses the history and evolution of GPUs, how they work, and their increasing use for general purpose computing beyond just graphics. Specifically, it notes that GPUs were designed for parallel processing of graphics but are now used more broadly due to their high computational power. The document also summarizes key aspects of GPU architecture, programming, applications, and ongoing work to improve GPU computing tools and techniques.
The document provides an overview of graphics processing units (GPUs). It defines a GPU as a processor optimized for graphics, video, and visual computing. GPUs have a highly parallel architecture with thousands of smaller cores designed to handle multiple tasks simultaneously, unlike CPUs which have fewer serial cores. The document compares CPU and GPU architectures, describes the physical components of a GPU including the motherboard, graphics processor, memory, and display connector. It provides details on GPU memory, pipelines, and manufacturers like NVIDIA, AMD, and Intel. The document concludes with information on latest GPU technologies such as CUDA, PhysX, 3D Vision, and examples of high-end consumer GPUs.
This document discusses GPU computing and provides examples of its applications. It summarizes that:
1) GPUs are massively parallel processors that can provide 5-10x higher performance than CPUs for certain tasks like data-intensive computing.
2) Several success stories show speedups of 20-240x using GPUs for applications like EM field simulation, molecular dynamics, and MATLAB simulations.
3) NVIDIA's Professor Partnership program supports academic research by providing GPU equipment, discounts, grants, and research contracts to further GPU computing education and applications.
Nvidia (History, GPU Architecture and New Pascal Architecture), by Saksham Tanwar
A GPU is an electronic circuit that rapidly manipulates memory to accelerate image processing and display. Modern GPUs use parallel processing, rendering each pixel and storing color, location and lighting data. GPUs have dedicated video memory and more cores than CPUs, making them better for processing large blocks of data. The Pascal GPU uses 16nm technology, HBM2 memory, NVLink interconnect and unified memory to improve performance for graphics and deep learning applications like physics simulations and image processing.
This document summarizes the evolution of GPUs and their advantages over CPUs for parallel processing. It discusses how GPUs have become highly parallel, multithreaded processors optimized for graphics and visual computing. The document outlines the key stages in GPU development from basic graphics controllers to massively parallel programmable processors. It provides examples of GPU architectures and how tasks like matrix operations that are well-suited to parallelism can be offloaded from CPUs to GPUs using CUDA to achieve significant performance gains. Test results show GPUs outperforming CPUs for large matrix multiplications, with execution time increasing sublinearly with matrix size for GPUs but superlinearly for CPUs.
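Matrix multiplication offloads well because every output cell is an independent dot product. The sketch below makes that structure explicit; NumPy's vectorized product stands in for the GPU result (an analogy for illustration, not the CUDA implementation the document benchmarks).

```python
# Why matrix multiply parallelizes: each output element C[i][j] is an
# independent dot product, so all n*m of them can be computed at once.
# A GPU assigns one thread (or tile) per cell; here the loop nest just
# exposes that independence, and NumPy checks the result.
import numpy as np

def matmul_naive(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):          # every (i, j) iteration is independent
        for j in range(m):      # of every other: no shared writes
            C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
assert np.allclose(matmul_naive(A, B), np.array(A) @ np.array(B))
print(matmul_naive(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

The CPU's superlinear growth in the reported tests reflects cache misses as matrices outgrow cache, while the GPU hides memory latency by keeping thousands of these independent cells in flight.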
A graphics processing unit (GPU) is a specialized microprocessor that offloads graphics rendering from the central processor. Modern GPUs are highly parallel and efficient at manipulating computer graphics. The top GPUs at the time were the Nvidia GeForce GTX 580 and AMD Radeon HD 5970, with over 500 CUDA cores and 3200 stream processors respectively.
The document discusses the evolution of GPU architecture and capabilities over time. It describes how GPUs have become massively parallel processors with programmable capabilities beyond just graphics. The document outlines the core components of a GPU including the graphics pipeline and programming model. It also discusses how GPUs are well suited for parallel, data-intensive applications and how their capabilities have expanded into general purpose computing through technologies like CUDA.
The document discusses graphics processing units (GPUs). It provides three key points:
1. A GPU is a specialized microprocessor that accelerates the rendering of 2D and 3D graphics. It allows for faster drawing of polygons, textures, and images to improve rendering speed.
2. GPUs were originally developed in the 1990s by Nvidia and helped popularize 3D graphics. Modern GPUs contain hundreds of processing cores and have memory bandwidths up to 230 GB/s.
3. GPUs are used across many devices from embedded systems and mobile phones to PCs and game consoles. They accelerate graphics tasks through parallel processing compared to conventional CPUs.
The document discusses graphics processing units (GPUs) and their role in computer graphics. It explains that GPUs are needed to translate binary data from the CPU into images on a monitor, as CPUs alone cannot keep up with high-end graphics work. It then covers the components of a graphics card, techniques like anti-aliasing and shaders, and APIs like OpenGL and DirectX. Finally, it discusses applications of GPUs in areas like gaming, design, movies, and medicine.
GPUs are specialized processors designed for graphics processing. CUDA (Compute Unified Device Architecture) allows general purpose programming on NVIDIA GPUs. CUDA programs launch kernels across a grid of blocks, with each block containing multiple threads that can cooperate. Threads have unique IDs and can access different memory types including shared, global, and constant memory. Applications that map well to this architecture include physics simulations, image processing, and other data-parallel workloads. The future of CUDA includes more general purpose uses through GPGPU and improvements in virtual memory, size, and cooling.
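The grid/block/thread hierarchy described above boils down to one piece of indexing arithmetic: each thread computes a global index from its block and thread IDs. The pure-Python simulation below mirrors that arithmetic (it is an illustration of the CUDA launch model, not CUDA code; the helper names are invented).

```python
# How a 1D CUDA launch covers an array: each thread computes a global
# index  blockIdx.x * blockDim.x + threadIdx.x  and guards against
# running past the end, since the grid is rounded up to whole blocks.
def launch_kernel_1d(kernel, n, block_dim, *args):
    grid_dim = (n + block_dim - 1) // block_dim   # ceil(n / block_dim)
    for block_idx in range(grid_dim):             # blocks run in parallel
        for thread_idx in range(block_dim):       # threads run in parallel
            i = block_idx * block_dim + thread_idx
            if i < n:                             # bounds guard
                kernel(i, *args)

def saxpy(i, a, x, y, out):
    # The classic data-parallel kernel: out[i] = a * x[i] + y[i].
    out[i] = a * x[i] + y[i]

n = 5
x, y, out = [1.0] * n, [2.0] * n, [0.0] * n
launch_kernel_1d(saxpy, n, 4, 2.0, x, y, out)
print(out)  # [4.0, 4.0, 4.0, 4.0, 4.0]
```

With n = 5 and a block size of 4, the launch creates 2 blocks (8 threads); the `if i < n` guard makes the three surplus threads do nothing, exactly as in a real CUDA kernel.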
A GPU is a specialized microprocessor that accelerates 3D and 2D graphics. The term was coined in 1999 by Nvidia, which marketed the first chip sold as a GPU. GPUs are more efficient at manipulating and displaying computer graphics than CPUs. The GPU pipeline receives geometry from the CPU and produces pictures as output, processing through stages of vertex processing, triangle setup, and pixel processing, and writing to memory interfaces like frame buffers and textures. GPUs are used widely in applications like gaming, CAD tools, and computer graphics due to their powerful graphics processing capabilities.
Computer Vision Powered by Heterogeneous System Architecture (HSA), by Dr. Harris Gasparakis (AMD Developer Central)
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Harris Gasparakis, AMD, at the Embedded Vision Alliance Summit, May 2014.
Harris Gasparakis, Ph.D., is AMD’s OpenCV manager. In addition to enhancing OpenCV with OpenCL acceleration, he is engaged in AMD’s Computer Vision strategic planning, ISVs, and AMD Ventures engagements, including technical leadership and oversight in the AMD Gesture product line. He holds a Ph.D. in theoretical high energy physics from YITP at SUNYSB. He is credited with enabling real-time volumetric visualization and analysis in Radiology Information Systems (Terarecon), including the first commercially available virtual colonoscopy system (Vital Images). He was responsible for cutting edge medical technology (Biosense Webster, Stereotaxis, Boston Scientific), incorporating image and signal processing with AI and robotic control.
This document provides an overview of graphics cards, including:
- A brief history of graphics cards from the 1970s to modern PCs in 1995.
- Explanations that graphics cards are the GPU and contain dedicated RAM for graphics processing.
- The main manufacturers are Nvidia, AMD, and ATI and cards have heatsinks and fans for cooling.
- Running two graphics cards with technologies like SLI and Crossfire can improve performance and extend a system's useful life by splitting the rendering workload.
- Graphics cards allow for benefits like higher framerates, resolutions, and visual effects in games.
PL-4047, Big Data Workload Analysis Using SWAT and IPython Notebooks, by Moni... (AMD Developer Central)
This document provides an overview of using the Synthetic Workload Analysis Toolkit (SWAT) and IPython notebooks to analyze big data workloads. SWAT is a software platform that automates the creation, deployment, execution, and data gathering of synthetic compute workloads on clusters. IPython notebooks can be used to interactively explore system logs gathered by SWAT to identify performance bottlenecks and optimize workloads. Graphs of resource utilization are generated to determine if the system is CPU-bound, disk-bound, or network-bound. This analysis helps tune workloads and characterize systems.
In this presentation from 2012, AMD details the potential benefits developers could leverage from the HSA capabilities of selected silicon to gain additional performance, efficiency, and parallelism in games.
HSA is an open standard architecture that allows CPUs and GPUs to work together more efficiently. It features a unified memory model, low-latency dispatch between devices, and power management between CPUs and GPUs. The HSA Foundation is working to promote HSA and help developers take advantage of heterogeneous computing through tools and standards. HSA aims to make GPUs and other accelerators accessible to more programmers through languages like C++ AMP and provide performance benefits over traditional graphics-focused programming.
Graphics processing units (GPUs) are increasingly being used for general-purpose computing applications due to their highly parallel and programmable nature. GPU computing uses the GPU alongside the CPU in a heterogeneous model, with the sequential CPU portion handling control flow and passing data to the GPU for parallel, compute-intensive work. GPUs have evolved from fixed-function processors into fully programmable parallel processors. Many applications that require large amounts of parallelism and throughput can benefit from offloading work to the GPU. GPU architectures provide a high degree of parallelism through multiple stream processors that can execute the same instructions on different data sets. Software environments like CUDA and OpenCL allow general-purpose programming of GPUs for applications beyond graphics.
A graphics processing unit or GPU (also occasionally called a visual processing unit or VPU) is a specialized microprocessor that offloads and accelerates graphics rendering from the central processor. Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms. In a CPU, only a fraction of the chip performs computation, whereas a GPU devotes more transistors to data processing.
GPGPU is a programming methodology based on modifying algorithms to run on existing GPU hardware for increased performance. Unfortunately, GPGPU programming is significantly more complex than traditional programming for several reasons.
The document discusses graphics processing units (GPUs) and general-purpose GPU (GPGPU) computing. It explains that GPUs were originally designed for computer graphics but can now be used for general computations through GPGPU. The document outlines CUDA and MPI frameworks for programming GPGPU applications and discusses how GPGPU provides highly parallel processing that is much faster than traditional CPUs. Example applications mentioned include molecular dynamics, bioinformatics, and high performance computing.
The document is a project report on Accelerated Processing Units (APUs) written by Neelesh Vaish. It includes an introduction to APUs, which integrate a CPU and GPU on a single die. It then covers 5 chapters that detail APU capabilities, AMD's role in developing the first APU using its Fusion technology, the APU architecture, how software can help, and a conclusion. The document also includes an index and bibliography citing sources of information.
A Survey on GPU Systems Considering Their Performance on Different Applications (CSEIJ)
This document summarizes a survey on GPU systems and their performance on different applications. It discusses how GPUs can be used for general purpose computing due to their high parallel processing capabilities. Several computational intensive applications that achieve speedups when implemented on GPUs are described, including video decoding, matrix multiplication, parallel AES encryption, and password recovery for MS office documents. The GPU architecture and Nvidia's CUDA programming model are also summarized. While GPUs provide significant performance benefits, some limitations for non-graphics applications are noted. The conclusion is that GPUs are a good alternative for computational intensive tasks to reduce CPU load and improve performance compared to CPU-only implementations.
Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture
This document summarizes research on revisiting co-processing techniques for hash joins on coupled CPU-GPU architectures. It discusses three co-processing mechanisms: off-loading, data dividing, and pipelined execution. Off-loading involves assigning entire operators like joins to either the CPU or GPU. Data dividing partitions data between the processors. Pipelined execution aims to schedule workloads adaptively between the CPU and GPU to maximize efficiency on the coupled architecture. The researchers evaluate these approaches for hash join algorithms, which first partition, build hash tables, and probe tables on the input relations.
The Graphics Processing Unit (GPU) is a processor, or electronic chip, for graphics. GPUs are massively parallel processors widely used for 3D graphics and many non-graphics applications. As the demand for graphics applications increases, the GPU has become indispensable. The use of GPUs has now matured to a point where there are countless industrial applications. This paper provides a brief introduction to GPUs, their properties, and their applications. Matthew N. O. Sadiku, Adedamola A. Omotoso and Sarhan M. Musa, "Graphics Processing Unit: An Introduction", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-4, Issue-1, December 2019, URL: https://www.ijtsrd.com/papers/ijtsrd29647.pdf Paper URL: https://www.ijtsrd.com/engineering/electrical-engineering/29647/graphics-processing-unit-an-introduction/matthew-n-o-sadiku
This document provides an overview of graphics processing units (GPUs). It discusses the history and evolution of GPUs, how they work, and their increasing use for general purpose computing beyond just graphics. Specifically, it outlines how GPUs were originally designed to process graphics but are now highly parallel processors that can be used to accelerate complex computations. It also summarizes some of the key components of GPUs and how their performance advantages have led to a growing field of GPU computing.
Game engines have long been at the forefront of taking advantage of the ever-increasing parallel compute power of both CPUs and GPUs. This talk is about how parallel compute is utilized in practice on multiple platforms today in the Frostbite game engine, and how we think the parallel programming models, hardware and software in the industry should look in the next 5 years to help us make the best games possible.
Keynote: Developers: The Heart of AMD Innovation, by Dr. Lisa Su (AMD Developer Central)
Keynote, Developers: The Heart of AMD Innovation, by Dr. Lisa Su, Senior VP and GM, Global Business Units, AMD, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
AMD held a developer summit to share updates on their APU and GPU products. They discussed how computing demands are increasing for gaming, simulations and cloud applications. AMD's APUs combine CPU and GPU capabilities on a single chip. Their newest APU, codenamed "Kaveri", will feature heterogeneous system architecture capabilities. It will offer improved graphics and efficiency over previous APU designs. AMD also unveiled their new Radeon R9 290X GPU and discussed how both products will benefit from lower-level APIs like Mantle.
Graphics cards are probably the most complex piece of personal computers nowadays, not only for their hardware but also for their huge software stack, from drivers to ray-traced games. Bad shaders, buggy drivers, faulty hardware: there are many occasions for a GPU to hang and reset. Given that the programs it runs are Turing-complete, we can't guarantee that everything will always work, so resets are expected. What we can do as developers is ensure that the user experience during those incidents is as smooth as possible. To solve this kind of problem, the whole stack needs to work side by side. This talk will explore some scenarios where GPU resets happen, how different DRM drivers recover from them, and what's on the roadmap to make it better for users.
Image Processing Application on Graphics Processors (CSCJournals)
In this work, we introduce real-time image processing techniques using modern programmable Graphics Processing Units (GPUs). GPUs are SIMD (Single Instruction, Multiple Data) devices that are inherently data-parallel. By utilizing NVIDIA's GPU programming framework, the Compute Unified Device Architecture (CUDA), as a computational resource, we realize significant acceleration in image processing algorithm computations. We show that a range of computer vision algorithms map readily to CUDA with significant performance gains. Specifically, we demonstrate the efficiency of our approach through the parallelization and optimization of image processing, morphology, and image-integral applications.
Graphics Processing Units (GPUs) are specialized electronic circuits designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. GPUs have evolved from simple video controllers to massively parallel multi-core processors capable of general purpose computing. Modern GPUs use a single-instruction, multiple-thread (SIMT) architecture and have a parallel programming model that maps computations to thousands of threads to maximize throughput. Programming frameworks like CUDA and OpenCL allow general purpose programming on GPUs by mapping algorithms to their highly parallel structure.
This document discusses GPU computing and provides comparisons between CPU and GPU architectures and performance. It begins by introducing hybrid clusters that use accelerators like GPUs and FPGAs to provide high-performance computation. GPUs are discussed as being highly parallel and suitable for general-purpose computations. The document then summarizes GPU architecture and programming models like CUDA and OpenCL that are used to program GPUs. It provides an example GPU hardware architecture and explains how programming models map applications to GPU resources. Benchmark results are mentioned as showing GPUs can provide significantly faster computation times than CPUs for parallel problems.
GPGPU algorithms in games
1. GPGPU ALGORITHMS IN GAMES
How Heterogeneous Systems Architecture can be leveraged to optimize algorithms in video games
Matthijs De Smedt
Nixxes Software B.V.
Lead Graphics Programmer
2. CONTENTS
A short introduction
Current usage of GPGPU in games
Heterogeneous Systems Architecture
Examples made possible by HSA
| HSA Algorithms in Games | June 13th, 2012
4. VIDEOGAMES
Games are near real-time simulations
Response time is key
Most systems run in sync with the output frequency
– Rendering 60 frames per second
– Allows for 16ms of processing time
Framerate is limited either by:
– GPU
– CPU
– Display (VSync)
[Frame diagram: CPU: Input → Simulate → Render; GPU: Render]
5. HARDWARE
Typical hardware target for PC games:
– One multicore CPU
– One GPU
Multiple GPUs: CrossFire
– Transparent to the application
– Driver alternates frames between GPUs
GPUs are becoming more general purpose:
– General Purpose GPU algorithms (GPGPU)
7. INTRODUCTION TO GPGPU
Rendering is a sequence of parallel algorithms
GPUs are great at parallel computation
Evolution of hardware and software to general purpose
First GPGPU was accomplished with programmable rendering
– DirectX
– OpenGL
Second generation using dedicated GPGPU APIs:
– CUDA
– OpenCL
– DirectCompute
Third generation of GPGPU on the way:
– Heterogeneous Systems Architecture
8. GPGPU IN GAMES
Some GPGPU algorithms are being used in
games right now. For example:
– Physics
Particles
Fluid simulation
Destruction
– Specialized graphics algorithms
Post-processing
All these algorithms drive visual effects
GPU particle system by Fairlight
9. CURRENT PHYSICS EXAMPLE
GPGPU particle simulation using DirectCompute
Great for simulating thousands of visible particles
Results of simulation are never copied back to CPU
– Cannot interfere with gameplay
– Not synced in networked games
Example: Smoke particles that affect game AI
[Diagram: CPU calls GPU; GPU simulates particles, then renders particles]
10. GPGPU LIMITATIONS
Why isn’t GPGPU used more for non-graphics?
Latency
– DirectX has many layers and buffers
– DirectX commands are buffered up to multiple frames
– Actual execution on the GPU is delayed
Copy overhead
– GPU cannot directly access application memory
– Must copy all data from and to the application
Functionality
– Constrained programming models
12. HETEROGENEOUS SYSTEMS ARCHITECTURE
New hardware and software
Hardware
– New features on discrete GPUs
– Accelerated Processing Unit
Next generation processor
Multiple CPU and GPU cores on the same die
Shared memory access
Soon to be as widespread as multicore CPUs
Software
– "Drivers": HSA provides a new, thin Compute API
Very low latency
Unified Address Space
Exposes more hardware capabilities
– HSA Intermediate Language
Virtual ISA
Introduces CPU programming features to the GPU
13. USING THE APU
Distinction between two hardware configurations
APU without discrete GPU
– Found in many laptops, soon in many desktops
– Use the on-die GPU for rendering
APU with discrete GPU:
– Hard-core gamers will still use discrete GPUs
– Asymmetrical CrossFire
– Or: Dedicate the on-die GPU to Compute algorithms
Could result in massive speedup of algorithms
Using SIMD co-processors to offload the CPU is familiar to PS3 developers
14. COPY OVERHEAD
Current Compute APIs require the application to explicitly copy all input and output memory
– Copying can easily take longer than processing on the CPU!
– Only small datasets or very expensive computations benefit from GPGPU
HSA introduces a Unified Address Space for CPU and GPU memory
– CPU pointers on the GPU
– Virtual memory on the GPU
Paging over PCI-Express (discrete) or shared memory controller (APU)
– Fully coherent
– Will make GPGPU an option for many more algorithms
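The copy-overhead argument above can be written down as a toy cost model (illustrative only; the millisecond values passed in are hypothetical inputs, not measurements from the deck):

```cpp
#include <cassert>

// Toy offload cost model. With explicit-copy Compute APIs, offloading
// wins only if copies plus the kernel beat the CPU time; with a
// unified address space the copy terms drop out, so far smaller
// workloads become worth dispatching to the GPU.
bool worthOffloadingWithCopies(double copyInMs, double kernelMs,
                               double copyOutMs, double cpuMs) {
    return copyInMs + kernelMs + copyOutMs < cpuMs;
}

bool worthOffloadingUnified(double kernelMs, double cpuMs) {
    return kernelMs < cpuMs;
}
```

For example, a 0.5 ms kernel that needs 2 ms of copies in each direction loses to a 3 ms CPU path under the copy model but wins under the unified model.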
15. LATENCY
DirectX commands are buffered
When the GPU is fully loaded this buffer is saturated
The delay between scheduling and executing a GPGPU program on a busy GPU can be multiple frames
– Results will be several frames behind
– Game simulation needs all objects to be in sync
GPGPU is currently impractical to use for anything but visual effects
20. LATENCY
HSA’s new Compute API will reduce latency
How to deal with a saturated GPU?
A second GPU
– Dedicate the APU to Compute
– Virtually no latency
HSA feature: Graphics pre-emption
– Context switching on the GPU
Interrupt a graphics task (typically a large command list)
Execute Compute algorithm
Switch back to graphics
– Can be used on both discrete GPUs and the APU
Choose the solution best suited to your needs
21. APU USAGE EXAMPLE
[Timeline diagram comparing scheduling and execution within a frame: with DirectCompute, work scheduled by the CPU executes on the GPU much later; with HSA, work dispatched by the CPU executes on the APU within the same frame.]
22. PROGRAMMING MODEL
HSA Intermediate Language: HSAIL
Designed for parallel algorithms
JIT compiles your algorithm to CPU or GPU hardware
– Also makes multi-core SIMD programming easy!
High level language features
– Object-oriented programming
– Virtual functions
– Exceptions
Debugging
SysCall support
– I/O
24. PHYSICS
Current GPGPU physics solutions only output to the renderer
With HSA you can simulate physics on the GPU and get the results back in the same frame
Use hardware acceleration to compute physics for gameplay objects
Reduced CPU load
More objects, higher fidelity
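The gameplay-physics case can be pictured with a minimal Euler integration step (an illustrative CPU sketch, not the deck's code). Each particle updates independently, so on a GPU the loop body becomes a kernel with one thread per particle; the point of HSA is that gameplay code could read the updated positions back in the same frame:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle {
    float pos[3];
    float vel[3];
};

// One simulation step: every iteration is independent of the others,
// which is exactly the shape of a data-parallel GPU dispatch.
void integrate(std::vector<Particle>& particles, float dt, float gravity) {
    for (std::size_t i = 0; i < particles.size(); ++i) {
        Particle& p = particles[i];
        p.vel[1] += gravity * dt;          // apply gravity to velocity
        for (int axis = 0; axis < 3; ++axis)
            p.pos[axis] += p.vel[axis] * dt;  // advance position
    }
}
```

With a unified address space, `particles` would live in ordinary system memory readable by both CPU and GPU, so no copy-back step is needed before AI or collision code uses the results.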
25. FRUSTUM CULLING
Videogames tend to be GPU-bound
Avoid rendering what cannot be seen
Cull objects outside the camera viewport
– Test the bounding box of every object against the camera frustum
– Currently done on the CPU
– Lots of vector math
– Can be computed completely in parallel!
CPU needs the results immediately
– HSA will allow low-latency execution
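The per-object test the slide describes can be sketched as follows (a minimal CPU version; all names are illustrative). Each iteration of the outer loop is independent, which is why this is the loop that would be dispatched as a GPU kernel, one work-item per object:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Plane in ax + by + cz + d = 0 form, normal pointing into the frustum.
struct Plane { float a, b, c, d; };

// Axis-aligned bounding box.
struct AABB { float min[3], max[3]; };

// Conservative AABB-vs-frustum test: the box is culled only if it lies
// entirely on the negative side of at least one plane.
bool insideFrustum(const AABB& box, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        // Pick the box corner furthest along the plane normal (the
        // "positive vertex"); if even that corner is behind, the box is out.
        float x = p.a >= 0 ? box.max[0] : box.min[0];
        float y = p.b >= 0 ? box.max[1] : box.min[1];
        float z = p.c >= 0 ? box.max[2] : box.min[2];
        if (p.a * x + p.b * y + p.c * z + p.d < 0)
            return false;
    }
    return true;
}

// Each object tests independently: the GPU-kernel shape of the problem.
std::vector<char> cullAll(const std::vector<AABB>& boxes,
                          const std::array<Plane, 6>& frustum) {
    std::vector<char> visible(boxes.size());
    for (std::size_t i = 0; i < boxes.size(); ++i)
        visible[i] = insideFrustum(boxes[i], frustum);
    return visible;
}
```

The visibility flags are exactly the small result set the CPU "needs immediately", which is why low-latency dispatch matters more here than raw throughput.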
26. OCCLUSION CULLING
Objects may be hidden behind others: occlusion
Final per-pixel occlusion is only known after rendering the scene
Approximate occlusion by rendering low-detail geometry
– This kind of occlusion culling is currently being done on the CPU or on SPUs (software occlusion culling in Battlefield 3)
– Rendering is better suited to GPUs
HSA solution:
– Software rasterization in Compute on the GPU
– Still much faster than a multicore CPU
– HSA does not yet expose the graphics pipeline
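One way to picture the proposed software rasterizer is a tiny occlusion buffer (a simplified sketch with hypothetical names: it rasterizes screen-space rectangles with a conservative depth instead of real low-detail triangle geometry):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Screen-space rectangle with a conservative depth (smaller = closer).
struct Rect { int x0, y0, x1, y1; float depth; };

// Low-resolution software depth buffer: occluders are rasterized in,
// then objects are tested against the stored depths.
class OcclusionBuffer {
public:
    OcclusionBuffer(int w, int h)
        : width_(w), height_(h), depth_(w * h, 1e9f) {}

    // Write an occluder's depth wherever it is nearer than what's stored.
    void drawOccluder(const Rect& r) {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x)
                depth_[y * width_ + x] =
                    std::min(depth_[y * width_ + x], r.depth);
    }

    // An object is occluded only if every covered pixel already holds
    // something strictly nearer than the object's nearest depth.
    bool isOccluded(const Rect& r) const {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x)
                if (depth_[y * width_ + x] >= r.depth)
                    return false;  // at least one pixel may be visible
        return true;
    }

private:
    int width_, height_;
    std::vector<float> depth_;
};
```

Both the occluder rasterization and the per-object tests are embarrassingly parallel, which is why a Compute implementation can beat a multicore CPU even without access to the hardware graphics pipeline.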
27. SORTING
Typically several long lists per frame need sorting
Sorting on the GPU using a parallel sort algorithm
– Ken Batcher: Bitonic or Odd-even mergesort
Copy overhead currently negates the performance
advantage of using a GPU sorting algorithm
HSA solution:
– Unified Address Space
– GPU can sort in-place in system memory
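Batcher's bitonic sort, named on the slide, is built from fixed compare-exchange passes. A sequential C++ sketch of the network (on a GPU, each iteration of the inner `i` loop would be an independent thread; array length must be a power of two):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Batcher's bitonic sort, written as the sequence of data-parallel
// compare-exchange passes a GPU kernel would run.
void bitonicSort(std::vector<int>& a) {
    const std::size_t n = a.size();            // must be a power of two
    for (std::size_t k = 2; k <= n; k *= 2) {          // bitonic sequence size
        for (std::size_t j = k / 2; j > 0; j /= 2) {   // compare distance
            for (std::size_t i = 0; i < n; ++i) {      // one "GPU thread" per i
                std::size_t partner = i ^ j;
                if (partner > i) {
                    // Direction alternates so merged runs form bitonic sequences.
                    bool ascending = (i & k) == 0;
                    if ((a[i] > a[partner]) == ascending)
                        std::swap(a[i], a[partner]);
                }
            }
        }
    }
}
```

With a unified address space, such a kernel could run directly over the list in system memory, avoiding the copy overhead the slide mentions.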
28. ASSET DECOMPRESSION
Game assets are stored compressed on disk
Decompression is expensive
Limited CPU speed rules out some compression algorithms entirely
Games are moving away from loading screens
An APU with Unified Address Space
– Can be used to decompress new assets
without taxing the CPU or discrete GPU
– Perhaps even use HSAIL I/O to read from disk
– A better streaming experience for gamers
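As an illustration of why decompression parallelizes, here is a toy run-length decoder over independent chunks, one worker per chunk (the format and all names are hypothetical, not any engine's asset pipeline; real codecs are far more involved):

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical run-length-encoded chunk: pairs of (count, byte value).
using Chunk = std::vector<std::pair<uint8_t, uint8_t>>;

// Decode one chunk. Chunks are independent of each other, so they can be
// decoded in parallel — e.g. on an APU, without taxing the main CPU cores.
std::vector<uint8_t> decodeChunk(const Chunk& c) {
    std::vector<uint8_t> out;
    for (auto [count, value] : c)
        out.insert(out.end(), count, value); // expand the run
    return out;
}

// Decode all chunks of an asset, one worker per chunk.
std::vector<std::vector<uint8_t>> decodeAsset(const std::vector<Chunk>& chunks) {
    std::vector<std::vector<uint8_t>> out(chunks.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < chunks.size(); ++i)
        workers.emplace_back([&, i] { out[i] = decodeChunk(chunks[i]); });
    for (auto& w : workers) w.join();
    return out;
}
```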
29. PATHFINDING
Some strategy games simulate thousands of units
Pathfinding over complex terrain with thousands of
moving units is very expensive
Clever approximate solutions are often used
– Supreme Commander 2 “Flow field”
GPGPU pathfinding with HSA
– Use one GPU thread per unit to do a deep
search for an optimal path
– With HSA such an algorithm can page all
requisite data from system memory and write
back found paths
– APU could be fully saturated with pathfinding
without impacting framerate
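A minimal sketch of the one-thread-per-unit idea, using a plain breadth-first search on a grid (a simplification: the slide describes a deeper optimal-path search, and all names here are hypothetical):

```cpp
#include <array>
#include <cstddef>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Grid map: 0 = walkable, 1 = blocked.
struct Grid {
    int w, h;
    std::vector<int> cells;
    bool walkable(int x, int y) const {
        return x >= 0 && x < w && y >= 0 && y < h && cells[y * w + x] == 0;
    }
};

// Breadth-first search from (sx, sy) to (tx, ty); returns path length in
// steps, or -1 if unreachable. Stands in for the per-unit path search.
int pathLength(const Grid& g, int sx, int sy, int tx, int ty) {
    std::vector<int> dist(g.w * g.h, -1);
    std::queue<std::pair<int, int>> q;
    dist[sy * g.w + sx] = 0;
    q.push({sx, sy});
    const std::array<std::pair<int, int>, 4> dirs{{{1,0}, {-1,0}, {0,1}, {0,-1}}};
    while (!q.empty()) {
        auto [x, y] = q.front(); q.pop();
        if (x == tx && y == ty) return dist[y * g.w + x];
        for (auto [dx, dy] : dirs) {
            int nx = x + dx, ny = y + dy;
            if (g.walkable(nx, ny) && dist[ny * g.w + nx] < 0) {
                dist[ny * g.w + nx] = dist[y * g.w + x] + 1;
                q.push({nx, ny});
            }
        }
    }
    return -1;
}

// One worker per unit, mirroring "one GPU thread per unit".
// Each unit is {startX, startY, targetX, targetY}.
std::vector<int> findAllPaths(const Grid& g,
                              const std::vector<std::array<int, 4>>& units) {
    std::vector<int> out(units.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < units.size(); ++i)
        workers.emplace_back([&, i] {
            out[i] = pathLength(g, units[i][0], units[i][1],
                                units[i][2], units[i][3]);
        });
    for (auto& w : workers) w.join();
    return out;
}
```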
30. CONCLUSION
Many algorithms in games are suitable for offloading to the GPU
Heterogeneous Systems Architecture solves two major obstacles
– Latency
– Memory access
HSAIL allows for entirely new kinds of GPGPU programs
APUs can be used to offload the CPU
HSA will finally make GPUs available to developers as full-featured co-processors
31. THANK YOU
Any questions?
33. Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied.