The document traces the history and evolution of 3D graphics technologies, including OpenGL and DirectX, surveys GPU programming models and architectures, and explains how graphics processing has expanded from rendering 3D scenes to general purpose computing through technologies like CUDA, OpenCL, and DirectCompute. It also outlines why GPUs are now used for high performance computing: a highly parallel architecture and massive floating point throughput. The talk concludes by discussing some key applications of GPU computing beyond just graphics.
This document provides an introduction and overview of GPUs for both 3D graphics and high performance parallel computing. It discusses:
1) How GPUs accelerated the 3D graphics pipeline and enabled real-time rendering of 3D scenes and games.
2) How GPUs are now being used for general purpose computing (GPGPU) due to their highly parallel architecture and ability to handle massive threading. This allows GPUs to accelerate computationally intensive applications beyond just graphics.
3) The advantages of using GPUs for high performance parallel computing applications, including their high floating point performance, inherent parallelism, and ability to provide supercomputing power at a fraction of the cost of traditional CPU-based supercomputers.
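The parallel-advantage argument above rests on loops whose iterations are fully independent, so thousands of GPU threads can each take one element; SAXPY is the textbook case. A minimal CPU-side sketch of that pattern (the CUDA indexing in the comment is illustrative, not code from the document):

```c
#include <stddef.h>

/* SAXPY (y = a*x + y): the canonical data-parallel kernel.
 * Every iteration is independent, so on a GPU each element can be
 * handled by its own thread (e.g. i = blockIdx.x*blockDim.x + threadIdx.x
 * in CUDA); on a CPU the same work runs as a sequential loop. */
void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```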
PEER 1 Offers NVIDIA GPU to Accelerate High Performance Applications
PEER 1 has teamed up with NVIDIA, the creator of the GPU and a world leader in visual computing, to provide high performance GPU cloud applications. NVIDIA's GPUs are well known for making customer software run faster, and PEER 1 offers a number of services that run on NVIDIA's GPUs. PEER 1's cloud service is built on NVIDIA Tesla GPUs, delivering supercomputing performance in the cloud to solve much tougher problems.
Cameron Swen is the Divisional Marketing Manager for AMD’s Embedded Solutions Division. He is responsible for outbound marketing and works with AMDs customers to develop and market board and system level solutions to serve the COTS market.
An FPGA-based Scalable Simulation Accelerator for Tile Architectures @HEART2011 - Shinya Takamaeda-Y
1. An FPGA-based Scalable Simulation Accelerator called ScalableCore is presented for simulating Tile architectures like the M-Core manycore processor.
2. ScalableCore partitions the target processor across multiple FPGAs, with each FPGA representing a "ScalableCore Unit" containing part of the processor. Units are connected via a "ScalableCore Board" to simulate the entire processor faster.
3. An initial ScalableCore system was implemented to simulate the M-Core manycore processor with up to 64 cores distributed across 64 ScalableCore Units/FPGAs. This allows simulation speed to scale with the number of FPGAs used.
Creating next-gen VR and MR experiences using Varjo VR-1 and XR-1 - Unite Cop... - Unity Technologies
The developers of Varjo VR-1 learned a lot about human eye resolution and the demands it puts on virtual reality (VR) content. In these slides, you'll explore what next-generation VR can mean for your VR experiences. Learn about what matters the most when it comes to visual quality, the possible caveats, and the role performance requirements play in this equation.
Speaker:
Mikko Strandborg - Varjo
This document discusses implementing depth of field (DOF) effects on CPUs. It begins with an introduction to DOF and techniques for generating the effect, including traditional methods like Poisson disk and Gaussian blur as well as more advanced summed area table techniques. It then demonstrates a DOF explorer application that allows comparing different DOF techniques on GPUs and with CPU offloading. Performance results are shown for various DOF techniques on Sandy Bridge processors, finding speedups from CPU offloading for advanced techniques. The document aims to showcase techniques for implementing DOF on CPUs and compare their performance to GPU implementations.
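The summed-area table (SAT) technique mentioned above is what makes variable-radius blur affordable: once the table is built, a box filter of any radius costs four lookups, which suits depth of field, where the blur radius varies per pixel. A minimal sketch (function names are illustrative, not the document's code):

```c
#include <stddef.h>

/* Build a summed-area table: sat[y][x] holds the sum of all pixels in
 * the rectangle from (0,0) to (x,y) inclusive. */
void build_sat(const float *img, float *sat, size_t w, size_t h) {
    for (size_t y = 0; y < h; ++y)
        for (size_t x = 0; x < w; ++x) {
            float s = img[y * w + x];
            if (x > 0)          s += sat[y * w + x - 1];
            if (y > 0)          s += sat[(y - 1) * w + x];
            if (x > 0 && y > 0) s -= sat[(y - 1) * w + x - 1];
            sat[y * w + x] = s;
        }
}

/* Sum over the inclusive rectangle (x0,y0)-(x1,y1) in four lookups,
 * independent of the rectangle's size. */
float sat_box_sum(const float *sat, size_t w,
                  size_t x0, size_t y0, size_t x1, size_t y1) {
    float s = sat[y1 * w + x1];
    if (x0 > 0)           s -= sat[y1 * w + x0 - 1];
    if (y0 > 0)           s -= sat[(y0 - 1) * w + x1];
    if (x0 > 0 && y0 > 0) s += sat[(y0 - 1) * w + x0 - 1];
    return s;
}
```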
OpenGL - point & line design
Introduces the construction of display devices (CRT, flat-panel LCD, PDP, projector, ...) and shows how rendering on them is built from basic graphics primitives (points and lines).
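The point-and-line idea can be made concrete with Bresenham's classic line rasterizer: every display surveyed above ultimately lights a grid of addressable points, and a line reduces to deciding which points to light. A sketch against a tiny assumed framebuffer (`fb` and `set_pixel` are illustrative names):

```c
#include <stdlib.h>

#define W 16
#define H 16

/* A tiny framebuffer standing in for the display's grid of points. */
static unsigned char fb[H][W];

static void set_pixel(int x, int y) {
    if (x >= 0 && x < W && y >= 0 && y < H) fb[y][x] = 1;
}

/* Bresenham's line algorithm: rasterizes a line using only integer
 * adds and compares -- the classic "graphics from points" technique. */
void draw_line(int x0, int y0, int x1, int y1) {
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        set_pixel(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}
```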
The document discusses Java™ Platform, Micro Edition Part 8 – Mobile 3D Graphics, which defines the Mobile 3D Graphics API (JSR 184) for creating 3D graphics on Java ME-powered mobile devices. JSR 184 allows developers to load 3D content from files into scene graphs and render them using classes like Graphics3D and World. The API provides both immediate and retained rendering modes as well as tools for creating, loading, and modifying 3D scenes programmatically or from model files.
Performance Evaluation of SAR Image Reconstruction on CPUs and GPUs - Fisnik Kraja
This document summarizes the performance evaluation of synthetic aperture radar (SAR) image reconstruction on CPUs and GPUs. It describes porting the SAR application to NVIDIA CUDA GPUs and compares the performance results on CPUs with 8 and 16 threads and single and dual GPU configurations. Using GPUs provided better performance than CPUs, especially for large-scale images. A heterogeneous CPU+GPU approach improved performance over the CPU-only version by reducing data transfers between the processors. The best results were achieved with a pipelined dual-GPU implementation that reconstructed separate images in parallel to minimize data movement.
This document provides instructions for installing Poser Pro software. It begins with an overview of key new features in Poser Pro such as network rendering capabilities and 64-bit rendering. It then covers installing Poser Pro, including requirements for Windows and Mac systems. Instructions are also provided for installing the optional Queue Manager for managing network rendering jobs. The document concludes with brief descriptions of technical features and capabilities within Poser Pro such as gamma correction, normal mapping, COLLADA import/export, and hosting plug-ins for other 3D software.
The document discusses the evolution of compute APIs from early vendor-specific interfaces like CUDA and CTM to current standards like OpenCL and DirectCompute. It summarizes the key aspects of the first-generation APIs, including their execution model inherited from graphics processing and the caveats developers identified. The document proposes that a second generation of APIs, better suited to hardware designed for compute, will adopt a task-based execution model that maps more directly onto multi-threaded CPU and GPU architectures.
This document provides an overview of multimedia application development capabilities on the Android platform. It discusses the major classes for playing, recording, and manipulating audio and video like MediaPlayer, MediaRecorder, SoundPool, and AudioTrack. It also covers graphics APIs like OpenGL-ES for processing images and textures. The document aims to explain what multimedia features are available in Android and how they can be used to build media consumption and production applications.
The document discusses the development of the video game Ghajini: The Game based on the Bollywood film. It describes the producer's role, sales of 25-30k copies, and collaboration with Intel. It outlines the production cycle and discusses technical decisions like using Photoshop to burn lighting into textures. Key features developed include the 3D menu, comic book cutscenes, AI pathfinding, and integrating profiling tools. Some planned features like ranged weapons were omitted to stay true to the source material. Game balance was achieved by adjusting parameters like enemy health and dodge rates on different difficulty levels.
This document discusses using GPUs for image processing instead of CPUs. It notes that GPUs have much higher peak performance than CPUs, growing from 5,000 triangles/second in 1995 to 350 million triangles/second in 2010. However, GPU programming is more complex than CPUs due to the different architecture and programming model. This can make it harder to implement algorithms on GPUs and to optimize for high efficiency. The document proposes a methodology for GPU acceleration including characterizing algorithms, estimating performance, using models like Roofline to analyze bottlenecks, and benchmarking. It also describes establishing a competence center to help others overcome the challenges of GPU programming.
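The Roofline model mentioned above reduces to a one-line bound: attainable throughput is the lesser of peak compute and memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). A hedged sketch; the numbers in the test are placeholders, not measurements from the document:

```c
/* Roofline model: a kernel is either compute-bound (capped by peak
 * GFLOP/s) or memory-bound (capped by bandwidth * arithmetic
 * intensity), whichever limit is lower. */
double roofline_gflops(double peak_gflops,
                       double bandwidth_gbs,
                       double flops_per_byte) {
    double memory_bound = bandwidth_gbs * flops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

Plotting this bound against arithmetic intensity is what yields the characteristic "roofline" shape used to locate bottlenecks.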
[GDC 2012] Enhancing Graphics in Unreal Engine 3 Titles Using AMD Code Submis... - Owen Wu
This document discusses code submissions to Unreal Engine 3 to enhance graphics capabilities. It covers the addition of Phong tessellation and optimizations for tessellation. It also discusses support for multi-monitor configurations through Eyefinity and improvements to bokeh depth of field and post-process anti-aliasing techniques. The presentation provides information on implementation details and performance comparisons for these techniques.
This document provides an overview and summary of features for a 2D/3D CAD solution. It is a fast, powerful, and compatible solution that is widely applied in mechanical, architectural, electrical, and other fields. It offers high quality, stability, new functions, plug-ins, and APIs. It ensures high productivity, speed, and low cost. The product has a history of compatibility with AutoCAD formats and functions. New exciting features highlighted include dynamic block editing, 3D lofting, DWF underlays, hide/isolate objects, and oriented plug-ins for tasks like polyline booleans, CAD sheet to Excel export, PDF to DXF conversion, and super hatching.
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio - Owen Wu
The document discusses Mali GPU architecture and Arm Mobile Studio. It provides details on Mali GPU components like Bifrost shader cores and tile-based rendering. It also describes features such as index-driven vertex shading, forward pixel kill, and efficient render passes. The document concludes with an overview of the Arm Mobile Studio tools for profiling GPU and CPU performance on mobile devices.
Parallelization Techniques for the 2D Fourier Matched Filtering and Interpola... - Fisnik Kraja
This document summarizes research on parallelization techniques for a 2D Fourier matched filtering and interpolation (2DFMFI) synthetic aperture radar (SAR) algorithm. It describes testing the algorithm on shared-memory and distributed-memory architectures. For shared memory, the algorithm was efficiently parallelized but limited by hardware resources. For distributed memory, communication overhead increased with resources from other nodes. Hybrid MPI+OpenMP implementations improved scalability by reducing communication and memory usage. Pipelining processing steps also improved performance by reducing idle time between images. In conclusion, the goal is finding the right balance between performance, power, size, and heat for different architectures.
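The hybrid MPI+OpenMP idea can be sketched in miniature: each MPI rank owns a contiguous block of rows, and OpenMP threads split that block, so inter-node communication shrinks while on-node cores stay busy. MPI calls are omitted below; the partitioning arithmetic and function name are illustrative assumptions, not the 2DFMFI code:

```c
/* Hybrid-style decomposition in miniature: the rank picks its block of
 * rows, and the OpenMP pragma splits that block across threads.  The
 * pragma is a no-op when compiled without OpenMP, so the function is
 * correct either way. */
void scale_rows(float *data, int rows, int cols,
                int rank, int nranks, float factor) {
    int rows_per_rank = (rows + nranks - 1) / nranks;   /* ceiling div */
    int r0 = rank * rows_per_rank;
    int r1 = r0 + rows_per_rank;
    if (r1 > rows) r1 = rows;                           /* last rank may be short */
    #pragma omp parallel for
    for (int r = r0; r < r1; ++r)
        for (int c = 0; c < cols; ++c)
            data[r * cols + c] *= factor;
}
```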
Google I/O 2013 - Android Graphics Performance - DouO
Engineers from the Android UI Graphics team will show some tips, tricks, tools, and techniques for getting the best performance and smoothest UI for your Android applications.
This document summarizes the specifications of the SNC-DH160 Network HD Mini Dome Camera. The camera has a 1.3 megapixel CMOS sensor, supports 720p HD video recording at 30 fps using H.264 compression, and has built-in infrared illuminators for low-light recording up to 49 feet away. It is IP66 and IK10 rated for outdoor use and has features like motion detection, dual streaming, and Power over Ethernet connectivity.
[Unite Seoul 2020] Mobile Graphics Best Practices for Artists - Owen Wu
This document discusses best practices for mobile graphics optimization in Unity for artists. It covers topics like texturing, geometry, shaders, and frame rendering. For texturing, it recommends techniques like mipmapping, bilinear filtering, texture compression, and channel packing. For geometry, it suggests avoiding small/thin triangles and duplicating vertices while using instancing. For shaders, it discusses precision, early Z-testing, overdraw reduction, and dynamic branching. For frame rendering, it recommends reducing state switches and framebuffer writes/clears.
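The channel-packing recommendation can be illustrated with a small sketch: three single-channel maps share one texture so a single sampler fetch replaces three. The AO/roughness/metallic layout here is an assumed example, not one prescribed by the slides:

```c
#include <stdint.h>

/* Channel packing: store three grayscale maps (here ambient occlusion,
 * roughness, metallic -- an assumed layout) in the R, G and B bytes of
 * one 32-bit texel. */
uint32_t pack_arm(uint8_t ao, uint8_t roughness, uint8_t metallic) {
    return (uint32_t)ao
         | ((uint32_t)roughness << 8)
         | ((uint32_t)metallic  << 16);
}

/* Recover one channel: 0 = R, 1 = G, 2 = B. */
uint8_t unpack_channel(uint32_t texel, int channel) {
    return (uint8_t)(texel >> (8 * channel));
}
```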
This document summarizes the verification methodology landscape. It discusses languages, methodologies, tools and standards used for hardware verification including OVM, VMM, and eRM. It also covers topics like interoperability between methodologies and convergence of approaches.
This document describes a case study of a staged migration from e to SystemVerilog at a company designing SERDES chips. It discusses advantages like reduced risk and training staff in small groups. Technical challenges addressed include coordinating simulation timelines and communicating between testbench parts in different languages. Solutions involved making SVTB the master, writing testcases as if fully converted, and using Verilog to pass info between languages. A proof of concept showed the converted approach. Supporting multiple simulators involved using a tool that connects a Pioneer-based SVTB to DUTs in other simulators to avoid lowest common denominator issues.
The document discusses various metrics used to measure CPU verification progress across architectural verification, microarchitecture verification, formal verification, and system-level verification. It outlines metrics such as functional coverage conditions, bug rates, RTL lines of change, and a health-of-the-model score. Secondary metrics include cycles run, licenses used, and bugs caught at different levels.
The document discusses three major problems in verification: specifying properties to check, specifying the environment, and computational complexity. It then presents several approaches to addressing these problems, including using coverage metrics tailored to detection ability, sequential equivalence checking to avoid testbenches, and "perspective-based verification" using minimal abstract models focused on specific property classes. This allows verification earlier in design when changes are more tractable and catches bugs before implementation.
The document discusses STMicroelectronics' deployment of functional qualification methodologies using Certitude mutation analysis. It outlines ST's initial engagement with Certess in 2004 and how they have expanded usage of the technology to now cover 80% of ST's IPs. The document also provides details on ST's functional qualification methodology, sharing of best practices, detection strategies used, and two case studies on measuring quality of third-party IPs and detecting issues in a video codec design.
1) The document discusses the importance of attitude in validation work, noting that attitude is more important than tools or techniques.
2) It emphasizes that nothing is perfect and all designs have bugs or shortcomings due to compromises, schedules, and unknowns. Accidents are inevitable in engineering work which pushes designs to their limits.
3) The document provides several examples of past engineering failures to illustrate issues like normalization of deviance, unexpected interactions in complex systems, and overreliance on untested assumptions. It stresses the importance of questioning everything, fighting urges to relax requirements, and trusting nothing without proper testing.
This document provides an overview of IBM's mainline functional verification of its POWER7 processor core. It first gives background on the history and roadmap of POWER processors. It then outlines the verification methodology, execution, advances, and concludes with a summary. The POWER7 is IBM's next generation processor that features a multi-core design, on-chip eDRAM, power optimization, and memory subsystem improvements. It follows over 20 years of POWER processors and continues IBM's leadership in this area.
This document discusses the challenges of pre-silicon validation for Intel Xeon processors. It notes that Xeon validation teams have relatively small sizes compared to the scope of validation required. Key challenges include reusing design components from previous projects, managing cross-site teams, and dealing with ever-growing design complexity that strains simulation and formal verification methods. Specific issues involve integrating disparate design tools and environments, understanding the original intent when reusing unfinished code, minimizing duplicated stimulus code, managing the overhead of coverage instrumentation, and ensuring tests are portable between pre-silicon and post-silicon validation.
The document describes Cisco's Base Environment methodology for digital verification. It aims to standardize the verification process, promote reuse, and improve predictability. The methodology defines a common testbench topology and infrastructure that is vertically scalable from unit to system level and horizontally scalable across projects. It provides templates, scripts, verification IP and documentation to help teams set up verification environments quickly and leverage existing best practices. The standardized approach facilitates extensive code and test reuse and delivers benefits such as faster ramp-up times, improved planning, and higher return on verification IP development.
The document discusses various graphics optimization techniques for game development using Unity, including reducing draw calls through batching, profiling to identify bottlenecks, using level of detail for 3D models, image-based lighting, and physically based rendering. It provides examples of optimizing a game by changing from deferred to forward rendering, using GPU skinning, reducing polygon counts, and comparing different lighting approaches. The goal of optimization is to remove bottlenecks by properly analyzing performance using tools like the Unity profiler across different target devices and settings.
Gentek is a middleware and solution for MMOG development that aims to help teams quickly build production lines and products. It provides a mature and stable foundation that reduces technical risks and costs. Key features include graphics, networking, server architecture, tools, gameplay modules, and technical support. Gentek can help reduce schedules by 3-4 times and costs by 2-3 times compared to building games from scratch. It has been used successfully in several published MMOG titles in China.
Video replay: http://nvidia.fullviewmedia.com/siggraph2012/ondemand/SS104.html
Date: Wednesday, August 8, 2012
Time: 11:50 AM - 12:50 PM
Location: SIGGRAPH 2012, Los Angeles
Attend this session to get the most out of OpenGL on NVIDIA Quadro and GeForce GPUs. Learn about the new features in OpenGL 4.3, particularly Compute Shaders. Other topics include bindless graphics; Linux improvements; and how to best use the modern OpenGL graphics pipeline. Learn how your application can benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.
Get OpenGL 4.3 beta drivers for NVIDIA GPUs from http://www.nvidia.com/content/devzone/opengl-driver-4.3.html
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond, by Mark Kilgard
Location: Conference Hall K, Singapore EXPO
Date: Thursday, November 29, 2012
Time: 11:00 AM - 11:50 AM
Presenter: Mark Kilgard (Principal Software Engineer, NVIDIA, Austin, Texas)
Abstract: Attend this session to get the most out of OpenGL on NVIDIA Quadro and GeForce GPUs. Learn about the new features in OpenGL 4.3, particularly Compute Shaders. Other topics include bindless graphics; Linux improvements; and how to best use the modern OpenGL graphics pipeline. Learn how your application can benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.
Topic Areas: Computer Graphics; Development Tools & Libraries; Visualization; Image and Video Processing
Level: Intermediate
Unity Graphics Optimization: How Far Have You Pushed It? (Optimizing Unity Graphics), Unite Seoul ver., by ozlael
This document provides guidance on optimizing graphics in Unity. It discusses common bottlenecks like draw calls and provides techniques for reducing them, such as batching and profiling. Specific optimization techniques covered include image-based lighting, shadow mapping, physically based rendering, and level of detail systems. The document emphasizes identifying and addressing bottlenecks through profiling on target devices.
This document discusses GPU accelerated computing and programming with GPUs. It provides characteristics of GPUs from Nvidia, AMD, and Intel including number of cores, memory size and bandwidth, and power consumption. It also outlines the 7 steps for programming with GPUs which include building and loading a GPU kernel, allocating device memory, transferring data between host and device memory, setting kernel arguments, enqueueing kernel execution, transferring results back, and synchronizing the command queue. The goal is to achieve super parallel execution with GPUs.
This document discusses the implementation of a finite impulse response (FIR) filter on a graphics processing unit (GPU). It outlines how FIR filters can be represented using textures on the GPU and implemented using fragment programs. The performance of FIR filters and related transformations implemented on the GPU is evaluated. Texture upload and download between GPU and main memory accounts for up to 60% of the total processing time. While GPU computation is faster than CPU for these algorithms, optimization techniques from CPU programming do not always apply to the GPU.
The document discusses graphics processing units (GPUs). It begins with an introduction and definition of GPUs as processors designed specifically for processing 3D graphics. It then covers the components of a GPU and compares GPU and CPU architectures. Specifically, it notes that GPUs have many parallel execution units while CPUs have few, and that GPUs have significantly faster memory interfaces than CPUs. The document concludes by noting that GPU development is ongoing and faster GPUs can be expected in the future.
This document discusses a lecture on GPU architecture given by Mark Kilgard at the University of Texas on March 6, 2012. The lecture covers the architecture of graphics processing units and how they have evolved over the past six years. It also includes an in-class quiz, information about homework and projects, and the professor's office hours.
Presented as a pre-conference tutorial at the GPU Technology Conference in San Jose on September 20, 2010.
Learn about NVIDIA's OpenGL 4.1 functionality available now on Fermi-based GPUs.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/tools-for-creating-next-gen-computer-vision-apps-on-snapdragon-a-presentation-from-qualcomm/
Judd Heape, Vice President of Product Management for Camera, Computer Vision and Video Technology at Qualcomm, presents the “Tools for Creating Next-Gen Computer Vision Apps on Snapdragon” tutorial at the May 2022 Embedded Vision Summit.
The Snapdragon Mobile Platform powers the world’s best smartphones, XR headsets, PCs, wearables, cars and IoT products. Thanks to Snapdragon, these products feature powerful computer vision technologies that you can tap into to build next-gen apps. Inside Snapdragon is a hardware engine dedicated to computer vision–the Engine for Visual Analytics (EVA). EVA hardware acceleration gives developers access to high-performance, low-power computer vision functions to enhance apps that rely on advanced camera or video processing.
The EVA includes a motion processing unit, a feature descriptor unit, a depth estimation unit, a geometric correction unit and an object detection unit. These blocks power high-level functions such as electronic image stabilization, multi-frame HDR, face detection and real-time bokeh. In this presentation, Heape does a deep-dive into EVA’s Software Developer Kit (SDK) and available APIs, such as Optical Flow and Depth from Stereo, and explores how these features can be integrated into your apps.
GPUs are dedicated parallel processors that are optimized for accelerating graphical computations. They have many execution units and faster memory interfaces than CPUs in order to process large amounts of graphical data efficiently. The GPU pipeline receives geometry from the CPU as input and provides pictures as output, going through stages like vertex processing, triangle setup, pixel processing, and output merging. GPUs are highly programmable and widely used for applications like gaming, shading, and global illumination. Future advancements may include more processing cores, tighter integration with CPUs, and fully programmable hardware.
PG-Strom - an FDW module utilizing GPU devices, by Kohei KaiGai
PG-Strom is a module that utilizes GPUs to accelerate query processing in PostgreSQL. It uses a foreign data wrapper to push query execution to the GPU. Benchmark results show a query running 10 times faster on a table using the PG-Strom FDW compared to a regular PostgreSQL table. Future plans include supporting writable foreign tables, accelerating sort and aggregate operations using the GPU, and inheritance between regular and foreign tables. Help from the community is needed to review code, provide large real-world datasets, and understand common analytic queries.
Adobe AIR - Mobile Performance – Tips & Tricks, by Mihai Corlan
This document provides an overview and tips for optimizing mobile performance in Adobe AIR applications. It discusses understanding the mobile landscape, choosing between CPU and GPU rendering modes, caching display objects, and general optimization tricks like avoiding memory leaks and heavy code execution. The document also covers Flex considerations and potential bottlenecks to focus on for optimization.
This document discusses Core Image, Apple's framework for processing images and video on iOS and OS X. It provides over 90 filters that can be combined in chains to apply effects like sepia tone, blur, distortion, and more. The framework renders filters efficiently on the GPU. The document demonstrates how to use Core Image filters to build an app called Hipstaroid that applies photo effects to live camera images.
The document provides an overview of graphics processing units (GPUs). It defines a GPU as a processor optimized for graphics, video, and visual computing. GPUs have a highly parallel architecture with thousands of smaller cores designed to handle multiple tasks simultaneously, unlike CPUs which have fewer serial cores. The document compares CPU and GPU architectures, describes the physical components of a GPU including the motherboard, graphics processor, memory, and display connector. It provides details on GPU memory, pipelines, and manufacturers like NVIDIA, AMD, and Intel. The document concludes with information on latest GPU technologies such as CUDA, PhysX, 3D Vision, and examples of high-end consumer GPUs.
The document discusses NVIDIA graphics hardware over seven years, the Cg programming language, and transparency techniques. It describes the evolution of NVIDIA GPUs and features like GeForce cards, increased processing power, and support for DirectX. It promotes Cg as a cross-platform language for GPU programming. It also explains the depth peeling algorithm for rendering transparency in real-time using multiple rendering passes.
The document discusses how shaders are created and validated for graphics processing units (GPUs). Shaders are created by applications and sent to the GPU through graphics APIs and drivers. They are then executed by the GPU's shader processors. The validation process uses layered testbenches at the sub-block, block, and system levels for maximum controllability and observability. It also employs a reference model methodology using C++ models and hardware emulation to debug designs faster than simulation alone. This methodology helps improve the schedule and find bugs earlier in the development cycle.
The document is a presentation on verification of graphics ASICs given by Shaw Yang and Gary Greenstein of AMD. The presentation covers an overview of AMD, GPU systems, 3D graphics basics including vertices, polygons, pixels and textures, verification challenges related to size and complexity, and approaches used including layered code and testbenches, hardware emulation, and functional coverage.
The document discusses the importance of using verification metrics to predict the functional closure of a CPU design project and discusses challenges in relying solely on metrics. It outlines two key types of metrics - verification test plan based metrics that track testing progress and health of the design metrics that assess bug rates and stability. Examples are provided on using bug rate data and breaking bugs down by design unit to help evaluate the progress and health of a verification effort.
The document discusses efficient verification methodology. It recommends defining a conceptual framework or methodology to standardize some aspects while allowing diversity. The methodology should define interfaces and transactions upfront using an interface definition language to generate verification components and reusable assertions. It also recommends modeling systems at the transaction level using executable specifications to frontload the verification schedule.
The document discusses the challenges of validating next generation CPUs. It notes that validation is increasingly critical for product success but requires constant innovation. Design complexity is growing exponentially, requiring up to 70% of resources for functional validation. The number of pre-silicon logic bugs found per generation has also increased significantly. Shorter timelines and cross-site development further complicate the validation process.
The document discusses validation and design in small teams with limited resources. It proposes constraining designs to a single clock rate, using FIFO interfaces between blocks, and separating algorithm from IO verification to simplify validation. This approach allows designs to be completed more quickly with fewer verification engineers through standardized, repeatable validation methods at the cost of optimal performance.
Verification challenges have increased with the globalization of chip design. Time zone differences and documentation issues can reduce efficiency, but greater collaboration across sites can also lead to new ideas. AMD addresses these challenges through a Verification Center of Expertise (COE) that coordinates methodologies across multiple sites. The COE develops tools and techniques while partnering with project teams to jointly improve processes over time through continuous review and rotation of engineers between the COE and projects.
Greg Tierney of Avid presented on their experiences using SystemC for design verification. SystemC provides hardware constructs and simulation capabilities in C++. Avid chose SystemC to enhance their existing C++ verification code and take advantage of its industry acceptance and built-in verification features. SystemC helped Avid solve issues like crossing language boundaries between HDL modules and testbenches, connecting ports and channels, implementing randomization, using multi-threaded processes, and defining module hierarchies. However, Avid also encountered issues with SystemC like slow compile/link times and limitations in its foreign language interface.
Bob Colwell documented notes from a meeting discussing the need for better software visualization tools to help localize bugs, diagnose problems, and monitor software behavior. The notes also reflect on important words in science according to Isaac Newton and reference a book about creative analogies. Finally, they caution against agreeing to sign a document just because a product is shipping.
The document outlines the verification strategy for a PCI-Express presenter device. It discusses the PCI-Express protocol overview including terminology, hierarchy and functions at various layers. It emphasizes the importance of design-for-verification using techniques like modular architectures, standardized interfaces and reference models to aid in functional verification closure and compliance testing. Performance verification is also highlighted as critical given the real-time requirements of the standard.
The document discusses verification strategies for PCI-Express. It outlines the PCI-Express protocol and highlights challenges in verifying chips that implement open standards. The verification paradigm focuses on functionality, performance, interoperability, reusability, scalability, and comprehensiveness using techniques like constrained-random testing, assertions, reference models, emulation, and compliance checkers. The goal is to deliver compliant and high-performing chips with zero bugs through an effective verification methodology.
The document discusses methodologies for improving verification efficiency at Cisco. It advocates separating testbench creation into three stages: component design, testbench integration, and testcase creation. It also recommends using standardized methodologies like testflow to synchronize component behavior, reusing unit-level component models and checkers, linking transactions between checkers, and generating common testbench infrastructure from templates to reduce duplication of effort. The key is pushing reusable behavior into components and standardizing common elements to maximize efficiency.
This document discusses the importance of pre-silicon verification for post-silicon validation. It notes that post-silicon validation schedules are growing due to increasing design complexity, while pre-silicon verification investment and methodologies have not kept pace. The document highlights mixed-signal verification, power-on/reset verification, and design-for-testability verification as key focus areas needed to improve pre-silicon verification and enable faster post-silicon validation. It provides examples of mixed-signal and power-on bugs that were found post-silicon due to insufficient pre-silicon verification of these areas. The document argues that pre-silicon verification must move beyond just functional verification and own mixed-signal effects
This document discusses challenges in low-power design and verification. It addresses why low-power is now a priority given trends in mobile applications. Key challenges include increased leakage due to process scaling, accounting for active leakage, and handling process variations. The document also discusses low-power design methodologies, including multiple power domains, voltage scaling, and clock gating. Verification challenges are presented, such as needing good test patterns and coordination across design domains. Overall power analysis is more complex than timing analysis due to its pattern dependence and need to optimize for performance per watt.
Verilog-AMS allows for mixed-signal modeling and simulation in a single language. It provides benefits like simplified mixed-signal modeling, decreased simulation time, and improved mixed-signal verification. Previous solutions involved using two simulators or approximating analog circuits, which caused issues like slow simulation and lack of analog results. Verilog-AMS uses constructs from Verilog and Verilog-A to model both analog and digital content together. This avoids issues with interface elements between domains.
This document discusses the verification of Intel's Atom processor. It describes the key verification challenges, methodology used, and results. The main challenges were verifying a new microarchitecture with aggressive schedules and limited resources. The methodology involved cluster-level validation, functional coverage, architectural validation, and formal verification. Metrics like coverage, bug rates, and a "health of model" indicator were used. The results showed a successful pre-silicon verification with few escapes and debug/survivability features working as intended. Key learnings included the importance of keeping the full-chip design healthy early and putting equal focus on testability features.
The document discusses verification strategies based on Sun Tzu's classic book "The Art of War". Some key points:
1. Sun Tzu emphasized understanding the objective conditions and subjective opinions of competitors to determine strategic positioning. This relates to verification where it is important to understand the design and "Murphy the Designer".
2. Sun Tzu's 13 chapters provide guidance on tactics like laying plans, attacking weaknesses, maneuvering, and using intelligence sources. These lessons can help verification engineers successfully navigate different stages of a competitive campaign against bugs and errors.
3. Effective verification requires knowing the design, understanding one's own verification process, preparing appropriate tools, and using feedback to improve. Coverage metrics alone do
Here are the key challenges faced in low power design without a common power format:
1. Domain definitions, level shifters, isolation cells, and other low power techniques are specified differently in each tool using tool-specific commands files and languages. This makes cross-tool consistency and validation difficult.
2. Power functionality cannot be easily verified at the RTL level without changing the RTL code, since power domains and low power techniques are not represented. This limits verification coverage.
3. Iteration between design creation and verification is difficult, since changes to the low power implementation require updates to multiple tool-specific specification files rather than a single cross-tool definition. This impacts design schedule and risks inconsistencies.
4.
This document discusses various metrics used to measure the progress and health of CPU verification. It describes architectural verification to ensure implementation meets specifications, as well as unit architecture and system level verification. Key metrics include pass rates for legacy tests, functional coverage, bug rates, lines of code changes, and a health of the model score to measure convergence. Secondary metrics like cycles run, bugs found at different levels, and test bench quality are also outlined.
This document discusses Freescale's verification of the QorIQ communication platform containing the CoreNet fabric using SystemVerilog. It describes the verification challenges, methodology used, and verification IP developed. Key aspects included developing a SystemVerilog testbench, CoreNet VIP, and hierarchical verification. This approach successfully verified the CoreNet platform and resulted in first silicon sampling to customers within 3 weeks with no major functional bugs found.
1. An Introduction to GPU
3D Games to HPC
Krishnaraj Rao
Presented at Bangalore DV Club, 03/12/2010
2. Agenda
3D Graphics
The Big Picture
Quick Overview
Programming Model
Importance of 3D
High Performance Parallel Computing
Why GPUs for HPPC?
Available APIs
GPU Computing architecture
Q&A
3. The Big Picture → Movies
Capture → Model Creation → Scene Creation (API) → Rendering → Post Processing
4. The Big Picture - Games
Capture → Model Creation → Scene Creation (API: HLSL, Cg; Drivers) → Rendering → Post Processing
5. Models end up in World Space
World space includes everything! Position and orientation for all items are needed to accurately calculate transformations into screen space.
[Figure: world coordinate space with X, Y, and Z axes, a light source, a view point (camera), and the screen.]
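The world-to-screen transformation mentioned above can be sketched in a few lines of C. This is a hedged toy example, not code from the talk: it assumes a camera at the origin looking down the -Z axis and a single focal-length parameter, where a real pipeline would use 4x4 matrices and homogeneous coordinates; the names (Vec3, world_to_screen) are invented for illustration.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* Toy perspective projection: divide by distance from the camera.
   'focal' plays the role of the projection matrix's focal length. */
static Vec3 world_to_screen(Vec3 p, float focal) {
    Vec3 s;
    s.x = focal * p.x / -p.z;   /* camera at origin looks down -Z */
    s.y = focal * p.y / -p.z;
    s.z = -p.z;                 /* keep depth for Z-cull / Z-buffer */
    return s;
}
```

A point twice as far from the camera projects to half the screen offset, which is the foreshortening effect the slide's diagram illustrates.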
7. Simple Interactive 3D Graphics App
A simple example: static scene geometry, moving viewer.
Repeat this loop:
- CPU takes user input from joystick or mouse
- CPU re-calculates viewer position, view direction, and light positions in 3-D world space
- GPU clears memory and draws the complete scene geometry with the new viewer and light positions
- Repeat forever
[Diagram: GPU pipeline (Vertex → Setup Engine → Raster Engine with Z-Cull → Texture / Fragment Engine → Raster Ops) alongside the frame loop: Read Joystick → Update Viewer Position and Light Direction → Draw all Scene Objects.]
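The frame loop above can be outlined in C. A minimal sketch with invented names (Viewer, update_viewer, run_frames) and the GPU draw calls elided, since the point is the CPU-side update-then-draw cycle:

```c
#include <assert.h>

/* Illustrative sketch of the interactive loop; not code from the talk. */
typedef struct { float x, z, heading; } Viewer;

/* Stand-in for reading the joystick: dx is lateral input, dz forward.
   The CPU re-calculates viewer position and view direction each frame. */
static void update_viewer(Viewer *v, float dx, float dz) {
    v->x += dx;
    v->z += dz;
    v->heading += dx;
}

/* One frame per iteration; drawing is elided (this is where the GPU
   would clear memory and redraw all scene geometry). */
static Viewer run_frames(Viewer v, int frames, float dx, float dz) {
    for (int i = 0; i < frames; i++) {
        update_viewer(&v, dx, dz);
        /* gpu_clear(); gpu_draw_scene(&v);  -- GPU side, elided */
    }
    return v;
}
```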
8. Adding Programmability to the Graphics Pipeline
[Diagram] A 3D application or game issues 3D API commands through the 3D API (OpenGL or Direct3D). At the CPU-GPU boundary, the GPU command and data stream carries a vertex index stream into the GPU front end. Primitive assembly produces assembled polygons, lines, and points; rasterization and interpolation produce a pixel location stream; raster operations write pixel updates to the framebuffer. Pre-transformed vertices enter a programmable vertex processor, which emits transformed vertices; rasterized pre-transformed fragments enter a programmable fragment processor, which emits transformed fragments.
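To make the idea of a programmable fragment processor concrete, here is a hedged C stand-in: a small function applied independently to every rasterized fragment, the way a fragment program would be. The names and the lighting model (a single per-fragment intensity multiplier) are invented for this sketch.

```c
#include <assert.h>

typedef struct { float r, g, b; } Color;

/* A trivial "fragment shader": modulate a base color by a light
   intensity computed per fragment. On the GPU this function would run
   in parallel across all fragments of a primitive. */
static Color shade_fragment(Color base, float intensity) {
    Color out = { base.r * intensity, base.g * intensity, base.b * intensity };
    return out;
}
```

The vertex processor is programmed the same way, except its inputs and outputs are per-vertex attributes (position, normal, texture coordinates) rather than colors.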
9. A History of Innovation
1995: NV1, 1 million transistors
1999: GeForce 256, 22 million transistors
2002: GeForce4, 63 million transistors
2003: GeForce FX, 130 million transistors
2004: GeForce 6, 222 million transistors
2005: GeForce 7, 302 million transistors
2006-2007: GeForce 8, 754 million transistors
2008: GeForce GTX 200, 1.4 billion transistors
...but what do all these extra transistors do?
10. GPU continues to offload CPU work
[Diagram: the rendering pipeline (Scene Mgmt → Physics and AI → Geom Gather → Geom Proc → Triangle Proc → Pixel Proc → Z/Blend) is shown for 1996, 2000, 2004, and 2008; the CPU-GPU boundary shifts left each generation as the GPU takes over successive stages from the CPU.]
11. Programming Model
- API: a set of functions, procedures, or classes that an OS, library, or service provides to support requests made by computer programs.
- DirectX: a collection of APIs for handling multimedia, especially game programming and video tasks, on Microsoft platforms.
- OpenGL (Open Graphics Library): a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics.
12. Why is 3D Graphics important?
More than just Fun and Games...
[Images: Tokyo, Japan; California coastline]
16. GPU Processing Power
CPU, meet your new partner!
[Chart comparing GPU and CPU processing power; the specific figures did not survive text extraction.]
17. Beyond Graphics
With floating-point math and textures, graphics processors can be used for more than just graphics.
GPGPU = "General Purpose Computing on GPUs"
Lots of ongoing research mapping algorithms and problems onto programmable GPUs:
- Solving Linear Equations
- Black-Scholes Options Pricing
- Rigid- and Soft-Body Dynamics
Middleware layers being developed to accelerate "eye candy" game physics on GPUs (HavokFX)
18. What is GPGPU?
General Purpose computation using GPUs in applications other than 3D graphics.
GPU accelerates the critical path of the application.
Data parallel algorithms leverage GPU attributes:
- Large data arrays, streaming throughput
- Fine-grain SIMD parallelism
- Floating point (FP) computation
Great for "embarrassingly parallel" algorithms.
Applications → see GPGPU.org:
- Game effects (FX) physics, image processing
- Physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting
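The "fine-grain SIMD parallelism" point can be illustrated with SAXPY, a standard data-parallel kernel (not an example from the talk). On a GPU each loop iteration would be its own thread; the serial C loop below is a stand-in whose body is exactly what one such thread would execute:

```c
#include <assert.h>

/* SAXPY: y = a*x + y over large arrays. On a GPU, each element is
   handled by its own thread; here the loop index i plays the role of
   the thread index. */
static void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++) {   /* on a GPU: i = thread index */
        y[i] = a * x[i] + y[i];
    }
}
```

Because no iteration depends on any other, the loop parallelizes trivially, which is what makes such kernels a natural fit for the GPU's streaming throughput.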
19. Why Computation on the GPU?
A quiet buildup of potential:
- Calculation throughput and memory bandwidth: 10X
- Equivalent performance at a fraction of power & cost
- GPU in every PC → pervasive presence and massive impact
- GPUs have always been parallel "multi-core": natively designed to handle massive threading (every pixel is a thread)
- Increased precision (fp32), programmability, flexibility
- GPUs are a mass-market parallel processor: economies of scale
- Peak floating point performance is much higher than comparable CPUs:

                    ATI X1900XT           Intel Core 2 Duo E6600
  Price             ~$400 (video card)    ~$400 (processor only)
  SP float          ~250 GFLOPS           ~40 GFLOPS
  Main memory BW    ~46 GB/s              ~8.5 GB/s
20. Why Computation on the GPU?
Supercomputing performance:
- Inherently parallel architecture: 1000+ cores, massively parallel processing
- 250x the compute performance of a PC
- "One Researcher, One Supercomputer": a Personal Supercomputer in a desktop system that plugs into a standard power strip
Accessible:
- Program in C, C++, or Fortran for Windows or Linux
- Available from OEMs and resellers worldwide, and priced like a workstation
21. Compute Applications
Computational Fluid Dynamics; Computer Aided Engineering; Digital Content Creation; Electronic Design Automation; Finance; Game Physics; Graphics Libraries; Imaging and Computer Vision; Medical Imaging; Numerics; Bio-Informatics and Life Sciences; Computational Chemistry; Computational Electromagnetics & Electrodynamics; Data Mining, Analytics & Databases; MATLAB Acceleration; Molecular Dynamics; Weather, Atmospheric, Ocean Modeling, and Space Sciences; Oil & Gas; Programming Tools; Ray Tracing; Signal Processing; Video & Audio
24. APIs for Heterogeneous Computing
- CUDA (Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. Programmers use 'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. Both low- and high-level APIs are provided.
- OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors.
- Microsoft DirectCompute is an API that supports general-purpose computing on GPUs on Microsoft Windows Vista or Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs.
26. OpenCL: Platform Model & Program Structure
- One Host plus one or more Compute Devices
- Each Compute Device is composed of one or more Compute Units
- Each Compute Unit is further divided into one or more Processing Elements
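The decomposition above can be mirrored in C: OpenCL splits a global range of work-items into work-groups (scheduled onto compute units), and each work-item within a group maps to a processing element. The helper functions below are invented for illustration; they compute what OpenCL's built-in get_group_id and get_local_id would return for a one-dimensional range.

```c
#include <assert.h>

/* Work-group id: which compute unit's chunk this work-item belongs to. */
static int group_id(int global_id, int local_size) {
    return global_id / local_size;
}

/* Local id: which processing element within the work-group. */
static int local_id(int global_id, int local_size) {
    return global_id % local_size;
}
```

For example, with work-groups of 64 items, global work-item 70 is local item 6 of group 1.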
27. CUDA Parallel Computing Architecture
- ISA and hardware compute engine
- Includes a C compiler plus support for OpenCL and DX11 Compute
- Architected to natively support all computational interfaces (standard languages and APIs)
28. Option 1: OpenCL and C for CUDA
- C for CUDA: entry point for developers who prefer high-level C
- OpenCL: entry point for developers who want a low-level API
- Both share a back-end compiler and PTX optimization technology targeting the GPU
29. CUDA Success: Science & Computation
Not 2x or 3x, but speedups of 20x to 150x:
- 146X Medical Imaging (U of Utah)
- 36X Molecular Dynamics (U of Illinois, Urbana)
- 18X Video Transcoding (Elemental Tech)
- 50X Matlab Computing (AccelerEyes)
- 100X Astrophysics (RIKEN)
- 149X Financial simulation (Oxford)
- 47X Linear Algebra (Universidad Jaime)
- 20X 3D Ultrasound (Techniscan)
- 130X Quantum Chemistry (U of Illinois, Urbana)
- 30X Gene Sequencing (U of Maryland)
30. [Chart: performance vs. accessibility]
- Today's workstations: 1x performance
- Supercomputing cluster: 250x faster, $100K - $1M
- Tesla Personal Supercomputer: the same 250x performance at < $10K (100x more affordable), with 20x less power consumption
32. Grand Computing Challenges
- Personalized Medicine
- Mathematics for Scientific Discovery
- Information Data Mining
- Renewable Energy
- Machines That Think
- Natural Human Machine Interaction
- Predict Environmental Changes
- Economic Analysis
33. Final Thoughts
- GPU and heterogeneous parallel architectures will revolutionize computing
- Parallel computing is needed to solve some of the most interesting and important human challenges ahead
- Learning parallel programming is imperative for students in computing and the sciences
34. From Virtua Fighter to Tsubame

                1995 → NV1    2008 → GT200
  Transistors   0.8M          1,200M
  Clock         50 MHz        1.3 GHz
  Memory        1 MByte       4 GBytes
  Performance   0 GFLOPS      1 TFLOPS

Another 1000x in 15 years?
38. OpenGL ES
- Designed for hand-held and embedded devices; the goal is a smaller-footprint version of OpenGL
- PlayStation 3 and the cell phone industry are adopting ES
- OpenGL ES 1.1: strips out anything deemed extra in OpenGL; keeps conventional fixed-function vertex and fragment processing
- OpenGL ES 2.0: adds programmable vertex and fragment shaders (shaders can be specified in binary format); drops support for fixed-function vertex and fragment processing
39. OpenGL ES (cont.)
- OpenGL ES 1.0: Symbian OS, Android platform
- OpenGL ES 1.0+: PlayStation 3
- OpenGL ES 1.1: iPhone SDK, BlackBerry (some models)
- OpenGL ES 2.0: iPhone 3GS, iPod touch