This document discusses using SPURS tasks on the Cell Broadband Engine to offload work from the PowerPC Processor Element (PPE) to the Synergistic Processor Elements (SPEs). It provides an example of using a SPURS task to parallelize an object update method across SPEs. By DMA transferring the object data to SPE local stores, executing the update method on each SPE, and transferring the data back, a 5x speedup was achieved over executing the method on the PPE alone. Further optimization could be gained by overlapping the DMA transfers with computation.
PlayStation®3 Leads Stereoscopic 3D Entertainment World Slide_N
PlayStation 3 leads the stereoscopic 3D entertainment world as every PS3 supports S3D. Games are ideal for S3D as they are immersive and don't mind wearing glasses. While 2.5D/3D chips could improve rendering performance for better S3D, challenges include cost, testing, reliability, and supporting multiple suppliers. Future S3D technologies may include personal 3D displays, head mounted displays, motion tracking, and interactive 3D experiences.
Optimization for Making Stereoscopic 3D Games on PlayStation® (PS3™)Slide_N
This document provides an overview of optimization techniques for developing stereoscopic 3D games on the PlayStation 3. It discusses how stereoscopic 3D works, including setting up dual cameras and managing parallax. Implementation details for the PS3 are covered, such as frame packing and APIs for enabling 3D. Technical considerations like performance overhead from rendering scenes twice are addressed. Optimization strategies like using lower resolutions, state caching, and view-independent rendering are presented.
Basic Optimization and Unity Tips & Tricks by Yogie AdityagamelanYK
This document provides tips and tricks for basic optimization in Unity. It discusses optimization at various stages of development from creation through polishing. It emphasizes the roles of different team members like programmers, artists, and designers. Technical optimization strategies covered include reducing draw calls, batching, culling, using object pools, avoiding Find, baking, and optimizing textures, audio, and code. The document also provides tips for using Unity tools like the profiler and execution order system to optimize games.
Gamebryo LightSpeed provides improved runtime performance, a modular game framework, entity modeling tools, Lua scripting and debugging, and rapid iteration capabilities. New features include deferred lighting for improved rendering, a decoration system for terrain customization, terrain streaming for unlimited map sizes, and an enhanced water editor. It offers an integrated development environment with tools in Visual Studio, 3DS Max, and the Toolbench plugin suite.
This document discusses the history and architecture of 3D game engines. It covers engines from id Software like RAGE and id Tech, as well as Unreal Engine. It describes the core components of a 3D game engine including scene management, rendering modules, and material systems. It also discusses cross-platform solutions, techniques for minimizing draw calls, and specific technologies like parallel-split shadow maps and deferred rendering. The presentation concludes with a discussion of Unity and questions from the audience.
GS-4093, "AstoundSound for Gaming – The next dimension in the evolution of Au...AMD Developer Central
Presentation, GS-4093, "AstoundSound for Gaming – The next dimension in the evolution of Audio!" by Jerry Mahabub and Michel Henein at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
This document discusses game engines and their components. A game engine is the core software that powers games and interactive applications. It provides underlying technologies like graphics rendering, physics simulation, sound, scripting, animation, artificial intelligence, and networking. Popular game engines include Unreal Engine, CryEngine, Unity, and Gamebryo. Game engines aim to simplify development and allow games to run on multiple platforms by providing tools and technologies that developers would otherwise have to program themselves.
Creating the Game Output Design discusses creating the visual design and deciding the output parameters for a game. It covers selecting 2D or 3D graphics and design components like bitmaps, sprites, textures and lighting. The document also discusses user interface layout, including diegetic/nondiegetic and spatial/meta components as well as common UI elements like menus, heads-up displays, and buttons. It emphasizes choosing output parameters based on the rendering engine, resolutions, and compression techniques used.
PlayStation®3 Leads Stereoscopic 3D Entertainment World Slide_N
PlayStation 3 leads the stereoscopic 3D entertainment world as every PS3 supports S3D. Games are ideal for S3D as they are immersive and don't mind wearing glasses. While 2.5D/3D chips could improve rendering performance for better S3D, challenges include cost, testing, reliability, and supporting multiple suppliers. Future S3D technologies may include personal 3D displays, head mounted displays, motion tracking, and interactive 3D experiences.
Optimization for Making Stereoscopic 3D Games on PlayStation® (PS3™)Slide_N
This document provides an overview of optimization techniques for developing stereoscopic 3D games on the PlayStation 3. It discusses how stereoscopic 3D works, including setting up dual cameras and managing parallax. Implementation details for the PS3 are covered, such as frame packing and APIs for enabling 3D. Technical considerations like performance overhead from rendering scenes twice are addressed. Optimization strategies like using lower resolutions, state caching, and view-independent rendering are presented.
Basic Optimization and Unity Tips & Tricks by Yogie AdityagamelanYK
This document provides tips and tricks for basic optimization in Unity. It discusses optimization at various stages of development from creation through polishing. It emphasizes the roles of different team members like programmers, artists, and designers. Technical optimization strategies covered include reducing draw calls, batching, culling, using object pools, avoiding Find, baking, and optimizing textures, audio, and code. The document also provides tips for using Unity tools like the profiler and execution order system to optimize games.
Gamebryo LightSpeed provides improved runtime performance, a modular game framework, entity modeling tools, Lua scripting and debugging, and rapid iteration capabilities. New features include deferred lighting for improved rendering, a decoration system for terrain customization, terrain streaming for unlimited map sizes, and an enhanced water editor. It offers an integrated development environment with tools in Visual Studio, 3DS Max, and the Toolbench plugin suite.
This document discusses the history and architecture of 3D game engines. It covers engines from id Software like RAGE and id Tech, as well as Unreal Engine. It describes the core components of a 3D game engine including scene management, rendering modules, and material systems. It also discusses cross-platform solutions, techniques for minimizing draw calls, and specific technologies like parallel-split shadow maps and deferred rendering. The presentation concludes with a discussion of Unity and questions from the audience.
GS-4093, "AstoundSound for Gaming – The next dimension in the evolution of Au...AMD Developer Central
Presentation, GS-4093, "AstoundSound for Gaming – The next dimension in the evolution of Audio!" by Jerry Mahabub and Michel Henein at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
This document discusses game engines and their components. A game engine is the core software that powers games and interactive applications. It provides underlying technologies like graphics rendering, physics simulation, sound, scripting, animation, artificial intelligence, and networking. Popular game engines include Unreal Engine, CryEngine, Unity, and Gamebryo. Game engines aim to simplify development and allow games to run on multiple platforms by providing tools and technologies that developers would otherwise have to program themselves.
Creating the Game Output Design discusses creating the visual design and deciding the output parameters for a game. It covers selecting 2D or 3D graphics and design components like bitmaps, sprites, textures and lighting. The document also discusses user interface layout, including diegetic/nondiegetic and spatial/meta components as well as common UI elements like menus, heads-up displays, and buttons. It emphasizes choosing output parameters based on the rendering engine, resolutions, and compression techniques used.
Introduction into Procedural Content Generation by Yogie AdityagamelanYK
Procedural content generation (PCG) refers to automatically creating game content through algorithms rather than human design. PCG aims to generate a large variety of content from a few parameters. It is not completely random, but uses techniques from AI, math, and other fields to structure the randomness. PCG offers opportunities like high content diversity, faster creation, lower costs, and adaptive gameplay, but also faces challenges like ensuring variety, aesthetics, and multiplayer compatibility. Successful PCG combines domain knowledge, AI techniques, structured randomness, and other specialized algorithms.
Qualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing CamerasQualcomm Developer Network
Smart Terrain is a feature of the Qualcomm Vuforia SDK that allows devices with depth sensing cameras to reconstruct environments and interact with objects and surfaces. It provides advantages over standard cameras like automatic initialization and faster, richer results without requiring motion. Smart Terrain uses infrared depth sensors to project infrared light, fuse the RGB image with a depth map, and reconstruct surfaces and objects in the environment. Developers can utilize the Smart Terrain API for consistent lifecycle management and event handling across depth sensing devices.
Game engines provide a software framework for developing games across multiple platforms. They include components like a rendering engine for graphics, a physics engine, tools for in-game sound, scripting, artificial intelligence, networking and more. Popular commercial game engines like Unity and Unreal Engine are used widely by developers to build games efficiently. The rendering engine converts 3D models to 2D images using the GPU, while the physics engine simulates real-world phenomena through calculations from the CPU. Scripting is also important for programming game logic and artificial intelligence. Overall, game engines aim to deliver a unified development platform for creators.
The document discusses various technologies used for graphics rendering, including pixels, resolution, frame rate, GPUs, rendering, anti-aliasing, ambient occlusion, high dynamic range rendering, anisotropic filtering, PhysX, motion blur, depth of field, vertical sync, bloom, bump mapping, particle systems, and crepuscular rays. It provides examples of these techniques and how they are used to produce more realistic computer graphics images, especially in video games. Future areas that may improve graphics are also mentioned like parallel processing, virtual reality headsets, and higher resolution displays.
This document discusses identifying and managing game requirements. It covers identifying basic requirements such as input devices like controllers, keyboards, and motion sensors and output devices like displays, speakers. It also covers managing performance requirements including platform memory needs, graphics like resolution, and networking architecture. The key aspects of networking like TCP, UDP, and web services are also introduced.
The document discusses various types of output devices used by computers. It describes visual display units (VDUs or monitors), printers, plotters, and speakers. It provides details on different types of printers like dot matrix, inkjet, daisy wheel, and laser printers. It explains that output devices display, print, or transmit the results of processing from the computer's memory. Monitors can display graphics, text, and video, while printers provide hard copies in various speeds and qualities. Plotters are useful for engineering drawings and produce high quality outputs. Speakers convert electrical signals to sound.
This document discusses designing specific game components such as game states, objects, characters, and physics-based animations. It covers creating gameflow with challenges and pace, scripted events and training areas. It also discusses managing game performance through scene hierarchy, frame rate, and the graphics pipeline. Game loops, transforming and animating objects, and creating realistic characters through lighting, shaders, and projections are also summarized.
Output devices such as monitors, printers, and speakers are discussed. Monitors can be CRT or TFT, with TFT being newer, smaller, and quieter than CRT but with potentially less quality. Printers include laser, inkjet, dot matrix, and plotters. Laser printers are fast and high quality but expensive, while inkjet printers are cheaper but slower. Dot matrix printers can make carbon copies but are low quality and loud. Plotters can print on large paper but are slow. Speakers help with using computers but can disturb others, while lights provide status signals.
The HP Envy 27 display is a 4K monitor with a micro-edge IPS panel that provides high resolution, accurate colors, and versatile USB-C connectivity. It features 3840x2160 resolution, 100% sRGB color accuracy, and AMD FreeSync technology to eliminate tearing. The USB-C port allows charging of devices and powers the PC with up to 60 watts while viewing content on the large 27-inch, borderless screen.
Computer output devices include monitors, graphic plotters, and printers. Monitors display information and come in CRT and flat panel varieties. Printers produce hard copies and are either impact printers that use physical contact or non-impact printers like laser and inkjet that do not. Impact printers include dot matrix and line printers while non-impact printers offer faster speeds and higher quality output.
The Control Panel allows users to view and modify system settings and controls. It includes control panels that manage hardware and software settings, such as display, keyboard, mouse, date and time, power options, and fonts. Users can access the Control Panel through the Start menu by selecting Control Panel. It contains categories for managing appearance, network connections, programs, user accounts, and other computer settings.
This document discusses various types of output devices used with computers. It describes CRT monitors, TFT monitors, laser printers, inkjet printers, 3D inkjet printers, dot matrix printers, plotters, speakers, and multimedia projectors. For each device type, it provides details on their uses, advantages, and disadvantages.
The console uses a dual-core 2.1GHz processor and relies on cloud processing to reduce the need for high-end local processing. It has 3GB of RAM, a 720P 3D display, Wi-Fi and optical audio connectivity, a 166MHz graphics chip, and connects to a 220V external power supply via HDMI. Interaction is solely through controllers without additional sensors.
The document discusses various aspects of developing a game user interface (UI) in XNA, including loading and managing UI assets, configuring audio/video, detecting player input, creating menus and save-load screens, defining UI states, and programming UI controls. It provides code samples for loading assets, checking keyboard/gamepad input, creating a custom menu component, and making a checkbox UI control. The overall aim is to explain how to design and program the interactive elements that allow players to interact with a game.
An output device is any device that sends data from a computer to another device or user. Output devices relay the computer's response visually. There are two main types of output devices: display devices and printers. Display devices like monitors use technologies like LCD, LED, and plasma to output visual information. Printers output information onto paper and can be either impact printers that use ink ribbons, like dot matrix printers, or non-impact printers like inkjet and laser printers.
This document provides information about a television program called "Soccer Saturday" that discusses live soccer games. There are animated captions on the right side of the screen displaying the soccer league tables that change every 10 seconds. At the bottom, animated captions display scores and game updates. Techniques used include overlaying the tables/scores, using video clips, appropriate colors, and continually moving text. The graphics are clear and easy to read with no blurring or distortion.
Batman: The Telltale Series is an episodic point-and-click adventure game released in 2016. The player controls Batman and Bruce Wayne across 5 episodes, making choices that impact the story. It uses cel-shaded graphics to emulate the visual style of comic books. The game focuses on narrative and player choice over traditional gameplay mechanics or difficulty settings. Quick time events are used during action sequences to advance the story. The game aims to immerse players in the world of Batman through its interactive narrative and faithfulness to the comic book aesthetic.
This document discusses various computer output devices. It describes speakers that convert output data into sound, monitors that receive signals from a video card and provide a graphical or textual display, and printers that create images on paper or other media using technologies like ink transfer or heat transfer. The document discusses different types of printers like laser printers, inkjet printers, and dot matrix printers, and how they are classified by characteristics like quality, speed, whether they are impact or non-impact. It also briefly mentions other output devices like headphones and projectors.
Gaming consoles are dedicated systems optimized for game playing. Early consoles included the Magnavox Odyssey in 1972 and the Atari Pong home version in 1975. Common console components are controllers, a power supply, core unit with CPU and RAM, game media like cartridges or discs, and sometimes memory cards. Popular consoles include the PlayStation, Xbox, Wii, Nintendo DS, and PSP. Each has distinct controllers and hardware specifications. Portable handheld gaming devices like the Game Boy also emerged as a popular form of gaming.
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)byteLAKE
byteLAKE's presentation from the PPAM 2019 conference.
Abstract:
The goal of this work is to adapt 4 CFD kernels to the Xilinx ALVEO U250 FPGA, including first-order step of the non-linear iterative upwind advection MPDATA schemes (non-oscillatory forward in time), the divergence part of the matrix-free linear operator formulation in the iterative Krylov scheme, tridiagonal Thomas algorithm for vertical matrix inversion inside preconditioner for the iterative solver, and computation of the psuedovelocity for the second pass of upwind algorithm in MPDATA. All the kernels use 3-dimensional compute domain consisted from 7 to 11 arrays. Since all kernels belong to the group of memory bound algorithms, our main challenge is to provide the highest utilization of global memory bandwidth. Our adaptation allows us to reduce the execution time upto 4x.
Find out more at: www.byteLAKE.com/en/CFD
Foot note:
This is the presentation about the non-AI version of byteLAKE's CFD kernels, highly optimized for Alveo FPGA. Based on this research project and many others in the CFD space, we decided to shift the course of the CFD Suite product development and leverage AI to accelerate computations and enable new possibilities. Instead of adapting CFD solvers to accelerators, we use AI and work on a cross-platform solution. More on the latest: www.byteLAKE.com/en/CFDSuite.
-
Update for 2020: byteLAKE is currently developing CFD Suite as AI for CFD Suite, a collection of AI/ Artificial Intelligence Models to accelerate and enable new features for CFD simulations. It is a cross-platform solution (not only for FPGAs). More: www.byteLAKE.com/en/CFDSuite.
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Community
This document discusses modeling and predicting performance in Ceph distributed storage systems. It provides an overview of Ceph, including its object storage, block storage, and file system capabilities. It then discusses various factors that impact Ceph performance, such as network configuration, storage node hardware, number of disks, caching, redundancy settings, and placement groups. The document notes there are many configuration choices and tradeoffs to consider when designing a Ceph cluster to meet performance requirements.
Introduction into Procedural Content Generation by Yogie AdityagamelanYK
Procedural content generation (PCG) refers to automatically creating game content through algorithms rather than human design. PCG aims to generate a large variety of content from a few parameters. It is not completely random, but uses techniques from AI, math, and other fields to structure the randomness. PCG offers opportunities like high content diversity, faster creation, lower costs, and adaptive gameplay, but also faces challenges like ensuring variety, aesthetics, and multiplayer compatibility. Successful PCG combines domain knowledge, AI techniques, structured randomness, and other specialized algorithms.
Qualcomm Vuforia Mobile Vision Platform: Smart Terrain for Depth-Sensing CamerasQualcomm Developer Network
Smart Terrain is a feature of the Qualcomm Vuforia SDK that allows devices with depth sensing cameras to reconstruct environments and interact with objects and surfaces. It provides advantages over standard cameras like automatic initialization and faster, richer results without requiring motion. Smart Terrain uses infrared depth sensors to project infrared light, fuse the RGB image with a depth map, and reconstruct surfaces and objects in the environment. Developers can utilize the Smart Terrain API for consistent lifecycle management and event handling across depth sensing devices.
Game engines provide a software framework for developing games across multiple platforms. They include components like a rendering engine for graphics, a physics engine, tools for in-game sound, scripting, artificial intelligence, networking and more. Popular commercial game engines like Unity and Unreal Engine are used widely by developers to build games efficiently. The rendering engine converts 3D models to 2D images using the GPU, while the physics engine simulates real-world phenomena through calculations from the CPU. Scripting is also important for programming game logic and artificial intelligence. Overall, game engines aim to deliver a unified development platform for creators.
The document discusses various technologies used for graphics rendering, including pixels, resolution, frame rate, GPUs, rendering, anti-aliasing, ambient occlusion, high dynamic range rendering, anisotropic filtering, PhysX, motion blur, depth of field, vertical sync, bloom, bump mapping, particle systems, and crepuscular rays. It provides examples of these techniques and how they are used to produce more realistic computer graphics images, especially in video games. Future areas that may improve graphics are also mentioned like parallel processing, virtual reality headsets, and higher resolution displays.
This document discusses identifying and managing game requirements. It covers identifying basic requirements such as input devices like controllers, keyboards, and motion sensors and output devices like displays, speakers. It also covers managing performance requirements including platform memory needs, graphics like resolution, and networking architecture. The key aspects of networking like TCP, UDP, and web services are also introduced.
The document discusses various types of output devices used by computers. It describes visual display units (VDUs or monitors), printers, plotters, and speakers. It provides details on different types of printers like dot matrix, inkjet, daisy wheel, and laser printers. It explains that output devices display, print, or transmit the results of processing from the computer's memory. Monitors can display graphics, text, and video, while printers provide hard copies in various speeds and qualities. Plotters are useful for engineering drawings and produce high quality outputs. Speakers convert electrical signals to sound.
This document discusses designing specific game components such as game states, objects, characters, and physics-based animations. It covers creating gameflow with challenges and pace, scripted events and training areas. It also discusses managing game performance through scene hierarchy, frame rate, and the graphics pipeline. Game loops, transforming and animating objects, and creating realistic characters through lighting, shaders, and projections are also summarized.
Output devices such as monitors, printers, and speakers are discussed. Monitors can be CRT or TFT, with TFT being newer, smaller, and quieter than CRT but with potentially less quality. Printers include laser, inkjet, dot matrix, and plotters. Laser printers are fast and high quality but expensive, while inkjet printers are cheaper but slower. Dot matrix printers can make carbon copies but are low quality and loud. Plotters can print on large paper but are slow. Speakers help with using computers but can disturb others, while lights provide status signals.
The HP Envy 27 display is a 4K monitor with a micro-edge IPS panel that provides high resolution, accurate colors, and versatile USB-C connectivity. It features 3840x2160 resolution, 100% sRGB color accuracy, and AMD FreeSync technology to eliminate tearing. The USB-C port allows charging of devices and powers the PC with up to 60 watts while viewing content on the large 27-inch, borderless screen.
Computer output devices include monitors, graphic plotters, and printers. Monitors display information and come in CRT and flat panel varieties. Printers produce hard copies and are either impact printers that use physical contact or non-impact printers like laser and inkjet that do not. Impact printers include dot matrix and line printers while non-impact printers offer faster speeds and higher quality output.
The Control Panel allows users to view and modify system settings and controls. It includes control panels that manage hardware and software settings, such as display, keyboard, mouse, date and time, power options, and fonts. Users can access the Control Panel through the Start menu by selecting Control Panel. It contains categories for managing appearance, network connections, programs, user accounts, and other computer settings.
This document discusses various types of output devices used with computers. It describes CRT monitors, TFT monitors, laser printers, inkjet printers, 3D inkjet printers, dot matrix printers, plotters, speakers, and multimedia projectors. For each device type, it provides details on their uses, advantages, and disadvantages.
The console uses a dual-core 2.1GHz processor and relies on cloud processing to reduce the need for high-end local processing. It has 3GB of RAM, a 720P 3D display, Wi-Fi and optical audio connectivity, a 166MHz graphics chip, and connects to a 220V external power supply via HDMI. Interaction is solely through controllers without additional sensors.
The document discusses various aspects of developing a game user interface (UI) in XNA, including loading and managing UI assets, configuring audio/video, detecting player input, creating menus and save-load screens, defining UI states, and programming UI controls. It provides code samples for loading assets, checking keyboard/gamepad input, creating a custom menu component, and making a checkbox UI control. The overall aim is to explain how to design and program the interactive elements that allow players to interact with a game.
An output device is any device that sends data from a computer to another device or user. Output devices relay the computer's response visually. There are two main types of output devices: display devices and printers. Display devices like monitors use technologies like LCD, LED, and plasma to output visual information. Printers output information onto paper and can be either impact printers that use ink ribbons, like dot matrix printers, or non-impact printers like inkjet and laser printers.
This document provides information about a television program called "Soccer Saturday" that discusses live soccer games. There are animated captions on the right side of the screen displaying the soccer league tables that change every 10 seconds. At the bottom, animated captions display scores and game updates. Techniques used include overlaying the tables/scores, using video clips, appropriate colors, and continually moving text. The graphics are clear and easy to read with no blurring or distortion.
Batman: The Telltale Series is an episodic point-and-click adventure game released in 2016. The player controls Batman and Bruce Wayne across 5 episodes, making choices that impact the story. It uses cel-shaded graphics to emulate the visual style of comic books. The game focuses on narrative and player choice over traditional gameplay mechanics or difficulty settings. Quick time events are used during action sequences to advance the story. The game aims to immerse players in the world of Batman through its interactive narrative and faithfulness to the comic book aesthetic.
This document discusses various computer output devices. It describes speakers that convert output data into sound, monitors that receive signals from a video card and provide a graphical or textual display, and printers that create images on paper or other media using technologies like ink transfer or heat transfer. The document discusses different types of printers like laser printers, inkjet printers, and dot matrix printers, and how they are classified by characteristics like quality, speed, whether they are impact or non-impact. It also briefly mentions other output devices like headphones and projectors.
Gaming consoles are dedicated systems optimized for game playing. Early consoles included the Magnavox Odyssey in 1972 and the Atari Pong home version in 1975. Common console components are controllers, a power supply, core unit with CPU and RAM, game media like cartridges or discs, and sometimes memory cards. Popular consoles include the PlayStation, Xbox, Wii, Nintendo DS, and PSP. Each has distinct controllers and hardware specifications. Portable handheld gaming devices like the Game Boy also emerged as a popular form of gaming.
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)byteLAKE
byteLAKE's presentation from the PPAM 2019 conference.
Abstract:
The goal of this work is to adapt 4 CFD kernels to the Xilinx ALVEO U250 FPGA, including first-order step of the non-linear iterative upwind advection MPDATA schemes (non-oscillatory forward in time), the divergence part of the matrix-free linear operator formulation in the iterative Krylov scheme, tridiagonal Thomas algorithm for vertical matrix inversion inside preconditioner for the iterative solver, and computation of the psuedovelocity for the second pass of upwind algorithm in MPDATA. All the kernels use 3-dimensional compute domain consisted from 7 to 11 arrays. Since all kernels belong to the group of memory bound algorithms, our main challenge is to provide the highest utilization of global memory bandwidth. Our adaptation allows us to reduce the execution time upto 4x.
Find out more at: www.byteLAKE.com/en/CFD
Foot note:
This is the presentation about the non-AI version of byteLAKE's CFD kernels, highly optimized for Alveo FPGA. Based on this research project and many others in the CFD space, we decided to shift the course of the CFD Suite product development and leverage AI to accelerate computations and enable new possibilities. Instead of adapting CFD solvers to accelerators, we use AI and work on a cross-platform solution. More on the latest: www.byteLAKE.com/en/CFDSuite.
-
Update for 2020: byteLAKE is currently developing CFD Suite as AI for CFD Suite, a collection of AI/ Artificial Intelligence Models to accelerate and enable new features for CFD simulations. It is a cross-platform solution (not only for FPGAs). More: www.byteLAKE.com/en/CFDSuite.
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Community
This document discusses modeling and predicting performance in Ceph distributed storage systems. It provides an overview of Ceph, including its object storage, block storage, and file system capabilities. It then discusses various factors that impact Ceph performance, such as network configuration, storage node hardware, number of disks, caching, redundancy settings, and placement groups. The document notes there are many configuration choices and tradeoffs to consider when designing a Ceph cluster to meet performance requirements.
This document discusses the evolution of computer architecture from semiconductor memory in the 1970s to recent processor trends. Key points covered include the development of microprocessors from the 4004 in 1971 to recent multi-core and many-integrated core processors. The document also discusses RISC architectures like ARM and benchmarks for evaluating system performance.
The document discusses the architecture of high-end processors such as Core 2 Duo and Core i5. It describes their microarchitecture including features like wide dynamic execution, advanced smart cache, smart memory access, and digital media boost. It also covers the Nehalem microarchitecture used in Core i3, i5 and i7 processors and its components such as caches, instruction execution, decoding, optimization, and write-back.
This document provides an overview of computer hardware components and concepts. It describes the basic components of a computer system including the system unit, CPU, memory, storage, and peripheral devices. It discusses different types of computer systems and platforms. The document also covers hardware basics such as motherboards, buses, caches and processors. Finally, it discusses computer networks, storage technologies, input devices and output devices.
The Next Generation of PhyreEngine aims to target the NGP, PS3, and PC. It provides an upgraded renderer, improved tool integration, and a new data-oriented object model. A key focus is the deferred lighting solution implemented on SPU for PS3, which is now extended to the NGP. The presentation demonstrates the engine's rendering techniques like deferred shading, MLAA, and particles on different platforms. Porting to NGP required moving tasks like skinning and scene traversal to the CPU while optimizing meshes, textures, and shaders for the NGP's GPU.
Quick-and-Easy Deployment of a Ceph Storage ClusterPatrick Quairoli
Quick & Easy Deployment of a Ceph Storage Cluster with SUSE Enterprise Storage
The document discusses deploying a Ceph storage cluster using SUSE Enterprise Storage. It begins with an introduction to Ceph and how it works as a distributed object storage system. It then covers designing Ceph clusters based on workload needs and measuring performance. The document concludes with step-by-step instructions for deploying a basic three node Ceph cluster with monitoring using SUSE Enterprise Storage.
The document discusses strategies for optimizing Ceph performance at scale. It describes the presenters' typical node configurations, including storage nodes with 72 HDDs and NVME journals, and monitor/RGW nodes. Various techniques are discussed like ensuring proper NUMA alignment of processes, IRQs, and mount points. General tuning tips include using latest drivers, OS tuning, and addressing network issues. The document stresses that monitors can become overloaded during large rebalances and deleting large pools, so more than one monitor is needed for large clusters.
This document provides information about computer organization and architecture. It discusses the motherboard as the central component that connects all other components like the CPU, RAM, expansion slots and ports. It describes how the chipset and its components like the northbridge and southbridge facilitate data exchange. It covers CPU components like the ALU and registers, and characteristics like clock speed and instruction sets. It also discusses the memory hierarchy including caches, RAM and disk storage. In summary, the document is an overview of key components and concepts in computer organization and architecture.
Preparing Codes for Intel Knights Landing (KNL)AllineaSoftware
The document discusses preparing codes to take advantage of the Intel Knights Landing (KNL) processor. It outlines three main steps:
1. Use Allinea Performance Reports to analyze memory access and identify optimization opportunities like improving cache usage or increasing thread count.
2. If memory access is identified as a bottleneck, improve cache usage through techniques like blocking or increase thread count to hide latency. The KNL processor includes high bandwidth memory (HBM) that codes should leverage.
3. Identify loops dominating execution time using Allinea MAP and apply vectorization either through compiler flags or manual techniques. Vectorization is key to exploiting the KNL's processing capabilities.
The document discusses computer hardware components including central processing units, memory, storage, input/output devices, and various computer system types. It provides learning objectives on selecting hardware to meet organizational needs, describes the functions of main memory and secondary storage, and discusses trends toward multiprocessing, higher storage capacities, and green computing initiatives.
The document discusses computer hardware components including central processing units, memory, storage, input/output devices, and various computer system types. It provides learning objectives on selecting hardware to meet organizational needs, describes the functions of main memory and secondary storage, and discusses trends toward green computing and high-performance multicore processors.
The document discusses computer hardware components and their functions. It covers the central processing unit (CPU) and memory, as well as secondary storage devices, input/output devices, and different types of computer systems. The key points are that RAM is temporary memory and ROM contains permanent instructions, multicore processors allow workload sharing, secondary storage provides higher capacity storage, and green computing aims to reduce environmental impact.
The CPU acts as the brain of the computer and controls all functions. It has three main components: the control unit which directs operations, the ALU which performs calculations, and registers for temporary storage. The CPU fetches and decodes instructions from memory and directs the ALU and registers to perform operations. Faster CPUs are achieved through higher clock speeds, larger bus widths, cache memory, and parallel processing using multiple processors.
I hope You all like it. I hope It is very beneficial for you all. I really thought that you all get enough knowledge from this presentation. This presentation is about materials and their classifications. After you read this presentation you knowledge is not as before.
This document provides an overview of basic computer architecture. It discusses the history of computers, components like the CPU, motherboard, and connections between parts. The document outlines CPU architecture including the fetch-decode-execute cycle and components like the ALU, control unit, and registers. It also describes memory, addressing, cache, and different memory types like RAM, ROM, and CMOS.
The document discusses the central processing unit (CPU) and its components. The CPU consists of a control unit and arithmetic logic unit that work together to execute instructions. It describes how the CPU fetches and decodes instructions from memory and controls the flow of data. It also discusses various types of memory like RAM, ROM, and cache that the CPU uses to store and access data and instructions. Finally, it covers factors that influence computer processing speed like microprocessor speed, bus width, and parallel processing.
The document discusses the central processing unit (CPU) and its components. The CPU consists of a control unit and arithmetic logic unit that work together to execute instructions. It describes how the CPU fetches and decodes instructions from memory and controls the flow of data. It also discusses various types of memory like RAM, ROM, and cache that the CPU uses to store and access data and instructions. Finally, it covers factors that influence computer processing speed like microprocessor speed, bus width, and parallel processing.
The Central Processing Unit(CPU) for Chapter 4MKKhaing
The document discusses the central processing unit (CPU) and its components. The CPU consists of a control unit and arithmetic logic unit that work together to execute instructions. It describes how the CPU fetches and decodes instructions from memory and works with registers and the system clock to perform arithmetic and logical operations on data. It also summarizes different types of memory like RAM, ROM, and cache that the CPU uses to store and access instructions and data.
The document discusses the central processing unit (CPU) and how it works. It describes the CPU's main components - the control unit and arithmetic logic unit - and how they execute instructions. It explains how programs are executed via a fetch-decode-execute cycle and how data is stored and accessed in memory and caches. It also covers various CPU technologies like microprocessors, memory types, bus speeds, and approaches to increasing a computer's processing speed.
Similar to Sony Computer Entertainment Europe Research & Development Division (20)
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
2. Slide 2
• PlayStation® Development Resources
• Architecture of the PlayStation®3 (PS3™)
• Case studies
– Cover techniques used by 1st party titles for both the PSP®
(PlayStation®Portable) and PS3™
• PlayStation® Advanced Programming
– Graphics performance
– PS3™ SPE utilisation
What We Will Be Covering
3. Kish Hirani
Head of Developer Services
Sony Computer Entertainment Europe
(SCEE)
Developer Services
SCEE - Research & Development Division
5. Slide 5
• Support – SCE DevNet
• Field Engineering
• Technical and Strategic Consultancy
• Technical Training
• TRC Consultancy
• Projects
What We Cover
10. Slide 10
• Option for Digital Distribution via
PlayStation® Store as well as disc
• Same with full PSP titles
• And now minis
Digital Distribution Options
11. Slide 11
• PSN available in 58 countries
– 12 languages, 22 currencies
– Including Russia! 90% of all titles release are now localised.
• 700+ games and demos available
• 600+ million items downloaded worldwide
• More than 130K registered in Russia
• Cumulative Russia PSN Sales €650K
– as of FY2009 Q3
PlayStation®Network Vital Statistics
12. Slide 12
• 52.9 million globally (as of end June 2009)
– 17 million in SCEE Region
– More than a 850K in Russia
• Russia Tie Ratio: 2:1
PSP Vital Statistics
13. Slide 13
• 23.7 million PS3s (as of end of June 2009)
– One million of the new model sold in three weeks in
September
• More than 190K in the Russia so far
– PlayStation and PS2 both sold more than 2 million in
Russia during their lifespan (PS2 still selling well)
• 68% of PS3 users in Russia go online
• Russia Tie Ratio 5:1
PS3 Vital Statistics
17. Slide 17
• Console vs PC Development
• Main Memory
• RSX™
– Bandwidth, Pipeline and Data Placement
• Cell Broadband Engine™
– Element Interconnect Bus, PPE and SPE
• Benefits of SPU Programming
• Data Storage
• Peripherals
PS3™ Hardware Overview
18. Slide 18
PC vs Console Game Development
PC
• Open Platform
• Limitless Resources
• Generic API
• IHV Drivers
Console
• Closed Platform
• Fixed Hardware
• Tiny HW Abstraction
• Direct access
19. Slide 19
• High performance on a budget
• Fixed hardware target with a long life cycles
– Games have to get better every year
– Waiting for hardware to catch up with software is not an option
• Have to understand the strengths and weaknesses of the
hardware
Why Consoles Are Different
20. Slide 20
• Performance benefits from understanding of how the
hardware works
– Does not mean coding in assembly
– Need to have an awareness of how software design may affect
performance
– Generally code optimized for consoles run also faster on PC
Why Consoles Are Different
21. Slide 21
• Memory is a very finite resource
• Memory access is slow compare to processor speed
– This gulf is getting larger on all platforms
– Cache is trying to insulate the application from memory latency
– But cache on consoles are generally smaller than on a PC
Why Consoles Are Different
24. Slide 24
• Traditionally, heap function is called (malloc/new)
– Has to be contiguous
– Fragmentation possible
• Recommend acquiring memory through mmapper
– Build heap on top of this
– Non-contiguous pages mapped to contiguous address space
– Not subject to fragmentation
– More control: page sizes, access rights
Memory Management
25. Slide 25
• Games will typically have their own memory manager
– Small amount of memory compared to PC
– Conservative with data structures and memory allocations
• Different strategies for different uses
– Memory pools, custom heaps, memory tracking
• Understand the fragmentation, performance and space implications
for each tool and apply them appropriately
Memory Management
26. Slide 26
• Fetching data from memory into a processor can easily dominate
performance
– L1 cache miss 10~30 cycles
– L2 cache miss ~300 cycles
• Most applications exhibit locality of code or data
– A loop executes the same small set of instructions on data located
contiguously in memory
• Caches aim to increase overall system performance by taking advantage
of this
– Fast on die memory is very expensive so they are limited in size
Caches
28. Slide 28
• RSX™ has access to two memories
• Local Memory
–249MB usable (256MB – 7MB reserved by OS)
• System Memory
–213MB usable (256MB – 43MB reserved by OS)
–RSX™ usable through memory mapping by application
in 1MB blocks
Memory Setup
34. Slide 34
• Able to utilise graphics data in either memory
– Vertex attributes, Textures
• Frame Buffer stores colour & depth surfaces
– Can also be in either memory
– Local memory supports colour & Z compression for MSAA modes
• Saves bandwidth rather than space
• Generally local memory accesses are faster
• If local memory bandwidth is bottleneck → consider moving
assets to system memory
Data Placement
35. Slide 35
How triangles end up on screen
RSX™
Local Memory
Cell
Main Memory
Verts
Textures
Shaders
Verts
Textures
ShadersCmd Buffer
41. Slide 41
• This connects all units in the CBE
• Consist of 4 128bit buses
• All data transfers are 128 byte packets
– Smaller transfers sent as partial packed
Element Interconnect Bus
45. Slide 45
• 64 bit CPU core
• 25.6 GFLOPs FPU
• In order execution
• Execution units include:
– IEEE double precision FPU
– Integer
– Float vector unit (VMX)
PowerPC Processing Element
47. Slide 47
• Intrinsics are supported, preferable to inline assembly
–Scheduling and optimisation
• Two hardware threads
–2 or more active software threads act to mitigate stalls
• Good use of caches can make big difference in
performance
PPE Performance
48. Slide 48
• Power PC issue that can severely affect performance
• PPU has a store queue, that queues up stores waiting to
get sent to the cache
• If a load occurs that conflicts with data in store queue, the
load will stall until the data is stored
– Pipeline stall, around 40 cycles
• Find using Tuner
– LHS performance counter
Load Hit Store
49. Slide 49
• Type conversion
– Integer, floating point or vector registers
– Moves have to go through memory as there are no asm
instructions to move data between register sets
• Stack access - when calling a function that has:
– Large parameters - will be passed on the stack
– Many parameters - some may end up on the stack
– Few instructions - with stack based parameters
LHS Cases
51. Slide 51
SPE Architecture
• 3.2GHz processor
• 128 Vector registers
• 128 bit
• No scalar registers
• 256kB Local Store
• Very fast, like L1 cache
• Runs Asynchronously from rest of system
• Independent Memory Flow Controller (MFC)
• SIMD (single instruction multiple data)
• No Operating System, Very few system calls
• Performance is high and deterministic
SPE
EIB
MFC
SPU
Local Store
52. Slide 52
• SPE means Synergistic Processing Element
– Mutual dependence between PPE and SPEs
• PPE runs operating system and performs top level control
– General Purpose Hardware (PowerPC)
• SPEs provide bulk of application performance
– Simple hardware
– Simple pipeline
– Simple memory architecture
– Massive computational horsepower
SPEs
53. Slide 53
• Simple, predictable instruction set
– Has full SIMD support
– Intrinsics are supported for all instructions
• Optimised compiler
– Handles scalar conversion
– Often as good as PPE for scalar code, can be quicker
• Easy to program; can be programmed in C/C++
• Core power of system, key to performance
• Frees up RSX™, frees up PPE
Synergistic Processor Elements
54. Slide 54
• Ported PPU code usually faster for optimised & non-optimised code
– Port your code
• Easy to program; with some limitations can be programmed in C/C++
• Scalar code can be run
– Full SIMD instruction set
• Multithreaded engine analogy - can move threads to SPUs
SPU Development
55. Slide 55
• Raw
– You own the hardware
– Complete control, direct hardware access
– Once loaded – stays resident
• SPU Threads
– The OS owns the hardware
– Must belong to a thread group
– Scheduled by Cell OS Lv-2 kernel – based on group priority
• SPURS
– Higher level abstraction
– Plays well with middleware
Ways of Using SPUs
56. Slide 56
• Runs on-top of SPU Threads
– Small kernel on SPU
• Thread switching more efficient
– Requires no PPU resources
– PPU SPURS Handler: only to restore thread group when necessary
• Easier to synchronise threads
• Easier to balance load on multiple SPUs
– SPUs grab work when available
SPURS (SPU Run Time System)
57. Slide 57
SPURS (SPU Run Time System)
Policy Module
SPURS Kernel
Work
Load
Policy Module
SPURS Kernel
Work
Load
Policy Module
SPURS Kernel
SPURS
Handler
PPU SPU0 SPU1 SPU2
Thread Group for SPURS
• Implemented as an
SPU Thread Group
• Kernel, Policy
Module, Workload
• Minimal PPU
Overhead
Work
Load
58. Slide 58
• Jobs
– A chain of execution units
– Suited to short execution times
– Automatic DMA transfer of input & output data
– Executed with associated data
– No context switching – run complete
• Tasks
– Different data model, larger programs
– Can yield - synchronisation mechanisms
– Optimised context switch
– Similar to threads
SPURS Workload Types
59. Slide 59
• SPURS JobChain uses manual control of the execution sequence
• SPURS JobQueue is designed for dynamic job submission in parallel
SPURS Job Queue
SPU
SPU
SPURS Job Queue
PPU Fiber
Port
PPU Thread
SPU
Submitted jobs
SPURS Task
submitting in parallel executing in parallel
61. Slide 61
• Problem: Update an array of objects and submit the visible
ones for rendering
• Object.update() method was bottleneck on PPE
– Determined using SN Tuner
– This function generates the world space matrices for each
object
– Embarrassingly parallel
• No data dependencies between objects
• If we had 1 processor for each object we could do that
Function off load example using a SPURS Task
62. Slide 62
SPURS Tasks
//---Conditionally calling SPU Code
if (usingSPU==false) //---------------------------generic version
{
for(int i=0; i<OBJECT_COUNT; i++)
{
testObject[i].update();
}
}
else //-------------------------------------------SPU version
{
int test=spurs.ThrowTaskRunComplete(taskUpdateObject_elf_start,
(int)testObject,
OBJECT_COUNT,0,0);
//could do something useful here…
int result = spurs.CatchTaskRunComplete(test);
}
63. Slide 63
• Getting data in and out of SPU is first problem
• Get this working before worrying about actual processing
– Brute force often works just fine
• DMA entire object array into SPE memory
• Run update method
• DMA entire object array back to main memory
• List DMA allows you to get fancy
– Gather and Scatter operation
• If the data set is really big or if you need every last drop of performance
– Streaming model
• Overlap DMA transfers with processing
• Double or quad buffering
SPURS Task
64. Slide 64
SPURS Task: Getting Data in and out
int cellSpuMain(.....)
{
int sourceAddr = ....; //source address
int count = ....; //number of data elements
int dataSize = count * sizeof(gfxObject); //amount of memory to get
gfxObject *buf = (gfxObject*)memalign(128,dataSize); //alloc mem
DmaGet((void*)buf,sourceAddr,dataSize);
DmaWait(....);
//---data is loaded at this point so do something interesting
DmaPut((void*)buf,sourceAddr,dataSize);
DmaWait(....);
return(0);
}
65. Slide 65
• Keep code the same as PPE Version
– Conditional compilation based on processor type
– Might have to split code into a separate file
SPURS Task: Executing the Code
//--- SPU code to call the update method
//--- data is loaded at this point so do something interesting
//--- step through array of objects and update
gfxObject *tp;
for(int i=0; i<count; i++)
{
tp = &buf[i];
tp->update(); //same method as on PPE compiled for SPU
}
66. Slide 66
• 5x Speed up in this instance
– Brute force solution
– Could add more SPUS
– Problem is embarrassingly parallel
• Note that the same source code for the method is being
used on PPU and SPU
– Code stays in sync
– No fancy SPU specific optimisations
SPURS Task: Results
67. Slide 67
• PPU Stalled while waiting for SPU to Process
• SPU Data get and put were synchronous
– Not using hardware to its fullest extent
– Easy to get more if we need it
SPURS Task: Time Waster
PPU PPU
GET PUTExec
TIME
SPU
73. Slide 73
• Simple pipeline
– No stalls like LHS
• No Operating System, very few system calls
– Performance is high and deterministic
• Memory is very fast like L1 cache
– Memory transfers a asynchronous so can be hidden
• Even if performance is comparable, frees up the PPE to do other stuff
SPEs are faster than the PPE
74. Slide 74
• Double buffer
• Align data and transfers optimally
• Vectorise data: SOA instead of AOS
• Use SIMD
– 4 operations at once = 4x speed up
• Use intrinsics
• Program directives - branch hinting
Further SPU Optimisations
75. Slide 75
Double-Buffering
Buffer0 Buffer0 Buffer0 Buffer0
Buffer1 Buffer1 Buffer1 Buffer1
DMA Computation
• Allows overlapping of DMA and computation
• Transfer to/from one buffer while using data from another
• Use different tag group for each buffer
• Depends on space available in LS
TIME
76. Slide 76
• Graphics API
• Multistream – Audio
• Bullet – Physics
• PlayStation®Edge – Geometry/Compression/Animation
• FIOS – Optimised disc access
• Various Codecs
Our Libraries that run on SPUs
77. Slide 77
• Game engine including
– Modular run time
– Samples, docs and whitepapers
– Full source and art work
• Optimised for multi-core platforms especially PS3™
• Extractable and reusable components
PhyreEngine™
78. Slide 78
• A lot of 3rd party middleware partners use the SPUs
– Physics
– Audio
– AI
– Game Engines
– Many more…
• Examples by guest speakers at the end of the talk
– NVIDIA® PhysX ®, Epic Games Unreal ® Engine3
and Crytek CryENGINE ® 3
In Addition
79. Slide 79
• Animation
• AI Threat prediction
• AI Line of fire
• AI Obstacle avoidance
• Collision detection
• Physics
• Particle simulation
• Particle rendering
• Scene graph
Killzone®2 SPU Usage
• Display list building
• IBL Light probes
• Graphics post-processing
• Dynamic music system
• Skinning
• MP3 Decompression
• Edge Zlib
• etc.
44 Job types
80. Slide 80
• Effect done on SPU
– Motion Blur, Bloom, Depth of Field
– Quarter-res image on XDR
• SPUs assist RSX™
1. RSX™ prepares low-res image buffers in XDR
2. RSX™ triggers interrupt to start SPUs
3. SPUs perform image operations
4. RSX™ already starts on next frame
5. Result of SPUs processed by RSX™ early in next frame
Post Processing
81. Slide 81
Resolve to quarter resolution
Image generated on the SPUs
(bloom, DoF, motion blur)
Final image by the RSX™
(Blending the out-of-focus and blurry areas in)
XDR DDR
83. Slide 83
• Faster than Blu-ray
– Over 20MB/s vs. 9MB/s of Blu-ray
– Cache data to HDD for faster subsequent load times
– Install titles
– Stream data
• Virtual memory
• Decrease usage of system memory
Storage – Hard drive
84. Slide 84
• High capacity
– 25GB single layer
– 50GB dual layer
• Constant linear velocity
– Constant read speed over disc - 9 MB/s
– Optimise layout, duplicate and pack data
• Use compression
– Lowers bandwidth cost and lowers read time
– Use SPUs to decompress – Edge Zlib
• Can emulate on Devkit
Blu-ray
85. Slide 85
• Can access both Blu-ray drive (~9 MB/s) and
HDD (~20MB/s) simultaneously
– On HDD, data come from the system cache and
game data partitions
• SPU Zlib decompression
– Increase the throughput
• libFIOS
– Optimize I/O transfers
– Ease background HDD caching implementation
PS3TM I/O Performance
System
Cache
(2GB)
25 ms apart
Game Data
(up to 5GB
per title)