SlideShare a Scribd company logo
1 of 33
Download to read offline
SPU Physics
Erwin Coumans
Simulation Support Lead
Sony Computer Entertainment America
Eric Christensen
Principal Engine Programmer
Insomniac Games
Slide 2
Insomniac Physics System
Feel free to ask questions at any time.
Slide 3
SPU Shaders
• Can be written for the physics system to do specific things without
having to change the physics pipeline.
• Built individually, resulting code lives in a header file that contains a
byte array.
• Debug/Read-only section resides at the bottom of the byte array.
• Shaders written by gameplay programmers are registered with the
physics system through a simple API.
• The physics system uploads registered shaders for a particular context
at a particular stage.
• 2k of local store is allocated for the shader code, 512 bytes is allocated
for the shader work buffer.
Slide 4
SPU Shaders
• Shaders access a common set of functions that are passed to them, as
well as pre-allocated dma tags.
Some functions for example:
• DMA get/put
• Printf
• Local store allocator
• Physics system has its own set of shaders that are native and expected
in the pipeline, however, they are built the same way.
• Native physics system shaders are only slightly optimized currently
and range from 2k to 12k in size.
Slide 5
Insomniac Physics Pipeline
Environment
Update
Triangle Cache
Update
Object Cache
Update
Start
Physics Jobs
Sync
Physics Jobs
Update
Rigid Bodies
PPU Work
Upload
Object Cache
Collide
Triangles
Collide
Primitives
Build
Simulation Pools
Simulate
Pools
Post Update
For Each Iteration
Upload Tri-Cache
Upload RB Prims
Upload Intersect Funcs
PPU SPU Execution
Intersection Tests
Upload CO Prims
Upload Intersect Funcs
Intersection Tests
Upload Physics
Joints
Sort Joint Types
Per Joint Type Upload
Jacobian Generation
Code
Calculate Jacobian
Data
Solve Constraints
Integrate
For Each Physics
Object Upload
Anim Joints
Transform Anim Joints
Using Rigid Body Data
Send Update To PPU
Slide 6
Insomniac Physics Pipeline
Environment
Update
Triangle Cache
Update
Object Cache
Update
Start
Physics Jobs
Sync
Physics Jobs
Update
Rigid Bodies
PPU Work
PPU •Called every frame
•Environments for:
•Full Simulation
•Cheap Simulation
•IK Each object has its own small
triangle cache. The cache volume is
sized based on object velocity.
Resampling of triangles only occurs
on volume breach.
Potentially colliding objects are
grouped by proximity. The group
volume also samples non-physics
objects in order to collide with
Their primitives. A Physics Job is submitted for each
physics object group so that they
can collide and simulate.
For instance: A stack of boxes would
be grouped together and sent as one
job.
Do as much work as possible on the
PPU before needing to sync.
Physics jobs can even be synced on the
next frame.
Wait for Physics jobs to finish.
In most cases the wait time is
(if anything) very small.
Runtime rigid body data is updated
in order to be accessed by the game
on the PPU side. Any modifications
made directly to rigid bodies (such
as mass changes, etc..) will be used
in the next frame’s job submission.
Slide 7
Collide Triangles
Upload Tri-Cache
Upload Rigid Body Primitives
Upload Triangle Intersection Code
Perform Intersection Tests
•For each physics object, upload the collision primitives for
each rigid body contained in the object.
•Sort the primitives by type.
For Each Physics Object (which can contain many rigid bodies i.e. Ragdolls)
Allocate local store for triangle cache.
Tri-cache size is fixed, so this only needs
to occur once.
•Upload the corresponding triangle cache.
•It goes in the throw-away memory allocated for all tri-caches.
•Local store is allocated as needed for intersection code.
•For each list of primitive types, upload the corresponding code.
i.e. Sphere, Capsule, OBB vs Trimesh
•Perform intersection and contact point generation.
Roll back local store to beginning of
Pre-allocated contact point buffer.
Slide 8
Collide Primitives
Upload Rigid Body Primitives
Upload Triangle Intersection Code
Perform Intersection Tests
•For each physics object, upload the collision primitives for
each rigid body contained in the object.
•Sort the primitives by type.
For Each Physics Object
•Local store is allocated as needed for intersection code.
•For each list of primitive types, upload the corresponding code.
i.e. Sphere vs. Sphere, Capsule vs. OBB, OBB vs OBB etc..
•Perform intersection and contact point generation.
Roll back local store to beginning of
Pre-allocated contact point buffer.
Slide 9
Upload Rigid Body Data & Joints
Upload Rigid Bodies
Store Rigid Bodies data in its corresponding local store buffer.
Insert Rigid bodies into sort list nodes.
For Each Physics Object
Store Joint data in it’s corresponding local store buffer.
Insert Joints into sort list nodes.
Allocate local store for all Rigid Bodies
and Joints
Upload Joints
Build Simulation Pools
Associate Rigid Bodies
Each Joint (including contact points) has a relationship with one
or two Rigid Bodies.
Pools are built based on Rigid Body connectivity.
Each Joint is also inserted into it’s respective simulation pool.
Slide 10
Simulate Pools
Sort Joints by Type
For each Simulation Pool “package”
•Joints are sorted based on constraint type.
i.e. Single Axis Hinge, Dual Axis Hinge, Spring, etc…
•The constraint type id references a table that describes the
address and size of the corresponding Jacobian generation
code.
Generate Jacobian Data
•For each constraint type
•Allocate local store for size of code fragment
•Upload code fragment for constraint.
i.e. ContactPointBuildJD
•After generating Jacobian data, roll back local store to make
room for next code fragment.
Allocate local store for solver data.
Size is determined by number of Rigid Bodies in the pool
as well as the number of DOF removed from the system.
Solve Constraints Iterative LCP solver to generate corrective forces.
Integrate Update rigid body transform.
Slide 11
Post Update
Upload Animation Joint data
For Each Physics Object
•Allocate local store for animation data.
•Matrix array
•Mapping of physics joints to animation joints
Transform Animation Joints
•Update animation joints that are mapped to rigid bodies using
current Rigid Body transforms.
•This also includes updating “Game Actor” world-space matrix
DMA Results to PPU
•Send updated Rigid Body transforms.
•Send updated animation joints.
•Send updated “Game Actor” world-space matrix.
Rollback All Local Store to the end of
the contact point buffer for the next iteration.
Slide 12
Summary Insomniac Physics
Limitations
• Limited Number of Collision Triangles per Physics Object
• Limited Number of Object Interactions per Job
• Limit on Number of Constraints per Job
• Limited Number of Contact Points per Job
• Not Possible to Run Collision and Simulation Seperately
Benefits
• Discrete Pipeline Allows heavy Optimization
• No Intermediate Data Swapping Between PPU and SPU
• Physics Jobs Are Completely Independent
• Fixed PPU Memory Size Eliminates Dynamic Allocation from SPU
• Implemented and Optimized for a Specific Platform
• No Restrictions on Code Size
• Start and Sync Jobs Only Once Per Frame
Slide 13
Limitations
• Limited Number of Collision Triangles per Physics Object
Triangle Cache is a fixed size, therefore environment collision must be optimized to prevent
exceeding the buffer size, which can result in thrown away geometry. This means that while
the game is in development it’s possible that objects might fall through the floor or pass
through a wall if there are too many triangles to cache. However, we can make this type of
assumption because some of our other systems rely on optimized environment collision as
well
• Limited Number of Object Interactions per Job
Object Cache is a fixed size, meaning that there is a limitation on the amount of object to
object interaction.
• Limit on Total Number of Rigid Bodies per Physics Object
The number of rigid bodies that a physics object can contain is limited by a fixed size.
Cautious rigging of physics object is required as a result.
Slide 14
Limitations
• Limited Number of Constraints per Job
The maximum number of simultaneously solved constraints is limited to a fixed size. I.e.
there is a certain limit on how many ragdolls (which on average are our most joint heavy
objects) can be stacked by default. However there are tricks used to stack a virtually
unlimited amount of objects, in which case some interactions can appear to be “non-
physical”.
• Limited Number of Contact Points per Job
Number of contact points is fixed per job (contact points are generated every iteration).
Because of our other limitations, this doesn’t present a problem.
• Not Possible to Run Collision and Simulation Separately
Our Physics collision and simulation live together, so only using collision or simulation
separately isn’t possible.
Slide 15
Benefits
• Discrete Pipeline Allows Heavy Optimization
Because we know our limits, the data transformation functions don’t have to be generalized,
allowing for deep optimizations.
• No Intermediate Data Swapping Between PPU and SPU
Once a job is in flight, intermediate data doesn’t’ have to be sent back to the PPU to make
more room in local store because we’ve already pre-calculated our memory requirements.
Additionally we take advantage of the fact that we can upload code and data whenever we
want, freeing up valuable local store when we need it.
• Physics Jobs Are Completely Independent
Physics jobs aren’t dependent on other physics jobs, therefore once a pool of objects has
been created, we can send the job any time we want, and retrieve the results any time we
want.
• Fixed PPU Memory Size Eliminates Dynamic Allocation from SPU
The system doesn’t need to allocate more resources than it’s prepared for, so the job never
needs to requiest that a PPU thread allocate more memory in case something needs to be
cached.
Slide 16
Benefits
• Implemented and Optimized for a Specific Platform
The system is designed around a specific platform (PS), and built around the limitations of
the game engine. Any changes to the engine result in customized changes to the physics
system, whether it be lifting restrictions, or adding more. Our pipeline is relatively simple so
any system-wide changes are not that much of a hassle. (such as the addition of new joint
types or collision functions). In the worst case scenario, we break up our functions even
further to free up more local store for an increased data requirement.
• No Restrictions on Code Size
Since the majority of the code is “shader” based, we don’t have to take into consideration
local store restrictions when adding more functions. I.e. adding 5 more collision or jacobian
building functions that are 10k a piece on average makes no difference. Currently, all of the
SPU code from the physics system exceeds 100k, but it doesn’t affect our local store limits.
• Start and Sync Jobs Only Once Per Frame
We only need to sync once since our physics collision and simulation live together (see
limitations ), and we can sync anywhere we want.
Slide 17
Debugging Techniques
• Asserts for catchable cases, which print basic information about the crash. File, Line,
Function, etc…
• Each function entry point is wrapped with a macro that inserts the Function name,
File, Line into a buffer that is at a fixed address in local store.
• A buffer for each SPU exists on the PPU that is large enough to accommodate 4
levels of call-stack data.
• With TRAP_CALLSTACK enabled, at function entry, the UPDATE_CALLSTACK macro
will call a stubbed function that uploads the current call-stack buffer from the PPU,
modifies it, then sends it back.
• In the event of a crash, we time-out trying to sync on the PPU, which will result in
dumping the call-stack data to the TTY.
Slide 18
Debugging Techniques
• In addition to keeping track of function calls, there is also an option to DMA function
input data to the PPU in roughly the same manner as the call-stack.
• Since most of the functions are shaders, they can be built independently and run on a
PS3 Linux box.
• In the event of a crash, function input data that has been sent to the PPU can now be
used as input to the same function, which can now be debugged in isolation with the
conditions that caused the crash.
• These techniques are still in their infancy but can be very useful in tracking down
previously very painful bugs.
Slide 19
Bullet Physics System
Free open source SDK at http://bulletphysics.com
Contact Erwin Coumans for Playstation 3 optimized version.
Feel free to ask questions at any time.
Slide 20
Portable SPURS Tasks
• Porting the implementation for
– Local Store memory
– DMA transfers
– Threading
– Synchronization
• Supported threading environments:
– PS3 Cell SPURS Task
– PS3 Cell Libspe2 (IBM Cell Blade, PS3 Linux)
– Windows/Xbox 360 Threads
– etc
Slide 21
Update
Overlapping Pairs
Dispatch
Overlapping Pairs
Update
Rigid Bodies
PPU Work
Dispatch
Collision Pairs
Gather
Collision Info
Find Overlapping
Triangles
Build Islands &
Hash Cells Setup Constraints
Upload Collision Pair
Upload Transforms
Upload Collision Shape
PPU SPU Execution
Upload Contact Cache
Upload Tree Parts
Traverse AABB Tree
Upload Triangle
Collide Primitives
Upload Rigid Body
Generic GJK Collision
Integrate
Send Contact Cache
Apply Forces/
Predict Transform
Send Packed Data
Pack Solver Bodies
Setup Solver Body
Send Bodies
Solve Constraints
Send Velocities
Upload Contact Caches
Upload Joints
Find Next Cell
Upload Packed Data
Unpack Bodies
Solve Constraints
Write Velocities
Solve
Constraints
PPU Work
Sync
Collision Tasks
Sync
Physics Tasks
Prepare Constraints
Send Packed ConstraintsFor Each Solver Iteration
Synchronize SPUs
Bullet Physics Pipeline
Slide 22
Update
Overlapping Pairs
Dispatch
Overlapping Pairs
Update
Rigid Bodies
PPU Work
Build Islands &
Hash Cells
PPU
Integrate
Apply Forces/
Predict Transform
Solve
Constraints
PPU Work
Sync
Collision Tasks
Sync
Physics Tasks
•Predict transforms for
swept / continuous collisions
•Persistent overlapping Pair Cache
•Only track changes
•Perform Collision Filtering
•PPU allocates contact cache
•Each frame, points can be
added, modified/removed •PPU fallback for cases that
are not implemented on SPU yet
•PPU collision callbacks
•Use spatial hash function to divide bodies & constraints into cells
•Build dependency table between cells, using constraint connectivity
•Use islands to determine (de)activation state
•Option to use time of impact to avoid missing collisions
(for fast / small moving objects)
Bullet Physics Pipeline
Slide 23
Upload Tree Parts
Perform Intersection Test
•Perform stackless tree traversal
For each Collision Pair involving Triangle Meshes
•Interleave search with triangle-intersection test
•All code fits in SPU memory
•Fast acceleration structure to find bounding volume overlap
•Only DMA Tree Part if necessary
•Perform a triangle versus convex primitive intersection test
•Generic GJK Intersection test reduces code size
•Throw away triangle after intersection test
DMA contact point caches from/to main memory
Traverse AABB Tree
Upload Triangle •DMA the 3 triangle indices and vertices
•Use software cache re-use shared vertices
Collide Triangles
Slide 24
•Most primitive collision shapes are fixed size
•Convex meshes need additional temporary storage
For Each Overlapping Pair of Convex Shapes
•Each combination of convex shape uses generic algorithm
•Sphere, OBB, Capsule, Cylinder, Cone, Convex Hull etc..
•For deep penetrations use simplified algorithm that reduces
code size and memory usage
•Perform contact point cache reduction on SPU
Upload Collision Pair
Upload Transforms
Upload Collision Shape
Upload Contact Cache
Generic GJK Collision
Send Contact Cache
DMA contact point caches from/to main memory
Collide Primitives
Slide 25
For each Rigid Body in the Cell, create a packed Solver Body
Keep a mapping between them, by index
For Each Simulation Cell
•Pack each joint in the Cell, referencing the packed solver body
•Build Jacobian data for each constraint
Stream Rigid Bodies and Constraints
Upload Rigid Body
Setup Solver Body
Send Bodies
Upload Contact Caches
Upload Joints
Prepare Constraints
Send Packed Constraints
DMA Packed Bodies and Constraints to main memory
Pack Rigid Body Data & Setup Constraints
Slide 26
•Make sure each SPU is processing the same solver iteration
•Use spinlock to wait for other SPUs
•Upload Packed bodies and constraints for this Cell
•Perform one solver iteration on all constraints
•Send packed bodies and constraints to main memory
•Find the next available Cell in main memory
•Use Dependency Table to avoid conflicts with other SPUs
•Update Cell status
Send Packed Data
Find Next Cell
Upload Packed Data
Solve Constraints
Synchronize SPUs
For Each Solver Iteration
Send Updated Body Velocities to main memory
Simulate Cells
Slide 27
Summary Bullet Physics
Limitations
• Code has to fit entirely in SPU memory
• Generic convex collision algorithm performs a bit less then specific
• Requires Intermediate Data Swapping Between PPU and SPU
• Generic approach requires synchronization between SPUs
• Contact points need to be stored in main memory
Benefits
• Multi-platform makes development and debugging easy
• SPURS Task ports well to Win32/Xbox 360 threads (fits L2 cache)
• Easy to extend with new custom collision algorithms
• No Restrictions on Data Sizes
• Automatic broadphase allows arbitrary environments
• Fixed PPU Memory Size Eliminates Dynamic Allocation from SPU
• Flexible use of Collision and Simulation Seperately
Slide 28
Limitations
• Code has to fit entirely in SPU memory
Although we carefully chose compact algorithms, the 256kb size
restriction can hinder development. Debugging the SPU can be a
challenge (only add debug information for some files). SPU
Shaders or overlays would help.
• Generic convex collision algorithm performs a bit less then
specific
Especially for very simple cases like sphere versus sphere, the
computational cost is higher. However, for more complex cases,
the cost is closer to an optimized special case.
Slide 29
Limitations
• Generic approach requires Intermediate Data Swapping Between PPU
and SPU and additional synchronization
Dealing with unrestricted number of collision shapes, rigid bodies and
constraints cost performance. However, hiding DMA latency and parallel
processing on SPUs still gives good performance.
• Contact points need to be stored in main memory
The generic convex collision detection only computes one contact point at a
time. Stable stacking in rigid body dynamics requires multiple contact points.
Our current solution is to maintain a persistent contact cache and add,
refresh, remove contact points. Other solutions exist, that compute all
contact points at a time. On the flipside, keeping contact points allows to
store additional information that can improve constraint solver quality (warm
starting), and easier access to the forces/impulses applied by the solver.
Slide 30
Benefits
• Multi-platform makes development and debugging easy
At any stage of development it is possible to debug or prototype
improvements on other platforms. Sharing a common code-base reduces
maintenance.
• SPURS Task ports well to Win32/Xbox 360 threads
Porting a SPURS Task to Win32/XBox360 can be very smooth. The data
access is cache friendly. Each thread creates its own local copy of the Local
Store memory. To avoid a memcpy, a read-only DMA returns the original
address.
• Easy to extend with new custom collision algorithms
Instead of creating N new collision algorithms for each collision type, we only
need to implement a single support function. We can also add or improve
additional collision shapes using combinations of compounds, scaling,
translation and rotation etc..
Slide 31
Benefits
• No Restrictions on Data Sizes
Cell SPUs have enough power to deal with many interacting rigid bodies. Our dynamic
splitting method using dependencies simulates extremely large simulation islands correctly.
This means the SPUs can handle huge concave triangle meshes with local deformation.
• Automatic broadphase allows arbitrary environments
Large outdoor environment or dynamic changing environments can be simulated efficiently
using the incremental broadphase. This approach doesn’t require any manual partitioning.
• Fixed PPU Memory Size Eliminates Dynamic Allocation from SPU
We allocate all memory on the PPU and avoid dynamic allocations using memory pools and
stack allocators.
• Flexible use of Collision and Simulation Seperately
The SPU collision detection system can be re-used for synchronous collision queries by the
AI system or game logic. For example batches of ray tests for line-of-sight, swept collision
checks for player movement or camera positioning etc.
Slide 32
Debugging Techniques
• Since the SPURS Task also runs multiplatform, we can
debug under Windows
– Easier to detect Data and DMA alignment issues
– Avoid forking implementation
• Since the SPURS Task also runs under Libspe2, we can
prototype in a very similar environment on a PS3 Linux box
• Asserts for catchable cases, which print basic information
about the crash. File, Line, Function, etc…
Slide 33
Contact info
erwin_coumans@playstation.sony.com
Sony Computer Entertainment America
ec@insomniacgames.com
Insomniac Games

More Related Content

What's hot

Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesHPCC Systems
 
Scene Graphs & Component Based Game Engines
Scene Graphs & Component Based Game EnginesScene Graphs & Component Based Game Engines
Scene Graphs & Component Based Game EnginesBryan Duggan
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGijccsa
 
Killzone Shadow Fall: Creating Art Tools For A New Generation Of Games
Killzone Shadow Fall: Creating Art Tools For A New Generation Of GamesKillzone Shadow Fall: Creating Art Tools For A New Generation Of Games
Killzone Shadow Fall: Creating Art Tools For A New Generation Of GamesGuerrilla
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsHPCC Systems
 
C-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the CloudC-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the CloudQian Lin
 

What's hot (7)

Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Scene Graphs & Component Based Game Engines
Scene Graphs & Component Based Game EnginesScene Graphs & Component Based Game Engines
Scene Graphs & Component Based Game Engines
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
 
Killzone Shadow Fall: Creating Art Tools For A New Generation Of Games
Killzone Shadow Fall: Creating Art Tools For A New Generation Of GamesKillzone Shadow Fall: Creating Art Tools For A New Generation Of Games
Killzone Shadow Fall: Creating Art Tools For A New Generation Of Games
 
Moving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC SystemsMoving Toward Deep Learning Algorithms on HPCC Systems
Moving Toward Deep Learning Algorithms on HPCC Systems
 
C-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the CloudC-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the Cloud
 
Soc research
Soc researchSoc research
Soc research
 

Similar to SPU Physics

[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient SimulationDongwonSon1
 
Insomniac Physics
Insomniac PhysicsInsomniac Physics
Insomniac PhysicsSlide_N
 
Unity - Internals: memory and performance
Unity - Internals: memory and performanceUnity - Internals: memory and performance
Unity - Internals: memory and performanceCodemotion
 
Memory Management & Garbage Collection
Memory Management & Garbage CollectionMemory Management & Garbage Collection
Memory Management & Garbage CollectionAbhishek Sur
 
GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay
GDC 2010 - A Dynamic Component Architecture for High Performance GameplayGDC 2010 - A Dynamic Component Architecture for High Performance Gameplay
GDC 2010 - A Dynamic Component Architecture for High Performance GameplayTerrance Cohen
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
More Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit JunoMore Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit JunoKota Tsuyuzaki
 
GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...
GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...
GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...Terrance Cohen
 
Cloud Native Compiler
Cloud Native CompilerCloud Native Compiler
Cloud Native CompilerSimon Ritter
 
【Unite 2017 Tokyo】Unity最適化講座 ~スペシャリストが教えるメモリとCPU使用率の負担最小化テクニック~
【Unite 2017 Tokyo】Unity最適化講座 ~スペシャリストが教えるメモリとCPU使用率の負担最小化テクニック~【Unite 2017 Tokyo】Unity最適化講座 ~スペシャリストが教えるメモリとCPU使用率の負担最小化テクニック~
【Unite 2017 Tokyo】Unity最適化講座 ~スペシャリストが教えるメモリとCPU使用率の負担最小化テクニック~Unity Technologies Japan K.K.
 
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksKernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksAnne Nicolas
 
Os7 2
Os7 2Os7 2
Os7 2issbp
 
Optimizing elastic search on google compute engine
Optimizing elastic search on google compute engineOptimizing elastic search on google compute engine
Optimizing elastic search on google compute engineBhuvaneshwaran R
 
Running ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in ProductionRunning ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in ProductionSearce Inc
 
08 neural networks
08 neural networks08 neural networks
08 neural networksankit_ppt
 
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons LearnedCeph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons LearnedCeph Community
 
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...ORAU
 

Similar to SPU Physics (20)

[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
 
Insomniac Physics
Insomniac PhysicsInsomniac Physics
Insomniac Physics
 
Unity - Internals: memory and performance
Unity - Internals: memory and performanceUnity - Internals: memory and performance
Unity - Internals: memory and performance
 
Memory Management & Garbage Collection
Memory Management & Garbage CollectionMemory Management & Garbage Collection
Memory Management & Garbage Collection
 
GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay
GDC 2010 - A Dynamic Component Architecture for High Performance GameplayGDC 2010 - A Dynamic Component Architecture for High Performance Gameplay
GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay
 
Java On CRaC
Java On CRaCJava On CRaC
Java On CRaC
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
More Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit JunoMore Efficient Object Replication in OpenStack Summit Juno
More Efficient Object Replication in OpenStack Summit Juno
 
GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...
GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...
GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...
 
Cloud Native Compiler
Cloud Native CompilerCloud Native Compiler
Cloud Native Compiler
 
【Unite 2017 Tokyo】Unity最適化講座 ~スペシャリストが教えるメモリとCPU使用率の負担最小化テクニック~
【Unite 2017 Tokyo】Unity最適化講座 ~スペシャリストが教えるメモリとCPU使用率の負担最小化テクニック~【Unite 2017 Tokyo】Unity最適化講座 ~スペシャリストが教えるメモリとCPU使用率の負担最小化テクニック~
【Unite 2017 Tokyo】Unity最適化講座 ~スペシャリストが教えるメモリとCPU使用率の負担最小化テクニック~
 
HP-3 Presentation.pptx
HP-3 Presentation.pptxHP-3 Presentation.pptx
HP-3 Presentation.pptx
 
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksKernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
 
Spock
SpockSpock
Spock
 
Os7 2
Os7 2Os7 2
Os7 2
 
Optimizing elastic search on google compute engine
Optimizing elastic search on google compute engineOptimizing elastic search on google compute engine
Optimizing elastic search on google compute engine
 
Running ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in ProductionRunning ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in Production
 
08 neural networks
08 neural networks08 neural networks
08 neural networks
 
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons LearnedCeph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
 
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
Non equilibrium Molecular Simulations of Polymers under Flow Saving Energy th...
 

More from Slide_N

New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiSlide_N
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSlide_N
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60 Slide_N
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomSlide_N
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationSlide_N
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignSlide_N
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotSlide_N
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: TheorySlide_N
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMSlide_N
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Slide_N
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionSlide_N
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerSlide_N
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesSlide_N
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Slide_N
 
MLAA on PS3
MLAA on PS3MLAA on PS3
MLAA on PS3Slide_N
 
SPU gameplay
SPU gameplaySPU gameplay
SPU gameplaySlide_N
 
SPU Shaders
SPU ShadersSPU Shaders
SPU ShadersSlide_N
 
Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Slide_N
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War IIISlide_N
 
The Technology of Uncharted: Drake’s Fortune
The Technology of Uncharted: Drake’s FortuneThe Technology of Uncharted: Drake’s Fortune
The Technology of Uncharted: Drake’s FortuneSlide_N
 

More from Slide_N (20)

New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - Kutaragi
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - Kutaragi
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living Room
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and Visualization
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor Design
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: Theory
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTM
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of Destruction
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM Power
 
The Visual Computing Revolution Continues
The Visual Computing Revolution ContinuesThe Visual Computing Revolution Continues
The Visual Computing Revolution Continues
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 
MLAA on PS3
MLAA on PS3MLAA on PS3
MLAA on PS3
 
SPU gameplay
SPU gameplaySPU gameplay
SPU gameplay
 
SPU Shaders
SPU ShadersSPU Shaders
SPU Shaders
 
Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2Deferred Rendering in Killzone 2
Deferred Rendering in Killzone 2
 
Practical SPU Programming in God of War III
Practical SPU Programming in God of War IIIPractical SPU Programming in God of War III
Practical SPU Programming in God of War III
 
The Technology of Uncharted: Drake’s Fortune
The Technology of Uncharted: Drake’s FortuneThe Technology of Uncharted: Drake’s Fortune
The Technology of Uncharted: Drake’s Fortune
 

Recently uploaded

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

SPU Physics

  • 1. SPU Physics Erwin Coumans Simulation Support Lead Sony Computer Entertainment America Eric Christensen Principal Engine Programmer Insomniac Games
  • 2. Slide 2 Insomniac Physics System Feel free to ask questions at any time.
  • 3. Slide 3 SPU Shaders • Can be written for the physics system to do specific things without having to change the physics pipeline. • Built individually, resulting code lives in a header file that contains a byte array. • Debug/Read-only section resides at the bottom of the byte array. • Shaders written by gameplay programmers are registered with the physics system through a simple API. • The physics system uploads registered shaders for a particular context at a particular stage. • 2k of local store is allocated for the shader code, 512 bytes is allocated for the shader work buffer.
  • 4. Slide 4 SPU Shaders • Shaders access a common set of functions that are passed to them, as well as pre-allocated dma tags. Some functions for example: • DMA get/put • Printf • Local store allocator • Physics system has its own set of shaders that are native and expected in the pipeline, however, they are built the same way. • Native physics system shaders are only slightly optimized currently and range from 2k to 12k in size.
  • 5. Slide 5 Insomniac Physics Pipeline Environment Update Triangle Cache Update Object Cache Update Start Physics Jobs Sync Physics Jobs Update Rigid Bodies PPU Work Upload Object Cache Collide Triangles Collide Primitives Build Simulation Pools Simulate Pools Post Update For Each Iteration Upload Tri-Cache Upload RB Prims Upload Intersect Funcs PPU SPU Execution Intersection Tests Upload CO Prims Upload Intersect Funcs Intersection Tests Upload Physics Joints Sort Joint Types Per Joint Type Upload Jacobian Generation Code Calculate Jacobian Data Solve Constraints Integrate For Each Physics Object Upload Anim Joints Transform Anim Joints Using Rigid Body Data Send Update To PPU
  • 6. Slide 6 Insomniac Physics Pipeline Environment Update Triangle Cache Update Object Cache Update Start Physics Jobs Sync Physics Jobs Update Rigid Bodies PPU Work PPU •Called every frame •Environments for: •Full Simulation •Cheap Simulation •IK Each object has its own small triangle cache. The cache volume is sized based on object velocity. Resampling of triangles only occurs on volume breach. Potentially colliding objects are grouped by proximity. The group volume also samples non-physics objects in order to collide with Their primitives. A Physics Job is submitted for each physics object group so that they can collide and simulate. For instance: A stack of boxes would be grouped together and sent as one job. Do as much work as possible on the PPU before needing to sync. Physics jobs can even be synced on the next frame. Wait for Physics jobs to finish. In most cases the wait time is (if anything) very small. Runtime rigid body data is updated in order to be accessed by the game on the PPU side. Any modifications made directly to rigid bodies (such as mass changes, etc..) will be used in the next frame’s job submission.
  • 7. Slide 7 Collide Triangles Upload Tri-Cache Upload Rigid Body Primitives Upload Triangle Intersection Code Perform Intersection Tests •For each physics object, upload the collision primitives for each rigid body contained in the object. •Sort the primitives by type. For Each Physics Object (which can contain many rigid bodies i.e. Ragdolls) Allocate local store for triangle cache. Tri-cache size is fixed, so this only needs to occur once. •Upload the corresponding triangle cache. •It goes in the throw-away memory allocated for all tri-caches. •Local store is allocated as needed for intersection code. •For each list of primitive types, upload the corresponding code. i.e. Sphere, Capsule, OBB vs Trimesh •Perform intersection and contact point generation. Roll back local store to beginning of Pre-allocated contact point buffer.
  • 8. Slide 8 Collide Primitives Upload Rigid Body Primitives Upload Triangle Intersection Code Perform Intersection Tests •For each physics object, upload the collision primitives for each rigid body contained in the object. •Sort the primitives by type. For Each Physics Object •Local store is allocated as needed for intersection code. •For each list of primitive types, upload the corresponding code. i.e. Sphere vs. Sphere, Capsule vs. OBB, OBB vs OBB etc.. •Perform intersection and contact point generation. Roll back local store to beginning of Pre-allocated contact point buffer.
  • 9. Slide 9 Upload Rigid Body Data & Joints Upload Rigid Bodies Store Rigid Bodies data in its corresponding local store buffer. Insert Rigid bodies into sort list nodes. For Each Physics Object Store Joint data in it’s corresponding local store buffer. Insert Joints into sort list nodes. Allocate local store for all Rigid Bodies and Joints Upload Joints Build Simulation Pools Associate Rigid Bodies Each Joint (including contact points) has a relationship with one or two Rigid Bodies. Pools are built based on Rigid Body connectivity. Each Joint is also inserted into it’s respective simulation pool.
  • 10. Slide 10 Simulate Pools Sort Joints by Type For each Simulation Pool “package” •Joints are sorted based on constraint type. i.e. Single Axis Hinge, Dual Axis Hinge, Spring, etc… •The constraint type id references a table that describes the address and size of the corresponding Jacobian generation code. Generate Jacobian Data •For each constraint type •Allocate local store for size of code fragment •Upload code fragment for constraint. i.e. ContactPointBuildJD •After generating Jacobian data, roll back local store to make room for next code fragment. Allocate local store for solver data. Size is determined by number of Rigid Bodies in the pool as well as the number of DOF removed from the system. Solve Constraints Iterative LCP solver to generate corrective forces. Integrate Update rigid body transform.
  • 11. Slide 11 Post Update Upload Animation Joint data For Each Physics Object •Allocate local store for animation data. •Matrix array •Mapping of physics joints to animation joints Transform Animation Joints •Update animation joints that are mapped to rigid bodies using current Rigid Body transforms. •This also includes updating “Game Actor” world-space matrix DMA Results to PPU •Send updated Rigid Body transforms. •Send updated animation joints. •Send updated “Game Actor” world-space matrix. Rollback All Local Store to the end of the contact point buffer for the next iteration.
  • 12. Slide 12 Summary Insomniac Physics Limitations • Limited Number of Collision Triangles per Physics Object • Limited Number of Object Interactions per Job • Limit on Number of Constraints per Job • Limited Number of Contact Points per Job • Not Possible to Run Collision and Simulation Seperately Benefits • Discrete Pipeline Allows heavy Optimization • No Intermediate Data Swapping Between PPU and SPU • Physics Jobs Are Completely Independent • Fixed PPU Memory Size Eliminates Dynamic Allocation from SPU • Implemented and Optimized for a Specific Platform • No Restrictions on Code Size • Start and Sync Jobs Only Once Per Frame
  • 13. Slide 13 Limitations • Limited Number of Collision Triangles per Physics Object Triangle Cache is a fixed size, therefore environment collision must be optimized to prevent exceeding the buffer size, which can result in thrown away geometry. This means that while the game is in development it’s possible that objects might fall through the floor or pass through a wall if there are too many triangles to cache. However, we can make this type of assumption because some of our other systems rely on optimized environment collision as well • Limited Number of Object Interactions per Job Object Cache is a fixed size, meaning that there is a limitation on the amount of object to object interaction. • Limit on Total Number of Rigid Bodies per Physics Object The number of rigid bodies that a physics object can contain is limited by a fixed size. Cautious rigging of physics object is required as a result.
  • 14. Slide 14 Limitations • Limited Number of Constraints per Job The maximum number of simultaneously solved constraints is limited to a fixed size. I.e. there is a certain limit on how many ragdolls (which on average are our most joint heavy objects) can be stacked by default. However there are tricks used to stack a virtually unlimited amount of objects, in which case some interactions can appear to be “non- physical”. • Limited Number of Contact Points per Job Number of contact points is fixed per job (contact points are generated every iteration). Because of our other limitations, this doesn’t present a problem. • Not Possible to Run Collision and Simulation Separately Our Physics collision and simulation live together, so only using collision or simulation separately isn’t possible.
  • 15. Slide 15 Benefits • Discrete Pipeline Allows Heavy Optimization Because we know our limits, the data transformation functions don’t have to be generalized, allowing for deep optimizations. • No Intermediate Data Swapping Between PPU and SPU Once a job is in flight, intermediate data doesn’t’ have to be sent back to the PPU to make more room in local store because we’ve already pre-calculated our memory requirements. Additionally we take advantage of the fact that we can upload code and data whenever we want, freeing up valuable local store when we need it. • Physics Jobs Are Completely Independent Physics jobs aren’t dependent on other physics jobs, therefore once a pool of objects has been created, we can send the job any time we want, and retrieve the results any time we want. • Fixed PPU Memory Size Eliminates Dynamic Allocation from SPU The system doesn’t need to allocate more resources than it’s prepared for, so the job never needs to requiest that a PPU thread allocate more memory in case something needs to be cached.
  • 16. Slide 16 Benefits • Implemented and Optimized for a Specific Platform The system is designed around a specific platform (PS), and built around the limitations of the game engine. Any changes to the engine result in customized changes to the physics system, whether it be lifting restrictions, or adding more. Our pipeline is relatively simple so any system-wide changes are not that much of a hassle. (such as the addition of new joint types or collision functions). In the worst case scenario, we break up our functions even further to free up more local store for an increased data requirement. • No Restrictions on Code Size Since the majority of the code is “shader” based, we don’t have to take into consideration local store restrictions when adding more functions. I.e. adding 5 more collision or jacobian building functions that are 10k a piece on average makes no difference. Currently, all of the SPU code from the physics system exceeds 100k, but it doesn’t affect our local store limits. • Start and Sync Jobs Only Once Per Frame We only need to sync once since our physics collision and simulation live together (see limitations ), and we can sync anywhere we want.
  • 17. Slide 17 Debugging Techniques • Asserts for catchable cases, which print basic information about the crash. File, Line, Function, etc… • Each function entry point is wrapped with a macro that inserts the Function name, File, Line into a buffer that is at a fixed address in local store. • A buffer for each SPU exists on the PPU that is large enough to accommodate 4 levels of call-stack data. • With TRAP_CALLSTACK enabled, at function entry, the UPDATE_CALLSTACK macro will call a stubbed function that uploads the current call-stack buffer from the PPU, modifies it, then sends it back. • In the event of a crash, we time-out trying to sync on the PPU, which will result in dumping the call-stack data to the TTY.
  • 18. Slide 18 Debugging Techniques • In addition to keeping track of function calls, there is also an option to DMA function input data to the PPU in roughly the same manner as the call-stack. • Since most of the functions are shaders, they can be built independently and run on a PS3 Linux box. • In the event of a crash, function input data that has been sent to the PPU can now be used as input to the same function, which can now be debugged in isolation with the conditions that caused the crash. • These techniques are still in their infancy but can be very useful in tracking down previously very painful bugs.
  • 19. Slide 19 Bullet Physics System Free open source SDK at http://bulletphysics.com Contact Erwin Coumans for Playstation 3 optimized version. Feel free to ask questions at any time.
  • 20. Slide 20 Portable SPURS Tasks • Porting the implementation for – Local Store memory – DMA transfers – Threading – Synchronization • Supported threading environments: – PS3 Cell SPURS Task – PS3 Cell Libspe2 (IBM Cell Blade, PS3 Linux) – Windows/Xbox 360 Threads – etc
  • 21. Slide 21 Update Overlapping Pairs Dispatch Overlapping Pairs Update Rigid Bodies PPU Work Dispatch Collision Pairs Gather Collision Info Find Overlapping Triangles Build Islands & Hash Cells Setup Constraints Upload Collision Pair Upload Transforms Upload Collision Shape PPU SPU Execution Upload Contact Cache Upload Tree Parts Traverse AABB Tree Upload Triangle Collide Primitives Upload Rigid Body Generic GJK Collision Integrate Send Contact Cache Apply Forces/ Predict Transform Send Packed Data Pack Solver Bodies Setup Solver Body Send Bodies Solve Constraints Send Velocities Upload Contact Caches Upload Joints Find Next Cell Upload Packed Data Unpack Bodies Solve Constraints Write Velocities Solve Constraints PPU Work Sync Collision Tasks Sync Physics Tasks Prepare Constraints Send Packed ConstraintsFor Each Solver Iteration Synchronize SPUs Bullet Physics Pipeline
  • 22. Slide 22 Update Overlapping Pairs Dispatch Overlapping Pairs Update Rigid Bodies PPU Work Build Islands & Hash Cells PPU Integrate Apply Forces/ Predict Transform Solve Constraints PPU Work Sync Collision Tasks Sync Physics Tasks •Predict transforms for swept / continuous collisions •Persistent overlapping Pair Cache •Only track changes •Perform Collision Filtering •PPU allocates contact cache •Each frame, points can be added, modified/removed •PPU fallback for cases that are not implemented on SPU yet •PPU collision callbacks •Use spatial hash function to divide bodies & constraints into cells •Build dependency table between cells, using constraint connectivity •Use islands to determine (de)activation state •Option to use time of impact to avoid missing collisions (for fast / small moving objects) Bullet Physics Pipeline
  • 23. Slide 23 Upload Tree Parts Perform Intersection Test •Perform stackless tree traversal For each Collision Pair involving Triangle Meshes •Interleave search with triangle-intersection test •All code fits in SPU memory •Fast acceleration structure to find bounding volume overlap •Only DMA Tree Part if necessary •Perform a triangle versus convex primitive intersection test •Generic GJK Intersection test reduces code size •Throw away triangle after intersection test DMA contact point caches from/to main memory Traverse AABB Tree Upload Triangle •DMA the 3 triangle indices and vertices •Use software cache re-use shared vertices Collide Triangles
  • 24. Slide 24 •Most primitive collision shapes are fixed size •Convex meshes need additional temporary storage For Each Overlapping Pair of Convex Shapes •Each combination of convex shape uses generic algorithm •Sphere, OBB, Capsule, Cylinder, Cone, Convex Hull etc.. •For deep penetrations use simplified algorithm that reduces code size and memory usage •Perform contact point cache reduction on SPU Upload Collision Pair Upload Transforms Upload Collision Shape Upload Contact Cache Generic GJK Collision Send Contact Cache DMA contact point caches from/to main memory Collide Primitives
  • 25. Slide 25 For each Rigid Body in the Cell, create a packed Solver Body Keep a mapping between them, by index For Each Simulation Cell •Pack each joint in the Cell, referencing the packed solver body •Build Jacobian data for each constraint Stream Rigid Bodies and Constraints Upload Rigid Body Setup Solver Body Send Bodies Upload Contact Caches Upload Joints Prepare Constraints Send Packed Constraints DMA Packed Bodies and Constraints to main memory Pack Rigid Body Data & Setup Constraints
  • 26. Slide 26 •Make sure each SPU is processing the same solver iteration •Use spinlock to wait for other SPUs •Upload Packed bodies and constraints for this Cell •Perform one solver iteration on all constraints •Send packed bodies and constraints to main memory •Find the next available Cell in main memory •Use Dependency Table to avoid conflicts with other SPUs •Update Cell status Send Packed Data Find Next Cell Upload Packed Data Solve Constraints Synchronize SPUs For Each Solver Iteration Send Updated Body Velocities to main memory Simulate Cells
  • 27. Slide 27 Summary Bullet Physics Limitations • Code has to fit entirely in SPU memory • Generic convex collision algorithm performs a bit less then specific • Requires Intermediate Data Swapping Between PPU and SPU • Generic approach requires synchronization between SPUs • Contact points need to be stored in main memory Benefits • Multi-platform makes development and debugging easy • SPURS Task ports well to Win32/Xbox 360 threads (fits L2 cache) • Easy to extend with new custom collision algorithms • No Restrictions on Data Sizes • Automatic broadphase allows arbitrary environments • Fixed PPU Memory Size Eliminates Dynamic Allocation from SPU • Flexible use of Collision and Simulation Seperately
  • 28. Slide 28 Limitations • Code has to fit entirely in SPU memory Although we carefully chose compact algorithms, the 256kb size restriction can hinder development. Debugging the SPU can be a challenge (only add debug information for some files). SPU Shaders or overlays would help. • Generic convex collision algorithm performs a bit less then specific Especially for very simple cases like sphere versus sphere, the computational cost is higher. However, for more complex cases, the cost is closer to an optimized special case.
  • 29. Slide 29 Limitations • Generic approach requires Intermediate Data Swapping Between PPU and SPU and additional synchronization Dealing with unrestricted number of collision shapes, rigid bodies and constraints cost performance. However, hiding DMA latency and parallel processing on SPUs still gives good performance. • Contact points need to be stored in main memory The generic convex collision detection only computes one contact point at a time. Stable stacking in rigid body dynamics requires multiple contact points. Our current solution is to maintain a persistent contact cache and add, refresh, remove contact points. Other solutions exist, that compute all contact points at a time. On the flipside, keeping contact points allows to store additional information that can improve constraint solver quality (warm starting), and easier access to the forces/impulses applied by the solver.
  • 30. Slide 30 Benefits • Multi-platform makes development and debugging easy At any stage of development it is possible to debug or prototype improvements on other platforms. Sharing a common code-base reduces maintenance. • SPURS Task ports well to Win32/Xbox 360 threads Porting a SPURS Task to Win32/XBox360 can be very smooth. The data access is cache friendly. Each thread creates its own local copy of the Local Store memory. To avoid a memcpy, a read-only DMA returns the original address. • Easy to extend with new custom collision algorithms Instead of creating N new collision algorithms for each collision type, we only need to implement a single support function. We can also add or improve additional collision shapes using combinations of compounds, scaling, translation and rotation etc..
  • 31. Slide 31 Benefits • No Restrictions on Data Sizes Cell SPUs have enough power to deal with many interacting rigid bodies. Our dynamic splitting method using dependencies simulates extremely large simulation islands correctly. This means the SPUs can handle huge concave triangle meshes with local deformation. • Automatic broadphase allows arbitrary environments Large outdoor environment or dynamic changing environments can be simulated efficiently using the incremental broadphase. This approach doesn’t require any manual partitioning. • Fixed PPU Memory Size Eliminates Dynamic Allocation from SPU We allocate all memory on the PPU and avoid dynamic allocations using memory pools and stack allocators. • Flexible use of Collision and Simulation Seperately The SPU collision detection system can be re-used for synchronous collision queries by the AI system or game logic. For example batches of ray tests for line-of-sight, swept collision checks for player movement or camera positioning etc.
  • 32. Slide 32 Debugging Techniques • Since the SPURS Task also runs multiplatform, we can debug under Windows – Easier to detect Data and DMA alignment issues – Avoid forking implementation • Since the SPURS Task also runs under Libspe2, we can prototype in a very similar environment on a PS3 Linux box • Asserts for catchable cases, which print basic information about the crash. File, Line, Function, etc…
  • 33. Slide 33 Contact info erwin_coumans@playstation.sony.com Sony Computer Entertainment America ec@insomniacgames.com Insomniac Games