SlideShare a Scribd company logo
1 of 73
Download to read offline
Insomniac Physics
Eric Christensen
GDC 2009
Overview
• Go over the evolution of IG physics
system
• Shaders
• Library Shaders
• Custom event shaders
Original Design
Resistance: Fall of Man
• Ported From PC to PS3
• PPU Heavy
• SPU Processes Blocked
• Two Jobs (Collision, Simulation)
• Simulation Jobs too memory heavy
dispatched to PPU version.
• Expensive
Original Design
Resistance: Fall of Man
Physics Update
Gather Potentially
Colliding Objects
Original Design
Resistance: Fall of Man
Physics Update
Cache Collision
Geometry
Original Design
Resistance: Fall of Man
Physics Update
Run
SPU Collision Jobs
Original Design
Resistance: Fall of Man
Physics Update
Sync
Original Design
Resistance: Fall of Man
Physics Update
Process Contact
Constraints
Original Design
Resistance: Fall of Man
Physics Update
Create Simulation
Pools
Original Design
Resistance: Fall of Man
Physics Update
Run
SPU Simulation Jobs
Original Design
Resistance: Fall of Man
Physics Update
Run Simulations
Too Big For SPU!
Original Design
Resistance: Fall of Man
Physics Update
Sync
Original Design
Resistance: Fall of Man
Physics Update
Process Results
Original Design
Resistance: Fall of Man
Physics Update
Call Events
Original Design
Resistance: Fall of Man
Physics Update
Update Joints
Original Design
Resistance: Fall of Man
• Simulation Jobs Ran as Pools were
generated.
• PPU Simulation Jobs ran concurrently with
the SPU Simulation Jobs
• This was the ONLY asynchronous benefit!
• Not much!
Original Design
Resistance: Fall of Man
• Physics had the largest impact on frame
rate
• Pipeline design made it difficult to reliably
optimize
• There was A LOT to learn
Phase 2
Ratchet & Clank Future
• Collision and Simulation run in a single
SPU Job
• Single sync-point
• Large PPU window from start of Job to
End of Job
• Use of Physics Shaders
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
Gather Potentially Colliding Objects
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
Cache Collision Geometry
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
Start Physics SPU Jobs
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
Do Collision
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
Generate Simulation Pools
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
Simulate
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
Update Joints
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
DMA Results
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
Sync
Physics Update
PPU Work
Phase 2
Ratchet & Clank Future
Update Events
Phase 2
Ratchet & Clank Future
• Shaders helped free up local store
• Each big component had it’s own set of
shaders
• Constraints
• Solvers
• User customized data transformation
Physics Intersection Shaders
Example Function Prototype
unsigned int SphereOBB(const CollPrim &a, const CollPrim &b, CollResult *results)
• Shaders are loaded into local store during
the collision process and called via a
function table using a mask created by
geometry ID
• Rollback local store when done
• Savings of up to 70k of local store usage
Physics Jacobian Shaders
Example Function Prototype
unsigned int BuildJDBall(Constraint *c, Manifold *m, RigidBody *rblist, float fps,
float error, float dscale, Jacobian *jlist,
CommonTrig *trig_funcs, CommonFunc *common_funcs,
ConstraintFunc *constraint_util);
• An example of a shader being called from
another shader
• Constraints are sorted by type, then the
corresponding shader is loaded to process a
group of like constraints
• Saves us roughly 100k!
• We can add more constraint types without
worrying about impact on kernel size
Physics Jacobian Shaders
Example Function Prototype
unsigned int BuildJDBall(Constraint *c, Manifold *m, RigidBody *rblist, float fps,
float error, float dscale, Jacobian *jlist,
CommonTrig *trig_funcs, CommonFunc *common_funcs,
ConstraintFunc *constraint_util);
• CommonTrig contains pointers to
trigonometry functions that live in the main
physics kernel
• Sin, Cos, ACos, Atan, etc…
• Any optimizations will benefit the shaders
without having to re-build them
Physics Jacobian Shaders
Example Function Prototype
unsigned int BuildJDBall(Constraint *c, Manifold *m, RigidBody *rblist, float fps,
float error, float dscale, Jacobian *jlist,
CommonTrig *trig_funcs, CommonFunc *common_funcs,
ConstraintFunc *constraint_util);
• CommonFunc contains pointers to
standard functions stored in the physics
kernel
• Printf, Dma(get,put), etc…
Physics Jacobian Shaders
Example Function Prototype
unsigned int BuildJDBall(Constraint *c, Manifold *m, RigidBody *rblist, float fps,
float error, float dscale, Jacobian *jlist,
CommonTrig *trig_funcs, CommonFunc *common_funcs,
ConstraintFunc *constraint_util);
• ConstraintFunc contains pointers to constraint
utility functions that live in the physics kernel
• Generating test vectors for limits
• Constraint smoothing
• Shared between all constraints that have limits
so optimization is a great benefit
Physics Solver Shaders
Example Function Prototype
void SolverSim(SimPool *sim_pool, Manifold *m, char *dimensions, int *jd_build_ea,
int *jd_build_size, ManagedLS *allocator, CommonFunc *common_funcs,
CommonTrig *trig_funcs, ConstraintFunc *constraint_util);
• One of many solver shaders that get
loaded by the main physics kernel
• Full Simulation, IK, or “cheap” objects
• jd_build_ea/size tells us about our
Jacobian functions (where they live / size)
• Local store allocator provided for scratch
Custom Event Shaders
• Anyone can author their own custom event
shader for physics
• Currently we have two custom event
shaders.
• The physics kernel passes common
functions and a list of DMA tags
Custom Event Shaders
• Work memory is passed from to kernel to
accommodate any temporary data.
Currently this is 2k
• Shader author can DMA new data to a
PPU buffer of choice
Phase 3
Resistance 2
• Immediate and Deferred Modes
• Constraint Data Streaming
• Using library shaders for collision
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Create Entity (moby) Lists
Cache Collision Geometry
[Immediate Jobs]
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Start Immediate Jobs
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Update Immediate Physics Jobs
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Sync Immediate Physics Jobs
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Call Events [immediate]
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Create Entity (moby) Lists
Cache Collision Geometry
[Deferred Jobs]
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Start Deferred Physics Jobs
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Update Deferred Jobs
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Sync Deferred Physics Jobs
Physics Update
PPU
Work PPU Work
Phase 3
Resistance 2
PPU Work
Call Events [deferred]
Immediate and Deferred Modes
• Physics objects that had no other
gameplay or animation based
dependencies didn’t need to finish in one
frame
• Ragdolls had a one frame immediate
update and then defaulted to deferred so
they could reflect one frame of simulation
without “popping”
Immediate and Deferred Modes
• IK is run in immediate mode because it is
being constantly being tweaked by
gameplay. Lag is not an option
• Having a deferred process improved our
frame rate immensely since the majority of
the high volume environments had “fire-
and-forget” physics objects
Constraint Data Streaming
• Even with shaders, solver could run out of
local store
• Changed the solver update so that only 8
chunks of constraint data were allocated
• Solver chews on data while DMAing next
list of constraints
Collision Shader Library
• Having multiple versions of the same type
of thing adds more work and you have to
optimize more than once.
• Not practical
• Physics native collision routines made
available to all
• Great re-use and optimization benefit
• Resistance 2 successfully shipped with
this model in place
Current Phase
• Building of physics object lists as an SPU
job
• Atomic allocation of PPU memory for
heavily used data types as well as physics
scratch memory
• Use of library shaders for broad phase
collision caching
Physics Update
P
P
U
PPU PPU
Phase 3
Resistance 2
PPU PPU PPU
P
P
U
Start Entity Gathering & Collision Caching
SPU Job [for immediate jobs]
Physics Update
Phase 3
Resistance 2
Gather Entities
Cache Collision, Pre-culling
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Sync Gathering Jobs [for immediate]
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Start Immediate Physics Jobs
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Update Immediate Physics Jobs
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Sync Immediate Jobs
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Call Events [immediate]
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Start Entity Gathering & Collision Caching
SPU Job [for deferred jobs]
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Gather Entities
Cache Collision, Pre-culling
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Sync Gathering Jobs [for deferred]
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Start Deferred Jobs
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Update Deferred Jobs
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Sync Deferred Jobs
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Physics Update
Phase 3
Resistance 2
Call Events [deferred]
P
P
U
PPU PPU PPU PPU PPU
P
P
U
Building Object Lists
• Object list building was taking up valuable
time
• Caching geometry was a blocked process
on the PPU
• Very expensive
• Now all object lists and geometry caches
are generated on the SPU
Building Object Lists
• Larger physics data types organized for
streaming
• Generating object lists requires allocation
of data structures from the PPU
• This includes allocating scratch space for
joint re-ordering and packed rigid body
data
Atomic Allocation
• Converted PPU fixed block allocations to
atomic allocations
• Physics scratch buffer allocation had to be
atomic as well
• Rather straight-forward but…
• Exposed a lot of pre-existing problems
with the way data was allocated on the
PPU
Broad Phase Collision Shaders
• Previously, was only possible to gather
game collision geometry on the PPU
• Insomniac Collision System ran on its own
SPU
• Now the functions are in a shader library
• We can build physics collision data on the
SPU through the use of the shader library
interface
• Saved valuable time!
Looking Forward
• Optimize DMAs
• Better data organization
• Convert more of the physics kernel into
Shaders
• Find more opportunities for interleaving
SPU update with PPU
Eric Christensen
Insomniac Games
Principal Engine Programmer
ec@insomniacgames.com

More Related Content

Similar to Insomniac Physics

Thesis F. Redaelli UIC Slides EN
Thesis F. Redaelli UIC Slides ENThesis F. Redaelli UIC Slides EN
Thesis F. Redaelli UIC Slides EN
Marco Santambrogio
 
Worst-Case Scheduling of Software Tasks
Worst-Case Scheduling of Software TasksWorst-Case Scheduling of Software Tasks
Worst-Case Scheduling of Software Tasks
Lionel Briand
 
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDBMongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
Boxed Ice
 

Similar to Insomniac Physics (20)

SPU Physics
SPU PhysicsSPU Physics
SPU Physics
 
Python Async IO Horizon
Python Async IO HorizonPython Async IO Horizon
Python Async IO Horizon
 
Thesis F. Redaelli UIC Slides EN
Thesis F. Redaelli UIC Slides ENThesis F. Redaelli UIC Slides EN
Thesis F. Redaelli UIC Slides EN
 
Physics Solutions for Innovative Game Design
Physics Solutions for Innovative Game DesignPhysics Solutions for Innovative Game Design
Physics Solutions for Innovative Game Design
 
An Energy Efficient Demand- Response Model for High performance Computing System
An Energy Efficient Demand- Response Model for High performance Computing SystemAn Energy Efficient Demand- Response Model for High performance Computing System
An Energy Efficient Demand- Response Model for High performance Computing System
 
Finalizers –The not so Good, the Bad and the Ugly
Finalizers –The not so Good, the Bad and the UglyFinalizers –The not so Good, the Bad and the Ugly
Finalizers –The not so Good, the Bad and the Ugly
 
Unite Boston 2015 Physics
Unite Boston 2015 PhysicsUnite Boston 2015 Physics
Unite Boston 2015 Physics
 
Worst-Case Scheduling of Software Tasks
Worst-Case Scheduling of Software TasksWorst-Case Scheduling of Software Tasks
Worst-Case Scheduling of Software Tasks
 
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDBMongoUK 2011 - Rplacing RabbitMQ with MongoDB
MongoUK 2011 - Rplacing RabbitMQ with MongoDB
 
Calibration Issues in FRC: Camera, Projector, Kinematics based Hybrid Approac...
Calibration Issues in FRC: Camera, Projector, Kinematics based Hybrid Approac...Calibration Issues in FRC: Camera, Projector, Kinematics based Hybrid Approac...
Calibration Issues in FRC: Camera, Projector, Kinematics based Hybrid Approac...
 
Intelligent Control Systems for Humanoid Robot: Master Thesis_Owen Chih-Hsuan...
Intelligent Control Systems for Humanoid Robot: Master Thesis_Owen Chih-Hsuan...Intelligent Control Systems for Humanoid Robot: Master Thesis_Owen Chih-Hsuan...
Intelligent Control Systems for Humanoid Robot: Master Thesis_Owen Chih-Hsuan...
 
Start MPC
Start MPC Start MPC
Start MPC
 
A Physical Units Library for the Next C++
A Physical Units Library for the Next C++A Physical Units Library for the Next C++
A Physical Units Library for the Next C++
 
UIImageView vs Metal #tryswiftconf
UIImageView vs Metal #tryswiftconfUIImageView vs Metal #tryswiftconf
UIImageView vs Metal #tryswiftconf
 
Modern_Java_Workshop manjunath np hj slave
Modern_Java_Workshop manjunath np hj slaveModern_Java_Workshop manjunath np hj slave
Modern_Java_Workshop manjunath np hj slave
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Don’t be Homer Simpson 
with your Reactor!
 Don’t be Homer Simpson 
with your Reactor! Don’t be Homer Simpson 
with your Reactor!
Don’t be Homer Simpson 
with your Reactor!
 
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal FilteringStable SSAO in Battlefield 3 with Selective Temporal Filtering
Stable SSAO in Battlefield 3 with Selective Temporal Filtering
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Paris Master Class 2011 - 06 Gpu Particle System
Paris Master Class 2011 - 06 Gpu Particle SystemParis Master Class 2011 - 06 Gpu Particle System
Paris Master Class 2011 - 06 Gpu Particle System
 

More from Slide_N

Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-Aliasing
Slide_N
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
Slide_N
 

More from Slide_N (20)

SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
SpursEngine A High-performance Stream Processor Derived from Cell/B.E. for Me...
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
 
Experiences with PlayStation VR - Sony Interactive Entertainment
Experiences with PlayStation VR  - Sony Interactive EntertainmentExperiences with PlayStation VR  - Sony Interactive Entertainment
Experiences with PlayStation VR - Sony Interactive Entertainment
 
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3SPU-based Deferred Shading for Battlefield 3 on Playstation 3
SPU-based Deferred Shading for Battlefield 3 on Playstation 3
 
Filtering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-AliasingFiltering Approaches for Real-Time Anti-Aliasing
Filtering Approaches for Real-Time Anti-Aliasing
 
Chip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdfChip Multiprocessing and the Cell Broadband Engine.pdf
Chip Multiprocessing and the Cell Broadband Engine.pdf
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
 
New Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - KutaragiNew Millennium for Computer Entertainment - Kutaragi
New Millennium for Computer Entertainment - Kutaragi
 
Sony Transformation 60 - Kutaragi
Sony Transformation 60 - KutaragiSony Transformation 60 - Kutaragi
Sony Transformation 60 - Kutaragi
 
Sony Transformation 60
Sony Transformation 60 Sony Transformation 60
Sony Transformation 60
 
Moving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living RoomMoving Innovative Game Technology from the Lab to the Living Room
Moving Innovative Game Technology from the Lab to the Living Room
 
The Technology behind PlayStation 2
The Technology behind PlayStation 2The Technology behind PlayStation 2
The Technology behind PlayStation 2
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and Visualization
 
Industry Trends in Microprocessor Design
Industry Trends in Microprocessor DesignIndustry Trends in Microprocessor Design
Industry Trends in Microprocessor Design
 
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with OcelotTranslating GPU Binaries to Tiered SIMD Architectures with Ocelot
Translating GPU Binaries to Tiered SIMD Architectures with Ocelot
 
Cellular Neural Networks: Theory
Cellular Neural Networks: TheoryCellular Neural Networks: Theory
Cellular Neural Networks: Theory
 
Network Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTMNetwork Processing on an SPE Core in Cell Broadband EngineTM
Network Processing on an SPE Core in Cell Broadband EngineTM
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3
 
Developing Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of DestructionDeveloping Technology for Ratchet and Clank Future: Tools of Destruction
Developing Technology for Ratchet and Clank Future: Tools of Destruction
 
NVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM PowerNVIDIA Tesla Accelerated Computing Platform for IBM Power
NVIDIA Tesla Accelerated Computing Platform for IBM Power
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Insomniac Physics

  • 2. Overview • Go over the evolution of IG physics system • Shaders • Library Shaders • Custom event shaders
  • 3. Original Design Resistance: Fall of Man • Ported From PC to PS3 • PPU Heavy • SPU Processes Blocked • Two Jobs (Collision, Simulation) • Simulation Jobs too memory heavy dispatched to PPU version. • Expensive
  • 4. Original Design Resistance: Fall of Man Physics Update Gather Potentially Colliding Objects
  • 5. Original Design Resistance: Fall of Man Physics Update Cache Collision Geometry
  • 6. Original Design Resistance: Fall of Man Physics Update Run SPU Collision Jobs
  • 7. Original Design Resistance: Fall of Man Physics Update Sync
  • 8. Original Design Resistance: Fall of Man Physics Update Process Contact Constraints
  • 9. Original Design Resistance: Fall of Man Physics Update Create Simulation Pools
  • 10. Original Design Resistance: Fall of Man Physics Update Run SPU Simulation Jobs
  • 11. Original Design Resistance: Fall of Man Physics Update Run Simulations Too Big For SPU!
  • 12. Original Design Resistance: Fall of Man Physics Update Sync
  • 13. Original Design Resistance: Fall of Man Physics Update Process Results
  • 14. Original Design Resistance: Fall of Man Physics Update Call Events
  • 15. Original Design Resistance: Fall of Man Physics Update Update Joints
  • 16. Original Design Resistance: Fall of Man • Simulation Jobs Ran as Pools were generated. • PPU Simulation Jobs ran concurrently with the SPU Simulation Jobs • This was the ONLY asynchronous benefit! • Not much!
  • 17. Original Design Resistance: Fall of Man • Physics had the largest impact on frame rate • Pipeline design made it difficult to reliably optimize • There was A LOT to learn
  • 18. Phase 2 Ratchet & Clank Future • Collision and Simulation run in a single SPU Job • Single sync-point • Large PPU window from start of Job to End of Job • Use of Physics Shaders
  • 19. Physics Update PPU Work Phase 2 Ratchet & Clank Future Gather Potentially Colliding Objects
  • 20. Physics Update PPU Work Phase 2 Ratchet & Clank Future Cache Collision Geometry
  • 21. Physics Update PPU Work Phase 2 Ratchet & Clank Future Start Physics SPU Jobs
  • 22. Physics Update PPU Work Phase 2 Ratchet & Clank Future Do Collision
  • 23. Physics Update PPU Work Phase 2 Ratchet & Clank Future Generate Simulation Pools
  • 24. Physics Update PPU Work Phase 2 Ratchet & Clank Future Simulate
  • 25. Physics Update PPU Work Phase 2 Ratchet & Clank Future Update Joints
  • 26. Physics Update PPU Work Phase 2 Ratchet & Clank Future DMA Results
  • 27. Physics Update PPU Work Phase 2 Ratchet & Clank Future Sync
  • 28. Physics Update PPU Work Phase 2 Ratchet & Clank Future Update Events
  • 29. Phase 2 Ratchet & Clank Future • Shaders helped free up local store • Each big component had it’s own set of shaders • Constraints • Solvers • User customized data transformation
  • 30. Physics Intersection Shaders Example Function Prototype unsigned int SphereOBB(const CollPrim &a, const CollPrim &b, CollResult *results) • Shaders are loaded into local store during the collision process and called via a function table using a mask created by geometry ID • Rollback local store when done • Savings of up to 70k of local store usage
  • 31. Physics Jacobian Shaders Example Function Prototype unsigned int BuildJDBall(Constraint *c, Manifold *m, RigidBody *rblist, float fps, float error, float dscale, Jacobian *jlist, CommonTrig *trig_funcs, CommonFunc *common_funcs, ConstraintFunc *constraint_util); • An example of a shader being called from another shader • Constraints are sorted by type, then the corresponding shader is loaded to process a group of like constraints • Saves us roughly 100k! • We can add more constraint types without worrying about impact on kernel size
  • 32. Physics Jacobian Shaders Example Function Prototype unsigned int BuildJDBall(Constraint *c, Manifold *m, RigidBody *rblist, float fps, float error, float dscale, Jacobian *jlist, CommonTrig *trig_funcs, CommonFunc *common_funcs, ConstraintFunc *constraint_util); • CommonTrig contains pointers to trigonometry functions that live in the main physics kernel • Sin, Cos, ACos, Atan, etc… • Any optimizations will benefit the shaders without having to re-build them
  • 33. Physics Jacobian Shaders Example Function Prototype unsigned int BuildJDBall(Constraint *c, Manifold *m, RigidBody *rblist, float fps, float error, float dscale, Jacobian *jlist, CommonTrig *trig_funcs, CommonFunc *common_funcs, ConstraintFunc *constraint_util); • CommonFunc contains pointers to standard functions stored in the physics kernel • Printf, Dma(get,put), etc…
  • 34. Physics Jacobian Shaders Example Function Prototype unsigned int BuildJDBall(Constraint *c, Manifold *m, RigidBody *rblist, float fps, float error, float dscale, Jacobian *jlist, CommonTrig *trig_funcs, CommonFunc *common_funcs, ConstraintFunc *constraint_util); • ConstraintFunc contains pointers to constraint utility functions that live in the physics kernel • Generating test vectors for limits • Constraint smoothing • Shared between all constraints that have limits so optimization is a great benefit
  • 35. Physics Solver Shaders Example Function Prototype void SolverSim(SimPool *sim_pool, Manifold *m, char *dimensions, int *jd_build_ea, int *jd_build_size, ManagedLS *allocator, CommonFunc *common_funcs, CommonTrig *trig_funcs, ConstraintFunc *constraint_util); • One of many solver shaders that get loaded by the main physics kernel • Full Simulation, IK, or “cheap” objects • jd_build_ea/size tells us about our Jacobian functions (where they live / size) • Local store allocator provided for scratch
  • 36. Custom Event Shaders • Anyone can author their own custom event shader for physics • Currently we have two custom event shaders. • The physics kernel passes common functions and a list of DMA tags
  • 37. Custom Event Shaders • Work memory is passed from to kernel to accommodate any temporary data. Currently this is 2k • Shader author can DMA new data to a PPU buffer of choice
  • 38. Phase 3 Resistance 2 • Immediate and Deferred Modes • Constraint Data Streaming • Using library shaders for collision
  • 39. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Create Entity (moby) Lists Cache Collision Geometry [Immediate Jobs]
  • 40. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Start Immediate Jobs
  • 41. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Update Immediate Physics Jobs
  • 42. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Sync Immediate Physics Jobs
  • 43. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Call Events [immediate]
  • 44. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Create Entity (moby) Lists Cache Collision Geometry [Deferred Jobs]
  • 45. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Start Deferred Physics Jobs
  • 46. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Update Deferred Jobs
  • 47. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Sync Deferred Physics Jobs
  • 48. Physics Update PPU Work PPU Work Phase 3 Resistance 2 PPU Work Call Events [deferred]
  • 49. Immediate and Deferred Modes • Physics objects that had no other gameplay or animation based dependencies didn’t need to finish in one frame • Ragdolls had a one frame immediate update and then defaulted to deferred so they could reflect one frame of simulation without “popping”
  • 50. Immediate and Deferred Modes • IK is run in immediate mode because it is being constantly being tweaked by gameplay. Lag is not an option • Having a deferred process improved our frame rate immensely since the majority of the high volume environments had “fire- and-forget” physics objects
  • 51. Constraint Data Streaming • Even with shaders, solver could run out of local store • Changed the solver update so that only 8 chunks of constraint data were allocated • Solver chews on data while DMAing next list of constraints
  • 52. Collision Shader Library • Having multiple versions of the same type of thing adds more work and you have to optimize more than once. • Not practical • Physics native collision routines made available to all • Great re-use and optimization benefit • Resistance 2 successfully shipped with this model in place
  • 53. Current Phase • Building of physics object lists as an SPU job • Atomic allocation of PPU memory for heavily used data types as well as physics scratch memory • Use of library shaders for broad phase collision caching
  • 54. Physics Update P P U PPU PPU Phase 3 Resistance 2 PPU PPU PPU P P U Start Entity Gathering & Collision Caching SPU Job [for immediate jobs]
  • 55. Physics Update Phase 3 Resistance 2 Gather Entities Cache Collision, Pre-culling P P U PPU PPU PPU PPU PPU P P U
  • 56. Physics Update Phase 3 Resistance 2 Sync Gathering Jobs [for immediate] P P U PPU PPU PPU PPU PPU P P U
  • 57. Physics Update Phase 3 Resistance 2 Start Immediate Physics Jobs P P U PPU PPU PPU PPU PPU P P U
  • 58. Physics Update Phase 3 Resistance 2 Update Immediate Physics Jobs P P U PPU PPU PPU PPU PPU P P U
  • 59. Physics Update Phase 3 Resistance 2 Sync Immediate Jobs P P U PPU PPU PPU PPU PPU P P U
  • 60. Physics Update Phase 3 Resistance 2 Call Events [immediate] P P U PPU PPU PPU PPU PPU P P U
  • 61. Physics Update Phase 3 Resistance 2 Start Entity Gathering & Collision Caching SPU Job [for deferred jobs] P P U PPU PPU PPU PPU PPU P P U
  • 62. Physics Update Phase 3 Resistance 2 Gather Entities Cache Collision, Pre-culling P P U PPU PPU PPU PPU PPU P P U
  • 63. Physics Update Phase 3 Resistance 2 Sync Gathering Jobs [for deferred] P P U PPU PPU PPU PPU PPU P P U
  • 64. Physics Update Phase 3 Resistance 2 Start Deferred Jobs P P U PPU PPU PPU PPU PPU P P U
  • 65. Physics Update Phase 3 Resistance 2 Update Deferred Jobs P P U PPU PPU PPU PPU PPU P P U
  • 66. Physics Update Phase 3 Resistance 2 Sync Deferred Jobs P P U PPU PPU PPU PPU PPU P P U
  • 67. Physics Update Phase 3 Resistance 2 Call Events [deferred] P P U PPU PPU PPU PPU PPU P P U
  • 68. Building Object Lists • Object list building was taking up valuable time • Caching geometry was a blocked process on the PPU • Very expensive • Now all object lists and geometry caches are generated on the SPU
  • 69. Building Object Lists • Larger physics data types organized for streaming • Generating object lists requires allocation of data structures from the PPU • This includes allocating scratch space for joint re-ordering and packed rigid body data
  • 70. Atomic Allocation • Converted PPU fixed block allocations to atomic allocations • Physics scratch buffer allocation had to be atomic as well • Rather straight-forward but… • Exposed a lot of pre-existing problems with the way data was allocated on the PPU
  • 71. Broad Phase Collision Shaders • Previously, was only possible to gather game collision geometry on the PPU • Insomniac Collision System ran on its own SPU • Now the functions are in a shader library • We can build physics collision data on the SPU through the use of the shader library interface • Saved valuable time!
  • 72. Looking Forward • Optimize DMAs • Better data organization • Convert more of the physics kernel into Shaders • Find more opportunities for interleaving SPU update with PPU
  • 73. Eric Christensen Insomniac Games Principal Engine Programmer ec@insomniacgames.com