Emergent Game Technologies Gamebryo Element Engine Thread for Performance
Goals for Cross-Platform Threading
Write Once, Use Everywhere
Emergent's Gamebryo Element
Cross-Platform Threading Requires Common Primitives
Choosing a Processing Model
Stream Processing (Formal)
Wikipedia: Given a set of input and output data (streams), the principle essentially defines a series of compute-intensive operations (kernel functions) to be applied for each element in the stream.
[Diagram labels: Input 1, Kernel 1, Input 2, Kernel 2, Output]
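The definition above can be illustrated with a minimal sketch: a kernel is a function applied to every element of an input stream to produce an output stream, and the output of one kernel can feed the next. The names `Kernel`, `RunKernel`, `Scale` and `Offset` are hypothetical and chosen for illustration only; they are not part of Gamebryo or Floodgate.

```cpp
#include <vector>

// Hypothetical kernel type: maps one input element to one output element.
using Kernel = float (*)(float);

// Applies the kernel to each element of the input stream, in order,
// and returns the resulting output stream.
std::vector<float> RunKernel(const std::vector<float>& input, Kernel kernel)
{
    std::vector<float> output;
    output.reserve(input.size());
    for (float element : input)
    {
        output.push_back(kernel(element));
    }
    return output;
}

// Two illustrative kernels that can be chained.
float Scale(float x) { return x * 2.0f; }
float Offset(float x) { return x + 1.0f; }
```

Chaining is then `RunKernel(RunKernel(input, Scale), Offset)`: the output stream of the first kernel becomes the input stream of the second.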
Generalized Stream Processing
Morphing+Skinning Example
[Diagram labels: Morph Target 1 Vertices, Morph Target 2 Vertices, Morph Weights, Morph Kernel (MK), Skin Vertices, Bone Matrices, Blend Weights, Skinning Kernel (SK), Vertex Locations]
Morphing+Skinning Example
[Diagram labels: MW (Fixed), Matrices (Fixed), Weights (Fixed); MT 1 V Part 1, MT 1 V Part 2, MT 2 V Part 1, MT 2 V Part 2; MK Instance 1, MK Instance 2; Skin V Part 1, Skin V Part 2; SK Instance 1, SK Instance 2; Verts Part 1, Verts Part 2]
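The second diagram suggests that the vertex streams are split into parts, each handled by its own kernel instance, while the weights stay fixed across instances. A minimal sketch of that idea for the morph step follows. It uses `std::thread` in place of Floodgate's scheduler, and `MorphKernel` and `MorphInParts` are hypothetical names; the blend formula is an assumed simple weighted sum, not taken from the slides.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical morph kernel: blends two morph targets over [begin, end).
// The weights are "fixed" inputs shared by every kernel instance.
void MorphKernel(const float* target1, const float* target2,
                 float weight1, float weight2,
                 float* out, std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i)
    {
        out[i] = weight1 * target1[i] + weight2 * target2[i];
    }
}

// Splits the vertex stream into contiguous parts and runs one kernel
// instance per part. Assumes both targets have the same length.
std::vector<float> MorphInParts(const std::vector<float>& target1,
                                const std::vector<float>& target2,
                                float weight1, float weight2,
                                std::size_t partCount)
{
    const std::size_t count = target1.size();
    std::vector<float> out(count);
    if (partCount == 0)
    {
        partCount = 1;
    }
    std::vector<std::thread> instances;
    for (std::size_t part = 0; part < partCount; ++part)
    {
        const std::size_t begin = count * part / partCount;
        const std::size_t end = count * (part + 1) / partCount;
        instances.emplace_back(MorphKernel, target1.data(), target2.data(),
                               weight1, weight2, out.data(), begin, end);
    }
    // Each instance writes a disjoint range, so no locking is needed.
    for (std::thread& instance : instances)
    {
        instance.join();
    }
    return out;
}
```

Because every instance writes to a disjoint slice of the output, the parts can run concurrently without synchronization beyond the final join.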
Floodgate
Floodgate Basics
Kernel Example: Times2
    // Include Kernel Definition macros
    #include <NiSPKernelMacros.h>
    // Declare the Times2Kernel
    NiSPDeclareKernel(Times2Kernel)
Kernel Example: Times2
    #include "Times2Kernel.h"

    NiSPBeginKernelImpl(Times2Kernel)
    {
        // Get the input stream
        float* pInput = kWorkload.GetInput<float>(0);
        // Get the output stream
        float* pOutput = kWorkload.GetOutput<float>(0);
        // Process data
        NiUInt32 uiBlockCount = kWorkload.GetBlockCount();
        for (NiUInt32 ui = 0; ui < uiBlockCount; ui++)
        {
            pOutput[ui] = pInput[ui] * 2;
        }
    }
    NiSPEndKernelImpl(Times2Kernel)
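For readers without the Gamebryo headers, the kernel body above can be approximated in standard C++. This is only a sketch: the `Workload` struct below is a hypothetical stand-in that mirrors the three calls used on the slide (`GetInput`, `GetOutput`, `GetBlockCount`). It does not reproduce what the `NiSP` macros actually expand to, which the slides do not show.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the workload object; not the Gamebryo type.
struct Workload
{
    std::vector<const void*> inputs;   // one pointer per input stream
    std::vector<void*> outputs;        // one pointer per output stream
    std::uint32_t blockCount = 0;      // number of elements to process

    template <typename T>
    const T* GetInput(std::size_t index) const
    {
        return static_cast<const T*>(inputs[index]);
    }

    template <typename T>
    T* GetOutput(std::size_t index) const
    {
        return static_cast<T*>(outputs[index]);
    }

    std::uint32_t GetBlockCount() const { return blockCount; }
};

// Plain-function equivalent of the Times2 kernel body.
void Times2(const Workload& workload)
{
    const float* input = workload.GetInput<float>(0);
    float* output = workload.GetOutput<float>(0);
    const std::uint32_t blockCount = workload.GetBlockCount();
    for (std::uint32_t i = 0; i < blockCount; ++i)
    {
        output[i] = input[i] * 2;
    }
}
```

The kernel only sees typed pointers and a block count, which is what lets a scheduler hand it any slice of a larger stream.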
Life of a Workflow
Example Workflow
    // Set up input and output streams from existing buffers
    NiTSPStream<float> inputStream(SomeInputBuffer, MAX_BLOCKS);
    NiTSPStream<float> outputStream(SomeOutputBuffer, MAX_BLOCKS);

    // Get a Workflow and set up a new task for it
    NiSPWorkflow* pWorkflow = NiStreamProcessor::Get()->GetFreeWorkflow();
    NiSPTask* pTask = pWorkflow->AddNewTask();

    // Set the kernel and streams
    pTask->SetKernel(&Times2Kernel);
    pTask->AddInput(&inputStream);
    pTask->AddOutput(&outputStream);

    // Submit workflow for execution
    NiStreamProcessor::Get()->Submit(pWorkflow);

    // Do other operations...

    // Wait for workflow to complete
    NiStreamProcessor::Get()->Wait(pWorkflow);
Floodgate Internals
Overview of Workflow Analysis
Analysis: Workflow with many Tasks
[Diagram labels: Task 1 (Stream A, Stream B); Task 2 (Stream C, Stream D); Task 3 (Stream E, Stream F); Task 4 (Stream B, Stream D, Stream G); Task 5 (Stream G, Stream H); Task 6 (Stream G, Stream F, Stream I); Task 7 (Sync)]
Analysis: Dependency Graph
[Diagram labels: Stage 0, Stage 1, Stage 2, Stage 3; Task 1, Task 2, Task 3, Task 4, Task 5, Task 6, Sync Task; Streams A through I]
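One way to derive stages like those in the dependency graph is to place each task one stage after the latest task that produces any of its inputs. The sketch below assumes a particular reading of the diagram labels (for example, Task 4 reads streams B and D and writes G, and the sync task waits on H and I); the slides do not state this explicitly, and `Task` and `AssignStages` are hypothetical names rather than Floodgate's analysis code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical task description: which streams it reads and writes.
struct Task
{
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// Returns one stage index per task. A task whose inputs are not produced
// by any earlier task lands in stage 0; otherwise it lands one stage after
// its latest producer. Assumes producers are listed before consumers.
std::vector<int> AssignStages(const std::vector<Task>& tasks)
{
    std::map<std::string, int> producerStage;  // stream name -> stage written
    std::vector<int> stages;
    for (const Task& task : tasks)
    {
        int stage = 0;
        for (const std::string& input : task.inputs)
        {
            auto found = producerStage.find(input);
            if (found != producerStage.end() && found->second + 1 > stage)
            {
                stage = found->second + 1;
            }
        }
        for (const std::string& output : task.outputs)
        {
            producerStage[output] = stage;
        }
        stages.push_back(stage);
    }
    return stages;
}
```

Tasks that share a stage have no dependencies on one another, so a scheduler is free to run them in parallel.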
Performance Notes
Usability Notes
Exploiting Floodgate in the Engine
Same applications, new performance ...
[Chart labels: Skinning Objects, Morphing Objects; 42fps, 12fps, 62fps, 38fps; Before, After]
Example CPU Utilization, Morphing
[Figure panels: Before, After]
Thread profiling, Morphing Before
Thread profiling, Morphing After
New Issues
Ongoing Improvements
Using Floodgate in a game
Future proofed?
Questions?

Threading Successes 03 Gamebryo


Editor's Notes

  1. Floodgate is a cross-platform stream processing engine that enables developers to exploit the data-processing power of multiprocessor platforms.