Coding for Multiple Cores 
Bruce Dawson & Chuck Walbourn 
Programmers 
Game Technology Group
Why multi-threading/multi-core? 
Clock rates are stagnant 
Future CPUs will be predominantly multi-thread/multi-core
Xbox 360 has 3 cores 
PS3 will be multi-core 
>70% of PC sales will be multi-core by end of 2006 
Most Windows Vista systems will be multi-core 
Two performance possibilities: 
Single-threaded? Minimal performance growth 
Multi-threaded? Exponential performance growth
Design for Multithreading 
Good design is critical 
Bad multithreading can be worse than no 
multithreading 
Deadlocks, synchronization bugs, poor 
performance, etc.
Bad Multithreading 
Thread 1 
Thread 2 
Thread 3 
Thread 4 
Thread 5
Good Multithreading 
Game Thread 
Main Thread 
Physics
Rendering Thread
Animation/Skinning
Particle Systems 
Networking 
File I/O
Another Paradigm: Cascades 
Thread 1: Input
Thread 2: Physics
Thread 3: AI
Thread 4: Rendering
Thread 5: Present
[Diagram: frames 1 to 4 in flight across the stages]
Advantages: 
Synchronization points are few and well-defined 
Disadvantages: 
Increases latency (for constant frame rate) 
Needs simple (one-way) data flow
Typical Threaded Tasks 
File Decompression 
Rendering 
Graphics Fluff 
Physics
File Decompression 
Most common CPU heavy thread on the 
Xbox 360 
Easy to multithread 
Allows use of aggressive compression to 
improve load times 
Don’t throw a thread at a problem better 
solved by offline processing 
Texture compression, file packing, etc.
Rendering 
Separate update and render threads 
Rendering on multiple threads 
(D3DCREATE_MULTITHREADED) works poorly 
Exception: Xbox 360 command buffers 
Special case of cascades paradigm 
Pass render state from update to render 
With constant workload gives same latency, 
better frame rate 
With increased workload gives same frame rate, 
worse latency
Graphics Fluff 
Extra graphics that doesn't affect play 
Procedurally generated animating cloud textures 
Cloth simulations 
Dynamic ambient occlusion 
Procedurally generated vegetation, etc. 
Extra particles, better particle physics, etc. 
Easy to synchronize 
Potentially expensive, but if the core is 
otherwise idle...?
Physics? 
Could cascade from update to physics to 
rendering 
Makes use of three threads 
May be too much latency 
Could run physics on many threads 
Uses many threads while doing physics 
May leave threads mostly idle elsewhere
Overcommitted Multithreading? 
Physics
Rendering Thread
Animation/Skinning
Particle Systems 
Game Thread
How Many Threads? 
No more than one CPU intensive software 
thread per core 
3-6 on Xbox 360 
1-? on PC (1-4 for now, need to query) 
Too many busy threads add complexity
and lower performance
Context switches are not free 
Can have many non-CPU intensive 
threads 
I/O threads that block, or intermittent tasks
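The "need to query" note for PC can be sketched in standard C++ rather than with a platform API. This is an illustration only and not from the original slides: HeavyWorkerCount is a hypothetical helper, and std::thread::hardware_concurrency() reports logical processors (SMT threads are counted) and may return 0 when the value is unknown.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: choose how many CPU-heavy worker threads to create.
// hardware_concurrency() counts logical processors and may return 0 if the
// runtime cannot determine the value, so fall back to a single thread.
unsigned HeavyWorkerCount(unsigned maxWanted)
{
    unsigned logical = std::thread::hardware_concurrency();
    if (logical == 0)
        logical = 1;
    const unsigned count = std::max(1u, std::min(logical, maxWanted));
    assert(count >= 1);  // never ask for zero workers
    return count;
}
```

A title might call HeavyWorkerCount(4) at startup and size its worker pool from the result, keeping blocking I/O threads outside that budget.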
Simultaneous Multi-Threading 
Be careful with Simultaneous Multi- 
Threading (SMT) threads 
Not the same as double the number of cores 
Can give a small perf boost 
Can cause a perf drop 
Can avoid scheduler latency 
Ideally one heavy thread per core plus 
some additional intermittent threads
Case Study: Kameo (Xbox 360) 
Started single threaded 
Rendering was taking half of time—put on 
separate thread 
Two render-description buffers created to 
communicate from update to render 
Linear read/write access for best cache usage 
Doesn't copy const data 
File I/O and decompress on other threads
Separate Rendering Thread 
Update Thread 
Buffer 0 
Buffer 1 
Render Thread
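The two-buffer handoff between update and render can be sketched as follows. This is not Kameo's code: RenderHandoff and DrawCommand are hypothetical names, the sketch assumes exactly one update thread and one render thread, and it uses C++11 std::atomic instead of the Win32 primitives shown elsewhere in this deck.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct DrawCommand { int meshId; };  // placeholder for real render state

class RenderHandoff {
public:
    // Update thread: the buffer currently being filled.
    std::vector<DrawCommand>& WriteBuffer() { return m_buffers[m_writeIndex]; }

    // Update thread: hand the filled buffer to the renderer. Fails while the
    // renderer still owns the previously published buffer.
    bool TryPublish()
    {
        if (m_ready.load(std::memory_order_acquire) != kNone)
            return false;
        m_ready.store(m_writeIndex, std::memory_order_release);
        m_writeIndex ^= 1;                // switch to the other buffer
        m_buffers[m_writeIndex].clear();  // safe: the renderer released it
        return true;
    }

    // Render thread: the published buffer, or nullptr if none is ready.
    const std::vector<DrawCommand>* TryAcquire() const
    {
        const int index = m_ready.load(std::memory_order_acquire);
        return index == kNone ? nullptr : &m_buffers[index];
    }

    // Render thread: call after the last read of the acquired buffer.
    void Release()
    {
        assert(m_ready.load(std::memory_order_relaxed) != kNone);
        m_ready.store(kNone, std::memory_order_release);
    }

private:
    static const int kNone = -1;
    std::vector<DrawCommand> m_buffers[2];
    std::atomic<int> m_ready{kNone};
    int m_writeIndex = 0;  // touched by the update thread only
};
```

The update thread can run at most one frame ahead of the renderer, which is the latency cost the cascade slides describe. A real implementation would likely block on an event instead of polling TryPublish and TryAcquire.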
Case Study: Kameo (Xbox 360) 
Core  Thread  Software threads
0     0       Game update
0     1       File I/O
1     0       Rendering
1     1       (none)
2     0       XAudio
2     1       File decompression
Total usage was ~2.2-2.5 cores
Case Study: Project Gotham Racing 
Core  Thread  Software threads
0     0       Update, physics, rendering, UI
0     1       Audio update, networking
1     0       Crowd update, texture decompression
1     1       Texture decompression
2     0       XAudio
2     1       (none)
Total usage was ~2.0-3.0 cores
Managing Your Threads 
Creating threads 
Synchronizing 
Terminating 
Don't use TerminateThread() 
Bad idea on Windows: leaves the process in an 
indeterminate state, doesn't allow clean-up, etc. 
Unavailable on Xbox 360 
Instead return from your thread function, or call 
ExitThread
Creating Threads Poorly
// Stack size of zero means inherit parent's stack size.
const int stackSize = 0;
// CreateThread doesn't initialize the C runtime.
// Don't forget to close this handle when done with it.
HANDLE hThread = CreateThread(0, stackSize,
    ThreadFunctionBad, 0, 0, 0);
// Do work on main thread here.
for (;;) { // Wait for child thread to complete. Busy waiting is bad!
    DWORD exitCode;
    GetExitCodeThread(hThread, &exitCode);
    if (exitCode != STILL_ACTIVE)
        break;
}
...
DWORD __stdcall ThreadFunctionBad(void* data)
{
#ifdef WIN32
    // Be careful with thread affinities on Windows.
    SetThreadAffinityMask(GetCurrentThread(), 8);
#endif
    // Do child thread work here.
    return 0;
}
Creating Threads Well
// Specify stack size on Xbox 360.
const int stackSize = 65536;
// _beginthreadex initializes the CRT.
HANDLE hThread = (HANDLE)_beginthreadex(0, stackSize,
    ThreadFunction, 0, 0, 0);
// Do work on main thread here.
// Wait for child thread to complete.
// This is the correct way to wait for a thread to exit.
WaitForSingleObject(hThread, INFINITE);
// Don't forget to close this handle when done with it.
CloseHandle(hThread);
...
unsigned __stdcall ThreadFunction(void* data)
{
#ifdef XBOX
    // Thread affinities must be specified on Xbox 360:
    // you must explicitly assign software threads
    // to hardware threads.
    XSetThreadProcessor(GetCurrentThread(), 2);
#endif
    // Do child thread work here.
    return 0;
}
Alternative: OpenMP 
Available in VC++ 2005 
Simple way to parallelize loops and some 
other constructs 
Works best on long symmetric tasks— 
particles? 
Game tasks are short—16.6 ms 
Many game tasks are not symmetric 
OpenMP is nice, but not ideal
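As an illustration of the "long symmetric tasks" case, a particle update loop might be parallelized as sketched below. The Particle layout and UpdateParticles are hypothetical, not from the slides. The loop index is a signed int because older OpenMP versions require it, and if OpenMP is not enabled in the compiler the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <vector>

struct Particle { float x, y, z; float vx, vy, vz; };

// Advance every particle by dt seconds.
void UpdateParticles(std::vector<Particle>& particles, float dt)
{
    assert(dt >= 0.0f);
    const int count = static_cast<int>(particles.size());
    // Each iteration touches only its own particle, so iterations can run
    // on different threads without any synchronization.
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        Particle& p = particles[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}
```

Every iteration costs about the same, so the work divides evenly across threads. A loop with uneven or very short iterations would gain less, which is the caveat the slide raises.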
Available Synchronization Objects 
Events 
Semaphores 
Mutexes 
Critical Sections 
Don't use SuspendThread() 
Some titles have used this for synchronization 
Can easily lead to deadlocks 
Interacts badly with Visual Studio debugger
Exclusive Access: Mutex 
// Initialize 
HANDLE mutex = 
CreateMutex(0, FALSE, 0); 
// Use 
void ManipulateSharedData() { 
WaitForSingleObject(mutex, INFINITE); 
// Manipulate stuff... 
ReleaseMutex(mutex); 
} 
// Destroy 
CloseHandle(mutex);
Exclusive Access: CRITICAL_SECTION
// Initialize
CRITICAL_SECTION cs; 
InitializeCriticalSection(&cs); 
// Use 
void ManipulateSharedData() { 
EnterCriticalSection(&cs); 
// Manipulate stuff... 
LeaveCriticalSection(&cs); 
} 
// Destroy 
DeleteCriticalSection(&cs);
Lockless programming 
Trendy technique to use clever programming to 
share resources without locking 
Includes InterlockedXXX(), lockless 
message passing, Double Checked Locking, etc. 
Very hard to get right: 
Compiler can reorder instructions 
CPU can reorder instructions 
CPU can reorder reads and writes 
Not as fast as avoiding synchronization entirely
Lockless Messages: Buggy 
void SendMessage(void* input) { 
// Wait for the message to be 'empty'. 
while (g_msg.filled) 
; 
memcpy(g_msg.data, input, MESSAGESIZE); 
g_msg.filled = true; 
} 
void GetMessage() { 
// Wait for the message to be 'filled'. 
while (!g_msg.filled) 
; 
memcpy(localMsg.data, g_msg.data, MESSAGESIZE); 
g_msg.filled = false; 
}
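The version above is buggy because nothing stops the compiler or CPU from reordering the memcpy relative to the write of the filled flag. One possible repair, sketched here with C++11 std::atomic and acquire/release ordering, is shown below. This is not the authors' fix: the Try-style function names, the Message layout, and the MESSAGESIZE value are assumptions, and the sketch is only valid for exactly one sending thread and one receiving thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t MESSAGESIZE = 64;  // assumed size; not given in the slides

struct Message {
    char data[MESSAGESIZE];
    std::atomic<bool> filled{false};
};

Message g_msg;

// Producer side (one sending thread only). Returns false if the slot is
// still occupied, so the caller decides whether to retry or do other work.
bool TrySendMessage(const void* input)
{
    assert(input != nullptr);
    // Acquire pairs with the consumer's release store of 'false': the
    // consumer has finished reading the old data before it is overwritten.
    if (g_msg.filled.load(std::memory_order_acquire))
        return false;
    std::memcpy(g_msg.data, input, MESSAGESIZE);
    // Release: the copied data becomes visible before the flag does.
    g_msg.filled.store(true, std::memory_order_release);
    return true;
}

// Consumer side (one receiving thread only). Returns false if no message
// is waiting.
bool TryGetMessage(void* output)
{
    assert(output != nullptr);
    if (!g_msg.filled.load(std::memory_order_acquire))
        return false;
    std::memcpy(output, g_msg.data, MESSAGESIZE);
    g_msg.filled.store(false, std::memory_order_release);
    return true;
}
```

Returning false instead of spinning also avoids the busy waiting that the original loops perform.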
Synchronization tips/costs: 
Synchronization is moderately expensive 
when there is no contention 
Hundreds to thousands of cycles 
Synchronization can be arbitrarily 
expensive when there is contention! 
Goals: 
Synchronize rarely 
Hold locks briefly 
Minimize shared data
Beware hidden synchronization: 
Allocations are (generally) a synch point 
Consider per-thread heaps with no locking 
HEAP_NO_SERIALIZE flag avoids lock on Win32 
heaps 
Consider custom single-purpose allocators 
Consider avoiding memory allocations! 
Avoid synch in in-house profilers 
D3DCREATE_MULTITHREADED causes 
synchronization on almost every Direct3D 
call
Threading File I/O & Decompression 
First: use large reads and asynchronous 
I/O 
Then: consider compression to accelerate 
loading 
Don't do format conversions etc. that are better 
done at build time! 
Have resource proxies to allow rendering 
to continue
File I/O Implementation Details 
vector<Resource*> g_resources; 
Worst design: decompressor locks g_resources while 
decompressing 
Better design: decompressor adds resources to vector 
after decompressing 
Still requires renderer to synch on every resource access 
Best design: two Resource* vectors 
Renderer has private vector, no locking required 
Decompressor uses shared vector, syncs when adding a new 
Resource* 
Renderer moves Resource* from shared to private vector once 
per frame
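The "best design" above can be sketched with a mutex-guarded shared vector and an unguarded private one. ResourceList, its member names, and the placeholder Resource type are hypothetical, and std::mutex stands in for whichever lock the title actually uses.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

struct Resource { int id; };  // placeholder for a real resource type

class ResourceList {
public:
    // Decompressor thread: call after decompression has finished, so the
    // lock is held only for the push_back.
    void AddLoaded(Resource* r)
    {
        assert(r != nullptr);
        std::lock_guard<std::mutex> lock(m_mutex);
        m_shared.push_back(r);
    }

    // Render thread: call once per frame to move newly loaded resources
    // into the private vector.
    void CollectNew()
    {
        std::vector<Resource*> incoming;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            incoming.swap(m_shared);  // lock held only for the swap
        }
        m_private.insert(m_private.end(), incoming.begin(), incoming.end());
    }

    // Render thread only: no locking needed.
    const std::vector<Resource*>& Visible() const { return m_private; }

private:
    std::mutex m_mutex;
    std::vector<Resource*> m_shared;   // guarded by m_mutex
    std::vector<Resource*> m_private;  // touched by the render thread only
};
```

The renderer takes the lock once per frame regardless of how many resources it reads, instead of once per resource access.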
Profiling multi-threaded apps 
Need thread-aware profilers 
Profiling may hide many synchronization stalls 
Home-grown spin locks make profiling harder 
Consider instrumenting calls to synchronization 
functions 
Don't use locks in instrumentation—use TLS variables to 
store results 
Windows: Intel VTune, AMD CodeAnalyst, and 
the Visual Studio Team System Profiler 
Xbox 360: PIX, XbPerfView, etc.
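The advice to keep instrumentation results in TLS rather than behind a lock might look like the sketch below. It uses the C++11 thread_local keyword as a portable stand-in for __declspec(thread). LockStats and RecordLockWait are hypothetical names, and how the per-thread totals are gathered for display is left out.

```cpp
#include <cassert>
#include <cstdint>

// Per-thread profiling counters. Each thread updates only its own copy,
// so recording a sample needs no lock and adds no synchronization stalls.
struct LockStats {
    std::uint64_t acquisitions = 0;
    std::uint64_t waitTicks = 0;
};

thread_local LockStats t_lockStats;

// Call from an instrumented lock wrapper after the lock has been acquired.
void RecordLockWait(std::uint64_t ticksWaited)
{
    ++t_lockStats.acquisitions;
    t_lockStats.waitTicks += ticksWaited;
    assert(t_lockStats.acquisitions > 0);  // counter must not wrap to zero
}
```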
PIX timing capture
Naming Threads 
typedef struct tagTHREADNAME_INFO { 
DWORD dwType; // must be 0x1000 
LPCSTR szName; // pointer to name (in user addr space) 
DWORD dwThreadID; // thread ID (-1=caller thread) 
DWORD dwFlags; // reserved for future use, must be zero 
} THREADNAME_INFO; 
void SetThreadName( DWORD dwThreadID, LPCSTR szThreadName) { 
THREADNAME_INFO info; 
info.dwType = 0x1000; 
info.szName = szThreadName; 
info.dwThreadID = dwThreadID; 
info.dwFlags = 0; 
__try { 
RaiseException( 0x406D1388, 0, sizeof(info)/sizeof(DWORD), 
(DWORD*)&info ); 
} 
__except(EXCEPTION_CONTINUE_EXECUTION) { 
} 
} 
SetThreadName(-1, "Main thread");
Other Ideas 
Debugging tips for MT 
Visual Studio does support multi-threaded debugging 
Use threads window 
Use @hwthread in watch window on Xbox 360 
KD and WinDBG support multi-threaded debugging 
Thread Local Storage (TLS) 
__declspec(thread) declares per-thread variables 
But doesn't work in dynamically loaded DLLs 
TlsAlloc is less efficient, less convenient, but works in 
dynamically loaded DLLs
Windows tips 
Avoid using D3DCREATE_MULTITHREADED 
It’s easy, it works, it’s really really slow 
Best to do all calls to Direct3D from a single 
thread 
Could pass off locked resource pointers to a 
queue for a loading thread to work with 
Test on multiple machines and 
configurations 
Single-core, SMT (i.e. Hyper-Threading), Dual-core, 
Intel and AMD chips, Multi-socket multicore 
(4+ cores)
Windows API features 
WaitForMultipleObjects 
Obviously better than a series of 
WaitForSingleObject calls 
The OS is highly optimized around multithreading 
and event-based blocking 
I/O Completion Ports 
Very efficient way to have the OS assign a pool of 
worker threads to incoming I/O requests 
Useful construct for implementing a game server
SMT versus Multicore 
OS returns number of logical processors in 
GetSystemInfo(), so a 2 could mean an 
SMT machine with only 1 actual core, or 
2 cores 
Detailed Win32 APIs exposing this 
distinction not available until Windows XP 
x64, Windows Server 2003 SP1, Windows 
Vista, etc. 
GetLogicalProcessorInformation() 
For now you have to use CPUID detailed 
by Intel and AMD to parse this out…
Timing with Multiple Cores 
RDTSC is not always synced between cores! 
As your thread moves from core to core, results of RDTSC 
counter deltas may be nonsense 
CPU frequency itself can change at run-time 
through speed step technologies 
See Power Management APIs for more information 
Best thing to do is use Win32 API 
QueryPerformanceCounter / 
QueryPerformanceFrequency 
See DirectX SDK article Game Timing and 
Multiple Cores
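For code that has to stay portable, a frame timer can be sketched with std::chrono::steady_clock instead of raw RDTSC. This is an assumption-laden alternative to the Win32 calls named above, not a replacement the slides endorse: FrameTimer is a hypothetical class, and the resolution of steady_clock depends on the standard library implementation.

```cpp
#include <cassert>
#include <chrono>

class FrameTimer {
public:
    FrameTimer() : m_last(std::chrono::steady_clock::now()) {}

    // Returns the seconds elapsed since the previous call (or since
    // construction on the first call).
    double Tick()
    {
        const std::chrono::steady_clock::time_point now =
            std::chrono::steady_clock::now();
        const std::chrono::duration<double> delta = now - m_last;
        m_last = now;
        // steady_clock is monotonic, so a delta is never negative even if
        // the thread migrated between cores since the last call.
        assert(delta.count() >= 0.0);
        return delta.count();
    }

private:
    std::chrono::steady_clock::time_point m_last;
};
```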
Thread Micromanagement 
Use SetThreadAffinityMask with 
caution! 
May be useful for assigning ‘heavy’ work threads 
This mask is technically a hint, not a commitment 
RDTSC-based instrumenting will require locking 
the game threads to a single core 
Otherwise let the Windows scheduler do the right 
thing 
CreateDevice/Reset might have a side-effect 
on the calling thread’s affinity with software vertex 
processing enabled
Thread Micromanagement (cont) 
Be careful about boosting thread priority 
If the priority is too high, you could cause the 
system to hang and become unresponsive 
If the priority is too low, the thread may starve
DLLs and Multithreading 
DllMain for every DLL is informed of 
thread creation/destruction 
For some DLLs this is required to initialize TLS 
For many this is a waste of time, so call 
DisableThreadLibraryCalls() from your 
DllMain during process creation 
(DLL_PROCESS_ATTACH) 
The OS serializes access to the entry point 
This means threads created during DllMain 
won’t start for a while, so don’t wait on them in the 
DLL startup
Resources 
Multithreading Applications in Win32, Jim Beveridge & 
Robert Weiner, Addison-Wesley, 1997 
Multiprocessor Considerations for Kernel-Mode Drivers 
http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/MP_issues.doc 
Determining Logical Processors per Physical Processor 
http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/threading/knowledgebase/43842.htm 
GetLogicalProcessorInformation 
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/getlogicalprocessorinformation.asp 
Double checked locking 
http://en.wikipedia.org/wiki/Double-checked_locking
Resources 
GDC 2006 Presentations 
http://msdn.com/directx/presentations 
DirectX Developer Center 
http://msdn.com/directx 
XNA Developer Center 
http://msdn.com/xna 
Xbox Developer Center (Registered Devs Only) 
https://xds.xbox.com 
XNA, DirectX, XACT Forums 
http://msdn.com/directx/forums 
Email addresses 
directx@microsoft.com (DirectX Feedback) 
xboxds@microsoft.com (Xbox Developers Only) 
xna@microsoft.com (XNA Feedback)
© 2006 Microsoft Corporation. All rights reserved. 
Microsoft, DirectX, Xbox 360, the Xbox logo, and XNA are either registered trademarks or trademarks of Microsoft Corporation in the United States and / or other countries. 
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

Satipatthana Sutta Workshop - S13 Noble Truths
 
Satipatthana Sutta Workshop - S12.1 Samatha & Vipassana
Satipatthana Sutta Workshop - S12.1 Samatha & VipassanaSatipatthana Sutta Workshop - S12.1 Samatha & Vipassana
Satipatthana Sutta Workshop - S12.1 Samatha & Vipassana
 
Satipatthana Sutta Workshop - S11 samadhi in kayanupassana & kayagatasati1.3.3
Satipatthana Sutta Workshop - S11 samadhi in kayanupassana & kayagatasati1.3.3Satipatthana Sutta Workshop - S11 samadhi in kayanupassana & kayagatasati1.3.3
Satipatthana Sutta Workshop - S11 samadhi in kayanupassana & kayagatasati1.3.3
 
Satipatthana Sutta Workshop - S10.1 Summary & Conclusion Day 3
Satipatthana Sutta Workshop - S10.1 Summary & Conclusion Day 3Satipatthana Sutta Workshop - S10.1 Summary & Conclusion Day 3
Satipatthana Sutta Workshop - S10.1 Summary & Conclusion Day 3
 
Satipatthana Sutta Workshop - S8 Hindrances
Satipatthana Sutta Workshop - S8 HindrancesSatipatthana Sutta Workshop - S8 Hindrances
Satipatthana Sutta Workshop - S8 Hindrances
 
Satipatthana Sutta Workshop - S6.1 Body Parts and the Four Elements
Satipatthana Sutta Workshop - S6.1 Body Parts and the Four ElementsSatipatthana Sutta Workshop - S6.1 Body Parts and the Four Elements
Satipatthana Sutta Workshop - S6.1 Body Parts and the Four Elements
 
Satipatthana Sutta Workshop - S4.2 Summary & Conclusion Day 1
Satipatthana Sutta Workshop - S4.2 Summary & Conclusion Day 1Satipatthana Sutta Workshop - S4.2 Summary & Conclusion Day 1
Satipatthana Sutta Workshop - S4.2 Summary & Conclusion Day 1
 
Satipatthana Sutta Workshop - S4.1 Calming Bodily Formation
Satipatthana Sutta Workshop - S4.1 Calming Bodily FormationSatipatthana Sutta Workshop - S4.1 Calming Bodily Formation
Satipatthana Sutta Workshop - S4.1 Calming Bodily Formation
 
Satipatthana Sutta Workshop - S3 Satipatthana Structure
Satipatthana Sutta Workshop - S3 Satipatthana StructureSatipatthana Sutta Workshop - S3 Satipatthana Structure
Satipatthana Sutta Workshop - S3 Satipatthana Structure
 


Coding for multiple cores

  • 1. Coding for Multiple Cores Bruce Dawson & Chuck Walbourn Programmers Game Technology Group
  • 2. Why multi-threading/multi-core? Clock rates are stagnant Future CPUs will be predominantly multi-thread/multi-core Xbox 360 has 3 cores PS3 will be multi-core >70% of PC sales will be multi-core by end of 2006 Most Windows Vista systems will be multi-core Two performance possibilities: Single-threaded? Minimal performance growth Multi-threaded? Exponential performance growth
  • 3. Design for Multithreading Good design is critical Bad multithreading can be worse than no multithreading Deadlocks, synchronization bugs, poor performance, etc.
  • 4. Bad Multithreading Thread 1 Thread 2 Thread 3 Thread 4 Thread 5
  • 5. Good Multithreading Game Thread Main Thread Rendering Thread Physics Animation/Skinning Particle Systems Networking File I/O
  • 6. Another Paradigm: Cascades Thread 1: Input, Thread 2: Physics, Thread 3: AI, Thread 4: Rendering, Thread 5: Present (diagram: Frames 1-4) Advantages: Synchronization points are few and well-defined Disadvantages: Increases latency (for constant frame rate) Needs simple (one-way) data flow
  • 7. Typical Threaded Tasks File Decompression Rendering Graphics Fluff Physics
  • 8. File Decompression Most common CPU heavy thread on the Xbox 360 Easy to multithread Allows use of aggressive compression to improve load times Don’t throw a thread at a problem better solved by offline processing Texture compression, file packing, etc.
  • 9. Rendering Separate update and render threads Rendering on multiple threads (D3DCREATE_MULTITHREADED) works poorly Exception: Xbox 360 command buffers Special case of cascades paradigm Pass render state from update to render With constant workload gives same latency, better frame rate With increased workload gives same frame rate, worse latency
  • 10. Graphics Fluff Extra graphics that doesn't affect play Procedurally generated animating cloud textures Cloth simulations Dynamic ambient occlusion Procedurally generated vegetation, etc. Extra particles, better particle physics, etc. Easy to synchronize Potentially expensive, but if the core is otherwise idle...?
  • 11. Physics? Could cascade from update to physics to rendering Makes use of three threads May be too much latency Could run physics on many threads Uses many threads while doing physics May leave threads mostly idle elsewhere
  • 12. Overcommitted Multithreading? Rendering Thread Physics Animation/Skinning Particle Systems Game Thread
  • 13. How Many Threads? No more than one CPU intensive software thread per core 3-6 on Xbox 360 1-? on PC (1-4 for now, need to query) Too many busy threads adds complexity, and lowers performance Context switches are not free Can have many non-CPU intensive threads I/O threads that block, or intermittent tasks
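The "need to query" remark for PC could be sketched as follows. This is a minimal illustration that assumes the C++11 standard library rather than the Win32 calls used elsewhere in the deck; `ChooseWorkerCount` and `QueryWorkerCount` are hypothetical helper names. Note that `std::thread::hardware_concurrency()` counts logical processors, so an SMT machine reports more than its number of physical cores, and the call may return 0 when the count is unknown.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: clamp the number of CPU-heavy worker threads to the
// number of hardware threads, never returning fewer than one.
unsigned ChooseWorkerCount(unsigned hardwareThreads, unsigned maxWorkers) {
    if (hardwareThreads == 0) {
        hardwareThreads = 1;  // the query could not determine a count
    }
    return std::max(1u, std::min(hardwareThreads, maxWorkers));
}

// Hypothetical helper: query the machine, then apply the clamp above.
unsigned QueryWorkerCount(unsigned maxWorkers) {
    return ChooseWorkerCount(std::thread::hardware_concurrency(), maxWorkers);
}
```

Threads that mostly block on I/O would not count against this limit, in line with the slide's last two points.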
  • 14. Simultaneous Multi-Threading Be careful with Simultaneous Multi-Threading (SMT) threads Not the same as double the number of cores Can give a small perf boost Can cause a perf drop Can avoid scheduler latency Ideally one heavy thread per core plus some additional intermittent threads
  • 15. Case Study: Kameo (Xbox 360) Started single threaded Rendering was taking half of time—put on separate thread Two render-description buffers created to communicate from update to render Linear read/write access for best cache usage Doesn't copy const data File I/O and decompress on other threads
  • 16. Separate Rendering Thread Update Thread Buffer 0 Buffer 1 Render Thread
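The two-buffer hand-off between update and render threads might look roughly like the sketch below. It is not Kameo's implementation: `FrameExchange`, `RenderBuffer`, and `RunFrames` are hypothetical names, the "draw calls" are plain integers, and a C++11 mutex and condition variable stand in for whatever synchronization the game actually used. The point it illustrates is that the two threads synchronize once per frame and otherwise touch different buffers.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct RenderBuffer { std::vector<int> drawCalls; };  // stand-in for render state

class FrameExchange {
public:
    // Update thread: the buffer to fill for the current frame.
    RenderBuffer& WriteBuffer() { return buffers_[writeIndex_]; }

    // Update thread: wait until the renderer is done with the other buffer,
    // then hand over the one just filled and swap roles.
    void Publish() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !pending_; });
        readIndex_ = writeIndex_;
        writeIndex_ = 1 - writeIndex_;
        pending_ = true;
        changed_.notify_all();
    }

    // Update thread: signal that no more frames will be published.
    void Finish() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !pending_; });
        finished_ = true;
        changed_.notify_all();
    }

    // Render thread: block until a frame is ready; nullptr means shut down.
    const RenderBuffer* Acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return pending_ || finished_; });
        return pending_ ? &buffers_[readIndex_] : nullptr;
    }

    // Render thread: the acquired buffer may now be overwritten.
    void Release() {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = false;
        changed_.notify_all();
    }

private:
    RenderBuffer buffers_[2];
    int writeIndex_ = 0;
    int readIndex_ = 0;
    bool pending_ = false;
    bool finished_ = false;
    std::mutex mutex_;
    std::condition_variable changed_;
};

// Runs frameCount frames; returns what the render thread "drew" per frame.
std::vector<int> RunFrames(int frameCount) {
    assert(frameCount >= 0);
    FrameExchange exchange;
    std::vector<int> rendered;
    std::thread renderThread([&] {
        while (const RenderBuffer* buffer = exchange.Acquire()) {
            int sum = 0;
            for (int call : buffer->drawCalls) sum += call;
            rendered.push_back(sum);
            exchange.Release();
        }
    });
    for (int frame = 0; frame < frameCount; ++frame) {
        exchange.WriteBuffer().drawCalls.assign(3, frame);  // three fake draws
        exchange.Publish();
    }
    exchange.Finish();
    renderThread.join();
    return rendered;
}
```

While the render thread reads one buffer, the update thread fills the other without taking any lock; the mutex is held only for the brief swap.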
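  • 17. Case Study: Kameo (Xbox 360)
      Core 0, hardware thread 0: Game update
      Core 0, hardware thread 1: File I/O
      Core 1, hardware thread 0: Rendering
      Core 1, hardware thread 1: (unused)
      Core 2, hardware thread 0: XAudio
      Core 2, hardware thread 1: File decompression
      Total usage was ~2.2-2.5 cores
  • 18. Case Study: Project Gotham Racing
      Core 0, hardware thread 0: Update, physics, rendering, UI
      Core 0, hardware thread 1: Audio update, networking
      Core 1, hardware thread 0: Crowd update, texture decompression
      Core 1, hardware thread 1: Texture decompression
      Core 2, hardware thread 0: XAudio
      Core 2, hardware thread 1: (unused)
      Total usage was ~2.0-3.0 cores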
  • 19. Managing Your Threads Creating threads Synchronizing Terminating Don't use TerminateThread() Bad idea on Windows: leaves the process in an indeterminate state, doesn't allow clean-up, etc. Unavailable on Xbox 360 Instead return from your thread function, or call ExitThread
  • 20. Creating Threads Poorly
      const int stackSize = 0;
      HANDLE hThread = CreateThread(0, stackSize, ThreadFunctionBad, 0, 0, 0);
      // Do work on main thread here.
      for (;;) { // Wait for child thread to complete
          DWORD exitCode;
          GetExitCodeThread(hThread, &exitCode);
          if (exitCode != STILL_ACTIVE)
              break;
      }
      ...
      DWORD __stdcall ThreadFunctionBad(void* data) {
      #ifdef WIN32
          SetThreadAffinityMask(GetCurrentThread(), 8);
      #endif
          // Do child thread work here.
          return 0;
      }
      Slide callouts: Stack size of zero means inherit parent's stack size. Don't forget to close this (the thread handle) when done with it. CreateThread doesn't initialize C runtime. Busy waiting is bad! Be careful with thread affinities on Windows.
  • 21. Creating Threads Well
      const int stackSize = 65536;
      HANDLE hThread = (HANDLE)_beginthreadex(0, stackSize, ThreadFunction, 0, 0, 0);
      // Do work on main thread here.
      // Wait for child thread to complete
      WaitForSingleObject(hThread, INFINITE);
      CloseHandle(hThread);
      ...
      unsigned __stdcall ThreadFunction(void* data) {
      #ifdef XBOX
          // On Xbox 360 you must explicitly assign
          // software threads to hardware threads.
          XSetThreadProcessor(GetCurrentThread(), 2);
      #endif
          // Do child thread work here.
          return 0;
      }
      Slide callouts: Specify stack size on Xbox 360. Don't forget to close this (the thread handle) when done with it. _beginthreadex initializes CRT. The correct way to wait for a thread to exit. Thread affinities must be specified on Xbox 360.
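For readers without the Win32 or Xbox 360 headers, the same "block until the child exits instead of polling" pattern can be sketched with C++11 `std::thread`. This is a portable stand-in chosen for illustration, not the API the slides use; `RunChildAndWait` is a hypothetical name, and `std::thread` exposes neither the stack-size nor the affinity controls shown on the slides.

```cpp
#include <cassert>
#include <thread>

// Starts a child thread, lets the caller do other work, then blocks until
// the child has finished. join() plays the role that WaitForSingleObject
// followed by CloseHandle plays on the slide: no busy waiting, and the
// thread's resources are released afterwards.
int RunChildAndWait() {
    int result = 0;
    std::thread child([&result] {
        result = 42;  // stand-in for the child thread's work
    });

    // Do work on the main thread here.

    child.join();               // sleeps until the child returns
    assert(!child.joinable());  // the thread object no longer owns a thread
    return result;              // safe to read: join() orders the write before us
}
```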
  • 22. Alternative: OpenMP Available in VC++ 2005 Simple way to parallelize loops and some other constructs Works best on long symmetric tasks— particles? Game tasks are short—16.6 ms Many game tasks are not symmetric OpenMP is nice, but not ideal
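A long, symmetric loop such as a particle update is the kind of case the slide suggests OpenMP suits. The sketch below is illustrative only: `Particle` and `UpdateParticles` are hypothetical names, and the pragma takes effect only when the compiler's OpenMP option is enabled (for example `/openmp` in Visual C++ or `-fopenmp` in GCC and Clang). Without it the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

struct Particle {
    float position;
    float velocity;
};

// Advances every particle by one time step. Each iteration touches only its
// own element, so iterations can be split across threads without locking.
void UpdateParticles(std::vector<Particle>& particles, float dt) {
    assert(dt >= 0.0f);
    const int count = static_cast<int>(particles.size());
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        particles[i].position += particles[i].velocity * dt;
    }
}
```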
  • 23. Available Synchronization Objects Events Semaphores Mutexes Critical Sections Don't use SuspendThread() Some titles have used this for synchronization Can easily lead to deadlocks Interacts badly with Visual Studio debugger
  • 24. Exclusive Access: Mutex
      // Initialize
      HANDLE mutex = CreateMutex(0, FALSE, 0);
      // Use
      void ManipulateSharedData() {
          WaitForSingleObject(mutex, INFINITE);
          // Manipulate stuff...
          ReleaseMutex(mutex);
      }
      // Destroy
      CloseHandle(mutex);
  • 25. Exclusive Access: CRITICAL_SECTION
      // Initialize
      CRITICAL_SECTION cs;
      InitializeCriticalSection(&cs);
      // Use
      void ManipulateSharedData() {
          EnterCriticalSection(&cs);
          // Manipulate stuff...
          LeaveCriticalSection(&cs);
      }
      // Destroy
      DeleteCriticalSection(&cs);
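The initialize / use / destroy pattern on the two slides above can also be expressed with scope-based locking, so that the release cannot be forgotten on an early return. The sketch assumes C++11 `std::mutex` and `std::lock_guard` as a portable stand-in for the Win32 mutex and CRITICAL_SECTION; `SharedCounter` and `CountWithThreads` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Shared data guarded by a mutex. The lock_guard acquires the mutex in its
// constructor and releases it in its destructor, mirroring the
// Enter/Leave (or Wait/Release) pairs on the slides.
class SharedCounter {
public:
    void Add(int amount) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ += amount;  // manipulate shared data while holding the lock
    }
    int Value() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
private:
    mutable std::mutex mutex_;
    int value_ = 0;
};

// Several threads add to the same counter; returns the final total.
int CountWithThreads(int threadCount, int addsPerThread) {
    assert(threadCount >= 0 && addsPerThread >= 0);
    SharedCounter counter;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&counter, addsPerThread] {
            for (int i = 0; i < addsPerThread; ++i) counter.Add(1);
        });
    }
    for (std::thread& thread : threads) thread.join();
    return counter.Value();
}
```

Taking the lock on every increment keeps the example short, but it is the opposite of the "synchronize rarely" goal stated later in the deck; real code would batch the work.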
  • 26. Lockless programming Trendy technique to use clever programming to share resources without locking Includes InterlockedXXX(), lockless message passing, Double Checked Locking, etc. Very hard to get right: Compiler can reorder instructions CPU can reorder instructions CPU can reorder reads and writes Not as fast as avoiding synchronization entirely
  • 27. Lockless Messages: Buggy
      void SendMessage(void* input) {
          // Wait for the message to be 'empty'.
          while (g_msg.filled)
              ;
          memcpy(g_msg.data, input, MESSAGESIZE);
          g_msg.filled = true;
      }
      void GetMessage() {
          // Wait for the message to be 'filled'.
          while (!g_msg.filled)
              ;
          memcpy(localMsg.data, g_msg.data, MESSAGESIZE);
          g_msg.filled = false;
      }
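One way to address the reordering hazards listed on the previous slide is to make the flag an atomic with acquire/release ordering. The sketch below assumes C++11 `std::atomic`, which postdates the talk, and a single sender and single receiver; `SendMessageFixed`, `GetMessageFixed`, `RoundTrip`, and `kMessageSize` are hypothetical names. It is not necessarily the fix the presenters had in mind.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>

const std::size_t kMessageSize = 16;

struct Message {
    char data[kMessageSize];
    std::atomic<bool> filled{false};
};

Message g_msg;

void SendMessageFixed(const void* input) {
    // Acquire: the receiver's copy out of g_msg.data is complete before we
    // overwrite the slot.
    while (g_msg.filled.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    std::memcpy(g_msg.data, input, kMessageSize);
    // Release: the data written above is visible before the flag is.
    g_msg.filled.store(true, std::memory_order_release);
}

void GetMessageFixed(void* output) {
    while (!g_msg.filled.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    std::memcpy(output, g_msg.data, kMessageSize);
    g_msg.filled.store(false, std::memory_order_release);
}

// Sends `count` messages from a second thread and checks they arrive intact
// and in order on the calling thread.
bool RoundTrip(int count) {
    assert(count >= 0);
    std::thread sender([count] {
        for (int i = 0; i < count; ++i) {
            char payload[kMessageSize] = {};
            payload[0] = static_cast<char>(i % 100);
            SendMessageFixed(payload);
        }
    });
    bool inOrder = true;
    for (int i = 0; i < count; ++i) {
        char received[kMessageSize];
        GetMessageFixed(received);
        inOrder = inOrder && received[0] == static_cast<char>(i % 100);
    }
    sender.join();
    return inOrder;
}
```

The loops still spin, so this remains a busy wait; it only removes the data race, and an event or condition variable would be the better choice when waits can be long.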
  • 28. Synchronization tips/costs: Synchronization is moderately expensive when there is no contention Hundreds to thousands of cycles Synchronization can be arbitrarily expensive when there is contention! Goals: Synchronize rarely Hold locks briefly Minimize shared data
  • 29. Beware hidden synchronization: Allocations are (generally) a synch point Consider per-thread heaps with no locking HEAP_NO_SERIALIZE flag avoids lock on Win32 heaps Consider custom single-purpose allocators Consider avoiding memory allocations! Avoid synch in in-house profilers D3DCREATE_MULTITHREADED causes synchronization on almost every Direct3D call
  • 30. Threading File I/O & Decompression First: use large reads and asynchronous I/O Then: consider compression to accelerate loading Don't do format conversions etc. that are better done at build time! Have resource proxies to allow rendering to continue
  • 31. File I/O Implementation Details vector<Resource*> g_resources; Worst design: decompressor locks g_resources while decompressing Better design: decompressor adds resources to vector after decompressing Still requires renderer to synch on every resource access Best design: two Resource* vectors Renderer has private vector, no locking required Decompressor uses shared vector, syncs when adding new Resource* Renderer moves Resource* from shared to private vector once per frame
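The "best design" above, with a shared vector and a renderer-private vector, could be sketched like this. `ResourceHandoff`, `AddLoaded`, `CollectOncePerFrame`, and `LoadAndCollect` are hypothetical names, `Resource` is reduced to a single field, and C++11 `std::mutex` is assumed in place of a critical section.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct Resource { int id; };

class ResourceHandoff {
public:
    // Decompressor thread: publish a fully decompressed resource.
    // The lock is held only for the push, never during decompression.
    void AddLoaded(Resource* resource) {
        std::lock_guard<std::mutex> lock(mutex_);
        shared_.push_back(resource);
    }

    // Render thread, once per frame: move everything published so far into
    // the renderer's private list, which is then used without locking.
    void CollectOncePerFrame(std::vector<Resource*>& privateList) {
        std::vector<Resource*> incoming;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            incoming.swap(shared_);  // constant-time hand-over under the lock
        }
        privateList.insert(privateList.end(), incoming.begin(), incoming.end());
    }

private:
    std::mutex mutex_;
    std::vector<Resource*> shared_;
};

// Simulates a decompressor publishing resources while a renderer collects
// them frame by frame; returns how many the renderer ended up with.
std::size_t LoadAndCollect(int resourceCount) {
    assert(resourceCount >= 0);
    std::vector<Resource> storage(static_cast<std::size_t>(resourceCount));
    ResourceHandoff handoff;
    std::thread decompressor([&] {
        for (Resource& resource : storage) handoff.AddLoaded(&resource);
    });
    std::vector<Resource*> rendererPrivate;
    while (rendererPrivate.size() < storage.size()) {
        handoff.CollectOncePerFrame(rendererPrivate);  // one sync per "frame"
        std::this_thread::yield();                     // stand-in for rendering
    }
    decompressor.join();
    return rendererPrivate.size();
}
```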
  • 32. Profiling multi-threaded apps Need thread-aware profilers Profiling may hide many synchronization stalls Home-grown spin locks make profiling harder Consider instrumenting calls to synchronization functions Don't use locks in instrumentation—use TLS variables to store results Windows: Intel VTune, AMD CodeAnalyst, and the Visual Studio Team System Profiler Xbox 360: PIX, XbPerfView, etc.
  • 34. Naming Threads
      typedef struct tagTHREADNAME_INFO {
          DWORD dwType;     // must be 0x1000
          LPCSTR szName;    // pointer to name (in user addr space)
          DWORD dwThreadID; // thread ID (-1=caller thread)
          DWORD dwFlags;    // reserved for future use, must be zero
      } THREADNAME_INFO;
      void SetThreadName( DWORD dwThreadID, LPCSTR szThreadName) {
          THREADNAME_INFO info;
          info.dwType = 0x1000;
          info.szName = szThreadName;
          info.dwThreadID = dwThreadID;
          info.dwFlags = 0;
          __try {
              RaiseException( 0x406D1388, 0, sizeof(info)/sizeof(DWORD), (DWORD*)&info );
          }
          __except(EXCEPTION_CONTINUE_EXECUTION) {
          }
      }
      SetThreadName(-1, "Main thread");
  • 35. Other Ideas Debugging tips for MT Visual Studio does support multi-threaded debugging Use threads window Use @hwthread in watch window on Xbox 360 KD and WinDBG support multi-threaded debugging Thread Local Storage (TLS) __declspec(thread) declares per-thread variables But doesn't work in dynamically loaded DLLs TlsAlloc is less efficient, less convenient, but works in dynamically loaded DLLs
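Per-thread variables of the kind described above are also what the profiling slide recommends for lock-free instrumentation. A minimal sketch, assuming the C++11 `thread_local` keyword as a portable counterpart to `__declspec(thread)`; `t_eventCount`, `RecordEvent`, and `CountPerThread` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Every thread gets its own copy, so incrementing it needs no lock.
thread_local int t_eventCount = 0;

// Hypothetical instrumentation hook, e.g. called around a lock acquisition.
void RecordEvent() { ++t_eventCount; }

// Starts one thread per entry; thread i records eventsPerThread[i] events
// and reports its private total when it finishes.
std::vector<int> CountPerThread(const std::vector<int>& eventsPerThread) {
    std::vector<int> totals(eventsPerThread.size(), 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < eventsPerThread.size(); ++i) {
        threads.emplace_back([&totals, &eventsPerThread, i] {
            for (int e = 0; e < eventsPerThread[i]; ++e) RecordEvent();
            totals[i] = t_eventCount;  // each thread writes only its own slot
        });
    }
    for (std::thread& thread : threads) thread.join();
    return totals;
}
```

The totals are gathered only after each thread has been joined, so the hot path never touches shared state.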
  • 36. Windows tips Avoid using D3DCREATE_MULTITHREADED It’s easy, it works, it’s really really slow Best to do all calls to Direct3D from a single thread Could pass off locked resource pointers to a queue for a loading threads to work with Test on multiple machines and configurations Single-core, SMT (i.e. Hyper-Threading), Dual-core, Intel and AMD chips, Multi-socket multicore (4+ cores)
  • 37. Windows API features WaitForMultipleObjects Obviously better than a series of WaitForSingleObject calls The OS is highly optimized around multithreading and event-based blocking I/O Completion Ports Very efficient way to have the OS assign a pool of worker threads to incoming I/O requests Useful construct for implementing a game server
  • 38. SMT versus Multicore OS returns number of logical processors in GetSystemInfo(), so a 2 could mean an SMT machine with only 1 actual core, or 2 cores Detailed Win32 APIs exposing this distinction not available until Windows XP x64, Windows Server 2003 SP1, Windows Vista, etc. GetLogicalProcessorInformation() For now you have to use CPUID detailed by Intel and AMD to parse this out…
  • 39. Timing with Multiple Cores RDTSC is not always synced between cores! As your thread moves from core to core, results of RDTSC counter deltas may be nonsense CPU frequency itself can change at run-time through speed step technologies See Power Management APIs for more information Best thing to do is use Win32 API QueryPerformanceCounter / QueryPerformanceFrequency See DirectX SDK article Game Timing and Multiple Cores
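The advice to time through an OS-level monotonic counter rather than raw RDTSC can be illustrated portably with `std::chrono::steady_clock`. This is a stand-in chosen for the sketch, not the QueryPerformanceCounter call the slide recommends, and `MeasureMilliseconds` is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>

// Times an arbitrary callable with a monotonic clock and returns the elapsed
// time in milliseconds. steady_clock never runs backwards, so the delta stays
// meaningful even if the thread migrates between cores during the call.
template <typename Work>
double MeasureMilliseconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```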
  • 40. Thread Micromanagement Use SetThreadAffinityMask with caution! May be useful for assigning ‘heavy’ work threads This mask is technically a hint, not a commitment RDTSC-based instrumenting will require locking the game threads to a single core Otherwise let the Windows scheduler do the right thing CreateDevice/Reset might have a side-effect on the calling thread’s affinity with software vertex processing enabled
  • 41. Thread Micromanagement (cont) Be careful about boosting thread priority If the priority is too high, you could cause the system to hang and become unresponsive If the priority is too low, the thread may starve
  • 42. DLLs and Multithreading DllMain for every DLL is informed of thread creation/destruction For some DLLs this is required to initialize TLS For many this is a waste of time, so call DisableThreadLibraryCalls() from your DllMain during process creation (DLL_PROCESS_ATTACH) The OS serializes access to the entry point This means threads created during DllMain won’t start for a while, so don’t wait on them in the DLL startup
  • 43. Resources Multithreading Applications in Win32, Jim Beveridge & Robert Weiner, Addison-Wesley, 1997 Multiprocessor Considerations for Kernel-Mode Drivers http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/MP_issues.doc Determining Logical Processors per Physical Processor http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/threading/knowledgebase/43842.htm GetLogicalProcessorInformation http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/getlogicalprocessorinformation.asp Double checked locking http://en.wikipedia.org/wiki/Double-checked_locking
  • 44. Resources GDC 2006 Presentations http://msdn.com/directx/presentations DirectX Developer Center http://msdn.com/directx XNA Developer Center http://msdn.com/xna Xbox Developer Center (Registered Devs Only) https://xds.xbox.com XNA, DirectX, XACT Forums http://msdn.com/directx/forums Email addresses directx@microsoft.com (DirectX Feedback) xboxds@microsoft.com (Xbox Developers Only) xna@microsoft.com (XNA Feedback)
  • 45. © 2006 Microsoft Corporation. All rights reserved. Microsoft, DirectX, Xbox 360, the Xbox logo, and XNA are either registered trademarks or trademarks of Microsoft Corporation in the United States and / or other countries. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

Editor's Notes

  1. Good afternoon. My name is Bruce Dawson, and this is Chuck Walbourn. We work together in the Microsoft Game Technology Group. My specialty is Xbox 360, and Chuck's specialty is Windows. We're here today to give you some thoughts on how to take advantage of multi-core processors.
  2. This topic is important because the free ride is over... Per hardware thread performance is stagnant, but processor improvement continues <SWIPE> >70% figure applies to servers (>85%), desktop, and laptops, everything Celeron dual-core! Moore's law lives, but since we can't increase single-proc clocks our transistor counts much more... <SWIPE> More performance requires multi-threading Multi-core penetration figures: http://cache-www.intel.com/cd/00/00/23/54/235413_235413.pdf
  3. Why this talk? Multi-threading is hard—to get benefit you need to plan for it, and you will hit subtle bugs. Effective multi-threading can be really hard. You may hit problems where threading is actually hurting performance. Done properly—huge benefits Good multi-threading always starts with good design.
  4. Haphazard design Start with one thread, it spawns a couple more Then they spawn a couple more Then you start adding communication between threads And more communication between threads And still more communication between threads Then you add synchronization points, where threads need data from other threads or shared resources End result: a lot of your threads spend a lot of time waiting, you need a lot of synchronization objects, you’re prone to resource contention and synch bugs
  5. Start with main thread, look for major tasks Split out into Game/Rendering Add synch points… other than at those points, both threads can run independently Look for additional parallelizable tasks… physics might be a good candidate Synch points before and after Break out other parallelizable tasks Look for tasks that can run independently of main threads… service requests Add communication but keep it to a minimum
  6. Also, each chunk size must be same as largest. Probably not well-suited for games Can work if you have very few stages. At 30Hz, intolerable latency
  7. Update loop should generally be single threaded. May be able to pull out some parts, like path-finding, but synchronization concerns limit your options.
  8. File I/O is something that is often put on a separate thread. This can avoid stalls that asynchronous I/O can't always hide. Normally file I/O is not CPU heavy. That can change now. File read/write is cheap, but spare threads allow use of aggressive compression.
  9. Rendering is usually quite expensive. D3D overhead adds up, and scene traversal costs also Limited number of primitives per second (On Modern Windows machines, we recommend expecting about 300 draws per frame for 60 FPS) Simple in theory: double-buffer all state that affects rendering. Sometimes complicated in practice. Synchronize once per frame
  10. Graphics fluff is a good candidate because it has few interactions with other data. May not need to run at same frame-rate as game. Some games are spending 100% of a core on cloth animation. "That's crazy!", or is it brilliant? The main loop of your game may be impossible to multi-thread, in which case the other threads will sit idle unless you add new features. On PC, graphics fluff can be dropped on single-core machines without affecting game-play. Can be replaced with cheaper alternatives.
  11. This diagram does show good multithreading, but probably not perfect. It relies on spawning extra threads for physics, animation, and particle systems. It could turn out that this system demands ten hardware threads at some times, and two hardware threads at others. Ideally you should try to have the same number of CPU heavy threads running at all times. Amdahl's law: speeding up part of your calculations just leaves the remainder as the single-threaded bottleneck. Middle-ware needs to be flexible enough to adapt to the needs of different games. Physics may be allowed one core, or not.
  12. It is reasonable to have additional threads that are not CPU intensive (blocking on I/O). Segue: One per hardware thread, or one per core
  13. SMT means that two hardware threads are sharing execution resources. They share L1 caches and execution units, but have independent register sets. If the first thread is underutilizing these resources (too many dependency stalls) then another thread can share the resources and total throughput increases. If the first thread is heavily utilizing these resources (well scheduled code) then SMT can't help much. Cache is often a problem: L1 is small, and two threads may fight over it. Worst case, adding a second thread may reduce performance. How to tell? Measure. Easy on Xbox 360, trickier on PC. Scheduler latency is when you have a thread that is ready to run but the OS waits for the current scheduling quantum to expire before running the thread. If you put a thread on its own hardware thread (even just an SMT thread) then it can wake up faster. This works well if you have a thread that mostly sleeps but needs to wake quickly on demand. There can be multiple threads per core, multiple cores per chip, multiple chips per socket, and multiple sockets per computer. Identifying shared L1 caches can help with decisions about how many processors to use. The non-uniformity of hardware threads is one reason why setting thread affinity is problematic on PC. Now, some examples.
  14. Almost finished on Xbox in August 2004—then moved to Xbox 360 Mostly single-threaded game CPU usage split was 51/49 for update/render—perfect 3-MB buffer to describe rendering (not always filled), took ~1-2 ms to fill buffer, ~33 ms to render Decompression thread saved space on DVD and improved load times, cost was some spare CPU cycles. Actually two threads for file I/O—one for reading, one for decompressing, because some calls can block for ~0.5s doing directory lookups
  15. First the update thread fills buffer 0. The render thread is idle. Then the update thread fills buffer 1. While it is doing this the render thread can run, reading from buffer 0. Then the threads swap buffers. This process continues (go back and forth with the arrow keys).
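The buffer hand-off in note 15 can be sketched with two slots and a condition variable. This is an illustrative analogue in standard C++, not the title's actual code; `RenderBuffer`, `DoubleBuffer`, and `runFrames` are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct RenderBuffer { int frame = -1; };

class DoubleBuffer {
public:
    // Update thread: wait until a slot is free, then fill it outside the lock.
    RenderBuffer& beginWrite() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return filled_ < 2; });
        return buffers_[writeIndex_];
    }
    void endWrite() {
        { std::lock_guard<std::mutex> lock(m_); writeIndex_ ^= 1; ++filled_; }
        cv_.notify_all();
    }
    // Render thread: wait until a slot has been filled, then read it.
    RenderBuffer& beginRead() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return filled_ > 0; });
        return buffers_[readIndex_];
    }
    void endRead() {
        { std::lock_guard<std::mutex> lock(m_); readIndex_ ^= 1; --filled_; }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    RenderBuffer buffers_[2];
    int writeIndex_ = 0, readIndex_ = 0, filled_ = 0;
};

// Runs an update loop on the calling thread and a render loop on a second
// thread; returns the frame numbers in the order the render thread saw them.
std::vector<int> runFrames(int frameCount) {
    DoubleBuffer db;
    std::vector<int> rendered;
    std::thread render([&] {
        for (int i = 0; i < frameCount; ++i) {
            RenderBuffer& b = db.beginRead();
            rendered.push_back(b.frame);
            db.endRead();
        }
    });
    for (int f = 0; f < frameCount; ++f) {
        RenderBuffer& b = db.beginWrite();
        b.frame = f;  // the render thread may be reading the other slot now
        db.endWrite();
    }
    render.join();
    return rendered;
}
```

The update thread can run at most one frame ahead: it fills buffer 1 while buffer 0 is being rendered, then blocks until the render thread releases buffer 0.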
  16. Multi-threading was added very late, ~6 months before launch, but it worked. This shows the distribution of threads to cores and hardware threads. Note that one hardware thread is unused. That's okay: it ensures that rendering runs at top speed. There were a few other threads (audio processing, etc.) but not many, roughly one CPU-intensive thread per core. Cores 0 and 1 were ~80-99% utilized, and core 2 was typically 50% utilized, for total CPU usage of ~2.2-2.5 cores, or ~7-8 GHz.
  17. This title is also on Xbox 360. Things to notice: rendering on same thread as update. Two decompression threads. One unused thread, to leave all cycles to audio. Audio was a problem in this title. The update thread and crowd update threads both need to trigger sounds, which required grabbing a critical section that the Audio update thread was often holding.
  18. Things to point out: _beginthreadex is required on Windows to ensure that the CRT is initialized with any TLS required. It is optional on Xbox 360 (you can use CreateThread; the only difference is the return type of the thread-creation function and of the thread function). Specifying the stack size is important on Xbox 360 to avoid wasting memory. It should generally be a multiple of 64 KB. Waiting on the thread handle is how you tell when a thread has terminated. Don't busy wait for this! The return value is a thread handle, which must be closed when it is no longer needed. Thread affinity is completely manual on Xbox 360. It is generally best to let the OS do it on Windows, unless you really know what you're doing. You can easily reduce performance through a poor understanding of processor topology (overusing two hardware threads on one core while leaving a thread idle), or through poor interactions with other processes, where your threads are unable to run despite idle hardware threads being available.
  19. Thread creation is expensive. Don't do it often. If a thread is temporarily unneeded, leave it waiting on an event or semaphore. This code specifies the stack size, uses _beginthreadex, properly waits for the child to terminate, closes the handle, and specifies the thread affinity on Xbox 360 but not Win32. Perfection! If you don't need the handle, close it immediately.
  20. Some areas where OpenMP has been used include particles, skinning, and physics. Usually minimal benefit, due to limited scope.
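The advice in note 19 to keep an idle thread waiting rather than recreating it can be sketched as follows. This is a portable analogue using `std::thread` and `std::condition_variable` in place of `_beginthreadex` and a Win32 event; the `Worker` class and `sumOnWorker` helper are hypothetical, and stack size and affinity are not covered because standard C++ does not expose them.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// One long-lived thread that sleeps until work arrives.
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}
    ~Worker() {
        { std::lock_guard<std::mutex> lock(m_); quit_ = true; }
        cv_.notify_one();
        thread_.join();  // block until the thread exits; no busy wait
    }
    void post(std::function<void()> job) {
        { std::lock_guard<std::mutex> lock(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return quit_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // quit requested and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run the job without holding the lock
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool quit_ = false;
    std::thread thread_;  // declared last so the other members exist first
};

// Posts n small jobs to a single worker and returns the accumulated sum.
int sumOnWorker(int n) {
    int total = 0;
    {
        Worker w;
        for (int i = 1; i <= n; ++i) w.post([&total, i] { total += i; });
    }  // destructor drains the queue and joins the thread
    return total;
}
```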
  21. This guarantees that ManipulateSharedData() is only executed by one thread at a time. But, mutexes are not the cheapest option...
  22. Critical sections are much cheaper. On Xbox 360 and on Windows they run roughly 20x faster. Two restrictions: they cannot be used between processes, and they cannot be used with WaitForMultipleObjects. Mutexes are kernel objects, so they require a kernel transition, whereas critical sections are user-space objects. Mutexes are more robust in the face of thread death. CRITICAL_SECTION is a good optimization, but the key optimization is: don't synchronize too often.
  23. x86 CPUs can reorder reads. Xbox 360 CPUs can reorder reads and writes, despite being in-order CPUs.
  24. g_message.filled must be marked volatile or else both loops will tend to spin forever. Even then, with many compiler/platform pairs there is nothing to stop the write to g_message.filled from being reordered. In SendMessage the write to g_message.filled might be visible before the write to g_message.data is visible. Similarly in GetMessage the reads in process message might come from L2 before the read of g_message.filled. Both types of reordering can happen on the Xbox 360 CPU, and on many compilers. Different hardware threads don't talk to each other directly; they talk to shared memory/shared L2. Thus, if you prevent reordering in SendMessage, that guarantees that the writes get to L2 in order. However, you still have to separately guarantee that reads come from L2 in order in GetMessage. Crucial observation: lockless programming can be fast, but it is still a type of synchronization, and is more expensive than no synchronization. It is particularly tricky prior to VS 2005, because of poorly defined guarantees from volatile. It is particularly tricky on Xbox 360, where volatile and InterlockedXxx semantics are slightly different and don't prevent CPU reordering of reads and writes, so you need explicit memory barriers.
  25. Requiring exclusive access to a popular resource can make multi-threading a complex way of doing single-threading on multiple threads. Ideally you want to use synchronization primitives to guarantee multiple threads won't modify resources simultaneously, while designing so that they generally won't anyway. Sometimes it is worth doing a short spin-lock on resources that are likely to be held for only a short time. InitializeCriticalSectionAndSpinCount supports this.
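The closing point of note 22, that the key optimization is to synchronize less often, can be illustrated by doing the work on private data and taking the lock once per batch. This sketch uses `std::mutex` as a portable stand-in for a CRITICAL_SECTION; `g_total`, `addBatched`, and `sumAcrossThreads` are hypothetical names.

```cpp
#include <mutex>
#include <thread>
#include <vector>

long long g_total = 0;     // shared result
std::mutex g_totalLock;    // guards g_total

// Accumulate privately, then take the lock once per batch rather than
// once per element.
void addBatched(const std::vector<int>& values) {
    long long local = 0;
    for (int v : values) local += v;               // no synchronization here
    std::lock_guard<std::mutex> guard(g_totalLock);
    g_total += local;                              // one short locked update
}

long long sumAcrossThreads(int threadCount, int itemsPerThread) {
    g_total = 0;
    const std::vector<int> items(itemsPerThread, 1);
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back([&items] { addBatched(items); });
    for (auto& th : threads) th.join();
    return g_total;
}
```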
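The ordering requirement in note 24 can be expressed in C++11 with release/acquire atomics, which postdate this talk and are swapped in here for the volatile-plus-barrier approach the note describes. The slide's original code is not reproduced; `Message`, `sendMessage`, `getMessage`, and `roundTrip` are hypothetical stand-ins.

```cpp
#include <atomic>
#include <thread>

struct Message {
    int data = 0;
    std::atomic<bool> filled{false};
};

Message g_message;

void sendMessage(int value) {
    g_message.data = value;
    // Release: the write to data becomes visible before filled reads true.
    g_message.filled.store(true, std::memory_order_release);
}

int getMessage() {
    // Acquire: the read of data cannot be moved ahead of this load.
    while (!g_message.filled.load(std::memory_order_acquire))
        std::this_thread::yield();
    const int value = g_message.data;
    g_message.filled.store(false, std::memory_order_release);  // slot is free again
    return value;
}

// Sends one value from the calling thread to a consumer thread.
int roundTrip(int value) {
    int received = 0;
    std::thread consumer([&received] { received = getMessage(); });
    sendMessage(value);
    consumer.join();
    return received;
}
```

Both halves are needed: the release store orders the writes in the sender, and the acquire load orders the reads in the receiver.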
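The short spin mentioned in note 25 can be approximated in portable C++ by trying the lock a bounded number of times before blocking. This is only an analogue of what InitializeCriticalSectionAndSpinCount provides, not its implementation; `SpinThenBlockLock`, the spin count of 4000, and `countWithSpinLock` are assumptions made for illustration.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Try the lock a bounded number of times before falling back to a blocking
// wait. Worthwhile only when the lock is usually held very briefly.
class SpinThenBlockLock {
public:
    explicit SpinThenBlockLock(int spinCount) : spinCount_(spinCount) {}
    void lock() {
        for (int i = 0; i < spinCount_; ++i)
            if (m_.try_lock()) return;  // acquired while spinning
        m_.lock();                      // give up spinning and block
    }
    void unlock() { m_.unlock(); }
private:
    std::mutex m_;
    int spinCount_;
};

int countWithSpinLock(int threadCount, int incrementsPerThread) {
    SpinThenBlockLock lock(4000);  // illustrative spin count, tune by measuring
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                std::lock_guard<SpinThenBlockLock> guard(lock);
                ++counter;
            }
        });
    for (auto& th : threads) th.join();
    return counter;
}
```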
  26. g_resources holds a list of pointers to all loaded resources. It is referenced frequently by the render thread as it needs meshes, textures, shaders, etc. The load thread needs to make resources available once they are loaded. If the decompression thread locks g_resources while it decompresses, or while it does file I/O, then the render thread may be locked out for long periods. If g_resources is shared at all, then every reference by the render thread requires synchronization, wasting time on acquiring and releasing locks. Best design is two (or more) vectors, to insulate threads from each other. Private data is good.
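The two-vector design recommended in note 26 can be sketched like this. The load thread takes the lock only for a brief append after decompression and I/O are done, and the render thread swaps the pending items into a list that it alone owns. `Resource`, `ResourceHandoff`, and `loadAndCollect` are hypothetical names, not the title's code.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct Resource { std::string name; };

class ResourceHandoff {
public:
    // Load thread: call only after decompression and file I/O have finished,
    // so the lock is held just long enough to append.
    void publish(Resource r) {
        std::lock_guard<std::mutex> guard(lock_);
        pending_.push_back(std::move(r));
    }
    // Render thread: call once per frame. The lock covers only the swap;
    // renderPrivate is owned by the render thread and needs no lock.
    void collect(std::vector<Resource>& renderPrivate) {
        std::vector<Resource> batch;
        { std::lock_guard<std::mutex> guard(lock_); batch.swap(pending_); }
        for (auto& r : batch) renderPrivate.push_back(std::move(r));
    }
private:
    std::mutex lock_;
    std::vector<Resource> pending_;
};

// Simulates a load thread publishing 'count' resources while the calling
// thread plays the render thread and collects them.
std::size_t loadAndCollect(std::size_t count) {
    ResourceHandoff handoff;
    std::thread loader([&] {
        for (std::size_t i = 0; i < count; ++i)
            handoff.publish(Resource{"res" + std::to_string(i)});
    });
    std::vector<Resource> renderResources;
    while (renderResources.size() < count) {
        handoff.collect(renderResources);
        std::this_thread::yield();
    }
    loader.join();
    return renderResources.size();
}
```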
  27. Anecdote about a profile capture completely hiding a critical issue (the code was waiting on the GPU, but only when not profiling; the same thing happened waiting on the load thread). I actually saw a title that had instrumented a ton of functions but then stored the results to a shared array, using critical sections to guard it. About 90% of their synchronization was in the profile functions. Synchronization stalls are hard to locate. Use Timing Capture on Xbox 360 to visualize threading behavior. Add instrumentation to make visualization easier.
  28. This wacky trick makes the name available for Visual Studio, WinDBG, etc, on Xbox 360 and on Windows. It also makes the name available to some other tools, like the PIX timing capture. The VS screenshot of a Windows app shows just two threads (named), and the VS screenshot of an Xbox 360 app shows... more.
  29. If your multi-threaded code is not tested on multi-proc systems, it will fail!
  30. Mention that this is complicated by the fact that some early releases of processor drivers have bugs where QPC/QPF relies on RDTSC and therefore exhibits the problem. The fix is to get the latest processor driver from the AMD website.
  31. ©2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.