
Speed Up Synchronization Locks: How and Why?


A brief introduction to synchronization primitives used on gaming consoles and Windows platforms, and to ways of identifying potential lock problems using Intel tools. The talk discusses an alternative, optimized implementation of the Windows Critical_Section, with Scaleform as a case study highlighting the importance of using optimized locks.

Published in: Technology, Business

  1. 1. Speed Up Synchronization Locks: A Scaleform Case Study Abhishek Agrawal Software Solutions Group
  2. 2. Legal Disclaimer <ul><li>INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. </li></ul><ul><li>Intel may make changes to specifications and product descriptions at any time, without notice. </li></ul><ul><li>All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. </li></ul><ul><li>Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. </li></ul><ul><li>Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. </li></ul><ul><li>Intel, Intel Inside, and the Intel logo are trademarks of Intel Corporation in the United States and other countries. </li></ul><ul><li>*Other names and brands may be claimed as the property of others. </li></ul><ul><li>Copyright © 2008 Intel Corporation. </li></ul>
  3. 3. Agenda <ul><li>Common Locking Issues </li></ul><ul><li>Windows* Locking Methodologies and associated performance </li></ul><ul><li>User Level Atomic Locks with Scaleform* case Study </li></ul><ul><li>Hot Locks and Lock Contention with Flight Simulator* Case Study </li></ul><ul><li>Locks in Intel TBB ® </li></ul><ul><li>Summary & Call to Action </li></ul>
  4. 4. Why Care About Locking? <ul><li>Locking code can be the most frequently run code in a multi-threaded application </li></ul><ul><li>Choosing the locking methodology can be as critical as identifying the parallelism within an application </li></ul><ul><li>Improper use of locking mechanisms can lead to stuttering, very high contention and new types of programming bugs </li></ul>Proper use of locks is crucial for multi-threaded applications
  5. 5. Common Lock Pathologies <ul><li>Locks can introduce performance and correctness problems </li></ul><ul><li>Some potential problems </li></ul><ul><ul><li>Deadlock </li></ul></ul><ul><ul><ul><li>Happens when tasks are trying to acquire more than one lock and each holds some of the locks the other tasks need in order to proceed </li></ul></ul></ul><ul><ul><li>Convoying </li></ul></ul><ul><ul><ul><li>Occurs when the operating system interrupts a task that is holding a lock, so every other task that needs the lock queues up behind it until the interrupted task runs again and releases the lock </li></ul></ul></ul><ul><ul><li>Priority Inversion </li></ul></ul><ul><ul><ul><li>Refers to the scenario where a lower-priority task holds a shared resource that is required by a higher-priority task </li></ul></ul></ul>
  6. 6. How to avoid Lock Pathologies <ul><li>Deadlocks </li></ul><ul><ul><li>Avoid needing to hold two locks at the same time </li></ul></ul><ul><ul><li>Always acquire locks in the same order (e.g. outer container and inner container mutexes) </li></ul></ul><ul><ul><li>Use atomic operations </li></ul></ul><ul><li>Convoying & Priority Inversion </li></ul><ul><ul><li>Use atomic operations instead of locks where possible </li></ul></ul>Use Atomic Operations and User-Level Locks
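The "always acquire locks in the same order" advice can be sketched in portable C++ as follows. This is a minimal illustration, not code from the talk: the `Account` type and `transfer` function are hypothetical, and ordering by mutex address is just one possible global order.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical shared object: a balance protected by its own mutex.
struct Account {
    std::mutex guard;
    long balance = 0;
};

// Moves `amount` from one account to another while holding both locks.
// Both mutexes are always taken in address order, so two threads running
// transfer(a, b) and transfer(b, a) can never each hold one lock while
// waiting for the other.
bool transfer(Account& from, Account& to, long amount) {
    assert(amount >= 0);
    if (&from == &to) return false;  // locking the same mutex twice is an error

    std::mutex* first = &from.guard;
    std::mutex* second = &to.guard;
    if (std::less<std::mutex*>{}(second, first)) std::swap(first, second);

    std::lock_guard<std::mutex> hold_first(*first);
    std::lock_guard<std::mutex> hold_second(*second);

    if (from.balance < amount) return false;
    from.balance -= amount;
    to.balance += amount;
    return true;
}
```

In C++17, `std::scoped_lock both(from.guard, to.guard);` is an alternative that avoids deadlock with a try-and-back-off algorithm rather than a fixed order.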
  7. 7. Agenda <ul><li>Common Locking Issues </li></ul><ul><li>Windows* Locking Methodologies and associated performance </li></ul><ul><li>User Level Atomic Locks with Scaleform* case Study </li></ul><ul><li>Hot Locks and Lock Contention with Flight Simulator* Case Study </li></ul><ul><li>Locks in Intel TBB ® </li></ul><ul><li>Summary & Call to Action </li></ul>
  8. 8. Windows* Locking Methodologies <ul><li>Interlocked Functions </li></ul><ul><ul><li>Located in kernel32.dll </li></ul></ul><ul><ul><li>Essentially just utilizing atomic instructions </li></ul></ul><ul><li>TryEnterCriticalSection (Non-Blocking) </li></ul><ul><ul><li>Attempts to get the lock in ring 3 and returns immediately either way, so the caller can retry N times without blocking </li></ul></ul><ul><li>EnterCriticalSection (Blocking) </li></ul><ul><ul><li>Attempts to get the lock one time in ring 3 and jumps into ring 0 only if that attempt fails </li></ul></ul><ul><li>WaitForSingleObject </li></ul><ul><ul><li>Jumps into ring 0 100% of the time whether the lock is acquired or not </li></ul></ul><ul><ul><li>Mutexes and Semaphore APIs follow the same path </li></ul></ul>
  9. 9. WaitForSingleObject Vs. EnterCriticalSection <ul><li>EnterCriticalSection </li></ul><ul><ul><li>Used by surrounding the critical section code with EnterCriticalSection and LeaveCriticalSection API calls </li></ul></ul><ul><ul><li>Advantage over WaitForSingleObject: it does not enter the kernel unless there is contention on the lock </li></ul></ul><ul><ul><li>Disadvantages: it is a blocking call, it cannot be used globally (across processes), and there is no guarantee on the order in which threads obtain the lock </li></ul></ul><ul><li>WaitForSingleObject </li></ul><ul><ul><li>An overloaded Microsoft API that can be used to check and modify the state of a number of different objects, such as events and jobs </li></ul></ul><ul><ul><li>Advantage: it can be used globally, which enables synchronization between processes </li></ul></ul><ul><ul><li>Major disadvantage: it always obtains a kernel lock, so it enters privileged mode (ring 0) whether the lock is acquired or not </li></ul></ul>
  10. 10. EnterCriticalSection Vs. WaitForSingleObject <ul><li>EnterCriticalSection is much faster with 1 thread (no contention) since it does not jump into the kernel when the lock is acquired </li></ul><ul><li>WaitForSingleObject and EnterCriticalSection have similar costs under high contention </li></ul>[Figure: Timings for the sample memory management kernel for 1 and 2 threads.] [Figure: Timings for the sample memory management kernel for 1 to 64 threads.] Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations (http://www.intel.com/performance/resources/limits.htm)
  11. 11. Where Is the Performance Hit? <ul><li>Windows locking APIs can jump into the operating system kernel </li></ul><ul><li>Both EnterCriticalSection and WaitForSingleObject will enter the kernel if there is contention on the lock. The transition from user mode to privileged mode is costly when it happens frequently </li></ul><ul><li>The performance impact is greatest for granular locking, where the lock is acquired and released within hundreds of cycles </li></ul>User Level Locks should be used for Granular Operations and in High Contention Scenarios
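One common way to limit kernel transitions for short critical sections is to attempt the lock a bounded number of times in user mode before blocking. The sketch below shows that idea in portable C++. `std::mutex` stands in for a Windows critical section, and the names `Acquired` and `lock_spin_then_block` are hypothetical, introduced only for this example.

```cpp
#include <cassert>
#include <mutex>

// Reports which path acquired the lock, so callers can measure how often
// the cheap path wins.
enum class Acquired { FastPath, Blocked };

// Tries the lock up to `max_spins` times without blocking. Only if every
// attempt fails does it fall back to a blocking lock(), which may put the
// thread to sleep in the operating system.
Acquired lock_spin_then_block(std::mutex& m, int max_spins) {
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; ++i) {
        if (m.try_lock()) return Acquired::FastPath;
    }
    m.lock();
    return Acquired::Blocked;
}
```

The caller still releases the mutex with `m.unlock()`. On Windows, `InitializeCriticalSectionAndSpinCount` gives a critical section comparable spin-before-wait behavior.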
  12. 12. Agenda <ul><li>Common Locking Issues </li></ul><ul><li>Windows* Locking Methodologies and associated performance </li></ul><ul><li>User Level Atomic Locks with Scaleform* case Study </li></ul><ul><li>Hot Locks and Lock Contention with Flight Simulator* Case Study </li></ul><ul><li>Locks in Intel TBB ® </li></ul><ul><li>Summary & Call to Action </li></ul>
  13. 13. User Level Atomic Locks <ul><li>Uses the atomic instructions of the processor to atomically update a memory location </li></ul><ul><li>An instruction is made atomic by giving it a lock prefix and a destination operand that is a memory address </li></ul><ul><li>Some of the instructions which can run atomically with a lock prefix on current Intel processors are: ADD, ADC, AND, BTC, BTR, CMPXCHG, DEC, INC, SUB, XOR, XADD, XCHG etc </li></ul>
  14. 14. A Sample User Level Atomic Lock <ul><li>Figure shows the assembly of a simple mutex lock that uses an atomic instruction with a lock prefix to obtain the lock </li></ul>Is it necessary to write assembly to take advantage of user-land locks that use the lock prefix?
  15. 15. Windows Interlocked Functions <ul><li>Windows provides access to the most frequently used atomic instructions for synchronization through the “interlocked” APIs, such as InterlockedExchange, InterlockedIncrement, InterlockedDecrement, InterlockedCompareExchange and InterlockedExchangeAdd </li></ul><ul><li>The APIs reside in kernel32.dll </li></ul><ul><li>The interlocked functions never jump into the Windows kernel </li></ul>
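A user-level lock built on an atomic exchange needs no hand-written assembly. The sketch below uses `std::atomic` as a portable stand-in for the interlocked APIs; the `SpinLock` class is hypothetical and is not the lock shown in the slide's figure.

```cpp
#include <atomic>
#include <cassert>

// Minimal test-and-set spin lock that never enters the kernel.
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() {
        // Plain read first, so contended attempts mostly avoid the more
        // expensive atomic write; then one atomic exchange (typically an
        // XCHG instruction on x86).
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    // Spins in user mode until the lock is acquired.
    void lock() {
        while (!try_lock()) {
            // busy-wait
        }
    }

    // Releases the lock; must only be called by the current holder.
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));
        locked_.store(false, std::memory_order_release);
    }
};
```

Because it busy-waits, a lock like this is only appropriate for very short critical sections; a waiter burns CPU time for as long as the holder keeps the lock.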
  16. 16. Atomic Lock (Performance Comparison) <ul><li>The figure compares the cost of user-level atomic lock vs. WaitForSingleObject </li></ul><ul><li>Under both high and low contention, the user-level atomic lock is several orders of magnitude cheaper. For this reason, a user-level lock is preferable for frequently called granular locking </li></ul>[Figure: Cost of user-level atomic lock vs. WaitForSingleObject for the memory management locking kernel example]
  17. 17. Scaleform* <ul><li>Scaleform GFx: The #1 Video Game UI Solution </li></ul><ul><li>GFx is a rich media player that supports Flash </li></ul><ul><li>Licensed for Crysis, Mass Effect, and 150+ games </li></ul><ul><li>Available on all leading PC and Console platforms </li></ul><ul><li>Used for Menus, HUDs, and Animated Textures </li></ul><ul><li>Recently introduced Thread Support into the GFx for Simultaneous Playback, Optimized Loading, ActionScript Processing and other tasks </li></ul>
  18. 18. Why Is Threaded UI Important ?? <ul><li>The Future of Animated Flash and Video Textures! </li></ul>
  19. 19. Scaleform* Case Study Summary <ul><li>Background loading, vector tessellation, Flash playback and ActionScript execution may require many allocations, which reduce performance. </li></ul><ul><li>Solution: an innovative allocator that uses about 35 cycles per allocate/free request. That optimization is meaningless, however, if the allocator has to be synchronized with a critical section. </li></ul><ul><li>In allocation-heavy examples, the system lock can reduce performance by 10-30%. </li></ul><ul><li>GLock gives about 50% locking performance improvement. </li></ul><ul><li>Based on “Fast Critical Sections” post by Vladislav Gelfer on Code Project. </li></ul>
  20. 20. Using Fast Locks in Scaleform*
      volatile DWORD LockedThreadId = 0;

      void GLock::Lock() {
          DWORD threadId = GetCurrentThreadId();
          if (threadId != LockedThreadId) {
              if ( (LockedThreadId == 0) &&
                   (InterlockedCompareExchange((long*)&LockedThreadId, threadId, 0) == 0 ) ) {
                  // Single instruction atomic quick-lock was successful.
              } else {
                  // Potentially locked elsewhere, so do a more expensive
                  // lock with system wait on semaphore.
                  PerfLock(threadId);
              }
          }
          RecursiveLockCount++;
      }

      void GLock::Unlock() {
          if (--RecursiveLockCount == 0) {
              // Release lock does not need atomic op on Intel Architecture!
              LockedThreadId = 0;
              // Release other system semaphore waiters, if any.
          }
      }
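The same owner-tracking idea can be sketched in portable C++ for readers who want something that compiles outside Windows. This is an illustration under stated assumptions, not Scaleform's implementation: the `RecursiveFastLock` class is hypothetical, `std::atomic` replaces `InterlockedCompareExchange`, and the slow path simply yields instead of waiting on a system semaphore.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Recursive lock whose uncontended path is a single compare-and-swap.
class RecursiveFastLock {
    std::atomic<std::uintptr_t> owner_{0};  // 0 means unlocked
    unsigned depth_ = 0;                    // touched only by the current owner

    // The address of a thread_local object is nonzero and unique among
    // live threads, so it serves as a cheap thread identifier.
    static std::uintptr_t self() {
        static thread_local char tag;
        return reinterpret_cast<std::uintptr_t>(&tag);
    }

public:
    void lock() {
        const std::uintptr_t me = self();
        if (owner_.load(std::memory_order_relaxed) != me) {
            std::uintptr_t expected = 0;
            // Fast path: one atomic compare-and-swap when the lock is free.
            while (!owner_.compare_exchange_weak(expected, me,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed)) {
                // Slow path (assumption): yield and retry. The slide's
                // version waits on a system semaphore here instead.
                expected = 0;
                std::this_thread::yield();
            }
        }
        ++depth_;  // re-entry by the owner only bumps the counter
    }

    void unlock() {
        assert(owner_.load(std::memory_order_relaxed) == self() && depth_ > 0);
        if (--depth_ == 0) {
            // A release store is enough to publish the protected data.
            owner_.store(0, std::memory_order_release);
        }
    }

    unsigned depth() const { return depth_; }
    bool locked() const { return owner_.load(std::memory_order_relaxed) != 0; }
};
```

The explicit release store plays the role of the plain `LockedThreadId = 0` write in the slide, while staying correct on architectures with weaker memory ordering than Intel Architecture.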
  21. 21. Scaleform GFx* Multi-threaded Demo <ul><li>Playback multiple files at once on separate threads </li></ul><ul><li>ActionScript intensive Flash file </li></ul>
  22. 22. Agenda <ul><li>Common Locking Issues </li></ul><ul><li>Windows Locking Methodologies and associated performance </li></ul><ul><li>User Level Atomic Locks with Scaleform* case Study </li></ul><ul><li>Hot Locks and Lock Contention with Flight Simulator* Case Study </li></ul><ul><li>Locks in Intel TBB ® </li></ul><ul><li>Summary & Call to Action </li></ul>
  23. 23. Finding Lock Contention Using Intel Tools <ul><li>Lock Contention is another major issue which limits Scalability and adds Complexity </li></ul><ul><li>Intel Tools can help in finding high contention scenarios </li></ul><ul><ul><li>VTune™ </li></ul></ul><ul><ul><ul><li>Collecting clock ticks event via event based sampling using the Intel VTune Analyzer can be useful to help determine how much contention is occurring </li></ul></ul></ul><ul><ul><li>Thread Profiler™ </li></ul></ul><ul><ul><ul><li>Provides an API for users to instrument user synchronization </li></ul></ul></ul><ul><ul><ul><li>Spin waits appear as a hashed color in the Thread Profiler GUI </li></ul></ul></ul>Please refer to Intel Session on “Comparative Analysis of Game Parallelization” for more details on Thread Profiler
  24. 24. Contention using VTune™ (Where to Look) <ul><li>EnterCriticalSection </li></ul><ul><ul><li>Ring 0 ntoskrnl.exe becomes hotter </li></ul></ul><ul><ul><li>Under very high contention, ring 0 becomes hot and the number of context switches becomes very high </li></ul></ul><ul><li>TryEnterCriticalSection </li></ul><ul><ul><li>ntdll.dll will become hotter as you add threads </li></ul></ul><ul><li>WaitForSingleObject </li></ul><ul><ul><li>Similar behavior to EnterCriticalSection </li></ul></ul><ul><li>Interlocked Functions </li></ul><ul><ul><li>kernel32.dll will get hot </li></ul></ul>
  25. 25. Contention in WaitForSingleObject using VTune™ <ul><li>Example shows the hot functions within the Windows OS kernel, ntdll.dll, and hal.dll under no contention and high contention for WaitForSingleObject call </li></ul>
  26. 26. Possible Ways to Reduce Lock Contention <ul><li>Lock Striping. </li></ul><ul><ul><li>Does your whole array really need to be protected by the same lock or can you give each element its own lock? </li></ul></ul><ul><li>Protect data, not code. </li></ul><ul><ul><li>A common technique is to put a lock around a whole function call. Remember that only the data needs to be protected, not the code. </li></ul></ul><ul><li>Use Reader-Writer Locks where applicable. </li></ul><ul><ul><li>For cases where many threads read a memory location that is rarely changed. </li></ul></ul><ul><ul><li>Ensures that multiple readers can enter the lock at the same time. </li></ul></ul>
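The first suggestion, several locks for one array, can be sketched as follows. The `StripedCounters` class and its stripe count are hypothetical choices for illustration; a middle ground between one lock per array and one lock per element is assumed here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>

// Fixed-size table of counters guarded by kStripes locks instead of one.
// Elements whose indices differ modulo kStripes never contend.
template <std::size_t N, std::size_t kStripes = 8>
class StripedCounters {
    std::array<long, N> values_{};
    std::array<std::mutex, kStripes> locks_;

public:
    void add(std::size_t index, long delta) {
        assert(index < N);
        // Lock only the stripe that owns this element.
        std::lock_guard<std::mutex> hold(locks_[index % kStripes]);
        values_[index] += delta;
    }

    long get(std::size_t index) {
        assert(index < N);
        std::lock_guard<std::mutex> hold(locks_[index % kStripes]);
        return values_[index];
    }
};
```

More stripes mean less contention but more memory for locks; with `kStripes` equal to `N`, each element has its own lock, as the slide asks.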
  27. 27. Microsoft Flight Simulator* Case Study <ul><li>Multi-Threading Goal </li></ul><ul><ul><li>Separate terrain processing from rendering </li></ul></ul><ul><ul><ul><li>The game loads once at the beginning </li></ul></ul></ul><ul><ul><ul><li>The engine keeps loading content in the background while playing </li></ul></ul></ul><ul><ul><ul><li>Main thread runs D3D, physics, etc. </li></ul></ul></ul><ul><ul><ul><li>All other threads load and pre-process the terrain textures and other content </li></ul></ul></ul><ul><ul><li>Loading and processing textures without slowing down frame-rate </li></ul></ul><ul><ul><ul><li>Expected to scale by processing more content as more processors become available </li></ul></ul></ul>
  28. 28. Locking Problem <ul><li>Symptoms and Thread Profiling </li></ul><ul><ul><li>Occasional Stuttering </li></ul></ul><ul><ul><li>Doesn’t scale well from 2 to 4 cores because of very high contention </li></ul></ul>[Figure: Main Thread and BKG Thread]
  29. 29. Locking Root-Cause <ul><li>Both cases lead to global hash map access. </li></ul><ul><ul><li>Only 1 thread can access the hash map while all other threads are blocked </li></ul></ul><ul><ul><li>Entire hash map was protected by a critical section (probably the worst choice) </li></ul></ul><ul><li>Solution </li></ul><ul><ul><li>Protect each bucket in the hash map instead of the whole hash map. </li></ul></ul><ul><ul><ul><li>As long as multiple threads are accessing different buckets, they are safe and don’t block each other </li></ul></ul></ul><ul><ul><li>Use of Lock Free Library </li></ul></ul><ul><ul><ul><li>Microsoft* internal tools </li></ul></ul></ul><ul><ul><ul><li>The concept is to have a single thread to write, but multiple threads can read at the same time as long as it is not being written. </li></ul></ul></ul><ul><ul><ul><li>TBB provides similar locking mechanism </li></ul></ul></ul>
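The per-bucket locking described in the solution can be sketched like this. It is a minimal illustration, not the Flight Simulator code: the `BucketLockedMap` class, its key and value types, and the bucket count are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Hash map split into independently locked buckets. Threads that touch
// different buckets never block each other.
class BucketLockedMap {
    static constexpr std::size_t kBuckets = 16;

    struct Bucket {
        std::mutex guard;
        std::unordered_map<std::string, int> items;
    };
    std::array<Bucket, kBuckets> buckets_;

    // Picks the bucket (and therefore the lock) that owns this key.
    Bucket& bucket_for(const std::string& key) {
        return buckets_[std::hash<std::string>{}(key) % kBuckets];
    }

public:
    void put(const std::string& key, int value) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> hold(b.guard);
        b.items[key] = value;
        assert(b.items.count(key) == 1);  // debug-only postcondition
    }

    std::optional<int> get(const std::string& key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> hold(b.guard);
        auto it = b.items.find(key);
        if (it == b.items.end()) return std::nullopt;
        return it->second;
    }
};
```

Operations that need a consistent view of the whole map, such as resizing or iterating, would have to take every bucket lock, which is one reason a fixed bucket count is assumed here.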
  30. 30. Flight Simulator* Result <ul><li>Reduced stuttering, lower latency in terrain loading, and </li></ul><ul><li>better visuals without sacrificing frame rates </li></ul>
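  31. 31. Synchronization Primitives in Intel TBB ® <ul><li>Atomic Operations </li></ul><ul><ul><ul><li>High-level abstraction for atomic instructions. </li></ul></ul></ul><ul><ul><ul><ul><li>OS/Compiler Portable </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Supports Processors like (Itanium) which have weak memory consistency </li></ul></ul></ul></ul><ul><li>Exception-safe Locks </li></ul><table><tr><th></th><th>Scalable</th><th>Fair</th><th>Reentrant</th><th>Sleeps</th></tr><tr><td>mutex</td><td>OS dependent</td><td>OS dependent</td><td>No</td><td>Yes</td></tr><tr><td>spin_mutex</td><td>No</td><td>No</td><td>No</td><td>No</td></tr><tr><td>queuing_mutex</td><td>Yes</td><td>Yes</td><td>No</td><td>No</td></tr><tr><td>spin_rw_mutex</td><td>No</td><td>No</td><td>No</td><td>No</td></tr><tr><td>queuing_rw_mutex</td><td>Yes</td><td>Yes</td><td>No</td><td>No</td></tr></table>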
  32. 32. Example TBB ® Reader-Writer Lock <ul><li>If an exception occurs within the protected code block, the destructor automatically releases the lock if it is held, avoiding a deadlock </li></ul><ul><li>Any reader lock may be upgraded to a writer lock; the return value of upgrade_to_writer indicates whether the lock had to be released before it could upgrade </li></ul>
      #include "tbb/spin_rw_mutex.h"
      using namespace tbb;

      spin_rw_mutex MyMutex;

      int foo() {
          /* Construction of 'lock' acquires 'MyMutex' as a reader */
          spin_rw_mutex::scoped_lock lock(MyMutex, /*is_writer*/ false);
          …
          if (!lock.upgrade_to_writer()) {
              /* lock was released and reacquired: data may have been modified since the last read */
          } else {
              /* data was not modified by another thread */
          }
          return 0; /* Destructor of 'lock' releases 'MyMutex' */
      }
  33. 33. General Recommendations for TBB ® Locks <ul><li>spin_mutex is VERY FAST in lightly contended situations; use it if you need to protect very few instructions </li></ul><ul><li>Use queuing_rw_mutex when scalability and fairness are important </li></ul><ul><li>Use reader-writer mutex to allow non-blocking read for multiple threads </li></ul>Please refer to Intel Session on “Comparative Analysis of Game Parallelization” for more details on TBB
  34. 34. Summary & Call to Action <ul><li>An inefficient synchronization strategy can have a big impact on the performance of your multi-threaded application: if it doesn’t hit you today, it surely will tomorrow. </li></ul><ul><li>Try using User-Level Atomic Locks instead of very expensive Kernel-Locks. </li></ul><ul><li>Use Intel Tools (VTune™ and Thread Profiler™) to help identify potential lock problems. </li></ul><ul><li>Use locks properly to avoid high contention scenarios and make your code more scalable. </li></ul>
  35. 35. Contact Info <ul><li>For more info, see our Graphics, Game Development and Threading resources at: http://softwarecommunity.intel.com/ </li></ul><ul><li>Feel free to contact me directly: abhishek.r.agrawal@intel.com </li></ul>
