Paris Game/AI Conference 2011
Slides from the Paris Game/AI Conference 2011 talk by Neil Henning - covering the


    • Preparing AI for Parallelism
      Lessons from NASCAR The Game 2011
      Neil Henning – Technology Lead
      Neil Henning
      neil@codeplay.com
      Paris Game AI Conference 2011
    • Introduction
    • Introduction
      I am sure some of you are wondering...
      ...why a guy from Codeplay is doing a talk about NASCAR The Game 2011, which was developed by Eutechnyx
    • Introduction
      • A team from Codeplay worked on the game for 15 months
    • Introduction
      • NASCAR isn’t just about driving straight, then turning left
      • 43 cars on screen at the same time
      • Cars race in tight packs on the circuit
      • Overtaking is all about navigating through these packs
      • Cannot simply make the AI use LODs: the cars are nearly always in view
    • Agenda
    • Agenda
      How to prepare AI for parallelism
      • …by investigating NASCAR the Game 2011's AI
    • Agenda
      • During the investigation I will answer the questions:
      • Why prepare your AI for parallelism?
      • What changes should be made?
      • What common issues are there?
      • How did these changes help when optimizing NASCAR?
      • How did we make use of the PS3's unique hardware?
      • What performance improvement was achieved?
    • Why prepare your AI for parallelism?
    • Why prepare your AI for parallelism?
      • Without parallelism, there are tighter limits on the number of bots
      • Say we have four bots
      • In serial, they can easily fit in a frame
    • Why prepare your AI for parallelism?
      • Want to increase the number of bots by 3x?
      • Have to either optimize or parallelize (or both)
    • Why prepare your AI for parallelism?
      • Split the work between threads
      • Only possible with parallelism
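The frame-budget argument above can be made concrete with a little arithmetic. This is a simplified model, not a measurement from NASCAR. It assumes that every bot costs the same fixed time per update, that work splits evenly between threads, and that synchronisation is free. The function names and the numbers in the usage note are hypothetical.

```cpp
#include <cassert>

// Simplified model (assumption): every bot costs `msPerBot` per update,
// bots are shared evenly between threads and synchronisation is free.
// Returns the wall-clock time (ms) for one AI update of all bots.
double AiFrameTimeMs(int numBots, double msPerBot, int numThreads)
{
    assert(numBots >= 0 && numThreads > 0);
    // The busiest thread gets ceil(numBots / numThreads) bots and
    // determines when the whole AI update finishes.
    int botsOnBusiestThread = (numBots + numThreads - 1) / numThreads;
    return botsOnBusiestThread * msPerBot;
}

// True if the AI update fits inside the slice of the frame given to the AI.
bool FitsInBudget(int numBots, double msPerBot, int numThreads, double budgetMs)
{
    return AiFrameTimeMs(numBots, msPerBot, numThreads) <= budgetMs;
}
```

Take a hypothetical 1 ms per bot and a 5 ms AI budget. Four bots in serial take 4 ms and fit. Tripling to twelve bots takes 12 ms in serial, but only 4 ms when split over three threads.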
    • Why prepare your AI for parallelism?
      • Multicore is the future (has been for some time)
      • This generation of consoles is multicore
      • Sony's new PS Vita is quad core
      • Even the iPad uses dual-core processors now!
      • Being able to split work amongst cores is key
      • Might not be required yet, but could be essential later
    • Why prepare your AI for parallelism?
      • Helps during crunch time
      • Optimization being sought throughout engine
      • Either optimize engine or cut features
      • Have AI prepared to become parallel
      • Optimization folks will love you!
    • What changes should be made?
    • What changes should be made?
      • Split work into manageable chunks
      • In NASCAR, each car had 18 components
      • For example: Driving Controllers, Stay Behind, Obstacle Detection, Stay Beside
    • What changes should be made?
      • Components are in groups
      • All components in a group can be run in parallel
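One way to express "groups run in order, components within a group run in parallel" is sketched below. It uses plain `std::thread` and joins at the end of each group, and that join is the synchronisation point between groups. `Component`, `ComponentGroup` and `RunGroups` are hypothetical names. A real engine would more likely use a job system than spawn threads every frame.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical: a component is reduced to its update callback.
using Component = std::function<void()>;
using ComponentGroup = std::vector<Component>;

// Runs the groups one after another. Components inside a group must not
// depend on each other, so each gets its own thread; joining all of them
// before the next group starts means a group sees every result of the
// previous one.
void RunGroups(const std::vector<ComponentGroup>& groups)
{
    for (const ComponentGroup& group : groups)
    {
        std::vector<std::thread> workers;
        workers.reserve(group.size());
        for (const Component& component : group)
            workers.emplace_back(component);
        for (std::thread& worker : workers)
            worker.join();
    }
}
```

The components in a group must write to disjoint data, for example separate fields of the car. Otherwise running them in parallel would introduce data races.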
    • What changes should be made?
      • 43 cars = 43 AIs
      • Each car’s groups can be run in parallel too
      [Diagram: cars 0, 1, 2 … 42]
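Within a group, each car's AI only touches its own state, so the cars can be split into contiguous chunks, one chunk per worker. The helper below is a minimal sketch of that split. `ChunkFor` and the worker counts used in the usage note are illustrative assumptions, not code from the game.

```cpp
#include <cassert>
#include <utility>

// Returns the half-open range [first, last) of cars handled by `worker`
// when `numCars` cars are divided as evenly as possible between
// `numWorkers` workers. The first (numCars % numWorkers) workers take
// one extra car each.
std::pair<int, int> ChunkFor(int worker, int numCars, int numWorkers)
{
    assert(numWorkers > 0 && worker >= 0 && worker < numWorkers);
    int base = numCars / numWorkers;
    int extra = numCars % numWorkers;
    int first = worker * base + (worker < extra ? worker : extra);
    int last = first + base + (worker < extra ? 1 : 0);
    return {first, last};
}
```

With 43 cars and 6 workers, worker 0 takes cars 0 to 7 and each other worker takes 7 cars, so all 43 are covered exactly once.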
    • What changes should be made?
      • Read/Write phases
      • Two phases for your AI
      • Read phase can read world/other car state
      • Write phase can modify own car state
    • What changes should be made?
      • Use temporary data to store read values from environment
      • In read phase, store needed reads into temporary data
      • In write phase, read from the temporary data
      • AI is one frame behind world events
      • Effect on AI is minimal
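Below is a minimal sketch of the read/write split with temporary data. It assumes a toy car that only has a track position and a speed. In the read phase, each car copies what it needs from the other cars (here, the gap to the car ahead) into its own temporary data. In the write phase, it updates only its own state from that copy. `Car`, `AiTemp`, `ReadPhase`, `WritePhase` and the back-off rule are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Car
{
    float position; // distance along the track
    float speed;
};

// Per-car temporary data, filled in the read phase.
struct AiTemp
{
    float gapToCarAhead;
};

// Read phase: may read any car, but writes only this car's temp data.
void ReadPhase(const std::vector<Car>& cars, std::size_t self, AiTemp& temp)
{
    float nearest = 1e9f; // stands for "no car ahead"
    for (std::size_t i = 0; i < cars.size(); ++i)
    {
        float gap = cars[i].position - cars[self].position;
        if (i != self && gap > 0.0f && gap < nearest)
            nearest = gap;
    }
    temp.gapToCarAhead = nearest;
}

// Write phase: reads only its own temp data, writes only its own car.
void WritePhase(Car& car, const AiTemp& temp)
{
    const float safeGap = 10.0f; // hypothetical tuning value
    if (temp.gapToCarAhead < safeGap)
        car.speed *= 0.9f;     // back off
    car.position += car.speed; // advance one (unit) time step
}
```

During a phase, no car writes anything another car reads. Both loops can therefore be split across threads without locks, as long as every read finishes before any write starts.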
    • What changes should be made?
      • In NASCAR, a read/write phase was used
      • The write phase uses data from the previous frame's read phase
      [Diagram: Write Phase and Read Phase]
      • Minimal set of components in read/write phase group
      • Only components that required world/other car state
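The NASCAR variant, where the write phase consumes the previous frame's read results, can be sketched with two buffers of temporary data that swap roles each frame. This double-buffering scheme is an assumption made for illustration. While frame N's read phase fills one buffer, the write phase applies the other one, filled in frame N-1. The cost is the one-frame lag described earlier. `Observation` and `ObservationBuffers` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-car observation gathered in a read phase.
struct Observation
{
    float gapToCarAhead;
};

// Two sets of temporary data: one written by this frame's read phase,
// one (from last frame) consumed by this frame's write phase.
class ObservationBuffers
{
public:
    explicit ObservationBuffers(std::size_t numCars)
    {
        for (std::vector<Observation>& buffer : buffers_)
            buffer.assign(numCars, Observation{0.0f});
    }

    // Filled by the read phase during the current frame.
    std::vector<Observation>& Current() { return buffers_[current_]; }

    // Results of the previous frame's read phase, used by the write phase.
    const std::vector<Observation>& Previous() const { return buffers_[1 - current_]; }

    // Called once per frame, after both phases have finished.
    void Swap() { current_ = 1 - current_; }

private:
    std::array<std::vector<Observation>, 2> buffers_;
    int current_ = 0;
};
```

The read and write phases touch different buffers, so within a frame they could even run at the same time.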
    • What changes should be made?
      • Remove large stack locals
      • Every thread has its own stack, so two or more threads means lots of duplicate locals
      void func()
      {
          char localBuffer[1024];
          // … do something with localBuffer
      }
      • If func is called from many threads, its data use is multiplied many times over!
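One common alternative is to move the large buffer out of the function. Each worker thread then owns a single scratch area for the whole frame and passes it in, which makes the memory cost explicit and lets thread stacks stay small. The sketch assumes a fixed number of workers known up front. `Scratch`, the new `func` signature and the copy it performs are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One scratch area per worker thread, allocated once (e.g. at startup)
// instead of 1 KB of stack per call on every thread.
struct Scratch
{
    std::array<char, 1024> buffer;
};

// Borrows the caller's scratch rather than declaring a local buffer.
// Only the worker that owns `scratch` may pass it in, so no two threads
// ever share it. Returns how many bytes were placed in the buffer.
std::size_t func(Scratch& scratch, const char* input, std::size_t length)
{
    std::size_t count = length < scratch.buffer.size() ? length : scratch.buffer.size();
    for (std::size_t i = 0; i < count; ++i)
        scratch.buffer[i] = input[i]; // ... do something with the buffer
    return count;
}
```

A thread pool could hold, for example, a `std::vector<Scratch>` with one entry per worker and pass each worker's entry to `func`.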
    • What changes should be made?
      • Document code – describe relationship between data
      struct Foo
      {
          Bar* bar;
      };
      many : one?
      one : one?
      one : many?
      • Knowing how data is shared critical for threading
      • Documenting the relationship saves time and effort later
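Alongside comments, C++ types can record the relationship directly. The sketch below shows one possible convention, assumed here rather than taken from NASCAR:

- an owning `std::unique_ptr` for one : one;
- a non-owning pointer to const for many : one;
- a container of non-owning pointers for one : many.

Each field carries a comment on who may write through it.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Bar { int value = 0; };

struct Foo
{
    // one : one. Foo owns its Bar and nothing else points at it,
    // so a thread updating this Foo may write to it freely.
    std::unique_ptr<Bar> ownedBar;

    // many : one. Several Foos share this Bar; it is read-only here,
    // so concurrent reads are safe but nobody may write through it.
    const Bar* sharedBar = nullptr;

    // one : many. This Foo refers to several Bars owned elsewhere.
    std::vector<Bar*> bars;
};
```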
    • What common issues are there?
    • What common issues are there?
      • Virtual functions – can have a high runtime cost
      • ~500-1200 cycles on PowerPC if virtual lookup misses cache
      • Can equate to a large amount of time doing no work
    • What common issues are there?
      • In NASCAR, components had virtual update method
      • Based on previous game (Supercar Challenge)
      • 16 cars in previous, now 43 cars
      • 5 component types in previous, now 18 component types
      • Now read/write phase too
      • 80 virtual update calls (16 cars × 5) became 1333 (43 cars × 31)!
    • What common issues are there?
      • In NASCAR, components had virtual update method
      • In real terms, 3ms of virtual function lookup per frame
      • First optimization was to have typed buckets of components
      • 1333 virtual calls went to 31 virtual calls
      • Platform agnostic (PS3, 360 and Wii all sped up)
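The typed-bucket idea can be sketched as follows. Components of the same type live together in one container, and the only virtual call is one per bucket. After that, every per-component update is a direct call the compiler can inline. If each car makes 31 update calls, one call per bucket turns 43 × 31 = 1333 virtual calls into 31, matching the slide's figures. `IBucket`, `ComponentBucket` and the two component types are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// One virtual interface per *bucket*, not per component.
struct IBucket
{
    virtual ~IBucket() = default;
    virtual void UpdateAll(float dt) = 0;
};

// All components of type T stored contiguously; the loop calls
// T::Update directly, so there is no per-component virtual dispatch.
template <typename T>
struct ComponentBucket : IBucket
{
    std::vector<T> components;
    void UpdateAll(float dt) override
    {
        for (T& component : components)
            component.Update(dt);
    }
};

// Two illustrative component types (not the game's real ones).
struct StayBehind { float timer = 0.0f; void Update(float dt) { timer += dt; } };
struct StayBeside { int ticks = 0; void Update(float) { ++ticks; } };
```

The engine keeps a `std::vector<std::unique_ptr<IBucket>>` and calls `UpdateAll` once per bucket. Contiguous storage also keeps same-type component data together in memory.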
    • What common issues are there?
      • Virtual functions not just a code abstraction
      struct Foo { virtual void func(); };
      struct Bar : public Foo { virtual void func(); };
      Foo * foo;
      foo->func();
      // don’t know size of foo! Could be sizeof(Foo) or sizeof(Bar)
      • Virtual functions hide data too
      • Not knowing the size of data kills SPU/Compute development
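The size problem is easy to show. In the sketch below, where the types are assumed for illustration, `Bar` adds data to `Foo`. Code that only holds a `Foo*` therefore cannot know how many bytes to copy into SPU local store or a compute buffer. A common alternative, shown as the hypothetical `FooData`, is a plain struct of fixed size with an explicit type tag in place of the vtable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Foo { virtual ~Foo() = default; virtual int func() { return 1; } int a = 0; };
struct Bar : public Foo { int func() override { return 2; } float extra[8] = {}; };

// Fixed-size, trivially copyable alternative: behaviour is selected by a
// tag, so every element has the same known size and can be copied as raw
// bytes (memcpy here, DMA on real hardware).
enum class FooKind : std::uint8_t { Base, Derived };
struct FooData
{
    FooKind kind;
    int a;
    float extra[8]; // unused when kind == FooKind::Base
};
static_assert(std::is_trivially_copyable<FooData>::value,
              "FooData must be copyable as raw bytes");

int Dispatch(const FooData& d) { return d.kind == FooKind::Derived ? 2 : 1; }
```

`sizeof(*foo)` on a `Foo*` is always `sizeof(Foo)`, even when the object is really a `Bar`. Every `FooData`, by contrast, is exactly `sizeof(FooData)` bytes.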
    • What common issues are there?
      • Naïve multithreading – locks galore
      • Avoid/reduce/remove locks if possible
      • Locks can be a solution, but be very careful how you use them
      void func()
      {
          lock->lock();
          // … do something
          lock->unlock();
      }
      • Read/write phases allow removal of most (if not all) locks
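The difference can be shown with a toy counter. The locked version makes every thread queue on one mutex. The read/write style instead gives each thread its own output slot and combines the slots after the threads have joined, so no lock is needed. The function names are hypothetical and the example only illustrates the pattern.

```cpp
#include <cassert>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Naive: every increment takes the same lock, so threads queue up.
long CountWithLock(int numThreads, int perThread)
{
    long total = 0;
    std::mutex m;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < perThread; ++i)
            {
                std::lock_guard<std::mutex> guard(m);
                ++total;
            }
        });
    for (std::thread& th : threads) th.join();
    return total;
}

// No lock needed: each thread writes only its own slot, and the slots
// are combined after join(), when no other thread is running.
long CountWithSlots(int numThreads, int perThread)
{
    std::vector<long> slots(numThreads, 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back([&slots, t, perThread] {
            for (int i = 0; i < perThread; ++i)
                ++slots[t];
        });
    for (std::thread& th : threads) th.join();
    return std::accumulate(slots.begin(), slots.end(), 0L);
}
```

Adjacent slots can share a cache line ("false sharing"). Performance-sensitive code often pads each slot to a full cache line, but the results are correct either way.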
    • What common issues are there?
      • Physics subsystem caused issues with NASCAR
      • AI required knowledge of obstacles
      • The physics system's raycast was used to find problematic obstacles
      • Each raycast call took a mutex, so every thread would halt!
      • Had to refactor code to remove need for locking
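The slides do not show how the raycast code was refactored. One common shape for such a refactor, given here purely as an assumed example, has two parts. First, build an immutable snapshot of the obstacles once per frame, on one thread, before the AI starts. Second, make the query a `const` function over that snapshot, so concurrent calls need no mutex. `ObstacleSnapshot`, `Raycast` and the one-dimensional "ray" along a lane are all simplifications.

```cpp
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

struct Obstacle
{
    float position; // distance along the track
    int lane;
};

// Built once per frame before the AI runs and never modified while AI
// threads query it, so queries need no lock.
class ObstacleSnapshot
{
public:
    explicit ObstacleSnapshot(std::vector<Obstacle> obstacles)
        : obstacles_(std::move(obstacles)) {}

    // Simplified "raycast": the nearest obstacle ahead of `from` in `lane`
    // within `maxDistance`. Returns its index, or -1 if there is none.
    int Raycast(float from, int lane, float maxDistance) const
    {
        int hit = -1;
        float best = maxDistance;
        for (int i = 0; i < static_cast<int>(obstacles_.size()); ++i)
        {
            float d = obstacles_[i].position - from;
            if (obstacles_[i].lane == lane && d >= 0.0f && d <= best)
            {
                best = d;
                hit = i;
            }
        }
        return hit;
    }

private:
    const std::vector<Obstacle> obstacles_;
};
```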
    • What common issues are there?
      • Know your data – how is it accessed? Where is it shared?
      struct RaceCar { Brain * brain; };
      struct Brain { RaceCar * raceCar; Obstacle ** obstacles; };
      struct Obstacle { BrainInterface * interface; };
      struct BrainInterface { RaceCar * raceCar; Brain * brain; };
      • Very easy for systems grown over time to have convoluted struct layouts
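One way to untangle such layouts keeps each kind of data in its own flat array and refers between arrays by index. This is offered as an assumed illustration, not as what NASCAR did. The relationships become explicit and every record has a fixed, known size, which matters when data must be copied to the SPU. The names echo the slide, but the fields and `AiWorld` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Index = std::uint16_t; // 43 cars and their obstacles fit easily

struct BrainData    { float targetSpeed; };
struct RaceCarData  { float position; Index brain; };    // one : one with BrainData
struct ObstacleData { float position; Index ownerCar; }; // refers to the car it represents

// All AI data in flat arrays; no struct points back at its owner.
struct AiWorld
{
    std::vector<RaceCarData>  cars;
    std::vector<BrainData>    brains;
    std::vector<ObstacleData> obstacles;

    const BrainData& BrainOf(Index car) const { return brains[cars[car].brain]; }
};
```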
    • How did these changes help when optimizing NASCAR?
    • How did these changes help when optimizing NASCAR?
      [Diagram: threads synchronised by barriers]
      • Read/Write phase was key to performance on Xbox 360
      • Allowed work to be split across all 6 threads
      • Each thread was given 1/6th of the cars to process
      • Takes 2ms of all CPU resources on 360 in a frame
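The Xbox 360 scheme can be approximated in standard C++ as two parallel passes over the cars. Each thread takes a contiguous slice, and the join between passes plays the role of the barrier. This is a sketch under assumptions: `std::thread` stands in for the game's own threads, and `readPhase`/`writePhase` are placeholders for the per-car work. A persistent thread pool with C++20 `std::barrier` would avoid recreating threads each pass.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Runs work(car) for every car, giving each of numThreads threads a
// contiguous slice. Returns only when every thread has finished, which
// acts as a barrier between phases.
void ParallelForCars(int numCars, int numThreads, const std::function<void(int)>& work)
{
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
    {
        int first = numCars * t / numThreads;
        int last = numCars * (t + 1) / numThreads;
        threads.emplace_back([first, last, &work] {
            for (int car = first; car < last; ++car)
                work(car);
        });
    }
    for (std::thread& th : threads)
        th.join();
}

// One AI frame: every read completes before any write begins.
void UpdateAi(int numCars, int numThreads,
              const std::function<void(int)>& readPhase,
              const std::function<void(int)>& writePhase)
{
    ParallelForCars(numCars, numThreads, readPhase);  // barrier on return
    ParallelForCars(numCars, numThreads, writePhase); // barrier on return
}
```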
    • How did these changes help when optimizing NASCAR?
      • Tried the same approach on PS3
      • Only 2 threads on PS3, but it has 6 sub-processors (the SPUs)
      • Both threads on PS3 were completely full
      • Any multithreading speedup had to come from the SPUs
      • Each SPU has 256 KB of local storage (for code & data)
      • Code was ~2 MB and data was ~8 MB: far too large!
      • Unfeasible to mimic the 360 approach
    • How did these changes help when optimizing NASCAR?
      • On PS3, the most costly components were targeted
    • How did these changes help when optimizing NASCAR?
      • PS3 version relied on components being run in parallel
      • And all components in a group being able to be run in parallel
      • Costly groups were made to use the SPUs
      • Knowing relationship between data was key
      • Well documented code made life so much easier!
    • How did we make use of the PS3's unique hardware?
    • How did we make use of the PS3's unique hardware?
      • Codeplay was asked by Eutechnyx to optimize the AI
      • Very tight deadlines, 1 month to reduce time taken in AI
      • No main thread time left – have to use the SPUs
      • Our Offload compiler technology crucial
    • How did we make use of the PS3's unique hardware?
      • For those unfamiliar with coding for the SPU…
      • They are amazingly fast, if you code correctly for them
      • Normally requires total rewrite of existing codebase
      • Painful to access global variables
      • Virtual functions are a complete write off
    • How did we make use of the PS3's unique hardware?
      • SPU development typically takes many months
      • Common to have 4-5 SPU programmers for ~10 months
      • Not feasible for late-in-cycle development
      • Offload aims to mitigate the issues with getting code onto SPU
      • Can offload code to SPU much quicker (typically a few man days)
      • Much easier to move existing code bases to SPU
    • How did we make use of the PS3's unique hardware?
      __blockingoffload()
      {
      // do some work on SPU, PPU waits for completion!
      };
      offloadThread_t handle = __offload()
      {
      // do some work on SPU!
      };
      // can do some work on PPU before waiting for SPU
      offloadThreadJoin(handle);
      • Small language extension moves work from PPU to SPU
      • Any work within an offload block is performed on the SPU
      • PPU code called from within an offload block is automatically duplicated for the SPU
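Offload is a Codeplay compiler extension, so it cannot be reproduced in standard C++. As a rough analogy only, the two forms map onto familiar threading patterns:

- `__blockingoffload` is like running work elsewhere and waiting for it at once.
- `__offload` followed by `offloadThreadJoin` is like starting asynchronous work, doing other work, then joining.

The helper names below are hypothetical, and nothing here runs on an SPU.

```cpp
#include <cassert>
#include <future>

// Analogy for __blockingoffload { ... }: start the work, wait immediately.
template <typename F>
auto RunBlocking(F work) -> decltype(work())
{
    return std::async(std::launch::async, work).get();
}

// Analogy for: handle = __offload { ... }; ... offloadThreadJoin(handle);
// `work` must return void here.
template <typename F>
std::future<void> StartAsync(F work)
{
    return std::async(std::launch::async, work);
}

inline void Join(std::future<void>& handle) { handle.get(); }
```

Between `StartAsync` and `Join`, the calling thread is free to do its own work, which is the point of the non-blocking form.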
    • How did we make use of the PS3's unique hardware?
      int aGlobalVariable;
      __blockingoffload()
      {
          int aLocalVariable = aGlobalVariable;
      };
      • Offload allows access to global variables
      • Just use them as normal!
    • How did we make use of the PS3's unique hardware?
      struct Foo { virtual void bar() {} };
      __blockingoffload[Foo::bar this]()
      {
      Foo foo;
      foo.bar();
      };
      • Offload allows virtual function calls too
      • Just have to specify which virtual functions may be called
    • How did we make use of the PS3's unique hardware?
      Stay Behind Other Car
      Driving Controllers
      Obstacle Detection
      Stay Beside Other Car
      • First, profiled the AI during a typical race
      • Four components taking most of the frame time
    • How did we make use of the PS3's unique hardware?
      Stay Behind Other Car
      Driving Controllers
      Obstacle Detection
      Stay Beside Other Car
      • Used four slightly different strategies when multithreading
    • How did we make use of the PS3's unique hardware?
      Obstacle Detection
      • Obstacle Detection is the only component in its group
      • The code was very inefficient for the SPU, but 1/3 of the work was moved onto 4 SPUs
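Below is a sketch of the uneven split just described: the main (PPU) thread keeps two thirds of the items, and the remaining third is shared between four helpers running at the same time. `std::thread` helpers stand in for the SPUs. `SplitWork` and the per-item callback are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// The calling ("PPU") thread handles the first (1 - offloadFraction) of
// the items; the rest are divided between numHelpers helper threads
// (standing in for SPUs) that run concurrently with it.
void SplitWork(int numItems, double offloadFraction, int numHelpers,
               const std::function<void(int)>& process)
{
    int helperStart = static_cast<int>(numItems * (1.0 - offloadFraction));
    int helperItems = numItems - helperStart;

    std::vector<std::thread> helpers;
    for (int h = 0; h < numHelpers; ++h)
    {
        int first = helperStart + helperItems * h / numHelpers;
        int last = helperStart + helperItems * (h + 1) / numHelpers;
        helpers.emplace_back([first, last, &process] {
            for (int i = first; i < last; ++i) process(i);
        });
    }

    for (int i = 0; i < helperStart; ++i) // main thread's share
        process(i);

    for (std::thread& t : helpers) t.join();
}
```

For example, `SplitWork(43, 1.0 / 3.0, 4, process)` keeps items 0 to 27 on the calling thread and spreads items 28 to 42 over the four helpers.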
    • How did we make use of the PS3's unique hardware?
      Stay Behind Other Car
      Stay Beside Other Car
      • Looked at Stay Behind/Beside Other Car together
      • They are in the same group, so can be run in parallel
    • How did we make use of the PS3's unique hardware?
      Stay Behind Other Car
      Stay Beside Other Car
      • Moved the Stay Behind component to the SPU
      • The Stay Beside component would continue to run on the PPU
    • How did we make use of the PS3's unique hardware?
      Stay Behind Other Car
      Stay Beside Other Car
      • As long as the SPU work took less time than the PPU work, there was no cost!
      • Effectively ‘hid’ the cost of calculating the Stay Behind component
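The "no cost" argument is simple arithmetic. Assume the SPU work starts at the same moment as the PPU work in the same group, and ignore launch and synchronisation overhead. The group then finishes after the longer of the two, so the offloaded work only adds time when it outlasts the PPU work. The function names and the numbers in the checks are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Group time when both components run one after the other on the PPU.
double SerialGroupMs(double ppuMs, double spuMs) { return ppuMs + spuMs; }

// Group time when one component runs concurrently on an SPU
// (overheads ignored): whichever finishes last.
double OverlappedGroupMs(double ppuMs, double spuMs) { return std::max(ppuMs, spuMs); }

// Extra time the offloaded component adds on top of the PPU work.
double VisibleSpuCostMs(double ppuMs, double spuMs)
{
    return OverlappedGroupMs(ppuMs, spuMs) - ppuMs;
}
```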
    • How did we make use of the PS3's unique hardware?
      Driving Controllers
      • Lastly, the driving controllers alone took 1/3 of the AI cost
      • Split the cars across 4 SPUs and ran them in parallel
    • How did we make use of the PS3's unique hardware?
      • In total, ~170 lines of source code were changed
      • Changes were purely optimization
      // array of AIObstacle* pointers in main memory
      AIObstacle** obstacles;
      unsigned int numObstacles;
      offloadThread_t handle = __offload(obstacles, numObstacles)
      {
          for(unsigned int i = 0; i < numObstacles; i++)
          {
              // AIObstacle* points to main memory
              AIObstacle* obstacle = obstacles[i];
              // use obstacle for calculations
          }
      };
      // … PPU work can continue here, then wait for the SPU
      offloadThreadJoin(handle);
    • How did we make use of the PS3's unique hardware?
      // array of AIObstacle* pointers in main memory
      AIObstacle** obstacles;
      unsigned int numObstacles;
      offloadThread_t handle = __offload(obstacles, numObstacles)
      {
          // access the main-memory pointer array through a cached pointer
          CachedPointer<AIObstacle*> innerObstacles(obstacles, numObstacles);
          for(unsigned int i = 0; i < numObstacles; i++)
          {
              // AIObstacle* points to main memory; access it through a cached pointer
              CachedPointer<AIObstacle> obstacle(innerObstacles[i]);
              // use obstacle for calculations
          }
      };
      offloadThreadJoin(handle);
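`CachedPointer` belongs to Codeplay's Offload tooling, and the slides show no more of its interface than the two uses in the code above. To illustrate the idea only, the hypothetical `LocalCopy` below handles a single object, not an array:

- On construction, it copies the pointee into a local object. This stands in for a DMA from main memory into SPU local store.
- On destruction, it writes the object back, but only if it was modified.

It ignores alignment, DMA size limits and everything else a real software cache handles.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical, simplified stand-in for a software-cached pointer.
template <typename T>
class LocalCopy
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "only plain data can be copied byte-for-byte");

public:
    explicit LocalCopy(T* remote) : remote_(remote)
    {
        std::memcpy(&local_, remote_, sizeof(T)); // "DMA in"
    }
    ~LocalCopy()
    {
        if (dirty_)
            std::memcpy(remote_, &local_, sizeof(T)); // "DMA out"
    }
    LocalCopy(const LocalCopy&) = delete;
    LocalCopy& operator=(const LocalCopy&) = delete;

    const T& Get() const { return local_; }           // read the local copy
    T& Modify() { dirty_ = true; return local_; }     // mark for write-back

private:
    T* remote_;
    T local_;
    bool dirty_ = false;
};
```

Reads and writes go to the local copy, and the original only changes when the `LocalCopy` goes out of scope.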
    • What performance improvement was achieved?
    • What performance improvement was achieved?
      Obstacle Detection
      • Obstacle detection went from 2ms -> 1.1ms
      • ~100 lines of source code changed
      • 2½ weeks development time
    • What performance improvement was achieved?
      Stay Behind Other Car
      Stay Beside Other Car
      • Stay Behind went from 1.1ms -> 0ms (hidden behind the Stay Beside work)
      • ~50 lines of source code changed
      • 1 week development time
    • What performance improvement was achieved?
      Driving Controllers
      • Driving Controllers went from 4ms -> 0.6ms
      • ~20 lines of source code changed
      • 8 hours development time
    • What performance improvement was achieved?
      • Performance speaks for itself!
      • 50% speed improvement on PS3
    • Takeaway
    • Takeaway
      • It is possible to parallelise late in development
      • But need code ready to be parallelised
      • Small changes in coding style lead to hugely better results
      • Better to plan systems from beginning with multicore in mind
    • Questions?
      You can also catch me on Twitter: @sheredom