SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST
Bringing Intelligent Motion Using Reinforcement Learning on Intel Client
Manuj Sabharwal, Yaz Khabiri
Agenda
• Overview of Reinforcement Learning (RL)
• Reinforcement Learning in Gaming
• Training RL Algorithms
• Intelligent Motion Use Case
• Performance Optimization on Intel® CPU
• Inference RL Algorithms
• Understanding Motion Models
• Using DirectML* to leverage Intel GPUs
• Summary
Overview of Machine Learning
• Supervised: data + labels → class; task driven
• Unsupervised: data → clusters
• Reinforcement: state → action; learns from mistakes
Successes of Reinforcement Learning
High-Level Reinforcement Learning Overview
Agent gets state (s) from environment
Agent takes action (a) using policy (π)
Agent receives reward (r)
Goal: Maximize the expected future reward (return, R)
https://unity3d.com/machine-learning
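To make this loop concrete, here is a minimal sketch in the style of the OpenAI Gym API (an illustration, not part of the slides; CartPole-v1 and the random policy are placeholder assumptions):

    import gym

    # Minimal agent-environment loop (classic Gym API; CartPole-v1 is an illustrative choice)
    env = gym.make("CartPole-v1")
    state = env.reset()                       # agent gets state (s) from the environment
    total_return, done = 0.0, False
    while not done:
        action = env.action_space.sample()    # agent takes action (a) from a (here random) policy pi
        state, reward, done, info = env.step(action)  # environment returns next state and reward (r)
        total_return += reward                # the return (R) the agent learns to maximize
    print("episode return:", total_return)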
Examples Of RL Algorithms
• Actor-Critic algorithms (model-based learning)*
  • Reduce the variance of the policy gradient by combining an actor (the policy) and a critic (value function)
• Value Based
  • Q-Learning
    • Find the best action in the current state
• Policy Based
  • Trust Region Policy Optimization
  • Generalized Advantage Estimation (sketched below)
http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_3_rl_intro.pdf
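Since Generalized Advantage Estimation appears in the list above, here is a minimal sketch of how GAE is typically computed (the rewards, values, gamma, and lambda below are illustrative assumptions):

    import numpy as np

    # GAE: advantage_t = sum_l (gamma*lam)^l * delta_{t+l}, where delta_t is the TD residual
    def gae(rewards, values, gamma=0.99, lam=0.95):
        # values holds one extra entry: the estimate for the state after the final step
        deltas = rewards + gamma * values[1:] - values[:-1]
        advantages = np.zeros_like(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = deltas[t] + gamma * lam * running
            advantages[t] = running
        return advantages

    adv = gae(np.array([1.0, 0.0, 1.0]), np.array([0.5, 0.4, 0.6, 0.0]))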
Brain behind Algorithms
• Value Functions
• Estimate how much reward a state or an action will yield by predicting the total future reward (return)
• Policy Methods
• Find the best action directly
• Optimize policy (behavior) directly
• Vanilla Policy Gradients
• For every episode with positive reward, use the gradient to increase the probability of the actions taken (sketched below)
• Improved Policy Gradients
• Multiple gradient steps per episode
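A minimal sketch of the vanilla policy-gradient idea above, for a linear softmax policy over discrete actions (the sizes, learning rate, and episode data are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    n_features, n_actions = 4, 2
    theta = np.zeros((n_features, n_actions))      # linear policy parameters

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def reinforce_update(episode, lr=0.01, gamma=0.99):
        # episode: list of (state, action, reward); one gradient step per episode
        global theta
        ret = 0.0
        for state, action, reward in reversed(episode):
            ret = reward + gamma * ret              # return from this step onward
            probs = softmax(state @ theta)
            grad_logp = -np.outer(state, probs)     # d log pi(a|s) / d theta ...
            grad_logp[:, action] += state           # ... for a linear softmax policy
            theta = theta + lr * ret * grad_logp    # good episodes raise the probability of their actions

    episode = [(rng.standard_normal(n_features), int(rng.integers(n_actions)), 1.0) for _ in range(5)]
    reinforce_update(episode)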
Popular Paths to Bring Machine Learning into Games
• Microsoft*
• DirectML (DML) framework
• Ubisoft* – LaForge
• Bringing research into industry
• Access to game engines and data
• Unity*
• First-party support via ML-Agents
• Interface between research and gaming
• DML backend coming soon
Motion With Reinforcement Learning
• Understanding the path or motion planning problem is crucial in unstructured environments
• Data-driven input, combined with a physics-based animated character, creates smooth and robust animation
• RL offers a convenient framework for learning different strategies without a mountain of data
• Solves generalization problems in path and motion planning
Deep Q-Networks: Volodymyr Mnih, Deep RL Bootcamp, Berkeley, DeepMind*
• Q-learning learns Q : State × Action → expected return; if we had Q, we could easily construct a policy that maximizes our rewards:
• a = argmax_a Q(s, a)
• A neural network can stand in for Q, since neural networks are universal function approximators
• Q(s, a) = r + γ · max_a′ Q(s′, a′)
From equations to a framework (e.g., Q-Learning → DQN)
[Diagram: the state passes through three convolutional layers and two fully connected layers, each with an activation function, producing Q-values for the actions Straight, Left, and Right]
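A minimal tabular sketch of the Q-learning update above (the state/action counts, learning rate, and discount are illustrative assumptions; DQN replaces the table with a neural network):

    import numpy as np

    n_states, n_actions = 16, 3
    Q = np.zeros((n_states, n_actions))
    gamma, alpha = 0.99, 0.1

    def q_update(s, a, r, s_next):
        target = r + gamma * np.max(Q[s_next])    # r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (target - Q[s, a])     # move Q(s, a) toward the target

    def greedy_action(s):
        return int(np.argmax(Q[s]))               # a = argmax_a Q(s, a)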
Evaluating Motion Algorithms On Intel® Core Processors
https://github.com/xbpeng/DeepMimic
[Chart: training time in minutes vs. iterations (millions) for the TensorFlow baseline: ~52 hours of training on an 8-core platform]
~52 hours to train on CPU → Can we do better?
Testing by Intel as of June 28, 2019. Intel® Core™ i9-9900K, 95 W TDP, 8C/16T; frequency: 4.3 GHz, Turbo enabled. Graphics: NVIDIA* GTX 2080. Memory: 4x8 GB @ 2133 MHz. Storage: Intel SSD 545 Series 240 GB. OS: Windows* 10 RS5.
BIOS build: CFLSFX1.R00.X151B01. All data collected with TensorFlow* 1.12 and the DeepMimic branch dated June 28, 2019.
Analyzing Software Stack
~20% of the time is spent in actual compute; the rest is overhead
[Intel® VTune™ Amplifier XE screenshot, with callouts: actual compute vs. inefficiency due to spins]
Optimizing the Software Stack - 1
• Re-evaluating the libraries included in the software stack for DeepMimic
  • Recompiling TensorFlow* with Intel® MKL-DNN
    bazel --output_base=output_dir build --config=mkl --config=opt
    //tensorflow/tools/pip_package:build_pip_package
    python -c "import tensorflow; print(tensorflow.pywrap_tensorflow.IsMklEnabled())" → Result: True
  • Evaluate different threading parameters to reduce spin time
    import tensorflow  # importing the MKL build sets KMP_BLOCKTIME and OMP_PROC_BIND
    import os          # delete the existing values to fall back to the defaults
    del os.environ['OMP_PROC_BIND']
    del os.environ['KMP_BLOCKTIME']
• Moving the Python installation → Intel-optimized Python libraries
  • Simple optimizations from switching the stock NumPy package to the more efficient Intel-optimized NumPy
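As a hedged illustration of the threading knobs mentioned above (TensorFlow 1.x API; the specific thread counts and values are assumptions to be tuned per platform):

    import os
    import tensorflow as tf

    os.environ["OMP_NUM_THREADS"] = "8"      # OpenMP threads for the MKL-DNN kernels (illustrative value)
    os.environ["KMP_BLOCKTIME"] = "0"        # release OpenMP threads quickly to cut spin time
    config = tf.ConfigProto(
        intra_op_parallelism_threads=8,      # threads used inside a single op
        inter_op_parallelism_threads=2)      # ops allowed to run concurrently
    sess = tf.Session(config=config)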
Optimizing the Software Stack - 2
• Optimizing math libraries to use the FP32 datatype and parallelism instead of double precision and scalar code
  • Mapping libraries from scalar Eigen to Eigen backed by MKL
  • Compiling Eigen with MKL, and Bullet3 (a real-time physics/collision SDK), to use the AVX2 code path
Optimization Results
[Profiles: baseline vs. after optimizations]
Putting CPUs to Work
• The application now trains with useful compute instead of spinning
• Most of the OpenMP/threading spin time is removed by the TensorFlow + MKL-DNN build
• The Eigen-MKL library in DeepMimic Core takes advantage of intrinsics code
• Optimizing training is the first step toward deployment
• The correct libraries and datatypes are important for deep learning training performance
Training Result with Optimized Stack
Training time reduced by 2.6x by enabling multithreading and using MKL-DNN instead of Eigen → ~50 hours down to ~19 hours
[Chart: timing after optimizations; training time in minutes vs. iterations (millions) for TensorFlow baseline, TensorFlow + MKL-DNN, and TensorFlow + MKL-DNN + Eigen-MKL libraries: 2.6x better training performance]
Take-away
Using optimized libraries to train machine learning algorithms helps boost performance and reduce training time
Bringing Motion to Production
Understanding the inference model
Training checkpoint → inference model
How can a developer read it?
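One common way to turn a training checkpoint into a readable inference model in TensorFlow 1.x is to freeze the graph; a minimal sketch (the checkpoint path and output node name are hypothetical placeholders, not taken from DeepMimic):

    import tensorflow as tf

    saver = tf.train.import_meta_graph("model.ckpt.meta")        # rebuild the training graph
    with tf.Session() as sess:
        saver.restore(sess, "model.ckpt")                        # load the trained weights
        frozen = tf.graph_util.convert_variables_to_constants(   # bake variables into constants
            sess, sess.graph_def, ["policy/action_out"])         # hypothetical output node name
    with tf.gfile.GFile("inference_model.pb", "wb") as f:
        f.write(frozen.SerializeToString())                      # single .pb file a runtime can read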
Unity® ML-Agents
Bridging the gap between research and game integration
Overview: Unity ML-Agents
[Diagram: inside the Unity Environment, an Agent collects observations and receives vector actions from a Brain, coordinated by the Academy; the Unity Inference Engine executes on DirectML, compute shaders (CS), or the CPU]
Puppo Motion Using Unity ML-Agents
• Goal: the puppy runs for the bone
• Agent: Corgi
• About 50 float32 inputs
• Three hidden layers of 512 nodes
• About 20 float outputs
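A hedged NumPy sketch of a network with the shape described above (~50 inputs, three 512-unit hidden layers, ~20 outputs); the weights, ReLU activation, and initialization are illustrative assumptions, not the actual trained Puppo model:

    import numpy as np

    rng = np.random.default_rng(0)
    sizes = [50, 512, 512, 512, 20]                     # inputs -> 3 hidden layers -> outputs
    params = [(0.05 * rng.standard_normal((i, o)).astype(np.float32),
               np.zeros(o, dtype=np.float32))
              for i, o in zip(sizes[:-1], sizes[1:])]

    def policy(obs):
        x = obs
        for w, b in params[:-1]:
            x = np.maximum(x @ w + b, 0.0)              # assumed ReLU hidden layers
        w, b = params[-1]
        return x @ w + b                                # ~20 continuous action outputs

    action = policy(np.zeros(50, dtype=np.float32))     # ~50 float32 observations in
    print(action.shape)                                 # (20,)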
Analyzing inference performance → 1 Agent
Without metacommands: 1.8 ms/inference
With metacommands: 0.8 ms/inference
https://devblogs.microsoft.com/pix/download/
Execution time reduced by 2x with metacommands at the kernel level
Microsoft® PIX Tool – Benefits of using Meta Commands
[PIX capture callouts: 3.064 msec vs. 1.364 msec]
More agents → better performance with metacommands
Results
[Chart: scaling with multiple agents (1, 10, 50); inference time in msec for compute shader vs. metacommands (lower is better), and the relative gain]
Metacommands give a significant performance boost by leveraging Intel® Graphics driver optimizations
Intel® Graphics Performance Analyzers (GPA) DX12 Profiling
Preview
DX12 DirectML profiling in Intel® GPA
Summary
• TensorFlow with the Intel® MKL-DNN build is now available on Windows
  • Leverages new instruction sets on Intel® Xeon® and Core™ processors
• Performance boost on training, since reinforcement learning use cases are CPU friendly
• Using optimized pre- and post-processing libraries gives an end-to-end (E2E) performance boost
• DirectML from Microsoft leverages metacommands, which give a good performance boost for games with deep-learning-infused workloads
References
TensorFlow: https://www.tensorflow.org/
TensorFlow optimization guide: https://software.intel.com/en-us/articles/intel-optimization-for-tensorflow-installation-guide
DeepMimic: https://github.com/xbpeng/DeepMimic/tree/master/learning
AI4Animation: https://github.com/sebastianstarke/AI4Animation
Unity ML-Agents: https://github.com/Unity-Technologies/ml-agents
RL beginner guide: https://skymind.ai/wiki/deep-reinforcement-learning
Gym: https://gym.openai.com/
Ubisoft: https://montreal.ubisoft.com/en/our-engagements/research-and-development/
Intel® GPA: https://software.intel.com/en-us/gpa