
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019 Technical Sessions



Review state-of-the-art techniques that use neural networks to synthesize motion, such as mode-adaptive neural networks and phase-functioned neural networks. See how next-generation CPUs can offer better performance for reinforcement learning.




  1. SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST
  2. Bringing Intelligent Motion Using Reinforcement Learning on Intel Client | Manuj Sabharwal, Yaz Khabiri
  3. Agenda • Overview of Reinforcement Learning (RL) • Reinforcement Learning in Gaming • Training RL Algorithms • Intelligent Motion Use Case • Performance Optimization on Intel® CPU • Inference of RL Algorithms • Understanding Motion Models • Using DirectML* to Leverage Intel GPUs • Summary
  4. Overview of Machine Learning • Supervised: data + labels → class (task driven) • Unsupervised: data → cluster • Reinforcement: state → action (learn from mistakes)
  5. Successes of Reinforcement Learning
  6. High-Level Reinforcement Learning Overview • Agent gets state (s) from the environment • Agent takes action (a) using policy (π) • Agent receives reward (r) • Goal: maximize the expected future return (R) (https://unity3d.com/machine-learning)
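A minimal sketch of this state → action → reward loop, using the OpenAI Gym API listed in the references; the environment name, episode count, and the random placeholder policy are illustrative assumptions rather than anything from the talk:

    import gym

    env = gym.make("CartPole-v1")        # any Gym environment works here

    def policy(state):
        # Placeholder for a trained policy pi(a|s); here we just sample randomly.
        return env.action_space.sample()

    for episode in range(5):
        state = env.reset()              # agent gets initial state s
        done, total_return = False, 0.0
        while not done:
            action = policy(state)                        # agent takes action a
            state, reward, done, _ = env.step(action)     # environment returns s' and reward r
            total_return += reward                        # accumulate return R
        print("episode %d: return %.1f" % (episode, total_return))
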
  7. Examples of RL Algorithms • Actor-critic algorithms (model based learning)* - reduce the variance of the policy gradient using an actor (the policy) and a critic (the value function) • Value-based - Q-learning: find the best action in the current state • Policy-based - Trust Region Policy Optimization, Generalized Advantage Estimation (http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_3_rl_intro.pdf)
  8. The Brain Behind the Algorithms • Value functions - estimate how much reward a state or action will bring by predicting the total future reward (return) • Policy methods - find the best action directly by optimizing the policy (behavior) itself • Vanilla policy gradients - for every episode with positive reward, use the gradient to increase the probability of the actions taken • Improved policy gradients - multiple gradient steps per episode
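As a rough illustration of the vanilla policy gradient idea above, here is a minimal NumPy sketch of a REINFORCE-style update for a linear-softmax policy: after an episode, the parameters are nudged so the actions that were taken become more likely, scaled by the episode return. The policy form, learning rate, and data layout are assumptions for the sketch, not the speakers' implementation:

    import numpy as np

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def reinforce_update(theta, episode, learning_rate=0.01):
        """One vanilla policy-gradient step from a single episode.

        theta:   weights of a linear-softmax policy, shape (n_actions, n_features)
        episode: list of (features, action, reward) tuples collected during the episode
        """
        episode_return = sum(r for _, _, r in episode)        # total reward R of the episode
        grad = np.zeros_like(theta)
        for features, action, _ in episode:
            probs = softmax(theta @ features)                 # pi(a | s)
            for a in range(theta.shape[0]):
                indicator = 1.0 if a == action else 0.0
                grad[a] += (indicator - probs[a]) * features  # grad of log pi(action | s)
        # Increase the probability of the taken actions in proportion to the return.
        return theta + learning_rate * episode_return * grad
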
  9. Popular Paths to Bring Machine Learning into Games • Microsoft* - DirectML (DML) framework • Ubisoft* La Forge - bringing research into industry, with access to game engines and data • Unity* - first-party support via ML-Agents, an interface between research and gaming; DML backend coming soon
  10. Motion with Reinforcement Learning • Understanding the path- or motion-planning problem is crucial in unstructured environments • Data-driven input combined with physics-based character animation creates smooth and robust motion • RL offers a convenient framework for learning different strategies without a mountain of data • Solves generalization problems in path and motion planning (Deep Q-Networks: Volodymyr Mnih, Deep RL Bootcamp, Berkeley, DeepMind*)
  11. Equations to Framework (e.g., Q-Learning → DQN) • Q-learning maps state × action → expected return; given Q, we can easily construct a policy that maximizes reward: a = argmax_a Q(s, a) • A neural network can stand in for Q, since networks are universal function approximators • Bellman update: Q(s, a) = r + γ max_a' Q(s', a') • DQN architecture: state → conv → conv → conv → FC → FC → Q-values (e.g., Straight, Left, Right), with an activation function after each layer
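A minimal tabular sketch of that Bellman update; the state/action counts, learning rate, and exploration rate are illustrative assumptions, and a DQN simply replaces the Q table below with the convolutional network on the slide:

    import numpy as np

    n_states, n_actions = 16, 3              # e.g., 3 actions: Straight, Left, Right
    alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate
    Q = np.zeros((n_states, n_actions))      # a DQN replaces this table with a network

    def choose_action(state):
        # Epsilon-greedy: mostly a = argmax_a Q(s, a), sometimes explore.
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[state]))

    def q_update(state, action, reward, next_state, done):
        # Bellman target: r + gamma * max_a' Q(s', a'); just r at the end of an episode.
        target = reward if done else reward + gamma * np.max(Q[next_state])
        Q[state, action] += alpha * (target - Q[state, action])
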
  12. Evaluating Motion Algorithms on Intel® Core™ Processors (https://github.com/xbpeng/DeepMimic) [Chart: training time in minutes vs. iterations in millions, TensorFlow baseline] ~52 hours to train on an 8-core CPU platform → can we do better? Testing by Intel as of June 28, 2019: Intel® Core™ i9-9900K, 95 W TDP, 8C/16T, 4.3 GHz with Turbo enabled; graphics: NVIDIA* GTX 2080; memory: 4x8 GB @ 2133 MHz; storage: Intel SSD 545 Series 240 GB; OS: Windows* 10 RS5; BIOS build CFLSFX1.R00.X151B01. All data collected with TensorFlow* 1.12 and the DeepMimic branch dated June 28, 2019.
  13. Analyzing the Software Stack (Intel® VTune™ Amplifier XE) • Only ~20% of the time is spent in actual compute; the rest is overhead, largely inefficiency due to spins
  14. Optimizing the Software Stack - 1 • Re-evaluating the libraries included in the DeepMimic software stack • Recompiling TensorFlow* with Intel® MKL-DNN:
      bazel --output_base=output_dir build --config=mkl --config=opt //tensorflow/tools/pip_package:build_pip_package
      python -c "import tensorflow; print(tensorflow.pywrap_tensorflow.IsMklEnabled())"   → Result: True
  • Evaluating different threading parameters to reduce spin time (see the sketch after this slide):
      import tensorflow   # importing TensorFlow sets KMP_BLOCKTIME and OMP_PROC_BIND defaults
      import os
      del os.environ['OMP_PROC_BIND']    # delete the existing values so they can be re-tuned
      del os.environ['KMP_BLOCKTIME']
  • Moving the Python installation to optimized Intel Python libraries, e.g., replacing stock NumPy with the more efficient Intel NumPy build
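A minimal sketch of how those threading parameters might then be set for TensorFlow 1.x; the specific values below are assumptions to illustrate the knobs (KMP_BLOCKTIME, thread affinity, intra-/inter-op parallelism) and should be tuned per platform:

    import os
    import tensorflow as tf

    # Illustrative values only; tune for your core count and workload.
    os.environ["KMP_BLOCKTIME"] = "0"                       # let OpenMP threads sleep instead of spinning
    os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
    os.environ["OMP_NUM_THREADS"] = "8"                     # physical cores on the test system

    config = tf.ConfigProto(
        intra_op_parallelism_threads=8,   # threads used inside a single op (e.g., a matmul)
        inter_op_parallelism_threads=2)   # ops allowed to run concurrently
    session = tf.Session(config=config)
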
  15. Optimizing the Software Stack - 2 • Optimizing the math libraries to use the FP32 datatype and parallelism instead of double precision and scalar code • Mapping from scalar Eigen to Eigen backed by MKL • Compiling Eigen with MKL, and Bullet3 (a real-time collision/physics SDK), to use the AVX2 code path
  16. Optimization Results: Putting CPUs to Work [VTune screenshots: baseline vs. after optimizations] • The application now trains with useful compute instead of spinning • Most of the OpenMP/threading spin time is removed by TensorFlow with MKL-DNN • The Eigen-MKL build in DeepMimicCore takes advantage of intrinsic code
  17. Training Results with the Optimized Stack • Optimizing training is the first step toward deployment • The right libraries and datatypes are important for deep-learning training performance [Chart: training time in minutes vs. iterations in millions for TensorFlow baseline, TensorFlow + MKL-DNN, and TensorFlow + MKL-DNN + Eigen-MKL] • Training time reduced by 2.6x (from ~50 hours to ~19 hours) by enabling multithreading and using MKL-DNN instead of Eigen
  18. Take-away: using optimized libraries to train machine-learning algorithms helps boost performance and reduce training time
  19. Bringing Motion to Production
  20. Understanding the Inference Model • Training checkpoint → inference model • How can a developer read it?
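The slide does not show the conversion itself, but one common TensorFlow 1.x route from a training checkpoint to a self-contained inference graph looks roughly like this; the checkpoint path and output node name are placeholders, not values from the talk:

    import tensorflow as tf

    checkpoint_path = "checkpoints/model.ckpt"      # placeholder path
    output_node_names = ["agent/action_output"]     # placeholder output tensor name

    with tf.Session(graph=tf.Graph()) as sess:
        # Rebuild the training graph and restore its weights from the checkpoint.
        saver = tf.train.import_meta_graph(checkpoint_path + ".meta")
        saver.restore(sess, checkpoint_path)

        # Fold variables into constants so the graph can be shipped for inference.
        frozen_graph = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph_def, output_node_names)

    with tf.gfile.GFile("model_frozen.pb", "wb") as f:
        f.write(frozen_graph.SerializeToString())
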
  21. Unity* ML-Agents: Bridging the Gap Between Research and Game Integration
  22. Overview: Unity ML-Agents [Diagram: a Unity environment containing an Agent (Collect Observations, Agent Action, Vector Action), a Brain, and an Academy, feeding the Unity Inference Engine with DirectML, compute-shader (CS), and CPU backends]
  23. Puppo: Motion Using Unity ML-Agents • Goal: the puppy runs for the bone • Agent: a corgi • About 50 float32 inputs • Three hidden layers of 512 nodes • About 20 float outputs
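For a sense of scale, here is a minimal TensorFlow Keras sketch of a network with the shape described on the slide; the layer sizes come from the slide, while the activations and output squashing are illustrative assumptions:

    import tensorflow as tf

    n_observations, n_actions, hidden = 50, 20, 512   # approximate sizes from the slide

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hidden, activation="relu", input_shape=(n_observations,)),  # hidden layer 1
        tf.keras.layers.Dense(hidden, activation="relu"),                                  # hidden layer 2
        tf.keras.layers.Dense(hidden, activation="relu"),                                  # hidden layer 3
        tf.keras.layers.Dense(n_actions, activation="tanh"),                               # continuous action vector
    ])
    model.summary()
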
  24. Analyzing Inference Performance - 1 Agent • Without metacommands: 1.8 seconds/inference • With metacommands: 0.8 seconds/inference • Execution time reduced by ~2x with metacommands at the kernel level (captured with Microsoft PIX: https://devblogs.microsoft.com/pix/download/)
  25. Microsoft® PIX Tool - Benefits of Using Metacommands • 3.064 msec vs. 1.364 msec • The more agents, the better the performance with metacommands
  26. Results [Chart: scaling with multiple agents (1, 10, 50) - inference time in msec for compute shader vs. metacommands, and the gain; lower is better] • Metacommands give a significant performance boost by leveraging Intel® Graphics driver optimizations
  27. Intel® Graphics Performance Analyzer (GPA): DX12 Profiling Preview • DX12 DirectML profiling in Intel® GPA
  28. Summary • A TensorFlow build with Intel® MKL-DNN is now available on Windows • It leverages new instruction sets on Intel® Xeon™ and Core™ processors • Training gets a performance boost, since reinforcement-learning use cases favor the CPU • Using optimized pre- and post-processing libraries gives an end-to-end performance boost • Microsoft DirectML leverages metacommands, which give a good performance boost for workloads that mix games and deep learning
  29. References
  • TensorFlow: https://www.tensorflow.org/
  • TensorFlow optimization guide: https://software.intel.com/en-us/articles/intel-optimization-for-tensorflow-installation-guide
  • DeepMimic: https://github.com/xbpeng/DeepMimic/tree/master/learning
  • AI4Animation: https://github.com/sebastianstarke/AI4Animation
  • Unity ML-Agents: https://github.com/Unity-Technologies/ml-agents
  • RL beginner guide: https://skymind.ai/wiki/deep-reinforcement-learning
  • Gym: https://gym.openai.com/
  • Ubisoft La Forge: https://montreal.ubisoft.com/en/our-engagements/research-and-development/
  • Intel® GPA: https://software.intel.com/en-us/gpa
