A session by Dr Ganapathi Pulipaka, Chief Data Scientist, Accenture on the topic of 'Deep Reinforcement Learning In Machine Learning' at InterCon USA 2019, held at Caesars Palace, Las Vegas on 18-20 June, 2019.
2. Deep Reinforcement Learning in Machine Learning
Although the term artificial intelligence has been around since
the 1950s, there was a major shift towards machine learning
from the late 1990s through the early 2000s. Reinforcement
learning began to gain popularity in the 2000s and has been
one of the most promising algorithmic techniques on the
landscape of artificial intelligence in recent years. The 1980s
saw knowledge-based systems that tried to equip machines
with common sense and knowledge. There seemed to be no end
to the number of rules that had to be programmed to power
these systems. This not only significantly increased the cost
of building them, it also slowed down efforts to re-create
common sense in machines. The trend shifted towards machine
learning to avoid hand-encoding millions of rules and
embedding them into the machine. Machine learning programs
learn the rules automatically from large amounts of data.
Industry shifted its focus to machine learning and largely
abandoned knowledge-based systems. Through the 2000s, AI
researchers implemented a number of machine learning
algorithms, including Bayesian networks, bio-inspired
evolutionary algorithms, Markov methods, and support vector
machines. Neural networks shot to fame in 2012 with the
introduction of deep learning techniques built on multi-layer
neural networks.
3. Deep Reinforcement Learning in Machine Learning
The third and final shift in the artificial
intelligence research community has been towards
reinforcement learning. Moving away from feeding
machines labeled data through supervised
learning, the research community has ignited the
world by driving neural networks with rewards,
states, actions, policies, and value functions.
DeepMind’s AlphaGo, introduced in 2015, was
trained with supervised learning from human
expert games and reinforcement learning from
self-play, and it has defeated a world champion
of the ancient game of Go. AlphaGo uses value
networks to evaluate board positions and policy
networks to select each move. A number of
Monte Carlo tree search programs have been
implemented that simulate thousands of random
games of self-play without any historical
datasets. DeepMind developed a search algorithm
that combines Monte Carlo simulation with the
value and policy networks. With it, AlphaGo
achieved a 99.8% winning rate against other Go
programs and defeated the European Go champion
5-0, and it has beaten other human professional
players as well.
4. Deep Reinforcement Learning in Machine Learning
D Silver et al. Nature 529, 484–489 (2016) doi:10.1038/nature16961
DeepMind’s AlphaGo neural network training pipeline and reinforcement learning architecture
5. Deep Reinforcement Learning in Machine Learning
The Future of Reinforcement Learning
MIT Technology Review downloaded 16,625 research papers
that are publicly available in the computer science and
artificial intelligence section of arXiv through November
2018. Using natural language processing on the abstracts,
it tracked words such as constraint, theory, rule, logic,
program, learning, network, data, task, and performance
to trace shifts in the field, including the recent boom
in reinforcement learning. The trends show the rise of
neural networks in the late 1950s and 1960s, symbolic
approaches in the 1970s, knowledge-based and rule-based
systems in the 1980s, Bayesian networks in the 1990s,
support vector machines in the 2000s, and the return of
neural networks in the 2010s with the heavy adoption of
deep neural networks.
6. Deep Reinforcement Learning in Machine Learning
DeepTraffic - Reinforcement Learning
DeepTraffic is a reinforcement learning simulation from MIT in
which self-driving cars navigate a multi-lane freeway using a
model-free, off-policy reinforcement learning process. The
DeepTraffic competition received 24K entries and has inspired
data scientists and machine learning enthusiasts to evaluate
deep Q-learning network (DQN) variants and hyperparameter
configurations. The episodic training amounted to 96.6 years
of RL simulation and 572.2 million crowdsourced and optimized
DQN hyperparameters used to train the agents successfully.
DeepTraffic is implemented completely in JavaScript. Deep
reinforcement learning has also shown a promising future with
physics engines for model-based control in the MuJoCo
environment, and significant advances in the Arcade Learning
Environment and the Atari game environments used by DeepMind.
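The model-free, off-policy idea behind deep Q-learning can be sketched in its simplest tabular form. The example below is an illustration only, not DeepTraffic's code: the corridor environment, the function name q_learning, and all hyperparameter values are assumptions made for this sketch, and a plain table stands in for the neural network that a DQN would use.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Reaching the right-most state ends the episode with reward 1.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Behaviour policy: epsilon-greedy, so it sometimes explores.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Off-policy target: bootstrap from the greedy value of the
            # next state, regardless of what the behaviour policy does next.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            td_error = reward + gamma * best_next - q[state][action]
            q[state][action] += alpha * td_error
            state = next_state
    return q


q_table = q_learning()
# Greedy action for every non-terminal state (1 means "move right").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

The agent never uses the transition rules directly (model-free), and its update target takes the maximum over next actions rather than the action actually taken (off-policy). Hyperparameters such as alpha, gamma, and epsilon are the kind of settings that competition entrants tune.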
7. Deep Reinforcement Learning in Machine Learning
Markov Decision Processes
A number of reinforcement learning algorithms can be
applied in the field of robotics, such as policy optimization,
model-free reinforcement learning, policy gradients with
trust region policy optimization, proximal policy
optimization, bootstrapping, Monte Carlo methods,
actor-critic methods, on-policy learning (SARSA), off-policy
learning (Q-learning), Deep Q-Networks, and dynamic
programming. Most of these methods, including those that
use function approximation, are built on the mathematical
foundation of Markov decision processes, with optimal
state-value and Q-value functions defined over states and
state-action pairs. In Atari games, the state representation
illustrated here also includes past frames. An
infinite-horizon discounted Markov decision process is
specified by the tuple (S, A, P, R, γ, d0), where
S is the finite state space
A is the finite action space
P: S×A→∆(S) is the transition function
R: S×A→∆([0,Rmax]) is the reward function
γ∈[0,1) is the discount factor
d0∈∆(S) is the initial state distribution
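To make the tuple concrete, here is a minimal sketch that encodes a tiny MDP and computes its optimal Q-values with value iteration, one of the dynamic programming methods mentioned above. The two-state MDP, its numbers, and the names STATES, ACTIONS, P, R, GAMMA, and D0 are hypothetical and chosen only for illustration. Rewards are assumed deterministic here, which is a special case of the reward distribution in the definition.

```python
# Hypothetical two-state MDP (S, A, P, R, gamma, d0) for illustration only.
STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]
GAMMA = 0.9  # discount factor in [0, 1)

# P[s][a] is a distribution over next states.
P = {
    "s0": {"stay": {"s0": 1.0}, "move": {"s0": 0.2, "s1": 0.8}},
    "s1": {"stay": {"s1": 1.0}, "move": {"s0": 1.0}},
}
# R[s][a] is the reward, assumed deterministic and in [0, Rmax], Rmax = 1.
R = {
    "s0": {"stay": 0.0, "move": 0.0},
    "s1": {"stay": 1.0, "move": 0.0},
}
# d0 is the initial state distribution.
D0 = {"s0": 1.0, "s1": 0.0}


def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality backup until Q stops changing."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    while True:
        delta = 0.0
        new_q = {}
        for s in STATES:
            new_q[s] = {}
            for a in ACTIONS:
                # Expected optimal value of the next state under P(.|s, a).
                expected_next = sum(
                    prob * max(q[s2].values())
                    for s2, prob in P[s][a].items()
                )
                new_q[s][a] = R[s][a] + GAMMA * expected_next
                delta = max(delta, abs(new_q[s][a] - q[s][a]))
        q = new_q
        if delta < tol:
            return q


q_star = value_iteration()
v_star = {s: max(q_star[s].values()) for s in STATES}
policy = {s: max(ACTIONS, key=lambda a: q_star[s][a]) for s in STATES}
# Expected discounted return of the optimal policy when starting from d0.
expected_return = sum(D0[s] * v_star[s] for s in STATES)
print(policy, v_star, expected_return)
```

In this toy MDP, staying in s1 earns reward 1 at every step, so its optimal value approaches 1 / (1 - γ) = 10. The optimal policy therefore moves from s0 towards s1 and then stays. Value iteration needs P and R to be known, whereas model-free methods such as Q-learning estimate the same Q-values from sampled transitions.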
8. Deep Reinforcement Learning in Machine Learning
Atari Game Zoo
Deep reinforcement learning agents have not only made
significant progress in the field of robotics, but in many
instances have surpassed human performance on benchmarks
such as Atari 2600 games and Dota 2. Uber has also applied
reinforcement learning algorithms to improve Uber Eats
recommendations and self-driving cars. Uber built the Atari
Zoo on the Arcade Learning Environment (ALE), which
emulates Atari 2600 console games such as Seaquest,
Montezuma’s Revenge, or Pitfall. The objective of the Atari
Zoo, however, is not to compare high-scoring solutions and
hyperparameter configurations across algorithms, but to
understand what the trained agents have learned. For
example, the evolutionary algorithms from OpenAI Gym have
shown different kinds of learned representations than the
gradient-based methods.
9. Deep Reinforcement Learning in Machine Learning
The machine intelligence of algorithms is now distributed in cloud-computing environments and will help
organizations discover valuable insights and perform a range of operations through APIs. Organizations
are mass-producing algorithms because a distributed environment offers economies of scale. After the AI
winter (which lasted from the 1990s through the 2010s), artificial intelligence is heating up again, with machine
intelligence platforms that use machine learning to prototype rapidly in sandboxes and deploy to production.
Figure: Wang, H., & Raj, B. (2017). On the Origin of Deep Learning
10. Deep Reinforcement Learning in Machine Learning
Intel optimized deep learning and machine learning frameworks
Figure: Intel Deep Learning and Machine Learning Frameworks (Alberto,
2016).
11. Deep Reinforcement Learning in Machine Learning
Figure: Nvidia deep learning frameworks with DGX (Nvidia, 2016).
13. Deep Reinforcement Learning in Machine Learning
Figure: Deep Water: Open source deep learning framework (H2O.AI, 2017).
14. Deep Reinforcement Learning in Machine Learning
MXNet Deep Learning Framework
Figure: MXNet for deep learning (DMLC, 2017).
15. Deep Reinforcement Learning in Machine Learning
References
Fridman, L. (2019). Tutorials, assignments, and competitions for MIT Deep Learning related
courses. Retrieved from https://github.com/lexfridman/mit-deep-learning
Fridman, L., Terwilliger, J., & Jenik, B. (2018). DeepTraffic: Crowdsourced Hyperparameter Tuning
of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation. Retrieved from
https://arxiv.org/abs/1801.02805
Hao, K. (2019, January 25). We analyzed 16,625 papers to figure out where AI is headed next. MIT
Technology Review. Retrieved from
https://www.technologyreview.com/s/612768/we-analyzed-16625-papers-to-figure-out-where-ai-is-headed-next/
Jiang, N. (2019). On Value Functions and the Agent-Environment Boundary. Retrieved from
https://arxiv.org/pdf/1905.13341.pdf
Petroski, F., Madhavan, V., Liu, R., Wang, R., Li, Y., Clune, J., & Lehman, J. (2019). Creating a Zoo
of Atari-Playing Agents to Catalyze the Understanding of Deep Reinforcement Learning. Retrieved
from https://eng.uber.com/atari-zoo-deep-reinforcement-learning/
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G. V., ... Grewe, D. (2016,
January 28). Mastering the game of Go with deep neural networks and tree search. Nature, 529,
484-489. Retrieved from https://www.nature.com/articles/nature16961