2. OpenAI - Introduction
● OpenAI is a non-profit artificial
intelligence (AI) research company.
● It aims to promote and develop
friendly AI in a way that benefits
humanity as a whole.
● It aims to "freely collaborate" by
making its patents and research open
to the public.
3. Latest in News
● OpenAI Five played Dota 2 against human players
● The event was streamed online
● OpenAI Five is a set of five neural networks
5. Reinforcement Learning
● One of the most important types of machine learning.
● An agent learns how to behave in an environment by performing actions and
seeing the results.
6. Reinforcement Learning
There are two basic concepts in reinforcement learning:
1. Environment (namely, the outside world) and
2. Agent (namely, the algorithm you are writing).
The agent sends actions to the environment, and the environment replies with
observations and rewards (that is, a score).
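The exchange between agent and environment can be sketched in a few lines of Python. This is only an illustration: the coin-guessing environment, the random agent, and all class and function names are invented for this example and do not come from any library.

```python
import random


class CoinFlipEnvironment:
    """Hypothetical toy environment: the agent guesses the outcome of a coin flip."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, action):
        """Receive an action; reply with an observation and a reward (a score)."""
        coin = self.rng.choice(["heads", "tails"])
        reward = 1 if action == coin else 0
        return coin, reward


class RandomAgent:
    """Hypothetical agent: guesses at random and ignores what it observes."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice(["heads", "tails"])


def interact(agent, environment, steps):
    """Agent sends actions; environment replies with observations and rewards."""
    observation, total_reward = None, 0
    for _ in range(steps):
        action = agent.act(observation)
        observation, reward = environment.step(action)
        total_reward += reward
    return total_reward


score = interact(RandomAgent(), CoinFlipEnvironment(), steps=100)
print("score after 100 steps:", score)
```

A learning agent would use the observations and rewards to change how `act` chooses; the random agent here only shows the direction of the data flow.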
7. Example: Reinforcement Learning
Imagine you’re a child in a living room.
Action 1: You see a fireplace, and you approach it. It’s warm
(positive reward +1).
Action 2: But then you try to touch the fire, and it burns your hand
(negative reward -1).
Learning 1: Fire is positive when you are a sufficient distance
away, because it produces warmth.
Learning 2: But get too close to it and you will be burned.
Image Source : https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
That’s how humans learn, through interaction.
Reinforcement Learning is just a computational approach to learning from action.
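As a rough sketch, the fireplace example can be written as a table of rewards that the learner averages per action. The +1 and -1 rewards come from the example; the action names and the averaging rule are assumptions chosen to keep the illustration small.

```python
# Rewards from the fireplace example; the action names are made up for illustration.
REWARDS = {"approach_fire": +1, "touch_fire": -1}


def learn_from_interaction(experiences):
    """Average the rewards observed for each action tried so far."""
    totals, counts = {}, {}
    for action, reward in experiences:
        totals[action] = totals.get(action, 0) + reward
        counts[action] = counts.get(action, 0) + 1
    return {action: totals[action] / counts[action] for action in totals}


# The child tries each action once and observes the reward.
experiences = [(action, REWARDS[action]) for action in ("approach_fire", "touch_fire")]
values = learn_from_interaction(experiences)
best_action = max(values, key=values.get)

print(values)       # {'approach_fire': 1.0, 'touch_fire': -1.0}
print(best_action)  # approach_fire
```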
8. Reinforcement Learning
Actions influence the state, which determines reward.
Image Source : https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
9. Reinforcement Learning Process (Super Mario)
Let’s imagine an agent learning to play Super Mario.
The Reinforcement Learning (RL) process can be modeled as a loop that works
like this:
● Our agent receives state S0 from the environment (in our case, the first
frame of the game (state) from Super Mario (environment)).
● Based on state S0, the agent takes an action A0 (our agent moves right).
● The environment transitions to a new state S1 (a new frame).
● The environment gives some reward R1 to the agent (not dead: +1).
This RL loop outputs a sequence of states, actions and rewards.
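The loop above can be sketched as follows. The environment is a stand-in, not a real Super Mario emulator: states are plain frame numbers, Mario cannot die, and every name (`FakeMarioEnvironment`, `run_episode`, the `"right"` action) is hypothetical.

```python
class FakeMarioEnvironment:
    """Stand-in for the game (hypothetical): states are frame numbers, not real frames."""

    def __init__(self, level_length=5):
        self.level_length = level_length
        self.frame = 0

    def reset(self):
        self.frame = 0
        return self.frame  # S0: the first frame

    def step(self, action):
        if action == "right":
            self.frame += 1
        reward = 1  # "not dead: +1"; this sketch has no way to die
        done = self.frame >= self.level_length
        return self.frame, reward, done


def run_episode(env, policy):
    """Run the RL loop once and return the sequence of (state, action, reward)."""
    trajectory = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state)                       # agent picks an action for this state
        next_state, reward, done = env.step(action)  # environment returns new state and reward
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


trajectory = run_episode(FakeMarioEnvironment(level_length=3), policy=lambda state: "right")
print(trajectory)  # [(0, 'right', 1), (1, 'right', 1), (2, 'right', 1)]
```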
10. The Reinforcement Learning process
● The goal of the agent is to maximize the expected cumulative reward.
● By running more and more loops, the agent will learn to play better and
better.
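A small sketch of what "expected cumulative reward" means in code: sum the rewards of each episode, then average over episodes. The optional `discount` parameter is an assumption added here (it is common in RL but not mentioned in these slides); with the default of 1.0 it is a plain sum.

```python
def cumulative_reward(rewards, discount=1.0):
    """Sum the rewards of one episode; discount < 1 weights later rewards less."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (discount ** step) * reward
    return total


def expected_cumulative_reward(episodes, discount=1.0):
    """Estimate the expectation as the average return over several episodes."""
    returns = [cumulative_reward(rewards, discount) for rewards in episodes]
    return sum(returns) / len(returns)


episode_rewards = [1, 1, 1, -1]
print(cumulative_reward(episode_rewards))                      # 2.0
print(cumulative_reward(episode_rewards, discount=0.5))        # 1.625
print(expected_cumulative_reward([episode_rewards, [1, -1]]))  # 1.0
```

An agent that "plays better" is one whose episodes push this average up over time.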
11. OpenAI Gym
● Gym is a toolkit for developing
and comparing
reinforcement learning algorithms.
● It supports teaching agents everything
from walking to playing games like Pong
or Pinball.
● Gym Envs:
https://gym.openai.com/envs/#mujoco
Ant-v2: Make a 3D four-legged robot walk.
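A hedged sketch of how a Gym environment is typically driven. The helper names `random_rollout` and `cartpole_demo` are hypothetical, and `CartPole-v1` (a pole-balancing task) is chosen here instead of Ant-v2 because it needs no physics engine. The tuple returned by `env.step` differs between Gym versions, so the sketch accepts both the four-value and the five-value form.

```python
def random_rollout(env, max_steps=200):
    """Play one episode with random actions and return the total reward."""
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()  # random action; a real agent would decide here
        result = env.step(action)
        if len(result) == 5:
            # Newer API: observation, reward, terminated, truncated, info
            _, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:
            # Older API: observation, reward, done, info
            _, reward, done, _ = result
        total_reward += reward
        if done:
            break
    return total_reward


def cartpole_demo():
    """Requires the gym package to be installed; not called automatically."""
    import gym

    env = gym.make("CartPole-v1")
    try:
        return random_rollout(env)
    finally:
        env.close()
```

With Gym installed, `print(cartpole_demo())` runs one random episode; a learning algorithm would replace the `sample()` call with its own choice of action.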
12. OpenAI Universe
● Platform for measuring and training an AGI across games, websites and other
applications.
● Makes it possible for any existing program to become an OpenAI Gym
environment, without needing special access to the program's internals,
source code, or APIs.
● It does this by packaging the program into a Docker container, and presenting
the AI with the same interface a human uses: sending keyboard and mouse
events, and receiving screen pixels.
● Contains over 1,000 environments in which an AI agent can take actions and
gather observations.
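A sketch of the keyboard-event interface, modeled on the example in the Universe project README as best recalled; treat the environment ID `flashgames.DuskDrive-v0` and the `configure(remotes=1)` call as assumptions to check against that README. The helper names `hold_key` and `drive` are hypothetical. Universe steps several environment instances at once, so actions and observations are lists with one entry per instance.

```python
def hold_key(observation_n, key="ArrowUp"):
    """Build one 'key pressed' keyboard event per environment instance."""
    return [[("KeyEvent", key, True)] for _ in observation_n]


def drive(steps=100):
    """Needs the universe package and a running Docker daemon; not called automatically."""
    import gym
    import universe  # noqa: F401  (importing registers the Universe environments)

    env = gym.make("flashgames.DuskDrive-v0")
    env.configure(remotes=1)  # starts one local Docker container
    observation_n = env.reset()
    for _ in range(steps):
        action_n = hold_key(observation_n)  # always press the up arrow
        observation_n, reward_n, done_n, info = env.step(action_n)
        env.render()


print(hold_key([None, None]))
# [[('KeyEvent', 'ArrowUp', True)], [('KeyEvent', 'ArrowUp', True)]]
```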
13. Command: Create a Conda Environment
● Conda is an open source package management system and environment
management system that runs on Windows, macOS and Linux.
● Conda quickly installs, runs and updates packages and their dependencies.
● Conda easily creates, saves, loads and switches between environments on
your local computer.
Command:
conda create --name universe-starter-agent python=3.5
source activate universe-starter-agent
15. Gym vs Universe
● OpenAI Universe is like a much bigger OpenAI Gym.
● OpenAI Gym has some basic tasks, like pole balancing and swinging a pendulum
upright, and some more difficult ones, like Atari games such as Space
Invaders.
● Gym is like an enclosed world, or a “gym”, in which to exercise and develop RL algorithms.
● OpenAI Universe has a much wider variety of tasks, and focuses on
giving RL networks/algorithms the ability to interact with real-world software:
playing games, using a (virtual) keyboard and mouse to interact with
buttons and sliders on webpages, etc.
● Universe is based on Gym.
16. Use Cases
Environments for doing various tasks, like
● Sending an email
● Performing mouse clicks and keyboard events
More and more environments are being added.