Project Malmo was initiated by Microsoft Research as a platform that uses Minecraft as a testing framework for AI. The goal is not to play the game but to use Minecraft as an experimental AI platform.
In my talk I’ll look at reinforcement learning and at how to solve individual problems with (deep) reinforcement learning, Minecraft, and Project Malmo.
Participants will learn the basics of reinforcement learning. They will also get an overview of Project Malmo and see concrete examples of reinforcement learning.
https://www.mcubed.london/sessions/minecraft-reinforcement-learning/
4. Minecraft
Markus "Notch" Persson
Mojang AB
Best-selling PC game of all time
Exploration
Resource gathering
Crafting
Combat
Sandbox construction game
Creative + building aspects
Three-dimensional environment
6. Project Malmo
Open Source (GitHub)
Microsoft Research Lab
Based on
Minecraft / Minecraft Forge
Agents written in
Python, Lua, C++, C#,
Java, Torch, ALE*
Mission XML
WorldState
Send Command
*Arcade Learning Environment
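The three pieces named on this slide (Mission XML, WorldState, Send Command) suggest a simple agent loop. Below is a minimal Python sketch, assuming a mission has already been started from its Mission XML and that `agent_host` behaves like Malmo's `AgentHost` (it offers `getWorldState()` and `sendCommand()`). The helper `run_episode`, the `choose_action` callback, and the polling interval are hypothetical names introduced only for illustration.

```python
import time


def run_episode(agent_host, choose_action, poll_seconds=0.1):
    """Drive one running mission: read WorldState, pick an action, send a command.

    Assumption: `agent_host` mimics Malmo's AgentHost and the mission has
    already been started. `choose_action` maps the latest observation text
    to a command string.
    """
    total_reward = 0.0
    world_state = agent_host.getWorldState()
    while world_state.is_mission_running:
        # Rewards arrive as objects that expose getValue().
        total_reward += sum(r.getValue() for r in world_state.rewards)
        if len(world_state.observations) > 0:
            # The newest observation is a text (JSON) description of the world.
            command = choose_action(world_state.observations[-1].text)
            agent_host.sendCommand(command)
        time.sleep(poll_seconds)
        world_state = agent_host.getWorldState()
    # Collect rewards delivered together with the final state.
    total_reward += sum(r.getValue() for r in world_state.rewards)
    return total_reward
```

If the mission enables discrete movement, `choose_action` might return a command string such as "movenorth 1".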
7. “The Project Malmo platform is designed to support a wide range of experimentation needs and can support research in robotics, computer vision, reinforcement learning, planning, multi-agent systems, and related areas”
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence
Project Malmo
12. Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto
(1998)
Reinforcement Learning
Cliff Walking Example
Reward:
-1 per move
+100 blue field
-100 lava field
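The reward scheme above is enough to set up a small tabular Q-learning experiment. The following is a sketch under stated assumptions, not the code from the talk: the grid layout, the names `GRID`, `step`, `train`, and `greedy_path`, and the values chosen for the learning rate, discount, and exploration rate are all illustrative. Moves into the border are assumed to leave the agent in place.

```python
import random

# Hypothetical layout (not the map from the talk): S start, G goal, L lava.
GRID = ["...G",
        ".L..",
        "S.L."]
START = (2, 0)
# Moves as (row, column) offsets, in the order left, down, right, up.
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    # Assumption: moving into the border leaves the agent where it is.
    row = min(max(state[0] + d_row, 0), len(GRID) - 1)
    col = min(max(state[1] + d_col, 0), len(GRID[0]) - 1)
    cell = GRID[row][col]
    reward = -1  # every move costs 1
    if cell == "G":
        reward += 100
    elif cell == "L":
        reward -= 100
    return (row, col), reward, cell in ("G", "L")


def train(episodes=1000, alpha=0.5, gamma=0.8, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(len(GRID)) for c in range(len(GRID[0]))}
    for _ in range(episodes):
        state, done, moves = START, False, 0
        while not done and moves < 200:  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state, moves = next_state, moves + 1
    return q


def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from START until the episode ends."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _reward, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


print(greedy_path(train()))
```

Each cell of the table `q` holds four values, one per action, which is the same shape as a Q table with one bracketed group per field.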
35. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
36. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 78] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
37. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 -2] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-2 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
38. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 48] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
39. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 -2] [-3 -3 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
40. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
41. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
42. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 -3] [-3 L -3 -3] [-3 L L -3]
43. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
44. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
45. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
ALPHA = 1.0 GAMMA = 0.8
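The positive entries in these tables (99, 78, 61, 48, ...) can be recomputed from the standard Q-learning update with ALPHA = 1.0 and GAMMA = 0.8. The sketch below makes two assumptions: the move onto the goal earns -1 + 100 = 99, and the slides round values down for display. The function names are illustrative. With ALPHA = 1.0 the previous estimate is overwritten completely, so the starting value of each entry does not matter.

```python
import math

ALPHA, GAMMA = 1.0, 0.8          # values shown on the slide
STEP_REWARD, GOAL_REWARD = -1, 100


def q_update(q_old, reward, best_next, alpha=ALPHA, gamma=GAMMA):
    """Q-learning update: Q <- Q + alpha * (reward + gamma * max Q' - Q)."""
    return q_old + alpha * (reward + gamma * best_next - q_old)


def values_toward_goal(moves):
    """Q-value of the best action when 1, 2, ..., `moves` moves from the goal."""
    # The final move earns the step cost plus the goal reward; nothing follows it.
    values = [q_update(0.0, STEP_REWARD + GOAL_REWARD, 0.0)]
    while len(values) < moves:
        # Every earlier move earns the step cost plus the discounted next value.
        values.append(q_update(0.0, STEP_REWARD, values[-1]))
    return values


# Assumption: the slides round values down for display.
print([math.floor(v) for v in values_toward_goal(10)])
# [99, 78, 61, 48, 37, 29, 22, 16, 12, 8]
```

Each frame of the animation adds one more of these values, one field further from the goal.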
46. [99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 -1] [-1 -1 -1 92] [-1 L L -1]
[ L L -2 -1] [-2 -2 L 83]
[ L -3 -3 74] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 55] [-4 -4 L -4]
[ L L -1 L] [-6 L 11 -5] [-5 L -5 31] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (40 moves)
47. [99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 47] [-2 -1 -1 95] [-1 L L -1]
[ L L -2 -1] [-2 45 L 94]
[ L -3 -3 93] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 92] [-4 -4 L -4]
[ L L 88 L] [-6 L 90 -5] [-5 L -5 91] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (60 moves)
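With ALPHA = 0.5 each update moves an estimate only halfway toward its target, and with GAMMA = 1.0 future rewards are not discounted. The sketch below illustrates both effects. It does not try to reproduce the exact numbers in the tables, and it holds the target fixed, which is a simplification: in real Q-learning the target also moves as neighbouring values change. The function names are illustrative.

```python
def repeated_updates(target, alpha, visits, q_start=0.0):
    """Apply Q <- Q + alpha * (target - Q) `visits` times and record each value."""
    q, history = q_start, []
    for _ in range(visits):
        q += alpha * (target - q)
        history.append(q)
    return history


# Target for the move onto the goal: -1 step cost + 100 goal reward = 99.
print(repeated_updates(99, alpha=0.5, visits=4))  # [49.5, 74.25, 86.625, 92.8125]
print(repeated_updates(99, alpha=1.0, visits=1))  # [99.0]


def converged_value(moves_from_goal, gamma=1.0):
    """Fixed point of the update for the best action, `moves_from_goal` >= 1."""
    value = 99.0
    for _ in range(moves_from_goal - 1):
        value = -1 + gamma * value
    return value


print([converged_value(n) for n in range(1, 5)])  # [99.0, 98.0, 97.0, 96.0]
```

Without discounting, the converged values drop by exactly 1 per move away from the goal, which is roughly the pattern the 60-move table is approaching.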
60. Links
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence http://www.ijcai.org/Proceedings/2016
Project Malmo https://www.microsoft.com/en-us/research/project/project-malmo/
Project Malmo (GitHub) https://github.com/Microsoft/malmo
Reinforcement Learning: An Introduction - ISBN-13: 978-0262193986
2nd edition available online
YouTube RL Course by David Silver