#MinecraftRL @choas
Minecraft and Reinforcement Learning
Lars Gregori
@choas
labs.hybris.com
?
?
?
?
Minecraft
Markus "Notch" Persson
Mojang AB
Best-selling PC game of all time
Exploration
Resource gathering
Crafting
Combat
Sandbox construction game
Creative + building aspects
Three-dimensional environment
Project Malmo
Open Source (GitHub)
Microsoft Research Lab
Based on Minecraft / Minecraft Forge
Agents written in Python, Lua, C++, C#, Java, Torch, ALE*
Mission XML
WorldState
Send Command
*Arcade Learning Environment
“The Project Malmo platform is designed to support a wide range of experimentation needs and can support research in robotics, computer vision, reinforcement learning, planning, multi-agent systems, and related areas”
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence
Project Malmo
Reinforcement Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Reinforcement Learning
Observation Reward Action
Environment
Agent
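The observation / reward / action cycle can be sketched as a plain loop. This is a minimal sketch and not Malmo code: `CorridorEnv`, `run_episode`, and the always-move-right agent are hypothetical names made up for illustration, and the reward values are assumed to mirror the cliff walking example (-1 per move, 100 for the goal).

```python
class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells 0..goal."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (observation, reward, done)."""
        self.position = max(0, min(self.goal, self.position + action))
        done = self.position == self.goal
        reward = -1          # assumed cost of every move
        if done:
            reward += 100    # assumed bonus for reaching the goal cell
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=20):
    """Run the agent-environment loop and return the total reward collected."""
    observation, total = env.position, 0
    for _ in range(max_steps):
        action = choose_action(observation)           # agent acts on the observation
        observation, reward, done = env.step(action)  # environment responds
        total += reward
        if done:
            break
    return total


total_reward = run_episode(CorridorEnv(), lambda observation: +1)
print(total_reward)
```

The agent here is a fixed rule; a learning agent would use the reward to change how `choose_action` behaves over time.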
“Reinforcement learning is like trial-and-error learning.”
David Silver
Reinforcement Learning
Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto (1998)
Reinforcement Learning
Cliff Walking Example
Reward:
-1 per move
100 blue field
-100 lava field
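The reward scheme above can be written as a small function. This is a sketch under assumptions: the field labels `"blue"` and `"lava"` and the name `step_reward` are illustrative, and it assumes the -1 move cost is added to the field reward (which matches the 99.0 used later in the Q-Learning walkthrough).

```python
MOVE_COST = -1
FIELD_REWARD = {"blue": 100, "lava": -100}  # illustrative labels for the special fields


def step_reward(field):
    """Reward for a single move that ends on the given field type."""
    return MOVE_COST + FIELD_REWARD.get(field, 0)


print(step_reward("blue"), step_reward("lava"), step_reward("sandstone"))
```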
Reinforcement Learning Demo
Q-Learning
ALPHA = 1.0 ### step-size parameter
GAMMA = 0.8 ### discount-rate parameter
old_q = q_table[prev_state][prev_action]
max_q = max(q_table[current_state][:])
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
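The update rule can be wrapped in a function that also writes the new value back into the table. A minimal sketch: `q_update` and the three-state corridor (`"a"`, `"b"`, `"goal"`) are hypothetical and only serve to reproduce the two values the walkthrough derives by hand, 99.0 and 78.2.

```python
ALPHA = 1.0  # step-size parameter
GAMMA = 0.8  # discount-rate parameter


def q_update(q_table, prev_state, prev_action, reward, current_state):
    """Apply one Q-learning update in place and return the new value."""
    old_q = q_table[prev_state][prev_action]
    max_q = max(q_table[current_state])
    new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
    q_table[prev_state][prev_action] = new_q
    return new_q


# Hypothetical corridor "a" -> "b" -> "goal" with a single action (index 0) per state.
q_table = {"a": [-1.0], "b": [0.0], "goal": [0.0]}
first = q_update(q_table, "b", 0, 99.0, "goal")  # the move that reaches the goal
second = q_update(q_table, "a", 0, -1.0, "b")    # an earlier move that leads to "b"
print(first, second)
```

With ALPHA = 1.0 the old value cancels out, so each update simply stores `reward + GAMMA * max_q`.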
Q-Learning
ALPHA = 1.0 ### step-size parameter
GAMMA = 0.8 ### discount-rate parameter
old_q = 0.0
max_q = max(q_table[current_state][:])
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
Q-Learning
ALPHA = 1.0 ### step-size parameter
GAMMA = 0.8 ### discount-rate parameter
old_q = 0.0
max_q = 0.0
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
Q-Learning
100 

-1
ALPHA = 1.0 ### step-size parameter
GAMMA = 0.8 ### discount-rate parameter
old_q = 0.0
max_q = 0.0
new_q = old_q + ALPHA * (99.0 + GAMMA * max_q - old_q)
Q-Learning
100 

-1
ALPHA = 1.0 ### step-size parameter
GAMMA = 0.8 ### discount-rate parameter
old_q = 0.0
max_q = 0.0
new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
Q-Learning
100 

-1
99.0
ALPHA = 1.0 ### step-size parameter
GAMMA = 0.8 ### discount-rate parameter
old_q = 0.0
max_q = 0.0
new_q = 99.0
Q-Learning
100 

-1
99.0
Q-Learning
100 

-1
99.0
ALPHA = 1.0
GAMMA = 0.8
old_q = q_table[prev_state][prev_action]
max_q = max(q_table[current_state][:])
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
Q-Learning
100 

-1
99.0
ALPHA = 1.0
GAMMA = 0.8
old_q = -1.0
max_q = max(q_table[current_state][:])
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
Q-Learning
100 

-1
99.0
ALPHA = 1.0
GAMMA = 0.8
old_q = -1.0
max_q = 99.0
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
Q-Learning
100 

-1
99.0
ALPHA = 1.0
GAMMA = 0.8
old_q = -1.0
max_q = 99.0
new_q = old_q + ALPHA * (-1.0 + GAMMA * max_q - old_q)
Q-Learning
100 

-1
99.0
ALPHA = 1.0
GAMMA = 0.8
old_q = -1.0
max_q = 99.0
new_q = old_q + ALPHA * (-1.0 + 0.8 * 99.0 - old_q)
Q-Learning
100 

-1
99.0
ALPHA = 1.0
GAMMA = 0.8
old_q = -1.0
max_q = 99.0
new_q = old_q + ALPHA * (-1.0 + 79.2 - old_q)
Q-Learning
100 

-1
99.0
ALPHA = 1.0
GAMMA = 0.8
old_q = -1.0
max_q = 99.0
new_q = -1.0 + 1.0 * (-1.0 + 79.2 - -1.0)
Q-Learning
100 

-1
99.0
ALPHA = 1.0
GAMMA = 0.8
old_q = -1.0
max_q = 99.0
new_q = -1.0 + 1.0 * (-1.0 + 79.2 + 1.0)
Q-Learning
100 

-1
99.0
78.2
ALPHA = 1.0
GAMMA = 0.8
old_q = -1.0
max_q = 99.0
new_q = 78.2
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 78] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 -2] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-2 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 48] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 -2] [-3 -3 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
ALPHA = 1.0 GAMMA = 0.8
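The positive entries in this table (99, 78, 61, 48, ...) follow a simple pattern: with ALPHA = 1.0, each value is the move reward plus GAMMA times the value one step closer to the goal. The sketch below reproduces that chain; `path_values` is a hypothetical helper, and truncating to whole numbers is an assumption about how the slides display the values.

```python
GAMMA = 0.8
GOAL_STEP_VALUE = 99.0  # 100 for the blue field, -1 for the move
MOVE_REWARD = -1.0


def path_values(steps):
    """Q-values along a path to the goal, listed from the goal backwards (ALPHA = 1.0)."""
    values = [GOAL_STEP_VALUE]
    for _ in range(steps - 1):
        values.append(MOVE_REWARD + GAMMA * values[-1])
    return values


print([int(v) for v in path_values(10)])
```

The printed list is 99, 78, 61, 48, 37, 29, 22, 16, 12, 8, the same sequence that appears along the learned path in the table.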
[99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 -1] [-1 -1 -1 92] [-1 L L -1]
[ L L -2 -1] [-2 -2 L 83]
[ L -3 -3 74] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 55] [-4 -4 L -4]
[ L L -1 L] [-6 L 11 -5] [-5 L -5 31] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (40 moves)
[99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 47] [-2 -1 -1 95] [-1 L L -1]
[ L L -2 -1] [-2 45 L 94]
[ L -3 -3 93] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 92] [-4 -4 L -4]
[ L L 88 L] [-6 L 90 -5] [-5 L -5 91] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (60 moves)
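One way to read the difference between the 40-move and 60-move tables: with ALPHA = 0.5 an update only moves a value halfway toward its target, so entries need several visits to settle. The sketch below illustrates this for a single entry with a fixed target; `repeated_updates` is a hypothetical helper, and a fixed target of 99.0 is a simplifying assumption (in the real run the targets themselves change as neighbouring entries are learned).

```python
def repeated_updates(target, alpha, visits, start=0.0):
    """Q-value after each of `visits` updates toward a fixed target."""
    q, history = start, []
    for _ in range(visits):
        q = q + alpha * (target - q)
        history.append(q)
    return history


half_step = repeated_updates(target=99.0, alpha=0.5, visits=4)
full_step = repeated_updates(target=99.0, alpha=1.0, visits=1)
print(half_step, full_step)
```

With ALPHA = 1.0 the entry reaches the target in one update; with ALPHA = 0.5 the remaining gap halves on every visit.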
!
!
!
!
?
Deep Reinforcement Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Playing Atari with Deep Reinforcement Learning (arXiv:1312.5602)
https://youtu.be/TmPfTpjtdgg
12 Classes
### based on arXiv:1312.5602 (page 6)
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Flatten

# Assumption: 84 x 84 frames, 4 stacked, as in the paper; adjust to your own input.
input_shape = (84, 84, 4)

model = Sequential()
model.add(Conv2D(16, (8, 8), strides=(4, 4), input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (4, 4), strides=(2, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(12, activation='sigmoid'))  # 12 classes / actions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Keras Model
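To see what the layers do to the image, the output sizes can be worked out by hand. This sketch assumes an 84 x 84 input (the frame size used in arXiv:1312.5602; the slides do not state `input_shape`) and Keras' default `padding='valid'`; `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(size, kernel, stride):
    """Output width/height of a convolution without padding ('valid')."""
    return (size - kernel) // stride + 1


# Assumption: 84 x 84 input frames, as in arXiv:1312.5602.
side = conv_output_size(84, kernel=8, stride=4)    # first Conv2D, 16 filters
conv1_shape = (side, side, 16)
side = conv_output_size(side, kernel=4, stride=2)  # second Conv2D, 32 filters
conv2_shape = (side, side, 32)
flat_units = conv2_shape[0] * conv2_shape[1] * conv2_shape[2]  # input to Dense(256)

print(conv1_shape, conv2_shape, flat_units)
```

Under these assumptions the two convolutions shrink the frame to 20 x 20 and then 9 x 9, so `Flatten()` hands 2592 values to the dense layers.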
Deep Reinforcement Learning Demo
Take-away
Links
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence. http://www.ijcai.org/Proceedings/2016
Project Malmo: https://www.microsoft.com/en-us/research/project/project-malmo/
Project Malmo (GitHub): https://github.com/Microsoft/malmo
Reinforcement Learning: An Introduction. ISBN-13: 978-0262193986 (2nd edition available online)
YouTube: RL Course by David Silver
Thank you. Hi Lars …
© 2018 SAP SE or an SAP affiliate company. All rights reserved.

Minecraft and Reinforcement Learning

  • 1. #MinecraftRL @choas Minecraft and Reinforcement Learning Lars Gregori @choas labs.hybris.com
  • 4. #MinecraftRL @choas Minecraft Markus "Notch" Persson Mojang AB Best-selling PC game of all time Exploration Resource gathering Crafting Combat Sandbox construction game Creative + building aspects Three-dimensional environment
  • 6. #MinecraftRL @choas Project Malmo Open Source (Github) Microsoft Research Lab Based on Minecraft / Minecraft Forge Agents written in Python, Lua, C++, C#, Java, Torch, ALE* Mission XML WorldState Send Command *Arcade Learning Environment
  • 7. #MinecraftRL @choas “The Project Malmo platform is designed to support a wide range of experimentation needs and can support research in robotics, computer vision, reinforcement learning, planning, multi-agent systems, and related areas”The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence Project Malmo
  • 11. #MinecraftRL @choas “Reinforcement learning is like trial-and-error learning.”David Silver Reinforcement Learning
  • 12. #MinecraftRL @choas Reinforcement Learning: An Introduction Richard S. Sutton and Andrew G. Barto
 (1998) Reinforcement Learning Cliff Walking Example Reward: -1 per move 100 blue field -100 lava field
  • 16. #MinecraftRL @choas Q-Learning
 ALPHA = 1.0  # step-size parameter
 GAMMA = 0.8  # discount-rate parameter
 old_q = q_table[prev_state][prev_action]
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 18. #MinecraftRL @choas Q-Learning
 ALPHA = 1.0  # step-size parameter
 GAMMA = 0.8  # discount-rate parameter
 old_q = q_table[prev_state][prev_action]
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 19. #MinecraftRL @choas Q-Learning
 ALPHA = 1.0  # step-size parameter
 GAMMA = 0.8  # discount-rate parameter
 old_q = 0.0
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 20. #MinecraftRL @choas Q-Learning
 ALPHA = 1.0  # step-size parameter
 GAMMA = 0.8  # discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
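The update on these slides corresponds to the standard tabular Q-learning rule. As a sketch of the mapping (the symbols are an assumption about notation, not taken from the slides): s and a stand for `prev_state` and `prev_action`, s' for `current_state`, r for `reward`, α for `ALPHA` and γ for `GAMMA`.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

Here `old_q` is Q(s, a) before the update and `max_q` is the maximum over a' of Q(s', a').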
  • 21. #MinecraftRL @choas Q-Learning (grid: 100, -1)
 ALPHA = 1.0  # step-size parameter
 GAMMA = 0.8  # discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = old_q + ALPHA * (99.0 + GAMMA * max_q - old_q)
  • 22. #MinecraftRL @choas Q-Learning (grid: 100, -1)
 ALPHA = 1.0  # step-size parameter
 GAMMA = 0.8  # discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
  • 23. #MinecraftRL @choas Q-Learning (grid: 100, -1)
 ALPHA = 1.0  # step-size parameter
 GAMMA = 0.8  # discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
  • 24. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0)
 ALPHA = 1.0  # step-size parameter
 GAMMA = 0.8  # discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = 99.0
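The substitution steps above can be reproduced with a small function. This is a minimal sketch that mirrors the slide's formula; the function name `q_update` and its keyword arguments are illustrative, not part of the talk's code.

```python
def q_update(old_q, reward, max_q, alpha=1.0, gamma=0.8):
    """One tabular Q-learning update, written like the formula on the slides."""
    return old_q + alpha * (reward + gamma * max_q - old_q)


# First worked example: the move that reaches the goal (reward 99.0, empty table).
first = q_update(old_q=0.0, reward=99.0, max_q=0.0)
print(first)
```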
  • 26. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = q_table[prev_state][prev_action]
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 27. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 28. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 29. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (-1.0 + GAMMA * max_q - old_q)
  • 30. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (-1.0 + 0.8 * 99.0 - old_q)
  • 31. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (-1.0 + 79.2 - old_q)
  • 32. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = -1.0 + 1.0 * (-1.0 + 79.2 - -1.0)
  • 33. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = -1.0 + 1.0 * (-1.0 + 79.2 + 1.0)
  • 34. #MinecraftRL @choas Q-Learning (grid: 100, -1, 99.0, 78.2)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = 78.2
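The second worked example reads `old_q` and `max_q` from the table before updating it. A minimal sketch of that lookup-and-write step follows, assuming a hypothetical two-state table; the state names, the action index and the `update` helper are illustrative only.

```python
ALPHA = 1.0
GAMMA = 0.8

# Hypothetical two-state table; state names and action order are illustrative.
q_table = {
    "next_to_goal": [99.0, 0.0, 0.0, 0.0],
    "two_away": [0.0, 0.0, 0.0, -1.0],
}


def update(q_table, prev_state, prev_action, reward, current_state):
    """Apply the slide's update rule and store the result back in the table."""
    old_q = q_table[prev_state][prev_action]
    max_q = max(q_table[current_state])
    new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
    q_table[prev_state][prev_action] = new_q
    return new_q


new_q = update(q_table, "two_away", 3, -1.0, "next_to_goal")
print(round(new_q, 1))
```

With ALPHA = 1.0 the old estimate cancels out, so the new value depends only on the reward and the discounted best value of the next state.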
  • 35. #MinecraftRL @choas [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0] [ L -1 -1 -1] [-1 -1 -1 -1] [-1 0 0 0] [ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0] [ L L -2 -1] [-2 -2 L -1] [ L -2 -2 -2] [-2 -2 L L] [ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 36. #MinecraftRL @choas [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0] [ L -1 -1 78] [-1 -1 -1 -1] [-1 0 0 0] [ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0] [ L L -2 -1] [-2 -2 L -1] [ L -2 -2 -2] [-2 -2 L L] [ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
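Each bracketed row holds the Q values of one state in the order [ ← ↓ → ↑ ]. A sketch of how a greedy agent might pick a move from such a row is shown here; representing lava entries (L) as `None` and the helper name `greedy_action` are assumptions made for illustration.

```python
ACTIONS = ["left", "down", "right", "up"]  # slide order: ← ↓ → ↑


def greedy_action(row):
    """Return the action with the highest Q value, skipping lava entries (None)."""
    candidates = [(q, name) for q, name in zip(row, ACTIONS) if q is not None]
    best_q, best_name = max(candidates, key=lambda pair: pair[0])
    return best_name


row = [None, -1, -1, 78]  # the row "[ L -1 -1 78]" from the table
print(greedy_action(row))
```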
  • 37. #MinecraftRL @choas [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1] [ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1] [ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1] [ L L -2 -2] [-2 -3 L -2] [ L -2 -3 -2] [-3 -2 L L] [ L -3 -3 L] [-3 -3 -3 -3] [-2 -3 L -3] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 38. #MinecraftRL @choas [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1] [ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1] [ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1] [ L L -2 48] [-2 -3 L -2] [ L -2 -3 -2] [-3 -2 L L] [ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 39. #MinecraftRL @choas [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 -2] [-3 -3 L L] [ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 40. #MinecraftRL @choas [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 -3 L] [-3 -3 -3 -3] [-3 -3 L -3] [ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 41. #MinecraftRL @choas [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 -3 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 42. #MinecraftRL @choas [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L -4 L] [-4 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 43. #MinecraftRL @choas [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L -4 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 44. #MinecraftRL @choas [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 45. #MinecraftRL @choas [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3] ALPHA = 1.0 GAMMA = 0.8
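The positive entries in the table (99, 78, 61, 48, ...) are consistent with repeatedly applying the update backwards along one path with ALPHA = 1.0 and GAMMA = 0.8. The sketch below assumes a reward of -1 per move, a prior estimate of -1.0, and that the table displays truncated values; those assumptions are not stated on the slides.

```python
ALPHA, GAMMA = 1.0, 0.8
MOVE_REWARD = -1.0

values = [99.0]  # Q value of the move that reaches the goal
for _ in range(9):  # nine further steps back along the path
    old_q = -1.0  # assumed prior estimate; it cancels out when ALPHA = 1.0
    target = MOVE_REWARD + GAMMA * values[-1]
    values.append(old_q + ALPHA * (target - old_q))

shown = [int(v) for v in values]  # truncate, as the table appears to do
print(shown)
```

Because GAMMA is below 1, the value shrinks with every step away from the goal, which gives the agent a gradient to follow.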
  • 46. #MinecraftRL @choas [99 48 0 L] [48 0 0 0] [-1 0 L 0] [ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1] [ L -1 -1 -1] [-1 -1 -1 92] [-1 L L -1] [ L L -2 -1] [-2 -2 L 83] [ L -3 -3 74] [-2 -4 L L] [ L -5 -2 L] [-4 -4 -4 55] [-4 -4 L -4] [ L L -1 L] [-6 L 11 -5] [-5 L -5 31] [-5 L L -4] ALPHA = 0.5 GAMMA = 1.0 (40 moves)
  • 47. #MinecraftRL @choas [99 48 0 L] [48 0 0 0] [-1 0 L 0] [ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1] [ L -1 -1 47] [-2 -1 -1 95] [-1 L L -1] [ L L -2 -1] [-2 45 L 94] [ L -3 -3 93] [-2 -4 L L] [ L -5 -2 L] [-4 -4 -4 92] [-4 -4 L -4] [ L L 88 L] [-6 L 90 -5] [-5 L -5 91] [-5 L L -4] ALPHA = 0.5 GAMMA = 1.0 (60 moves)
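With ALPHA = 0.5 each update moves the estimate only halfway towards its target, so values build up over repeated visits instead of in one step. The sketch below holds `max_q` fixed at 99.0 and starts from -1.0 purely for illustration; in a real run the neighbouring values change too.

```python
ALPHA, GAMMA = 0.5, 1.0


def q_update(old_q, reward, max_q):
    """Same update rule as on the earlier slides, with the new parameters."""
    return old_q + ALPHA * (reward + GAMMA * max_q - old_q)


q = -1.0  # assumed starting estimate
history = []
for _ in range(4):  # revisit the same move four times
    q = q_update(q, reward=-1.0, max_q=99.0)  # max_q held fixed for illustration
    history.append(q)

print(history)
```

The first update gives 48.5, which may be where the 48 entries in the table come from if values are displayed truncated.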
  • 49. ?
  • 51. #MinecraftRL @choas Deep Reinforcement Learning Supervised Learning Unsupervised Learning Reinforcement Learning
  • 52. #MinecraftRL @choas Playing Atari with Deep Reinforcement Learning (arXiv:1312.5602) https://youtu.be/TmPfTpjtdgg
  • 56. #MinecraftRL @choas Keras Model
 # based on arXiv:1312.5602 (page 6)
 from keras.models import Sequential
 from keras.layers import Activation, Conv2D, Dense, Flatten
 
 model = Sequential()
 model.add(Conv2D(16, (8, 8), strides=(4, 4), input_shape=input_shape))
 model.add(Activation('relu'))
 model.add(Conv2D(32, (4, 4), strides=(2, 2)))
 model.add(Activation('relu'))
 model.add(Flatten())
 model.add(Dense(256))
 model.add(Activation('relu'))
 model.add(Dense(12, activation='sigmoid'))  # 12 classes / actions
 model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
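To see what the two convolution layers do to the image size, the output dimensions can be computed by hand. This sketch assumes an 84x84 input, as described in the cited paper (the slide does not state `input_shape`), and Keras's default unpadded ('valid') convolution; the helper `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output length of a 'valid' (unpadded) convolution along one dimension."""
    return (size - kernel) // stride + 1


side = 84  # assumption: 84x84 input, as in the cited paper
after_first = conv_out(side, 8, 4)  # first Conv2D: 8x8 kernel, stride 4
after_second = conv_out(after_first, 4, 2)  # second Conv2D: 4x4 kernel, stride 2
flat_features = after_second * after_second * 32  # Flatten() over 32 feature maps

print(after_first, after_second, flat_features)
```

Under these assumptions, `flat_features` is the number of inputs feeding the 256-unit dense layer.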
  • 59. Links
 The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence: http://www.ijcai.org/Proceedings/2016
 Project Malmo: https://www.microsoft.com/en-us/research/project/project-malmo/
 Project Malmo (Github): https://github.com/Microsoft/malmo
 Reinforcement Learning: An Introduction - ISBN-13: 978-0262193986 (2nd version online)
 YouTube: RL Course by David Silver
  • 61. © 2018 SAP SE or an SAP affiliate company. All rights reserved. See http://global.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.