Minecraft and Reinforcement Learning
Lars Gregori
Minecraft
Markus "Notch" Persson
Mojang AB
Best-selling PC game of all time
Exploration
Resource gathering
Crafting
Combat
Sandbox construction game
Creative + building aspects
Three-dimensional environment
Project Malmo
Open Source (Github)
Microsoft Research Lab
Based on Minecraft / Minecraft Forge
Agents written in Python, Lua, C++, C#, Java, Torch, ALE (Arcade Learning Environment)
Mission XML
WorldState
Send Command
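The Mission XML / WorldState / Send Command cycle can be sketched without a running Minecraft instance. The sketch below is a stand-in under stated assumptions: `FakeAgentHost` and `FakeWorldState` are hypothetical stubs, and only the names `getWorldState`, `sendCommand`, `is_mission_running` and the command string `movenorth 1` are modelled on Malmo's Python agent host. Check the Malmo repository for the real API before relying on any of it.

```python
class FakeWorldState:
    """Hypothetical stub: carries only the flag the loop needs."""

    def __init__(self, is_mission_running):
        self.is_mission_running = is_mission_running


class FakeAgentHost:
    """Hypothetical stand-in for an agent host; the mission ends after max_commands."""

    def __init__(self, max_commands):
        self.max_commands = max_commands
        self.sent = []

    def getWorldState(self):
        return FakeWorldState(len(self.sent) < self.max_commands)

    def sendCommand(self, command):
        self.sent.append(command)


def run_mission(agent_host, choose_command):
    """Poll the world state and send one command per step until the mission ends."""
    steps = 0
    world_state = agent_host.getWorldState()
    while world_state.is_mission_running:
        agent_host.sendCommand(choose_command(world_state))
        steps += 1
        world_state = agent_host.getWorldState()
    return steps


host = FakeAgentHost(max_commands=3)
steps_taken = run_mission(host, lambda state: "movenorth 1")
```

With a real agent host, `choose_command` is where a learning agent would read observations and rewards from the world state before deciding on the next command.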
“The Project Malmo platform is designed to support a wide range of experimentation needs and can support research in robotics, computer vision, reinforcement learning, planning, multi-agent systems, and related areas”
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence
Reinforcement Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Agent → Action → Environment
Environment → Observation, Reward → Agent
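The observation / reward / action cycle between agent and environment can be written as a short loop. This is a minimal sketch, not the talk's demo: `CorridorEnv` is a hypothetical one-dimensional world, and its reward values are chosen only for illustration.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor: start at cell 0, goal at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        """action is -1 (left) or +1 (right); returns (observation, reward, done)."""
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 100.0 if done else -1.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=1000):
    """The agent observes, acts, and collects rewards until the episode ends."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
random_return = run_episode(CorridorEnv(), lambda obs: rng.choice([-1, 1]))
always_right_return = run_episode(CorridorEnv(), lambda obs: 1)
```

A policy that always walks right collects the best possible return here, while the random policy can only match or fall below it. A learning agent has to close that gap by trial and error.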
“Reinforcement learning is like trial-and-error learning.”
David Silver
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto (1998)
Reinforcement Learning
Cliff Walking Example
Reward:
-1 per move
100 blue field
-100 lava field
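The reward scheme can be expressed as a small function. This sketch assumes the per-move cost is added to the field reward, which matches the 99.0 (100 - 1) used in the Q-Learning slides. The source does not confirm that lava works the same way, and the field name `"stone"` is hypothetical.

```python
MOVE_REWARD = -1.0  # charged on every move
FIELD_REWARD = {"blue": 100.0, "lava": -100.0}  # extra reward for special fields


def step_reward(field):
    """Reward for moving onto `field`: move cost plus any field bonus or penalty."""
    return MOVE_REWARD + FIELD_REWARD.get(field, 0.0)
```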
Reinforcement Learning Demo
Q-Learning
ALPHA = 1.0  # step-size parameter
GAMMA = 0.8  # discount-rate parameter
old_q = q_table[prev_state][prev_action]
max_q = max(q_table[current_state])
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
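The update rule can be wrapped into a runnable function that also writes the new value back into the table. This is a sketch under assumptions: the zero-initialised `defaultdict` table and the state names `"goal"`, `"next_to_goal"` and `"two_away"` are hypothetical, and the write-back line is not shown on the slide.

```python
from collections import defaultdict

ALPHA = 1.0  # step-size parameter
GAMMA = 0.8  # discount-rate parameter
N_ACTIONS = 4  # left, down, right, up

# Every unseen state starts with a Q-value of 0.0 for each action (assumption).
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(q_table, prev_state, prev_action, reward, current_state,
             alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update and store the result in the table."""
    old_q = q_table[prev_state][prev_action]
    max_q = max(q_table[current_state])
    new_q = old_q + alpha * (reward + gamma * max_q - old_q)
    q_table[prev_state][prev_action] = new_q
    return new_q


# Stepping onto the goal: reward 100 - 1 = 99.
first = q_update(q_table, "next_to_goal", 0, 99.0, "goal")
# An ordinary move (reward -1) into the state that now holds 99.
second = q_update(q_table, "two_away", 3, -1.0, "next_to_goal")
```

With ALPHA = 1.0 the old value cancels out, so `second` comes to about 78.2 whether the entry started at 0.0 or at -1.0.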
Q-Learning
First update: the move onto the blue field earns 100 - 1 = 99.0, and both Q-values are still 0.0.
ALPHA = 1.0  # step-size parameter
GAMMA = 0.8  # discount-rate parameter
old_q = 0.0
max_q = 0.0
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
new_q = 99.0
Q-Learning
Second update: an ordinary move (reward -1.0) into the state whose best Q-value is now 99.0.
ALPHA = 1.0
GAMMA = 0.8
old_q = -1.0
max_q = 99.0
new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
new_q = -1.0 + 1.0 * (-1.0 + 0.8 * 99.0 - -1.0)
new_q = -1.0 + 1.0 * (-1.0 + 79.2 + 1.0)
new_q = 78.2
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
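Each bracketed group in the Q table holds one state's values in the order ← ↓ → ↑, so picking the greedy action is an argmax over that row. The sketch below treats `L` (lava) entries as actions to skip and represents them as `None`. Both choices are assumptions about how the talk's agent uses the table.

```python
ARROWS = ["←", "↓", "→", "↑"]  # action order used in the Q table


def greedy_action(q_row):
    """Return the arrow of the highest-valued action, skipping lava entries (None)."""
    candidates = [(q, index) for index, q in enumerate(q_row) if q is not None]
    best_q, best_index = max(candidates, key=lambda pair: pair[0])
    return ARROWS[best_index]


# Two rows taken from the first table: [99 0 0 0] and [ L L -2 -1].
next_to_goal = greedy_action([99, 0, 0, 0])
between_lava = greedy_action([None, None, -2, -1])
```

Ties go to the first action in the list. A real agent would usually break ties at random and mix in some exploration.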
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0]
[ L -1 -1 78] [-1 -1 -1 -1] [-1 0 0 0]
[ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0]
[ L L -2 -1] [-2 -2 L -1]
[ L -2 -2 -2] [-2 -2 L L]
[ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 -2] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-2 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1]
[ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1]
[ L L -2 48] [-2 -3 L -2]
[ L -2 -3 -2] [-3 -2 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 -2] [-3 -3 L L]
[ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 -3] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 -3 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 -3] [-3 L -3 -3] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L -4 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
[99 0 0 0] [78 -1 -1 0] [-1 -1 L -1]
[ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1]
[ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1]
[ L L -2 48] [-2 -3 L 37]
[ L -3 -3 29] [-3 -3 L L]
[ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3]
[ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3]
Q Table
L = Lava
[ ← ↓ → ↑ ]
ALPHA = 1.0 GAMMA = 0.8
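With ALPHA = 1.0, every value on the learned path is the move reward plus GAMMA times the best value one step closer to the goal. The sketch below chains that relation backwards from 99.0. Truncating to whole numbers is an assumption about how the slide displays the values, and `chain_values` is a hypothetical helper.

```python
def chain_values(goal_q, gamma, move_reward, steps):
    """Q-values along a path, starting at the state next to the goal."""
    values = [goal_q]
    for _ in range(steps - 1):
        values.append(move_reward + gamma * values[-1])
    return values


values = chain_values(goal_q=99.0, gamma=0.8, move_reward=-1.0, steps=10)
truncated = [int(v) for v in values]
# truncated is [99, 78, 61, 48, 37, 29, 22, 16, 12, 8]
```

These are the positive entries that appear one by one in the Q table snapshots as the agent repeats the route.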
[99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 -1] [-1 -1 -1 92] [-1 L L -1]
[ L L -2 -1] [-2 -2 L 83]
[ L -3 -3 74] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 55] [-4 -4 L -4]
[ L L -1 L] [-6 L 11 -5] [-5 L -5 31] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (40 moves)
[99 48 0 L] [48 0 0 0] [-1 0 L 0]
[ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1]
[ L -1 -1 47] [-2 -1 -1 95] [-1 L L -1]
[ L L -2 -1] [-2 45 L 94]
[ L -3 -3 93] [-2 -4 L L]
[ L -5 -2 L] [-4 -4 -4 92] [-4 -4 L -4]
[ L L 88 L] [-6 L 90 -5] [-5 L -5 91] [-5 L L -4]
ALPHA = 0.5 GAMMA = 1.0 (60 moves)
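With ALPHA = 0.5 each update moves a Q-value only halfway towards its target, which is one reason the table keeps changing between 40 and 60 moves. The sketch below holds the target fixed at 99.0. That is a simplifying assumption, since in a real run the target shifts as neighbouring values change.

```python
def repeated_updates(target, alpha, n):
    """Apply the same update n times and record the value after each one."""
    q = 0.0
    history = []
    for _ in range(n):
        q = q + alpha * (target - q)
        history.append(q)
    return history


history = repeated_updates(target=99.0, alpha=0.5, n=4)
# history is [49.5, 74.25, 86.625, 92.8125]
```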
Deep Reinforcement Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Playing Atari with Deep Reinforcement Learning (arXiv:1312.5602)
https://youtu.be/TmPfTpjtdgg
12 Classes
west_super
arXiv:1312.5602 - Playing Atari with Deep Reinforcement Learning (page 6)
# based on arXiv:1312.5602 (page 6)
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Flatten

input_shape = (84, 84, 4)  # assumption: the paper's 84x84x4 input; the talk's value is not shown
model = Sequential()
model.add(Conv2D(16, (8, 8), strides=(4, 4), input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (4, 4), strides=(2, 2)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(12, activation='sigmoid'))  # 12 classes / actions
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Keras Model
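The size of the tensor passing through the two convolution layers can be checked by hand. This sketch assumes an 84x84 input, as in the paper (the talk's input size is not shown), and Keras's default `padding='valid'`. `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Output width or height of a convolution without padding ('valid')."""
    return (input_size - kernel_size) // stride + 1


side = 84  # assumption: input frames are 84x84, as in the paper
after_conv1 = conv_output_size(side, kernel_size=8, stride=4)         # 16 filters
after_conv2 = conv_output_size(after_conv1, kernel_size=4, stride=2)  # 32 filters
flattened = after_conv2 * after_conv2 * 32  # values fed into Dense(256)
```

Under these assumptions the feature map shrinks from 84 to 20 to 9 per side, so `Flatten()` would hand 2592 values to the dense layer.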
Deep Reinforcement Learning Demo
Take-away
Links
The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence. http://www.ijcai.org/Proceedings/2016
Project Malmo: https://www.microsoft.com/en-us/research/project/project-malmo/
Project Malmo (GitHub): https://github.com/Microsoft/malmo
Reinforcement Learning: An Introduction, ISBN-13: 978-0262193986 (2nd edition online)
YouTube: RL Course by David Silver
Thank you.

More Related Content

What's hot

Algorithm: Quick-Sort
Algorithm: Quick-SortAlgorithm: Quick-Sort
Algorithm: Quick-SortTareq Hasan
 
Quick sort Algorithm Discussion And Analysis
Quick sort Algorithm Discussion And AnalysisQuick sort Algorithm Discussion And Analysis
Quick sort Algorithm Discussion And AnalysisSNJ Chaudhary
 
Data Structure and Algorithms Merge Sort
Data Structure and Algorithms Merge SortData Structure and Algorithms Merge Sort
Data Structure and Algorithms Merge SortManishPrajapati78
 
Unit 3 - Function & Grouping,Joins and Set Operations in ORACLE
Unit 3 - Function & Grouping,Joins and Set Operations in ORACLEUnit 3 - Function & Grouping,Joins and Set Operations in ORACLE
Unit 3 - Function & Grouping,Joins and Set Operations in ORACLEDrkhanchanaR
 
Divide and conquer - Quick sort
Divide and conquer - Quick sortDivide and conquer - Quick sort
Divide and conquer - Quick sortMadhu Bala
 
3.8 quick sort
3.8 quick sort3.8 quick sort
3.8 quick sortKrish_ver2
 
Analysis of Algorithm (Bubblesort and Quicksort)
Analysis of Algorithm (Bubblesort and Quicksort)Analysis of Algorithm (Bubblesort and Quicksort)
Analysis of Algorithm (Bubblesort and Quicksort)Flynce Miguel
 
Quick sort algorithn
Quick sort algorithnQuick sort algorithn
Quick sort algorithnKumar
 

What's hot (19)

Data Structure Sorting
Data Structure SortingData Structure Sorting
Data Structure Sorting
 
Quicksort
QuicksortQuicksort
Quicksort
 
Algorithm: Quick-Sort
Algorithm: Quick-SortAlgorithm: Quick-Sort
Algorithm: Quick-Sort
 
Quick sort Algorithm Discussion And Analysis
Quick sort Algorithm Discussion And AnalysisQuick sort Algorithm Discussion And Analysis
Quick sort Algorithm Discussion And Analysis
 
Data Structure and Algorithms Merge Sort
Data Structure and Algorithms Merge SortData Structure and Algorithms Merge Sort
Data Structure and Algorithms Merge Sort
 
Unit 3 - Function & Grouping,Joins and Set Operations in ORACLE
Unit 3 - Function & Grouping,Joins and Set Operations in ORACLEUnit 3 - Function & Grouping,Joins and Set Operations in ORACLE
Unit 3 - Function & Grouping,Joins and Set Operations in ORACLE
 
Merge sort-algorithm for computer science engineering students
Merge sort-algorithm for computer science engineering studentsMerge sort-algorithm for computer science engineering students
Merge sort-algorithm for computer science engineering students
 
Unit 7 sorting
Unit 7   sortingUnit 7   sorting
Unit 7 sorting
 
Divide and conquer - Quick sort
Divide and conquer - Quick sortDivide and conquer - Quick sort
Divide and conquer - Quick sort
 
3.8 quicksort
3.8 quicksort3.8 quicksort
3.8 quicksort
 
3.8 quick sort
3.8 quick sort3.8 quick sort
3.8 quick sort
 
Analysis of Algorithm (Bubblesort and Quicksort)
Analysis of Algorithm (Bubblesort and Quicksort)Analysis of Algorithm (Bubblesort and Quicksort)
Analysis of Algorithm (Bubblesort and Quicksort)
 
Merge sort
Merge sortMerge sort
Merge sort
 
State feedback example
State feedback exampleState feedback example
State feedback example
 
Merge sort algorithm power point presentation
Merge sort algorithm power point presentationMerge sort algorithm power point presentation
Merge sort algorithm power point presentation
 
Unit 3 stack
Unit 3   stackUnit 3   stack
Unit 3 stack
 
Merge sort
Merge sortMerge sort
Merge sort
 
Quick sort algorithn
Quick sort algorithnQuick sort algorithn
Quick sort algorithn
 
Sorting
SortingSorting
Sorting
 

Similar to Minecraft and reinforcement learning

The LCA problem revisited
The LCA problem revisitedThe LCA problem revisited
The LCA problem revisitedMinsung Hong
 
Datamining r 4th
Datamining r 4thDatamining r 4th
Datamining r 4thsesejun
 
Mathematical Modelling of Electrical/Mechanical modellinng in MATLAB
Mathematical Modelling of Electrical/Mechanical modellinng in MATLABMathematical Modelling of Electrical/Mechanical modellinng in MATLAB
Mathematical Modelling of Electrical/Mechanical modellinng in MATLABCOMSATS Abbottabad
 
Soluções dos exercícios de cinética química digitados
Soluções dos exercícios de cinética química digitadosSoluções dos exercícios de cinética química digitados
Soluções dos exercícios de cinética química digitadosMárcio Martins
 
‏‏chap6 list tuples.pptx
‏‏chap6 list tuples.pptx‏‏chap6 list tuples.pptx
‏‏chap6 list tuples.pptxRamiHarrathi1
 
PRE: Datamining 2nd R
PRE: Datamining 2nd RPRE: Datamining 2nd R
PRE: Datamining 2nd Rsesejun
 
Datamining R 1st
Datamining R 1stDatamining R 1st
Datamining R 1stsesejun
 
Hiroaki Shiokawa
Hiroaki ShiokawaHiroaki Shiokawa
Hiroaki ShiokawaSuurist
 
06.scd_muestreo_de_senales_continuas
06.scd_muestreo_de_senales_continuas06.scd_muestreo_de_senales_continuas
06.scd_muestreo_de_senales_continuasHipólito Aguilar
 
Datamining r 1st
Datamining r 1stDatamining r 1st
Datamining r 1stsesejun
 
Intoduction to numpy
Intoduction to numpyIntoduction to numpy
Intoduction to numpyFaraz Ahmed
 
Transformer xl
Transformer xlTransformer xl
Transformer xlSan Kim
 
Session -5for students.pdf
Session -5for students.pdfSession -5for students.pdf
Session -5for students.pdfpriyanshusoni53
 

Similar to Minecraft and reinforcement learning (18)

The LCA problem revisited
The LCA problem revisitedThe LCA problem revisited
The LCA problem revisited
 
Datamining r 4th
Datamining r 4thDatamining r 4th
Datamining r 4th
 
Mathematical Modelling of Electrical/Mechanical modellinng in MATLAB
Mathematical Modelling of Electrical/Mechanical modellinng in MATLABMathematical Modelling of Electrical/Mechanical modellinng in MATLAB
Mathematical Modelling of Electrical/Mechanical modellinng in MATLAB
 
Soluções dos exercícios de cinética química digitados
Soluções dos exercícios de cinética química digitadosSoluções dos exercícios de cinética química digitados
Soluções dos exercícios de cinética química digitados
 
‏‏chap6 list tuples.pptx
‏‏chap6 list tuples.pptx‏‏chap6 list tuples.pptx
‏‏chap6 list tuples.pptx
 
PRE: Datamining 2nd R
PRE: Datamining 2nd RPRE: Datamining 2nd R
PRE: Datamining 2nd R
 
Datamining R 1st
Datamining R 1stDatamining R 1st
Datamining R 1st
 
Hiroaki Shiokawa
Hiroaki ShiokawaHiroaki Shiokawa
Hiroaki Shiokawa
 
06.scd_muestreo_de_senales_continuas
06.scd_muestreo_de_senales_continuas06.scd_muestreo_de_senales_continuas
06.scd_muestreo_de_senales_continuas
 
Datamining r 1st
Datamining r 1stDatamining r 1st
Datamining r 1st
 
Intoduction to numpy
Intoduction to numpyIntoduction to numpy
Intoduction to numpy
 
Transformer xl
Transformer xlTransformer xl
Transformer xl
 
Python lists
Python listsPython lists
Python lists
 
Session -5for students.pdf
Session -5for students.pdfSession -5for students.pdf
Session -5for students.pdf
 
Oil Prices Data Analysis - R
Oil Prices Data Analysis - ROil Prices Data Analysis - R
Oil Prices Data Analysis - R
 
4.1 matrices
4.1 matrices4.1 matrices
4.1 matrices
 
013 LISTS.pdf
013 LISTS.pdf013 LISTS.pdf
013 LISTS.pdf
 
Resoluçãohaskell2
Resoluçãohaskell2Resoluçãohaskell2
Resoluçãohaskell2
 

More from Lars Gregori

BYOM - Bring Your Own Model
BYOM - Bring Your Own ModelBYOM - Bring Your Own Model
BYOM - Bring Your Own ModelLars Gregori
 
uTensor - embedded devices and machine learning models
uTensor - embedded devices and machine learning modelsuTensor - embedded devices and machine learning models
uTensor - embedded devices and machine learning modelsLars Gregori
 
SAP Leonardo Machine Learning
SAP Leonardo Machine LearningSAP Leonardo Machine Learning
SAP Leonardo Machine LearningLars Gregori
 
Machine Learning Models on Mobile Devices
Machine Learning Models on Mobile DevicesMachine Learning Models on Mobile Devices
Machine Learning Models on Mobile DevicesLars Gregori
 
IoT protocolls - smart washing machine
IoT protocolls - smart washing machineIoT protocolls - smart washing machine
IoT protocolls - smart washing machineLars Gregori
 
[DE] AI und Minecraft
[DE] AI und Minecraft[DE] AI und Minecraft
[DE] AI und MinecraftLars Gregori
 
Minecraft and Reinforcement Learning
Minecraft and Reinforcement LearningMinecraft and Reinforcement Learning
Minecraft and Reinforcement LearningLars Gregori
 
[DE] IoT Protokolle
[DE] IoT Protokolle[DE] IoT Protokolle
[DE] IoT ProtokolleLars Gregori
 
Using a trained model on your mobile device
Using a trained model on your mobile deviceUsing a trained model on your mobile device
Using a trained model on your mobile deviceLars Gregori
 
Using a trained model on your mobile device
Using a trained model on your mobile deviceUsing a trained model on your mobile device
Using a trained model on your mobile deviceLars Gregori
 
[German] Boards für das IoT-Prototyping
[German] Boards für das IoT-Prototyping[German] Boards für das IoT-Prototyping
[German] Boards für das IoT-PrototypingLars Gregori
 
IoT, APIs und Microservices - alles unter Node-RED
IoT, APIs und Microservices - alles unter Node-REDIoT, APIs und Microservices - alles unter Node-RED
IoT, APIs und Microservices - alles unter Node-REDLars Gregori
 
Web Bluetooth - Next Generation Bluetooth?
Web Bluetooth - Next Generation Bluetooth?   Web Bluetooth - Next Generation Bluetooth?
Web Bluetooth - Next Generation Bluetooth? Lars Gregori
 
Embedded Rust – Rust on IoT devices
Embedded Rust – Rust on IoT devicesEmbedded Rust – Rust on IoT devices
Embedded Rust – Rust on IoT devicesLars Gregori
 
Embedded Rust on IoT devices
Embedded Rust on IoT devicesEmbedded Rust on IoT devices
Embedded Rust on IoT devicesLars Gregori
 
IoT mit Rust programmieren
IoT mit Rust programmierenIoT mit Rust programmieren
IoT mit Rust programmierenLars Gregori
 
Boards for the IoT-Prototyping
Boards for the IoT-PrototypingBoards for the IoT-Prototyping
Boards for the IoT-PrototypingLars Gregori
 
Groß steuert klein - Wie lässt sich ein Arduino steuern?
Groß steuert klein - Wie lässt sich ein Arduino steuern?Groß steuert klein - Wie lässt sich ein Arduino steuern?
Groß steuert klein - Wie lässt sich ein Arduino steuern?Lars Gregori
 
Connecting Minecraft and e-Commerce business services
Connecting Minecraft and e-Commerce business servicesConnecting Minecraft and e-Commerce business services
Connecting Minecraft and e-Commerce business servicesLars Gregori
 

More from Lars Gregori (20)

BYOM - Bring Your Own Model
BYOM - Bring Your Own ModelBYOM - Bring Your Own Model
BYOM - Bring Your Own Model
 
uTensor - embedded devices and machine learning models
uTensor - embedded devices and machine learning modelsuTensor - embedded devices and machine learning models
uTensor - embedded devices and machine learning models
 
SAP Leonardo Machine Learning
SAP Leonardo Machine LearningSAP Leonardo Machine Learning
SAP Leonardo Machine Learning
 
Machine Learning Models on Mobile Devices
Machine Learning Models on Mobile DevicesMachine Learning Models on Mobile Devices
Machine Learning Models on Mobile Devices
 
IoT protocolls - smart washing machine
IoT protocolls - smart washing machineIoT protocolls - smart washing machine
IoT protocolls - smart washing machine
 
[DE] AI und Minecraft
[DE] AI und Minecraft[DE] AI und Minecraft
[DE] AI und Minecraft
 
Minecraft and Reinforcement Learning
Minecraft and Reinforcement LearningMinecraft and Reinforcement Learning
Minecraft and Reinforcement Learning
 
[DE] IoT Protokolle
[DE] IoT Protokolle[DE] IoT Protokolle
[DE] IoT Protokolle
 
Using a trained model on your mobile device
Using a trained model on your mobile deviceUsing a trained model on your mobile device
Using a trained model on your mobile device
 
Using a trained model on your mobile device
Using a trained model on your mobile deviceUsing a trained model on your mobile device
Using a trained model on your mobile device
 
AI and Minecraft
AI and MinecraftAI and Minecraft
AI and Minecraft
 
[German] Boards für das IoT-Prototyping
[German] Boards für das IoT-Prototyping[German] Boards für das IoT-Prototyping
[German] Boards für das IoT-Prototyping
 
IoT, APIs und Microservices - alles unter Node-RED
IoT, APIs und Microservices - alles unter Node-REDIoT, APIs und Microservices - alles unter Node-RED
IoT, APIs und Microservices - alles unter Node-RED
 
Web Bluetooth - Next Generation Bluetooth?
Web Bluetooth - Next Generation Bluetooth?   Web Bluetooth - Next Generation Bluetooth?
Web Bluetooth - Next Generation Bluetooth?
 
Embedded Rust – Rust on IoT devices
Embedded Rust – Rust on IoT devicesEmbedded Rust – Rust on IoT devices
Embedded Rust – Rust on IoT devices
 
Embedded Rust on IoT devices
Embedded Rust on IoT devicesEmbedded Rust on IoT devices
Embedded Rust on IoT devices
 
IoT mit Rust programmieren
IoT mit Rust programmierenIoT mit Rust programmieren
IoT mit Rust programmieren
 
Boards for the IoT-Prototyping
Boards for the IoT-PrototypingBoards for the IoT-Prototyping
Boards for the IoT-Prototyping
 
Groß steuert klein - Wie lässt sich ein Arduino steuern?
Groß steuert klein - Wie lässt sich ein Arduino steuern?Groß steuert klein - Wie lässt sich ein Arduino steuern?
Groß steuert klein - Wie lässt sich ein Arduino steuern?
 
Connecting Minecraft and e-Commerce business services
Connecting Minecraft and e-Commerce business servicesConnecting Minecraft and e-Commerce business services
Connecting Minecraft and e-Commerce business services
 

Recently uploaded

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 

Minecraft and reinforcement learning

  • 4. Minecraft Markus "Notch" Persson Mojang AB Best-selling PC game of all time Exploration Resource gathering Crafting Combat Sandbox construction game Creative + building aspects Three-dimensional environment
  • 6. Project Malmo Open Source (Github) Microsoft Research Lab Based on Minecraft / Minecraft Forge Agents written in Python, Lua, C++, C#, Java, Torch, ALE* Mission XML WorldState Send Command *Arcade Learning Environment
  • 7. “The Project Malmo platform is designed to support a wide range of experimentation needs and can support research in robotics, computer vision, reinforcement learning, planning, multi-agent systems, and related areas” (The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence) Project Malmo
  • 10. Reinforcement Learning Observation Reward Action Environment Agent
  • 11. “Reinforcement learning is like trial-and-error learning.” (David Silver) Reinforcement Learning
  • 12. Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto (1998). Reinforcement Learning: Cliff Walking Example. Reward: -1 per move, 100 blue field, -100 lava field
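The reward scheme of the Cliff Walking example can be sketched as a small function. This is only an illustration: the field labels `"blue"` and `"lava"` and the function name `step_reward` are hypothetical, and adding the per-move cost on top of the lava penalty is an assumption. For the blue field the sum matches the 99.0 used in the later Q-Learning slides.

```python
# Reward values taken from the Cliff Walking slide.
MOVE_REWARD = -1     # every move costs 1
BLUE_REWARD = 100    # reaching the blue field
LAVA_REWARD = -100   # stepping into lava

def step_reward(field):
    """Total reward for one move that ends on `field` (hypothetical labels)."""
    bonus = {"blue": BLUE_REWARD, "lava": LAVA_REWARD}.get(field, 0)
    # Assumption: the move cost is always added to the field reward.
    return MOVE_REWARD + bonus

print(step_reward("blue"))  # 100 - 1
```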
  • 16. Q-Learning
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = q_table[prev_state][prev_action]
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 18. Q-Learning
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = q_table[prev_state][prev_action]
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 19. Q-Learning
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 20. Q-Learning
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 21. Q-Learning (grid labels: 100, -1)
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = old_q + ALPHA * (99.0 + GAMMA * max_q - old_q)
  • 22. Q-Learning (grid labels: 100, -1)
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
  • 23. Q-Learning (grid labels: 100, -1)
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = 0.0 + 1.0 * (99.0 + 0.8 * 0.0 - 0.0)
  • 24. Q-Learning (grid labels: 100, -1, 99.0)
 ALPHA = 1.0 ### step-size parameter
 GAMMA = 0.8 ### discount-rate parameter
 old_q = 0.0
 max_q = 0.0
 new_q = 99.0
  • 26. Q-Learning (grid labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = q_table[prev_state][prev_action]
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 27. Q-Learning (grid labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = max(q_table[current_state][:])
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 28. Q-Learning (grid labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (reward + GAMMA * max_q - old_q)
  • 29. Q-Learning (grid labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (-1.0 + GAMMA * max_q - old_q)
  • 30. Q-Learning (grid labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (-1.0 + 0.8 * 99.0 - old_q)
  • 31. Q-Learning (grid labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = old_q + ALPHA * (-1.0 + 79.2 - old_q)
  • 32. Q-Learning (grid labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = -1.0 + 1.0 * (-1.0 + 79.2 - -1.0)
  • 33. Q-Learning (grid labels: 100, -1, 99.0)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = -1.0 + 1.0 * (-1.0 + 79.2 + 1.0)
  • 34. Q-Learning (grid labels: 100, -1, 99.0, 78.2)
 ALPHA = 1.0
 GAMMA = 0.8
 old_q = -1.0
 max_q = 99.0
 new_q = 78.2
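The two worked updates from the slides above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `q_update` is hypothetical, while the formula and the numbers are the ones shown on the slides.

```python
ALPHA = 1.0  # step-size parameter (value from the slides)
GAMMA = 0.8  # discount-rate parameter (value from the slides)

def q_update(old_q, reward, max_q, alpha=ALPHA, gamma=GAMMA):
    """One tabular Q-learning update for a single (state, action) entry."""
    return old_q + alpha * (reward + gamma * max_q - old_q)

# First update: the move that reaches the goal field (reward 100 - 1 = 99),
# while every entry of the Q table is still 0.
first = q_update(old_q=0.0, reward=99.0, max_q=0.0)

# Second update: a move onto the field next to the goal (reward -1),
# whose best known action value is now the 99.0 from the first update.
second = q_update(old_q=-1.0, reward=-1.0, max_q=first)

print(first, round(second, 1))
```

With ALPHA = 1.0 the old estimate cancels out completely, so the new value is simply the reward plus the discounted best value of the next state.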
  • 35. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0] [ L -1 -1 -1] [-1 -1 -1 -1] [-1 0 0 0] [ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0] [ L L -2 -1] [-2 -2 L -1] [ L -2 -2 -2] [-2 -2 L L] [ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 36. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L 0] [ L -1 -1 78] [-1 -1 -1 -1] [-1 0 0 0] [ L -1 -1 -1] [-1 -1 -1 -1] [-1 L 0 0] [ L L -2 -1] [-2 -2 L -1] [ L -2 -2 -2] [-2 -2 L L] [ L -3 -2 L] [-2 -3 -2 -2] [-2 -3 L -2] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-2 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 37. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1] [ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1] [ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1] [ L L -2 -2] [-2 -3 L -2] [ L -2 -3 -2] [-3 -2 L L] [ L -3 -3 L] [-3 -3 -3 -3] [-2 -3 L -3] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 38. [99 0 0 0] [ 0 -1 -1 0] [ 0 0 L -1] [ L -1 -1 78] [61 -1 -1 -1] [-1 -1 L -1] [ L -2 -2 61] [-2 -1 -1 -1] [-1 L L -1] [ L L -2 48] [-2 -3 L -2] [ L -2 -3 -2] [-3 -2 L L] [ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 39. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 -2] [-3 -3 L L] [ L -3 -3 L] [-3 -3 -3 -3] [-3 -3 L -3] [ L L -3 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 40. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 -3 L] [-3 -3 -3 -3] [-3 -3 L -3] [ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 41. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 -3 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L -4 L] [-3 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 42. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L -4 L] [-4 L -3 -3] [-3 L -3 -3] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 43. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L -4 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 44. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3] Q Table L = Lava [ ← ↓ → ↑ ]
  • 45. [99 0 0 0] [78 -1 -1 0] [-1 -1 L -1] [ L -1 -1 78] [61 -1 -1 -1] [48 -1 L -1] [ L -2 -2 61] [-2 -2 -2 48] [-1 L L -1] [ L L -2 48] [-2 -3 L 37] [ L -3 -3 29] [-3 -3 L L] [ L -4 16 L] [-3 -3 -3 22] [-3 -3 L -3] [ L L 8 L] [-4 L -3 12] [-3 L -3 16] [-3 L L -3] ALPHA = 1.0 GAMMA = 0.8
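The positive entries in the table above (99, 78, 61, 48, 37, 29, 22, 16, 12, 8) follow from applying the update repeatedly along the path back from the goal. The sketch below assumes ALPHA = 1.0, a reward of -1 for every non-goal move, and that the slides show the values truncated to integers; the function name `path_values` is hypothetical.

```python
GAMMA = 0.8  # discount-rate parameter (value from the slides)

def path_values(goal_value=99.0, move_reward=-1.0, steps=10, gamma=GAMMA):
    """Q-values of the best action for fields 0, 1, 2, ... moves from the goal.

    With ALPHA = 1.0 each update reduces to reward + gamma * max_q.
    """
    values = [goal_value]
    for _ in range(steps - 1):
        values.append(move_reward + gamma * values[-1])
    return values

# Truncate for display, which appears to be how the slides print the table.
print([int(v) for v in path_values()])
```

Because GAMMA is below 1, each additional move away from the goal shrinks the value, which is what lets the agent prefer the shorter path.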
  • 46. [99 48 0 L] [48 0 0 0] [-1 0 L 0] [ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1] [ L -1 -1 -1] [-1 -1 -1 92] [-1 L L -1] [ L L -2 -1] [-2 -2 L 83] [ L -3 -3 74] [-2 -4 L L] [ L -5 -2 L] [-4 -4 -4 55] [-4 -4 L -4] [ L L -1 L] [-6 L 11 -5] [-5 L -5 31] [-5 L L -4] ALPHA = 0.5 GAMMA = 1.0 (40 moves)
  • 47. [99 48 0 L] [48 0 0 0] [-1 0 L 0] [ L 0 -1 97] [96 -1 -1 -1] [-1 -1 L -1] [ L -1 -1 47] [-2 -1 -1 95] [-1 L L -1] [ L L -2 -1] [-2 45 L 94] [ L -3 -3 93] [-2 -4 L L] [ L -5 -2 L] [-4 -4 -4 92] [-4 -4 L -4] [ L L 88 L] [-6 L 90 -5] [-5 L -5 91] [-5 L L -4] ALPHA = 0.5 GAMMA = 1.0 (60 moves)
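With ALPHA = 0.5 an entry moves only halfway towards its target on each visit, which may be why the tables above still change between 40 and 60 moves. The sketch below illustrates this for one entry under simplifying assumptions that are not taken from the slides: the entry starts at 0.0 and the best value of the next state stays fixed at 99. The function name `repeated_updates` is hypothetical.

```python
def repeated_updates(target, start=0.0, alpha=0.5, visits=8):
    """Apply the Q-learning update `visits` times against a fixed target."""
    q = start
    history = []
    for _ in range(visits):
        q = q + alpha * (target - q)  # same formula, target = reward + GAMMA * max_q
        history.append(q)
    return history

# Field next to the goal with GAMMA = 1.0: target = -1 + 1.0 * 99 = 98
history = repeated_updates(target=-1 + 1.0 * 99)
print(history)
```

Each visit halves the remaining gap to 98, so the estimate approaches the target without jumping to it in one step as it does with ALPHA = 1.0.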
  • 49. ?
  • 52. Playing Atari with Deep Reinforcement Learning (arXiv:1312.5602) https://youtu.be/TmPfTpjtdgg
  • 56. arXiv:1312.5602 - Playing Atari with Deep Reinforcement Learning (page 6)
  • 57. Keras Model
 ### based on arXiv:1312.5602 (page 6)
 from keras.models import Sequential
 from keras.layers import Activation, Conv2D, Dense, Flatten
 
 model = Sequential()
 model.add(Conv2D(16, (8, 8), strides=(4, 4), input_shape=input_shape))
 model.add(Activation('relu'))
 model.add(Conv2D(32, (4, 4), strides=(2, 2)))
 model.add(Activation('relu'))
 model.add(Flatten())
 model.add(Dense(256))
 model.add(Activation('relu'))
 model.add(Dense(12, activation='sigmoid')) # 12 classes / actions
 model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
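The layer sizes in the Keras model above can be checked by hand. The sketch below assumes 84 x 84 input frames, as used in arXiv:1312.5602 (the slide leaves `input_shape` undefined), and convolutions without padding, which is the Keras `Conv2D` default; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel, stride):
    """Output length of one spatial dimension for a convolution without padding."""
    return (size - kernel) // stride + 1

side = 84                                           # assumed input height/width
after_first = conv_output_size(side, kernel=8, stride=4)         # Conv2D(16, (8, 8), strides=(4, 4))
after_second = conv_output_size(after_first, kernel=4, stride=2)  # Conv2D(32, (4, 4), strides=(2, 2))
flattened = after_second * after_second * 32        # input size of Dense(256)

print(after_first, after_second, flattened)
```

Under these assumptions the Flatten layer hands 2592 values to the Dense(256) layer.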
  • 60. Links
 The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence: http://www.ijcai.org/Proceedings/2016
 Project Malmo: https://www.microsoft.com/en-us/research/project/project-malmo/
 Project Malmo (Github): https://github.com/Microsoft/malmo
 Reinforcement Learning: An Introduction - ISBN-13: 978-0262193986 (2nd Version online)
 YouTube: RL Course by David Silver