Tech on#06: Next-Generation Simulation Optimization Using Reinforcement Learning, Eduardo Gonzalez @ Skymind
  1. OUTLINE: • Self-introduction • Introduction to AnyLogic • Introduction to reinforcement learning • Benefits of AnyLogic + reinforcement learning • Examples and track record
  2. WHO AM I: Eduardo Gonzalez, currently VP of Engineering @ Skymind • Leading RL applications • Previously: Assistant Manager @ JBS, Intern Researcher @ Panasonic • @wm_eddie, https://qiita.com/wmeddie, https://wm-eddie.info
  3. ABOUT SKYMIND: ● Builds AI infrastructure for operating models in production ● Allows model access from cloud, server, desktop, and mobile ● Provides tooling for models, such as revision history and accuracy monitoring over time ● Created the widely used open-source AI framework Deeplearning4j, which powers AI for large enterprises globally, from banking to telecom ● PRODUCTS: SKIL, an ML and DL model server
  4. OPEN SOURCE CONTRIBUTORS: Skymind’s team has contributed millions of lines of code to open source.
  5. BOOK: Deep Learning: A Practitioner’s Approach ● Written by Adam Gibson (CTO) and Josh Patterson (Contributor) ● Published in 2017 ● Good fundamentals for deep learning and the DL4J framework ● Many graphics come from the book
  6. BOOK: Deep Learning and the Game of Go ● Written by Max Pumperla, Deep Learning Engineer @ Skymind ● Published in 2019 ● Shows how to go from zero to an entire AlphaZero-style Go bot ● Introduces deep learning and reinforcement learning from scratch
  7. Introduction to AnyLogic
  8. ANYLOGIC: AnyLogic is multi-method simulation modeling software that supports system dynamics, agent-based, and discrete-event simulation. It is a de facto standard in the industry and is used by almost all of the Fortune 500. AnyLogic models can be exported as Java applications and deployed to customers.
  9. ANYLOGIC DETAILS: AnyLogic models are extended with Java, so you can create custom agents or experiments. Exported applications are Java libraries that can be integrated into enterprise applications and can leverage data from those applications and from Excel.
  10. WHY ANYLOGIC + SKYMIND: DL4J includes RL4J, a reinforcement learning library for Java. It can be used inside AnyLogic without friction. Reinforcement learning was a main theme of the AnyLogic ’19 Conference. Skymind collaborated closely with AnyLogic on workshops and panel discussions.
  11. Introduction to Reinforcement Learning
  12. WHAT IS AI?
  13. 4 TYPES OF LEARNING
  14. REINFORCEMENT LEARNING IN DETAIL
  15. REINFORCEMENT LEARNING ALGORITHMS (VALUE): Q-learning is a method for training a reinforcement learning agent to anticipate how much reward it can expect in the future. The Q comes from the standard mathematical notation Q(s, a), a function of the state s and a possible action a. © Intel. Illustration from Deep Learning and the Game of Go © Manning.
  16. REINFORCEMENT LEARNING ALGORITHMS (POLICY): Actor-critic algorithms take the current state as input and output a set of moves to play (the policy) and a value estimating which player is ahead (the critic). © Intel. Illustration from Deep Learning and the Game of Go © Manning.
  17. Benefits of AnyLogic + Reinforcement Learning
  18. WHY REINFORCEMENT LEARNING: • Many NP-hard problems arise in simulation • Current optimization techniques often cannot handle them • A good-enough solution is better than no solution • And better than hand-written heuristics
  19. Learning and decision making from a simulation model (diagram labels: FINAL MODEL, LEARN). A simulation model is an extension of someone’s mental model. © The AnyLogic Company | www.anylogic.com
  20. Learning and decision making from a simulation model (diagram labels: FINAL MODEL, LEARN)
  21. Simulation as the reinforcement learning environment: SIMULATED WORLD (Simulation Model)
  22. Examples and Track Record
  23. Traffic Light Example. Eduardo Gonzalez, VP Engineering, Skymind; Samuel Audet, Deep Learning Engineer, Skymind; Tyler Wolfe-Adam, Technical Support Specialist, The AnyLogic Company
  24. Traffic Light Example: Cars enter the intersection from 4 directions (N, S, W, E) and move towards the opposing side. The objective of the training experiment is to learn a policy that optimally controls the traffic light based on the current status of the traffic. (Chart axes: arrival rates (per hour) vs. time (seconds).)
  25. Implementation Architecture
  26. Implementation Architecture: AnyLogic Model, Imported RL4J library, Custom Experiment
  27. What is inside the Custom experiment? Hyperparameters, Network configuration, Training
  28. What is inside the Custom experiment? Network configuration: Input (10), Hidden 1 (300), Hidden 2 (300), Output (2)
  29. What is inside the Custom experiment? Network configuration
  30. What is inside the Custom experiment? Network configuration, Training
  31. What is inside the Custom experiment?
  32. What is inside the Custom experiment? Array with 10 elements (diagram positions labeled 1 through 9)
  33. What is inside the Custom experiment?
  34. What is inside the Custom experiment? Action == 0: do nothing. Action == 1: change the traffic light phase if not yellow.
  35. Comparison of results (Optimized vs. Policy)
  36. (No slide text)
  37. Comparison of results (Base vs. Optimized vs. Policy). Real systems: dynamic + stochastic (exogenous inputs / system internals). Optimization: optimal fixed input parameters. Policy: optimal (or near-optimal) decisions over time.
  38. Reinforcement learning decision points: Hyperparameters, Observation Space, Action Space, Reward
  39. How are learned policies used? Trained policies can be deployed on all types of devices and equipment to complete tasks adaptively and autonomously. Edge devices could be used as controllers to deploy the learned policies.
  40. Machine Learning powered by Skymind: http://www.skymind.ai/anylogic
  41. What should simulation modelers know about this new application? • The great news for simulation modelers is that their skills now have a new and exciting application! • To implement reinforcement learning (or DRL), a team of DRL expert(s) and simulation modeler(s) can collaborate. In theory, neither group needs in-depth knowledge of the other group’s tasks. • When developing simulation models that will be used as training environments, the stakes are higher because the human buffer is no longer there.
  42. Can simulation modelers’ jobs be replaced with AI too? At least in the near future, there is NO way to automate the process of abstracting reality into a simulation model, because it has two aspects that [current] machines are not good at: • The process of abstracting reality is an art • Simulation models are fundamentally based on uncovering causality and how something works
  43. Thank you!
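
The Q-learning slide and the traffic light example (Action == 0: do nothing, Action == 1: change the phase) can be tied together with a small sketch. This is not the talk's implementation: the talk trains a neural network (10 inputs, two hidden layers of 300, 2 outputs) through RL4J inside an AnyLogic custom experiment, whereas the code below swaps in a plain lookup table for Q(s, a) and a made-up queue model with no yellow phase. The class name, the arrival probability of 0.3, the queue cap, the reward (negative number of waiting cars), and all hyperparameter values are assumptions chosen only for illustration.

```java
import java.util.Random;

/** Toy two-phase traffic light with tabular Q-learning (illustrative, not the talk's model). */
class TrafficLightQLearning {
    static final int MAX_QUEUE = 5; // assumed cap on queue length so the Q-table stays small
    static final int ACTIONS = 2;   // 0 = do nothing, 1 = change the traffic light phase

    int nsQueue, ewQueue;           // cars waiting north-south and east-west
    int greenPhase;                 // 0 = north-south is green, 1 = east-west is green
    final Random rng;

    TrafficLightQLearning(long seed) { rng = new Random(seed); }

    void reset() { nsQueue = 0; ewQueue = 0; greenPhase = 0; }

    static int stateCount() { return (MAX_QUEUE + 1) * (MAX_QUEUE + 1) * 2; }

    /** Encodes (nsQueue, ewQueue, greenPhase) as one row index of the Q-table. */
    int stateIndex() { return (nsQueue * (MAX_QUEUE + 1) + ewQueue) * 2 + greenPhase; }

    /** Applies the action, advances one time step, and returns the reward. */
    double step(int action) {
        if (action == 1) greenPhase = 1 - greenPhase;
        // The direction with the green light discharges one car per step.
        if (greenPhase == 0) nsQueue = Math.max(0, nsQueue - 1);
        else ewQueue = Math.max(0, ewQueue - 1);
        // Assumed arrival process: each direction gains a car with probability 0.3.
        if (rng.nextDouble() < 0.3) nsQueue = Math.min(MAX_QUEUE, nsQueue + 1);
        if (rng.nextDouble() < 0.3) ewQueue = Math.min(MAX_QUEUE, ewQueue + 1);
        return -(nsQueue + ewQueue); // fewer waiting cars means a higher reward
    }

    /** Returns the action with the larger Q-value (ties go to "do nothing"). */
    static int greedy(double[] row) { return row[1] > row[0] ? 1 : 0; }

    /** Epsilon-greedy tabular Q-learning; returns the learned table Q[state][action]. */
    static double[][] train(int episodes, int stepsPerEpisode, long seed) {
        double alpha = 0.1, gamma = 0.95, epsilon = 0.1; // assumed hyperparameters
        double[][] q = new double[stateCount()][ACTIONS];
        TrafficLightQLearning env = new TrafficLightQLearning(seed);
        for (int e = 0; e < episodes; e++) {
            env.reset();
            int s = env.stateIndex();
            for (int t = 0; t < stepsPerEpisode; t++) {
                int a = env.rng.nextDouble() < epsilon
                        ? env.rng.nextInt(ACTIONS)
                        : greedy(q[s]);
                double r = env.step(a);
                int s2 = env.stateIndex();
                // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
                q[s][a] += alpha * (r + gamma * Math.max(q[s2][0], q[s2][1]) - q[s][a]);
                s = s2;
            }
        }
        return q;
    }

    public static void main(String[] args) {
        double[][] q = train(2000, 200, 42L);
        TrafficLightQLearning probe = new TrafficLightQLearning(0L);
        probe.nsQueue = MAX_QUEUE; probe.ewQueue = 0; probe.greenPhase = 1;
        System.out.println("NS queue full, EW green -> action "
                + greedy(q[probe.stateIndex()]));
    }
}
```

In this sketch the simulation plays the role of the reinforcement learning environment, as on the "Simulation as the reinforcement learning environment" slide: `step` stands in for advancing the simulation model, and the learned table is the policy that could later be queried at each decision point. Replacing the table with a neural network is what makes the approach scale to larger observation spaces such as the 10-element array in the talk.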
May 16, 2019, #TechOn東京