Self-learning Game Playing
Java Summit
Richard Abbuhl / Freelancer / ING
iSense – 16 May 2018
iSense Java Summit
Agenda
1. Introduction – Richard
2. Preface
3. Self-playing ingredients
4. It’s All About The Data
5. Neural Networks
6. Reinforcement Learning
7. Monte-Carlo Tree Search
8. Questions?
Introduction - Richard
• Java / Microservices back-end, Angular / Polymer front-end, Java, JavaScript, (C, C++, Ada)
• Java / Machine Learning (Education/Interest), https://github.com/richardabbuhl/jmentor (back-propagation and reinforcement learning)
• Love the hype.
• What’s AI done for you?
Preface
Who is on the top right?
Who is on the bottom right?
What do they have in common with machine learning?
Basic Ingredients
Machine Learning Game-Playing:
- Rules of the game
- Data
- Machine Learning Algorithm
- Search algorithm
ML: It’s All About The Data
Early business value
• Customer databases
Current business value
• Big data
• Data warehouses
• Data lakes
Note: ETL (Extract, Transform, and Load)
ML: It’s All About The Data
Data set of 30 million moves played by human experts (available at the KGS Go server)
ML: It’s All About The Data
What’s wrong with the data?
Errors in Data
Missing Data
Skewed Data
Incomplete Data
AlphaGo: predicted human moves 57% of the time
It’s All About The Data (Synthetic Data)
Machine learning algorithms need lots of data (AG: 30M+?, TTT: sample)
Big data is expensive?
Alternatives?
Synthetic Data Set
Approach one: generate a data set
The design goal is to create realistic data
Easy or not?
Synthetic Data Set / Self-Playing
Approach two:
Set Machine Learning weights to initial state
For N times:
    Play a game:
        Player one moves (either ML algorithm or random)
        Adjust weights
        Player two moves (either ML algorithm or random)
        Adjust weights
    until done
Random move = synthetic data (anneal over time)
AlphaGo Zero: no human data needed any more (synthetic / self-playing)
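The self-play loop can be sketched in Java roughly as follows. This is a minimal sketch under assumptions, not the jmentor implementation: the `Learner` interface, the linear annealing schedule in `randomMoveProbability`, and the choice of tic-tac-toe as the game are all hypothetical names and choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of the self-play loop; the Learner interface is a hypothetical stand-in. */
final class SelfPlaySketch {
    /** Hypothetical learner: picks a legal move and adjusts its weights after each move. */
    interface Learner {
        int chooseMove(int[] board, int player);
        void adjustWeights(int[] board, int player);
    }

    private static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6}, {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {2, 4, 6}
    };

    /** Assumed linear schedule: 1.0 (all random) at game 0, falling towards 0.0 at totalGames. */
    static double randomMoveProbability(int game, int totalGames) {
        return 1.0 - (double) game / totalGames;
    }

    static boolean wins(int[] board, int player) {
        for (int[] line : LINES) {
            if (board[line[0]] == player && board[line[1]] == player && board[line[2]] == player) {
                return true;
            }
        }
        return false;
    }

    static int randomEmptyCell(int[] board, Random rng) {
        List<Integer> empty = new ArrayList<>();
        for (int i = 0; i < board.length; i++) {
            if (board[i] == 0) empty.add(i);
        }
        return empty.get(rng.nextInt(empty.size()));
    }

    /** Plays one tic-tac-toe game; returns the winner (1 or 2) or 0 for a draw. */
    static int playGame(Learner learner, double epsilon, Random rng) {
        int[] board = new int[9]; // 0 = empty, 1 = player one, 2 = player two
        int player = 1;
        for (int turn = 0; turn < 9; turn++) {
            // Either a random move (synthetic data) or a move from the learner.
            int move = rng.nextDouble() < epsilon
                    ? randomEmptyCell(board, rng)
                    : learner.chooseMove(board, player);
            board[move] = player;
            learner.adjustWeights(board, player);
            if (wins(board, player)) return player;
            player = 3 - player; // switch between player 1 and player 2
        }
        return 0;
    }

    /** "For N times: play a game", with the random-move probability annealed per game. */
    static int[] selfPlay(Learner learner, int games, long seed) {
        Random rng = new Random(seed);
        int[] outcomes = new int[3]; // counts of draws, player-one wins, player-two wins
        for (int g = 0; g < games; g++) {
            outcomes[playGame(learner, randomMoveProbability(g, games), rng)]++;
        }
        return outcomes;
    }
}
```

Early games are almost entirely random, which is where the synthetic data comes from; later games lean on whatever the learner has picked up so far.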
Neural Networks
Basics:
• Most Deep Learning is based on back-propagation (1986), which is used to train a neural network to recognize patterns.
• Training is done by presenting two-member sets of patterns to the network:
• Ki = {Ai, Bi}, i = 0, …, p – 1
• Where
• Ai = {Xi,0, …, Xi,n-1} (the n input values)
• Bi = {Yi,0, …, Yi,m-1} (the m desired output values)
Neural Networks
Example:
• For the XOR problem the network is:
• 2 inputs, 8 hidden, 1 output
• A training set is defined as (two inputs followed by the desired output):
• 0.0 0.0 0.9
• 0.0 1.0 -0.9
• 1.0 0.0 -0.9
• 1.0 1.0 0.9
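As an illustration, the pairs Ki = {Ai, Bi} and the four rows listed for the XOR example could be held in Java like this. The class and field names are hypothetical, and the target values are copied from the slide as given.

```java
/** Sketch of training pairs Ki = {Ai, Bi}; names are illustrative only. */
final class TrainingPairs {
    /** One pair: input pattern Ai (length n) and desired output pattern Bi (length m). */
    static final class Pair {
        final double[] input;
        final double[] target;

        Pair(double[] input, double[] target) {
            this.input = input;
            this.target = target;
        }
    }

    /** The four rows from the slide: two inputs followed by one desired output. */
    static Pair[] xorSet() {
        return new Pair[] {
            new Pair(new double[] {0.0, 0.0}, new double[] {0.9}),
            new Pair(new double[] {0.0, 1.0}, new double[] {-0.9}),
            new Pair(new double[] {1.0, 0.0}, new double[] {-0.9}),
            new Pair(new double[] {1.0, 1.0}, new double[] {0.9}),
        };
    }
}
```

Here p = 4, n = 2 and m = 1, which matches the 2 inputs and 1 output of the network.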
Neural Networks
Example:
• For the TTT problem the network is:
• 27 inputs, 48 hidden, 1 output
• A training set is defined as (27 inputs followed by the desired output):
• 0.0 0.0 0.9 0.0 0.9 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.540
• 0.0 0.9 0.0 0.0 0.0 0.9 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.460
• 0.0 0.0 0.9 0.9 0.0 0.0 0.0 0.9 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.500
• 0.9 0.0 0.0 0.0 0.0 0.9 0.0 0.9 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.460
• 0.0 0.9 0.0 0.9 0.0 0.0 0.0 0.0 0.9 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.9 0.0 0.0 0.500
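The 27 inputs look like three values per board cell with exactly one of them set to 0.9. A minimal encoder under that reading is sketched below. The meaning of each position in a triple (empty, X, O) is a guess from the sample rows, not something the slides state, and `BoardEncoder` is a hypothetical name.

```java
/** Sketch: encodes a 3x3 board as 27 inputs, three per cell. The slot order is an assumption. */
final class BoardEncoder {
    static final double ON = 0.9;
    static final double OFF = 0.0;

    /**
     * board[i] is 0 (empty), 1 (X) or 2 (O).
     * Assumed layout per cell: slot 0 = empty, slot 1 = X, slot 2 = O.
     */
    static double[] encode(int[] board) {
        double[] inputs = new double[27];
        for (int cell = 0; cell < 9; cell++) {
            for (int slot = 0; slot < 3; slot++) {
                inputs[cell * 3 + slot] = (board[cell] == slot) ? ON : OFF;
            }
        }
        return inputs;
    }
}
```

With this layout, a board whose first cell holds O, whose second cell holds X, and whose other cells are empty encodes to the 27 input values of the first sample row.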
Neural Networks
Basics:
• Implemented using a feed-forward multi-layer neural network
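A forward pass through such a network can be sketched as below. The tanh activation and the weight layout (bias stored in the last column of each row) are assumptions chosen for illustration; the slides do not say which activation is used.

```java
/** Sketch of a feed-forward pass with one hidden layer; tanh activation is an assumption. */
final class FeedForward {
    /**
     * Computes one layer. weights[j][i] connects input i to unit j;
     * weights[j][inputs.length] is the bias (threshold) of unit j.
     */
    static double[] layer(double[][] weights, double[] inputs) {
        double[] out = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = weights[j][inputs.length]; // start from the bias
            for (int i = 0; i < inputs.length; i++) {
                sum += weights[j][i] * inputs[i];
            }
            out[j] = Math.tanh(sum);
        }
        return out;
    }

    /** Input layer -> hidden layer -> output layer. */
    static double[] forward(double[][] hidden, double[][] output, double[] inputs) {
        return layer(output, layer(hidden, inputs));
    }
}
```

For the XOR network above, `hidden` would be 8 rows of 3 values and `output` 1 row of 9 values.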
Neural Networks
Basics:
• Training is done as follows:
1. Initialize the weights and thresholds
2. Present training set Ki to the network
3. Calculate the forward pass of the network
4. Compare the actual output with the desired output
5. Adapt the weights
6. Calculate the error for the training set
7. Repeat by going to step 2 (*)
(*) Training stops when the error for all training sets is less than 0.01 (generalize).
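The training steps can be sketched as a small back-propagation loop. This is a sketch under assumptions rather than the jmentor code: it uses tanh units, online (per-pattern) updates, a summed squared error as the stopping measure, and a hypothetical class name `BackpropSketch`.

```java
import java.util.Random;

/** Sketch of back-propagation for an nIn - nHid - 1 network with tanh units (assumed). */
final class BackpropSketch {
    final int nIn, nHid;
    final double[][] wHid; // [nHid][nIn + 1], last column is the bias
    final double[] wOut;   // [nHid + 1], last entry is the bias
    final double rate;     // learning rate

    BackpropSketch(int nIn, int nHid, double rate, long seed) {
        this.nIn = nIn; this.nHid = nHid; this.rate = rate;
        wHid = new double[nHid][nIn + 1];
        wOut = new double[nHid + 1];
        Random rng = new Random(seed);
        // Step 1: initialize the weights and thresholds with small random values.
        for (double[] row : wHid) for (int i = 0; i <= nIn; i++) row[i] = rng.nextDouble() - 0.5;
        for (int j = 0; j <= nHid; j++) wOut[j] = rng.nextDouble() - 0.5;
    }

    double[] hidden(double[] x) {
        double[] h = new double[nHid];
        for (int j = 0; j < nHid; j++) {
            double sum = wHid[j][nIn];
            for (int i = 0; i < nIn; i++) sum += wHid[j][i] * x[i];
            h[j] = Math.tanh(sum);
        }
        return h;
    }

    double output(double[] h) {
        double sum = wOut[nHid];
        for (int j = 0; j < nHid; j++) sum += wOut[j] * h[j];
        return Math.tanh(sum);
    }

    double predict(double[] x) { return output(hidden(x)); }

    /** Steps 2 to 5 for one pattern; returns the squared error before the update. */
    double trainPattern(double[] x, double target) {
        double[] h = hidden(x);                 // step 3: forward pass
        double y = output(h);
        double err = target - y;                // step 4: compare with the desired output
        double deltaOut = err * (1 - y * y);    // tanh'(z) = 1 - tanh(z)^2
        for (int j = 0; j < nHid; j++) {        // step 5: adapt the weights
            double deltaHid = deltaOut * wOut[j] * (1 - h[j] * h[j]); // uses the old wOut[j]
            wOut[j] += rate * deltaOut * h[j];
            for (int i = 0; i < nIn; i++) wHid[j][i] += rate * deltaHid * x[i];
            wHid[j][nIn] += rate * deltaHid;
        }
        wOut[nHid] += rate * deltaOut;
        return err * err;
    }

    double totalError(double[][] xs, double[] ts) {
        double total = 0;
        for (int p = 0; p < xs.length; p++) {
            double e = ts[p] - predict(xs[p]);
            total += e * e;
        }
        return total;
    }

    /** Steps 6 and 7: repeat until the summed error drops below the threshold. */
    int train(double[][] xs, double[] ts, double threshold, int maxEpochs) {
        for (int epoch = 1; epoch <= maxEpochs; epoch++) {
            double total = 0;
            for (int p = 0; p < xs.length; p++) total += trainPattern(xs[p], ts[p]);
            if (total < threshold) return epoch;
        }
        return maxEpochs;
    }
}
```

A call such as `new BackpropSketch(2, 8, 0.1, 42L).train(xs, ts, 0.01, 5000)` would run it on the XOR rows; the learning rate, seed and epoch cap are arbitrary values for the example.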
Reinforcement Learning
The term Reinforcement Learning was originally coined by Minsky (1961).
If an action taken by a learning system is followed by a satisfactory state of affairs, then the tendency of the system to produce that particular action is strengthened or reinforced.
Otherwise, the tendency of the system to produce that action is weakened.
(Sutton et al., 1991, Barto 1992)
Reinforcement Learning
RL differs from supervised learning, where learning is done from examples provided by a knowledgeable external supervisor.
RL attempts to learn from its own experience and has four parts:
• Policy: defines the learning agent's way of behaving at a given time,
• Reward function: defines the goal of the RL problem,
• Value function: defines what is good in the long run,
• Model: mimics the behavior of the environment
Reinforcement Learning
Policy:
• Rule which tells the player which move to make for every state of the game
Values:
• First, set up a table of numbers, one for each state of the game
• Each number is the probability of winning from the state
Reinforcement Learning
We play many games against our opponent:
• We examine states which result from each possible move
• We look up their current values in the table
Most of the time:
• We move greedily and select the move which has the highest probability of winning
• However, sometimes we randomly select from other moves
21
When we are playing:
• We adjust the states using the temporal difference:
• V(s1) = V(s1) + alpha [V(s2) – V(s1)]
• s1 is the state before the greedy move
• s2 is the state after the move
• Alpha is the step-size parameter which is the rate of learning
Number of states for Tic-Tac-Toe: 3 ^ 9 = 19,683
Number of states for Backgammon: 10 ^ 20 = 100,000,000,000,000,000,000
https://github.com/suragnair/alpha-zero-general
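The temporal-difference update can be sketched with a simple value table. The string state keys, the default value of 0.5 for unseen states, and the class name are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a state-value table with the update V(s1) = V(s1) + alpha [V(s2) - V(s1)]. */
final class TdValueTable {
    private final Map<String, Double> values = new HashMap<>();
    private final double alpha; // step-size parameter (rate of learning)

    TdValueTable(double alpha) {
        this.alpha = alpha;
    }

    /** Unseen states start at 0.5 (an assumed neutral estimate of the winning probability). */
    double value(String state) {
        return values.getOrDefault(state, 0.5);
    }

    /** Sets a known value, e.g. 1.0 for a won position or 0.0 for a lost one. */
    void set(String state, double value) {
        values.put(state, value);
    }

    /** Moves V(s1) a fraction alpha of the way towards V(s2). */
    void update(String s1, String s2) {
        double v1 = value(s1);
        values.put(s1, v1 + alpha * (value(s2) - v1));
    }
}
```

With alpha = 0.1, a state valued 0.5 that leads to a won state valued 1.0 moves to about 0.55 after one update.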
Monte-Carlo Tree Search
1. Selection
Starting at root node R, recursively select optimal child nodes (explained below) until a leaf node L is reached.
2. Expansion
If L is not a terminal node (i.e. it does not end the game) then create one or more child nodes and select one C.
3. Simulation
Run a simulated playout from C until a result is achieved.
4. Backpropagation
Update the current move sequence with the simulation result.
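The four steps can be sketched on a toy game. The example below is a minimal sketch under assumptions: it uses a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins) as a hypothetical game, and the UCB1 formula as the selection rule, which is one common choice and not something the slides specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Minimal MCTS sketch on a toy Nim game; the game and the UCB1 rule are assumptions. */
final class MctsSketch {
    static final class Node {
        final int stones;   // stones left in this state
        final int mover;    // player (0 or 1) who moved into this state
        final int move;     // stones taken to reach this state
        final Node parent;
        final List<Node> children = new ArrayList<>();
        int visits;
        double wins;        // playouts won by `mover`

        Node(int stones, int mover, int move, Node parent) {
            this.stones = stones; this.mover = mover; this.move = move; this.parent = parent;
        }
    }

    /** 1. Selection: descend by UCB1 until a node without children is reached. */
    static Node select(Node node) {
        while (!node.children.isEmpty()) {
            Node best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Node c : node.children) {
                double score = c.visits == 0 ? Double.POSITIVE_INFINITY
                        : c.wins / c.visits + Math.sqrt(2 * Math.log(node.visits) / c.visits);
                if (score > bestScore) { bestScore = score; best = c; }
            }
            node = best;
        }
        return node;
    }

    /** 2. Expansion: create all children of a non-terminal leaf and pick one at random. */
    static Node expand(Node leaf, Random rng) {
        if (leaf.stones == 0) return leaf; // terminal: nothing to expand
        for (int take = 1; take <= Math.min(3, leaf.stones); take++) {
            leaf.children.add(new Node(leaf.stones - take, 1 - leaf.mover, take, leaf));
        }
        return leaf.children.get(rng.nextInt(leaf.children.size()));
    }

    /** 3. Simulation: random playout; returns the player who takes the last stone. */
    static int simulate(Node node, Random rng) {
        int stones = node.stones;
        int lastMover = node.mover;
        while (stones > 0) {
            lastMover = 1 - lastMover;
            stones -= 1 + rng.nextInt(Math.min(3, stones));
        }
        return lastMover;
    }

    /** 4. Backpropagation: update every node on the path back to the root. */
    static void backpropagate(Node node, int winner) {
        for (Node n = node; n != null; n = n.parent) {
            n.visits++;
            if (n.mover == winner) n.wins++;
        }
    }

    static Node search(int stones, int iterations, long seed) {
        Random rng = new Random(seed);
        Node root = new Node(stones, 1, 0, null); // player 0 is to move at the root
        for (int i = 0; i < iterations; i++) {
            Node child = expand(select(root), rng);
            backpropagate(child, simulate(child, rng));
        }
        return root;
    }

    /** Returns the move of the most-visited root child. */
    static int bestMove(Node root) {
        Node best = root.children.get(0);
        for (Node c : root.children) if (c.visits > best.visits) best = c;
        return best.move;
    }
}
```

Calling `MctsSketch.bestMove(MctsSketch.search(5, 2000, 7L))` returns the number of stones the search prefers to take from a pile of five; the iteration count and seed are arbitrary.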
Monte-Carlo Tree Search
Monte-Carlo Tree Search (MCTS)
• AlphaGo combines the policy and value networks in an MCTS algorithm that selects actions by lookahead search,
• Note: evaluating the policy and value networks requires several orders of magnitude more computation than traditional search heuristics
Thank you, any questions?
