An Overview of Reinforcement Learning
with Tic-Tac-Toe and Python
Contents
Introduction to Machine Learning
Software used in this Project
Algorithm used in this Project
Conclusion
Introduction to Machine Learning
Introduction
• Machine learning (ML) is a branch of artificial intelligence (AI) that
allows software applications to become more accurate at predicting
outcomes without being explicitly programmed to do so
• Classical machine learning is often categorized by how an algorithm
learns to become more accurate in its predictions
Types of Machine Learning
• Supervised learning: In this type of machine learning, data scientists supply algorithms
with labeled training data and define the variables they want the algorithm to assess for
correlations. Both the input and the output of the algorithm are specified.
• Unsupervised learning: This type of machine learning involves algorithms that train on
unlabeled data. The algorithm scans through datasets looking for any meaningful
connection. Neither the data that the algorithms train on nor the predictions or
recommendations they output are predetermined.
• Reinforcement learning: works by programming an algorithm with a distinct goal and a
prescribed set of rules for accomplishing that goal. Data scientists also program the
algorithm to seek positive rewards, which it receives when it performs an action that is
beneficial toward the ultimate goal, and to avoid punishments, which it receives when it
performs an action that moves it farther away from its ultimate goal.
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Software used in this Project
Python Programming Language
• Python is a high-level, general-purpose, and very popular
programming language. Python (currently Python 3) is widely used in
web development and machine learning applications.
• Python was used to code this project because of its simplicity and the
available modules that make the project easy to develop
• The NumPy module is used for matrix transformations
• The Pygame module is used to build the user interface
Algorithm used in this Project
Representing the Game State Squares as a
Matrix and a Hash
• A game state like the one shown below is represented by the matrix
[[ 1,0,0 ], [ 0,1,0 ], [ 0,2,2 ] ]. This 2D array is an instance of our game
state.
• In the 2D array, X is represented by 1, O is represented by 2, and an
empty square is represented by 0
• The state matrix is then iterated over and its values are concatenated
into a Python string, which serves as the hash of the current state; the
hash for the state shown is “100010022”
[[1,0,0],
[0,1,0],
[0,2,2]]
“100010022”
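The matrix-to-hash conversion described above can be sketched in a few lines (the function name is illustrative, not taken from the project):

```python
# Convert a 3x3 game-state matrix into its hash string by
# concatenating the cell values row by row.
def state_to_hash(state):
    return "".join(str(cell) for row in state for cell in row)

state = [[1, 0, 0],
         [0, 1, 0],
         [0, 2, 2]]

print(state_to_hash(state))  # → "100010022"
```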
(Board diagrams; the labels s = 0 … s = 6 index the board's symmetry transforms.)
Generating Symmetrical States
Rotating the canonical state “100000000” yields the symmetrical states
“001000000”, “000000100”, and “000000001”, each tagged with a symmetry
index (s:0, s:2, s:1).
(Board diagrams.) When a new game reaches a state whose hash is a known
symmetry of a stored state, the link entry maps it back to the canonical
state. For example, the entry
{“000000001”: { “hash”:”100000000”, “s”:1 }}
says that state “000000001” is the s = 1 rotation of “100000000”, so the
stored best move (2,2) for “100000000” maps to the best move (0,0) in the
rotated state.
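Generating the rotated variants of a state can be sketched with NumPy; the deck also indexes reflections, but as a minimal sketch only the three non-trivial rotations are shown here:

```python
import numpy as np

def state_to_hash(state):
    # Flatten row by row and concatenate the cell values.
    return "".join(str(cell) for cell in np.asarray(state).flat)

def rotations(state):
    board = np.asarray(state)
    # k = 1, 2, 3 quarter-turns; np.rot90 rotates counter-clockwise.
    return [np.rot90(board, k) for k in (1, 2, 3)]

start = [[1, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]
for rot in rotations(start):
    print(state_to_hash(rot))
# prints "000000100", "000000001", "001000000"
```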
The available moves for a state are stored with values, initially 1:
{ “100000000” : { (1,0):1, (1,1):1, (2,0):1, (2,1):1, (2,2):1 } }
Probability Distribution: with every move valued 1, each of the five moves
is chosen with probability 0.2.
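The slides do not spell out how move values become probabilities. One weighting scheme consistent with the charts shown (uniform when all values are equal, near-zero probability for negative-valued moves, and a uniform fallback when every value is negative) is this sketch:

```python
EPS = 1e-7  # tiny floor so negative-valued moves are almost never picked

def move_probabilities(moves):
    # Clip each value at a small positive floor, then normalise.
    weights = {m: max(v, EPS) for m, v in moves.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

moves = {(1, 0): 1, (1, 1): 1, (2, 0): 1, (2, 1): 1, (2, 2): 1}
print(move_probabilities(moves))  # each move gets probability 0.2
```

With values {(1,0): -1, (1,1): 3, …} the same function concentrates nearly all of the probability mass on (1,1), matching the later charts.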
How is game data stored
game_data ( dict ) contains:
• “no_of_games” ( int ): 8
• “game_space_link” ( dict ): maps symmetrical hashes to their canonical hash and symmetry index, e.g.
“001000000”: { “hash”:”100000000” , ”s”:0 },
“000000001”: { “hash”:”100000000” , ”s”:1 },
“000000100”: { “hash”:”100000000” , ”s”:2 }
• “game_space” ( dict ): maps each canonical hash to its moves and values, e.g.
“100000000”:{ (0,1):1, (1,1):1, (1,2):1, (2,1):1, (2,2):1 },
“100020001”:{ (1,0):1, (2,0):1, (2,1):1 },
“120000100”:{…}
game_stack ( list ): empty between games
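The containers named above can be initialised for a fresh run as a minimal sketch (key names follow the slides; initial values are assumptions):

```python
# The three learned containers plus the per-game stack.
game_data = {
    "no_of_games": 0,       # games played so far
    "game_space_link": {},  # symmetric hash -> {"hash": canonical, "s": index}
    "game_space": {},       # canonical hash -> {move (row, col): value}
}
game_stack = []             # (hash, move) records for the game in progress
```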
Before consulting the learned values, the bot checks the current state matrix
for a winning move; if one is found, it is played. If no winning move is
available, the bot checks whether the opponent has a winning move on their
next turn; if so, that square is played to block it, so it is no longer
available to the opponent.
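A minimal sketch of this rule-based check, assuming 1 marks the bot and 2 the opponent (the helper names are illustrative, not from the project):

```python
def is_winner(state, p):
    # All eight winning lines: rows, columns, and both diagonals.
    lines = [[(i, j) for j in range(3)] for i in range(3)]    # rows
    lines += [[(i, j) for i in range(3)] for j in range(3)]   # columns
    lines += [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
    return any(all(state[r][c] == p for r, c in line) for line in lines)

def find_winning_move(state, player):
    # Try each empty square and see if it completes a line.
    for r in range(3):
        for c in range(3):
            if state[r][c] == 0:
                state[r][c] = player
                won = is_winner(state, player)
                state[r][c] = 0
                if won:
                    return (r, c)
    return None

def forced_move(state, bot=1, opponent=2):
    # Win if possible, otherwise block the opponent's win.
    return find_winning_move(state, bot) or find_winning_move(state, opponent)
```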
Example of a played game
The first player makes the first move at (0,0) of the matrix, and a matrix
and a hash for the current state are generated:
[[1,0,0],
[0,0,0],
[0,0,0]]
“100000000”
With game_space_link and game_space both empty at this point, the lookup proceeds:
• Check if the current hash “100000000” is present in game_space_link; it is not.
• Check if the current state is present in game_space; it is not.
• Check if any symmetrical state of the current hash is present in game_space; none is.
• So the current hash is stored in game_space with its available moves as values:
“100000000”:{ (0,1):1, (1,1):1, (1,2):1, (2,1):1, (2,2):1 }
Now, using the probability distribution over the move values, the best move is chosen.
The computer chooses move (2,2) with probability 1/5. This move and its
previous state hash are then pushed onto the game stack:
game_stack ( list ): [ {“hash”:“100000000”, “move”: (2,2)} ]
The second move is played at (2,0), so the bot has no choice but to play
(1,0) as per the blocking rule.
Move (1,0) is played by the bot as a response.
The third move is played at (0,2), so the bot must choose either (0,1) or
(1,1) as per the blocking rule.
Move (0,1) is played by the bot as a response.
The fourth move is played at (1,1) and the game is over; player one wins.
Values from game_stack are now popped and used to update the move values in game_space.
game_stack: {“hash”:“100000000”, “move”: (2,2)}
game_space: “100000000”:{ (0,1):1, (1,1):1, (1,2):1, (2,1):1, (2,2):1 }
The first value popped from game_stack is {“hash”:”100000000”, “move”: (2,2)}. As the
game was lost by the bot, the value of move (2,2) under “100000000” in game_space is
reduced by 2.
The stack is now empty, so updating values is done.
game_stack: empty
game_space: “100000000”:{ (0,1):1, (1,1):1, (1,2):1, (2,1):1, (2,2):-1 }
The value at (2,2) has changed from 1 to -1.
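The update loop above can be sketched as follows. The slides only specify the penalty for a loss (subtract 2); the +2 reward for a win used here is an assumption of this sketch:

```python
def update_values(game_stack, game_space, result):
    # "loss" subtracts 2 from each move played this game, as on the
    # slides; the +2 win reward is an assumption of this sketch.
    delta = -2 if result == "loss" else 2
    while game_stack:
        entry = game_stack.pop()
        game_space[entry["hash"]][entry["move"]] += delta

game_space = {"100000000": {(0, 1): 1, (1, 1): 1, (1, 2): 1,
                            (2, 1): 1, (2, 2): 1}}
game_stack = [{"hash": "100000000", "move": (2, 2)}]
update_values(game_stack, game_space, "loss")
print(game_space["100000000"][(2, 2)])  # → -1
```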
game_space_link is still empty at this point.
After some games have been played, game_data ( dict ) contains:
• “no_of_games” ( int ): 8
• “game_space_link” ( dict ):
“001000000”: { “hash”:”100000000” , ”s”:0 },
“000000001”: { “hash”:”100000000” , ”s”:1 },
“000000100”: { “hash”:”100000000” , ”s”:2 }
• “game_space” ( dict ):
“100000000”:{ (0,1):-1, (1,1):5, (1,2):-1, (2,1):-1, (2,2):-1 },
“100020001”:{ (1,0):3, (2,0):-1, (2,1):3 },
“120000100”:{…}
This data is saved as a pickle file and loaded whenever a new game is played.
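Persisting the learned data between sessions can be sketched with the standard pickle module (the file name and fallback shape are illustrative):

```python
import pickle

def save_game_data(game_data, path="game_data.pickle"):
    # Serialise the whole game_data dict to disk.
    with open(path, "wb") as f:
        pickle.dump(game_data, f)

def load_game_data(path="game_data.pickle"):
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        # First run: start with empty containers.
        return {"no_of_games": 0, "game_space_link": {}, "game_space": {}}
```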
{ “100000000” : { (1,0):-1, (1,1):3, (2,0):-1, (2,1):-1, (2,2):-1 } }
Probability Distribution: (1,0): 0.0000001, (1,1): 0.9999996, (2,0): 0.0000001,
(2,1): 0.0000001, (2,2): 0.0000001; almost all of the probability mass moves
to the single positive-valued move (1,1).
{ “100000000” : { (1,0):-1, (1,1):3, (2,0):3, (2,1):-1, (2,2):-1 } }
Probability Distribution: (1,0): 0.0000001, (1,1): 0.4999998, (2,0): 0.4999998,
(2,1): 0.0000001, (2,2): 0.0000001; the two positive-valued moves share the
probability mass equally.
{ “100000000” : { (1,0):-1, (1,1):-1, (2,0):-1, (2,1):-1, (2,2):-1 } }
Probability Distribution: (1,0): 0.2, (1,1): 0.2, (2,0): 0.2, (2,1): 0.2,
(2,2): 0.2; when every move has a negative value, the distribution falls
back to uniform.
Conclusion
This project focuses on developing a bot that learns to play tic-tac-toe
better by gaining experience from play, just as a human would. With minimal
training, the bot can learn not to lose any game when playing second against
a human opponent. The bot can identify a losing pattern after losing just
once and thereafter avoids that particular move.
Thank You