1) The document traces the evolution from Deep Blue through AlphaGo and AlphaGo Zero to AlphaZero.
2) It explains the key concepts behind AlphaGo, including the policy and value neural networks, and how it was initially trained via supervised learning and then improved with reinforcement learning by playing against itself.
3) It summarizes the differences between AlphaGo Zero, which was trained solely via reinforcement learning without human data, and AlphaZero, which aims to master games without game-specific tuning or data augmentation.
1) AlphaZero was an AI developed by DeepMind that achieved master-level play in chess, shogi, and Go without relying on human data or prior knowledge.
2) It achieved this with a new form of deep reinforcement learning that learns solely from games of self-play, starting from random play.
3) AlphaZero demonstrated superhuman performance in chess, shogi, and Go by defeating the previous champion programs in each game, despite being given no domain knowledge except the game rules.
The document provides an introduction and overview of AlphaGo Zero, including:
- AlphaGo Zero achieved superhuman performance at Go without human data by using self-play reinforcement learning.
- It uses a combined policy-and-value network together with Monte Carlo tree search to select moves. The network is trained on self-play games, using the search probabilities and game outcomes as training targets.
- Experiments showed AlphaGo Zero outperformed previous AlphaGo versions and human-trained networks, and continued improving with deeper networks and more self-play training.
AlphaZero is an AI system created by DeepMind that achieved superhuman ability in the games of chess, shogi, and Go without relying on human data. It uses a new form of deep reinforcement learning combined with Monte Carlo tree search to learn from games generated by self-play. AlphaZero was able to master each game to superhuman level in a matter of hours, defeating the previous world-champion programs in each case. It represents a major advance in self-taught machine learning without human supervision.
AlphaGo Zero is an AI agent created by DeepMind to master the game of Go without human data or expertise. It uses reinforcement learning through self-play with the following key aspects:
1. It uses a single deep neural network that predicts both the next move and the winner of the game from the current board position. This dual network is trained solely through self-play reinforcement learning.
2. The neural network improves the Monte Carlo tree search used to select moves. The search uses the network predictions to guide selection and backup of information during search.
3. Training involves repeated self-play games to generate data, then using this data to update the neural network parameters through gradient descent. The updated network then plays further self-play games, and the cycle repeats.
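The self-play training cycle described in these three points can be sketched in miniature. Everything below is a hypothetical stand-in: `ToyNet` is a lookup table rather than a deep network, the "game" is three fixed moves rather than Go, and `update` nudges stored values rather than running true gradient descent (the real system also refines the move probabilities with MCTS before they become targets).

```python
# Minimal sketch of the AlphaGo Zero-style training loop.
# ToyNet, the toy game, and the update rule are illustrative stand-ins.
class ToyNet:
    def __init__(self):
        self.params = {}

    def predict(self, state):
        # Returns (move_probabilities, value_estimate) for a state.
        return self.params.get(state, ([0.5, 0.5], 0.0))

    def update(self, examples, lr=0.1):
        # Stand-in for a gradient step: move predictions toward targets.
        for state, pi, z in examples:
            p, v = self.predict(state)
            new_p = [a + lr * (b - a) for a, b in zip(p, pi)]
            self.params[state] = (new_p, v + lr * (z - v))

def self_play_game(net):
    """Play one toy 3-move game; record (state, search_probs, outcome)."""
    history, state = [], "start"
    for ply in range(3):
        pi, _ = net.predict(state)      # real MCTS would sharpen these
        move = 0 if pi[0] >= pi[1] else 1
        history.append((state, pi))
        state = state + str(move)
    z = 1.0                              # toy outcome: first player wins
    return [(s, pi, z) for s, pi in history]

def train(net, iterations=5):
    for _ in range(iterations):
        examples = self_play_game(net)   # 1. generate data by self-play
        net.update(examples)             # 2. gradient step on that data
    return net                           # 3. repeat with updated network
```

The value estimate for the opening state drifts toward the observed game outcome over the iterations, which is the essence of the loop.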
J-Fall 2017 - AI Self-learning Game Playing (Richard Abbuhl)
The document discusses AI self-learning game playing, providing an overview of machine learning and reinforcement learning techniques used in game playing such as backpropagation, Q-learning, TD-Gammon, and AlphaGo. It reviews the history of machine learning in game playing from the 1950s to modern implementations, and discusses concepts like weak and strong AI as well as skills needed for the future of employment with advances in AI.
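To make the Q-learning reference above concrete, here is the tabular update rule on a deliberately tiny, made-up environment: a two-state chain where moving "right" from state 0 reaches a terminal reward. The environment, rewards, and hyperparameters are illustrative, not from the talk.

```python
import random

def q_learning(episodes=200, alpha=0.5, gamma=0.9, eps=0.1):
    # Q-table for the single non-terminal state 0 and its two actions.
    Q = {(0, "left"): 0.0, (0, "right"): 0.0}
    rng = random.Random(0)
    for _ in range(episodes):
        # Epsilon-greedy action selection.
        a = "right" if Q[(0, "right")] >= Q[(0, "left")] else "left"
        if rng.random() < eps:
            a = rng.choice(["left", "right"])
        r = 1.0 if a == "right" else 0.0   # reward for reaching the goal
        # TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a));
        # the next state is terminal here, so the bootstrap term is 0.
        Q[(0, a)] += alpha * (r - Q[(0, a)])
    return Q
```

After a few hundred episodes the table reflects that "right" is the rewarding action.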
TensorFlow London 11: Pierre Harvey Richemond 'Trends and Developments in Rei...' (Seldon)
Speaker: Pierre Harvey Richemond, PhD student at the Data Science Institute, Imperial College
After a successful career in quantitative finance, Pierre is researching deep learning and reinforcement learning at the Data Science Institute. He holds several degrees in mathematics and engineering.
Abstract:
In this high-level talk, he will go through the most recent significant developments in the theory of reinforcement learning. Topics range from soft Q-learning to proximal policy optimization and the Monte Carlo tree search used in AlphaGo Zero. He will discuss strategies to implement these methods in TensorFlow, combine and replicate them in practice, and highlight connections with related fields such as convex optimization and optimal transport.
Thanks to all TensorFlow London meetup organisers and supporters:
Seldon.io
Altoros
Rewired
Google Developers
Rise London
Adversarial search is an algorithm used in game playing to plan ahead when other agents are planning against you. The minimax algorithm determines the optimal strategy by assuming the opponent will make the best counter-move. It searches the game tree to find the move with the highest minimum payoff. α-β pruning improves on minimax by pruning branches that cannot affect the choice of move. State-of-the-art game programs use techniques like precomputed databases, deep search trees, and pattern knowledge bases to defeat human champions at games like checkers, chess, and Othello.
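The minimax-with-pruning idea described above reads in code as follows; the four-leaf game tree and its scores are invented purely for illustration.

```python
import math

# An explicit toy game tree (real programs generate moves on the fly).
TREE = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1", "B2"],
}
LEAF_SCORES = {"A1": 3, "A2": 5, "B1": 2, "B2": 9}

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning over the toy tree."""
    if node in LEAF_SCORES:
        return LEAF_SCORES[node]
    if maximizing:
        best = -math.inf
        for child in TREE[node]:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:   # opponent will never allow this branch
                break           # prune the remaining children
        return best
    best = math.inf
    for child in TREE[node]:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best
```

On this tree, once branch B's first leaf scores 2 (worse than the 3 already guaranteed under A), leaf B2 is pruned without ever being evaluated.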
The document discusses how AlphaGo, a computer program developed by DeepMind, was able to defeat world champion Lee Sedol at the game of Go. It achieved this through a combination of deep learning and tree search techniques. Four deep neural networks were used: three convolutional networks to reduce the action space and search depth through imitation learning, self-play reinforcement learning, and value prediction; and a smaller network for faster simulations. This combination of deep learning and search allowed AlphaGo to master the complex game of Go, demonstrating the capabilities of modern AI.
Devoxx 2017 - AI Self-learning Game Playing (Richard Abbuhl)
This document provides an overview of the history of AI self-learning game playing and machine learning. It discusses early work using search trees and perceptrons in the 1950s-1970s. Reinforcement learning techniques like TD-Gammon and Q-Learning are explained. Landmark projects including Deep Blue, AlphaGo, and AlphaGo Zero using neural networks and reinforcement learning to master challenging games like chess and Go are summarized. The document provides high-level descriptions of machine learning basics and techniques demonstrated through examples like Tic-Tac-Toe.
AlphaGo uses a novel combination of Monte Carlo tree search and neural networks to master the game of Go. It trains two neural networks - a policy network to predict expert moves and a value network to evaluate board positions. During gameplay, AlphaGo runs multiple Monte Carlo tree simulations that use the neural networks to guide search and evaluate positions. The move selected is the one most frequently visited after all simulations. This approach allowed AlphaGo to defeat world champion Lee Sedol 4-1, achieving a milestone in artificial intelligence.
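A miniature PUCT-style search in the spirit of that description might look like this. The uniform prior and the noisy value signal are hypothetical stand-ins for the policy-network and value-network outputs; only the search skeleton (visit counts, exploration bonus, most-visited move) follows the scheme above.

```python
import math
import random

def mcts_choose_move(moves, simulations=200, c_puct=1.0, seed=0):
    """One-ply Monte Carlo tree search sketch with network-style guidance."""
    rng = random.Random(seed)
    N = {m: 0 for m in moves}          # visit counts
    W = {m: 0.0 for m in moves}        # accumulated value
    prior = 1.0 / len(moves)           # uniform policy prior (stand-in)
    # Fake "true" move values so the sketch is runnable end to end.
    true_value = {m: i / len(moves) for i, m in enumerate(moves)}
    for _ in range(simulations):
        total = sum(N.values()) + 1
        def score(m):
            # PUCT: exploit mean value, explore in proportion to the prior.
            q = W[m] / N[m] if N[m] else 0.0
            return q + c_puct * prior * math.sqrt(total) / (1 + N[m])
        m = max(moves, key=score)
        # A real value network's evaluation, faked with a noisy signal.
        W[m] += true_value[m] + rng.uniform(-0.1, 0.1)
        N[m] += 1
    # As described above: play the most-visited move after all simulations.
    return max(moves, key=lambda m: N[m])
```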
AlphaZero: A General Reinforcement Learning Algorithm that Masters Chess, Sho... (Joonhyung Lee)
An introduction to DeepMind's newest board-game playing AI, AlphaZero.
I have improved significantly on my previous presentation in https://www.slideshare.net/ssuserc416e2/alphago-zero-mastering-the-game-of-go-without-human-knowledge, which had several errors (some rather glaring, such as the temperature equation for simulated annealing). Also, DeepMind released far more details in their new Science paper for AlphaZero.
One comment I would like to add is that the AlphaGo Zero used for comparison in this paper is a very weak version, not the final version. Thus, AlphaGo Zero is still SOTA for Go.
1. Game playing is an important domain for artificial intelligence research as games provide formal reasoning problems that allow direct comparison between computer programs and humans.
2. Alpha-beta pruning can speed up minimax search in game trees by pruning branches that cannot alter the outcome. It works by maintaining lower and upper bounds on the score.
3. Evaluating leaf nodes is challenging. For chess, linear evaluation functions combining weighted features like material and position are commonly used, and reinforcement learning can help tune the weights.
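A linear evaluation function of the kind described in point 3 can be sketched as below; the features, weights, and the TD-style tuning step are all illustrative, not taken from any real chess engine.

```python
# Hand-crafted features with illustrative weights (not a real engine's).
FEATURE_WEIGHTS = {
    "material_balance": 1.0,   # pawns of material up (+) or down (-)
    "mobility": 0.1,           # legal-move count difference
    "king_safety": 0.5,        # heuristic safety score difference
}

def evaluate(features):
    """Score a position as a weighted sum of its features."""
    return sum(FEATURE_WEIGHTS[name] * value
               for name, value in features.items())

def td_update(features, target, lr=0.01):
    """One reinforcement-learning step of the kind hinted at above:
    nudge the weights toward a target score (e.g. a TD backup)."""
    error = target - evaluate(features)
    for name, value in features.items():
        FEATURE_WEIGHTS[name] += lr * error * value
```

Repeated `td_update` calls against outcomes of played games are one way reinforcement learning can tune such weights.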
Artificial Intelligence
Deep Learning vs Machine Learning
Machine Learning
Terminology
Core Concepts
JavaScript and AI
TensorFlow
TensorFlow JS
Examples
Evan Estola – Data Scientist, Meetup.com at MLconf ATL (MLconf)
Beyond Collaborative Filtering: using Machine Learning to power recommendations at Meetup
Collaborative filtering and other common recommendation algorithms are a powerful technique for some scenarios. I will cover how to design a recommendation system from the ground up using an ensemble classifier and supervised learning to avoid some of the pitfalls of collaborative filtering. From sampling to deployment, we’ve had to invent our approach with few non-academic and non-toy examples to follow. At Meetup we’re all about sharing information and empowering communities, so I’ll present the details of our model as well as some of the new features we are still developing.
Implementation and analysis of search algorithms in single player connect fou... (Anmol Rajpurohit)
The document discusses the implementation and analysis of search algorithms in a single-player Connect Four game. It outlines the game rules and previous work analyzing strategies. It then describes the problem statement, algorithms implemented including minimax and alpha-beta pruning, heuristics to evaluate board positions, and a comparative analysis of the algorithms. Exponential heuristics were found to explore more nodes than linear heuristics but require less than 1 second to search to a depth of 10. Alpha-beta pruning reduced the number of nodes explored by 10 to 100 times compared to not using pruning.
In this talk we discuss the application of Reinforcement Learning to games. Recently, OpenAI created an algorithm capable of beating a human team at DOTA, a game of considerable complexity and strategy. We'll evaluate the role reinforcement learning plays in the world of games, looking at some of the main achievements and what they look like in terms of implementation. We'll also review some of the history of AI applied to games and how things have evolved over time.
Yuandong Tian at AI Frontiers: Planning in Reinforcement Learning (AI Frontiers)
Deep Reinforcement Learning (DRL) has made strong progress in many tasks, such as board games, robotics, navigation, neural architecture search, etc. I will present our recently open-sourced DRL frameworks to facilitate game research and development. Our framework is scalable, so we can reproduce AlphaGo Zero and AlphaZero using 2000 GPUs, achieving superhuman performance of a Go AI that beats 4 top-30 professional players. We also show the usability of our platform by training agents in real-time strategy games, and show interesting behaviors with a small amount of resources.
This document discusses how while deep learning has achieved success in areas like image recognition and natural language processing, it is not always the best or most accurate approach and should not be obsessively pursued to the exclusion of other machine learning techniques. Specifically, simpler models may perform equally well due to Occam's razor. Unsupervised learning and feature engineering are also important. Ensembles of different models can further improve results compared to relying on a single approach. The document cautions against an overemphasis on deep learning without considering factors like system complexity, costs, and the ability to distribute models.
Deep learning to the rescue - solving long standing problems of recommender ... (Balázs Hidasi)
I gave this talk at the 1st Budapest RecSys and Personalization Meetup about using deep learning to solve long-standing problems of recommender systems. I also presented our approach on using RNNs for session-based recommendations in detail.
This document provides an overview of reinforcement learning and AlphaZero. It discusses the math behind reinforcement learning concepts like policy iteration, policy improvement, and policy evaluation. It then explains how AlphaZero uses these concepts along with a deep neural network and self-play to master the game of Go without human data. Key algorithms discussed include Monte Carlo tree search and how AlphaZero implements them in code to learn directly from games played between copies of itself.
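The policy-iteration loop mentioned there, policy evaluation alternating with policy improvement, can be written out on a tiny made-up two-state MDP (the states, dynamics, and rewards below are purely illustrative):

```python
STATES = [0, 1]
ACTIONS = ["stay", "move"]
GAMMA = 0.9

def step(s, a):
    """Deterministic toy dynamics: returns (next_state, reward)."""
    if a == "move":
        return 1 - s, 1.0 if s == 0 else 0.0
    return s, 0.5 if s == 1 else 0.0

def policy_iteration():
    policy = {s: "stay" for s in STATES}
    V = {s: 0.0 for s in STATES}
    while True:
        # Policy evaluation: iterate the Bellman expectation to convergence.
        for _ in range(200):
            for s in STATES:
                ns, r = step(s, policy[s])
                V[s] = r + GAMMA * V[ns]
        # Policy improvement: act greedily with respect to V.
        new_policy = {}
        for s in STATES:
            def q(a):
                ns, r = step(s, a)
                return r + GAMMA * V[ns]
            new_policy[s] = max(ACTIONS, key=q)
        if new_policy == policy:   # stable policy => optimal
            return policy, V
        policy = new_policy
```

On this toy MDP the loop settles on moving out of state 0 to collect the reward, then staying in state 1.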
Netflix uses machine learning and algorithms to power recommendations for over 69 million members across more than 50 countries. They experiment with a wide range of algorithms including regression, matrix factorization, deep neural networks, and more. Some lessons learned are to first build an offline experimentation framework with clear metrics, consider distribution from the start, and design production code to also support experimentation. The goal is to efficiently iterate experiments and smoothly implement successful models in production.
Netflix uses machine learning and algorithms to power recommendations that help members find content to watch. Some of the models and algorithms used include regression, matrix factorization, deep neural networks, and clustering. Key lessons learned are to first build an offline experimentation framework with proper metrics and data splits before tackling new problems. When experimenting, consider distributing algorithms to offset communication overhead if data is large enough. Design production code to also support experimentation through shared engines and avoiding dual implementations.
This document summarizes different procedural content generation (PCG) methods and approaches, including their strengths and weaknesses. It discusses constructive methods, constraint-based systems, optimization techniques, and grammars. For each method, it provides examples and discusses the "power and peril" - the strengths but also challenges to address. The document concludes with practical advice on choosing a PCG approach based on factors like desired player interaction, control level, and speed needs. It also discusses tools that use PCG to aid designers in a mixed-initiative process.
Juantomás García gave a talk on machine learning pipelines for developing AI that can learn to play and solve the 1980s video game "The Abbey of the Crime". He discussed gathering game data, exploring different reinforcement learning strategies, and developing a simple neural network model with policy and value networks to determine moves and rewards. He described his current pipeline that moves raw game data through processing steps using technologies like Kubernetes, PubSub, training jobs, and model storage. The talk encouraged attendees to collaborate on the open source project on GitHub and join the AbadIA Slack channel.
Building an AI looks glamorous, but it is a long process, and at the end of the day the tasks related to the AI model itself are 5% or less of the project.
We will see how to start an AI project from zero: defining the objectives, creating the architecture, building the game interfaces, massive data pipelines, defining model strategies, how to parallelize everything, etc.
"The Abbey of the Crime" is a notoriously hard 8-bit game. It is more complicated than Montezuma's Revenge and is a perfect challenge for an AI: its complexity is about 10^1000 legal moves to solve it.
As AI technology, we will use Reinforcement Learning using Deep Neural Networks and Monte Carlo Tree Search.
The takeaways of this talk will be: understanding all the processes involved to create an AI and learning the basics of Reinforcement Learning.
Similar to From alpha go to alpha zero TLP innova 2018
1. The document discusses using reinforcement learning to create an AI that can learn to play and solve the 1987 8-bit role playing game "The Abbey of Crime".
2. The game was recreated in C++ using a video game framework to make it playable on modern systems.
3. Training an AI to master the game is an immense challenge due to the enormous number of possible game states and moves, far exceeding the number of atoms in the universe or moves in Go.
How we use Reinforcement Learning to solve The Abbey of the Crime
Do you know the Abbey of the Crime?
The abbey is an 8-bit game (for Spectrum and CPC) that became the first 3D (2.5D) RPG in 1987.
This game is a marvel from a technological point of view: in less than 120 KB it stores the sound, the images, all the program logic, and the data.
Did you manage to finish the game without help?
I do not know any human being who has finished it without help. It is one of the most complicated games ever developed, roughly 1000x harder than Atari's Montezuma's Revenge. Its complexity is around 10^10000.
In the talk, we explain how we designed and built an AI capable of playing on its own and learning to complete the game.
The document discusses using reinforcement learning to create an AI that can learn to play and solve the 1987 8-bit role playing game "La Abadía del Crimen". It notes that the game is considered a legend in video game history. The plan is outlined to make an AI that can learn to play and solve the full game. Challenges are discussed around the enormous number of possible moves. Tools and collaboration methods are presented for creating the AI, including interacting with the game code and saving game information.
3. Who I am
Juantomás García
•Chief Envisioning Officer @ Sngular
•GDE (Google Developer Expert) for cloud
•#AbadIA Public Relations
Others
•Co-Author of the first Spanish free software book “La Pastilla Roja”
•Former President of Hispalinux (Spanish Linux User Group)
•Organizer of Machine Learning Spain and GDG Cloud Madrid.
4. Who the Audience Is
• People interested in Machine Learning
• Who want to know more about what AlphaGo is
• With a good technical background
5. Why I did this presentation
•I love Machine Learning.
•There are a lot of takeaways from this project.
•I want to share them
6. Outline
•AlphaGo: the epic project
•AlphaGo Zero: re-evolution version
•AlphaZero: looking for general solutions
•DIY: AlphaZero Connect 4
•Takeaways
7. A brief introduction
• Deep Blue was about brute force
• They were emulating how humans play chess
8. A brief introduction
• A huge search space:
Chess -> 20 possible opening moves
Go -> 361 possible opening moves
9. AlphaGo Main Concepts
• Policy Neural Network
"To decide which are the most sensible moves in a particular board position."
10. AlphaGo Main Concepts
• Value Neural Network
"How good is a particular board arrangement."
"How likely you are to win the game from this position."
11. AlphaGo First Approach: SL
• Just train both networks using human games.
• Just plain, ordinary Supervised Learning.
• With this, AlphaGo plays like a weak human.
• It is like the approach of Deep Blue: just emulating human chess players.
12. AlphaGo Second Approach: RL
• Improve the SL version by making it play against itself.
• With Reinforcement Learning it is able to play well against state-of-the-art Go programs.
• Those programs use MCTS.
15. AlphaGo Second Approach: RL
• It is not 2 NNs vs Monte Carlo Tree Search.
• It is a better MCTS thanks to the NNs.
16. AlphaGo Second Approach: RL
• Optimal Value Function V*(s)
"Determines the outcome of the game from every board position (s is the state)."
A brute-force solution is impossible:
Chess: ~35^80 positions
Go: ~250^150 positions
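To get a feel for why brute force fails, here is a quick back-of-the-envelope check in Python. The branching-factor and depth figures are the slide's own estimates, not exact game-tree sizes:

```python
# Count the digits of the game-tree size estimates quoted on the slide
# (branching factor ** typical game length).
chess_states = 35 ** 80    # chess: ~35 moves per position, ~80 plies
go_states = 250 ** 150     # Go: ~250 moves per position, ~150 moves

print(len(str(chess_states)))  # 124 digits, i.e. ~10^123
print(len(str(go_states)))     # 360 digits, i.e. ~10^359
# For comparison, the observable universe holds roughly 10^80 atoms.
```

Even enumerating one position per atom in the universe would not come close, which is why the search space has to be pruned rather than exhausted.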
17. AlphaGo Second Approach: RL
• Two solutions to reduce the effective search space:
Truncate the search tree: approximate V*(s) with a learned V(s)
Reduce the breadth of the search with the policy P(a|s)
We roll out (with MCTS) the moves chosen by the policy network and evaluate them with the value function.
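This policy-guided search can be sketched as a PUCT-style selection rule, where the prior P(a|s) narrows the breadth and an exploration bonus u(s,a) keeps less-visited moves alive. A minimal sketch, assuming illustrative values: the constant `c_puct` and the toy statistics below are made up, not DeepMind's actual numbers.

```python
import math

def puct_select(stats, prior, c_puct=1.5):
    """Pick the move maximizing Q(s,a) + u(s,a), where the exploration
    bonus is u(s,a) = c_puct * P(a|s) * sqrt(N(s)) / (1 + N(s,a)).
    stats: move -> (visit count N(s,a), total value W(s,a))
    prior: move -> policy network probability P(a|s)
    """
    n_total = sum(n for n, _ in stats.values())
    best_move, best_score = None, -float("inf")
    for move, (n, w) in stats.items():
        q = w / n if n > 0 else 0.0                       # mean value Q(s,a)
        u = c_puct * prior[move] * math.sqrt(n_total) / (1 + n)
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

# A rarely visited move with a high prior can outrank a well-explored one:
stats = {"a": (10, 6.0), "b": (1, 0.2)}
prior = {"a": 0.3, "b": 0.7}
print(puct_select(stats, prior))
```

As visit counts grow, u(s,a) shrinks and the search gradually shifts from exploring the prior's suggestions to exploiting the moves with the best measured value.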
19. AlphaGo Zero: Re-Evolution version
•Trained with Reinforcement Learning only
•Encourages exploration of less-visited moves via the bonus u(s,a)
•Just one neural network for both policy and value
•Every time a round of self-play search is done, the neural network is retrained
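The single two-headed network can be sketched in plain NumPy: a shared trunk feeding a policy head (a distribution over moves) and a value head (a scalar in [-1, 1]). This is a tiny dense stand-in, not the paper's deep residual CNN, and all layer sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 64 input features, 32 hidden units, 8 legal moves.
N_IN, N_HID, N_MOVES = 64, 32, 8

W1 = rng.normal(scale=0.1, size=(N_IN, N_HID))     # shared trunk
Wp = rng.normal(scale=0.1, size=(N_HID, N_MOVES))  # policy head
Wv = rng.normal(scale=0.1, size=(N_HID, 1))        # value head

def forward(board_features):
    h = np.tanh(board_features @ W1)       # shared representation
    logits = h @ Wp
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()                 # softmax over moves: P(a|s)
    value = np.tanh(h @ Wv)[0]             # scalar in [-1, 1]: V(s)
    return policy, value

policy, value = forward(rng.normal(size=N_IN))
```

Sharing the trunk means one forward pass yields both the move priors for MCTS and the position evaluation, which is what lets AlphaGo Zero drop the separate rollout policy.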
20. AlphaGo Zero: Re-Evolution version
•Human games were noisy and not reliable.
•It does not use rollouts to predict who will win.
23. Alpha Zero: New Challenges
AlphaGo Zero vs AlphaZero:
• Binary outcome (win/loss) vs expected outcome (including draws or potentially other outcomes)
• Board positions transformed before passing to the neural network (by randomly selected rotation or reflection) vs no data augmentation
• Games generated by the best player from previous iterations (55% win margin) vs continual update using the latest parameters (without the evaluation and selection steps)
• Hyper-parameters tuned by Bayesian optimisation vs the same hyper-parameters reused without game-specific tuning
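In both variants, each self-play position yields two training targets: the MCTS visit-count distribution pi becomes the policy target, and the final game outcome z becomes the value target, combined in the loss (z - v)^2 - pi . log(p) plus an L2 term. A minimal sketch with made-up numbers (the visit counts and network outputs below are purely illustrative):

```python
import numpy as np

def alphazero_loss(p, v, pi, z):
    """Per-position AlphaZero-style loss: (z - v)^2 - pi . log(p).
    p  : network policy output (probabilities over moves)
    v  : network value output (scalar in [-1, 1])
    pi : MCTS visit-count distribution (the policy training target)
    z  : final game outcome from this player's view (+1 win, -1 loss)
    The paper adds an L2 regularization term, omitted here.
    """
    return (z - v) ** 2 - np.sum(pi * np.log(p + 1e-12))

# Visit counts from a finished search become the policy target:
visits = np.array([90.0, 8.0, 2.0])
pi = visits / visits.sum()
p = np.array([0.5, 0.3, 0.2])   # current (untrained) network output
loss = alphazero_loss(p, v=0.1, pi=pi, z=+1.0)
print(round(float(loss), 3))
```

Minimizing this loss pulls the policy head toward the search's visit distribution and the value head toward the actual game result, so the network distills what the search discovered.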
31. Questions?
•email: juantomas.garcia@gmail.com
•twitter: @juantomas
This talk has a lifetime free-questions warranty: if you have any questions or concerns about this talk, feel free to contact me anytime.
Selfie time: if you liked the talk, just smile while I take the selfie ;-)
We’re Hiring, Sngular People