Artificial Intelligence
Bellman Equation
Introduction
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
July, 2017
Introduction
• A method for calculating value functions in dynamic
environments.
• Invented by Richard Ernest Bellman in 1953.
• Father of Dynamic Programming, which led to
modern Reinforcement Learning.
• Concepts include:
• Reward
• Discount Factor
• Deterministic vs. Non-Deterministic
• Plan vs Policy
Basics
• Terminology
S -> Set of all possible States
A -> Set of all possible Actions from a given State
s -> A specific State
a -> A specific Action
[Grid World figure: states Sa through Sm arranged in a grid, with the Start Node at Si and the Goal Node at Sm]
S -> { Sa, Sb, Sc … Sm }
Aa -> { Down, Right }
Ab -> { Down, Left }
Ac -> { Down, Left }
Ad -> { Up, Down, Right }
...
Am -> {} Goal State
s -> Si (the Start Node)
a -> an action from A(Si) = { Up, Right }
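As a concrete illustration (the code is a sketch, not part of the deck), these sets map directly onto plain Python data structures:

```python
# Illustrative sketch of the slide's Grid World (names follow the slide).
S = ["Sa", "Sb", "Sc", "Sd", "Se", "Sf", "Sg",
     "Sh", "Si", "Sj", "Sk", "Sl", "Sm"]

A = {
    "Sa": ["Down", "Right"],
    "Sb": ["Down", "Left"],
    "Sc": ["Down", "Left"],
    "Sd": ["Up", "Down", "Right"],
    "Si": ["Up", "Right"],   # the Start Node
    "Sm": [],                # the Goal State: no further actions
    # ... the remaining states are elided here, as on the slide
}

s = "Si"        # a specific state
a = A[s][0]     # a specific action from that state, e.g. "Up"
```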
Reward
• Terminology
R -> The ‘Reward’ for being at some state.
v(s) -> Value Function – The anticipated reward for being at a
specific state.
[Grid World figure: the same grid, with the Start Node at Si and the Goal Node at Sm]
Goal Node: R = 1
All other (non-Goal) nodes: R = 0
v(Sa) -> 0
v(Sb) -> 0
v(Sc) -> 0
v(Sd) -> 0
…
v(Sl) -> 0
v(Sm) -> 1 (Goal State)
Without a Plan or Policy, a Reward cannot be anticipated until we reach the Goal Node.
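In the same illustrative Python sketch, the reward table and the initial value function are one dictionary each (the names follow the slide; the code itself is an assumption, not part of the deck):

```python
# Illustrative sketch: rewards and the initial value function for the grid.
S = ["Sa", "Sb", "Sc", "Sd", "Se", "Sf", "Sg",
     "Sh", "Si", "Sj", "Sk", "Sl", "Sm"]

R = {state: 0 for state in S}   # R = 0 at every non-Goal node
R["Sm"] = 1                     # R = 1 at the Goal node Sm

v = dict(R)                     # v(s) starts at 0 everywhere, 1 at the Goal
print(v["Si"], v["Sm"])         # 0 1
```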
Discount Factor
• Terminology
t, t+1, t+2, … -> Time step intervals, each corresponding to an action
and a new state.
Rt+1 -> The Reward at the next time step after an action has occurred.
St+1 -> The State at the next time step after an action has occurred.
γ -> (gamma) Discount Factor between 0 and 1.
• The Discount Factor accounts for uncertainty in
obtaining a future reward.
[Figure: a chain of states Sn-2 -> Sn-1 -> Sn, where Sn is the Goal Node (R = 1) and the other states have R = 0]
If at the Goal, receiving the reward is certain.
If one step away, receiving the reward is less certain.
Even further away, receiving the reward is even less certain.
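A one-line illustration (a sketch, not from the deck) of how the discount compounds with distance from the reward:

```python
gamma = 0.9   # discount factor

# A reward expected k steps in the future is worth gamma**k of itself now:
for k in range(4):
    print(k, round(gamma ** k, 2))   # 0 1.0, 1 0.9, 2 0.81, 3 0.73
```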
Bellman Equation
• Principle of the Bellman Equation
v(s) = Rt + γ·Rt+1 + γ²·Rt+2 + γ³·Rt+3 + … + γⁿ·Rt+n
The value of some state s is the sum of rewards from s to a terminal
state, with the reward of each successive state discounted: Rt is the
reward at the next step after taking some action a, the reward at the
subsequent state is discounted by γ, the reward at the next subsequent
state by γ², and so on; the exponent on the Discount Factor increases
with each step.
[Figure: the same chain Sn-2 -> Sn-1 -> Sn (Goal Node, R = 1)]
γ = 1:
v(Sn) = 1
v(Sn-1) = 0 + 1 = 1
v(Sn-2) = 0 + 0 + 1 = 1
γ = 0.9:
v(Sn) = 1
v(Sn-1) = 0 + .9(1) = 0.9
v(Sn-2) = 0 + 0 + .9·.9(1) = 0.81
Note: the Reward and the value are not the same thing.
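A small sketch (illustrative, not from the deck) that reproduces these numbers as a discounted sum:

```python
def v_of_path(rewards, gamma):
    """Discounted return along one path to a terminal state:
    v(s) = Rt + gamma*Rt+1 + gamma^2*Rt+2 + ..."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Rewards along the path Sn-2 -> Sn-1 -> Sn (Goal) are 0, 0, 1:
print(round(v_of_path([0, 0, 1], gamma=1.0), 2))  # 1.0  v(Sn-2), gamma = 1
print(round(v_of_path([0, 0, 1], gamma=0.9), 2))  # 0.81 v(Sn-2), gamma = 0.9
print(round(v_of_path([0, 1], gamma=0.9), 2))     # 0.9  v(Sn-1), gamma = 0.9
```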
Bellman Principle of Optimality
• Bellman Equation – Factored
v(s) = Rt + γ·Rt+1 + γ²·Rt+2 + γ³·Rt+3 + … + γⁿ·Rt+n
     = Rt + γ·( Rt+1 + γ·Rt+2 + γ²·Rt+3 + … ), where the bracketed term is v(St+1)
v(s) = Rt + γ( v(St+1) )
• Bellman Optimality – the value of a state is based on the best
(optimal) action for that state, and for each subsequent state.
v(s) = max_a( R(s,a) + γ( v(St+1) ) )
The value takes the max over actions; the optimal action itself is
the argmax: the action a at state s which maximizes the expression.
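As a minimal sketch (not from the deck), one factored update can be written as below; the reward function R(s, a), the deterministic transition next_state(s, a), and the dictionaries A and v are assumed inputs:

```python
def bellman_backup(s, A, R, next_state, v, gamma=0.9):
    """One factored Bellman-optimality update for a single state s.

    The value of s is the max over actions; the optimal action (the
    plan) is the argmax. Assumes next_state(s, a) is deterministic.
    """
    q = {a: R(s, a) + gamma * v[next_state(s, a)] for a in A[s]}
    best_action = max(q, key=q.get)
    return q[best_action], best_action
```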
Bellman Optimality Example
[Grid World figure, γ = 0.9: a grid with the Goal (R = 1) and the Pit (R = -1) as terminal cells and one Wall cell.]
Calculate 1 step away: the cell adjacent to the Goal gets value 0.9. The best action there is to move to the Goal node.
Calculate adjacent steps: the next cells get 0.9 · 0.9 = 0.81. The best action is to move to the node with the highest value.
Calculate adjacent steps: two steps further out, cells get 0.73.
Calculate adjacent steps: the remaining cells get 0.73 and 0.66, until every state has a value.
This produces a plan: the Optimal Action (Move) for each state.
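To make the backward propagation concrete, here is a small illustrative sketch in Python. The 3x4 layout, the coordinates, and the bounce-off-walls rule are assumptions (the deck never specifies them), but with γ = 0.9 it reproduces the slide's values of 0.9, 0.81, 0.73, and 0.66:

```python
from itertools import product

# Hypothetical reconstruction of the slide's grid: 3 rows x 4 columns,
# Goal (R = 1) at top-right, Pit (R = -1) just below it, Wall at (1, 1).
ROWS, COLS, gamma = 3, 4, 0.9
GOAL, PIT, WALL = (0, 3), (1, 3), (1, 1)
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}

def step(s, a):
    """Deterministic move; bounce off the grid edges and the Wall."""
    nxt = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
    in_grid = 0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS
    return nxt if in_grid and nxt != WALL else s

v = {s: 0.0 for s in product(range(ROWS), range(COLS))}
v[GOAL], v[PIT] = 1.0, -1.0

for _ in range(10):                     # sweep until the values settle
    for s in v:
        if s not in (GOAL, PIT, WALL):
            v[s] = max(gamma * v[step(s, a)] for a in MOVES)

print(round(v[(0, 2)], 2), round(v[(0, 1)], 2), round(v[(2, 1)], 2))
# 0.9 0.81 0.66 -- the same backward propagation as on the slide
```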
Deterministic vs. Non-Deterministic
• Deterministic – The action taken has a 100% certainty of
the expected (desired) outcome => Plan
e.g., in our Grid World example, there is a 100% certainty that
if the action is to move left, you will move left.
• Non-Deterministic (Stochastic) – The action taken has
less than a 100% certainty of the expected outcome =>
Policy
e.g., if a Robot is in a standing state and the action is to run,
there may be an 80% probability of succeeding, but a 20% probability
of falling down.
Bellman Optimality with Probabilities
• Terminology
R(s,a) -> The Reward when at state s and action a is taken.
P(s,a,St+1’) -> The probability that, when at state s and action a is taken,
the agent ends up in a particular successor state St+1’.
• When the outcome is stochastic, we replace the value of the
single desired state with the probability-weighted sum of the
values of all possible successor states:
v(s) = max_a( R(s,a) + γ·v(St+1) )
becomes
v(s) = max_a( R(s,a) + γ · Σ P(s,a,St+1’) · v(St+1’) ), summing over the successor states St+1’
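A hedged sketch of this stochastic update in Python; the table P[(s, a)] of (probability, successor) pairs and the reward function R(s, a) are assumed interfaces, not anything defined in the deck:

```python
def stochastic_backup(s, A, R, P, v, gamma=0.9):
    """Bellman-optimality update when transitions are stochastic.

    P[(s, a)] is assumed to be a list of (prob, successor) pairs
    summing to 1; R(s, a) is the reward for taking action a in s.
    """
    def q(a):
        return R(s, a) + gamma * sum(p * v[s1] for p, s1 in P[(s, a)])
    best_action = max(A[s], key=q)
    return q(best_action), best_action
```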
Bellman Optimality Example
[Grid World figure, γ = 0.9. Top row: Goal (R = 1), Sb, Sa; bottom row: Wall, Sc, Pit (R = -1). Sb is adjacent to the Goal; Sc sits below Sb, with the Wall to its left and the Pit to its right.]
Sb, Left -> { 80% Left (Goal),
10% Right (Sa),
10% Down (Sc) }
v(Sb) = 0 + .9( .8(1) + .1(0) + .1(0) ) = 0.72
(80% probability of reaching the Goal, 10% each of the two zero-value outcomes)
Sc, Up -> { 80% Up (Sb),
10% Left (Wall, bounce back),
10% Right (Pit) }
v(Sc) = 0 + .9( .8(.72) + .1(0) + .1(-1) ) ≈ 0.43
(80% probability of reaching Sb, 10% of bouncing off the Wall, 10% of the Pit)
Greedy vs. Optimal
[Grid World figure, γ = 0.9: v(Sc) ≈ 0.43 via the greedy action Up; Goal R = 1, Pit R = -1]
• Greedy – Take the Action with the highest Probability of
a Reward -> Plan (act as if deterministic).
Sc, Up -> { 80% Sb,
10% Left,
10% Right (‘The Pit’ – terminal state) }
10% of the time we will end up in the negative terminal state!
Greedy vs. Optimal
[Grid World figure, γ = 0.9: v(Sc) ≈ 0.06 via the safe action Left; Goal R = 1, Pit R = -1]
• Optimal – Take the Action that, with certainty, eventually
proceeds towards a positive reward -> Policy.
Sc, Left -> { 80% Wall and Bounce back to Sc,
10% Up (Sb),
10% Down }
If we choose Left, we have an 80% chance of bouncing into
the wall and ending up back where we were.
If we keep bouncing off the wall, eventually we will go Up or Down
(10% each per attempt), and never go into the Pit!
v(Sc, Left) = 0 + .9( .8(0) + .1(0.72) + .1(0) ) ≈ 0.06
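Plugging the slide's numbers into both candidate actions at Sc makes the trade-off explicit (a quick check, assuming the bounce-back state is valued at 0, as on the slides):

```python
gamma, v_Sb, v_pit = 0.9, 0.72, -1.0

# Greedy: Up from Sc -> 80% Sb, 10% Wall bounce (0), 10% the Pit.
q_up = 0 + gamma * (0.8 * v_Sb + 0.1 * 0.0 + 0.1 * v_pit)

# Optimal: Left from Sc -> 80% bounce back (0), 10% Up (Sb), 10% Down (0).
q_left = 0 + gamma * (0.8 * 0.0 + 0.1 * v_Sb + 0.1 * 0.0)

print(round(q_up, 2), round(q_left, 2))
# 0.43 0.06 -- Up is worth more on paper, but Left never risks the Pit
```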
Lifespan Penalty
• Lifespan Penalty – There is a cost to each action.
[Grid World figure, γ = 0.9: every non-terminal cell now carries a penalty of R = -0.1; the Goal keeps R = 1 and the Pit R = -1.]
Sc, Left -> { 80% Wall (bounce back, pay the penalty),
10% Up (Sb),
10% Down (pay the penalty) }
v(Sc, Left) = 0 + .9( .8(-.1) + .1(0.72) + .1(-.1) ) ≈ -0.02
When there is a penalty on each action, the best policy might be to take the
chance of falling into the pit!
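The same quick check with the -0.1 penalty shows the sign flip (again using the slide's numbers; the bounce-back value is assumed to be just the penalty):

```python
gamma, v_Sb, penalty = 0.9, 0.72, -0.1

# Left from Sc: 80% bounce off the Wall (pay the penalty),
# 10% Up to Sb, 10% Down (pay the penalty).
q_left = 0 + gamma * (0.8 * penalty + 0.1 * v_Sb + 0.1 * penalty)
print(round(q_left, 2))   # -0.02: the "safe" action now loses value
```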
Not Covered
• When probabilities are learned (not known in advance) ->
Backward Propagation.
• Suboptimal Solutions for HUGE search spaces.
THIS IS MORE LIKE THE REAL WORLD!