SlideShare a Scribd company logo
Making Complex Decisions
Department of Computer Science & Engineering
Hamdard University Bangladesh
1
Sequential Decisions
• Agent’s utility depends on a sequence of decisions
• Also known as accessible, deterministic domains
 Utilities,
 Uncertainty,
 Sensing
 Generalize search
 Planning problems.
2
Simple Robot Navigation Problem
• In each state, the possible actions are U, D, R, and L
3
Probabilistic Transition Model
• In each state, the possible actions are U, D, R, and L
• The effect of U is as follows (transition model):
• With probability 0.8 the robot moves up one square
4
Probabilistic Transition Model
• In each state, the possible actions are U, D, R, and L
• The effect of U is as follows (transition model):
• With probability 0.8 the robot moves up one square
•With probability 0.1 the robot moves right one square
•With probability 0.1 the robot moves left one square.
5
Markov Property
 The transition properties depend only on
the current state, not on previous history
(how that state was reached)
6
Markov Decision Process (MDP)
• Defined as a tuple: <S, A, M, R>
– S: State
– A: Action
– M: Transition function
– R: Reward
• Choose a sequence of actions
- Utility based on a sequence of decisions
7
Generalization
 Inputs:
• Initial state s0
• Action model
• Reward R(si) collected in each state si
 A state is terminal if it has no successor
 Starting at s0, the agent keeps executing actions until it
reaches a terminal state
 Its goal is to maximize the expected sum of rewards collected
(additive rewards)
 Additive rewards: U(s0, s1, s2, …) = R(s0) + R(s1) + R(s2)+ …
 Discounted rewards
U(s0, s1, s2, …) = R(s0) + R(s1) +  2R(s2) + … (0    1)
8
Example MDP
• Machine can be in one of three states: good, deteriorating, broken
• Can take two actions: maintain, ignore
9
o To know what state have reached (accessible)
o Calculate best action for each state
- Always know what to do next!
o Mapping from states to actions is called a policy
Policies
10
Policies
• No time period is different from the others
• Optimal thing to do in state s should not depend on time period
– … because of infinite horizon
– With finite horizon, don’t want to maintain machine in last period
• A policy is a function π from states to actions
• Example policy: π(good shape) = ignore, π(deteriorating) = ignore,
π(broken) = maintain 11
Evaluating a policy
• Key observation:
MDP + policy = Markov process with rewards
• To evaluate Markov process with rewards: system of linear
equations
• Gives algorithm for finding optimal policy: try every
possible policy, evaluate
– Terribly inefficient
12
Bellman equation
• Suppose state s, and play optimally from there on
• This leads to expected value v*(s)
• Bellman equation:
v*(s) = maxa R(s, a) + δΣsꞌ P(s, a, sꞌ) v*(sꞌ)
• Given v*, finding optimal policy is easy
13
o Calculate utility of each state U (state)
o Use state utilities to select optimal action
Value iteration
14
Value iteration algorithm for finding
optimal policy
 Start with arbitrary utility values
 Update to make them locally consistent with bellman eqn.
 Repeat until “no change”
15
Policy iteration algorithm for finding
optimal policy
• Easy to compute values given a policy
– No max operator
• Policies may not be highly sensitive to exact utility values
⇒ May be less work to iterate through policies than utilities
16
π ← an arbitrary initial policy
repeat until no change in π
compute utilities given π (value determination)
Update π
as if utilities were correct (i.e., local MEU)
Policy Iteration Algorithm
17
Thanks To All
18

More Related Content

What's hot

Reinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningReinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference Learning
Seung Jae Lee
 
Heuristc Search Techniques
Heuristc Search TechniquesHeuristc Search Techniques
Heuristc Search Techniques
Jismy .K.Jose
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
Subrat Panda, PhD
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning
Melaku Eneayehu
 
Reinforcement Learning 8: Planning and Learning with Tabular Methods
Reinforcement Learning 8: Planning and Learning with Tabular MethodsReinforcement Learning 8: Planning and Learning with Tabular Methods
Reinforcement Learning 8: Planning and Learning with Tabular Methods
Seung Jae Lee
 
Hill climbing algorithm in artificial intelligence
Hill climbing algorithm in artificial intelligenceHill climbing algorithm in artificial intelligence
Hill climbing algorithm in artificial intelligence
sandeep54552
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
Salem-Kabbani
 
Artificial intelligence- Logic Agents
Artificial intelligence- Logic AgentsArtificial intelligence- Logic Agents
Artificial intelligence- Logic AgentsNuruzzaman Milon
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
Jie-Han Chen
 
Matching techniques
Matching techniquesMatching techniques
Matching techniques
Nagpalkirti
 
Problem reduction AND OR GRAPH & AO* algorithm.ppt
Problem reduction AND OR GRAPH & AO* algorithm.pptProblem reduction AND OR GRAPH & AO* algorithm.ppt
Problem reduction AND OR GRAPH & AO* algorithm.ppt
arunsingh660
 
Reinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step BootstrappingReinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step Bootstrapping
Seung Jae Lee
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
DongHyun Kwak
 
Uncertainty in AI
Uncertainty in AIUncertainty in AI
Uncertainty in AI
Amruth Veerabhadraiah
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
Mikko Mäkipää
 
Hill climbing algorithm
Hill climbing algorithmHill climbing algorithm
Hill climbing algorithm
Dr. C.V. Suresh Babu
 
Local beam search example
Local beam search exampleLocal beam search example
Local beam search example
Megha Sharma
 
Local search algorithm
Local search algorithmLocal search algorithm
Local search algorithm
Megha Sharma
 
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex FridmanMIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
Peerasak C.
 

What's hot (20)

Reinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference LearningReinforcement Learning 6. Temporal Difference Learning
Reinforcement Learning 6. Temporal Difference Learning
 
Heuristc Search Techniques
Heuristc Search TechniquesHeuristc Search Techniques
Heuristc Search Techniques
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning
 
Reinforcement Learning 8: Planning and Learning with Tabular Methods
Reinforcement Learning 8: Planning and Learning with Tabular MethodsReinforcement Learning 8: Planning and Learning with Tabular Methods
Reinforcement Learning 8: Planning and Learning with Tabular Methods
 
Hill climbing algorithm in artificial intelligence
Hill climbing algorithm in artificial intelligenceHill climbing algorithm in artificial intelligence
Hill climbing algorithm in artificial intelligence
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Artificial intelligence- Logic Agents
Artificial intelligence- Logic AgentsArtificial intelligence- Logic Agents
Artificial intelligence- Logic Agents
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
Matching techniques
Matching techniquesMatching techniques
Matching techniques
 
Problem reduction AND OR GRAPH & AO* algorithm.ppt
Problem reduction AND OR GRAPH & AO* algorithm.pptProblem reduction AND OR GRAPH & AO* algorithm.ppt
Problem reduction AND OR GRAPH & AO* algorithm.ppt
 
Reinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step BootstrappingReinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step Bootstrapping
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Hill climbing
Hill climbingHill climbing
Hill climbing
 
Uncertainty in AI
Uncertainty in AIUncertainty in AI
Uncertainty in AI
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
 
Hill climbing algorithm
Hill climbing algorithmHill climbing algorithm
Hill climbing algorithm
 
Local beam search example
Local beam search exampleLocal beam search example
Local beam search example
 
Local search algorithm
Local search algorithmLocal search algorithm
Local search algorithm
 
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex FridmanMIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman
 

Viewers also liked

AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
DataminingTools Inc
 
13: Practical Artificial Intelligence & Machine Learning (Arturo Servin)
13: Practical Artificial Intelligence & Machine Learning (Arturo Servin)13: Practical Artificial Intelligence & Machine Learning (Arturo Servin)
13: Practical Artificial Intelligence & Machine Learning (Arturo Servin)Imran Ali
 
AI: Planning and AI
AI: Planning and AIAI: Planning and AI
AI: Planning and AI
DataminingTools Inc
 
04 search heuristic
04 search heuristic04 search heuristic
04 search heuristic
Nour Zeineddine
 
Value iteration networks
Value iteration networksValue iteration networks
Value iteration networks
Sungjoon Choi
 
Value iteration networks
Value iteration networksValue iteration networks
Value iteration networks
Fujimoto Keisuke
 
Artificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real WorldArtificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real World
ahmad bassiouny
 
Linear programing
Linear programingLinear programing
Linear programing
Aniruddh Tiwari
 
Linear Programming
Linear ProgrammingLinear Programming
Linear Programming
Pulchowk Campus
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
REHMAT ULLAH
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentationlpaviglianiti
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
u053675
 
Artificial inteligence
Artificial inteligenceArtificial inteligence
Artificial inteligence
Intekhab Alam Khan
 
Decision making & problem solving
Decision making & problem solvingDecision making & problem solving
Decision making & problem solvingashish1afmi
 

Viewers also liked (16)

AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
13: Practical Artificial Intelligence & Machine Learning (Arturo Servin)
13: Practical Artificial Intelligence & Machine Learning (Arturo Servin)13: Practical Artificial Intelligence & Machine Learning (Arturo Servin)
13: Practical Artificial Intelligence & Machine Learning (Arturo Servin)
 
AI: Planning and AI
AI: Planning and AIAI: Planning and AI
AI: Planning and AI
 
04 search heuristic
04 search heuristic04 search heuristic
04 search heuristic
 
Value iteration networks
Value iteration networksValue iteration networks
Value iteration networks
 
Value iteration networks
Value iteration networksValue iteration networks
Value iteration networks
 
Learning
LearningLearning
Learning
 
Artificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real WorldArtificial Intelligence 1 Planning In The Real World
Artificial Intelligence 1 Planning In The Real World
 
Linear programing
Linear programingLinear programing
Linear programing
 
Linear Programming
Linear ProgrammingLinear Programming
Linear Programming
 
Artificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKSArtificial intelligence NEURAL NETWORKS
Artificial intelligence NEURAL NETWORKS
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Artificial inteligence
Artificial inteligenceArtificial inteligence
Artificial inteligence
 
Decision making & problem solving
Decision making & problem solvingDecision making & problem solving
Decision making & problem solving
 

Similar to Making Complex Decisions(Artificial Intelligence)

Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
Natan Katz
 
Deep Q-Learning
Deep Q-LearningDeep Q-Learning
Deep Q-Learning
Nikolay Pavlov
 
Intro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIIntro to Reinforcement learning - part II
Intro to Reinforcement learning - part II
Mikko Mäkipää
 
CH2_AI_Lecture1.ppt
CH2_AI_Lecture1.pptCH2_AI_Lecture1.ppt
CH2_AI_Lecture1.ppt
AhmedNURHUSIEN
 
reinforcement-learning its based on the slide of university
reinforcement-learning its based on the slide of universityreinforcement-learning its based on the slide of university
reinforcement-learning its based on the slide of university
MOHDNADEEM971008
 
reinforcement-learning.ppt
reinforcement-learning.pptreinforcement-learning.ppt
reinforcement-learning.ppt
hemalathache
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx
RithikRaj25
 
Dexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIDexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAI
Anand Joshi
 
Cs221 lecture8-fall11
Cs221 lecture8-fall11Cs221 lecture8-fall11
Cs221 lecture8-fall11darwinrlo
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
azzeddine chenine
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
MLconf
 
Introduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement LearningIntroduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement Learning
IDEAS - Int'l Data Engineering and Science Association
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game Learning
Prabhu Kumar
 
RL intro
RL introRL intro
RL intro
KhangBom
 
Intro to Reinforcement learning - part I
Intro to Reinforcement learning - part IIntro to Reinforcement learning - part I
Intro to Reinforcement learning - part I
Mikko Mäkipää
 
AI_Planning.pdf
AI_Planning.pdfAI_Planning.pdf
AI_Planning.pdf
SUSHMARATHI3
 
Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learning
guruprasad110
 
technical seminar2.pptx.on markov decision process
technical seminar2.pptx.on markov decision processtechnical seminar2.pptx.on markov decision process
technical seminar2.pptx.on markov decision process
mudavathnarasimhanai
 
Deep RL.pdf
Deep RL.pdfDeep RL.pdf

Similar to Making Complex Decisions(Artificial Intelligence) (20)

Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
 
Deep Q-Learning
Deep Q-LearningDeep Q-Learning
Deep Q-Learning
 
Intro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIIntro to Reinforcement learning - part II
Intro to Reinforcement learning - part II
 
CH2_AI_Lecture1.ppt
CH2_AI_Lecture1.pptCH2_AI_Lecture1.ppt
CH2_AI_Lecture1.ppt
 
reinforcement-learning its based on the slide of university
reinforcement-learning its based on the slide of universityreinforcement-learning its based on the slide of university
reinforcement-learning its based on the slide of university
 
reinforcement-learning.ppt
reinforcement-learning.pptreinforcement-learning.ppt
reinforcement-learning.ppt
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx
 
Dexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIDexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAI
 
Cs221 lecture8-fall11
Cs221 lecture8-fall11Cs221 lecture8-fall11
Cs221 lecture8-fall11
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
 
POMDP Seminar Backup3
POMDP Seminar Backup3POMDP Seminar Backup3
POMDP Seminar Backup3
 
Introduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement LearningIntroduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement Learning
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game Learning
 
RL intro
RL introRL intro
RL intro
 
Intro to Reinforcement learning - part I
Intro to Reinforcement learning - part IIntro to Reinforcement learning - part I
Intro to Reinforcement learning - part I
 
AI_Planning.pdf
AI_Planning.pdfAI_Planning.pdf
AI_Planning.pdf
 
Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learning
 
technical seminar2.pptx.on markov decision process
technical seminar2.pptx.on markov decision processtechnical seminar2.pptx.on markov decision process
technical seminar2.pptx.on markov decision process
 
Deep RL.pdf
Deep RL.pdfDeep RL.pdf
Deep RL.pdf
 

More from United International University

Digital Devices (3rd chapter-2nd part)
Digital Devices (3rd chapter-2nd part)Digital Devices (3rd chapter-2nd part)
Digital Devices (3rd chapter-2nd part)
United International University
 
Network Topology (partial)
Network Topology (partial)Network Topology (partial)
Network Topology (partial)
United International University
 
Corona prediction from symptoms v1.4
Corona prediction from symptoms v1.4Corona prediction from symptoms v1.4
Corona prediction from symptoms v1.4
United International University
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
United International University
 
ICT-Number system.সংখ্যা পদ্ধতি(৩য় অধ্যায়-১ম অংশ)
ICT-Number system.সংখ্যা পদ্ধতি(৩য় অধ্যায়-১ম অংশ)ICT-Number system.সংখ্যা পদ্ধতি(৩য় অধ্যায়-১ম অংশ)
ICT-Number system.সংখ্যা পদ্ধতি(৩য় অধ্যায়-১ম অংশ)
United International University
 
Wireshark Lab HTTP, DNS and ARP v7 solution
Wireshark Lab HTTP, DNS and ARP v7 solutionWireshark Lab HTTP, DNS and ARP v7 solution
Wireshark Lab HTTP, DNS and ARP v7 solution
United International University
 
Wireshark lab ssl v7 solution
Wireshark lab ssl v7 solutionWireshark lab ssl v7 solution
Wireshark lab ssl v7 solution
United International University
 
Network Security(MD5)
Network Security(MD5)Network Security(MD5)
Network Security(MD5)
United International University
 
Secure Electronic Transaction
Secure Electronic TransactionSecure Electronic Transaction
Secure Electronic Transaction
United International University
 
Oracle installation
Oracle installationOracle installation
Oracle installation
United International University
 
IEEE 802.11 Project
IEEE 802.11 ProjectIEEE 802.11 Project
IEEE 802.11 Project
United International University
 
SONET-Communication Engineering
SONET-Communication EngineeringSONET-Communication Engineering
SONET-Communication Engineering
United International University
 
Security Issues for Cellular Telephony
Security Issues for Cellular TelephonySecurity Issues for Cellular Telephony
Security Issues for Cellular Telephony
United International University
 
All types of model(Simulation & Modelling) #ShareThisIfYouLike
All types of model(Simulation & Modelling) #ShareThisIfYouLikeAll types of model(Simulation & Modelling) #ShareThisIfYouLike
All types of model(Simulation & Modelling) #ShareThisIfYouLike
United International University
 
Type Checking(Compiler Design) #ShareThisIfYouLike
Type Checking(Compiler Design) #ShareThisIfYouLikeType Checking(Compiler Design) #ShareThisIfYouLike
Type Checking(Compiler Design) #ShareThisIfYouLike
United International University
 
System imolementation(Modern Systems Analysis and Design)
System imolementation(Modern Systems Analysis and Design)System imolementation(Modern Systems Analysis and Design)
System imolementation(Modern Systems Analysis and Design)
United International University
 
Free Space Management, Efficiency & Performance, Recovery and NFS
Free Space Management, Efficiency & Performance, Recovery and NFSFree Space Management, Efficiency & Performance, Recovery and NFS
Free Space Management, Efficiency & Performance, Recovery and NFS
United International University
 
Overview of Computer Graphics
Overview of Computer GraphicsOverview of Computer Graphics
Overview of Computer Graphics
United International University
 
Keyboard & Mouse basics
Keyboard & Mouse basics Keyboard & Mouse basics
Keyboard & Mouse basics
United International University
 
Organization of a computer
Organization of a computerOrganization of a computer
Organization of a computer
United International University
 

More from United International University (20)

Digital Devices (3rd chapter-2nd part)
Digital Devices (3rd chapter-2nd part)Digital Devices (3rd chapter-2nd part)
Digital Devices (3rd chapter-2nd part)
 
Network Topology (partial)
Network Topology (partial)Network Topology (partial)
Network Topology (partial)
 
Corona prediction from symptoms v1.4
Corona prediction from symptoms v1.4Corona prediction from symptoms v1.4
Corona prediction from symptoms v1.4
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
 
ICT-Number system.সংখ্যা পদ্ধতি(৩য় অধ্যায়-১ম অংশ)
ICT-Number system.সংখ্যা পদ্ধতি(৩য় অধ্যায়-১ম অংশ)ICT-Number system.সংখ্যা পদ্ধতি(৩য় অধ্যায়-১ম অংশ)
ICT-Number system.সংখ্যা পদ্ধতি(৩য় অধ্যায়-১ম অংশ)
 
Wireshark Lab HTTP, DNS and ARP v7 solution
Wireshark Lab HTTP, DNS and ARP v7 solutionWireshark Lab HTTP, DNS and ARP v7 solution
Wireshark Lab HTTP, DNS and ARP v7 solution
 
Wireshark lab ssl v7 solution
Wireshark lab ssl v7 solutionWireshark lab ssl v7 solution
Wireshark lab ssl v7 solution
 
Network Security(MD5)
Network Security(MD5)Network Security(MD5)
Network Security(MD5)
 
Secure Electronic Transaction
Secure Electronic TransactionSecure Electronic Transaction
Secure Electronic Transaction
 
Oracle installation
Oracle installationOracle installation
Oracle installation
 
IEEE 802.11 Project
IEEE 802.11 ProjectIEEE 802.11 Project
IEEE 802.11 Project
 
SONET-Communication Engineering
SONET-Communication EngineeringSONET-Communication Engineering
SONET-Communication Engineering
 
Security Issues for Cellular Telephony
Security Issues for Cellular TelephonySecurity Issues for Cellular Telephony
Security Issues for Cellular Telephony
 
All types of model(Simulation & Modelling) #ShareThisIfYouLike
All types of model(Simulation & Modelling) #ShareThisIfYouLikeAll types of model(Simulation & Modelling) #ShareThisIfYouLike
All types of model(Simulation & Modelling) #ShareThisIfYouLike
 
Type Checking(Compiler Design) #ShareThisIfYouLike
Type Checking(Compiler Design) #ShareThisIfYouLikeType Checking(Compiler Design) #ShareThisIfYouLike
Type Checking(Compiler Design) #ShareThisIfYouLike
 
System imolementation(Modern Systems Analysis and Design)
System imolementation(Modern Systems Analysis and Design)System imolementation(Modern Systems Analysis and Design)
System imolementation(Modern Systems Analysis and Design)
 
Free Space Management, Efficiency & Performance, Recovery and NFS
Free Space Management, Efficiency & Performance, Recovery and NFSFree Space Management, Efficiency & Performance, Recovery and NFS
Free Space Management, Efficiency & Performance, Recovery and NFS
 
Overview of Computer Graphics
Overview of Computer GraphicsOverview of Computer Graphics
Overview of Computer Graphics
 
Keyboard & Mouse basics
Keyboard & Mouse basics Keyboard & Mouse basics
Keyboard & Mouse basics
 
Organization of a computer
Organization of a computerOrganization of a computer
Organization of a computer
 

Recently uploaded

NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
Kamal Acharya
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
Kamal Acharya
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
abh.arya
 

Recently uploaded (20)

NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 

Making Complex Decisions(Artificial Intelligence)

  • 1. Making Complex Decisions Department of Computer Science & Engineering Hamdard University Bangladesh 1
  • 2. Sequential Decisions • Agent’s utility depends on a sequence of decisions • Also known as accessible, deterministic domains  Utilities,  Uncertainty,  Sensing  Generalize search  Planning problems. 2
  • 3. Simple Robot Navigation Problem • In each state, the possible actions are U, D, R, and L 3
  • 4. Probabilistic Transition Model • In each state, the possible actions are U, D, R, and L • The effect of U is as follows (transition model): • With probability 0.8 the robot moves up one square 4
  • 5. Probabilistic Transition Model • In each state, the possible actions are U, D, R, and L • The effect of U is as follows (transition model): • With probability 0.8 the robot moves up one square •With probability 0.1 the robot moves right one square •With probability 0.1 the robot moves left one square. 5
  • 6. Markov Property  The transition properties depend only on the current state, not on previous history (how that state was reached) 6
  • 7. Markov Decision Process (MDP) • Defined as a tuple: <S, A, M, R> – S: State – A: Action – M: Transition function – R: Reward • Choose a sequence of actions - Utility based on a sequence of decisions 7
  • 8. Generalization  Inputs: • Initial state s0 • Action model • Reward R(si) collected in each state si  A state is terminal if it has no successor  Starting at s0, the agent keeps executing actions until it reaches a terminal state  Its goal is to maximize the expected sum of rewards collected (additive rewards)  Additive rewards: U(s0, s1, s2, …) = R(s0) + R(s1) + R(s2)+ …  Discounted rewards U(s0, s1, s2, …) = R(s0) + R(s1) +  2R(s2) + … (0    1) 8
  • 9. Example MDP • Machine can be in one of three states: good, deteriorating, broken • Can take two actions: maintain, ignore 9
  • 10. o To know what state have reached (accessible) o Calculate best action for each state - Always know what to do next! o Mapping from states to actions is called a policy Policies 10
  • 11. Policies • No time period is different from the others • Optimal thing to do in state s should not depend on time period – … because of infinite horizon – With finite horizon, don’t want to maintain machine in last period • A policy is a function π from states to actions • Example policy: π(good shape) = ignore, π(deteriorating) = ignore, π(broken) = maintain 11
  • 12. Evaluating a policy • Key observation: MDP + policy = Markov process with rewards • To evaluate Markov process with rewards: system of linear equations • Gives algorithm for finding optimal policy: try every possible policy, evaluate – Terribly inefficient 12
  • 13. Bellman equation • Suppose state s, and play optimally from there on • This leads to expected value v*(s) • Bellman equation: v*(s) = maxa R(s, a) + δΣsꞌ P(s, a, sꞌ) v*(sꞌ) • Given v*, finding optimal policy is easy 13
  • 14. o Calculate utility of each state U (state) o Use state utilities to select optimal action Value iteration 14
  • 15. Value iteration algorithm for finding optimal policy  Start with arbitrary utility values  Update to make them locally consistent with bellman eqn.  Repeat until “no change” 15
  • 16. Policy iteration algorithm for finding optimal policy • Easy to compute values given a policy – No max operator • Policies may not be highly sensitive to exact utility values ⇒ May be less work to iterate through policies than utilities 16
  • 17. π ← an arbitrary initial policy repeat until no change in π compute utilities given π (value determination) Update π as if utilities were correct (i.e., local MEU) Policy Iteration Algorithm 17