ST. SOLDIER INSTITUTE OF
ENGINEERING & TECHNOLOGY
SESSION – (2019-2023)
PRACTICAL FILE OF ARTIFICIAL
INTELLIGENCE LAB
(BTCS 605-18)
INDEX
SR. NO.  AIM                                                                     PAGE NO.  SIGNATURE
1.       Write a programme to conduct uninformed and informed search.            1 – 12
2.       Write a programme to conduct game search.                               13 – 17
3.       Write a programme to construct a Bayesian network from given data.      18 – 23
4.       Write a programme to infer from the Bayesian network.                   24 – 28
5.       Write a programme to run value and policy iteration in a grid world.    29 – 33
6.       Write a programme to do reinforcement learning in a grid world.         34 – 40
EXPERIMENT NO. 1
AIM : Write a program to conduct uninformed and informed search.
THEORY :
Uninformed Search Algorithm
Uninformed search is a class of general-purpose search algorithms which
operate in a brute-force way. Uninformed search algorithms do not have any
additional information about the state or search space other than how to
traverse the tree, so they are also called blind search.
Following are the various types of uninformed search algorithms:
1.Breadth-first Search
2.Depth-first Search
1. Breadth-first Search:
Breadth-first search is the most common search strategy for traversing a
tree or graph. This algorithm searches breadthwise in a tree or graph, so
it is called breadth-first search. The BFS algorithm starts searching from the
root node of the tree and expands all successor nodes at the current level
before moving to the nodes of the next level. The breadth-first search
algorithm is an example of a general-graph search algorithm. Breadth-first
search is implemented using a FIFO queue data structure.
Code for BFS :
#include<bits/stdc++.h>
using namespace std;
// This class represents a directed graph using
// adjacency list representation
class Graph
{
int V; // No. of vertices
// Pointer to an array containing adjacency
// lists
vector<list<int>> adj;
public:
Graph(int V); // Constructor
// function to add an edge to graph
void addEdge(int v, int w);
// prints BFS traversal from a given source s
void BFS(int s);
};
Graph::Graph(int V)
{
this->V = V;
adj.resize(V);
}
void Graph::addEdge(int v, int w)
{
adj[v].push_back(w); // Add w to v’s list.
}
void Graph::BFS(int s)
{
// Mark all the vertices as not visited
vector<bool> visited;
visited.resize(V,false);
// Create a queue for BFS
list<int> queue;
// Mark the current node as visited and enqueue it
visited[s] = true;
queue.push_back(s);
while(!queue.empty())
{
// Dequeue a vertex from queue and print it
s = queue.front();
cout << s << " ";
queue.pop_front();
// Get all adjacent vertices of the dequeued
// vertex s. If a adjacent has not been visited,
// then mark it visited and enqueue it
for (auto adjacent: adj[s])
{
if (!visited[adjacent])
{
visited[adjacent] = true;
queue.push_back(adjacent);
}
}
}
}
// Driver program to test methods of graph class
int main()
{
// Create a graph given in the above diagram
Graph g(4);
g.addEdge(0, 1);
g.addEdge(0, 2);
g.addEdge(1, 2);
g.addEdge(2, 0);
g.addEdge(2, 3);
g.addEdge(3, 3);
cout << "Following is Breadth First Traversal "
<< "(starting from vertex 2) n";
g.BFS(2);
return 0;
}
2. Depth-first Search
Depth-first search is a recursive algorithm for traversing a tree or graph
data structure. It is called depth-first search because it starts from the
root node and follows each path to its greatest depth node before
moving to the next path. DFS uses a stack data structure for its
implementation. The process of the DFS algorithm is similar to the BFS
algorithm.
EXPERIMENT NO. 1
AIM: Write a program to conduct uninformed and informed search
Output:
Following is Breadth First Traversal (starting from vertex 2)
2 0 3 1
Code for DFS:
#include <bits/stdc++.h>
using namespace std;
// Graph class represents a directed graph
// using adjacency list representation
class Graph {
public:
map<int, bool> visited;
map<int, list<int> > adj;
// function to add an edge to graph
void addEdge(int v, int w);
// DFS traversal of the vertices
// reachable from v
void DFS(int v);
};
void Graph::addEdge(int v, int w)
{
adj[v].push_back(w); // Add w to v’s list.
}
void Graph::DFS(int v)
{
// Mark the current node as visited and
// print it
visited[v] = true;
cout << v << " ";
// Recurse for all the vertices adjacent
// to this vertex
list<int>::iterator i;
for (i = adj[v].begin(); i != adj[v].end(); ++i)
if (!visited[*i])
DFS(*i);
}
// Driver code
int main()
{
// Create a graph given in the above diagram
Graph g;
g.addEdge(0, 1);
g.addEdge(0, 2);
g.addEdge(1, 2);
g.addEdge(2, 0);
g.addEdge(2, 3);
g.addEdge(3, 3);
cout << "Following is Depth First Traversal"
" (starting from vertex 2) n";
g.DFS(2);
return 0;
}
Informed Search:
Informed search algorithms have information about the goal state which
helps in more efficient searching. This information is obtained by a
heuristic function that estimates how close a state is to the goal state.
The main informed search algorithm considered here is given below:
1. A* Search Algorithm
A* search is the most commonly known form of best-first search. It uses
the heuristic function h(n) and the cost g(n) to reach the node n from the
start state. It combines features of UCS and greedy best-first search,
which lets it solve problems efficiently. The A* search algorithm finds the
shortest path through the search space using the heuristic function. This
search algorithm expands fewer nodes of the search tree and provides an
optimal result faster. The A* algorithm is similar to UCS except that it uses
g(n) + h(n) instead of g(n).
Code for A* algorithm
#include <iostream>
#include "source/AStar.hpp"
int main()
{
AStar::Generator generator;
// Set 2d map size.
generator.setWorldSize({25, 25});
// You can use a few heuristics : manhattan, euclidean or octagonal.
generator.setHeuristic(AStar::Heuristic::euclidean);
generator.setDiagonalMovement(true);
std::cout << "Generate path ... n";
// This method returns vector of coordinates from target to source.
auto path = generator.findPath({0, 0}, {20, 20});
for(auto& coordinate : path) {
std::cout << coordinate.x << " " << coordinate.y << "\n";
}
}
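Note: the program above depends on the external AStar.hpp header from a third-party A* library, so it does not compile on its own. For reference only, a minimal self-contained A* sketch on a small grid is given below in Python; the grid layout, the start and goal cells and the Manhattan-distance heuristic are assumptions made purely for this sketch and are not part of the original program.
# Minimal self-contained A* sketch on a grid (illustrative only; not the AStar.hpp library above).
# 0 = free cell, 1 = obstacle; the heuristic h(n) is the Manhattan distance to the goal.
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    def h(cell):
        # heuristic h(n): Manhattan distance from cell to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
    open_heap = [(h(start), 0, start)]          # entries are (f = g + h, g, cell)
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []                           # reconstruct the path goal -> start
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                new_g = g + 1                   # uniform step cost g(n)
                if new_g < g_cost.get(nxt, float('inf')):
                    g_cost[nxt] = new_g
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None                                 # no path exists

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
print(astar(grid, (0, 0), (3, 3)))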
OUTPUT OF A* algorithm
OUTPUT :
Following is Depth First Traversal (starting from vertex 2)
2 0 1 3
EXPERIMENT NO. 2
AIM : Write a program to conduct game search.
THEORY : Game playing was one of the first tasks undertaken in Artificial
Intelligence. Game theory has its history from the 1950s, almost from the
days when computers became programmable. The very first game tackled
in AI was chess. Pioneers in the field of game theory in AI were
Konrad Zuse (the inventor of the first programmable computer and the
first programming language), Claude Shannon (the inventor of
information theory), Norbert Wiener (the creator of modern control
theory), and Alan Turing. Since then, there has been steady progress in
the standard of play, to the point that machines have defeated human
champions (although not every time) in chess and backgammon, and are
competitive in many other games.
Types of Game
1. Perfect Information Game: in which a player knows all the possible
moves of himself and the opponent, and their results.
E.g. chess.
2. Imperfect Information Game: in which a player does not know all the
possible moves of the opponent.
E.g. bridge, since all the cards are not visible to the player.
Mini-Max Algorithm in Artificial Intelligence:
The mini-max algorithm is a recursive or backtracking algorithm used
in decision-making and game theory. It provides an optimal move for the
player assuming that the opponent is also playing optimally. The mini-max
algorithm uses recursion to search through the game tree and is mostly
used for game playing in AI, such as chess, checkers, tic-tac-toe, Go, and
various other two-player games. The algorithm computes the minimax
decision for the current state. In this algorithm two players play the game;
one is called MAX and the other is called MIN. Each player tries to ensure
that the opponent gets the minimum benefit while they themselves get the
maximum benefit. Both players are opponents of each other, where MAX
will select the maximized value and MIN will select the minimized value.
The minimax algorithm performs a depth-first search to explore the
complete game tree: it proceeds all the way down to the terminal nodes of
the tree and then backs the values up the tree as the recursion unwinds.
Code for minimax algorithm :
// A simple C++ program to find
// maximum score that
// maximizing player can get.
#include<bits/stdc++.h>
using namespace std;
// Returns the optimal value a maximizer can obtain.
// depth is current depth in game tree.
// nodeIndex is index of current node in scores[].
// isMax is true if current move is
// of maximizer, else false
// scores[] stores leaves of Game tree.
// h is maximum height of Game tree
int minimax(int depth, int nodeIndex, bool isMax,
int scores[], int h)
{
// Terminating condition. i.e
// leaf node is reached
if (depth == h)
return scores[nodeIndex];
// If current move is maximizer,
// find the maximum attainable
// value
if (isMax)
return max(minimax(depth+1, nodeIndex*2, false, scores, h),
minimax(depth+1, nodeIndex*2 + 1, false, scores, h));
// Else (If current move is Minimizer), find the minimum
// attainable value
else
return min(minimax(depth+1, nodeIndex*2, true, scores, h),
minimax(depth+1, nodeIndex*2 + 1, true, scores, h));
}
// A utility function to find Log n in base 2
int log2(int n)
{
return (n==1)? 0 : 1 + log2(n/2);
}
// Driver code
int main()
{
// The number of elements in scores must be
// a power of 2.
int scores[] = {3, 5, 2, 9, 12, 5, 23, 23};
int n = sizeof(scores)/sizeof(scores[0]);
int h = log2(n);
int res = minimax(0, 0, true, scores, h);
cout << "The optimal value is : " << res << endl;
return 0;
}
EXPERIMENT NO. 2
AIM : Write a program to conduct game search
Output:
The optimal value is: 12
EXPERIMENT NO. 3
AIM : Write a program to construct a Bayesian network from given data
THEORY : A Bayesian network is a directed acyclic graph in which each
edge corresponds to a conditional dependency, and each node
corresponds to a unique random variable. A Bayesian network consists of
two major parts: a directed acyclic graph and a set of conditional
probability distributions.
1. The directed acyclic graph is a set of random variables represented
by nodes.
2. The conditional probability distribution of a node (random variable)
is defined for every possible outcome of the preceding causal
node(s).
For illustration, consider the following example. Suppose we attempt to
turn on our computer, but the computer does not start
(observation/evidence). We would like to know which of the possible
causes of computer failure is more likely. In this simplified illustration, we
assume only two possible causes of this misfortune: electricity failure and
computer malfunction.
The corresponding directed acyclic graph is depicted in the figure below:
the two cause nodes, "Electricity failure" and "Computer malfunction",
each have a directed edge to the evidence node "Computer failure".
The goal is to calculate the posterior conditional probability distribution
of each of the possible unobserved causes given the observed evidence,
i.e. P [Cause | Evidence].
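As an illustration only (this is an addition, not part of the original experiment), the small two-cause network above could be encoded with pgmpy roughly as follows; all of the probability values are made-up assumptions chosen just for the sketch.
# Illustrative sketch of the computer-failure network with pgmpy.
# All probability values below are assumed for illustration only.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([('Electricity', 'ComputerFailure'),
                         ('Malfunction', 'ComputerFailure')])

cpd_elec = TabularCPD('Electricity', 2, [[0.95], [0.05]])   # 0 = ok, 1 = electricity failure
cpd_malf = TabularCPD('Malfunction', 2, [[0.99], [0.01]])   # 0 = ok, 1 = computer malfunction
cpd_fail = TabularCPD('ComputerFailure', 2,
                      [[0.999, 0.10, 0.05, 0.01],           # P(computer starts | parents)
                       [0.001, 0.90, 0.95, 0.99]],          # P(computer fails  | parents)
                      evidence=['Electricity', 'Malfunction'], evidence_card=[2, 2])

model.add_cpds(cpd_elec, cpd_malf, cpd_fail)
model.check_model()

# P(Cause | Evidence): posterior over each possible cause given that the computer failed.
infer = VariableElimination(model)
print(infer.query(['Electricity'], evidence={'ComputerFailure': 1}))
print(infer.query(['Malfunction'], evidence={'ComputerFailure': 1}))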
Data Set:
Title: Heart Disease Databases
The Cleveland database contains 76 attributes, but all published
experiments refer to using a
subset of 14 of them. In particular, the Cleveland database is the only one
that has been used
by ML researchers to this date. The "Heartdisease" field refers to the
presence of heart disease in the patient. It is integer valued from 0 (no
presence) to 4.
Database: 0 1 2 3 4 Total
Cleveland: 164 55 36 35 13 303
Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type (1 = typical angina; 2 = atypical angina;
3 = non-anginal pain; 4 = asymptomatic)
4. trestbps: resting blood pressure (in mm Hg on admission to the
hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results (0 = normal; 1 = having
ST-T wave abnormality (T wave inversions and/or ST elevation or
depression of > 0.05 mV); 2 = showing probable or definite left
ventricular hypertrophy by Estes' criteria)
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak = ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment (1 = upsloping;
2 = flat; 3 = downsloping)
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. Heartdisease: diagnosis of heart disease (angiographic disease
status), integer valued from 0 (no presence) to 4
Some instances from the dataset:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal heartdisease
63  1   1  145      1    1   2       150     0     2.3     3     0  6    0
67  1   4  160      4    0   2       108     1     1.5     2     3  3    2
67  1   4  120      4    0   2       129     1     2.6     2     2  7    1
41  0   2  130      2    0   2       172     0     1.4     1     0  3    0
62  0   4  140      4    0   2       160     0     3.6     3     2  3    3
60  1   4  130      206  0   2       132     1     2.4     2     2  7    4
Code :
import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
#read Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)
#display the data
print('Few examples from the dataset are given below')
print(heartDisease.head())
#Model Bayesian Network
model = BayesianModel([('age','trestbps'),('age','fbs'),('sex','trestbps'),
                       ('exang','trestbps'),('trestbps','heartdisease'),
                       ('fbs','heartdisease'),('heartdisease','restecg'),
                       ('heartdisease','thalach'),('heartdisease','chol')])
#Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)
# Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)
#computing the Probability of HeartDisease given Age
print('\n 1. Probability of HeartDisease given Age=28')
q=HeartDisease_infer.query(variables=['heartdisease'],evidence={'age':28})
print(q['heartdisease'])
#computing the Probability of HeartDisease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q=HeartDisease_infer.query(variables=['heartdisease'],evidence={'chol':100})
print(q['heartdisease'])
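Note (an addition for reference, not part of the original listing): depending on the installed pgmpy version, the code above may need small adjustments. In recent pgmpy releases BayesianModel has been renamed to BayesianNetwork, and query() returns a DiscreteFactor that is printed directly rather than indexed with q['heartdisease']. A hedged sketch of the adjusted calls, under those assumptions:
# Possible adjustment for newer pgmpy releases (assumption: BayesianNetwork class and
# a query() that returns a DiscreteFactor).
import numpy as np
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv('heart.csv').replace('?', np.nan)
model = BayesianNetwork([('age','trestbps'),('age','fbs'),('sex','trestbps'),
                         ('exang','trestbps'),('trestbps','heartdisease'),
                         ('fbs','heartdisease'),('heartdisease','restecg'),
                         ('heartdisease','thalach'),('heartdisease','chol')])
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

infer = VariableElimination(model)
q = infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q)   # the returned DiscreteFactor is printed directly instead of q['heartdisease']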
EXPERIMENT NO. 3
AIM : Write a program to construct a Bayesian network from given data
Output:
Few examples from the dataset are given below
Age sex cp trestbps ... slope ca thal heartdisease
0 63 1 1 145 … 3 0 6 0
1 67 1 4 160 … 2 3 3 2
2 67 1 4 120 … 2 2 7 1
3 37 1 3 130 … 3 0 3 0
4 41 0 2 130 … 1 0 3 0
EXPERIMENT NO. 4
AIM : Write a program to infer from the Bayesian network.
THEORY : A Bayesian network is a probability model defined over an
acyclic directed graph. It is factored by using one conditional probability
distribution for each variable in the model, conditioned on that variable's
parents in the graph. Bayesian models rest on the simple principles of
probability, so let us first define conditional probability and the joint
probability distribution.
Conditional Probability
Conditional probability is a measure of the likelihood of an event
occurring given that another event has already occurred (through
assumption, supposition, statement, or evidence). If A is the event of
interest and B is known or assumed to have occurred, the conditional
probability of A given B is written P(A|B) or, less frequently, PB(A). It can
also be expressed as the probability of A and B occurring together
divided by the probability of B:
P(A|B) = P(A ∩ B) / P(B)
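A small worked example (added for illustration; the counts are made up): suppose that out of 100 days, 60 were cloudy and it was both cloudy and rainy on 24 of them. Then P(rain | cloudy) = (24/100) / (60/100) = 0.4. A quick check in Python:
# Worked example with made-up counts: P(A|B) = P(A and B) / P(B).
days = 100
cloudy = 60              # days on which B (cloudy) occurred
rainy_and_cloudy = 24    # days on which both A (rain) and B (cloudy) occurred

p_b = cloudy / days
p_a_and_b = rainy_and_cloudy / days
print("P(rain | cloudy) =", p_a_and_b / p_b)   # prints 0.4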
Joint Probability
The probability of two (or more) events occurring together is known as
the joint probability. A joint probability distribution is defined over two
or more random variables. For example, the joint probability of events A
and B is expressed formally as
P(A ∩ B)
P(A, B)
where the "∩" operator (or, in some situations, a comma ",") represents
the "and", i.e. the conjunction of the events. In general
P(A, B) = P(B) · P(A|B); only when A and B are independent is the joint
probability obtained by simply multiplying the probability of event A by
the probability of event B.
Posterior Probability
In Bayesian statistics, the posterior probability of a random event or an
uncertain proposition is its conditional probability given the relevant
evidence or background. "Posterior" here means "after taking into
account the evidence pertinent to the particular case under
consideration." The probability distribution of an unknown quantity,
interpreted as a random variable and conditioned on the data from an
experiment or survey, is known as the posterior probability distribution.
Inferencing with Bayesian Network
In this demonstration, we'll use a Bayesian network to solve the well-
known Monty Hall problem. For those unfamiliar with it: the problem
involves a game show in which a contestant must choose one of three
doors, one of which conceals a prize (the code below names this node
'Price'). After the contestant has chosen a door, the show's host (Monty)
opens an empty door and asks the contestant whether he wants to switch
to the remaining door. The decision is whether to stick with the original
door or switch to the other one. It is preferable to switch, because the
prize is more likely to be behind the other door. To resolve this ambiguity,
let us model the problem with a Bayesian network.
CODE :
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
import networkx as nx
import pylab as plt
# Defining Bayesian Structure
model = BayesianNetwork([('Guest', 'Host'), ('Price', 'Host')])
# Defining the CPDs:
cpd_guest = TabularCPD('Guest', 3, [[0.33], [0.33], [0.33]])
cpd_price = TabularCPD('Price', 3, [[0.33], [0.33], [0.33]])
cpd_host = TabularCPD('Host', 3, [[0, 0, 0, 0, 0.5, 1, 0, 1, 0.5],
[0.5, 0, 1, 0, 0, 0, 1, 0, 0.5],
[0.5, 1, 0, 1, 0.5, 0, 0, 0, 0]],
evidence=['Guest', 'Price'], evidence_card=[3, 3])
# Associating the CPDs with the network structure.
model.add_cpds(cpd_guest, cpd_price, cpd_host)
model.check_model()
# Inferring the posterior probability
from pgmpy.inference import VariableElimination
infer = VariableElimination(model)
posterior_p = infer.query(['Host'], evidence={'Guest': 2, 'Price': 2})
print(posterior_p)
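As a further check (an addition to the original listing), the same model can also be queried from the contestant's point of view: given that the guest picked door 0 and the host then opened door 2, the posterior over 'Price' should put roughly 1/3 on door 0 and 2/3 on door 1, which is exactly why switching is the better strategy.
# Posterior over the prize location from the contestant's point of view
# (assumes the model and infer objects defined above).
posterior_price = infer.query(['Price'], evidence={'Guest': 0, 'Host': 2})
print(posterior_price)   # expected: P(Price=0) ≈ 0.33, P(Price=1) ≈ 0.67, P(Price=2) = 0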
EXPERIMENT NO. 4
AIM: Write a program to infer from the Bayesian network
OUTPUT :
EXPERIMENT NO. 5
AIM:Write a program to run value and policy iteration in a grid world
THEORY : Value Iteration
With the tools we have explored until now, a new question arises: why
do we need to consider an initial policy at all? The idea of the value
iteration algorithm is that we can compute the value function without a
policy. Instead of letting the policy π dictate which actions are selected,
we select those actions that maximize the expected reward:
V(s) ← maxa ∑s′ P(s′|s,a) [ R(s,a,s′) + γ V(s′) ]
CODE FOR VALUE ITERATION :
def valueIteration(self, gridWorld, gamma = 1):
    self.resetPolicy() # ensure empty policy before calling evaluatePolicy
    V_old = None
    V_new = np.repeat(0, gridWorld.size())
    convergedCellIndices = np.zeros(0)
    while len(convergedCellIndices) != len(V_new):
        V_old = V_new
        V_new = self.evaluatePolicySweep(gridWorld, V_old, gamma,
                                         convergedCellIndices)
        convergedCellIndices = self.findConvergedCells(V_old, V_new)
    greedyPolicy = findGreedyPolicy(V_new, gridWorld, self.gameLogic)
    self.setPolicy(greedyPolicy)
    self.setWidth(gridWorld.getWidth())
    self.setHeight(gridWorld.getHeight())
    return(V_new)
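The method above is a fragment of a larger GridWorld/Policy implementation, so it will not run on its own. As a reference only, a minimal self-contained value-iteration sketch on a 4x4 grid world (the same layout used in Experiment 6: two terminal corner states, every move rewarded with -1) is given below; the discount factor, threshold and layout are assumptions made for the sketch.
# Minimal self-contained value iteration on a 4x4 grid world (illustrative sketch).
# Terminal states are two opposite corners; every move yields a reward of -1.
import numpy as np

gridSize = 4
gamma = 1.0
theta = 1e-4
terminals = [(0, 0), (gridSize - 1, gridSize - 1)]
actions = [(-1, 0), (1, 0), (0, 1), (0, -1)]    # up, down, right, left

def step(state, action):
    if state in terminals:
        return state, 0
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < gridSize and 0 <= nxt[1] < gridSize):
        nxt = state                              # moves off the grid leave the state unchanged
    return nxt, -1

V = np.zeros((gridSize, gridSize))
while True:
    delta = 0.0
    for i in range(gridSize):
        for j in range(gridSize):
            if (i, j) in terminals:
                continue
            # value-iteration backup: take the best action instead of averaging over a policy
            best = max(r + gamma * V[nxt] for nxt, r in (step((i, j), a) for a in actions))
            delta = max(delta, abs(best - V[i, j]))
            V[i, j] = best
    if delta < theta:
        break

print(V)   # optimal value = minus the number of steps to the nearest terminal corner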
POLICY ITERATION :
A simple strategy for this is a greedy algorithm that iterates over all the
cells in the grid and then chooses the action that maximizes the expected
reward according to the value function.
This approach implicitly determines the action-value function, which is
defined as
Qπ(s,a) = ∑s′ P(s′|s,a) [ R(s,a,s′) + γ Vπ(s′) ]
The improvePolicy function determines the value function of a policy (if
it’s not available yet) and then calls findGreedyPolicy to identify the
optimal action for every state:
def improvePolicy(policy, gridWorld, gamma = 1):
    policy = copy.deepcopy(policy) # dont modify old policy
    if len(policy.values) == 0:
        # policy needs to be evaluated first
        policy.evaluatePolicy(gridWorld)
    greedyPolicy = findGreedyPolicy(policy.getValues(), gridWorld,
                                    policy.gameLogic, gamma)
    policy.setPolicy(greedyPolicy)
    return policy
def findGreedyPolicy(values, gridWorld, gameLogic, gamma = 1):
    # create a greedy policy based on the values param
    stateGen = StateGenerator()
    greedyPolicy = [Action(Actions.NONE)] * len(values)
    for (i, cell) in enumerate(gridWorld.getCells()):
        gridWorld.setActor(cell)
        if not cell.canBeEntered():
            continue
        maxPair = (Actions.NONE, -np.inf)
        for actionType in Actions:
            if actionType == Actions.NONE:
                continue
            proposedCell = gridWorld.proposeMove(actionType)
            if proposedCell is None:
                # action is nonsensical in this state
                continue
            Q = 0.0 # action-value function
            proposedStates = stateGen.generateState(gridWorld, actionType, cell)
            for proposedState in proposedStates:
                actorPos = proposedState.getIndex()
                transitionProb = gameLogic.getTransitionProbability(cell, proposedState,
                                                                    actionType, gridWorld)
                reward = gameLogic.R(cell, proposedState, actionType)
                expectedValue = transitionProb * (reward + gamma * values[actorPos])
                Q += expectedValue
            if Q > maxPair[1]:
                maxPair = (actionType, Q)
        gridWorld.unsetActor(cell) # reset state
        greedyPolicy[i] = Action(maxPair[0])
    return greedyPolicy
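The outer policy-iteration loop itself is not shown in the listing above; it simply alternates policy evaluation and greedy improvement until the policy stops changing. A rough sketch under the same assumed interfaces (the getPolicy() accessor used for the comparison is a hypothetical helper, not part of the original code):
# Sketch of the outer policy-iteration loop (interfaces assumed from the snippets above).
def policyIteration(policy, gridWorld, gamma = 1):
    while True:
        policy.evaluatePolicy(gridWorld)                     # policy evaluation step
        improved = improvePolicy(policy, gridWorld, gamma)   # greedy policy improvement step
        if improved.getPolicy() == policy.getPolicy():       # hypothetical accessor: list of actions
            return improved                                  # policy is stable -> optimal policy
        policy = improved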
EXPERIMENT NO. 5
AIM :Write a program to run value and policy iteration in a grid world.
OUTPUT :
EXPERIMENT NO : 6
AIM : Write a program to do reinforcement learning in a grid world.
THEORY : Reinforcement Learning (RL) involves decision making under
uncertainty and tries to maximize the return over successive states.
There are four main elements of a Reinforcement Learning system: a
policy, a reward signal, a value function and, optionally, a model of the
environment. The policy is a mapping from states to actions, or a
probability distribution over actions. Every action the agent takes results
in a numerical reward, and the agent's sole purpose is to maximize the
reward in the long run. While the reward indicates the immediate return,
the value function specifies the return in the long run: the value of a
state is the expected reward that an agent can accrue starting from that
state. The agent/robot takes an action At in state St, moves to state S′t
and gets a reward Rt+1, as shown in the figure below.
An agent will seek to maximize the overall return as it transitions across states.
The expected return can be expressed as
Gt = Rt+1 + γ Rt+2 + γ2 Rt+3 + … = Rt+1 + γ Gt+1
where Gt is the expected return at time t and γ Gt+1 is the discounted expected return from time t+1.
A policy is a mapping from states to probabilities of selecting each possible action. If the
agent is following policy π at time t, then π(a|s) is the probability that At = a if St = s.
The value function of a state s under a policy π, denoted Vπ(s), is the expected return when
starting in s and following π thereafter. This can be written as
Vπ(s) = Eπ[ Gt | St = s ]
Similarly, the action-value function Qπ(s,a) gives the expected return when taking an action a in
state s and following π thereafter:
Qπ(s,a) = Eπ[ Gt | St = s, At = a ]
The Bellman equation for the state-value function expresses the value of each state in terms of
the values of its successor states:
Vπ(s) = ∑a π(a|s) ∑s′ P(s′|s,a) [ R(s,a,s′) + γ Vπ(s′) ]
The Bellman optimality equations give the optimal policy of choosing specific actions in
specific states to achieve the maximum reward and reach the goal efficiently:
V*(s) = maxa ∑s′ P(s′|s,a) [ R(s,a,s′) + γ V*(s′) ]
The Bellman equations cannot be solved directly in goal-directed problems, and dynamic
programming is used instead, where the value functions are computed iteratively.
In the problem below, the grid world has 2 terminal states, one in each of two opposite
corners, as shown. There are four possible actions in each state: up, down, right and left. If an
action in a state takes the agent out of the grid, the agent remains in the same state. All
actions have a reward of -1, while the terminal states have a reward of 0. That is, the reward
for any transition is Rt = −1, except for transitions into the terminal corner states, which have
a reward of 0. The policy is a uniform policy with all actions being equi-probable with a
probability of 1/4 or 0.25.
1. Gridworld-1
In [1]:
import numpy as np
import random
In [2]:
gamma = 1 # discounting rate
gridSize = 4
rewardValue = -1
terminationStates = [[0,0], [gridSize-1, gridSize-1]]
actions = [[-1, 0], [1, 0], [0, 1], [0, -1]]
numIterations = 1000
The action value provides the next state for a given action in a state and the accrued reward
In [3]:
def actionValue(initialPosition,action):
    if initialPosition in terminationStates:
        finalPosition = initialPosition
        reward=0
    else:
        #Compute final position
        finalPosition = np.array(initialPosition) + np.array(action)
        reward= rewardValue
        # If the action moves the finalPosition out of the grid, stay in same cell
        if -1 in finalPosition or gridSize in finalPosition:
            finalPosition = initialPosition
            reward= rewardValue
    #print(finalPosition)
    return finalPosition, reward
1a. Bellman Update
In [4]:
# Initialize valueMap and valueMap1
valueMap = np.zeros((gridSize, gridSize))
valueMap1 = np.zeros((gridSize, gridSize))
states = [[i, j] for i in range(gridSize) for j in range(gridSize)]
In [5]:
def policy_evaluation(numIterations,gamma,theta,valueMap):
    for i in range(numIterations):
        delta=0
        for state in states:
            weightedRewards=0
            for action in actions:
                finalPosition,reward = actionValue(state,action)
                weightedRewards += 1/4* (reward + gamma *
                                         valueMap[finalPosition[0],finalPosition[1]])
            valueMap1[state[0],state[1]]=weightedRewards
            delta = max(delta,abs(weightedRewards-valueMap[state[0],state[1]]))
        valueMap = np.copy(valueMap1)
        if(delta < 0.01):
            print(valueMap)
            break
In [6]:
valueMap = np.zeros((gridSize, gridSize))
valueMap1 = np.zeros((gridSize, gridSize))
states = [[i, j] for i in range(gridSize) for j in range(gridSize)]
policy_evaluation(1000,1,0.001,valueMap)
[[ 0. -13.89528403 -19.84482978 -21.82635535]
[-13.89528403 -17.86330422 -19.84586777 -19.84482978]
[-19.84482978 -19.84586777 -17.86330422 -13.89528403]
[-21.82635535 -19.84482978 -13.89528403 0. ]]
In [7]:
valueMap = np.zeros((gridSize, gridSize))
valueMap1 = np.zeros((gridSize, gridSize))
states = [[i, j] for i in range(gridSize) for j in range(gridSize)]
pi = np.ones((gridSize,gridSize))/4
pi1 = np.chararray((gridSize, gridSize))
pi1[:] = 'a'
In [8]:
# Compute the value state function for the Grid
def policy_evaluate(states,actions,gamma,valueMap):
    #print("iterations=",i)
    for state in states:
        weightedRewards=0
        for action in actions:
            finalPosition,reward = actionValue(state,action)
            weightedRewards += 1/4* (reward + gamma * valueMap[finalPosition[0],finalPosition[1]])
        # Set the computed weighted rewards to valueMap1
        valueMap1[state[0],state[1]]=weightedRewards
    # Copy to original valueMap
    valueMap = np.copy(valueMap1)
    return(valueMap)
In [9]:
def argmax(q_values):
    idx=np.argmax(q_values)
    return(np.random.choice(np.where(q_values==q_values[idx])[0].tolist()))
# Compute the best action in each state
def greedify_policy(state,pi,pi1,gamma,valueMap):
    q_values=np.zeros(len(actions))
    for idx,action in enumerate(actions):
        finalPosition,reward = actionValue(state,action)
        q_values[idx] += 1/4* (reward + gamma * valueMap[finalPosition[0],finalPosition[1]])
    # Find the index of the action for which the q_value is maximum
    idx=q_values.argmax()
    pi[state[0],state[1]]=idx
    if(idx == 0):
        pi1[state[0],state[1]]='u'
    elif(idx == 1):
        pi1[state[0],state[1]]='d'
    elif(idx == 2):
        pi1[state[0],state[1]]='r'
    elif(idx == 3):
        pi1[state[0],state[1]]='l'
In [10]:
def improve_policy(pi, pi1,gamma,valueMap):
    policy_stable = True
    for state in states:
        old = pi[state].copy()
        # Greedify policy for state
        greedify_policy(state,pi,pi1,gamma,valueMap)
        if not np.array_equal(pi[state], old):
            policy_stable = False
    print(pi)
    print(pi1)
    return pi, pi1, policy_stable
In [11]:
def policy_iteration(gamma, theta):
    valueMap = np.zeros((gridSize, gridSize))
    pi = np.ones((gridSize,gridSize))/4
    pi1 = np.chararray((gridSize, gridSize))
    pi1[:] = 'a'
    policy_stable = False
    print("here")
    while not policy_stable:
        valueMap = policy_evaluate(states,actions,gamma,valueMap)
        pi,pi1, policy_stable = improve_policy(pi,pi1, gamma,valueMap)
    return valueMap, pi,pi1
In [12]:
theta=0.1
valueMap, pi,pi1 = policy_iteration(gamma, theta)
[[0. 3. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 1.]
[0. 0. 2. 0.]]
[[b'u' b'l' b'u' b'u']
[b'u' b'u' b'u' b'u']
[b'u' b'u' b'u' b'd']
[b'u' b'u' b'r' b'u']]
[[0. 3. 3. 0.]
[0. 0. 0. 1.]
[0. 0. 1. 1.]
[0. 2. 2. 0.]]
[[b'u' b'l' b'l' b'u']
[b'u' b'u' b'u' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'r' b'r' b'u']]
[[0. 3. 3. 1.]
[0. 0. 1. 1.]
[0. 0. 1. 1.]
[0. 2. 2. 0.]]
[[b'u' b'l' b'l' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'r' b'r' b'u']]
[[0. 3. 3. 1.]
[0. 0. 1. 1.]
[0. 0. 1. 1.]
[0. 2. 2. 0.]]
[[b'u' b'l' b'l' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'r' b'r' b'u']]
EXPERIMENT NO : 6
AIM :Write a program to do reinforcement learning in a grid world
OUTPUT :
The valueMap shows the optimal path from any state.
More Related Content

Similar to AI Lab menu for ptu students and easy to use and best quality and help for 6th sem students

Javascript
JavascriptJavascript
Javascript
Vlad Ifrim
 
Import java
Import javaImport java
Import java
heni2121
 
Functions
FunctionsFunctions
Functions
Swarup Boro
 
C++ BinaryTree Help Creating main function for Trees...Here are .pdf
C++ BinaryTree Help  Creating main function for Trees...Here are .pdfC++ BinaryTree Help  Creating main function for Trees...Here are .pdf
C++ BinaryTree Help Creating main function for Trees...Here are .pdf
forecastfashions
 
#include iostream #include deque #include stdio.h   scan.pdf
#include iostream #include deque #include stdio.h   scan.pdf#include iostream #include deque #include stdio.h   scan.pdf
#include iostream #include deque #include stdio.h   scan.pdf
anandmobile
 
Internal workshop es6_2015
Internal workshop es6_2015Internal workshop es6_2015
Internal workshop es6_2015
Miguel Ruiz Rodriguez
 
Using an Array include ltstdiohgt include ltmpih.pdf
Using an Array include ltstdiohgt include ltmpih.pdfUsing an Array include ltstdiohgt include ltmpih.pdf
Using an Array include ltstdiohgt include ltmpih.pdf
giriraj65
 
Im having trouble figuring out how to code these sections for an a.pdf
Im having trouble figuring out how to code these sections for an a.pdfIm having trouble figuring out how to code these sections for an a.pdf
Im having trouble figuring out how to code these sections for an a.pdf
rishteygallery
 
Using Java, please write the program for the following prompt in the.pdf
Using Java, please write the program for the following prompt in the.pdfUsing Java, please write the program for the following prompt in the.pdf
Using Java, please write the program for the following prompt in the.pdf
forecastfashions
 
need help with code I wrote. This code is a maze gui, and i need hel.pdf
need help with code I wrote. This code is a maze gui, and i need hel.pdfneed help with code I wrote. This code is a maze gui, and i need hel.pdf
need help with code I wrote. This code is a maze gui, and i need hel.pdf
arcotstarsports
 
C++ manual Report Full
C++ manual Report FullC++ manual Report Full
C++ manual Report Full
Thesis Scientist Private Limited
 
Deep dive into deeplearn.js
Deep dive into deeplearn.jsDeep dive into deeplearn.js
Deep dive into deeplearn.js
Kai Sasaki
 
i have written ths code as per your requirements with clear comments.pdf
i have written ths code as per your requirements with clear comments.pdfi have written ths code as per your requirements with clear comments.pdf
i have written ths code as per your requirements with clear comments.pdf
anandf0099
 
Graphics practical lab manual
Graphics practical lab manualGraphics practical lab manual
Graphics practical lab manual
Vivek Kumar Sinha
 
Pointer
PointerPointer
Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...
Andreas Dewes
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
Flink Forward
 
This is Function Class public abstract class Function {    .pdf
This is Function Class public abstract class Function {    .pdfThis is Function Class public abstract class Function {    .pdf
This is Function Class public abstract class Function {    .pdf
amitbagga0808
 
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDYDATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
Malikireddy Bramhananda Reddy
 
SaveI need help with this maze gui that I wrote in java, I am tryi.pdf
SaveI need help with this maze gui that I wrote in java, I am tryi.pdfSaveI need help with this maze gui that I wrote in java, I am tryi.pdf
SaveI need help with this maze gui that I wrote in java, I am tryi.pdf
arihantstoneart
 

Similar to AI Lab menu for ptu students and easy to use and best quality and help for 6th sem students (20)

Javascript
JavascriptJavascript
Javascript
 
Import java
Import javaImport java
Import java
 
Functions
FunctionsFunctions
Functions
 
C++ BinaryTree Help Creating main function for Trees...Here are .pdf
C++ BinaryTree Help  Creating main function for Trees...Here are .pdfC++ BinaryTree Help  Creating main function for Trees...Here are .pdf
C++ BinaryTree Help Creating main function for Trees...Here are .pdf
 
#include iostream #include deque #include stdio.h   scan.pdf
#include iostream #include deque #include stdio.h   scan.pdf#include iostream #include deque #include stdio.h   scan.pdf
#include iostream #include deque #include stdio.h   scan.pdf
 
Internal workshop es6_2015
Internal workshop es6_2015Internal workshop es6_2015
Internal workshop es6_2015
 
Using an Array include ltstdiohgt include ltmpih.pdf
Using an Array include ltstdiohgt include ltmpih.pdfUsing an Array include ltstdiohgt include ltmpih.pdf
Using an Array include ltstdiohgt include ltmpih.pdf
 
Im having trouble figuring out how to code these sections for an a.pdf
Im having trouble figuring out how to code these sections for an a.pdfIm having trouble figuring out how to code these sections for an a.pdf
Im having trouble figuring out how to code these sections for an a.pdf
 
Using Java, please write the program for the following prompt in the.pdf
Using Java, please write the program for the following prompt in the.pdfUsing Java, please write the program for the following prompt in the.pdf
Using Java, please write the program for the following prompt in the.pdf
 
need help with code I wrote. This code is a maze gui, and i need hel.pdf
need help with code I wrote. This code is a maze gui, and i need hel.pdfneed help with code I wrote. This code is a maze gui, and i need hel.pdf
need help with code I wrote. This code is a maze gui, and i need hel.pdf
 
C++ manual Report Full
C++ manual Report FullC++ manual Report Full
C++ manual Report Full
 
Deep dive into deeplearn.js
Deep dive into deeplearn.jsDeep dive into deeplearn.js
Deep dive into deeplearn.js
 
i have written ths code as per your requirements with clear comments.pdf
i have written ths code as per your requirements with clear comments.pdfi have written ths code as per your requirements with clear comments.pdf
i have written ths code as per your requirements with clear comments.pdf
 
Graphics practical lab manual
Graphics practical lab manualGraphics practical lab manual
Graphics practical lab manual
 
Pointer
PointerPointer
Pointer
 
Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
 
This is Function Class public abstract class Function {    .pdf
This is Function Class public abstract class Function {    .pdfThis is Function Class public abstract class Function {    .pdf
This is Function Class public abstract class Function {    .pdf
 
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDYDATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
DATASTRUCTURES PPTS PREPARED BY M V BRAHMANANDA REDDY
 
SaveI need help with this maze gui that I wrote in java, I am tryi.pdf
SaveI need help with this maze gui that I wrote in java, I am tryi.pdfSaveI need help with this maze gui that I wrote in java, I am tryi.pdf
SaveI need help with this maze gui that I wrote in java, I am tryi.pdf
 

Recently uploaded

Discover the Beauty and Functionality of The Expert Remodeling Service
Discover the Beauty and Functionality of The Expert Remodeling ServiceDiscover the Beauty and Functionality of The Expert Remodeling Service
Discover the Beauty and Functionality of The Expert Remodeling Service
obriengroupinc04
 
Lukas Rycek - GreenChemForCE - project structure.pptx
Lukas Rycek - GreenChemForCE - project structure.pptxLukas Rycek - GreenChemForCE - project structure.pptx
Lukas Rycek - GreenChemForCE - project structure.pptx
pavelborek
 
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
concepsionchomo153
 
Registered-Establishment-List-in-Uttarakhand-pdf.pdf
Registered-Establishment-List-in-Uttarakhand-pdf.pdfRegistered-Establishment-List-in-Uttarakhand-pdf.pdf
Registered-Establishment-List-in-Uttarakhand-pdf.pdf
dazzjoker
 
Kirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip GEM Royalty TNR Gold Copper PresentationKirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results
 
Prescriptive analytics BA4206 Anna University PPT
Prescriptive analytics BA4206 Anna University PPTPrescriptive analytics BA4206 Anna University PPT
Prescriptive analytics BA4206 Anna University PPT
Freelance
 
Kalyan chart 6366249026 India satta Matta Matka 143 jodi fix
Kalyan chart 6366249026 India satta Matta Matka 143 jodi fixKalyan chart 6366249026 India satta Matta Matka 143 jodi fix
Kalyan chart 6366249026 India satta Matta Matka 143 jodi fix
satta Matta matka 143 Kalyan chart jodi 6366249026
 
list of states and organizations .pdf
list of  states  and  organizations .pdflist of  states  and  organizations .pdf
list of states and organizations .pdf
Rbc Rbcua
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results
 
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani case
 
IMG_20240615_091110.pdf dpboss guessing
IMG_20240615_091110.pdf dpboss  guessingIMG_20240615_091110.pdf dpboss  guessing
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results
 
一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理
一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理
一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理
taqyea
 
Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian Matka
Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian MatkaDpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian Matka
Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian Matka
➒➌➎➏➑➐➋➑➐➐Dpboss Matka Guessing Satta Matka Kalyan Chart Indian Matka
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results
 
AI Transformation Playbook: Thinking AI-First for Your Business
AI Transformation Playbook: Thinking AI-First for Your BusinessAI Transformation Playbook: Thinking AI-First for Your Business
AI Transformation Playbook: Thinking AI-First for Your Business
Arijit Dutta
 
Call 8867766396 Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian M...
Call 8867766396 Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian M...Call 8867766396 Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian M...
Call 8867766396 Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian M...
dpbossdpboss69
 
Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...
Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...
Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...
Herman Kienhuis
 
Kirill Klip GEM Royalty TNR Gold Lithium Presentation
Kirill Klip GEM Royalty TNR Gold Lithium PresentationKirill Klip GEM Royalty TNR Gold Lithium Presentation
Kirill Klip GEM Royalty TNR Gold Lithium Presentation
Kirill Klip
 

Recently uploaded (20)

Discover the Beauty and Functionality of The Expert Remodeling Service
Discover the Beauty and Functionality of The Expert Remodeling ServiceDiscover the Beauty and Functionality of The Expert Remodeling Service
Discover the Beauty and Functionality of The Expert Remodeling Service
 
Lukas Rycek - GreenChemForCE - project structure.pptx
Lukas Rycek - GreenChemForCE - project structure.pptxLukas Rycek - GreenChemForCE - project structure.pptx
Lukas Rycek - GreenChemForCE - project structure.pptx
 
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
 
Registered-Establishment-List-in-Uttarakhand-pdf.pdf
Registered-Establishment-List-in-Uttarakhand-pdf.pdfRegistered-Establishment-List-in-Uttarakhand-pdf.pdf
Registered-Establishment-List-in-Uttarakhand-pdf.pdf
 
Kirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip GEM Royalty TNR Gold Copper PresentationKirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip GEM Royalty TNR Gold Copper Presentation
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
 
Prescriptive analytics BA4206 Anna University PPT
Prescriptive analytics BA4206 Anna University PPTPrescriptive analytics BA4206 Anna University PPT
Prescriptive analytics BA4206 Anna University PPT
 
Kalyan chart 6366249026 India satta Matta Matka 143 jodi fix
Kalyan chart 6366249026 India satta Matta Matka 143 jodi fixKalyan chart 6366249026 India satta Matta Matka 143 jodi fix
Kalyan chart 6366249026 India satta Matta Matka 143 jodi fix
 
list of states and organizations .pdf
list of  states  and  organizations .pdflist of  states  and  organizations .pdf
list of states and organizations .pdf
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
 
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
 
IMG_20240615_091110.pdf dpboss guessing
IMG_20240615_091110.pdf dpboss  guessingIMG_20240615_091110.pdf dpboss  guessing
IMG_20240615_091110.pdf dpboss guessing
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
 
一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理
一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理
一比一原版(QMUE毕业证书)英国爱丁堡玛格丽特女王大学毕业证文凭如何办理
 
Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian Matka
Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian MatkaDpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian Matka
Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian Matka
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
 
AI Transformation Playbook: Thinking AI-First for Your Business
AI Transformation Playbook: Thinking AI-First for Your BusinessAI Transformation Playbook: Thinking AI-First for Your Business
AI Transformation Playbook: Thinking AI-First for Your Business
 
Call 8867766396 Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian M...
Call 8867766396 Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian M...Call 8867766396 Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian M...
Call 8867766396 Dpboss Matka Guessing Satta Matta Matka Kalyan Chart Indian M...
 
Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...
Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...
Presentation by Herman Kienhuis (Curiosity VC) on Investing in AI for ABS Alu...
 
Kirill Klip GEM Royalty TNR Gold Lithium Presentation
Kirill Klip GEM Royalty TNR Gold Lithium PresentationKirill Klip GEM Royalty TNR Gold Lithium Presentation
Kirill Klip GEM Royalty TNR Gold Lithium Presentation
 

AI Lab menu for ptu students and easy to use and best quality and help for 6th sem students

  • 1. ST. SOLDIER INSTITUTE OF ENGINEERING & TECHNOLOGY SESSION – (2019-2023) PRACTICAL FILE OF ARTIFICIAL INTELLIGENCE LAB (BTCS 605-18)
  • 2. INDEX SR. NO. AIM PAGE NO. SIGNATURE 1. Write a programme to conduct uninformed and informed search. 1 – 12 2. Write a programme to conduct game search. 13 – 17 3. Write a programme to construct a Bayesian network from given data. 18 – 23 4. Write a programme to infer from the Bayesian network. 24 – 28 5. Write a programme to run value and policy iteration in a grid world. 29 – 33 6. Write a programme to do reinforcement learning in a grid world 34 - 40
  • 3. EXPERIMENT N0. 1 AIM : Write a program to conduct uniformed and informed search. THEORY : Uninformed Search Algorithm Uninformed search is a class of general-purpose search algorithms which operates in brute force-way. Uninformed search algorithms do not have additional information about state or search space other than how to traverse the tree, so it is also called blind search. Following are the various types of uninformed search algorithms: 1.Breadth-first Search 2.Depth-first Search 1. Breadth-first Search: Breadth-first search is the most common search strategy for traversing a tree or graph. This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search. BFS algorithm starts searching from the root node of the tree and expands all successor node at the current level before moving to nodes of next level. The breadth-first search algorithm is an example of a general-graph search algorithm. Breadth-first search implemented using FIFO queue data structure. Code for BFS : #include<bits/stdc++.h> using namespace std;
  • 4. // This class represents a directed graph using // adjacency list representation class Graph { int V; // No. of vertices // Pointer to an array containing adjacency // lists vector<list<int>> adj; public: Graph(int V); // Constructor // function to add an edge to graph void addEdge(int v, int w); // prints BFS traversal from a given source s void BFS(int s); }; Graph::Graph(int V) { this->V = V; adj.resize(V); }
  • 5. void Graph::addEdge(int v, int w) { adj*v+.push_back(w); // Add w to v’s list. } void Graph::BFS(int s) { // Mark all the vertices as not visited vector<bool> visited; visited.resize(V,false); // Create a queue for BFS list<int> queue; // Mark the current node as visited and enqueue it visited[s] = true; queue.push_back(s); while(!queue.empty()) { // Dequeue a vertex from queue and print it s = queue.front(); cout << s << " ";
  • 6. queue.pop_front(); // Get all adjacent vertices of the dequeued // vertex s. If a adjacent has not been visited, // then mark it visited and enqueue it for (auto adjecent: adj[s]) { if (!visited[adjecent]) { visited[adjecent] = true; queue.push_back(adjecent); } } } } // Driver program to test methods of graph class int main() { // Create a graph given in the above diagram Graph g(4);
  • 7. g.addEdge(0, 1); g.addEdge(0, 2); g.addEdge(1, 2); g.addEdge(2, 0); g.addEdge(2, 3); g.addEdge(3, 3); cout << "Following is Breadth First Traversal " << "(starting from vertex 2) n"; g.BFS(2); return 0; } 2. Depth-first Search Depth-first search isa recursive algorithm for traversing a tree or graph data structure. It is called the depth-first search because it starts from the root node and follows each path to its greatest depth node before moving to the next path. DFS uses a stack data structure for its implementation. The process of the DFS algorithm is similar to the BFS algorithm
  • 8. EXPERIMENT NO. 1 AIM: Write a program to conduct uniformed and informed search Output: Following is Breadth First Traversal (starting from vertex 2) 2 0 3 1
  • 9. Code for DFS: #include <bits/stdc++.h> using namespace std; // Graph class represents a directed graph // using adjacency list representation class Graph { public: map<int, bool> visited; map<int, list<int> > adj; // function to add an edge to graph void addEdge(int v, int w); // DFS traversal of the vertices // reachable from v void DFS(int v); }; void Graph::addEdge(int v, int w) { adj*v+.push_back(w); // Add w to v’s list. }
  • 10. void Graph::DFS(int v) { // Mark the current node as visited and // print it visited[v] = true; cout << v << " "; // Recurse for all the vertices adjacent // to this vertex list<int>::iterator i; for (i = adj[v].begin(); i != adj[v].end(); ++i) if (!visited[*i]) DFS(*i); } // Driver code int main() { // Create a graph given in the above diagram Graph g; g.addEdge(0, 1); g.addEdge(0, 2);
  • 11. g.addEdge(1, 2); g.addEdge(2, 0); g.addEdge(2, 3); g.addEdge(3, 3); cout << "Following is Depth First Traversal" " (starting from vertex 2) n"; g.DFS(2); return 0; } Informed Search: Informed Search algorithms have information on the goal state which helps in more efficient searching. This information is obtained by a function that estimates how close a state is to the goal state. In the informed search main algorithm which is given below: 1.A* Search Algorithm A* search is the most commonly known form of best-first search. It uses heuristic function h(n), and cost to reach the node n from the start state g(n). It has combined features of UCS and greedy best-first search, by which it solve the problem efficiently. A* search algorithm finds the shortest path through the search space using the heuristic function. This
  • 12. search algorithm expands less search tree and provides optimal result faster. A* algorithm is similar to UCS except that it uses g(n)+h(n) instead of g(n). Code for A* algorithm #include <iostream> #include "source/AStar.hpp" int main() { AStar::Generator generator; // Set 2d map size. generator.setWorldSize({25, 25}); // You can use a few heuristics : manhattan, euclidean or octagonal. generator.setHeuristic(AStar::Heuristic::euclidean); generator.setDiagonalMovement(true); std::cout << "Generate path ... n"; // This method returns vector of coordinates from target to source. auto path = generator.findPath({0, 0}, {20, 20}); for(auto& coordinate : path) { std::cout << coordinate.x << " " << coordinate.y << "n"; } }
  • 13. OUTPUT OF A* algorithm
  • 14. OUTPUT : Following is Depth First Traversal (starting from vertex 2) 2 0 1 3
  • 15. EXPERIMENT N0. 2 AIM :Write a program to conduct game search. THEORY :Game playing was one of the first tasks undertaken in Artificial Intelligence. Game theory has its history from 1950, almost from the days when computers became programmable. The very first game that is been tackled in AI is chess. Initiators in the field of game theory in AI were Konard Zuse (the inventor of the first programmable computer and the first programming language), Claude Shannon (the inventor of information theory), Norbert Wiener (the creator of modern control theory), and Alan Turing. Since then, there has been a steady progress in the standard of play, to the point that machines have defeated human champions (although not every time) in chess and backgammon, and are competitive in many other games. Types of Game 1. Perfect Information Game: In which player knows all the possible moves of himself and opponent and their results. E.g. Chess.2 2. Imperfect Information Game: In which player does not know all the possible moves of the opponent. E.g. Bridge since all the cards are not visible to player Mini-Max Algorithm in Artificial Intelligence: Mini-max algorithm is a recursive or backtracking algorithm which is used in decision-making and game theory. It provides an optimal move for the
  • 16. player assuming that opponent is also playing optimally. Mini-Max algorithm uses recursion to search through the game-tree. Min-Max algorithm is mostly used for game playing in AI. Such as Chess, Checkers, tic-tac-toe, go, and various tow-players game. This Algorithm computes the minimax decision for the current state .In this algorithm two players play the game, one is called MAX and other is called MIN. Both the players fight it as the opponent player gets the minimum benefit while they get the maximum benefit. Both Players of the game are opponent of each other, where MAX will select the maximized value and MIN will select the minimized value. The minimax algorithm performs a depth-first search algorithm for the exploration of the complete game tree. The minimax algorithm proceeds all the way down to the terminal node of the tree, then backtrack the tree as the recursion. Code for minmax algorithm : // A simple C++ program to find // maximum score that // maximizing player can get. #include<bits/stdc++.h> using namespace std; // Returns the optimal value a maximizer can obtain. // depth is current depth in game tree. // nodeIndex is index of current node in scores[]. // isMax is true if current move is // of maximizer, else false // scores[] stores leaves of Game tree. // h is maximum height of Game tree int minimax(int depth, int nodeIndex, bool isMax, int scores[], int h) { // Terminating condition. i.e
  • 17. // leaf node is reached if (depth == h) return scores[nodeIndex]; // If current move is maximizer, // find the maximum attainable // value if (isMax) return max(minimax(depth+1, nodeIndex*2, false, scores, h), minimax(depth+1, nodeIndex*2 + 1, false, scores, h)); // Else (If current move is Minimizer), find the minimum // attainable value else return min(minimax(depth+1, nodeIndex*2, true, scores, h), minimax(depth+1, nodeIndex*2 + 1, true, scores, h)); } // A utility function to find Log n in base 2 int log2(int n) { return (n==1)? 0 : 1 + log2(n/2); } // Driver code int main()
  • 18. { // The number of elements in scores must be // a power of 2. int scores[] = {3, 5, 2, 9, 12, 5, 23, 23}; int n = sizeof(scores)/sizeof(scores[0]); int h = log2(n); int res = minimax(0, 0, true, scores, h); cout << "The optimal value is : " << res << endl; return 0; }
EXPERIMENT NO. 2
AIM : Write a program to conduct game search
Output:
The optimal value is : 12
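The value 12 can be checked by tracing the recursion by hand on the leaf scores {3, 5, 2, 9, 12, 5, 23, 23} (tree height h = 3, the maximizer moving at the root):
Depth 2 (MAX): max(3, 5) = 5, max(2, 9) = 9, max(12, 5) = 12, max(23, 23) = 23
Depth 1 (MIN): min(5, 9) = 5, min(12, 23) = 12
Depth 0 (MAX): max(5, 12) = 12
which matches the printed result.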
EXPERIMENT NO. 3
AIM : Write a program to construct a Bayesian network from given data
THEORY :
A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional dependency, and each node corresponds to a unique random variable. A Bayesian network consists of two major parts: a directed acyclic graph and a set of conditional probability distributions.
1. The directed acyclic graph is a set of random variables represented by nodes.
2. The conditional probability distribution of a node (random variable) is defined for every possible outcome of the preceding causal node(s).
For illustration, consider the following example. Suppose we attempt to turn on our computer, but the computer does not start (observation/evidence). We would like to know which of the possible causes of computer failure is more likely. In this simplified illustration, we assume only two possible causes of this misfortune: electricity failure and computer malfunction. In the corresponding directed acyclic graph, the two cause nodes, Electricity failure and Computer malfunction, both point to the evidence node Computer failure. The goal is to calculate the posterior conditional probability distribution of each of the possible unobserved causes given the observed evidence, i.e. P[Cause | Evidence].
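Before moving on to the heart-disease data, this toy network can be written down directly with the same pgmpy classes used in these experiments. This is only an illustrative sketch: the probability values assumed here (0.1 for electricity failure, 0.05 for computer malfunction, and failure whenever either cause is present) are made up for the example and are not taken from any dataset.
# Toy Bayesian network for the computer-failure example (illustrative values only)
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([('Electricity', 'Failure'),
                         ('Malfunction', 'Failure')])

cpd_elec = TabularCPD('Electricity', 2, [[0.9], [0.1]])    # 0 = ok, 1 = failed
cpd_malf = TabularCPD('Malfunction', 2, [[0.95], [0.05]])  # 0 = ok, 1 = broken
# P(Failure | Electricity, Malfunction): the computer fails if either cause is present
cpd_fail = TabularCPD('Failure', 2,
                      [[1.0, 0.0, 0.0, 0.0],   # Failure = 0
                       [0.0, 1.0, 1.0, 1.0]],  # Failure = 1
                      evidence=['Electricity', 'Malfunction'],
                      evidence_card=[2, 2])

model.add_cpds(cpd_elec, cpd_malf, cpd_fail)
model.check_model()

infer = VariableElimination(model)
# P[Cause | Evidence]: which cause is more likely, given that the computer failed?
print(infer.query(['Electricity'], evidence={'Failure': 1}))
print(infer.query(['Malfunction'], evidence={'Failure': 1}))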
Data Set:
Title: Heart Disease Databases
The Cleveland database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "Heartdisease" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4.
Database:    0    1    2    3    4   Total
Cleveland:  164   55   36   35   13   303
Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic)
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results (0 = normal; 1 = ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV); 2 = probable or definite left ventricular hypertrophy by Estes' criteria)
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment (1 = upsloping; 2 = flat; 3 = downsloping)
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. Heartdisease: diagnosis of heart disease (angiographic disease status); integer valued from 0 (no presence) to 4.
Some instances from the dataset:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal Heartdisease
63   1   1   145    233  1    2      150     0     2.3     3     0   6    0
67   1   4   160    286  0    2      108     1     1.5     2     3   3    2
67   1   4   120    229  0    2      129     1     2.6     2     2   7    1
41   0   2   130    204  0    2      172     0     1.4     1     0   3    0
62   0   4   140    268  0    2      160     0     3.6     3     2   3    3
60   1   4   130    206  0    2      132     1     2.4     2     2   7    4
Code :
import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# read Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# display the data
print('Few examples from the dataset are given below')
print(heartDisease.head())

# Model the Bayesian Network
model = BayesianModel([('age','trestbps'), ('age','fbs'),
                       ('sex','trestbps'), ('exang','trestbps'),
                       ('trestbps','heartdisease'), ('fbs','heartdisease'),
                       ('heartdisease','restecg'), ('heartdisease','thalach'),
                       ('heartdisease','chol')])

# Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

# computing the probability of heartdisease given age
print('\n 1. Probability of HeartDisease given age=28')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 28})
# Note: older pgmpy versions return a dict indexed by variable name;
# recent versions return a factor, so print(q) may be needed instead.
print(q['heartdisease'])

# computing the probability of heartdisease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q['heartdisease'])
EXPERIMENT NO. 3
AIM : Write a program to construct a Bayesian network from given data
Output:
Few examples from the dataset are given below
   age  sex  cp  trestbps  ...  slope  ca  thal  heartdisease
0   63    1   1       145  ...      3   0     6             0
1   67    1   4       160  ...      2   3     3             2
2   67    1   4       120  ...      2   2     7             1
3   37    1   3       130  ...      3   0     3             0
4   41    0   2       130  ...      1   0     3             0
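Each of the two queries then prints the conditional distribution P(heartdisease | evidence) as a table of probabilities over the five heartdisease levels (0-4); the exact values depend on the CPDs learned from heart.csv, so they are not reproduced here.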
EXPERIMENT NO. 4
AIM : Write a program to infer from the Bayesian network.
THEORY :
A Bayesian network is a probability model built on an acyclic directed graph. It is factored by using a single conditional probability distribution for each variable in the model, whose distribution is conditioned on the parents in the graph. Bayesian models rest on the simple rules of probability, so let us first define conditional probability and the joint probability distribution.
Conditional Probability
Conditional probability is a measure of the likelihood of an event occurring provided that another event has already occurred (through assumption, supposition, statement, or evidence). If A is the event of interest and B is known or assumed to have occurred, the conditional probability of A given B is generally written P(A|B) or, less frequently, PB(A). It can also be expressed as the fraction of the probability of B that intersects with A:
P(A|B) = P(A ∩ B) / P(B)
Joint Probability
The probability of two (or more) events occurring together is known as the joint probability. The probability distribution over two or more random variables is the joint probability distribution. For example, the joint probability of events A and B is written formally as P(A and B).
The "and" (conjunction) is denoted with the upside-down capital "U" operator "∧" or, in some situations, a comma ",":
P(A ∧ B) = P(A, B)
The joint probability of events A and B is calculated as the probability of event A given event B multiplied by the probability of event B:
P(A, B) = P(A | B) × P(B)
Posterior Probability
In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is its conditional probability given the relevant data or background. "Posterior" here means "after taking into account the evidence relevant to the particular question under consideration." The posterior probability distribution is the probability distribution of an unknown quantity, treated as a random variable, conditional on the data obtained from an experiment or survey.
Inferencing with Bayesian Network
In this demonstration, we will use a Bayesian network to solve the well-known Monty Hall problem. For those unfamiliar with it: a contestant must choose one of three doors, one of which conceals a prize. After the contestant has chosen a door, the show's host (Monty) opens an empty door and asks the contestant whether he wants to switch to the remaining door. The decision is whether to keep the current door or switch to the other one. It is preferable to switch to the other door, because the probability of
winning the prize is then higher. To resolve this ambiguity, let us model the problem with a Bayesian network.
CODE :
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
import networkx as nx
import pylab as plt

# Defining the Bayesian structure
model = BayesianNetwork([('Guest', 'Host'), ('Price', 'Host')])

# Defining the CPDs:
cpd_guest = TabularCPD('Guest', 3, [[0.33], [0.33], [0.33]])
cpd_price = TabularCPD('Price', 3, [[0.33], [0.33], [0.33]])
cpd_host = TabularCPD('Host', 3,
                      [[0, 0, 0, 0, 0.5, 1, 0, 1, 0.5],
                       [0.5, 0, 1, 0, 0, 0, 1, 0, 0.5],
                       [0.5, 1, 0, 1, 0.5, 0, 0, 0, 0]],
                      evidence=['Guest', 'Price'], evidence_card=[3, 3])

# Associating the CPDs with the network structure
model.add_cpds(cpd_guest, cpd_price, cpd_host)
model.check_model()

# Inferring the posterior probability
from pgmpy.inference import VariableElimination
infer = VariableElimination(model)
posterior_p = infer.query(['Host'], evidence={'Guest': 2, 'Price': 2})
print(posterior_p)
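With this evidence (the guest chose door 2 and the prize is behind door 2), the CPD above implies the host is equally likely to open door 0 or door 1 and can never open door 2, so the printed posterior should read P(Host=0) = 0.5, P(Host=1) = 0.5, P(Host=2) = 0; the exact formatting of the factor depends on the pgmpy version.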
EXPERIMENT NO. 4
AIM : Write a program to infer from the Bayesian network
OUTPUT :
EXPERIMENT NO. 5
AIM : Write a program to run value and policy iteration in a grid world
THEORY :
Value Iteration
With the tools we have explored until now, a new question arises: why do we need to consider an initial policy at all? The idea of the value iteration algorithm is that we can compute the value function without a policy. Instead of letting the policy π dictate which actions are selected, we select the actions that maximize the expected reward:
V(s) = max_a Σ_{s′} P(s′ | s, a) [ R(s, a, s′) + γ V(s′) ]
CODE FOR VALUE ITERATION :
def valueIteration(self, gridWorld, gamma = 1):
    self.resetPolicy()  # ensure empty policy before calling evaluatePolicy
    V_old = None
    V_new = np.repeat(0, gridWorld.size())
    convergedCellIndices = np.zeros(0)
    while len(convergedCellIndices) != len(V_new):
        V_old = V_new
        V_new = self.evaluatePolicySweep(gridWorld, V_old, gamma, convergedCellIndices)
        convergedCellIndices = self.findConvergedCells(V_old, V_new)
    greedyPolicy = findGreedyPolicy(V_new, gridWorld, self.gameLogic)
    self.setPolicy(greedyPolicy)
    self.setWidth(gridWorld.getWidth())
    self.setHeight(gridWorld.getHeight())
    return(V_new)
POLICY ITERATION :
A simple strategy for this is a greedy algorithm that iterates over all the cells in the grid and then chooses the action that maximizes the expected reward according to the value function. This approach implicitly determines the action-value function, which is defined as
Qπ(s, a) = Σ_{s′} P(s′ | s, a) [ R(s, a, s′) + γ Vπ(s′) ]
The improvePolicy function determines the value function of a policy (if it is not available yet) and then calls findGreedyPolicy to identify the optimal action for every state:
def improvePolicy(policy, gridWorld, gamma = 1):
    policy = copy.deepcopy(policy)  # don't modify the old policy
    if len(policy.values) == 0:
        # policy needs to be evaluated first
        policy.evaluatePolicy(gridWorld)
    greedyPolicy = findGreedyPolicy(policy.getValues(), gridWorld,
                                    policy.gameLogic, gamma)
    policy.setPolicy(greedyPolicy)
    return policy

def findGreedyPolicy(values, gridWorld, gameLogic, gamma = 1):
    # create a greedy policy based on the values param
    stateGen = StateGenerator()
    greedyPolicy = [Action(Actions.NONE)] * len(values)
    for (i, cell) in enumerate(gridWorld.getCells()):
        gridWorld.setActor(cell)
        if not cell.canBeEntered():
            continue
        maxPair = (Actions.NONE, -np.inf)
        for actionType in Actions:
            if actionType == Actions.NONE:
                continue
            proposedCell = gridWorld.proposeMove(actionType)
            if proposedCell is None:
                # action is nonsensical in this state
                continue
            Q = 0.0  # action-value function
            proposedStates = stateGen.generateState(gridWorld, actionType, cell)
            for proposedState in proposedStates:
                actorPos = proposedState.getIndex()
                transitionProb = gameLogic.getTransitionProbability(cell, proposedState, actionType, gridWorld)
                reward = gameLogic.R(cell, proposedState, actionType)
                expectedValue = transitionProb * (reward + gamma * values[actorPos])
                Q += expectedValue
            if Q > maxPair[1]:
                maxPair = (actionType, Q)
        gridWorld.unsetActor(cell)  # reset state
        greedyPolicy[i] = Action(maxPair[0])
    return greedyPolicy
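The functions above are excerpts from a larger grid-world class (StateGenerator, Actions, gameLogic and the gridWorld object are defined elsewhere), so they are not runnable on their own. Below is a minimal, self-contained sketch of value iteration; the 4x4 grid, the two terminal corner cells, the reward of -1 per move and gamma = 1 are assumptions chosen here to match the grid world used in Experiment 6.
import numpy as np

# Minimal self-contained value iteration on a small deterministic grid world.
grid_size = 4
gamma = 1.0
theta = 1e-4
terminals = [(0, 0), (grid_size - 1, grid_size - 1)]
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    # Returns (next_state, reward); moves leaving the grid keep the agent in place.
    if state in terminals:
        return state, 0.0
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < grid_size and 0 <= nxt[1] < grid_size):
        nxt = state
    return nxt, -1.0

V = np.zeros((grid_size, grid_size))
while True:
    delta = 0.0
    for i in range(grid_size):
        for j in range(grid_size):
            if (i, j) in terminals:
                continue
            # Bellman optimality update: V(s) = max_a [ R(s, a) + gamma * V(s') ]
            best = max(reward + gamma * V[nxt]
                       for nxt, reward in (step((i, j), a) for a in actions))
            delta = max(delta, abs(best - V[i, j]))
            V[i, j] = best
    if delta < theta:
        break

print(V)  # optimal values: minus the number of steps to the nearest terminal corner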
EXPERIMENT NO. 5
AIM : Write a program to run value and policy iteration in a grid world.
OUTPUT :
EXPERIMENT NO. 6
AIM : Write a program to do reinforcement learning in a grid world.
THEORY :
Reinforcement Learning (RL) involves decision making under uncertainty and tries to maximize the return over successive states. There are four main elements of a Reinforcement Learning system: a policy, a reward signal, a value function and, optionally, a model of the environment. The policy is a mapping from states to actions, or a probability distribution over actions. Every action the agent takes results in a numerical reward. The agent's sole purpose is to maximize the reward in the long run. The reward indicates the immediate return; a value function specifies the return in the long run. The value of a state is the total expected reward that an agent can accrue starting from that state.
The agent takes an action At in state St, moves to state St+1 and receives a reward Rt+1.
An agent will seek to maximize the overall return as it transitions across states. The expected return can be expressed as
Gt = Rt+1 + γ Gt+1
where Gt is the expected return at time t and γ Gt+1 is the discounted expected return from time t+1 onwards.
A policy is a mapping from states to probabilities of selecting each possible action. If the agent is following policy π at time t, then π(a|s) is the probability that At = a if St = s. The value function of a state s under a policy π, denoted vπ(s), is the expected return when starting in s and following π thereafter. This can be written as
vπ(s) = Eπ[ Gt | St = s ]
Similarly, the action value function gives the expected return when taking an action a in state s:
qπ(s, a) = Eπ[ Gt | St = s, At = a ]
The Bellman equation for the state value function gives one equation for each state:
vπ(s) = Σ_a π(a|s) Σ_{s′,r} p(s′, r | s, a) [ r + γ vπ(s′) ]
The Bellman optimality equations give the optimal policy of choosing specific actions in specific states to achieve the maximum reward and reach the goal efficiently:
v*(s) = max_a Σ_{s′,r} p(s′, r | s, a) [ r + γ v*(s′) ]
The Bellman equations cannot be solved directly in goal-directed problems, so dynamic programming is used instead, where the value functions are computed iteratively.
In the problem below, the grid world has 2 end states, one in each of two opposite corners. There are four possible actions in each state: up, down, right and left. If an action in a state takes the agent out of the grid, then the agent remains in the same state. All actions have a reward of -1, while the end states have a reward of 0; a worked example of one Bellman update for this setting is shown below.
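As a concrete illustration of the iterative update (a hand computation, not part of the original code): with the equiprobable random policy π(a|s) = 1/4, γ = 1, and all values initialised to 0, the first sweep of the Bellman update for any non-terminal state gives
v(s) = Σ_a (1/4) [ -1 + 1 × 0 ] = 4 × (1/4) × (-1) = -1
and repeated sweeps propagate these values outwards from the terminal states until they converge to the value map printed by the code below.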
The reward for any transition is Rt = -1, except transitions into the end states at the corners, which have a reward of 0. The policy is a uniform random policy with all actions equiprobable, each with probability 1/4 (0.25).
1. Gridworld-1
In [1]:
import numpy as np
import random
In [2]:
gamma = 1  # discounting rate
gridSize = 4
rewardValue = -1
terminationStates = [[0,0], [gridSize-1, gridSize-1]]
actions = [[-1, 0], [1, 0], [0, 1], [0, -1]]
numIterations = 1000
The actionValue function provides the next state for a given action in a state, together with the accrued reward.
In [3]:
def actionValue(initialPosition, action):
    if initialPosition in terminationStates:
        finalPosition = initialPosition
        reward = 0
    else:
        # Compute final position
        finalPosition = np.array(initialPosition) + np.array(action)
        reward = rewardValue
    # If the action moves the finalPosition out of the grid, stay in the same cell
    if -1 in finalPosition or gridSize in finalPosition:
        finalPosition = initialPosition
        reward = rewardValue
    return finalPosition, reward
1a. Bellman Update
In [4]:
# Initialize valueMap and valueMap1
valueMap = np.zeros((gridSize, gridSize))
valueMap1 = np.zeros((gridSize, gridSize))
states = [[i, j] for i in range(gridSize) for j in range(gridSize)]
In [5]:
def policy_evaluation(numIterations, gamma, theta, valueMap):
    for i in range(numIterations):
        delta = 0
        for state in states:
            weightedRewards = 0
            for action in actions:
                finalPosition, reward = actionValue(state, action)
                weightedRewards += 1/4 * (reward + gamma * valueMap[finalPosition[0], finalPosition[1]])
            valueMap1[state[0], state[1]] = weightedRewards
            delta = max(delta, abs(weightedRewards - valueMap[state[0], state[1]]))
        valueMap = np.copy(valueMap1)
        if(delta < 0.01):
            print(valueMap)
            break
In [6]:
valueMap = np.zeros((gridSize, gridSize))
valueMap1 = np.zeros((gridSize, gridSize))
states = [[i, j] for i in range(gridSize) for j in range(gridSize)]
policy_evaluation(1000, 1, 0.001, valueMap)
[[  0.         -13.89528403 -19.84482978 -21.82635535]
 [-13.89528403 -17.86330422 -19.84586777 -19.84482978]
 [-19.84482978 -19.84586777 -17.86330422 -13.89528403]
 [-21.82635535 -19.84482978 -13.89528403   0.        ]]
In [7]:
valueMap = np.zeros((gridSize, gridSize))
valueMap1 = np.zeros((gridSize, gridSize))
states = [[i, j] for i in range(gridSize) for j in range(gridSize)]
pi = np.ones((gridSize, gridSize))/4
pi1 = np.chararray((gridSize, gridSize))
pi1[:] = 'a'
In [8]:
# Compute the value state function for the grid
def policy_evaluate(states, actions, gamma, valueMap):
    for state in states:
        weightedRewards = 0
        for action in actions:
            finalPosition, reward = actionValue(state, action)
            weightedRewards += 1/4 * (reward + gamma * valueMap[finalPosition[0], finalPosition[1]])
        # Set the computed weighted rewards to valueMap1
        valueMap1[state[0], state[1]] = weightedRewards
    # Copy to the original valueMap
    valueMap = np.copy(valueMap1)
    return(valueMap)
In [9]:
def argmax(q_values):
    idx = np.argmax(q_values)
    # choose randomly among actions whose q_value equals the maximum
    return(np.random.choice(np.where(q_values == q_values[idx])[0].tolist()))

# Compute the best action in each state
def greedify_policy(state, pi, pi1, gamma, valueMap):
    q_values = np.zeros(len(actions))
    for idx, action in enumerate(actions):
        finalPosition, reward = actionValue(state, action)
        q_values[idx] += 1/4 * (reward + gamma * valueMap[finalPosition[0], finalPosition[1]])
    # Find the index of the action for which the q_value is maximum
    idx = q_values.argmax()
    pi[state[0], state[1]] = idx
    if(idx == 0):
        pi1[state[0], state[1]] = 'u'
    elif(idx == 1):
        pi1[state[0], state[1]] = 'd'
    elif(idx == 2):
        pi1[state[0], state[1]] = 'r'
    elif(idx == 3):
        pi1[state[0], state[1]] = 'l'
In [10]:
def improve_policy(pi, pi1, gamma, valueMap):
    policy_stable = True
    for state in states:
        old = pi[state].copy()
        # Greedify policy for state
        greedify_policy(state, pi, pi1, gamma, valueMap)
        if not np.array_equal(pi[state], old):
            policy_stable = False
    print(pi)
    print(pi1)
    return pi, pi1, policy_stable
In [11]:
def policy_iteration(gamma, theta):
    valueMap = np.zeros((gridSize, gridSize))
    pi = np.ones((gridSize, gridSize))/4
    pi1 = np.chararray((gridSize, gridSize))
    pi1[:] = 'a'
    policy_stable = False
    while not policy_stable:
        valueMap = policy_evaluate(states, actions, gamma, valueMap)
        pi, pi1, policy_stable = improve_policy(pi, pi1, gamma, valueMap)
    return valueMap, pi, pi1
In [12]:
theta = 0.1
valueMap, pi, pi1 = policy_iteration(gamma, theta)
[[0. 3. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 1.]
 [0. 0. 2. 0.]]
[[b'u' b'l' b'u' b'u']
 [b'u' b'u' b'u' b'u']
 [b'u' b'u' b'u' b'd']
 [b'u' b'u' b'r' b'u']]
[[0. 3. 3. 0.]
 [0. 0. 0. 1.]
 [0. 0. 1. 1.]
 [0. 2. 2. 0.]]
[[b'u' b'l' b'l' b'u']
 [b'u' b'u' b'u' b'd']
 [b'u' b'u' b'd' b'd']
 [b'u' b'r' b'r' b'u']]
[[0. 3. 3. 1.]
 [0. 0. 1. 1.]
 [0. 0. 1. 1.]
 [0. 2. 2. 0.]]
[[b'u' b'l' b'l' b'd']
 [b'u' b'u' b'd' b'd']
 [b'u' b'u' b'd' b'd']
 [b'u' b'r' b'r' b'u']]
[[0. 3. 3. 1.]
 [0. 0. 1. 1.]
 [0. 0. 1. 1.]
 [0. 2. 2. 0.]]
[[b'u' b'l' b'l' b'd']
 [b'u' b'u' b'd' b'd']
 [b'u' b'u' b'd' b'd']
 [b'u' b'r' b'r' b'u']]
EXPERIMENT NO. 6
AIM : Write a program to do reinforcement learning in a grid world
Output:
The final valueMap and the greedy policy pi1 show the optimal behaviour from any state: each non-terminal cell's letter ('u', 'd', 'l' or 'r') points one step along a shortest path towards the nearest terminal corner.