LEARNING IN AI
Prof. Mrs. Minakshi P. Atre, PVGCOET, SPPU
Basic Learning Model
 Learning agent’s components
 learning element -- the part of the agent responsible for
improving its performance
 performance element -- the part that chooses the actions
to take
 critic -- tells the learning element how the agent is doing
 problem generator -- suggests actions that could lead to
new, informative experiences (suboptimal from the point of
view of the performance element, but designed to improve
that element)
Issues in designing learning
system
 components -- which parts of the
performance element are to be improved
 representation of those components
 feedback available to the system
 prior information available to the system
All learning can be thought of as
learning the representation of a
function.
Types of Learning
 Speed-up learning
 Learning by taking advice
 Learning from example
 Clustering
 Learning by analogy
 Discovery
1. Speed up learning
 A type of deductive learning that requires no
additional input, but improves the agent's
performance over time. There are two kinds: rote learning and generalization (e.g., EBL). Data caching is an example of rote learning in practice.
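As a small illustration of rote speed-up learning, the Python sketch below caches every result of a hypothetical expensive computation so that repeated queries are answered from the cache rather than recomputed; the function and workload are invented for the example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)           # rote learning: cache every solved case
def solve(n: int) -> int:
    """Hypothetical expensive computation (naive Fibonacci as a stand-in)."""
    if n < 2:
        return n
    return solve(n - 1) + solve(n - 2)

print(solve(30))                   # first call does the work
print(solve.cache_info())          # later calls reuse the stored results
```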
2. Learning by taking advice
 Deductive learning in which the system can
reason about new information added to its
knowledge base.
 McCarthy proposed the "advice taker" as such a system; TEIRESIAS [Davis, 1976] was the first implemented system of this kind.
3. Learning from example
 Inductive learning in which concepts are
learned from sets of labeled instances.
4. Clustering
 Unsupervised, inductive learning in which
"natural classes" are found for data instances,
as well as ways of classifying them.
 Examples include COBWEB, AUTOCLASS.
5. Learning by Analogy
 Inductive learning in which a system transfers knowledge from one domain to a different domain.
6. Discovery
 Both inductive and deductive learning in which
an agent learns without help from a teacher.
 It is deductive if it proves theorems and
discovers concepts about those theorems;
 it is inductive when it raises conjectures.
What is Inductive Learning?
 Inductive learning is a kind of learning in which, given a set of examples, an agent tries to estimate or create an evaluation function.
 Most inductive learning is supervised learning, in which examples are provided with classifications. (The alternative is clustering.)
 More formally, an example is a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x.
 The task of pure inductive inference (or induction) is, given a collection of examples of f, to return a hypothesis h that approximates f.
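A minimal sketch of inductive learning as hypothesis fitting, assuming (purely for illustration) that f is real-valued and the hypothesis space is straight lines h(x) = w*x + b; the example pairs are invented.

```python
# Inductive learning: given examples (x, f(x)), return a hypothesis h
# that approximates f.  Hypothesis space here (assumed): straight lines.

def fit_line(examples):
    """Least-squares fit of h(x) = w*x + b to a list of (x, y) pairs."""
    n = len(examples)
    sx = sum(x for x, _ in examples)
    sy = sum(y for _, y in examples)
    sxx = sum(x * x for x, _ in examples)
    sxy = sum(x * y for x, y in examples)
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - w * sx) / n
    return lambda x: w * x + b

examples = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]   # noisy samples of f(x) ~ 2x + 1
h = fit_line(examples)
print(round(h(4), 2))                                  # prediction for an unseen input
```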
Bayesian Learning in Belief
Networks
 Bayesian learning maintains a number of hypotheses about the data, each one weighted by its posterior probability when a prediction is made
 The idea is that, rather than keeping only one
hypothesis, many are entertained, and
weighted based on their likelihoods.
 maintaining and reasoning with a large number of
hypotheses can be intractable
 the most common approximation is to use a single most probable hypothesis, that is, the Hi in H that maximizes P(Hi | D), where D is the data
 This is often called the maximum a posteriori (MAP) hypothesis H_MAP:
 P(X | D) ≈ P(X | H_MAP) × P(H_MAP | D)
To find H_MAP, we apply Bayes' rule:
 P(Hi | D) = [P(D | Hi) × P(Hi)] / P(D)
 Since P(D) is fixed across the hypotheses, we
only need to maximize the numerator
 The first term represents the probability that this
particular data set would be seen, given Hi as the
model of the world
 The second is the prior probability assigned to the
model.
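As a small illustrative sketch of MAP selection, the snippet below scores each hypothesis by P(D | Hi) × P(Hi) and picks the maximizer; the hypothesis names, priors, and likelihoods are placeholders, not values from the slides.

```python
# Choosing H_MAP: maximize P(D | Hi) * P(Hi); the denominator P(D) is
# common to all hypotheses and can be ignored.

priors      = {"H1": 0.5, "H2": 0.3, "H3": 0.2}        # P(Hi)   (illustrative)
likelihoods = {"H1": 0.02, "H2": 0.10, "H3": 0.05}     # P(D|Hi) (illustrative)

scores = {h: likelihoods[h] * priors[h] for h in priors}
h_map = max(scores, key=scores.get)

z = sum(scores.values())                               # normalize for inspection
posteriors = {h: round(s / z, 3) for h, s in scores.items()}
print(h_map, posteriors)
```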
Belief Network Learning
Problems
 Four kinds of belief network learning problems
 depending upon whether the structure of the network is known or unknown,
 and whether the variables in the network are observable or hidden
Belief Networks
1. known structure, fully observable -- In this case the only learnable part is the conditional probability tables. These can be estimated directly from the statistics of the sample data set (a counting sketch follows this list).
2. unknown structure, fully observable -- Here the
problem is to reconstruct the network topology. The
problem can be thought of as a search through
structure space, and fitting data to each structure
reduces to the fixed-structure problem, so the MAP
or ML probability value can be used as a heuristic in
hill-climbing or simulated annealing (SA) search.
3. known structure, hidden variables -- This is analogous to neural network learning.
4. unknown structure, hidden variables -- When some variables are unobservable, it becomes difficult to apply the prior techniques for recovering structure, since they require averaging over all possible values of the unknown variables. No good general algorithms are known for handling this case.
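For case 1 above (known structure, fully observable), a minimal counting sketch of estimating a conditional probability table from the sample statistics; the variable names and tiny data set are hypothetical.

```python
from collections import Counter

data = [  # each record fully specifies every variable (illustrative)
    {"Rain": True,  "Sprinkler": False, "WetGrass": True},
    {"Rain": True,  "Sprinkler": False, "WetGrass": True},
    {"Rain": False, "Sprinkler": True,  "WetGrass": True},
    {"Rain": False, "Sprinkler": False, "WetGrass": False},
]

def estimate_cpt(data, child, parents):
    """Return P(child=True | parent assignment) as relative frequencies."""
    joint, marginal = Counter(), Counter()
    for row in data:
        key = tuple(row[p] for p in parents)
        marginal[key] += 1
        if row[child]:
            joint[key] += 1
    return {key: joint[key] / marginal[key] for key in marginal}

print(estimate_cpt(data, "WetGrass", ["Rain", "Sprinkler"]))
```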
Comparison between NN and Belief
Networks
 Similarities
 Both kinds of network are attribute-based
representations
 Both can handle either discrete or continuous
output
Differences between NN and Belief N/w
 NN: neural networks are distributed representations; nodes generally don't represent specific propositions, and the calculations do not treat them in a semantically meaningful way
 Belief N/W: belief networks are localized representations; belief network nodes represent propositions with clearly defined semantics and relationships to other nodes
 NN: the effect is that human beings can neither construct nor understand neural network representations
 Belief N/W: both can be done with belief networks
 NN: neural network outputs can be values or probabilities, but they cannot handle both simultaneously
 Belief N/W: belief networks handle two kinds of activation, both in terms of the values a proposition may take and the probabilities assigned to each
 NN: inference in a trained feed-forward neural network can execute in linear time, but a neural network may have to be exponentially larger to represent the same things that a belief network can
 Belief N/W: in belief networks, inference is NP-hard
As for learning, belief networks have the advantages of
 being easier to supply with prior knowledge;
 and, since they represent propositions locally, possibly converging more easily,
 because each node is directly affected only by a small number of other propositions.
Reinforcement Learning
What is reinforcement learning?
 As opposed to supervised learning,
reinforcement learning takes place in an
environment where the agent cannot directly
compare the results of its action to a desired
result
Reinforcement learning
 it is given some reward or punishment that
relates to its actions
 It may win or lose a game, or be told it has
made a good move or a poor one
 the job of reinforcement learning is to use these rewards to learn a successful agent function
Where Reinforcement Learning (RL) Fits
Block Schematic and example of
RL
Supervised vs
Reinforcement Learning
 Supervised learning has an external supervisor
 the supervisor has knowledge of the environment and shares it with the agent to complete the task
 in some problems, however, there are so many combinations of subtasks the agent could perform to achieve the objective
 that creating a “supervisor” is almost impractical
Example
 in a chess game, there are tens of thousands of moves that can be played
 creating a knowledge base that covers all of them is a tedious task
 In these problems, it is more feasible to learn from one’s own experiences and gain knowledge from them
 This is the main difference between reinforcement learning and supervised learning.
 In both supervised and reinforcement learning, there is a mapping between input and output.
 But in reinforcement learning, there is a reward function which acts as feedback to the agent, as opposed to a supervisor telling the agent the correct output for each input.
Unsupervised vs Reinforcement
Learning:
 In reinforcement learning, there’s a mapping from input to output, which is not present in unsupervised learning
 In unsupervised learning, the main task is to find the underlying patterns rather than the mapping
Example
 if the task is to suggest a news article to a user, an unsupervised learning algorithm will look at similar articles which the person has previously read and suggest one of them.
 Whereas a reinforcement learning algorithm will get constant feedback from the user by suggesting a few news articles and then build a “knowledge graph” of which articles the person will like
Summarizing Reinforcement
Learning
 The reason reinforcement learning is harder
than supervised learning is that the agent is
never told what the right action is, only
whether it is doing well or poorly, and in some
cases (such as chess) it may only receive
feedback after a long string of actions
Two basic kinds of information an
agent can try to learn in RL
 utility function -- The agent learns the utility of
being in various states, and chooses actions to
maximize the expected utility of their outcomes.
This requires the agent keep a model of the
environment
 action-value -- The agent learns an action-value
function giving the expected utility of performing
an action in a given state. This is called Q-
learning. This is the model-free approach.
Passive Learning in a known
environment
 Def:
 Assuming an environment consisting of a set
of states, some terminal and some non-
terminal, and a model that specifies the
probabilities of transition from state to state, an
agent learns passively by observing a set of
training sequences, which consist of a set of
state transitions followed by a reward
 The goal is to use the reward information to
learn the expected utility of each of the non-
terminal states.
 An important simplifying assumption is
that the utility of a sequence is the sum of
the rewards accumulated in the states of
the sequence.
 That is, the utility function is additive
 A passive learning agent keeps an estimate U
of the utility of each state, a table N of how
many times each state was seen, and a table
M of transition probabilities.
 There are a variety of ways the agent can
update its table U
Three approaches to passive learning in a known environment
 Naïve updating
 Adaptive dynamic programming
 Temporal difference learning
1. Naive Updating
 One simple updating method is the least mean
squares (LMS) approach [Widrow and Hoff,
1960].
 It assumes that the observed reward-to-go of a
state in a sequence provides direct evidence
of the actual reward-to-go.
 The approach is simply to keep each state's utility as a running average of its observed rewards-to-go, based upon the number of times the state has been seen
 This approach minimizes the mean square
error with respect to the observed data
 This approach converges very slowly, because
it ignores the fact that the actual utility of a
state is the probability-weighted average of
its successors' utilities, plus its own
reward. LMS disregards these probabilities.
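A minimal sketch of naive (LMS) updating under the additive-utility assumption; the states and rewards are illustrative.

```python
from collections import defaultdict

U = defaultdict(float)   # utility estimates
N = defaultdict(int)     # visit counts

def lms_update(sequence):
    """sequence: list of (state, reward) pairs from one training run."""
    reward_to_go = 0.0
    for state, reward in reversed(sequence):
        reward_to_go += reward                            # observed reward-to-go
        N[state] += 1
        U[state] += (reward_to_go - U[state]) / N[state]  # running average

lms_update([("A", -0.04), ("B", -0.04), ("C", 1.0)])
print(dict(U))    # e.g. {'C': 1.0, 'B': 0.96, 'A': 0.92}
```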
2. Adaptive Dynamic Programming
 If the transition probabilities and the rewards of
the states are known (which will usually
happen after a reasonably small set of training
examples), then the actual utilities can be
computed directly as
 U(i) = R(i) + Σ_j M_ij U(j)
where U(i) is the utility of state i, R(i) is its reward, and M_ij is the probability of transition from state i to state j
 This is identical to a single value determination in
the policy iteration algorithm for Markov decision
processes.
 Adaptive dynamic programming is any kind of
reinforcement learning method that works by
solving the utility equations using a dynamic
programming algorithm.
 It is exact, but of course highly inefficient in large
state spaces
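A minimal sketch of the value-determination step, solving U(i) = R(i) + Σ_j M_ij U(j) by simple iteration; the transition model and rewards are invented for illustration.

```python
R = {"A": -0.04, "B": -0.04, "C": 1.0}     # state rewards (illustrative)
M = {                                      # M[i][j] = P(next=j | current=i)
    "A": {"A": 0.1, "B": 0.9},
    "B": {"A": 0.1, "C": 0.9},
    "C": {},                               # terminal: no outgoing transitions
}

U = {s: 0.0 for s in R}
for _ in range(1000):                      # iterate to an (approximate) fixed point
    U = {i: R[i] + sum(p * U[j] for j, p in M[i].items()) for i in R}

print({s: round(u, 3) for s, u in U.items()})
```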
3. Temporal Difference Learning
 uses the difference in utility values between
successive states to adjust them from one epoch
to another
 key idea is to use the observed transitions to
adjust the values of the observed states so that
they agree with the ADP constraint equations
 Practically, this means updating the utility of state i
so that it agrees better with its successor j.
 This is done with the temporal-difference (TD) equation:
 U(i) ← U(i) + α [R(i) + U(j) − U(i)]
 where α is a learning rate parameter
Temporal difference learning is a way of
approximating the ADP constraint equations
without solving them for all possible states
 The idea generally is to define conditions that hold
over local transitions when the utility estimates are
correct, and then create update rules that nudge the
estimates toward this equation.
 This approach will cause U(i) to converge to the
correct value if the learning rate parameter decreases
with the number of times a state has been visited
[Dayan, 1992].
 In general, as the number of training sequences tends
to infinity, TD will converge on the same utilities as
ADP.
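A minimal sketch of the TD update, with a learning rate that decays with visit count as the convergence condition above suggests; states and rewards are illustrative.

```python
from collections import defaultdict

U = defaultdict(float)        # utility estimates
N = defaultdict(int)          # visit counts used to decay the learning rate

def td_update(i, j, reward_i):
    """Apply U(i) <- U(i) + alpha * (R(i) + U(j) - U(i)) for one observed i -> j."""
    N[i] += 1
    alpha = 1.0 / N[i]                     # decreasing learning rate
    U[i] += alpha * (reward_i + U[j] - U[i])

# one illustrative training sequence: A -> B -> C (terminal)
U["C"] = 1.0                               # terminal state's utility is its reward
td_update("B", "C", -0.04)                 # U(B) moves toward -0.04 + U(C)
td_update("A", "B", -0.04)                 # U(A) moves toward -0.04 + U(B)
print(dict(U))
```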
Passive Learning in an Unknown
Environment
 neither temporal difference learning nor LMS actually uses the model M of state transition probabilities
 they will operate unchanged in an unknown
environment
 The ADP approach, however, updates its
estimated model of an unknown environment
after each step, and this model is used to
revise the utility estimates
 Any method for learning stochastic functions
can be used to learn the environment model;
 in particular, in a simple environment the
transition probability Mij is just the percentage
of times state i has transitioned to j
Basic difference between TD and
ADP:
 TD adjusts a state to agree with the observed
successor, while ADP makes a state agree with all
successors that might occur, weighted by their
probabilities
 ADP's adjustments may need to be propagated
across all of the utility equations, while TD's affect
only the current equation.
 TD is essentially a crude first approximation to ADP
 A middle-ground can be found by bounding or
ordering the number of adjustments made in ADP,
beyond the simple one made in TD
 The prioritized-sweeping heuristic prefers only to
make adjustments to states whose likely
successors have just undergone large
adjustments in their utility estimates
 Such approximate ADP systems can be very
nearly as efficient as ADP in terms of
convergence, but operate much more quickly
Active Learning in an Unknown
Environment
 difference between active and passive agents is
that passive agents learn a fixed policy, while
the active agent must decide what action to
take and how it will affect its rewards
 To represent an active agent, the environment
model M is extended to give the probability of a
transition from a state i to a state j, given an action
a
 Utility is modified to be the reward of the state
plus the maximum utility expected depending
upon the agent's action:
 U(i) = R(i) + max_a Σ_j M^a_ij U(j)
 An ADP agent is extended to learn transition
probabilities given actions; this is simply another
dimension in its transition table
 A TD agent must similarly be extended to have a
model of the environment.
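A minimal sketch of the active-case equation, extending the value-determination sketch given earlier with a maximum over actions; the action-indexed model, rewards, and actions are invented.

```python
R = {"A": -0.04, "B": -0.04, "C": 1.0}
M = {   # M[action][state] -> {next_state: probability}   (illustrative)
    "left":  {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.9, "B": 0.1}, "C": {}},
    "right": {"A": {"B": 0.9, "A": 0.1}, "B": {"C": 0.9, "B": 0.1}, "C": {}},
}

U = {s: 0.0 for s in R}
for _ in range(1000):                      # U(i) = R(i) + max_a sum_j M[a][i][j] U(j)
    U = {i: R[i] + max(sum(p * U[j] for j, p in M[a][i].items()) for a in M)
         for i in R}

print({s: round(u, 3) for s, u in U.items()})
```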
Learning with Knowledge
Learning with knowledge: Tree
 Explanation Based Learning (EBL)
 Relevance Based Learning
 Knowledge Based Inductive Learning
Learning with knowledge
 considering the kinds of logical constraints
placed upon different kinds of knowledge-
based learning, we can classify them more
clearly
 Examples are composed of Descriptions and
Classifications, and we are trying to find a
Hypothesis to explain the data
 Inductive learning can be characterized by the
following entailment constraint:
 Hypothesis ^ Descriptions |= Classifications
 given our hypothesis and descriptions of
problem instances, we want to generate
classifications
 This is inductive learning
Other kinds of learning that use prior
knowledge are:
1) Explanation based learning (EBL)
2) Relevance based learning
3) Knowledge based inductive learning
1) Explanation based
learning(EBL)
 this kind of learning occurs when the system finds
an explanation of an instance it has seen, and
generalizes the explanation
 The general rule follows logically from the
background knowledge possessed by the system
 The entailment constraints for EBL are
 Hypothesis ^ Descriptions |= Classification
 Background |= Hypothesis
 agent does not actually learn anything
factually new, since the hypothesis was
entailed by background knowledge
 This kind of learning is regarded as a way to
convert first principles into useful specialized
knowledge (converting problem-solving search
into pattern-matching search)
 basic idea is to construct an explanation of the
observed result, and then generalize the
explanation
 More specifically, while constructing a proof of the
solution, a parallel proof is performed, in which
each constant of the first is made into a variable
 Then a new rule is built in which the left-hand side
is the leaves of the proof tree, and the right-hand
side is the variabilized goal, up to any bindings
that must be made with the generalized proof
 Any conditions true regardless of the variables are
dropped
 Note that by pruning the tree before the leaves,
even more general rules may be learned
 However, the more general, the more computation
may be required to apply the rule
 One approach is to require the operationality of
the subgoals in the new rule -- that they be "easy"
to solve
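A minimal sketch of the variabilization step only, assuming literals are represented as simple (predicate, argument, ...) tuples; the predicates and constants are made up, and this is not a full EBL implementation.

```python
def variabilize(literals, mapping):
    """Replace every constant in (predicate, *args) literals by a variable."""
    def var_for(const):
        if const not in mapping:
            mapping[const] = f"?x{len(mapping)}"
        return mapping[const]
    return [(pred,) + tuple(var_for(a) for a in args) for pred, *args in literals]

# ground explanation for one specific instance (hypothetical predicates)
leaves = [("parent", "tom", "bob"), ("parent", "bob", "ann")]   # leaves of the proof
goal   = [("grandparent", "tom", "ann")]                        # proved goal

mapping = {}
general_leaves = variabilize(leaves, mapping)
general_goal   = variabilize(goal, mapping)   # shared mapping keeps bindings consistent

print(general_leaves, "=>", general_goal)
# [('parent', '?x0', '?x1'), ('parent', '?x1', '?x2')] => [('grandparent', '?x0', '?x2')]
```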
2) Relevance Based Learning
 This is a kind of learning in which background
knowledge relates the relevance of a set of
features in an instance to the general goal
predicate
 For example, if I see men in the Forum in Rome speaking Latin, and I know that seeing someone in a city speaking a language usually means all people in that city speak that language, I can conclude that Romans speak Latin
 In general, background knowledge, together
with the observations, allows the agent to form
a new, general rule to explain the observations
 The entailment constraint for RBL is
 Hypothesis ^ Descriptions |= Classifications
 Background ^ Descriptions ^ Classifications |=
Hypothesis
 This is a deductive form of learning, because it cannot
produce hypotheses that go beyond the background
knowledge and observations
 We presume that our knowledge base has a set of functional dependencies or determinations that support the construction of hypotheses
 The learning algorithm then tries to find the minimal
consistent determination (e.g., a sentence of the form
"P determines Q," meaning that if the examples match
on P they match on Q)
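A minimal sketch of checking one candidate determination "P determines Q" against a set of examples (finding the minimal consistent determination would search over candidate attribute sets P); the attributes and data are illustrative.

```python
def determines(examples, P, Q):
    """True if examples that agree on the attributes in P also agree on Q."""
    seen = {}
    for ex in examples:
        key = tuple(ex[a] for a in P)
        if key in seen and seen[key] != ex[Q]:
            return False                   # same P-values, different Q-value
        seen[key] = ex[Q]
    return True

examples = [
    {"city": "Rome",  "nationality": "Roman", "language": "Latin"},
    {"city": "Rome",  "nationality": "Roman", "language": "Latin"},
    {"city": "Paris", "nationality": "Gaul",  "language": "Gaulish"},
]

print(determines(examples, ["city"], "language"))          # True
print(determines(examples, ["nationality"], "language"))   # True
```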
3) Knowledge based inductive
learning
 This is a kind of learning in which our background
knowledge, together with our observations, lead
us to make a hypothesis that explains the
examples we see
 If I see the Old Man from Scene 24 on the Bridge
of Despair, and notice that he asks a simple
question of every other knight that attempts to
cross, I can hypothesize that only the odd-
numbered knights are able to cross the Gorge of
Eternal Peril
 The entailment constraint in this case is
 Background ^ Hypothesis ^ Descriptions |=
Classifications
 Such knowledge-based inductive learning has
been studied mainly in the field of inductive
logic programming
 Such systems reduce learning complexity in
two ways
 First, by requiring all new hypotheses to be
consistent with existing knowledge, they reduce
the search space of hypotheses
 Secondly, the more prior knowledge available,
the less new knowledge required in the
hypothesis to explain the observations
 Attribute-based learning algorithms are
incapable of learning predicates
 One of the advantages of ILP algorithms is
their much broader range of applicability
Instance Based Learning
(IBL)
Background
 Storing and using specific instances improves the performance of several supervised learning algorithms
 These include algorithms that learn decision trees, classification rules, and distributed networks
 IBL algorithms are derived from the nearest
neighbor pattern classifier
Instance based learning
 IBL algorithms generate classification predictions using only specific instances
 they do not maintain a set of abstractions derived from those instances
 This approach extends the nearest neighbor
algorithm, which has large storage requirements
 storage requirements can be significantly reduced
with, at most, minor sacrifices in learning rate and
classification accuracy
 While the storage-reducing algorithm performs
well on several real world databases, its
performance degrades rapidly with the level of
attribute noise in training instances
 save and use only selected instances to
generate classification predictions
Using specific instances in
supervised learning algorithms
 decreases the costs incurred when updating concept descriptions,
 increases learning rates,
 allows for the representation of probabilistic concept descriptions,
 and focuses theory-based reasoning in real-world applications
Instance-based learning algorithms
suffer from several problems
 they are computationally expensive classifiers since
they save all training instances,
 they are intolerant of attribute noise,
 they are intolerant of irrelevant attributes,
 they are sensitive to the choice of the algorithm's
similarity function,
 there is no natural way to work with nominal-valued
attributes or missing attributes, and
 they provide little usable information regarding the
structure of the data
Overview of IBL
 Learning task : supervised learning or learning
from examples
 Only input is a sequence of instances
 Each instance is assumed to be represented by a set of attribute-value pairs (see the "About attributes" slide below)
 All instances are assumed to be described by the
same set of n attributes, although this restriction is
not required by the paradigm itself (Aha, 1989c)
and missing attribute values are tolerated
Action-value functions and Q-learning (continued from the RL section)
 An action-value function assigns an expected
utility to the result of performing a given action in a
given state
 If Q(a, i) is the value of doing action a in state i,
then
 U(i) = max_a Q(a, i)
 The equations for Q-learning are similar to those
for state-based learning agents
 The difference is that Q-learning agents do not
need models of the world. The equilibrium
equation, which can be used directly (as with
ADP agents) is
 Q(a, i) = R(i) + Σ_j M^a_ij max_a' Q(a', j)
 The temporal difference version does not require that a model be learned; its update equation is
 Q(a, i) ← Q(a, i) + α [R(i) + max_a' Q(a', j) − Q(a, i)]
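A minimal sketch of the model-free Q-learning update above, reusing the decaying learning rate from the TD sketch; the actions, states, and rewards are illustrative.

```python
from collections import defaultdict

Q = defaultdict(float)        # Q[(action, state)]
N = defaultdict(int)          # visit counts for the decaying learning rate

ACTIONS = ["left", "right"]   # hypothetical action set

def q_update(i, a, reward_i, j):
    """Apply Q(a,i) <- Q(a,i) + alpha * (R(i) + max_a' Q(a',j) - Q(a,i))."""
    N[(a, i)] += 1
    alpha = 1.0 / N[(a, i)]
    best_next = max(Q[(b, j)] for b in ACTIONS)
    Q[(a, i)] += alpha * (reward_i + best_next - Q[(a, i)])

for b in ACTIONS:             # terminal state's value is just its reward
    Q[(b, "C")] = 1.0

q_update("B", "right", -0.04, "C")    # observed transition B --right--> C
q_update("A", "right", -0.04, "B")    # observed transition A --right--> B
print(dict(Q))
```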
About attributes
 set of attributes defines an n-dimensional instance
space
 Exactly one of these attributes corresponds to the
category attribute;
 the other attributes are predictor attributes
 A category is the set of all instances in an
instance space that have the same value for their
category attribute
IBL
 IBL algorithms can learn multiple, possibly
overlapping concept descriptions simultaneously
 primary output of IBL algorithms is a concept
description (or concept)
 This is a function that maps instances to
categories: given an instance drawn from the
instance space, it yields a classification, which is
the predicted value for this instance's category
attribute
 An instance-based concept description includes a
set of stored instances and, possibly, some
information concerning their past performances
during classification
 e.g., their number of correct and incorrect
classification predictions
 This set of instances can change after each
training instance is processed
 However, IBL algorithms do not construct
extensional concept descriptions
 Instead, concept descriptions are determined
by how the IBL algorithm's selected similarity
and classification functions use the current set
of saved instances
IBL framework components
 Similarity Function:
 This computes the similarity between a training
instance i and the instances in the concept
description
 Similarities are numeric-valued
 Classification Function:
 This receives the similarity function's results and
the classification performance records of the
instances in the concept description
 It yields a classification for i
 Concept Description Updater:
 This maintains records on classification
performance and decides which instances to
include in the concept description
 Inputs include i, the similarity results, the
classification results, and a current concept
description
 It yields the modified concept description.
 The similarity and classification functions
determine how the set of saved instances in
the concept description are used to predict
values for the category attribute
 Therefore, IBL concept descriptions not only
contain a set of instances, but also include
these two functions.
 IBL algorithms assume that similar instances have
similar classifications
 This leads to their local bias for classifying novel
instances according to their most similar neighbor's
classification
 IBL algorithms also assume that, without prior
knowledge, attributes will have equal relevance for
classification decisions (i.e., by having equal weight in
the similarity function)
 This bias is achieved by normalizing each attribute's
range of possible values
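A minimal IB1-style sketch tying these pieces together: attributes are normalized so each gets equal weight, similarity is negative Euclidean distance, and a new instance takes the category of its most similar saved neighbor; the attributes, categories, and data are invented.

```python
import math

training = [                     # (predictor attributes, category) -- illustrative
    ((1.70, 60.0), "small"),
    ((1.80, 95.0), "large"),
    ((1.60, 55.0), "small"),
    ((1.95, 90.0), "large"),
]

# attribute ranges used for normalization (equal-relevance assumption)
lows  = [min(x[i] for x, _ in training) for i in range(2)]
highs = [max(x[i] for x, _ in training) for i in range(2)]

def normalize(x):
    return [(x[i] - lows[i]) / (highs[i] - lows[i]) for i in range(len(x))]

def similarity(a, b):
    """Similarity function: negative Euclidean distance over normalized attributes."""
    return -math.dist(normalize(a), normalize(b))

def classify(x, saved):
    """Classification function: category of the most similar saved instance."""
    return max(saved, key=lambda inst: similarity(x, inst[0]))[1]

saved = list(training)           # concept description = the set of saved instances
print(classify((1.75, 85.0), saved))
```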
Summary
 IBL algorithms differ from most other supervised
learning methods:
 they don't construct explicit abstractions such as
decision trees or rules
 Most learning algorithms derive generalizations
from instances when they are presented and use
simple matching procedures to classify
subsequently presented instances
Performance Dimensions
 1) Generality: This is the class of concepts which
are describable by the representation and
learnable by the algorithm
 It can be shown that IBL algorithms can pac-learn (Valiant, 1984) any concept whose boundary is a union of a finite number of closed hyper-curves of finite size
 2) Accuracy: This is the concept descriptions'
classification accuracy.
 3) Learning Rate: This is the speed at which
classification accuracy increases during training
 It is a more useful indicator of the performance of the
learning algorithm than is accuracy for finite-sized training
sets
 4) Incorporation Costs: These are incurred while
updating the concept descriptions with a single
training instance
 They include classification costs
 5) Storage Requirement: This is the size of the concept description maintained by the IBL algorithm, i.e., the number of saved instances
THANK YOU
