LEARNING IN AI
Prof. Mrs. Minakshi P. Atre, PVGCOET, SPPU
Basic Learning Model
 Learning agent’s components
 learning element -- the part of the agent responsible for
improving its performance
 performance element -- the part that chooses the actions
to take
 critic -- tells the learning element how the agent is doing
 problem generator -- suggests actions that could lead to
new, informative experiences (suboptimal from the point of
view of the performance element, but designed to improve
that element)
Issues in designing learning
system
 components -- which parts of the
performance element are to be improved
 representation of those components
 feedback available to the system
 prior information available to the system
All learning can be thought of as
learning the representation of a
function.
Types of Learning
 Speed-up learning
 Learning by taking advice
 Learning from example
 Clustering
 Learning by analogy
 Discovery
1. Speed up learning
 A type of deductive learning that requires no
additional input, but improves the agent's
performance over time. There are two kinds: rote learning and generalization (e.g., EBL). Data caching is an example of rote learning in practice.
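As a small illustration of rote speed-up learning, the Python sketch below caches every result of a hypothetical expensive computation so that repeated queries are answered from the cache rather than recomputed; the function and workload are invented for the example.

```python
from functools import lru_cache

@lru_cache(maxsize=None)           # rote learning: cache every solved case
def solve(n: int) -> int:
    """Hypothetical expensive computation (naive Fibonacci as a stand-in)."""
    if n < 2:
        return n
    return solve(n - 1) + solve(n - 2)

print(solve(30))                   # first call does the work
print(solve.cache_info())          # later calls reuse the stored results
```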
2. Learning by taking advice
 Deductive learning in which the system can
reason about new information added to its
knowledge base.
 McCarthy proposed the "advice taker" as such a system; TEIRESIAS [Davis, 1976] was the first implemented system of this kind.
3. Learning from example
 Inductive learning in which concepts are
learned from sets of labeled instances.
4. Clustering
 Unsupervised, inductive learning in which
"natural classes" are found for data instances,
as well as ways of classifying them.
 Examples include COBWEB, AUTOCLASS.
5. Learning by Analogy
 Inductive learning in which a system transfers knowledge from one domain to a different domain.
6. Discovery
 Both inductive and deductive learning in which
an agent learns without help from a teacher.
 It is deductive if it proves theorems and
discovers concepts about those theorems;
 it is inductive when it raises conjectures.
What is Inductive Learning?
 Inductive learning is a kind of learning in which, given a set of examples, an agent tries to estimate or create an evaluation function.
 Most inductive learning is supervised learning, in which examples are provided with classifications. (The alternative is clustering.)
 More formally, an example is a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x.
 The task of pure inductive inference (or induction) is, given a collection of examples of f, to return a hypothesis h that approximates f.
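A minimal sketch of inductive learning as hypothesis fitting, assuming (purely for illustration) that f is real-valued and the hypothesis space is straight lines h(x) = w*x + b; the example pairs are invented.

```python
# Inductive learning: given examples (x, f(x)), return a hypothesis h
# that approximates f.  Hypothesis space here (assumed): straight lines.

def fit_line(examples):
    """Least-squares fit of h(x) = w*x + b to a list of (x, y) pairs."""
    n = len(examples)
    sx = sum(x for x, _ in examples)
    sy = sum(y for _, y in examples)
    sxx = sum(x * x for x, _ in examples)
    sxy = sum(x * y for x, y in examples)
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - w * sx) / n
    return lambda x: w * x + b

examples = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]   # noisy samples of f(x) ~ 2x + 1
h = fit_line(examples)
print(round(h(4), 2))                                  # prediction for an unseen input
```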
Bayesian Learning in Belief
Networks
 Bayesian learning maintains a number of hypotheses about the data, each one weighted by its posterior probability when a prediction is made
 The idea is that, rather than keeping only one
hypothesis, many are entertained, and
weighted based on their likelihoods.
 maintaining and reasoning with a large number of
hypotheses can be intractable
 the most common approximation is to use a single most probable hypothesis, that is, the Hi in H that maximizes P(Hi | D), where D is the data
 This is often called the maximum a posteriori (MAP) hypothesis H_MAP:
 P(X | D) ≈ P(X | H_MAP) × P(H_MAP | D)
To find H_MAP, we apply Bayes' rule:
 P(Hi | D) = [P(D | Hi) × P(Hi)] / P(D)
 Since P(D) is fixed across the hypotheses, we
only need to maximize the numerator
 The first term represents the probability that this
particular data set would be seen, given Hi as the
model of the world
 The second is the prior probability assigned to the
model.
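As a small illustrative sketch of MAP selection, the snippet below scores each hypothesis by P(D | Hi) × P(Hi) and picks the maximizer; the hypothesis names, priors, and likelihoods are placeholders, not values from the slides.

```python
# Choosing H_MAP: maximize P(D | Hi) * P(Hi); the denominator P(D) is
# common to all hypotheses and can be ignored.

priors      = {"H1": 0.5, "H2": 0.3, "H3": 0.2}        # P(Hi)   (illustrative)
likelihoods = {"H1": 0.02, "H2": 0.10, "H3": 0.05}     # P(D|Hi) (illustrative)

scores = {h: likelihoods[h] * priors[h] for h in priors}
h_map = max(scores, key=scores.get)

z = sum(scores.values())                               # normalize for inspection
posteriors = {h: round(s / z, 3) for h, s in scores.items()}
print(h_map, posteriors)
```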
Belief Network Learning
Problems
 Four kinds of belief network learning problems
 depending upon whether the structure of the network is known or unknown,
 and whether the variables in the network are observable or hidden
Belief Networks
1. known structure, fully observable -- In this case the only learnable part is the conditional probability tables. These can be estimated directly from the statistics of the sample data set (a counting sketch follows this list).
2. unknown structure, fully observable -- Here the
problem is to reconstruct the network topology. The
problem can be thought of as a search through
structure space, and fitting data to each structure
reduces to the fixed-structure problem, so the MAP
or ML probability value can be used as a heuristic in
hill-climbing or simulated annealing (SA) search.
3. known structure, hidden variables -- This is analogous to neural network learning.
4. unknown structure, hidden variables -- When some variables are unobservable, it becomes difficult to apply the prior techniques for recovering structure, since they require averaging over all possible values of the unknown variables. No good general algorithms are known for handling this case.
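For case 1 above (known structure, fully observable), a minimal counting sketch of estimating a conditional probability table from the sample statistics; the variable names and tiny data set are hypothetical.

```python
from collections import Counter

data = [  # each record fully specifies every variable (illustrative)
    {"Rain": True,  "Sprinkler": False, "WetGrass": True},
    {"Rain": True,  "Sprinkler": False, "WetGrass": True},
    {"Rain": False, "Sprinkler": True,  "WetGrass": True},
    {"Rain": False, "Sprinkler": False, "WetGrass": False},
]

def estimate_cpt(data, child, parents):
    """Return P(child=True | parent assignment) as relative frequencies."""
    joint, marginal = Counter(), Counter()
    for row in data:
        key = tuple(row[p] for p in parents)
        marginal[key] += 1
        if row[child]:
            joint[key] += 1
    return {key: joint[key] / marginal[key] for key in marginal}

print(estimate_cpt(data, "WetGrass", ["Rain", "Sprinkler"]))
```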
Comparison between NN and Belief
Networks
 Similarities
 Both kinds of network are attribute-based
representations
 Both can handle either discrete or continuous
output
Differences between NN and Belief N/w
 NN: neural networks are distributed representations; nodes generally don't represent specific propositions, and the calculations do not treat them in a semantically meaningful way
 Belief N/W: belief networks are localized representations; belief network nodes represent propositions with clearly defined semantics and relationships to other nodes
 NN: the effect is that human beings can neither construct nor understand neural network representations
 Belief N/W: both can be done with belief networks
 NN: neural network outputs can be values or probabilities, but they cannot handle both simultaneously
 Belief N/W: belief networks handle two kinds of activation, both in terms of the values a proposition may take and the probabilities assigned to each
 NN: inference in a trained feed-forward neural network can execute in linear time, but a neural network may have to be exponentially larger to represent the same things that a belief network can
 Belief N/W: in belief networks, inference is NP-hard
As for learning, belief networks have the advantages of
 being easier to supply with prior knowledge;
 and, since they represent propositions locally, possibly converging more easily,
 because each node is directly affected only by a small number of other propositions.
Reinforcement Learning
What is reinforcement learning?
 As opposed to supervised learning,
reinforcement learning takes place in an
environment where the agent cannot directly
compare the results of its action to a desired
result
Reinforcement learning
 it is given some reward or punishment that
relates to its actions
 It may win or lose a game, or be told it has
made a good move or a poor one
 the job of reinforcement learning is to use these rewards to learn a successful agent function
Where Reinforcement Learning (RL) Fits
Block Schematic and example of
RL
Supervised vs
Reinforcement Learning
 Supervised learning has an external supervisor
 the supervisor has knowledge of the environment and shares it with the agent to complete the task
 in some problems, however, there are so many combinations of subtasks the agent could perform to achieve the objective
 that creating a “supervisor” is almost impractical
Example
 in a chess game, there are tens of thousands of moves that can be played
 creating a knowledge base that covers all of them is a tedious task
 In these problems, it is more feasible to learn from one’s own experiences and gain knowledge from them
 This is the main difference between reinforcement learning and supervised learning.
 In both supervised and reinforcement learning, there is a mapping between input and output.
 But in reinforcement learning, there is a reward function which acts as feedback to the agent, as opposed to a supervisor telling the agent the correct output for each input.
Unsupervised vs Reinforcement
Learning:
 In reinforcement learning, there’s a mapping from input to output, which is not present in unsupervised learning
 In unsupervised learning, the main task is to find the underlying patterns rather than the mapping
Example
 if the task is to suggest a news article to a user, an unsupervised learning algorithm will look at similar articles which the person has previously read and suggest one of them.
 Whereas a reinforcement learning algorithm will get constant feedback from the user by suggesting a few news articles and then build a “knowledge graph” of which articles the person will like
Summarizing Reinforcement
Learning
 The reason reinforcement learning is harder
than supervised learning is that the agent is
never told what the right action is, only
whether it is doing well or poorly, and in some
cases (such as chess) it may only receive
feedback after a long string of actions
Two basic kinds of information an
agent can try to learn in RL
 utility function -- The agent learns the utility of
being in various states, and chooses actions to
maximize the expected utility of their outcomes.
This requires the agent keep a model of the
environment
 action-value -- The agent learns an action-value
function giving the expected utility of performing
an action in a given state. This is called Q-
learning. This is the model-free approach.
Passive Learning in a known
environment
 Def:
 Assuming an environment consisting of a set
of states, some terminal and some non-
terminal, and a model that specifies the
probabilities of transition from state to state, an
agent learns passively by observing a set of
training sequences, which consist of a set of
state transitions followed by a reward
 The goal is to use the reward information to
learn the expected utility of each of the non-
terminal states.
 An important simplifying assumption is
that the utility of a sequence is the sum of
the rewards accumulated in the states of
the sequence.
 That is, the utility function is additive
 A passive learning agent keeps an estimate U
of the utility of each state, a table N of how
many times each state was seen, and a table
M of transition probabilities.
 There are a variety of ways the agent can
update its table U
Three approaches to passive learning in a known environment
 Naïve updating
 Adaptive dynamic programming
 Temporal difference learning
1. Naive Updating
 One simple updating method is the least mean
squares (LMS) approach [Widrow and Hoff,
1960].
 It assumes that the observed reward-to-go of a
state in a sequence provides direct evidence
of the actual reward-to-go.
 The approach is simply to keep each state's utility as a running average of its observed rewards-to-go, based upon the number of times the state has been seen
 This approach minimizes the mean square
error with respect to the observed data
 This approach converges very slowly, because
it ignores the fact that the actual utility of a
state is the probability-weighted average of
its successors' utilities, plus its own
reward. LMS disregards these probabilities.
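A minimal sketch of naive (LMS) updating under the additive-utility assumption; the states and rewards are illustrative.

```python
from collections import defaultdict

U = defaultdict(float)   # utility estimates
N = defaultdict(int)     # visit counts

def lms_update(sequence):
    """sequence: list of (state, reward) pairs from one training run."""
    reward_to_go = 0.0
    for state, reward in reversed(sequence):
        reward_to_go += reward                            # observed reward-to-go
        N[state] += 1
        U[state] += (reward_to_go - U[state]) / N[state]  # running average

lms_update([("A", -0.04), ("B", -0.04), ("C", 1.0)])
print(dict(U))    # e.g. {'C': 1.0, 'B': 0.96, 'A': 0.92}
```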
2. Adaptive Dynamic Programming
 If the transition probabilities and the rewards of
the states are known (which will usually
happen after a reasonably small set of training
examples), then the actual utilities can be
computed directly as
 U(i) = R(i) + Σ_j M_ij U(j)
where U(i) is the utility of state i, R(i) is its reward, and M_ij is the probability of transition from state i to state j
 This is identical to a single value determination in
the policy iteration algorithm for Markov decision
processes.
 Adaptive dynamic programming is any kind of
reinforcement learning method that works by
solving the utility equations using a dynamic
programming algorithm.
 It is exact, but of course highly inefficient in large
state spaces
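A minimal sketch of the value-determination step, solving U(i) = R(i) + Σ_j M_ij U(j) by simple iteration; the transition model and rewards are invented for illustration.

```python
R = {"A": -0.04, "B": -0.04, "C": 1.0}     # state rewards (illustrative)
M = {                                      # M[i][j] = P(next=j | current=i)
    "A": {"A": 0.1, "B": 0.9},
    "B": {"A": 0.1, "C": 0.9},
    "C": {},                               # terminal: no outgoing transitions
}

U = {s: 0.0 for s in R}
for _ in range(1000):                      # iterate to an (approximate) fixed point
    U = {i: R[i] + sum(p * U[j] for j, p in M[i].items()) for i in R}

print({s: round(u, 3) for s, u in U.items()})
```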
3. Temporal Difference Learning
 uses the difference in utility values between
successive states to adjust them from one epoch
to another
 key idea is to use the observed transitions to
adjust the values of the observed states so that
they agree with the ADP constraint equations
 Practically, this means updating the utility of state i
so that it agrees better with its successor j.
 This is done with the temporal-difference (TD) equation:
 U(i) ← U(i) + α [R(i) + U(j) − U(i)]
 where α is a learning rate parameter
Temporal difference learning is a way of
approximating the ADP constraint equations
without solving them for all possible states
 The idea generally is to define conditions that hold
over local transitions when the utility estimates are
correct, and then create update rules that nudge the
estimates toward this equation.
 This approach will cause U(i) to converge to the
correct value if the learning rate parameter decreases
with the number of times a state has been visited
[Dayan, 1992].
 In general, as the number of training sequences tends
to infinity, TD will converge on the same utilities as
ADP.
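A minimal sketch of the TD update, with a learning rate that decays with visit count as the convergence condition above suggests; states and rewards are illustrative.

```python
from collections import defaultdict

U = defaultdict(float)        # utility estimates
N = defaultdict(int)          # visit counts used to decay the learning rate

def td_update(i, j, reward_i):
    """Apply U(i) <- U(i) + alpha * (R(i) + U(j) - U(i)) for one observed i -> j."""
    N[i] += 1
    alpha = 1.0 / N[i]                     # decreasing learning rate
    U[i] += alpha * (reward_i + U[j] - U[i])

# one illustrative training sequence: A -> B -> C (terminal)
U["C"] = 1.0                               # terminal state's utility is its reward
td_update("B", "C", -0.04)                 # U(B) moves toward -0.04 + U(C)
td_update("A", "B", -0.04)                 # U(A) moves toward -0.04 + U(B)
print(dict(U))
```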
Passive Learning in an Unknown
Environment
 neither temporal difference learning nor LMS actually uses the model M of state transition probabilities
 they will operate unchanged in an unknown
environment
 The ADP approach, however, updates its
estimated model of an unknown environment
after each step, and this model is used to
revise the utility estimates
 Any method for learning stochastic functions
can be used to learn the environment model;
 in particular, in a simple environment the
transition probability Mij is just the percentage
of times state i has transitioned to j
Basic difference between TD and
ADP:
 TD adjusts a state to agree with the observed
successor, while ADP makes a state agree with all
successors that might occur, weighted by their
probabilities
 ADP's adjustments may need to be propagated
across all of the utility equations, while TD's affect
only the current equation.
 TD is essentially a crude first approximation to ADP
 A middle-ground can be found by bounding or
ordering the number of adjustments made in ADP,
beyond the simple one made in TD
 The prioritized-sweeping heuristic prefers only to
make adjustments to states whose likely
successors have just undergone large
adjustments in their utility estimates
 Such approximate ADP systems can be very
nearly as efficient as ADP in terms of
convergence, but operate much more quickly
Active Learning in an Unknown
Environment
 difference between active and passive agents is
that passive agents learn a fixed policy, while
the active agent must decide what action to
take and how it will affect its rewards
 To represent an active agent, the environment
model M is extended to give the probability of a
transition from a state i to a state j, given an action
a
 Utility is modified to be the reward of the state
plus the maximum utility expected depending
upon the agent's action:
 U(i) = R(i) + max_a Σ_j M^a_ij U(j)
 An ADP agent is extended to learn transition
probabilities given actions; this is simply another
dimension in its transition table
 A TD agent must similarly be extended to have a
model of the environment.
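A minimal sketch of the active-case equation, extending the value-determination sketch given earlier with a maximum over actions; the action-indexed model, rewards, and actions are invented.

```python
R = {"A": -0.04, "B": -0.04, "C": 1.0}
M = {   # M[action][state] -> {next_state: probability}   (illustrative)
    "left":  {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.9, "B": 0.1}, "C": {}},
    "right": {"A": {"B": 0.9, "A": 0.1}, "B": {"C": 0.9, "B": 0.1}, "C": {}},
}

U = {s: 0.0 for s in R}
for _ in range(1000):                      # U(i) = R(i) + max_a sum_j M[a][i][j] U(j)
    U = {i: R[i] + max(sum(p * U[j] for j, p in M[a][i].items()) for a in M)
         for i in R}

print({s: round(u, 3) for s, u in U.items()})
```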
Learning with Knowledge
Learning with knowledge: Tree
 Explanation Based Learning (EBL)
 Relevance Based Learning
 Knowledge Based Inductive Learning
Learning with knowledge
 considering the kinds of logical constraints
placed upon different kinds of knowledge-
based learning, we can classify them more
clearly
 Examples are composed of Descriptions and
Classifications, and we are trying to find a
Hypothesis to explain the data
 Inductive learning can be characterized by the
following entailment constraint:
 Hypothesis ^ Descriptions |= Classifications
 given our hypothesis and descriptions of
problem instances, we want to generate
classifications
 This is inductive learning
Other kinds of learning that use prior
knowledge are:
1) Explanation based learning (EBL)
2) Relevance based learning
3) Knowledge based inductive learning
1) Explanation based
learning(EBL)
 this kind of learning occurs when the system finds
an explanation of an instance it has seen, and
generalizes the explanation
 The general rule follows logically from the
background knowledge possessed by the system
 The entailment constraints for EBL are
 Hypothesis ^ Descriptions |= Classification
 Background |= Hypothesis
 agent does not actually learn anything
factually new, since the hypothesis was
entailed by background knowledge
 This kind of learning is regarded as a way to
convert first principles into useful specialized
knowledge (converting problem-solving search
into pattern-matching search)
 basic idea is to construct an explanation of the
observed result, and then generalize the
explanation
 More specifically, while constructing a proof of the
solution, a parallel proof is performed, in which
each constant of the first is made into a variable
 Then a new rule is built in which the left-hand side
is the leaves of the proof tree, and the right-hand
side is the variabilized goal, up to any bindings
that must be made with the generalized proof
 Any conditions true regardless of the variables are
dropped
 Note that by pruning the tree before the leaves,
even more general rules may be learned
 However, the more general, the more computation
may be required to apply the rule
 One approach is to require the operationality of
the subgoals in the new rule -- that they be "easy"
to solve
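A minimal sketch of the variabilization step only, assuming literals are represented as simple (predicate, argument, ...) tuples; the predicates and constants are made up, and this is not a full EBL implementation.

```python
def variabilize(literals, mapping):
    """Replace every constant in (predicate, *args) literals by a variable."""
    def var_for(const):
        if const not in mapping:
            mapping[const] = f"?x{len(mapping)}"
        return mapping[const]
    return [(pred,) + tuple(var_for(a) for a in args) for pred, *args in literals]

# ground explanation for one specific instance (hypothetical predicates)
leaves = [("parent", "tom", "bob"), ("parent", "bob", "ann")]   # leaves of the proof
goal   = [("grandparent", "tom", "ann")]                        # proved goal

mapping = {}
general_leaves = variabilize(leaves, mapping)
general_goal   = variabilize(goal, mapping)   # shared mapping keeps bindings consistent

print(general_leaves, "=>", general_goal)
# [('parent', '?x0', '?x1'), ('parent', '?x1', '?x2')] => [('grandparent', '?x0', '?x2')]
```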
2) Relevance Based Learning
 This is a kind of learning in which background
knowledge relates the relevance of a set of
features in an instance to the general goal
predicate
 For example, if I see men in the Forum in Rome speaking Latin, and I know that seeing someone in a city speaking a language usually means all people in that city speak that language, I can conclude that Romans speak Latin
 In general, background knowledge, together
with the observations, allows the agent to form
a new, general rule to explain the observations
 The entailment constraint for RBL is
 Hypothesis ^ Descriptions |= Classifications
 Background ^ Descriptions ^ Classifications |=
Hypothesis
 This is a deductive form of learning, because it cannot
produce hypotheses that go beyond the background
knowledge and observations
 We presume that our knowledge base has a set of functional dependencies or determinations that support the construction of hypotheses
 The learning algorithm then tries to find the minimal
consistent determination (e.g., a sentence of the form
"P determines Q," meaning that if the examples match
on P they match on Q)
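A minimal sketch of checking one candidate determination "P determines Q" against a set of examples (finding the minimal consistent determination would search over candidate attribute sets P); the attributes and data are illustrative.

```python
def determines(examples, P, Q):
    """True if examples that agree on the attributes in P also agree on Q."""
    seen = {}
    for ex in examples:
        key = tuple(ex[a] for a in P)
        if key in seen and seen[key] != ex[Q]:
            return False                   # same P-values, different Q-value
        seen[key] = ex[Q]
    return True

examples = [
    {"city": "Rome",  "nationality": "Roman", "language": "Latin"},
    {"city": "Rome",  "nationality": "Roman", "language": "Latin"},
    {"city": "Paris", "nationality": "Gaul",  "language": "Gaulish"},
]

print(determines(examples, ["city"], "language"))          # True
print(determines(examples, ["nationality"], "language"))   # True
```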
3) Knowledge based inductive
learning
 This is a kind of learning in which our background
knowledge, together with our observations, lead
us to make a hypothesis that explains the
examples we see
 If I see the Old Man from Scene 24 on the Bridge
of Despair, and notice that he asks a simple
question of every other knight that attempts to
cross, I can hypothesize that only the odd-
numbered knights are able to cross the Gorge of
Eternal Peril
 The entailment constraint in this case is
 Background ^ Hypothesis ^ Descriptions |=
Classifications
 Such knowledge-based inductive learning has
been studied mainly in the field of inductive
logic programming
 Such systems reduce learning complexity in
two ways
 First, by requiring all new hypotheses to be
consistent with existing knowledge, they reduce
the search space of hypotheses
 Secondly, the more prior knowledge available,
the less new knowledge required in the
hypothesis to explain the observations
 Attribute-based learning algorithms are
incapable of learning predicates
 One of the advantages of ILP algorithms is
their much broader range of applicability
Instance Based Learning
(IBL)
Background
 Storing and using specific instances improves the performance of several supervised learning algorithms
 These include algorithms that learn decision trees, classification rules, and distributed networks
 IBL algorithms are derived from the nearest
neighbor pattern classifier
Instance based learning
 IBL algorithms generate classification predictions using only specific instances
 they do not maintain a set of abstractions derived from those instances
 This approach extends the nearest neighbor
algorithm, which has large storage requirements
 storage requirements can be significantly reduced
with, at most, minor sacrifices in learning rate and
classification accuracy
 While the storage-reducing algorithm performs
well on several real world databases, its
performance degrades rapidly with the level of
attribute noise in training instances
 save and use only selected instances to
generate classification predictions
Using specific instances in
supervised learning algorithms
 decreases the costs incurred when updating concept descriptions,
 increases learning rates,
 allows for the representation of probabilistic concept descriptions,
 and focuses theory-based reasoning in real-world applications
Instance-based learning algorithms
suffer from several problems
 they are computationally expensive classifiers since
they save all training instances,
 they are intolerant of attribute noise,
 they are intolerant of irrelevant attributes,
 they are sensitive to the choice of the algorithm's
similarity function,
 there is no natural way to work with nominal-valued
attributes or missing attributes, and
 they provide little usable information regarding the
structure of the data
Overview of IBL
 Learning task : supervised learning or learning
from examples
 Only input is a sequence of instances
 Each instance is assumed to be represented by a set of attribute-value pairs (see the "About attributes" slide below)
 All instances are assumed to be described by the
same set of n attributes, although this restriction is
not required by the paradigm itself (Aha, 1989c)
and missing attribute values are tolerated
Action-value functions and Q-learning (continued from the RL section)
 An action-value function assigns an expected
utility to the result of performing a given action in a
given state
 If Q(a, i) is the value of doing action a in state i,
then
 U(i) = max_a Q(a, i)
 The equations for Q-learning are similar to those
for state-based learning agents
 The difference is that Q-learning agents do not
need models of the world. The equilibrium
equation, which can be used directly (as with
ADP agents) is
 Q(a, i) = R(i) + Σ_j M^a_ij max_a' Q(a', j)
 The temporal difference version does not require that a model be learned; its update equation is
 Q(a, i) ← Q(a, i) + α [R(i) + max_a' Q(a', j) − Q(a, i)]
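A minimal sketch of the model-free Q-learning update above, reusing the decaying learning rate from the TD sketch; the actions, states, and rewards are illustrative.

```python
from collections import defaultdict

Q = defaultdict(float)        # Q[(action, state)]
N = defaultdict(int)          # visit counts for the decaying learning rate

ACTIONS = ["left", "right"]   # hypothetical action set

def q_update(i, a, reward_i, j):
    """Apply Q(a,i) <- Q(a,i) + alpha * (R(i) + max_a' Q(a',j) - Q(a,i))."""
    N[(a, i)] += 1
    alpha = 1.0 / N[(a, i)]
    best_next = max(Q[(b, j)] for b in ACTIONS)
    Q[(a, i)] += alpha * (reward_i + best_next - Q[(a, i)])

for b in ACTIONS:             # terminal state's value is just its reward
    Q[(b, "C")] = 1.0

q_update("B", "right", -0.04, "C")    # observed transition B --right--> C
q_update("A", "right", -0.04, "B")    # observed transition A --right--> B
print(dict(Q))
```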
About attributes
 set of attributes defines an n-dimensional instance
space
 Exactly one of these attributes corresponds to the
category attribute;
 the other attributes are predictor attributes
 A category is the set of all instances in an
instance space that have the same value for their
category attribute
IBL
 IBL algorithms can learn multiple, possibly
overlapping concept descriptions simultaneously
 primary output of IBL algorithms is a concept
description (or concept)
 This is a function that maps instances to
categories: given an instance drawn from the
instance space, it yields a classification, which is
the predicted value for this instance's category
attribute
 An instance-based concept description includes a
set of stored instances and, possibly, some
information concerning their past performances
during classification
 e.g., their number of correct and incorrect
classification predictions
 This set of instances can change after each
training instance is processed
 However, IBL algorithms do not construct
extensional concept descriptions
 Instead, concept descriptions are determined
by how the IBL algorithm's selected similarity
and classification functions use the current set
of saved instances
IBL framework components
 Similarity Function:
 This computes the similarity between a training
instance i and the instances in the concept
description
 Similarities are numeric-valued
 Classification Function:
 This receives the similarity function's results and
the classification performance records of the
instances in the concept description
 It yields a classification for i
 Concept Description Updater:
 This maintains records on classification
performance and decides which instances to
include in the concept description
 Inputs include i, the similarity results, the
classification results, and a current concept
description
 It yields the modified concept description.
 The similarity and classification functions
determine how the set of saved instances in
the concept description are used to predict
values for the category attribute
 Therefore, IBL concept descriptions not only
contain a set of instances, but also include
these two functions.
 IBL algorithms assume that similar instances have
similar classifications
 This leads to their local bias for classifying novel
instances according to their most similar neighbor's
classification
 IBL algorithms also assume that, without prior
knowledge, attributes will have equal relevance for
classification decisions (i.e., by having equal weight in
the similarity function)
 This bias is achieved by normalizing each attribute's
range of possible values
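A minimal IB1-style sketch tying these pieces together: attributes are normalized so each gets equal weight, similarity is negative Euclidean distance, and a new instance takes the category of its most similar saved neighbor; the attributes, categories, and data are invented.

```python
import math

training = [                     # (predictor attributes, category) -- illustrative
    ((1.70, 60.0), "small"),
    ((1.80, 95.0), "large"),
    ((1.60, 55.0), "small"),
    ((1.95, 90.0), "large"),
]

# attribute ranges used for normalization (equal-relevance assumption)
lows  = [min(x[i] for x, _ in training) for i in range(2)]
highs = [max(x[i] for x, _ in training) for i in range(2)]

def normalize(x):
    return [(x[i] - lows[i]) / (highs[i] - lows[i]) for i in range(len(x))]

def similarity(a, b):
    """Similarity function: negative Euclidean distance over normalized attributes."""
    return -math.dist(normalize(a), normalize(b))

def classify(x, saved):
    """Classification function: category of the most similar saved instance."""
    return max(saved, key=lambda inst: similarity(x, inst[0]))[1]

saved = list(training)           # concept description = the set of saved instances
print(classify((1.75, 85.0), saved))
```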
Summary
 IBL algorithms differ from most other supervised
learning methods:
 they don't construct explicit abstractions such as
decision trees or rules
 Most learning algorithms derive generalizations
from instances when they are presented and use
simple matching procedures to classify
subsequently presented instances
Performance Dimensions
 1) Generality: This is the class of concepts which
are describable by the representation and
learnable by the algorithm
 It can be shown that IBL algorithms can pac-learn (Valiant, 1984) any concept whose boundary is a union of a finite number of closed hyper-curves of finite size
 2) Accuracy: This is the concept descriptions'
classification accuracy.
 3) Learning Rate: This is the speed at which
classification accuracy increases during training
 It is a more useful indicator of the performance of the
learning algorithm than is accuracy for finite-sized training
sets
 4) Incorporation Costs: These are incurred while
updating the concept descriptions with a single
training instance
 They include classification costs
 5) Storage Requirement: This is the size of the concept description maintained by the IBL algorithm, i.e., the number of saved instances
THANK YOU
