Artificial Intelligence                                  Amit Purohit

Definition

AI is the science and engineering of making intelligent machines, especially intelligent computer
programs. It is related to the similar task of using computers to understand human intelligence,
but AI does not have to confine itself to methods that are biologically observable.

Intelligence is the computational part of the ability to achieve goals in the world. Varying kinds
and degrees of intelligence occur in people, many animals and some machines.

Objectives

1). To formally define AI.

2). To discuss the characteristic features of AI.

3). To acquaint the student with the essence of AI.

4). To be able to distinguish between human intelligence and AI.

5). To give an overview of the applications where AI technology can be used.

6). To impart knowledge about representation schemes like Production Systems and Problem
Reduction.

Turing Test

Alan Turing's 1950 article Computing Machinery and Intelligence [Tur50] discussed conditions
for considering a machine to be intelligent. He argued that if the machine could successfully
pretend to be human to a knowledgeable observer then you certainly should consider it
intelligent. This test would satisfy most people but not all philosophers. The observer could
interact with the machine and a human by teletype (to avoid requiring that the machine imitate
the appearance or voice of the person); the human would try to persuade the observer that he or
she was human, and the machine would try to fool the observer.
The Turing test is a one-sided test. A machine that passes the test should certainly be considered
intelligent, but a machine could still be considered intelligent without knowing enough about
humans to imitate a human.

Daniel Dennett's book Brainchildren [Den98] has an excellent discussion of the Turing test and
the various partial Turing tests that have been implemented, i.e. with restrictions on the
observer's knowledge of AI and the subject matter of questioning. It turns out that some people
are easily led into believing that a rather dumb program is intelligent.

Background and History

Evidence of Artificial Intelligence folklore can be traced back to ancient Egypt, but with the
development of the electronic computer in 1941, the technology finally became available to
create machine intelligence. The term artificial intelligence was first coined in 1956, at the
Dartmouth conference, and since then Artificial Intelligence has expanded because of the
theories and principles developed by its dedicated researchers. Through its short modern history,
advancement in the field of AI has been slower than first estimated, but progress continues to be
made. Since its birth four decades ago, there have been a variety of AI programs, and they have
impacted other technological advancements.

In 1941 an invention revolutionized every aspect of the storage and processing of information.
That invention, developed in both the US and Germany was the electronic computer. The first
computers required large, separate air-conditioned rooms and were a programmer's nightmare,
involving the separate configuration of thousands of wires just to get a program running.

The 1949 innovation, the stored program computer, made the job of entering a program easier,
and advancements in computer theory led to computer science, and eventually Artificial
intelligence. With the invention of an electronic means of processing data, came a medium that
made AI possible.

Although the computer provided the technology necessary for AI, it was not until the early
1950's that the link between human intelligence and machines was really observed. Norbert
Wiener was one of the first Americans to make observations on the principle of feedback theory.
The most familiar example of feedback theory is the thermostat: it controls the temperature of an
environment by gathering the actual temperature of the house, comparing it to the desired
temperature, and responding by turning the heat up or down. What was so important about his
research into feedback loops was that Wiener theorized that all intelligent behavior was the result
of feedback mechanisms, mechanisms that could possibly be simulated by machines. This
discovery influenced much of the early development of AI.

In late 1955, Newell and Simon developed The Logic Theorist, considered by many to be the
first AI program. The program, representing each problem as a tree model, would attempt to
solve it by selecting the branch that would most likely result in the correct conclusion. The
impact that the logic theorist made on both the public and the field of AI has made it a crucial
stepping stone in developing the AI field.
In 1956 John McCarthy, regarded as the father of AI, organized a conference to draw the talent
and expertise of others interested in machine intelligence for a month of brainstorming. He
invited them to New Hampshire for "The Dartmouth summer research project on artificial
intelligence." From that point on, because of McCarthy, the field would be known as Artificial
intelligence. Although not a huge success, the Dartmouth conference did bring together the
founders of AI and served to lay the groundwork for the future of AI research.

In the seven years after the conference, AI began to pick up momentum. Although the field was
still undefined, ideas formed at the conference were re-examined and built upon. Centers for AI
research began forming at Carnegie Mellon and MIT, and new challenges were faced: first,
creating systems that could efficiently solve problems by limiting the search, as the Logic
Theorist did; and second, making systems that could learn by themselves.

In 1957, the first version of a new program, the General Problem Solver (GPS), was tested. The
program was developed by the same pair who developed the Logic Theorist. The GPS was an
extension of Wiener's feedback principle and was capable of solving a wider range of common
sense problems. A couple of years after the GPS, IBM contracted a team to research artificial
intelligence. Herbert Gelernter spent 3 years working on a program for solving geometry
theorems.

While more programs were being produced, McCarthy was busy developing a major
breakthrough in AI history. In 1958 McCarthy announced his new development: the LISP
language, which is still used today. LISP stands for LISt Processing, and it was soon adopted as
the language of choice among most AI developers.

During the 1970's many new methods in the development of AI were tested, notably Minsky's
frames theory. David Marr proposed new theories about machine vision, for example, how it
would be possible to distinguish an image based on the shading of the image and basic
information on shapes, color, edges, and texture. With analysis of this information, frames of
what an image might be could then be referenced. Another development during this time was the
PROLOG language, proposed in 1972.

During the 1980's AI was moving at a faster pace, and further into the corporate sector. In 1986,
US sales of AI-related hardware and software surged to $425 million. Expert systems were in
particular demand because of their efficiency. Companies such as Digital Equipment Corporation
were using XCON, an expert system designed to configure the large VAX computers. DuPont,
General Motors, and Boeing relied heavily on expert systems. Indeed, to keep up with the
demand for computer experts, companies such as Teknowledge and Intellicorp, specializing in
creating software to aid in producing expert systems, were formed. Other expert systems were
designed to find and correct flaws in existing expert systems.

Overview of AI Application Areas

Game Playing
You can buy machines that can play master level chess for a few hundred dollars. There is some
AI in them, but they play well against people mainly through brute force computation--looking at
hundreds of thousands of positions. To beat a world champion by brute force and known reliable
heuristics requires being able to look at 200 million positions per second.

Speech Recognition

In the 1990s, computer speech recognition reached a practical level for limited purposes. Thus
United Airlines has replaced its keyboard tree for flight information by a system using speech
recognition of flight numbers and city names. It is quite convenient. On the other hand, while
it is possible to instruct some computers using speech, most users have gone back to the
keyboard and the mouse as still more convenient.

Understanding Natural Language

Just getting a sequence of words into a computer is not enough. Parsing sentences is not enough
either. The computer has to be provided with an understanding of the domain the text is about,
and this is presently possible only for very limited domains.

Computer Vision

The world is composed of three-dimensional objects, but the inputs to the human eye and
computers' TV cameras are two dimensional. Some useful programs can work solely in two
dimensions, but full computer vision requires partial three-dimensional information that is not
just a set of two-dimensional views. At present there are only limited ways of representing three-
dimensional information directly, and they are not as good as what humans evidently use.

Expert Systems

A "knowledge engineer" interviews experts in a certain domain and tries to embody their
knowledge in a computer program for carrying out some task. How well this works depends on
whether the intellectual mechanisms required for the task are within the present state of AI.
When this turned out not to be so, there were many disappointing results. One of the first expert
systems was MYCIN in 1974, which diagnosed bacterial infections of the blood and suggested
treatments. It did better than medical students or practicing doctors, provided its limitations were
observed. Namely, its ontology included bacteria, symptoms, and treatments and did not include
patients, doctors, hospitals, death, recovery, and events occurring in time. Its interactions
depended on a single patient being considered. Since the experts consulted by the knowledge
engineers knew about patients, doctors, death, recovery, etc., it is clear that the knowledge
engineers forced what the experts told them into a predetermined framework. In the present state
of AI, this has to be true. The usefulness of current expert systems depends on their users having
common sense.

Heuristic Classification
One of the most feasible kinds of expert system given the present knowledge of AI is to put some
information in one of a fixed set of categories using several sources of information. An example
is advising whether to accept a proposed credit card purchase. Information is available about the
owner of the credit card, his record of payment and also about the item he is buying and about
the establishment from which he is buying it (e.g., about whether there have been previous credit
card frauds at this establishment).

Production System

Production systems are applied to problem solving programs that must perform a wide range of
searches. Production systems are symbolic AI systems. The difference between these two terms is
only one of semantics: a symbolic AI system may not be restricted to the very definition of
production systems, but it cannot be much different either.

Production systems are composed of three parts, a global database, production rules and a control
structure.

The global database is the system's short-term memory. These are collections of facts that are to
be analyzed. A part of the global database represents the current state of the system's
environment. In a game of chess, the current state could represent all the positions of the pieces
for example.

Production rules (or simply productions) are conditional if-then branches. In a production
system, whenever a condition in the system is satisfied, the system is allowed to execute or
perform a specific action, which may be specified under that rule. If the rule is not fulfilled, it
may perform another action. This can be simply paraphrased:

WHEN (condition) IS SATISFIED, PERFORM (action)

A Production System Algorithm

DATA <- initial global database
until DATA satisfies the halting condition do
begin
    select some rule R that can be applied to DATA
    DATA <- result of applying R to DATA
end
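
To make the select-and-apply loop concrete, here is a minimal Python sketch of a production
system. The set-of-facts database, the toy rule, and the halting condition are all invented for
illustration; the sketch only demonstrates the control loop above.

# A minimal production-system sketch. The global database is a set of facts;
# each production rule is a (condition, action) pair of functions.
def run_production_system(database, rules, halted):
    while not halted(database):
        # Select some rule R that can be applied to DATA (here: first match).
        applicable = [rule for rule in rules if rule[0](database)]
        if not applicable:
            raise RuntimeError("no applicable rule, halting condition unmet")
        condition, action = applicable[0]
        database = action(database)        # DATA <- result of applying R
    return database

# Toy rule: WHEN 'raining' and 'outside' are in the database, PERFORM adding 'wet'.
rules = [
    (lambda db: {"raining", "outside"} <= db and "wet" not in db,
     lambda db: db | {"wet"}),
]
print(run_production_system({"raining", "outside"}, rules,
                            lambda db: "wet" in db))
# -> {'raining', 'outside', 'wet'} (set ordering may vary)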

Types of Production System

There are two basic types of production system:

   •   Commutative Production System
   •   Decomposable Production System

Commutative Production System
A production system is commutative if it has the following properties with respect to a database
D:

1. Each member of the set of rules applicable to D is also applicable to any database produced by
applying an applicable rule to D.

2. If the goal condition is satisfied by D, then it is also satisfied by any database produced by
applying any applicable rule to D.

3. The database that results by applying to D any sequence composed of rules that are applicable
to D is invariant under permutations of the sequence.

Decomposable Production System

Initial database can be decomposed or split into separate components that can be processed
independently.



Search Process

Searching is defined as a sequence of steps that transforms the initial state into the goal state.
To carry out a search, the following are needed:

    •   The initial state description of the problem
    •   A set of legal operators that changes the state.
    •   The final or goal state.

The searching process in AI can be classified into two types:

1. Uninformed Search/ Blind Search
2. Heuristic Search/ Informed Search

Uninformed/ Blind Search

An uninformed search algorithm is one that does not have any domain-specific knowledge. It
uses information like the initial state, the final state and a set of legal operators. The search
should proceed in a systematic way by exploring nodes in some predetermined order. It can be
classified into two search techniques:

1. Breadth First search
2. Depth First Search
Depth First Search

Depth first search works by taking a node, checking its neighbors, expanding the first node it
finds among the neighbors, checking if that expanded node is our destination, and, if not,
continuing to explore more nodes.

The above explanation is probably confusing if this is your first exposure to depth first search. I
hope the following demonstration will help more. Using our same search tree, let's find a path
between nodes A and F:

[Figure: the search tree used in this example. A's neighbors are B and C; B's neighbors are D
and E; E's neighbors are F and G.]

Step 0

Let's start with our root/goal node:




We will be using two lists to keep track of what we are doing - an Open list and a Closed List.
An Open list keeps track of what you need to do, and the Closed List keeps track of what you
have already done. Right now, we only have our starting point, node A. We haven't done
anything to it yet, so let's add it to our Open list.

Open List: A
Closed List: <empty>


Step 1
Now, let's explore the neighbors of our A node. To put it another way, let's take the first item
from our Open list and explore its neighbors:




Node A's neighbors are the B and C nodes. Because we are now done with our A node, we can
remove it from our Open list and add it to our Closed List. You aren't done with this step though.
You now have two new nodes B and C that need exploring. Add those two nodes to our Open
list.

Our current Open and Closed Lists contain the following data:

Open List: B, C
Closed List: A


Step 2

Our Open list contains two items. For depth first search and breadth first search, you always
explore the first item from our Open list. The first item in our Open list is the B node. B is not
our destination, so let's explore its neighbors:




Because I have now expanded B, I am going to remove it from the Open list and add it to the
Closed List. Our new nodes are D and E, and we add these nodes to the beginning of our Open
list:

Open List: D, E, C
Closed List: A, B


Step 3

You should start to see a pattern forming. Because D is at the beginning of our Open List, we
expand it. D isn't our destination, and it does not contain any neighbors. All you do in this step is
remove D from our Open List and add it to our Closed List:
Open List: E, C
Closed List: A, B, D


Step 4

We now expand the E node from our Open list. E is not our destination, so we explore its
neighbors and find out that it contains the neighbors F and G. Remember, F is our target, but we
don't stop here. Despite F being on our path, we only end when we are about to expand our
target node, F in this case:




Our Open list will have the E node removed and the F and G nodes added. The removed E node
will be added to our Closed List:

Open List: F, G, C
Closed List: A, B, D, E


Step 5

We now expand the F node. Since it is our intended destination, we stop:




We remove F from our Open list and add it to our Closed List. Since we are at our destination,
there is no need to expand F in order to find its neighbors. Our final Open and Closed Lists
contain the following data:
Open List: G, C
Closed List: A, B, D, E, F

The final path taken by our depth first search method is the final value of our Closed List:
A, B, D, E, F.
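
The walkthrough above translates directly into code. Below is a minimal Python sketch of the
same Open/Closed list procedure; the adjacency dictionary encodes the example tree (C is given
no children, since the trace never expands it).

# Depth first search with explicit Open and Closed lists, following the trace
# above. Newly discovered neighbors go to the FRONT of the Open list.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": ["F", "G"],
        "F": [], "G": []}

def depth_first_search(start, goal):
    open_list = [start]
    closed_list = []
    while open_list:
        node = open_list.pop(0)              # always take the first item
        closed_list.append(node)
        if node == goal:                     # stop when the target is expanded
            return closed_list
        open_list = tree[node] + open_list   # prepend neighbors (depth first)
    return None

print(depth_first_search("A", "F"))          # -> ['A', 'B', 'D', 'E', 'F']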

Breadth First Search

In depth first search, newly explored nodes were added to the beginning of your Open list. In
breadth first search, newly explored nodes are added to the end of your Open list.

For example, here is our original search tree:

The walkthrough mirrors the depth first example, except that newly discovered nodes are
appended to the end of the Open list, so the tree is explored level by level.

Step 0

We start with our root node A on the Open list:

Open List: A
Closed List: <empty>

Step 1

Expand A. Its neighbors B and C are added to the end of the Open list, and A moves to the
Closed List:

Open List: B, C
Closed List: A

Step 2

Expand B. It is not our destination, so its neighbors D and E are appended to the end of the
Open list, behind C:

Open List: C, D, E
Closed List: A, B

Step 3

Expand C. The tree shows no neighbors for C, so nothing is added:

Open List: D, E
Closed List: A, B, C

Step 4

Expand D. It has no neighbors either:

Open List: E
Closed List: A, B, C, D

Step 5

Expand E. Its neighbors F and G are appended. F is our target, but as before we only stop when
we are about to expand it:

Open List: F, G
Closed List: A, B, C, D, E

Step 6

F is now at the front of the Open list. Since it is our intended destination, we stop:

Open List: G
Closed List: A, B, C, D, E, F

The final path taken by our breadth first search method is the final value of our Closed List:
A, B, C, D, E, F. Notice that, unlike depth first search, C is visited before D and E because the
tree is explored level by level.
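
In code, the only change from the depth first sketch is where newly discovered nodes are
inserted; a minimal Python version on the same example tree:

# Breadth first search: identical to the DFS sketch except that neighbors are
# APPENDED to the end of the Open list.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": ["F", "G"],
        "F": [], "G": []}

def breadth_first_search(start, goal):
    open_list = [start]
    closed_list = []
    while open_list:
        node = open_list.pop(0)              # always take the first item
        closed_list.append(node)
        if node == goal:
            return closed_list
        open_list = open_list + tree[node]   # append neighbors (breadth first)
    return None

print(breadth_first_search("A", "F"))        # -> ['A', 'B', 'C', 'D', 'E', 'F']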

Iterative Deepening Depth-First Search

Iterative deepening depth-first search (IDDFS) is a state space search strategy in which a depth-
limited search is run repeatedly, increasing the depth limit with each iteration until it reaches d,
the depth of the shallowest goal state. On each iteration, IDDFS visits the nodes in the search
tree in the same order as depth-first search, but the cumulative order in which nodes are first
visited, assuming no pruning, is effectively breadth-first.

IDDFS combines depth-first search's space-efficiency and breadth-first search's completeness
(when the branching factor is finite). It is optimal when the path cost is a non-decreasing
function of the depth of the node.

The space complexity of IDDFS is O(bd), where b is the branching factor and d is the depth of
the shallowest goal. Since iterative deepening visits states multiple times, it may seem wasteful, but
it turns out to be not so costly, since in a tree most of the nodes are in the bottom level, so it does
not matter much if the upper levels are visited multiple times.

The main advantage of IDDFS in game tree searching is that the earlier searches tend to improve
the commonly used heuristics, such as the killer heuristic and alpha-beta pruning, so that a more
accurate estimate of the score of various nodes at the final depth search can occur, and the search
completes more quickly since it is done in a better order. For example, alpha-beta pruning is
most efficient if it searches the best moves first.

A second advantage is the responsiveness of the algorithm. Because early iterations use small
values for d, they execute extremely quickly. This allows the algorithm to supply early
indications of the result almost immediately, followed by refinements as d increases. When used
in an interactive setting, such as in a chess-playing program, this facility allows the program to
play at any time with the current best move found in the search it has completed so far. This is
not possible with a traditional depth-first search.

The time complexity of IDDFS in well-balanced trees works out to be the same as depth-first
search: O(b^d).

In an iterative deepening search, the nodes on the bottom level are expanded once, those on the
next to bottom level are expanded twice, and so on, up to the root of the search tree, which is
expanded d + 1 times. So the total number of expansions in an iterative deepening search is

N(IDS) = (d + 1) + d*b + (d - 1)*b^2 + ... + 2*b^(d-1) + 1*b^d


All together, an iterative deepening search from depth 1 to depth d expands only about 11%
more nodes than a single breadth-first or depth-limited search to depth d, when b = 10. The
higher the branching factor, the lower the overhead of repeatedly expanded states, but even when
the branching factor is 2, iterative deepening search only takes about twice as long as a complete
breadth-first search. This means that the time complexity of iterative deepening is still O(bd), and
the space complexity is O(bd). In general, iterative deepening is the preferred search method
when there is a large search space and the depth of the solution is not known.
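
A minimal Python sketch of IDDFS, reusing the example tree from the earlier walkthroughs (the
tree and node names are illustrative):

# Iterative deepening DFS: run a depth-limited DFS with limits 0, 1, 2, ...
# until the goal is found.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": ["F", "G"],
        "F": [], "G": []}

def depth_limited_search(node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in tree[node]:
        path = depth_limited_search(child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

def iddfs(start, goal, max_depth=10):
    for limit in range(max_depth + 1):       # increasing depth limit
        path = depth_limited_search(start, goal, limit)
        if path is not None:
            return path
    return None

print(iddfs("A", "F"))                       # -> ['A', 'B', 'E', 'F']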


Informed Search

It is not difficult to see that uninformed search will pursue options that lead away from the goal
as easily as it pursues options that lead towards the goal. For any but the smallest problems this
leads to searches that take unacceptable amounts of time and/or space. Informed search tries to
reduce the amount of search that must be done by making intelligent choices for the nodes that
are selected for expansion. This implies the existence of some way of evaluating the likelihood
that a given node is on the solution path. In general this is done using a heuristic function.

Hill Climbing

Hill climbing is a mathematical optimization technique which belongs to the family of local
search. It is relatively simple to implement, making it a popular first choice. Although more
advanced algorithms may give better results, in some situations hill climbing works just as well.

Hill climbing can be used to solve problems that have many solutions, some of which are better
than others. It starts with a random (potentially poor) solution, and iteratively makes small
changes to the solution, each time improving it a little. When the algorithm cannot see any
improvement anymore, it terminates. Ideally, at that point the current solution is close to optimal,
but it is not guaranteed that hill climbing will ever come close to the optimal solution.

For example, hill climbing can be applied to the traveling salesman problem. It is easy to find a
solution that visits all the cities but will be very poor compared to the optimal solution. The
algorithm starts with such a solution and makes small improvements to it, such as switching the
order in which two cities are visited. Eventually, a much better route is obtained.

Hill climbing is used widely in artificial intelligence, for reaching a goal state from a starting
node. Choice of next node and starting node can be varied to give a list of related algorithms.

Mathematical description

Hill climbing attempts to maximize (or minimize) a function f(x), where x ranges over discrete
states. These states are typically represented by vertices in a graph, where edges encode the
nearness or similarity of states. Hill climbing will follow the graph from vertex to vertex, always
locally increasing (or decreasing) the value of f, until a local maximum (or local minimum) xm is
reached. Hill climbing can also operate on a continuous space: in that case, the algorithm is
called gradient ascent (or gradient descent if the function is minimized).




Variants

In simple hill climbing, the first closer node is chosen, whereas in steepest ascent hill climbing
all successors are compared and the closest to the solution is chosen. Both forms fail if there is
no closer node, which may happen if there are local maxima in the search space which are not
solutions. Steepest ascent hill climbing is similar to best-first search, which tries all possible
extensions of the current path instead of only one.

Stochastic hill climbing does not examine all neighbors before deciding how to move. Rather, it
selects a neighbour at random, and decides (based on the amount of improvement in that
neighbour) whether to move to that neighbour or to examine another.
Random-restart hill climbing is a meta-algorithm built on top of the hill climbing algorithm. It is
also known as Shotgun hill climbing. It iteratively does hill-climbing, each time with a random
initial condition x0. The best xm is kept: if a new run of hill climbing produces a better xm than
the stored state, it replaces the stored state.

Random-restart hill climbing is a surprisingly effective algorithm in many cases. It turns out that
it is often better to spend CPU time exploring the space, than carefully optimizing from an initial
condition.

Local Maxima

A problem with hill climbing is that it will find only local maxima. Unless the heuristic is
convex, it may not reach a global maximum. Other local search algorithms try to overcome this
problem such as stochastic hill climbing, random walks and simulated annealing.




Ridges

A ridge is a curve in the search space that leads to a maximum, but the orientation of the ridge
compared to the available moves that are used to climb is such that each move will lead to a
smaller point. In other words, each point on a ridge looks to the algorithm like a local maximum,
even though the point is part of a curve leading to a better optimum.

Plateau

Another problem with hill climbing is that of a plateau, which occurs when we get to a "flat" part
of the search space, i.e. we have a path where the heuristics are all very close together. This kind
of flatness can cause the algorithm to cease progress and wander aimlessly.

Pseudocode

Hill Climbing Algorithm
currentNode = startNode;
loop do
    L = NEIGHBORS(currentNode);
    nextEval = -INF;
    nextNode = NULL;
    for all x in L
        if (EVAL(x) > nextEval)
            nextNode = x;
            nextEval = EVAL(x);
    if nextEval <= EVAL(currentNode)
        // Return current node since no better neighbors exist
        return currentNode;
    currentNode = nextNode;
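
The same loop in runnable Python, applied to a toy objective (the function and the integer-step
neighborhood are invented for illustration):

# Steepest-ascent hill climbing on a one-dimensional toy objective.
def hill_climb(start, f, neighbors):
    current = start
    while True:
        best = max(neighbors(current), key=f, default=None)
        if best is None or f(best) <= f(current):
            return current                   # no better neighbor: local maximum
        current = best

f = lambda x: -(x - 3) ** 2                  # single global maximum at x = 3
neighbors = lambda x: [x - 1, x + 1]
print(hill_climb(0, f, neighbors))           # -> 3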

Best-First Search

Best-first search is a search algorithm which explores a graph by expanding the most promising
node chosen according to a specified rule.

Judea Pearl described best-first search as estimating the promise of node n by a "heuristic
evaluation function f(n) which, in general, may depend on the description of n, the description of
the goal, the information gathered by the search up to that point, and most important, on any
extra knowledge about the problem domain."

Some authors have used "best-first search" to refer specifically to a search with a heuristic that
attempts to predict how close the end of a path is to a solution, so that paths which are judged to
be closer to a solution are extended first. This specific type of search is called greedy best-first
search.

Efficient selection of the current best candidate for extension is typically implemented using a
priority queue.

Examples of best-first search algorithms include the A* search algorithm, and in turn, Dijkstra's
algorithm (which can be considered a specialization of A*). Best-first algorithms are often used
for path finding in combinatorial search.

Code

OPEN = {initial state}
while OPEN is not empty do
1. Pick the best node on OPEN and remove it from OPEN.
2. If it is the goal, stop. Otherwise create its successors.
3. For each successor do:
a. If it has not been generated before: evaluate it, add it to OPEN, and record its parent.
b. Otherwise: change the parent if this new path is better than the previous one.
done
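
A minimal sketch of greedy best-first search in Python, with the priority queue the text
mentions; the graph and the heuristic values h are invented for illustration:

import heapq

# Greedy best-first search: always expand the node with the smallest heuristic
# estimate h(n) of its distance to the goal.
graph = {"S": ["A", "B"], "A": ["G"], "B": ["G"], "G": []}
h = {"S": 3, "A": 1, "B": 2, "G": 0}         # toy estimates of distance to G

def greedy_best_first(start, goal):
    open_heap = [(h[start], start, [start])] # (priority, node, path)
    visited = set()
    while open_heap:
        _, node, path = heapq.heappop(open_heap)   # best node on OPEN
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for succ in graph[node]:
            if succ not in visited:
                heapq.heappush(open_heap, (h[succ], succ, path + [succ]))
    return None

print(greedy_best_first("S", "G"))           # -> ['S', 'A', 'G']
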
Syntax of Propositional Logic
Logic is used to represent properties of objects in the world about which we are going to reason.
When we say Miss Piggy is plump we are talking about the object Miss Piggy and a property
plump. Similarly when we say Kermit's voice is high-pitched then the object is Kermit's voice
and the property is high-pitched. It is normal to write these in logic as:



plump(misspiggy)

highpitched(voiceof(kermit))

So misspiggy and kermit are constants representing objects in our domain. Notice that plump and
highpitched are different from voiceof:

plump and highpitched represent properties and so are boolean-valued functions. They are
often called predicates or relations.

voiceof is a function that returns an object (not true/false). To help us differentiate we shall use
''of'' at the end of a function name.

The predicates plump and highpitched are unary predicates but of course we can have binary or
n-ary predicates; e.g. loves(misspiggy, voiceof(kermit))

Simple Sentences

The fundamental components of logic are

   •   object constants; e.g. misspiggy, kermit
   •   function constants; e.g. voiceof
   •   predicate constants; e.g. plump, highpitched, loves

Predicate and function constants take arguments which are objects in our domain. Predicate
constants are used to describe relationships concerning the objects and return the value true/false.
Function constants return values that are objects.

More Complex Sentences

We need to apply operators to construct more complex sentences from atoms.

Negation

¬ applied to an atom negates the atom:

¬loves(kermit, voiceof(misspiggy))

''Kermit does not love Miss Piggy's voice''

Conjunction

∧ combines two conjuncts:

loves(misspiggy, kermit) ∧ loves(misspiggy, voiceof(kermit))

''Miss Piggy loves Kermit and Miss Piggy loves Kermit's voice''

Notice it is not correct syntax to write in logic

loves(misspiggy, kermit) ∧ voiceof(kermit)

because we have tried to conjoin a sentence (truth valued) with an object. Logic operators must
apply to truth-valued sentences.

Disjunction

∨ combines two disjuncts:

loves(misspiggy, kermit) ∨ loves(misspiggy, voiceof(kermit))

''Miss Piggy loves Kermit or Miss Piggy loves Kermit's voice''

Implication

→ combines a condition and a conclusion:

loves(misspiggy, voiceof(kermit)) → loves(misspiggy, kermit)

''If Miss Piggy loves Kermit's voice then Miss Piggy loves Kermit''

The language we have described so far contains atoms and the connectives ¬, ∧, ∨ and →.
This defines the syntax of propositional logic. It is normal to represent atoms in propositional
logic as single upper-case letters but here we have used a more meaningful terminology for the
atoms that extends easily to Predicate Logic.

Semantics of Propositional Logic

We have defined the syntax of propositional logic. However, this is of no use without talking
about the meaning, or semantics, of the sentences. Suppose our logic contained only atoms, i.e.
no logical connectives. This logic is very silly because any subset of these atoms is consistent;
e.g. beautiful(misspiggy) and ugly(misspiggy) are consistent because we cannot represent
ugly(misspiggy) → ¬beautiful(misspiggy). So we now need a way in our logic to define which
sentences are true.

Example: Models Define Truth

Suppose a language contains only one object constant misspiggy and two relation constants ugly
and beautiful. The following models define different facts about Miss Piggy.



M=ø: In this model Miss Piggy is neither ugly nor beautiful.
M={ugly(misspiggy)}: In this model Miss Piggy is ugly and not beautiful.
M={beautiful(misspiggy)}: In this model Miss Piggy is beautiful and not ugly.
M={ugly(misspiggy), beautiful(misspiggy)}: In this model Miss Piggy is both ugly and
beautiful. The last statement is intuitively wrong, but the model selected determines the truth of
the atoms in the language.

Compound Sentences

So far we have restricted our attention to the semantics of atoms: an atom is true if it is a member
of the model M; otherwise it is false. Extending the semantics to compound sentences is easy.
Notice that in the definitions below p and q do not need to be atoms because these definitions
work recursively until atoms are reached.

Conjunction

p ∧ q is true in M iff p and q are true in M individually.

So the conjunct

loves(misspiggy, kermit) ∧ loves(misspiggy, voiceof(kermit))

is true only when both

Miss Piggy loves Kermit; and
Miss Piggy loves Kermit's voice

Disjunction

p ∨ q is true in M iff at least one of p or q is true in M.

So the disjunct

loves(misspiggy, kermit) ∨ loves(misspiggy, voiceof(kermit))

is true whenever
Miss Piggy loves Kermit;
Miss Piggy loves Kermit's voice; or
Miss Piggy loves both Kermit and his voice.

Therefore the disjunction is weaker than either disjunct and the conjunction of these disjuncts.

Negation

¬p is true in M iff p is not true in M.

Implication

p → q is true in M iff p is not true in M or q is true in M.

We have been careful about the definition of →. When people use an implication p → q they
normally imply that p causes q. So if p is true we are happy to say that p → q is true iff q is true.
But if p is false the causal link causes confusion because we can't tell whether q should be true or
not. Logic requires that the connectives are truth functional and so the truth of the compound
sentence must be determined from the truth of its component parts. Logic defines that if p is false
then p → q is true regardless of the truth of q.

So both of the following implications are true (provided you believe pigs do not fly!):

fly(pigs) → beautiful(misspiggy)
fly(pigs) → ¬beautiful(misspiggy)

Example: Implications and Models

In which of the following models is

ugly(misspiggy) → ¬beautiful(misspiggy) true?

M=Ø

Miss Piggy is not ugly and so the antecedent fails. Therefore the implication holds. (Miss Piggy
is also not beautiful in this model.)

M={beautiful(misspiggy)}

Again, Miss Piggy is not ugly and so the implication holds.

M={ugly(misspiggy)}

Miss Piggy is not beautiful, so the conclusion ¬beautiful(misspiggy) holds and hence the
implication holds.

M={ugly(misspiggy), beautiful(misspiggy)}
Miss Piggy is ugly and so the antecedent holds. But she is also beautiful and so
¬beautiful(misspiggy) is not true. Therefore the conclusion does not hold and so the implication
fails in this (and only this) case.

Truth Tables

Truth tables are often used to calculate the truth of complex propositional sentences. A truth
table represents all possible combinations of truths of the atoms and so contains all possible
models. A column is created for each of the atoms in the sentence, and all combinations of truth
values for these atoms are assigned one per row. So if there are n atoms then there are n initial
columns and 2^n rows. The final column contains the truth of the sentence for each combination
of truths for the atoms. Intervening columns can be added to store intermediate truth
calculations. Below is a sample truth table for the connectives:

p     q     ¬p    p ∧ q    p ∨ q    p → q
-----------------------------------------
T     T     F       T        T        T
T     F     F       F        T        F
F     T     T       F        T        T
F     F     T       F        F        T


Equivalence

Two sentences are equivalent if they hold in exactly the same models.

Therefore we can determine equivalence by drawing truth tables that represent the sentences in
the various models. If the initial and final columns of the truth tables are identical then the
sentences are equivalent. Examples of equivalences include:

¬¬p ≡ p
p → q ≡ ¬p ∨ q
¬(p ∧ q) ≡ ¬p ∨ ¬q
¬(p ∨ q) ≡ ¬p ∧ ¬q

Unlike ∧ and ∨, → is not commutative:

loves(misspiggy, voiceof(kermit)) → loves(misspiggy, kermit)

is very different from

loves(misspiggy, kermit) → loves(misspiggy, voiceof(kermit))

Similarly, → is not associative.
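
Because equivalence is just agreement in every model, it can be checked mechanically by
enumerating the 2^n rows of the truth table. A small Python sketch (the helper names are
illustrative):

from itertools import product

# Two sentences are equivalent iff they agree in all 2^n models. Sentences are
# Python functions from a model (dict of atom truth values) to True/False.
def equivalent(atoms, s1, s2):
    for values in product([True, False], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if s1(model) != s2(model):
            return False
    return True

implies = lambda p, q: (not p) or q          # truth-functional definition

# p -> q is equivalent to (not p) or q, but not to q -> p:
print(equivalent(["p", "q"],
                 lambda m: implies(m["p"], m["q"]),
                 lambda m: (not m["p"]) or m["q"]))    # True
print(equivalent(["p", "q"],
                 lambda m: implies(m["p"], m["q"]),
                 lambda m: implies(m["q"], m["p"])))   # False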

Syntax & Semantics for Predicate Logic

Syntax of Predicate Logic

Propositional logic is fairly powerful but we must add variables and quantification to be able to
reason about objects in atoms and express properties of a set of objects without listing the atom
corresponding to each object.

We shall adopt the Prolog convention that variables have an initial capital letter. (This is contrary
to many Mathematical Logic books where variables are lower case and constants have an initial
capital.)

When we include variables we must specify their scope or quantification. The first quantifier we
want is the universal quantifier ∀ (for all).

                                         ∀X.loves(misspiggy, X)

This allows X to range over all the objects and asserts that Miss Piggy loves each of them. We
have introduced one variable but any number is allowed:

                                              ∀X∀Y.loves(X, Y)

Each of the objects loves all of the objects, even itself! Therefore ∀XY. is the same as ∀X.∀Y.
Quantifiers, like connectives, act on sentences. So if Miss Piggy loves all cute things (not just
Kermit!) we would write

                                    ∀C.[cute(C) → loves(misspiggy, C)]

rather than

                                     loves(misspiggy,   ∀C.cute(C))

because the second argument to loves must be an object, not a sentence.
When the world contains a finite set of objects then a universally quantified sentence can be
converted into a sentence without the quantifier; e.g. ∀X.loves(misspiggy, X) becomes

loves(misspiggy, misspiggy) ∧ loves(misspiggy, kermit) ∧
loves(misspiggy, animal) ∧ ...

Contrast this with the infinite set of positive integers and the sentence

∀N.[odd(N) ∨ even(N)]

The other quantifier is the existential quantifier ∃ (there exists).

∃X.loves(misspiggy, X)

This allows X to range over all the objects and asserts that Miss Piggy loves (at least) one of
them. Similarly

∃X∃Y.loves(X, Y)

asserts that there is at least one loving couple (or self-loving object).

We shall be using First Order Predicate Logic where quantified variables range over object
constants only. We are defining Second Order Predicate Logic if we allow quantified variables to
range over functions or predicates as well; e.g.

∃X.loves(misspiggy, X(kermit)) includes loves(misspiggy, voiceof(kermit))

∃X.X(misspiggy, kermit) (there exists some relationship linking Miss Piggy and Kermit!)

Semantics of First Order Predicate Logic

Now we must deal with quantification.

∀: ∀X.p(X) holds in a model iff p(z) holds for all objects z in our domain.

∃: ∃X.p(X) holds in a model iff there is some object z in our domain so that p(z) holds.
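
Over a finite domain these two clauses reduce directly to Python's all() and any(); a tiny sketch
with an invented domain and relation:

# Quantifier semantics over a finite domain: forall is all(), exists is any().
domain = ["misspiggy", "kermit", "animal"]
loves = {("misspiggy", "kermit"), ("misspiggy", "misspiggy")}

# forall X. loves(misspiggy, X) -- false: misspiggy does not love animal
print(all(("misspiggy", x) in loves for x in domain))   # False

# exists X. loves(misspiggy, X) -- true: she loves kermit
print(any(("misspiggy", x) in loves for x in domain))   # True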

Example: Available Objects affects Quantification

If misspiggy is the only object in our domain then

ugly(misspiggy) → ¬beautiful(misspiggy) is equivalent to
∀X.[ugly(X) → ¬beautiful(X)]

If there were other objects then there would be more atoms and so the set of models would be
larger; e.g. with objects misspiggy and kermit the possible models are all combinations of the
atoms ugly(misspiggy), beautiful(misspiggy), ugly(kermit), beautiful(kermit). Now the 2
sentences are no longer equivalent.

1) In every model in which ∀X.[ugly(X) → ¬beautiful(X)] holds,
ugly(misspiggy) → ¬beautiful(misspiggy) also holds.

2) There are models in which ugly(misspiggy) → ¬beautiful(misspiggy) holds,
but ∀X.[ugly(X) → ¬beautiful(X)] does not hold; e.g.

M = {ugly(kermit), beautiful(kermit)}.

What about M = {ugly(misspiggy), beautiful(misspiggy)}?

Clausal Form for Predicate Calculus

In order to prove a formula in the predicate calculus by resolution, we:

1. Negate the formula.

2. Put the negated formula into CNF, by doing the following:

i. Get rid of all → operators.

ii. Push the ¬ operators in as far as possible.

iii. Rename variables as necessary (see the step below).

iv. Move all of the quantifiers to the left (the outside) of the expression using the following rules
(where Q is either ∀ or ∃ and G is a formula that does not contain x):

(Qx.F(x)) ∧ G ≡ Qx.(F(x) ∧ G)
(Qx.F(x)) ∨ G ≡ Qx.(F(x) ∨ G)

This leaves the formula in what is called prenex form, which consists of a series of quantifiers
followed by a quantifier-free formula, called the matrix.

v. Remove all quantifiers from the formula. First we remove the existentially quantified variables
by using Skolemization. Each existentially quantified variable, say x, is replaced by a function
term which begins with a new, n-ary function symbol, say f, where n is the number of universally
quantified variables that occur before x is quantified in the formula. The arguments to the
function term are precisely these variables. For example, if we have a formula of the form

∀x.∀y.∃z. P(x, y, z)

then z would be replaced by a function term f(x,y) where f is a new function symbol. The result
is:

∀x.∀y. P(x, y, f(x, y))

This new formula is satisfiable if and only if the original formula is satisfiable.

The new function symbol is called a Skolem function. If the existentially quantified variable has
no preceding universally quantified variables, then the function is a 0-ary function and is often
called a Skolem constant.

After removing all existential quantifiers, we simply drop all the universal quantifiers as we
assume that any variable appearing in a formula is universally quantified.

vi. The remaining formula (the matrix) is put in CNF by moving any ∧ operators outside of any
∨ operations.

3. Finally, the CNF formula is written in clausal format by writing each conjunct as a set of
literals (a clause), and the whole formula as a set of clauses (the clause set).
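
For the purely propositional parts of this pipeline (eliminating →, pushing ¬ inward, and
distributing to reach CNF), a quick sketch using sympy's logic module, assuming sympy is
available; the input formula is an arbitrary example:

# Propositional CNF conversion with sympy. Predicate-level Skolemization is not
# handled by this helper; the formula below is an arbitrary example.
from sympy.abc import a, b, c
from sympy.logic.boolalg import Implies, to_cnf

formula = Implies(a, b & c)          # a -> (b & c)
print(to_cnf(~formula))              # negated formula in CNF: a & (~b | ~c)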

For example, if we begin with the proposition
we have:

1. Negate the theorem:



i. Push the ¬ operators in. No change.

ii. Rename variables if necessary:



iii. Move the quantifiers to the outside: First, we have

                                     Then we get

iv. Remove the quantifiers, first by Skolemizing the existentially quantified variables. As these
have no universally quantified variables to their left, they are replaced by Skolem constants:



Drop the universal quantifiers:



v. Put the matrix into CNF. No change.

2. Write the formula in clausal form:



Inference Rules

Complex deductive arguments can be judged valid or invalid based on whether or not the steps in
that argument follow the nine basic rules of inference. These rules of inference are all relatively
simple, although when presented in formal terms they can look overly complex.

Conjunction:

1. P
2. Q
3. Therefore, P and Q.

1. It is raining in New York.
2. It is raining in Boston.
3. Therefore, it is raining in both New York and Boston.
Simplification

1. P and Q.
2. Therefore, P.

1. It is raining in both New York and Boston.
2. Therefore, it is raining in New York.

Addition

1. P
2. Therefore, P or Q.

1. It is raining.
2. Therefore, either it is raining or the sun is shining.

Absorption

1. If P, then Q.
2. Therefore, if P then P and Q.

1. If it is raining, then I will get wet.
2. Therefore, if it is raining, then it is raining and I will get wet.

Modus Ponens

1. If P then Q.
2. P.
3. Therefore, Q.

1. If it is raining, then I will get wet.
2. It is raining.
3. Therefore, I will get wet.

Modus Tollens

1. If P then Q.
2. Not Q. (~Q).
3. Therefore, not P (~P).

1. If it had rained this morning, I would have gotten wet.
2. I did not get wet.
3. Therefore, it did not rain this morning.

Hypothetical Syllogism
1. If P then Q.
2. If Q then R.
3. Therefore, if P then R.

1. If it rains, then I will get wet.
2. If I get wet, then my shirt will be ruined.
3. Therefore, if it rains, then my shirt will be ruined.

Disjunctive Syllogism

1. Either P or Q.
2. Not P (~P).
3. Therefore, Q.

1. Either it rained or I took a cab to the movies.
2. It did not rain.
3. Therefore, I took a cab to the movies.

Constructive Dilemma

1. (If P then Q) and (If R then S).
2. P or R.
3. Therefore, Q or S.

1. If it rains, then I will get wet and if it is sunny, then I will be dry.
2. Either it will rain or it will be sunny.
3. Therefore, either I will get wet or I will be dry.

The above rules of inference, when combined with the rules of replacement, mean that
propositional calculus is "complete." Propositional calculus is the branch of formal logic that
deals with propositions and the connectives between them.

Resolution

Resolution is a rule of inference leading to a refutation theorem-proving technique for sentences
in propositional logic and first-order logic. In other words, iteratively applying the resolution rule
in a suitable way allows for telling whether a propositional formula is satisfiable and for proving
that a first-order formula is unsatisfiable; this method may prove the satisfiability of a first-order
satisfiable formula, but not always, as is the case for all methods for first-order logic. Resolution
was introduced by John Alan Robinson in 1965.

Resolution in propositional logic

The resolution rule in propositional logic is a single valid inference rule that produces a new
clause implied by two clauses containing complementary literals. A literal is a propositional
variable or the negation of a propositional variable. Two literals are said to be complements if
one is the negation of the other (in the following, ai is taken to be the complement to bj). The
resulting clause contains all the literals that do not have complements. Formally:

a1 ∨ ... ∨ ai ∨ ... ∨ an,    b1 ∨ ... ∨ bj ∨ ... ∨ bm
---------------------------------------------------------------------
a1 ∨ ... ∨ ai-1 ∨ ai+1 ∨ ... ∨ an ∨ b1 ∨ ... ∨ bj-1 ∨ bj+1 ∨ ... ∨ bm

where

all as and bs are literals,
ai is the complement to bj, and
the dividing line stands for entails

The clause produced by the resolution rule is called the resolvent of the two input clauses.

When the two clauses contain more than one pair of complementary literals, the resolution rule
can be applied (independently) for each such pair. However, only the pair of literals that are
resolved upon can be removed: all other pairs of literals remain in the resolvent clause.

A resolution technique

When coupled with a complete search algorithm, the resolution rule yields a sound and complete
algorithm for deciding the satisfiability of a propositional formula, and, by extension, the validity
of a sentence under a set of axioms.

This resolution technique uses proof by contradiction and is based on the fact that any sentence
in propositional logic can be transformed into an equivalent sentence in conjunctive normal
form. The steps are as follows:

1).All sentences in the knowledge base and the negation of the sentence to be proved (the
conjecture) are conjunctively connected.

2).The resulting sentence is transformed into a conjunctive normal form with the conjuncts
viewed as elements in a set, S, of clauses.

For example, a formula such as

(a ∨ b) ∧ (¬a ∨ c)

would give rise to a set

S = { {a, b}, {¬a, c} }

3).The resolution rule is applied to all possible pairs of clauses that contain complementary
literals. After each application of the resolution rule, the resulting sentence is simplified by
removing repeated literals. If the sentence contains complementary literals, it is discarded (as a
tautology). If not, and if it is not yet present in the clause set S, it is added to S, and is considered
for further resolution inferences.

4).If after applying a resolution rule the empty clause is derived, the complete formula is
unsatisfiable (or contradictory), and hence it can be concluded that the initial conjecture follows
from the axioms.

5).If, on the other hand, the empty clause cannot be derived, and the resolution rule cannot be
applied to derive any more new clauses, the conjecture is not a theorem of the original
knowledge base.

One instance of this algorithm is the original Davis–Putnam algorithm that was later refined into
the DPLL algorithm that removed the need for explicit representation of the resolvents.

This description of the resolution technique uses a set S as the underlying data-structure to
represent resolution derivations. Lists, Trees and Directed Acyclic Graphs are other possible and
common alternatives. Tree representations are more faithful to the fact that the resolution rule is
binary. Together with a sequent notation for clauses, a tree representation also makes it easy to
see how the resolution rule is related to a special case of the cut-rule, restricted to atomic cut-
formulas. However, tree representations are not as compact as set or list representations, because
they explicitly show redundant subderivations of clauses that are used more than once in the
derivation of the empty clause. Graph representations can be as compact in the number of clauses
as list representations and they also store structural information regarding which clauses were
resolved to derive each resolvent.

Example

a ∨ b,    ¬a ∨ c
--------------------
b ∨ c

In English: if a or b is true, and a is false or c is true, then either b or c is true.

If a is true, then for the second premise to hold, c must be true. If a is false, then for the first
premise to hold, b must be true.

So regardless of a, if both premises hold, then b or c is true.
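
The saturation procedure described above fits in a few lines of Python. In this sketch a clause is
a frozenset of string literals, with a leading '~' marking negation; all names are illustrative:

# Propositional resolution by saturation: the clause set is unsatisfiable iff
# the empty clause can be derived.
def complement(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    # All resolvents of two clauses, one per complementary literal pair.
    return [(c1 - {lit}) | (c2 - {complement(lit)})
            for lit in c1 if complement(lit) in c2]

def unsatisfiable(clauses):
    clauses = set(clauses)
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                if c1 == c2:
                    continue
                for r in resolve(c1, c2):
                    if not r:
                        return True          # empty clause derived
                    if not any(complement(l) in r for l in r):
                        new.add(r)           # keep unless it is a tautology
        if new <= clauses:
            return False                     # saturated: nothing new to add
        clauses |= new

# Prove that (a or b) and (~a or c) entail (b or c): add the negated
# conjecture as the unit clauses ~b and ~c, then look for a contradiction.
kb = [frozenset({"a", "b"}), frozenset({"~a", "c"}),
      frozenset({"~b"}), frozenset({"~c"})]
print(unsatisfiable(kb))                     # -> True, so b or c follows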

Unification

We also need some way of binding variables to values in a consistent way so that components of
sentences can be matched. This is the process of Unification.
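
A minimal unification sketch in Python, following the Prolog convention used earlier in these
notes (variables begin with a capital letter). Compound terms are tuples such as
('loves', 'misspiggy', 'X'); the occurs check is omitted for brevity:

# Unification of first-order terms. A term is a string (constant or variable)
# or a tuple (functor, arg1, ..., argn).
def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def walk(term, subst):
    while is_var(term) and term in subst:    # follow existing bindings
        term = subst[term]
    return term

def unify(x, y, subst=None):
    # Return a substitution that makes x and y equal, or None if impossible.
    if subst is None:
        subst = {}
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return {**subst, x: y}
    if is_var(y):
        return {**subst, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None

print(unify(("loves", "misspiggy", "X"), ("loves", "Y", ("voiceof", "kermit"))))
# -> {'Y': 'misspiggy', 'X': ('voiceof', 'kermit')}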

Knowledge Representation
Network Representations

Networks are often used in artificial intelligence as schemes for representation. One of the
advantages of using a network representation is that theorists in computer science have studied
such structures in detail and there are a number of efficient and robust algorithms that may be
used to manipulate the representations.

Trees and Graphs

A tree is a collection of nodes in which each node may be expanded into one or more unique
subnodes until termination occurs. If there is no termination, an infinite tree results. A graph is
simply a tree in which non-unique nodes may be generated; in other words, a tree is a graph
with no loops. The representation of the nodes and links is arbitrary. In a computer chess player,
for example, nodes might represent individual board positions and the links from each node the
legal moves from that position. This is a specific instance of a problem space. In general,
problem spaces are graphs in which the nodes represent states and the connections between
states are represented by operators that make the state transformations.

IS-A Links and Semantic Networks

In constructing concept hierarchies, often the most important means of showing inclusion in a set
is to use what is called an IS-A link, in which X is a member in some more general set Y. For
example, a DOG ISA MAMMAL. As one travels up the link, the more general concept is
defined. This is generally the simplest type of link between concepts in concept or semantic
hierarchies. The combination of instances and classes connected by ISA links in a graph or tree
is generally known as a semantic network. Semantic networks are useful, in part, because they
provide a natural structure for inheritance. For instance, if a DOG ISA MAMMAL then those
properties that are true for MAMMALs and DOGs need not be specified for the DOG; instead
they may be derived via an inheritance procedure. This greatly reduces the amount of
information that must be stored explicitly although there is an increase in the time required to
access knowledge through the inheritance mechanism. Frames are a special type of semantic
network representation.

Associative Network

A means of representing relational knowledge as a labeled directed graph. Each vertex of the
graph represents a concept and each label represents a relation between concepts. Access and
updating procedures traverse and manipulate the graph. A semantic network is sometimes
regarded as a graphical notation for logical formulas.


Conceptual Graphs

A conceptual graph (CG) is a graph representation for logic based on the semantic networks of
artificial intelligence.
A conceptual graph consists of concept nodes and relation nodes.

   •   The concept nodes represent entities, attributes, states, and events
   •   The relation nodes show how the concepts are interconnected

Conceptual Graphs are finite, connected, bipartite graphs.

Finite: because any graph (in 'human brain' or 'computer storage') can only have a finite number
of concepts and conceptual relations.

Connected: because two parts that are not connected would simply be called two conceptual
graphs.

Bipartite: because there are two different kinds of nodes: concepts and conceptual relations, and
every arc links a node of one kind to a node of another kind

Example

The following is the CG display form for ''John is going to Boston by bus'':

[Figure: a conceptual graph with concept nodes [Person: John], [Go], [City: Boston] and [Bus],
linked by relation nodes (Agnt), (Dest) and (Inst).]

The conceptual graph in the figure represents a typed or sorted version of logic. Each of the four
concepts has a type label, which represents the type of entity the concept refers to: Person, Go,
City, or Bus. Two of the concepts have names, which identify the referent: John or Boston.
Each of the three conceptual relations has a type label that represents the type of relation: agent
(Agnt), destination (Dest), or instrument (Inst). The CG as a whole indicates that the person John
is the agent of some instance of going, the city Boston is the destination, and a bus is the
instrument. The figure can be translated to the following formula:

(∃x)(∃y)(Go(x) ∧ Person(John) ∧ City(Boston) ∧ Bus(y)
          ∧ Agnt(x, John) ∧ Dest(x, Boston) ∧ Inst(x, y))

As this translation shows, the only logical operators used in the figure are conjunction and the
existential quantifier. Those two operators are the most common in translations from natural
languages, and many of the early semantic networks could not represent any others.
Structured Representation

Structured representation can be done in various ways, such as:

   •   Frames
   •   Scripts

Frames

A frame is a method of representation in which a particular class is defined by a number of
attributes (or slots) with certain values (the attributes are filled in for each instance). Thus,
frames are also known as slot-and-filler structures. Frame systems are also somewhat equivalent
to semantic networks although frames are usually associated with more defined structure than the
networks.

Like a semantic network, one of the chief properties of frames is that they provide a natural
structure for inheritance. ISA-Links connect classes to larger parent classes and properties of the
subclasses may be determined at both the level of the class itself and from parent classes.

This leads into the idea of defaults. Frames may indicate specific values for some attributes or
instead indicate a default. This is especially useful when values are not always known but can
generally be assumed to be true for most of the class. For example, the class BIRD may have a
default value of FLIES set to TRUE even though instances below it (say, for example, an
OSTRICH) have FLIES values of FALSE.

In addition, the value of a particular attribute need not necessarily be filled in directly but may
instead indicate a procedure to run to obtain the value. This is known as an attached procedure.
Attached procedures are especially useful when there is a high cost associated with computing a
particular value, when the value changes with time, or when the expected access frequency is
low. Instead of computing the value for each instance, the values are computed only when
needed. However, this computation is run during execution (rather than during the establishment
of the frame network) and may be costly.
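
The slot-and-filler idea is easy to show in code. Below is a minimal Python sketch (the Frame
class and the BIRD/OSTRICH example are illustrative, not any particular frame system)
demonstrating ISA inheritance, default values, and an attached procedure that runs only when
its slot is accessed:

class Frame:
    def __init__(self, name, isa=None, **slots):
        self.name, self.isa, self.slots = name, isa, slots

    def get(self, slot):
        if slot in self.slots:
            value = self.slots[slot]
            # Attached procedure: compute the value only when needed.
            return value() if callable(value) else value
        if self.isa is not None:          # inherit through the ISA link
            return self.isa.get(slot)
        raise KeyError(slot)

bird = Frame("BIRD", flies=True, legs=2)           # defaults for the class
ostrich = Frame("OSTRICH", isa=bird, flies=False)  # overrides the default
tweety = Frame("TWEETY", isa=bird, age=lambda: 3)  # attached procedure

print(tweety.get("flies"))    # True: inherited default from BIRD
print(ostrich.get("flies"))   # False: local value overrides the default
print(tweety.get("age"))      # 3: computed by the attached procedure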

Scripts

A script is a remembered precedent, consisting of tightly coupled, expectation-suggesting
primitive-action and state-change frames.

A script is a structured representation describing a stereotyped sequence of events in a particular
context. That is, extend frames by explicitly representing expectations of actions and state
changes.

Why represent knowledge in this way?

1) Because real-world events do follow stereotyped patterns. Human beings use previous
experiences to understand verbal accounts; computers can use scripts instead.
2) Because people, when relating events, do leave large amounts of assumed detail out of their
accounts. People don't find it easy to converse with a system that can't fill in missing
conversational detail.
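
As a concrete illustration, the sketch below represents a made-up restaurant script in Python
(the slot names and the assumed_between helper are illustrative). The helper shows how a script
lets a system fill in events that a narrator left out of an account:

restaurant_script = {
    "name": "restaurant",
    "roles": ["customer", "waiter", "cook"],
    "props": ["table", "menu", "food", "bill"],
    "scenes": [            # the stereotyped sequence of events
        "enter", "be_seated", "order", "eat", "pay", "leave",
    ],
}

def assumed_between(script, first, last):
    # Anything between two mentioned scenes is assumed to have happened.
    scenes = script["scenes"]
    return scenes[scenes.index(first) + 1 : scenes.index(last)]

print(assumed_between(restaurant_script, "enter", "pay"))
# -> ['be_seated', 'order', 'eat']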

Min-Max Algorithm

There are plenty of applications for AI, but games are the most interesting to the public.
Nowadays every major OS comes with some games. So it is no surprise that there are some
algorithms that were devised with games in mind.

The Min-Max algorithm is applied in two-player games, such as tic-tac-toe, checkers, chess, go,
and so on. All these games have at least one thing in common: they are logic games. This means
that they can be described by a set of rules and premises. With them, it is possible to know, from
a given point in the game, what the next available moves are. They also share another
characteristic: they are 'full information games'. Each player knows everything about the
possible moves of the adversary.




[Figure: example search tree (graphic not reproduced)]

Before explaining the algorithm, a brief introduction to search trees is required. Search trees are
a way to represent searches. The squares are known as nodes and they represent decision points
in the search. The nodes are connected with branches. The search starts at the root node, the one
at the top of the figure. At each decision point, nodes for the available search paths are
generated, until no more decisions are possible. The nodes that represent the end of the search
are known as leaf nodes.

There are two players involved, MAX and MIN. A search tree is generated, depth-first, starting
with the current game position up to the end game position. Then, the final game positions are
evaluated from MAX's point of view, as shown in Figure 1. Afterwards, the inner node values of
the tree are filled bottom-up with the evaluated values. The nodes that belong to the MAX player
receive the maximum value of their children. The nodes for the MIN player will select the
minimum value of their children.

MinMax (GamePosition game) {
    return MaxMove(game);
}

MaxMove (GamePosition game) {
    if (GameEnded(game)) {
        return EvalGameState(game);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach move in moves {
            // MIN replies to each of MAX's candidate moves
            candidate <- MinMove(ApplyMove(game, move));
            if (Value(candidate) > Value(best_move)) {
                best_move <- candidate;
            }
        }
        return best_move;
    }
}

MinMove (GamePosition game) {
    if (GameEnded(game)) {
        return EvalGameState(game);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach move in moves {
            candidate <- MaxMove(ApplyMove(game, move));
            // MIN selects the move with the LOWEST value
            if (Value(candidate) < Value(best_move)) {
                best_move <- candidate;
            }
        }
        return best_move;
    }
}

So what is happening here? The values represent how good a game move is. So the MAX player
will try to select the move with the highest value in the end. But the MIN player also has
something to say about it, and he will try to select the moves that are better for him, thus
minimizing MAX's outcome.
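
To make this concrete, here is a minimal runnable Python sketch (an illustration, separate from
the pseudocode above). The game tree is supplied explicitly as nested lists whose integer leaves
are the evaluated final positions, so no move generation is needed:

def minimax(node, maximizing):
    # Integer leaves carry the evaluation of a final game position.
    if isinstance(node, int):
        return node
    # The players alternate at each level of the tree.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX chooses at the root; MIN replies one level below.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))   # -> 3: the best value MAX can guarantee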

Optimisation

However, only very simple games can have their entire search tree generated in a short time. For
most games this isn't possible; the universe would probably vanish first. So there are a few
optimizations to add to the algorithm.

First, a word of caution: optimization comes with a price. When optimizing, we are trading the
full information about the game's events for probabilities and shortcuts. Instead of knowing the
full path that leads to victory, the decisions are made with the path that might lead to victory. If
the optimization isn't well chosen, or is badly applied, we could end up with a dumb AI, and it
would have been better to use random moves.
One basic optimization is to limit the depth of the search tree. Why does this help? Generating
the full tree could take ages. If a game has a branching factor of 3, which means that each node
has three children, the tree will have the following number of nodes per depth:

Depth: 0  1  2  3   4   5    ...  n
Nodes: 1  3  9  27  81  243  ...  3^n

The sequence shows that at depth n the tree will have 3^n nodes. To know the total number of
generated nodes, we need to sum the node count at each level, so the total number of nodes for a
tree of depth n is sum(i = 0..n, 3^i). For many games, like chess, that have a very big branching
factor, this means that the tree might not fit into memory. Even if it did, it would take too long to
generate. If each node took 1 s to be analyzed, each search tree would take sum(i = 0..n, 3^i)
seconds. For a search tree of depth 5, that means 1 + 3 + 9 + 27 + 81 + 243 = 364 s, about 6
minutes! This is too long for a game. The player would give up playing the game if he had to
wait 6 minutes for each move from the computer.

The second optimization is to use a function that evaluates the current game position from the
point of view of some player. It does this by giving a value to the current state of the game, for
example by counting the number of pieces on the board, or the number of moves left to the end
of the game, or anything else that we might use to give a value to the game position.

Instead of evaluating the current game position, the function might calculate how the current
game position might help in ending the game. In other words, how probable is it that, given the
current game position, we will win the game? In this case the function is known as an estimation
function.

This function will have to take into account some heuristics. Heuristics are knowledge that we
have about the game, and they can help generate better evaluation functions. For example, in
checkers, pieces at corner and side positions can't be captured. So we can create an evaluation
function that gives higher values to pieces that lie on those board positions, thus giving higher
outcomes for game moves that place pieces in those positions.
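
A minimal sketch of such an evaluation function in Python, assuming a made-up board
representation (a dict mapping (row, col) squares to the occupying player) and using the
edge-safety heuristic just described:

def evaluate(board, player):
    # Edge squares can't be captured, so pieces there count double.
    score = 0
    for (row, col), owner in board.items():
        weight = 2 if row in (0, 7) or col in (0, 7) else 1
        score += weight if owner == player else -weight
    return score

board = {(0, 0): "MAX", (3, 3): "MIN", (7, 2): "MAX"}
print(evaluate(board, "MAX"))   # 2 - 1 + 2 = 3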

One of the reasons that the evaluation function must be able to evaluate game positions for both
players is that you don't know to which player the depth limit belongs.

However, having two functions can be avoided if the game is symmetric. This means that the
loss of one player equals the gain of the other. Such games are also known as ZERO-SUM
games. For these games one evaluation function is enough; one of the players just has to negate
the return value of the function.

The revised algorithm is:

MinMax (GamePosition game) {
    return MaxMove(game);
}

MaxMove (GamePosition game) {
    if (GameEnded(game) || DepthLimitReached()) {
        return EvalGameState(game, MAX);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach move in moves {
            candidate <- MinMove(ApplyMove(game, move));
            if (Value(candidate) > Value(best_move)) {
                best_move <- candidate;
            }
        }
        return best_move;
    }
}

MinMove (GamePosition game) {
    if (GameEnded(game) || DepthLimitReached()) {
        return EvalGameState(game, MIN);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach move in moves {
            candidate <- MaxMove(ApplyMove(game, move));
            // MIN selects the move with the LOWEST value
            if (Value(candidate) < Value(best_move)) {
                best_move <- candidate;
            }
        }
        return best_move;
    }
}

Even so, the algorithm has a few flaws; some of them can be fixed, while others can only be
solved by choosing another algorithm.

One of the flaws is that if the game is too complex, the answer will always take too long, even
with a depth limit. One solution is to limit the time for the search: if the time runs out, choose
the best move found so far.
A big flaw is the limited horizon problem. A game position that appears to be very good might
turn out very bad. This happens because the algorithm wasn’t able to see that a few game moves
ahead the adversary will be able to make a move that will bring him a great outcome. The
algorithm missed that fatal move because it was blinded by the depth limit.

Speeding the Algorithm

There are a few things that can still be done to reduce the search time. Take a look at Figure 2.
The value for node A is 3, and the first value found for the subtree starting at node B is 2. Since
the B node is at a MIN level, we know that the selected value for the B node must be less than or
equal to 2. But we also know that the A node has the value 3, and both A and B nodes share the
same parent at a MAX level. This means that the game path starting at the B node wouldn't be
selected, because 3 is better than 2 for the MAX node. So it isn't worth pursuing the search for
children of the B node, and we can safely ignore all the remaining children.




[Figure 2: a cutoff in the search tree (graphic not reproduced)]

This all means that sometimes the search can be aborted because we find out that the search
subtree won't lead us to any viable answer.

This optimization is known as alpha-beta cutoffs, and the algorithm is as follows:

1. Have two values passed around the tree nodes:
   i) the alpha value, which holds the best MAX value found so far;
   ii) the beta value, which holds the best MIN value found so far.

2. At a MAX level, before evaluating each child path, compare the value returned by the
previous path with the beta value. If the value is greater than beta, abort the search for the
current node.

3. At a MIN level, before evaluating each child path, compare the value returned by the
previous path with the alpha value. If the value is less than alpha, abort the search for the
current node.

Full pseudocode for MinMax with alpha-beta cutoffs:

MinMax (GamePosition game) {
    return MaxMove(game, -INFINITY, +INFINITY);
}

MaxMove (GamePosition game, Integer alpha, Integer beta) {
    if (GameEnded(game) || DepthLimitReached()) {
        return EvalGameState(game, MAX);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach move in moves {
            candidate <- MinMove(ApplyMove(game, move), alpha, beta);
            if (Value(candidate) > Value(best_move)) {
                best_move <- candidate;
                alpha <- Max(alpha, Value(candidate));
            }

            // Ignore remaining moves: MIN already has a better option elsewhere
            if (alpha >= beta)
                return best_move;
        }
        return best_move;
    }
}

MinMove (GamePosition game, Integer alpha, Integer beta) {
    if (GameEnded(game) || DepthLimitReached()) {
        return EvalGameState(game, MIN);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach move in moves {
            candidate <- MaxMove(ApplyMove(game, move), alpha, beta);
            if (Value(candidate) < Value(best_move)) {
                best_move <- candidate;
                beta <- Min(beta, Value(candidate));
            }

            // Ignore remaining moves: MAX already has a better option elsewhere
            if (alpha >= beta)
                return best_move;
        }
        return best_move;
    }
}

How much better does MinMax with alpha-beta cutoffs behave when compared with normal
MinMax? It depends on the order in which the game positions are searched. If the way the game
positions are generated doesn't create situations where the algorithm can take advantage of
alpha-beta cutoffs, then the improvements won't be noticeable. However, if the evaluation
function and the generation of game positions lead to alpha-beta cutoffs, then the improvements
might be great.
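
For comparison, here is the same toy tree from the earlier minimax sketch, searched with
alpha-beta cutoffs; the pruning test alpha >= beta mirrors steps 2 and 3 above:

import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if isinstance(node, int):       # leaf: an evaluated game position
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:       # MIN would never let the game get here
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:       # MAX would never let the game get here
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))   # -> 3, the same answer as plain MinMax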

Alpha-Beta Cutoff

With all this talk about search speed, many of you might be wondering what this is all about.
Well, search speed is very important in AI, because if an algorithm takes too long to give a good
answer, the algorithm may not be suitable.

For example, a good MinMax algorithm implementation with an evaluation function capable of
giving very good estimates might be able to search 1000 positions a second. In tournament chess
each player has around 150 seconds to make a move, so it would be able to analyze about
150,000 positions during that period. But in chess each move has around 35 possible branches!
In the end, the program would only be able to analyze around 3 to 4 moves ahead in the game.
Even humans with very little practice in chess can do better than this.

But if we use MinMax with alpha-beta cutoffs (again, a decent implementation with a good
evaluation function), the resulting behaviour might be much better. In this case, the program
might be able to double the number of analyzed positions and thus become a much tougher
adversary.

Example

Example of a board with the values estimated for each position.

[Figure: board position values and piece weights (graphic not reproduced)]

The game uses MinMax with alpha-beta cutoffs for the computer moves. The evaluation
function is a weighted average of the positions occupied by the checker pieces. The figure shows
the values for each board position. The value of each board position is multiplied by the type of
the piece that rests on that position, as described in the first table.

Rule-based Expert System

Expert System

"An expert system is an interactive computer-based decision tool that uses both facts and
heuristics to solve difficult decision problems based on knowledge acquired from an expert."

An expert system is a computer program that simulates the thought process of a human expert to
solve complex decision problems in a specific domain. This chapter addresses the characteristics
of expert systems that make them different from conventional programming and traditional
decision support tools. The growth of expert systems is expected to continue for several years.
With the continuing growth, many new and exciting applications will emerge. An expert system
operates as an interactive system that responds to questions, asks for clarification, makes
recommendations, and generally aids the decision-making process. Expert systems provide
expert advice and guidance in a wide variety of activities, computer diagnosis among them.

An expert system may be viewed as a computer simulation of a human expert. Expert systems
are an emerging technology with many areas for potential applications. Past applications range
from MYCIN, used in the medical field to diagnose infectious blood diseases, to XCON, used to
configure computer systems. These expert systems have proven to be quite successful. Most
applications of expert systems will fall into one of the following categories:

   •   Interpreting and identifying
   •   Predicting
   •   Diagnosing
   •   Designing
   •   Planning
   •   Monitoring
   •   Debugging and testing
   •   Instructing and training
   •   Controlling

Applications that are computational or deterministic in nature are not good candidates for expert
systems. Traditional decision support systems such as spreadsheets are very mechanistic in the
way they solve problems. They operate under mathematical and Boolean operators in their
execution and arrive at one and only one static solution for a given set of data. Calculation
intensive applications with very exacting requirements are better handled by traditional decision
support tools or conventional programming. The best application candidates for expert systems
are those dealing with expert heuristics for solving problems. Conventional computer programs
are based on factual knowledge, an indisputable strength of computers. Humans, by contrast,
solve problems on the basis of a mixture of factual and heuristic knowledge. Heuristic
knowledge, composed of intuition, judgment, and logical inferences, is an indisputable strength
of humans. Successful expert systems will be those that combine facts and heuristics and thus
merge human knowledge with computer power in solving problems. To be effective, an expert
system must focus on a particular problem domain, as discussed below.

Domain Specificity

Expert systems are typically very domain specific. For example, a diagnostic expert system for
troubleshooting computers must actually perform all the necessary data manipulation as a human
expert would. The developer of such a system must limit his or her scope of the system to just
what is needed to solve the target problem. Special tools or programming languages are often
needed to accomplish the specific objectives of the system.

Special Programming Languages

Expert systems are typically written in special programming languages. The use of languages
like LISP and PROLOG in the development of an expert system simplifies the coding process.
The major advantage of these languages, as compared to conventional programming languages,
is the simplicity of the addition, elimination, or substitution of new rules and memory
management capabilities. Some of the distinguishing characteristics of programming languages
needed for expert systems work are:

   •   Efficient mix of integer and real variables
   •   Good memory-management procedures
   •   Extensive data-manipulation routines
   •   Incremental compilation
   •   Tagged memory architecture
   •   Optimization of the systems environment
   •   Efficient search procedures

Architecture of Expert System

Expert systems typically contain the following four components:

   •   Knowledge-Acquisition Interface
   •   User Interface
   •   Knowledge Base
   •   Inference Engine

This architecture differs considerably from traditional computer programs, resulting in several
characteristics of expert systems.
Expert System Components

Knowledge-Acquisition Interface

The knowledge-acquisition interface controls how the expert and knowledge engineer interact
with the program to incorporate knowledge into the knowledge base. It includes features to assist
experts in expressing their knowledge in a form suitable for reasoning by the computer.

This process of expressing knowledge in the knowledge base is called knowledge acquisition.
Knowledge acquisition turns out to be quite difficult in many cases, so difficult that some
authors refer to the knowledge acquisition bottleneck to indicate that it is this aspect of expert
system development which often requires the most time and effort.

Debugging faulty knowledge bases is facilitated by traces (lists of rules in the order they were
fired), probes (commands to find and edit specific rules, facts, and so on), and bookkeeping
functions and indexes (which keep track of various features of the knowledge base such as
variables and rules). Some rule-based expert system shells for personal computers monitor data
entry, checking the syntactic validity of rules. Expert systems are typically validated by testing
their predictions for several cases against those of human experts. Case facilities (permitting a
file of such cases to be stored and automatically evaluated after the program is revised) can
greatly speed the validation process. Many features that are useful for the user interface, such as
on-screen help and explanations, are also of benefit to the developer of expert systems and are
also part of knowledge-acquisition interfaces.

Expert systems in the literature demonstrate a wide range of modes of knowledge acquisition
(Buchanan, 1985). Expert system shells on microcomputers typically require the user to either
enter rules explicitly or enter several examples of cases with appropriate conclusions, from
which the program will infer a rule.

User Interface

The user interface is the part of the program that interacts with the user. It prompts the user for
information required to solve a problem, displays conclusions, and explains its reasoning.

Features of the user interface often include:

   •   Doesn't ask "dumb" questions
   •   Explains its reasoning on request
   •   Provides documentation and references
   •   Defines technical terms
   •   Permits sensitivity analyses, simulations, and what-if analyses
   •   Detailed report of recommendations
   •   Justifies recommendations
   •   Online help
   •   Graphical displays of information
   •   Trace or step through reasoning

The user interface can be judged by how well it reproduces the kind of interaction one might
expect between a human expert and someone consulting that expert.

Knowledge Base

The knowledge base consists of specific knowledge about some substantive domain. A
knowledge base differs from a database in that the knowledge base includes both explicit
knowledge and implicit knowledge. Much of the knowledge in the knowledge base is not stated
explicitly, but inferred by the inference engine from explicit statements in the knowledge base.
This makes knowledge bases store information more efficiently than databases and gives them
the power to represent exhaustively all the knowledge implied by explicit statements of
knowledge. There are several important ways in which knowledge is represented in a knowledge
base. For more information, see knowledge representation strategies.

Knowledge bases can contain many different types of knowledge and the process of acquiring
knowledge for the knowledge base (this is often called knowledge acquisition) often needs to be
quite different depending on the type of knowledge sought.

Types of Knowledge

There are many different kinds of knowledge considered in expert systems. Many of these form
dimensions of contrasting knowledge:

   •   explicit knowledge
   •   implicit knowledge
   •   domain knowledge
   •   common sense or world knowledge
   •   heuristics
   •   algorithms
   •   procedural knowledge
   •   declarative or semantic knowledge
   •   public knowledge
   •   private knowledge
   •   shallow knowledge
   •   deep knowledge
   •   metaknowledge

Inference Engine

The inference engine uses general rules of inference to reason from the knowledge base and
draw conclusions which are not explicitly stated but can be inferred from the knowledge base.

Inference engines are capable of symbolic reasoning, not just mathematical reasoning. Hence,
they expand the scope of fruitful applications of computer programs.

The specific forms of inference permitted by different inference engines vary, depending on
several factors, including the knowledge representation strategies employed by the expert
system.
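
A minimal forward-chaining sketch in Python makes this concrete (the rule set, fact names, and
data format are made up for illustration): rules fire repeatedly, adding conclusions to the set of
known facts until nothing new can be inferred.

# Each rule is a (premises, conclusion) pair over symbolic facts.
rules = [
    ({"has_fever", "has_rash"}, "suspect_measles"),
    ({"suspect_measles", "not_vaccinated"}, "recommend_test"),
]

facts = {"has_fever", "has_rash", "not_vaccinated"}

changed = True
while changed:                      # fire rules until no new facts appear
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)   # the conclusion is inferred, not stored
            changed = True

print(sorted(facts))   # includes 'suspect_measles' and 'recommend_test'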

Expert System Development

Most expert systems are developed by a team of people, with the number of members varying
with the complexity and scope of the project. Of course, a single individual can develop a very
simple system. But usually at least two people are involved.

There are two essential roles that must be filled by the development team: the knowledge
engineer and the substantive expert.

   •   The Knowledge Engineer
   •   The Substantive Expert

The Knowledge Engineer

Criteria for selecting the Knowledge Engineer

   •   Competent
   •   Organized
   •   Patient

Problems with the Knowledge Engineer

   •   Technician with little social skill
   •   Sociable with low technical skill
   •   Disorganized
   •   Unwilling to challenge the expert to produce clarity
   •   Unable to listen carefully to expert
   •   Undiplomatic when discussing flaws in system or expert's knowledge
   •   Unable to quickly understand diverse substantive areas

The Substantive Expert

Criteria for selecting the expert

   •   Competent
   •   Available
   •   Articulate
   •   Self-Confident
   •   Open-Minded

Varieties of experts

   •   No expert
   •   Multiple experts
   •   Book knowledge only
   •   The knowledge engineer is also the expert

Problem Experts

   •   The unavailable expert
   •   The reluctant expert
   •   The cynical expert
   •   The arrogant expert
   •   The rambling expert
   •   The uncommunicative expert
   •   The too-cooperative expert
   •   The would-be-knowledge-engineer expert

Development Process

The systems development process often used for traditional software such as management
information systems often employs a process described as the "System Development Life Cycle"
or "Waterfall" Model. While this model identifies a number of important tasks in the
development process, many developers of expert systems have found it to be inadequate for
expert systems for a number of important reasons. Instead, many expert systems are developed
using a process called "Rapid Prototyping and Incremental Development."

System Development Life-Cycle

Problem Analysis

Is the problem solvable? Is it feasible with this approach? What does a cost-benefit analysis
show?

Requirement Specification

What are the desired features and goals of the proposed system? Who are the users? What
constraints must be considered? What development and delivery environments will be used?

Design

Preliminary Design - overall structure, data flow diagram, perhaps language

Detailed Design - details of each module

Implementation

Writing and debugging code, integrating modules, creating interfaces

Testing

Comparing system to its specifications and assessing validity

Maintenance

Corrections, modifications, enhancements

Managing Uncertainty in Expert Systems

Sources of uncertainty in Expert System

   •      Weak implication
   •      Imprecise language
   •   Unknown data
   •   Difficulty in combining the views of different experts

Uncertainty in AI

   •   Information is partial
   •   Information is not fully reliable
   •   Representation language is inherently imprecise
   •   Information comes from multiple sources and it is conflicting
   •   Information is approximate
   •   Non-absolute cause-effect relationships exist

Representing uncertain information in Expert System

   •   Probabilistic
   •   Certainty factors
   •   Theory of evidence
   •   Fuzzy logic
   •   Neural Network
   •   GA
   •   Rough set

Bayesian Probability Theory

Bayesian probability is one of the most popular interpretations of the concept of probability. The
Bayesian interpretation of probability can be seen as an extension of logic that enables reasoning
with uncertain statements. To evaluate the probability of a hypothesis, the Bayesian probabilist
specifies some prior probability, which is then updated in the light of new relevant data. The
Bayesian interpretation provides a standard set of procedures and formulae to perform this
calculation.

Bayesian probability interprets the concept of probability as "a measure of a state of knowledge",
in contrast to interpreting it as a frequency or a physical property of a system. Its name is derived
from the 18th century statistician Thomas Bayes, who pioneered some of the concepts. Broadly
speaking, there are two views on Bayesian probability that interpret the state of knowledge
concept in different ways. According to the objectivist view, the rules of Bayesian statistics can
be justified by requirements of rationality and consistency and interpreted as an extension of
logic. According to the subjectivist view, the state of knowledge measures a "personal belief".
Many modern machine learning methods are based on objectivist Bayesian principles. One of the
crucial features of the Bayesian view is that a probability is assigned to a hypothesis, whereas
under the frequentist view, a hypothesis is typically rejected or not rejected without directly
assigning a probability.

The probability of a hypothesis given the data (the posterior) is proportional to the product of the
likelihood times the prior probability (often just called the prior). The likelihood brings in the
effect of the data, while the prior specifies the belief in the hypothesis before the data was
observed.

More formally, Bayesian inference uses Bayes' formula for conditional probability:

P(H | D) = P(D | H) * P(H) / P(D)
where

H is a hypothesis, and D is the data.

P(H) is the prior probability of H: the probability that H is correct before the data D was seen.

P(D | H) is the conditional probability of seeing the data D given that the hypothesis H is true.
P(D | H) is called the likelihood.

P(D) is the marginal probability of D.

P(H | D) is the posterior probability: the probability that the hypothesis is true, given the data and
the previous state of belief about the hypothesis.
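
A quick worked example in Python with made-up numbers (a test for a rare condition: prior 1%,
sensitivity 90%, false-positive rate 5%) shows how the terms combine:

p_h = 0.01                      # prior P(H)
p_d_given_h = 0.90              # likelihood P(D | H)
p_d_given_not_h = 0.05          # false positive rate P(D | not H)

p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)   # marginal P(D)
p_h_given_d = p_d_given_h * p_h / p_d                   # posterior P(H | D)
print(round(p_h_given_d, 3))    # -> 0.154: the data raises belief from 1% to ~15%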

Stanford Certainty Factor

Uncertainty is represented as a degree of belief in two steps:

   •    Express the degree of belief
   •    Manipulate the degrees of belief during the use of knowledge based systems

It is also based on evidence (or the expert’s assessment).

Form of certainty factors in an ES:

IF   <evidence>
THEN <hypothesis>   {cf}

cf represents the belief in hypothesis H given that evidence E has occurred.

It is based on two functions:

i) the measure of belief, MB(H, E)
ii) the measure of disbelief, MD(H, E)

These indicate the degree to which belief (or disbelief) in hypothesis H is increased if evidence
E is observed.
Uncertain terms and their interpretation

[Table: uncertain terms and their interpretation as certainty-factor values (not reproduced)]

Total strength of belief and disbelief in a hypothesis:

cf(H, E) = MB(H, E) - MD(H, E),  with cf ranging from -1 (totally false) to +1 (totally true)

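When several rules support the same hypothesis, their certainty factors are combined
incrementally rather than added. A minimal Python sketch of the standard MYCIN-style
combination for two positive factors (the function name is illustrative):

def combine_cf(cf1, cf2):
    # Two supporting rules reinforce belief without exceeding 1.
    return cf1 + cf2 * (1 - cf1)

print(round(combine_cf(0.8, 0.6), 2))   # -> 0.92
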
Nonmonotonic logic and Reasoning with Beliefs

A non-monotonic logic is a formal logic whose consequence relation is not monotonic. Most
studied formal logics have a monotonic consequence relation, meaning that adding a formula to a
theory never produces a reduction of its set of consequences. Intuitively, monotonicity indicates
that learning a new piece of knowledge cannot reduce the set of what is known. A monotonic
logic cannot handle various reasoning tasks such as reasoning by default (consequences may be
derived only because of lack of evidence of the contrary), abductive reasoning (consequences are
only deduced as most likely explanations) and some important approaches to reasoning about
knowledge (the ignorance of a consequence must be retracted when the consequence becomes
known) and similarly belief revision (new knowledge may contradict old beliefs).

Default reasoning

An example of a default assumption is that the typical bird flies. As a result, if a given animal is
known to be a bird, and nothing else is known, it can be assumed to be able to fly. The default
assumption must however be retracted if it is later learned that the considered animal is a
penguin. This example shows that a logic that models default reasoning should not be
monotonic. Logics formalizing default reasoning can be roughly divided in two categories:
logics able to deal with arbitrary default assumptions (default logic, defeasible logic/defeasible
reasoning/argument (logic), and answer set programming) and logics that formalize the specific
default assumption that facts that are not known to be true can be assumed false by default
(closed world assumption and circumscription).

Abductive reasoning

Abductive reasoning is the process of deriving the most likely explanations of the known facts.
An abductive logic should not be monotonic because the most likely explanations are not
necessarily correct. For example, the most likely explanation for seeing wet grass is that it
rained; however, this explanation has to be retracted when learning that the real cause of the
grass being wet was a sprinkler. Since the old explanation (it rained) is retracted because of the
addition of a piece of knowledge (a sprinkler was active), any logic that models explanations is
non-monotonic.

Reasoning about knowledge

If a logic includes formulae that mean that something is not known, this logic should not be
monotonic. Indeed, learning something that was previously not known leads to the removal of
the formula specifying that this piece of knowledge is not known. This second change (a removal
caused by an addition) violates the condition of monotonicity. A logic for reasoning about
knowledge is the autoepistemic logic.

Belief revision

Belief revision is the process of changing beliefs to accommodate a new belief that might be
inconsistent with the old ones. Under the assumption that the new belief is correct, some of the
old ones have to be retracted in order to maintain consistency. This retraction in response to the
addition of a new belief makes any logic for belief revision non-monotonic. The belief revision
approach is an alternative to paraconsistent logics, which tolerate inconsistency rather than
attempting to remove it.

What makes belief revision non-trivial is that several different ways for performing this
operation may be possible. For example, if the current knowledge includes the three facts “A is
true”, “B is true” and “if A and B are true then C is true”, the introduction of the new information
“C is false” can be done preserving consistency only by removing at least one of the three facts.
In this case, there are at least three different ways for performing revision. In general, there may
be several different ways for changing knowledge.

Fuzzy Logic

The concept of Fuzzy Logic (FL) was conceived by Lotfi Zadeh, a professor at the University of
California at Berkeley, and presented not as a control methodology, but as a way of processing
data by allowing partial set membership rather than crisp set membership or non-membership.
This approach to set theory was not applied to control systems until the 1970s due to insufficient
small-computer capability prior to that time. Professor Zadeh reasoned that people do not require
precise, numerical information input, and yet they are capable of highly adaptive control. If
feedback controllers could be programmed to accept noisy, imprecise input, they would be much
more effective and perhaps easier to implement. Unfortunately, U.S. manufacturers have not
been so quick to embrace this technology while the Europeans and Japanese have been
aggressively building real products around it.

WHAT IS FUZZY LOGIC?

In this context, FL is a problem-solving control system methodology that lends itself to
implementation in systems ranging from simple, small, embedded micro-controllers to large,
networked, multi-channel PC or workstation-based data acquisition and control systems. It can
be implemented in hardware, software, or a combination of both. FL provides a simple way to
arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy, or missing input
information. FL's approach to control problems mimics how a person would make decisions,
only much faster.

HOW IS FL DIFFERENT FROM CONVENTIONAL CONTROL METHODS?

FL incorporates a simple, rule-based IF X AND Y THEN Z approach to solving a control
problem rather than attempting to model a system mathematically. The FL model is empirically
based, relying on an operator's experience rather than their technical understanding of the
system. For example, rather than dealing with temperature control in terms such as "SP = 500F",
"T < 1000F", or "210C < TEMP < 220C", terms like "IF (process is too cool) AND (process is
getting colder) THEN (add heat to the process)" or "IF (process is too hot) AND (process is
heating rapidly) THEN (cool the process quickly)" are used. These terms are imprecise and yet
very descriptive of what must actually happen. Consider what you do in the shower if the
temperature is too cold: you will make the water comfortable very quickly with little trouble. FL
is capable of mimicking this type of behavior, but at a very high rate.

HOW DOES FL WORK?

FL requires some numerical parameters in order to operate, such as what is considered a
significant error and a significant rate-of-change-of-error, but exact values of these numbers are
usually not critical unless very responsive performance is required, in which case empirical
tuning would determine them. For example, a simple temperature control system could use a
single temperature feedback sensor whose data is subtracted from the command signal to
compute "error" and then time-differentiated to yield the error slope or rate-of-change-of-error,
hereafter called "error-dot". Error might have units of degrees F, with a small error considered to
be 2F and a large error 5F. The "error-dot" might then have units of degrees/min, with a small
error-dot being 5F/min and a large one being 15F/min. These values don't have to be
symmetrical and can be "tweaked" once the system is operating in order to optimize
performance. Generally, FL is so forgiving that the system will probably work the first time
without any tweaking.
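
A minimal sketch of one membership function for this example, assuming the 2F/5F
breakpoints above (the function name and the linear ramp between breakpoints are illustrative
choices):

def membership_small(error, small=2.0, large=5.0):
    # Degree (0..1) to which |error| counts as a 'small' error.
    e = abs(error)
    if e <= small:
        return 1.0
    if e >= large:
        return 0.0
    return (large - e) / (large - small)   # linear ramp between 2F and 5F

print(membership_small(1.0))   # 1.0: fully 'small'
print(membership_small(3.5))   # 0.5: partially 'small'
print(membership_small(6.0))   # 0.0: not 'small' at all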

Dempster/Shafer Theory

The Dempster-Shafer theory, also known as the theory of belief functions, is a generalization of
the Bayesian theory of subjective probability. Whereas the Bayesian theory requires probabilities
for each question of interest, belief functions allow us to base degrees of belief for one question
on probabilities for a related question. These degrees of belief may or may not have the
mathematical properties of probabilities; how much they differ from probabilities will depend on
how closely the two questions are related.

The Dempster-Shafer theory owes its name to work by A. P. Dempster (1968) and Glenn Shafer
(1976), but the kind of reasoning the theory uses can be found as far back as the seventeenth
century. The theory came to the attention of AI researchers in the early 1980s, when they were
trying to adapt probability theory to expert systems. Dempster-Shafer degrees of belief resemble
the certainty factors in MYCIN, and this resemblance suggested that they might combine the
rigor of probability theory with the flexibility of rule-based systems. Subsequent work has made
clear that the management of uncertainty inherently requires more structure than is available in
simple rule-based systems, but the Dempster-Shafer theory remains attractive because of its
relative flexibility.

The Dempster-Shafer theory is based on two ideas: the idea of obtaining degrees of belief for one
question from subjective probabilities for a related question, and Dempster's rule for combining
such degrees of belief when they are based on independent items of evidence.

To illustrate the idea of obtaining degrees of belief for one question from subjective probabilities
for another, suppose I have subjective probabilities for the reliability of my friend Jon. My
probability that he is reliable is 0.9, and my probability that he is unreliable is 0.1. Suppose he
tells me a limb fell on my car. This statement, which must be true if he is reliable, is not
necessarily false if he is unreliable. So his testimony alone justifies a 0.9 degree of belief that a
limb fell on my car, but only a zero degree of belief (not a 0.1 degree of belief) that no limb fell
on my car. This zero does not mean that I am sure that no limb fell on my car, as a zero
probability would; it merely means that Jon's testimony gives me no reason to believe that no
limb fell on my car. The 0.9 and the zero together constitute a belief function.
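
If a second, independent witness were added, Dempster's rule for combining two such simple
support functions reduces to a one-line computation; the sketch below uses hypothetical
reliabilities of 0.9 and 0.8 (the document's example has only one witness):

r1, r2 = 0.9, 0.8    # reliability of two independent witnesses
# Belief fails only if BOTH witnesses are unreliable.
belief = 1 - (1 - r1) * (1 - r2)
print(round(belief, 2))   # -> 0.98 for the limb; belief against it stays 0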

Knowledge Acquisition

Knowledge Acquisition is concerned with the development of knowledge bases based on the
expertise of a human expert. This requires expressing knowledge in a formalism suitable for
automatic interpretation. Within this field, research at UNSW focuses on incremental
knowledge acquisition techniques, which allow a human expert to provide explanations of their
decisions that are automatically integrated into sophisticated knowledge bases.

Types of Learning

Learning is acquiring new knowledge, behaviors, skills, values, preferences or understanding,
and may involve synthesizing different types of information. The ability to learn is possessed by
humans, animals and some machines. Progress over time tends to follow learning curves.

Human learning may occur as part of education or personal development. It may be goal-
oriented and may be aided by motivation. The study of how learning occurs is part of
neuropsychology, educational psychology, learning theory, and pedagogy.
Learning may occur as a result of habituation or classical conditioning, seen in many animal
species, or as a result of more complex activities such as play, seen only in relatively intelligent
animals and humans. Learning may occur consciously or without conscious awareness. There is
evidence for human behavioral learning prenatally, in which habituation has been observed as
early as 32 weeks into gestation, indicating that the central nervous system is sufficiently
developed and primed for learning and memory to occur very early on in development.

Play has been approached by several theorists as the first form of learning. Children play,
experiment with the world, learn the rules, and learn to interact. Vygotsky agrees that play is
pivotal for children's development, since they make meaning of their environment through play.

Types of Learning

Habituation

In psychology, habituation is an example of non-associative learning in which there is a
progressive diminution of behavioral response probability with repetition of a stimulus. It is
another form of integration. An animal first responds to a stimulus, but if it is neither rewarding
nor harmful the animal reduces subsequent responses. One example of this can be seen in small
song birds - if a stuffed owl (or similar predator) is put into the cage, the birds initially react to it
as though it were a real predator. Soon the birds react less, showing habituation. If another
stuffed owl is introduced (or the same one removed and re-introduced), the birds react to it again
as though it were a predator, demonstrating that it is only a very specific stimulus that is
habituated to (namely, one particular unmoving owl in one place). Habituation has been shown
in essentially every species of animal, including the large protozoan Stentor coeruleus.

Sensitization

Sensitization is an example of non-associative learning in which the progressive amplification of
a response follows repeated administrations of a stimulus (Bell et al., 1995). An everyday
example of this mechanism is the repeated tonic stimulation of peripheral nerves that will occur
if a person rubs his arm continuously. After a while, this stimulation will create a warm sensation
that will eventually turn painful. The pain is the result of the progressively amplified synaptic
response of the peripheral nerves warning the person that the stimulation is harmful.
Sensitization is thought to underlie both adaptive and maladaptive learning processes in the
organism.

Associative learning

Associative learning is the process by which an element is learned through association with a
separate, pre-occurring element.

Operant conditioning

Operant conditioning is the use of consequences to modify the occurrence and form of behavior.
Operant conditioning is distinguished from Pavlovian conditioning in that operant conditioning
deals with voluntary behavior, which is modified by the consequences that follow it, whereas
Pavlovian conditioning deals with reflexive behavior elicited by an antecedent stimulus.
Whole basic
Whole basic
Whole basic
Whole basic
Whole basic
Whole basic

More Related Content

What's hot

Artificial Intelligence(AI)
Artificial Intelligence(AI)Artificial Intelligence(AI)
Artificial Intelligence(AI)Hari krishnan
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligenceGautam Sharma
 
Artificial intelligence original
Artificial intelligence originalArtificial intelligence original
Artificial intelligence originalSaila Sri
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial IntelligenceRk King
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligenceUTKARSH NATH
 
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...guestac67362
 
presentation on Artificial intelligence by prince kumar kushwaha from rustamj...
presentation on Artificial intelligence by prince kumar kushwaha from rustamj...presentation on Artificial intelligence by prince kumar kushwaha from rustamj...
presentation on Artificial intelligence by prince kumar kushwaha from rustamj...Rustamji Institute of Technology
 
Artificial Intelligence Short Question and Answer
Artificial Intelligence Short Question and AnswerArtificial Intelligence Short Question and Answer
Artificial Intelligence Short Question and AnswerNaiyan Noor
 
best presentation Artitficial Intelligence
best presentation Artitficial Intelligencebest presentation Artitficial Intelligence
best presentation Artitficial Intelligencejennifer joe
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial IntelligenceAkshay Thakur
 
Introduction to Artificial Intelligence - Cybernetics Robo Academy
Introduction to Artificial Intelligence - Cybernetics Robo AcademyIntroduction to Artificial Intelligence - Cybernetics Robo Academy
Introduction to Artificial Intelligence - Cybernetics Robo AcademyTutulAhmed3
 
Timo Honkela: An Introduction to Artificial Intelligence
Timo Honkela: An Introduction to Artificial IntelligenceTimo Honkela: An Introduction to Artificial Intelligence
Timo Honkela: An Introduction to Artificial IntelligenceTimo Honkela
 

What's hot (20)

Artificial Intelligence(AI)
Artificial Intelligence(AI)Artificial Intelligence(AI)
Artificial Intelligence(AI)
 
Ai
AiAi
Ai
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligence
 
Artificial intelligence original
Artificial intelligence originalArtificial intelligence original
Artificial intelligence original
 
AI Introduction
AI Introduction AI Introduction
AI Introduction
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligence
 
Ai notes
Ai notesAi notes
Ai notes
 
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
A Paper Presentation On Artificial Intelligence And Global Risk Paper Present...
 
presentation on Artificial intelligence by prince kumar kushwaha from rustamj...
presentation on Artificial intelligence by prince kumar kushwaha from rustamj...presentation on Artificial intelligence by prince kumar kushwaha from rustamj...
presentation on Artificial intelligence by prince kumar kushwaha from rustamj...
 
Introduction to AI
Introduction to AIIntroduction to AI
Introduction to AI
 
Artificial Intelligence Short Question and Answer
Artificial Intelligence Short Question and AnswerArtificial Intelligence Short Question and Answer
Artificial Intelligence Short Question and Answer
 
best presentation Artitficial Intelligence
best presentation Artitficial Intelligencebest presentation Artitficial Intelligence
best presentation Artitficial Intelligence
 
Ai introduction
Ai introductionAi introduction
Ai introduction
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Artificial intelligence : what it is
Artificial intelligence : what it isArtificial intelligence : what it is
Artificial intelligence : what it is
 
Introduction to Artificial Intelligence - Cybernetics Robo Academy
Introduction to Artificial Intelligence - Cybernetics Robo AcademyIntroduction to Artificial Intelligence - Cybernetics Robo Academy
Introduction to Artificial Intelligence - Cybernetics Robo Academy
 
Timo Honkela: An Introduction to Artificial Intelligence
Timo Honkela: An Introduction to Artificial IntelligenceTimo Honkela: An Introduction to Artificial Intelligence
Timo Honkela: An Introduction to Artificial Intelligence
 
Introduction to Artificial Intelligence and few examples
Introduction to Artificial Intelligence and few examplesIntroduction to Artificial Intelligence and few examples
Introduction to Artificial Intelligence and few examples
 

Similar to Whole basic

AI Washington
AI Washington AI Washington
AI Washington OmGujar4
 
Artificial intelligence .pptx
Artificial intelligence  .pptxArtificial intelligence  .pptx
Artificial intelligence .pptxroshanrathod50
 
A Seminar Report on Artificial Intelligence
A Seminar Report on Artificial IntelligenceA Seminar Report on Artificial Intelligence
A Seminar Report on Artificial IntelligenceAvinash Kumar
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial IntelligenceBise Mond
 
Artificial intelligence-full -report.doc
Artificial intelligence-full -report.docArtificial intelligence-full -report.doc
Artificial intelligence-full -report.docdaksh Talsaniya
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial IntelligenceBise Mond
 
Presentation on Artificial Intelligence
Presentation on Artificial IntelligencePresentation on Artificial Intelligence
Presentation on Artificial IntelligenceIshwar Bulbule
 
introduction to Artificial Intelligence for computer science
introduction to Artificial Intelligence for computer scienceintroduction to Artificial Intelligence for computer science
introduction to Artificial Intelligence for computer scienceDawitTesfa4
 
The IOT Academy Training for Artificial Intelligence ( AI)
The IOT Academy Training for Artificial Intelligence ( AI)The IOT Academy Training for Artificial Intelligence ( AI)
The IOT Academy Training for Artificial Intelligence ( AI)The IOT Academy
 
AiArtificial Itelligence
AiArtificial ItelligenceAiArtificial Itelligence
AiArtificial ItelligenceAlisha Korpal
 
Rise of Artificial Intelligence (AI)
Rise of Artificial Intelligence (AI)Rise of Artificial Intelligence (AI)
Rise of Artificial Intelligence (AI)Harris Mubeen
 
Advanced Artificial Intelligence
Advanced Artificial IntelligenceAdvanced Artificial Intelligence
Advanced Artificial IntelligenceAshik Iqbal
 
Artificial Intelligence Report
Artificial Intelligence Report Artificial Intelligence Report
Artificial Intelligence Report Shubham Verma
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial IntelligenceAbbas Hashmi
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligencesaloni sharma
 

Similar to Whole basic (20)

AI Washington
AI Washington AI Washington
AI Washington
 
Artificial intelligence .pptx
Artificial intelligence  .pptxArtificial intelligence  .pptx
Artificial intelligence .pptx
 
Teja
TejaTeja
Teja
 
Artificial Intelligence and Humans
Artificial Intelligence and HumansArtificial Intelligence and Humans
Artificial Intelligence and Humans
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
A Seminar Report on Artificial Intelligence
A Seminar Report on Artificial IntelligenceA Seminar Report on Artificial Intelligence
A Seminar Report on Artificial Intelligence
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Artificial intelligence-full -report.doc
Artificial intelligence-full -report.docArtificial intelligence-full -report.doc
Artificial intelligence-full -report.doc
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Presentation on Artificial Intelligence
Presentation on Artificial IntelligencePresentation on Artificial Intelligence
Presentation on Artificial Intelligence
 
Binder4
Binder4Binder4
Binder4
 
Artificial intelligence research
Artificial intelligence researchArtificial intelligence research
Artificial intelligence research
 
introduction to Artificial Intelligence for computer science
introduction to Artificial Intelligence for computer scienceintroduction to Artificial Intelligence for computer science
introduction to Artificial Intelligence for computer science
 
The IOT Academy Training for Artificial Intelligence ( AI)
The IOT Academy Training for Artificial Intelligence ( AI)The IOT Academy Training for Artificial Intelligence ( AI)
The IOT Academy Training for Artificial Intelligence ( AI)
 
AiArtificial Itelligence
AiArtificial ItelligenceAiArtificial Itelligence
AiArtificial Itelligence
 
Rise of Artificial Intelligence (AI)
Rise of Artificial Intelligence (AI)Rise of Artificial Intelligence (AI)
Rise of Artificial Intelligence (AI)
 
Advanced Artificial Intelligence
Advanced Artificial IntelligenceAdvanced Artificial Intelligence
Advanced Artificial Intelligence
 
Artificial Intelligence Report
Artificial Intelligence Report Artificial Intelligence Report
Artificial Intelligence Report
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligence
 

Whole basic

  • 1. Artificial Intelligence Amit purohit Evidence of Artificial Intelligence folklore can be traced back to ancient Egypt, but with the development of the electronic computer in 1941, the technology finally became available to create machine intelligence. The term artificial intelligence was first coined in 1956, at the Dartmouth conference, and since then Artificial Intelligence has expanded because of the theories and principles developed by its dedicated researchers. Through its short modern history, advancement in the fields of AI have been slower than first estimated, progress continues to be made. From its birth 4 decades ago, there have been a variety of AI programs, and they have impacted other technological advancements. Definition AI is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable. Intelligence is the computational part of the ability to achieve goals in the world. Varying kinds and degrees of intelligence occur in people, many animals and some machines. Objectives 1).To formally define AI. 2).To discuss the character features of AI. 3).To get the student acquainted with the essence of AI. 4).To be able to distinguish betwee the human intelligence and AI. 5).To give an overview of the applications where the AI technology can be used. 6).To import the knowledge about the representation schemes like Production System, Problem Reduction. Turing Test Alan Turing's 1950 article Computing Machinery and Intelligence [Tur50] discussed conditions for considering a machine to be intelligent. He argued that if the machine could successfully pretend to be human to a knowledgeable observer then you certainly should consider it intelligent. This test would satisfy most people but not all philosophers. The observer could interact with the machine and a human by teletype (to avoid requiring that the machine imitate the appearance or voice of the person), and the human would try to persuade the observer that it was human and the machine would try to fool the observer.
  • 2. The Turing test is a one-sided test. A machine that passes the test should certainly be considered intelligent, but a machine could still be considered intelligent without knowing enough about humans to imitate a human. Daniel Dennett's book Brainchildren [Den98] has an excellent discussion of the Turing test and the various partial Turing tests that have been implemented, i.e. with restrictions on the observer's knowledge of AI and the subject matter of questioning. It turns out that some people are easily led into believing that a rather dumb program is intelligent. Background and History Evidence of Artificial Intelligence folklore can be traced back to ancient Egypt, but with the development of the electronic computer in 1941, the technology finally became available to create machine intelligence. The term artificial intelligence was first coined in 1956, at the Dartmouth conference, and since then Artificial Intelligence has expanded because of the theories and principles developed by its dedicated researchers. Through its short modern history, advancement in the fields of AI have been slower than first estimated, progress continues to be made. From its birth 4 decades ago, there have been a variety of AI programs, and they have impacted other technological advancements. In 1941 an invention revolutionized every aspect of the storage and processing of information. That invention, developed in both the US and Germany was the electronic computer. The first computers required large, separate air-conditioned rooms, and were a programmers nightmare, involving the separate configuration of thousands of wires to even get a program running. The 1949 innovation, the stored program computer, made the job of entering a program easier, and advancements in computer theory lead to computer science, and eventually Artificial intelligence. With the invention of an electronic means of processing data, came a medium that made AI possible. Although the computer provided the technology necessary for AI, it was not until the early 1950's that the link between human intelligence and machines was really observed. Norbert Wiener was one of the first Americans to make observations on the principle of feedback theory feedback theory. The most familiar example of feedback theory is the thermostat: It controls the temperature of an environment by gathering the actual temperature of the house, comparing it to the desired temperature, and responding by turning the heat up or down. What was so important about his research into feedback loops was that Wiener theorized that all intelligent behavior was the result of feedback mechanisms. Mechanisms that could possibly be simulated by machines. This discovery influenced much of early development of AI. In late 1955, Newell and Simon developed The Logic Theorist, considered by many to be the first AI program. The program, representing each problem as a tree model, would attempt to solve it by selecting the branch that would most likely result in the correct conclusion. The impact that the logic theorist made on both the public and the field of AI has made it a crucial stepping stone in developing the AI field.
• 3. In 1956 John McCarthy, regarded as the father of AI, organized a conference to draw on the talent and expertise of others interested in machine intelligence for a month of brainstorming. He invited them to New Hampshire for "The Dartmouth summer research project on artificial intelligence." From that point on, because of McCarthy, the field would be known as Artificial Intelligence. Although not a huge success, the Dartmouth conference did bring together the founders of AI and served to lay the groundwork for the future of AI research.

In the seven years after the conference, AI began to pick up momentum. Although the field was still undefined, ideas formed at the conference were re-examined and built upon. Centers for AI research began forming at Carnegie Mellon and MIT, and new challenges were faced: first, creating systems that could efficiently solve problems by limiting the search, as the Logic Theorist did; and second, making systems that could learn by themselves.

In 1957, the first version of a new program, the General Problem Solver (GPS), was tested. The program was developed by the same pair that had developed the Logic Theorist. The GPS was an extension of Wiener's feedback principle and was capable of solving a wider range of common-sense problems. A couple of years after the GPS, IBM contracted a team to research artificial intelligence; Herbert Gelernter spent three years working on a program for solving geometry theorems.

While more programs were being produced, McCarthy was busy developing a major breakthrough in AI history. In 1958 McCarthy announced his new development, the LISP language, which is still used today. LISP stands for LISt Processing, and it was soon adopted as the language of choice among most AI developers.

During the 1970s many new methods in the development of AI were tested, notably Minsky's frames theory. David Marr also proposed new theories about machine vision, for example how it would be possible to distinguish an image based on its shading, basic information on shapes, color, edges, and texture; with analysis of this information, frames of what an image might be could then be referenced. Another development during this time was the PROLOG language, proposed in 1972.

During the 1980s AI moved at a faster pace, and further into the corporate sector. In 1986, US sales of AI-related hardware and software surged to $425 million. Expert systems were in particular demand because of their efficiency. Companies such as Digital Equipment Corporation were using XCON, an expert system designed to configure the large VAX computers. DuPont, General Motors, and Boeing relied heavily on expert systems. Indeed, to keep up with the demand for computer experts, companies such as Teknowledge and Intellicorp, specializing in creating software to aid in producing expert systems, were formed. Other expert systems were designed to find and correct flaws in existing expert systems.

Overview of AI Application Areas

Game Playing
• 4. You can buy machines that can play master-level chess for a few hundred dollars. There is some AI in them, but they play well against people mainly through brute-force computation, looking at hundreds of thousands of positions. To beat a world champion by brute force and known reliable heuristics requires being able to look at 200 million positions per second.

Speech Recognition

In the 1990s, computer speech recognition reached a practical level for limited purposes. Thus United Airlines replaced its keyboard tree for flight information with a system using speech recognition of flight numbers and city names. It is quite convenient. On the other hand, while it is possible to instruct some computers using speech, most users have gone back to the keyboard and the mouse as still more convenient.

Understanding Natural Language

Just getting a sequence of words into a computer is not enough. Parsing sentences is not enough either. The computer has to be provided with an understanding of the domain the text is about, and this is presently possible only for very limited domains.

Computer Vision

The world is composed of three-dimensional objects, but the inputs to the human eye and computers' TV cameras are two-dimensional. Some useful programs can work solely in two dimensions, but full computer vision requires partial three-dimensional information that is not just a set of two-dimensional views. At present there are only limited ways of representing three-dimensional information directly, and they are not as good as what humans evidently use.

Expert Systems

A "knowledge engineer" interviews experts in a certain domain and tries to embody their knowledge in a computer program for carrying out some task. How well this works depends on whether the intellectual mechanisms required for the task are within the present state of AI. When this turned out not to be so, there were many disappointing results. One of the first expert systems was MYCIN in 1974, which diagnosed bacterial infections of the blood and suggested treatments. It did better than medical students or practicing doctors, provided its limitations were observed. Namely, its ontology included bacteria, symptoms, and treatments and did not include patients, doctors, hospitals, death, recovery, and events occurring in time. Its interactions depended on a single patient being considered. Since the experts consulted by the knowledge engineers knew about patients, doctors, death, recovery, etc., it is clear that the knowledge engineers forced what the experts told them into a predetermined framework. In the present state of AI, this has to be true. The usefulness of current expert systems depends on their users having common sense.

Heuristic Classification
• 5. One of the most feasible kinds of expert system, given the present knowledge of AI, is one that puts some information into one of a fixed set of categories using several sources of information. An example is advising whether to accept a proposed credit card purchase. Information is available about the owner of the credit card, his record of payment, and also about the item he is buying and about the establishment from which he is buying it (e.g., about whether there have been previous credit card frauds at this establishment).

Production System

Production systems are applied to problem-solving programs that must perform a wide range of searches. Production systems are symbolic AI systems. The difference between these two terms is only one of semantics. A symbolic AI system may not be restricted to the very definition of production systems, but they can't be much different either.

Production systems are composed of three parts: a global database, production rules and a control structure. The global database is the system's short-term memory. It is a collection of facts that are to be analyzed. A part of the global database represents the current state of the system's environment. In a game of chess, for example, the current state could represent all the positions of the pieces.

Production rules (or simply productions) are conditional if-then branches. In a production system, whenever a condition in the system is satisfied, the system is allowed to execute or perform a specific action which may be specified under that rule. If the rule is not fulfilled, it may perform another action. This can be simply paraphrased:

WHEN (condition) IS SATISFIED, PERFORM (action)

A Production System Algorithm

DATA <- initial global database
until DATA satisfies the halting condition do
begin
    select some rule R that can be applied to DATA
    DATA <- result of applying R to DATA
end
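To make the loop concrete, here is a minimal Python sketch of a production system. The rule format, the facts, and the halting condition are illustrative assumptions, not part of the algorithm above; real systems add a much richer control structure for rule selection.

# A minimal production-system loop; rules are (condition, action) pairs
# over a global database of facts (here a Python set). Illustrative only.
def run_production_system(database, rules, halted):
    while not halted(database):
        # select some rule R whose condition is satisfied by DATA
        applicable = [rule for rule in rules if rule[0](database)]
        if not applicable:
            break                      # no rule applies; give up
        condition, action = applicable[0]
        database = action(database)    # DATA <- result of applying R
    return database

# Example: derive "wet" from "raining" with a single rule.
facts = {"raining"}
rules = [(lambda db: "raining" in db and "wet" not in db,
          lambda db: db | {"wet"})]
print(run_production_system(facts, rules, lambda db: "wet" in db))
# {'raining', 'wet'}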
• 6. Types of Production System

There are two basic types of production system:
• Commutative Production System
• Decomposable Production System

Commutative Production System

A production system is commutative if it has the following properties with respect to a database D:
1. Each member of the set of rules applicable to D is also applicable to any database produced by applying an applicable rule to D.
2. If the goal condition is satisfied by D, then it is also satisfied by any database produced by applying any applicable rule to D.
3. The database that results from applying to D any sequence composed of rules that are applicable to D is invariant under permutations of the sequence.

Decomposable Production System

The initial database can be decomposed or split into separate components that can be processed independently.

Search Process

Searching is defined as a sequence of steps that transforms the initial state to the goal state. To carry out a search, the following are needed:
• The initial state description of the problem.
• A set of legal operators that change the state.
• The final or goal state.

The searching process in AI can be classified into two types:
1. Uninformed Search / Blind Search
2. Heuristic Search / Informed Search

Uninformed / Blind Search

An uninformed search algorithm is one that does not have any domain-specific knowledge. It uses only information like the initial state, the final state and a set of legal operators, and it should proceed in a systematic way by exploring nodes in some predetermined order. It can be classified into two search techniques:
1. Breadth First Search
2. Depth First Search
• 7. Depth First Search

Depth first search works by taking a node, checking its neighbors, expanding the first node it finds among the neighbors, checking whether that expanded node is our destination, and, if not, continuing to explore more nodes. The above explanation is probably confusing if this is your first exposure to depth first search; I hope the following demonstration will help more. Using our same search tree, let's find a path between nodes A and F.

Step 0

Let's start with our root/goal node. We will be using two lists to keep track of what we are doing - an Open list and a Closed List. An Open list keeps track of what you need to do, and the Closed List keeps track of what you have already done. Right now, we only have our starting point, node A. We haven't done anything to it yet, so let's add it to our Open list.

Open List: A
Closed List: <empty>

Step 1
• 8. Now, let's explore the neighbors of our A node. To put it another way, let's take the first item from our Open list and explore its neighbors. Node A's neighbors are the B and C nodes. Because we are now done with our A node, we can remove it from our Open list and add it to our Closed List. We aren't done with this step though: we now have two new nodes, B and C, that need exploring, so we add those two nodes to our Open list. Our current Open and Closed Lists contain the following data:

Open List: B, C
Closed List: A

Step 2

Our Open list contains two items. For depth first search and breadth first search, you always explore the first item from the Open list. The first item in our Open list is the B node. B is not our destination, so let's explore its neighbors. Because we have now expanded B, we remove it from the Open list and add it to the Closed List. Our new nodes are D and E, and we add these nodes to the beginning of our Open list:

Open List: D, E, C
Closed List: A, B

Step 3

You should start to see a pattern forming. Because D is at the beginning of our Open List, we expand it. D isn't our destination, and it does not contain any neighbors. All we do in this step is remove D from our Open List and add it to our Closed List:
• 9. Open List: E, C
Closed List: A, B, D

Step 4

We now expand the E node from our Open list. E is not our destination, so we explore its neighbors and find out that it contains the neighbors F and G. Remember, F is our target, but we don't stop here: we only end when we are about to expand our target node, F in this case. Our Open list will have the E node removed and the F and G nodes added. The removed E node will be added to our Closed List:

Open List: F, G, C
Closed List: A, B, D, E

Step 5

We now expand the F node. Since it is our intended destination, we stop. We remove F from our Open list and add it to our Closed List. Since we are at our destination, there is no need to expand F in order to find its neighbors. Our final Open and Closed Lists contain the following data:
• 10. Open List: G, C
Closed List: A, B, D, E, F

The route taken by our depth first search is the final value of our Closed List: A, B, D, E, F.
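The walkthrough above can be condensed into a short Python sketch. The dictionary below is an assumed reconstruction of the example tree (the original figure is not reproduced here): A's children are B and C, B's are D and E, and E's are F and G.

# Depth first search with explicit Open and Closed lists, mirroring the
# steps above. The tree is an assumption reconstructed from the example.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [],
        "E": ["F", "G"], "F": [], "G": []}

def depth_first_search(start, goal):
    open_list = [start]          # nodes still to explore
    closed_list = []             # nodes already expanded
    while open_list:
        node = open_list.pop(0)  # take the first item of the Open list
        closed_list.append(node)
        if node == goal:         # stop when about to expand the target
            return closed_list
        # add neighbors to the beginning of the Open list (depth-first)
        open_list = tree[node] + open_list
    return None

print(depth_first_search("A", "F"))  # ['A', 'B', 'D', 'E', 'F']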
• 11. Breadth First Search

In depth first search, newly explored nodes were added to the beginning of your Open list. In breadth first search, newly explored nodes are added to the end of your Open list. Using our same search tree, let's again find a path between nodes A and F.

Step 0

We start with our root node A and the same two lists:

Open List: A
Closed List: <empty>

Step 1

We explore the neighbors of A, which are B and C. A moves to the Closed List, and its neighbors join the Open list:

Open List: B, C
Closed List: A

Step 2

The first item of our Open list is B. B is not our destination, so we expand it. This time its neighbors D and E are added to the end of the Open list, behind C:

Open List: C, D, E
Closed List: A, B

Step 3

We now expand C. It is not our destination and (in our example tree) it has no neighbors, so it simply moves to the Closed List:

Open List: D, E
Closed List: A, B, C

Step 4

D is expanded next; it has no neighbors either:

Open List: E
Closed List: A, B, C, D

Step 5

We expand E, which is not our destination; its neighbors F and G are added to the end of the Open list:

Open List: F, G
Closed List: A, B, C, D, E

Step 6

We are now about to expand F, our intended destination, so we stop:

Open List: G
Closed List: A, B, C, D, E, F

The final value of the Closed List gives the order in which breadth first search visited the nodes: A, B, C, D, E, F. Notice how it sweeps the tree level by level, in contrast to the depth first order A, B, D, E, F.
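The breadth first variant needs only one change to the sketch given after the depth first walkthrough: neighbors are appended to the end of the Open list. It reuses the same assumed tree.

# Breadth first search: same loop, but neighbors go to the *end* of the
# Open list. Uses the tree dictionary defined in the earlier sketch.
from collections import deque

def breadth_first_search(start, goal):
    open_list = deque([start])
    closed_list = []
    while open_list:
        node = open_list.popleft()   # still take the first item
        closed_list.append(node)
        if node == goal:
            return closed_list
        open_list.extend(tree[node])  # append neighbors at the end
    return None

print(breadth_first_search("A", "F"))  # ['A', 'B', 'C', 'D', 'E', 'F']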
• 14. Iterative Deepening Depth-First Search

Iterative deepening depth-first search (IDDFS) is a state space search strategy in which a depth-limited search is run repeatedly, increasing the depth limit with each iteration until it reaches d, the depth of the shallowest goal state. On each iteration, IDDFS visits the nodes in the search tree in the same order as depth-first search, but the cumulative order in which nodes are first visited, assuming no pruning, is effectively breadth-first.

IDDFS combines depth-first search's space-efficiency and breadth-first search's completeness (when the branching factor is finite). It is optimal when the path cost is a non-decreasing function of the depth of the node. The space complexity of IDDFS is O(bd), where b is the branching factor and d is the depth of the shallowest goal. Since iterative deepening visits states multiple times, it may seem wasteful, but it turns out to be not so costly, since in a tree most of the nodes are in the bottom level, so it does not matter much if the upper levels are visited multiple times.

The main advantage of IDDFS in game tree searching is that the earlier searches tend to improve the commonly used heuristics, such as the killer heuristic and alpha-beta pruning, so that a more accurate estimate of the score of various nodes at the final depth search can occur, and the search completes more quickly since it is done in a better order. For example, alpha-beta pruning is most efficient if it searches the best moves first.

A second advantage is the responsiveness of the algorithm. Because early iterations use small values for d, they execute extremely quickly. This allows the algorithm to supply early indications of the result almost immediately, followed by refinements as d increases. When used in an interactive setting, such as in a chess-playing program, this facility allows the program to play at any time with the current best move found in the search it has completed so far. This is not possible with a traditional depth-first search.

The time complexity of IDDFS in well-balanced trees works out to be the same as depth-first search: O(b^d). In an iterative deepening search, the nodes on the bottom level are expanded once, those on the next-to-bottom level are expanded twice, and so on, up to the root of the search tree, which is expanded d + 1 times. So the total number of expansions in an iterative deepening search is

(d + 1)·1 + d·b + (d - 1)·b^2 + ... + 2·b^(d-1) + 1·b^d

Altogether, an iterative deepening search from depth 1 to depth d expands only about 11% more nodes than a single breadth-first or depth-limited search to depth d when b = 10. The higher the branching factor, the lower the overhead of repeatedly expanded states, but even when the branching factor is 2, iterative deepening search only takes about twice as long as a complete breadth-first search. This means that the time complexity of iterative deepening is still O(b^d), while the space complexity is O(bd). In general, iterative deepening is the preferred search method when there is a large search space and the depth of the solution is not known.
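As a rough illustration of the strategy, assuming the same example tree as in the earlier sketches, iterative deepening can be written as a depth-limited DFS called with limits 0, 1, 2, ... until the goal is found.

# Iterative deepening: repeatedly run a depth-limited DFS, raising the
# limit each pass. Returns a root-to-goal path; tree as defined earlier.
def depth_limited(node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in tree[node]:
        path = depth_limited(child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

def iddfs(start, goal, max_depth=10):
    for limit in range(max_depth + 1):  # increase the depth limit each pass
        path = depth_limited(start, goal, limit)
        if path is not None:
            return path
    return None

print(iddfs("A", "F"))  # ['A', 'B', 'E', 'F']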
• 15. Informed Search

It is not difficult to see that uninformed search will pursue options that lead away from the goal as easily as it pursues options that lead towards the goal. For any but the smallest problems this leads to searches that take unacceptable amounts of time and/or space. Informed search tries to reduce the amount of search that must be done by making intelligent choices for the nodes that are selected for expansion. This implies the existence of some way of evaluating the likelihood that a given node is on the solution path. In general this is done using a heuristic function.

Hill Climbing

Hill climbing is a mathematical optimization technique which belongs to the family of local search. It is relatively simple to implement, making it a popular first choice. Although more advanced algorithms may give better results, in some situations hill climbing works just as well. Hill climbing can be used to solve problems that have many solutions, some of which are better than others. It starts with a random (potentially poor) solution and iteratively makes small changes to the solution, each time improving it a little. When the algorithm cannot see any improvement anymore, it terminates. Ideally, at that point the current solution is close to optimal, but it is not guaranteed that hill climbing will ever come close to the optimal solution.

For example, hill climbing can be applied to the traveling salesman problem. It is easy to find a solution that visits all the cities but that is very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much better route is obtained. Hill climbing is used widely in artificial intelligence for reaching a goal state from a starting node. The choice of next node and starting node can be varied to give a list of related algorithms.

Mathematical description

Hill climbing attempts to maximize (or minimize) a function f(x), where x ranges over discrete states. These states are typically represented by vertices in a graph, where edges encode nearness or similarity of states. Hill climbing will follow the graph from vertex to vertex, always locally increasing (or decreasing) the value of f, until a local maximum (or local minimum) xm is reached. Hill climbing can also operate on a continuous space: in that case, the algorithm is called gradient ascent (or gradient descent if the function is minimized).

Variants

In simple hill climbing, the first closer node is chosen, whereas in steepest ascent hill climbing all successors are compared and the one closest to the solution is chosen. Both forms fail if there is no closer node, which may happen if there are local maxima in the search space which are not solutions. Steepest ascent hill climbing is similar to best-first search, which tries all possible extensions of the current path instead of only one.

Stochastic hill climbing does not examine all neighbors before deciding how to move. Rather, it selects a neighbour at random, and decides (based on the amount of improvement in that neighbour) whether to move to that neighbour or to examine another.
• 16. Random-restart hill climbing is a meta-algorithm built on top of the hill climbing algorithm. It is also known as shotgun hill climbing. It iteratively does hill climbing, each time with a random initial condition x0. The best xm is kept: if a new run of hill climbing produces a better xm than the stored state, it replaces the stored state. Random-restart hill climbing is a surprisingly effective algorithm in many cases. It turns out that it is often better to spend CPU time exploring the space than carefully optimizing from an initial condition.

Local Maxima

A problem with hill climbing is that it will find only local maxima. Unless the heuristic is convex, it may not reach a global maximum. Other local search algorithms, such as stochastic hill climbing, random walks and simulated annealing, try to overcome this problem.

Ridges

A ridge is a curve in the search space that leads to a maximum, but the orientation of the ridge compared to the available moves that are used to climb is such that each move will lead to a lower point. In other words, each point on a ridge looks to the algorithm like a local maximum, even though the point is part of a curve leading to a better optimum.

Plateau

Another problem with hill climbing is that of a plateau, which occurs when we get to a "flat" part of the search space, i.e. we have a path where the heuristics are all very close together. This kind of flatness can cause the algorithm to cease progress and wander aimlessly.

Pseudocode: Hill Climbing Algorithm

currentNode = startNode;
loop do
    L = NEIGHBORS(currentNode);
    nextEval = -INF;
    nextNode = NULL;
    for all x in L
        if (EVAL(x) > nextEval)
            nextNode = x;
            nextEval = EVAL(x);
    if nextEval <= EVAL(currentNode)
        // Return current node since no better neighbors exist
        return currentNode;
    currentNode = nextNode;
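The pseudocode above climbs from a single start point. As a hedged sketch of the random-restart variant described earlier, the Python below simply reruns that loop from random initial conditions and keeps the best result; the neighbor function, the evaluation function and the toy objective are illustrative stand-ins.

# Random-restart hill climbing: rerun basic hill climbing from random
# starting points and keep the best local maximum found.
import random

def hill_climb(start, neighbors, evaluate):
    current = start
    while True:
        best = max(neighbors(current), key=evaluate, default=None)
        if best is None or evaluate(best) <= evaluate(current):
            return current           # no better neighbor exists
        current = best

def random_restart(n_runs, random_start, neighbors, evaluate):
    results = (hill_climb(random_start(), neighbors, evaluate)
               for _ in range(n_runs))
    return max(results, key=evaluate)  # keep the best xm

# Example: maximize f(x) = -(x - 7)^2 over integers with +/-1 moves.
f = lambda x: -(x - 7) ** 2
best = random_restart(10, lambda: random.randint(-100, 100),
                      lambda x: [x - 1, x + 1], f)
print(best)  # 7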
• 17. Best-First Search

Best-first search is a search algorithm which explores a graph by expanding the most promising node chosen according to a specified rule. Judea Pearl described best-first search as estimating the promise of node n by a "heuristic evaluation function f(n) which, in general, may depend on the description of n, the description of the goal, the information gathered by the search up to that point, and most important, on any extra knowledge about the problem domain."

Some authors have used "best-first search" to refer specifically to a search with a heuristic that attempts to predict how close the end of a path is to a solution, so that paths which are judged to be closer to a solution are extended first. This specific type of search is called greedy best-first search. Efficient selection of the current best candidate for extension is typically implemented using a priority queue. Examples of best-first search algorithms include the A* search algorithm and, in turn, Dijkstra's algorithm (which can be considered a specialization of A*). Best-first algorithms are often used for path finding in combinatorial search.

Code

open = initial state
while open != null do
    1. Pick the best node on open.
    2. Create open's successors.
    3. For each successor do:
        a. If it has not been generated before: evaluate it, add it to OPEN, and record its parent.
        b. Otherwise: change the parent if this new path is better than the previous one.
done
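A minimal Python sketch of greedy best-first search follows, using a priority queue (heapq) as suggested above. The graph and the heuristic h are illustrative assumptions; h estimates how close a node is to the goal.

# Greedy best-first search with a priority queue ordered by h(n).
import heapq

def best_first_search(graph, h, start, goal):
    open_heap = [(h(start), start, [start])]  # (estimate, node, path)
    visited = {start}
    while open_heap:
        _, node, path = heapq.heappop(open_heap)  # pick the best node
        if node == goal:
            return path
        for succ in graph[node]:                  # create the successors
            if succ not in visited:               # not generated before
                visited.add(succ)
                heapq.heappush(open_heap, (h(succ), succ, path + [succ]))
    return None

# Example on the tree used earlier, with a toy heuristic.
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [],
         "E": ["F", "G"], "F": [], "G": []}
h = {"A": 3, "B": 2, "C": 4, "D": 5, "E": 1, "F": 0, "G": 5}.get
print(best_first_search(graph, h, "A", "F"))  # ['A', 'B', 'E', 'F']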
• 18. Syntax of Propositional Logic

Logic is used to represent properties of objects in the world about which we are going to reason. When we say Miss Piggy is plump we are talking about the object Miss Piggy and a property plump. Similarly when we say Kermit's voice is high-pitched then the object is Kermit's voice and the property is high-pitched. It is normal to write these in logic as:

plump(misspiggy)
highpitched(voiceof(kermit))

So misspiggy and kermit are constants representing objects in our domain. Notice that plump and highpitched are different from voiceof: plump and highpitched represent properties and so are boolean-valued functions. They are often called predicates or relations. voiceof is a function that returns an object (not true/false). To help us differentiate we shall use "of" at the end of a function name. The predicates plump and highpitched are unary predicates but of course we can have binary or n-ary predicates; e.g.

loves(misspiggy, voiceof(kermit))

Simple Sentences

The fundamental components of logic are:
• object constants; e.g. misspiggy, kermit
• function constants; e.g. voiceof
• predicate constants; e.g. plump, highpitched, loves

Predicate and function constants take arguments which are objects in our domain. Predicate constants are used to describe relationships concerning the objects and return the value true/false. Function constants return values that are objects.

More Complex Sentences

We need to apply operators to construct more complex sentences from atoms. Negation applied to an atom negates the atom:
• 19. ¬loves(kermit, voiceof(misspiggy))
''Kermit does not love Miss Piggy's voice''

Conjunction combines two conjuncts:

loves(misspiggy, kermit) ∧ loves(misspiggy, voiceof(kermit))
''Miss Piggy loves Kermit and Miss Piggy loves Kermit's voice''

Notice it is not correct syntax to write in logic

loves(misspiggy, kermit) ∧ voiceof(kermit)

because we have tried to conjoin a sentence (truth valued) with an object. Logic operators must apply to truth-valued sentences.

Disjunction combines two disjuncts:

loves(misspiggy, kermit) ∨ loves(misspiggy, voiceof(kermit))
''Miss Piggy loves Kermit or Miss Piggy loves Kermit's voice''

Implication combines a condition and conclusion:

loves(misspiggy, voiceof(kermit)) → loves(misspiggy, kermit)
''If Miss Piggy loves Kermit's voice then Miss Piggy loves Kermit''

The language we have described so far contains atoms and the connectives ¬, ∧, ∨ and →. This defines the syntax of Propositional Logic. It is normal to represent atoms in propositional logic as single upper-case letters, but here we have used a more meaningful terminology for the atoms that extends easily to Predicate Logic.

Semantics of Propositional Logic

We have defined the syntax of Propositional Logic. However, this is of no use without talking about the meaning, or semantics, of the sentences. Suppose our logic contained only atoms, i.e. no logical connectives. This logic is very silly because any subset of these atoms is consistent; e.g. beautiful(misspiggy) and ugly(misspiggy) are consistent because we cannot represent
• 20. ugly(misspiggy) → ¬beautiful(misspiggy)

So we now need a way in our logic to define which sentences are true.

Example: Models Define Truth

Suppose a language contains only one object constant misspiggy and two relation constants ugly and beautiful. The following models define different facts about Miss Piggy.

M = ø: In this model Miss Piggy is neither ugly nor beautiful.
M = {ugly(misspiggy)}: In this model Miss Piggy is ugly and not beautiful.
M = {beautiful(misspiggy)}: In this model Miss Piggy is beautiful and not ugly.
M = {ugly(misspiggy), beautiful(misspiggy)}: In this model Miss Piggy is both ugly and beautiful.

The last statement is intuitively wrong, but the model selected determines the truth of the atoms in the language.

Compound Sentences

So far we have restricted our attention to the semantics of atoms: an atom is true if it is a member of the model M; otherwise it is false. Extending the semantics to compound sentences is easy. Notice that in the definitions below p and q do not need to be atoms, because these definitions work recursively until atoms are reached.

Conjunction

p ∧ q is true in M iff p and q are true in M individually. So the conjunction

loves(misspiggy, kermit) ∧ loves(misspiggy, voiceof(kermit))

is true only when both Miss Piggy loves Kermit and Miss Piggy loves Kermit's voice.

Disjunction

p ∨ q is true in M iff at least one of p or q is true in M. So the disjunction

loves(misspiggy, kermit) ∨ loves(misspiggy, voiceof(kermit))

is true whenever
• 21. Miss Piggy loves Kermit; Miss Piggy loves Kermit's voice; or Miss Piggy loves both Kermit and his voice. Therefore the disjunction is weaker than either disjunct and than the conjunction of these disjuncts.

Negation

¬p is true in M iff p is not true in M.

Implication

p → q is true in M iff p is not true in M or q is true in M.

We have been careful about the definition of →. When people use an implication p → q they normally imply that p causes q. So if p is true we are happy to say that p → q is true iff q is true. But if p is false the causal link causes confusion, because we can't tell whether q should be true or not. Logic requires that the connectives are truth functional, and so the truth of the compound sentence must be determined from the truth of its component parts. Logic defines that if p is false then p → q is true regardless of the truth of q. So both of the following implications are true (provided you believe pigs do not fly!):

fly(pigs) → beautiful(misspiggy)
fly(pigs) → ¬beautiful(misspiggy)

Example: Implications and Models

In which of the following models is ugly(misspiggy) → ¬beautiful(misspiggy) true?

M = Ø: Miss Piggy is not ugly and so the antecedent fails. Therefore the implication holds. (Miss Piggy is also not beautiful in this model.)
M = {beautiful(misspiggy)}: Again, Miss Piggy is not ugly and so the implication holds.
M = {ugly(misspiggy)}: Miss Piggy is not beautiful and so the conclusion is valid, and hence the implication holds.
M = {ugly(misspiggy), beautiful(misspiggy)}:
• 22. Miss Piggy is ugly and so the antecedent holds. But she is also beautiful, and so ¬beautiful(misspiggy) is not true. Therefore the conclusion does not hold and so the implication fails in this (and only this) case.

Truth Tables

Truth tables are often used to calculate the truth of complex propositional sentences. A truth table represents all possible combinations of truths of the atoms and so contains all possible models. A column is created for each of the atoms in the sentence, and all combinations of truth values for these atoms are assigned one per row. So if there are n atoms then there are n initial columns and 2^n rows. The final column contains the truth of the sentence for each combination of truths for the atoms. Intervening columns can be added to store intermediate truth calculations. Below are two sample truth tables, for p ∧ q and p → q:

p | q | p ∧ q        p | q | p → q
T | T |   T          T | T |   T
T | F |   F          T | F |   F
F | T |   F          F | T |   T
F | F |   F          F | F |   T

Equivalence

Two sentences are equivalent if they hold in exactly the same models. Therefore we can determine equivalence by drawing truth tables that represent the sentences in the various models. If the initial and final columns of the truth tables are identical then the sentences are equivalent. Examples of equivalences include ¬¬p ≡ p, ¬(p ∧ q) ≡ ¬p ∨ ¬q, ¬(p ∨ q) ≡ ¬p ∧ ¬q, and p → q ≡ ¬p ∨ q.
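Equivalence can also be checked mechanically by enumerating all 2^n rows of the truth table. A small Python sketch, using the De Morgan equivalence listed above and, as a negative test, the non-commutativity of → discussed next:

# Two sentences are equivalent iff they agree in every model, i.e. on
# every row of the truth table.
from itertools import product

def equivalent(f, g, n_atoms):
    return all(f(*row) == g(*row)
               for row in product([True, False], repeat=n_atoms))

lhs = lambda p, q: not (p and q)        # not(p and q)
rhs = lambda p, q: (not p) or (not q)   # (not p) or (not q)
print(equivalent(lhs, rhs, 2))          # True: De Morgan's law

imp = lambda p, q: (not p) or q         # p implies q
rev = lambda p, q: (not q) or p         # q implies p
print(equivalent(imp, rev, 2))          # False: implication not commutative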
• 23. Unlike ∧ and ∨, → is not commutative:

loves(misspiggy, voiceof(kermit)) → loves(misspiggy, kermit)

is very different from

loves(misspiggy, kermit) → loves(misspiggy, voiceof(kermit))

Similarly → is not associative.

Syntax & Semantics for Predicate Logic

Syntax of Predicate Logic

Propositional logic is fairly powerful, but we must add variables and quantification to be able to reason about objects in atoms and express properties of a set of objects without listing the atom corresponding to each object. We shall adopt the Prolog convention that variables have an initial capital letter. (This is contrary to many mathematical logic books, where variables are lower case and constants have an initial capital.) When we include variables we must specify their scope or quantification. The first quantifier we want is the universal quantifier ∀ (for all).

∀X.loves(misspiggy, X)

This allows X to range over all the objects and asserts that Miss Piggy loves each of them. We have introduced one variable but any number is allowed:

∀X∀Y.loves(X, Y)

Each of the objects loves all of the objects, even itself! Therefore ∀X∀Y. is the same as ∀X.∀Y.

Quantifiers, like connectives, act on sentences. So if Miss Piggy loves all cute things (not just Kermit!) we would write

∀C.[cute(C) → loves(misspiggy, C)]

rather than

loves(misspiggy, ∀C.cute(C))

because the second argument to loves must be an object, not a sentence.
• 24. When the world contains a finite set of objects then a universally quantified sentence can be converted into a sentence without the quantifier; e.g.

∀X.loves(misspiggy, X)

becomes

loves(misspiggy, misspiggy) ∧ loves(misspiggy, kermit) ∧ loves(misspiggy, animal) ∧ ...

Contrast this with the infinite set of positive integers and the sentence

∀N.[odd(N) ∨ even(N)]

The other quantifier is the existential quantifier ∃ (there exists).

∃X.loves(misspiggy, X)

This allows X to range over all the objects and asserts that Miss Piggy loves (at least) one of them. Similarly

∃X∃Y.loves(X, Y)

asserts that there is at least one loving couple (or self-loving object).

We shall be using First Order Predicate Logic, where quantified variables range over object constants only. We are defining Second Order Predicate Logic if we allow quantified variables to range over functions or predicates as well; e.g.

∃X.loves(misspiggy, X(kermit)) includes loves(misspiggy, voiceof(kermit))
∃X.X(misspiggy, kermit) (there exists some relationship linking Miss Piggy and Kermit!)

Semantics of First Order Predicate Logic

Now we must deal with quantification.

∀: ∀X.p(X) holds in a model iff p(z) holds for all objects z in our domain.
∃: ∃X.p(X) holds in a model iff there is some object z in our domain such that p(z) holds.

Example: Available Objects Affect Quantification

If misspiggy is the only object in our domain then ugly(misspiggy) → ¬beautiful(misspiggy) is equivalent to
• 25. ∀X.[ugly(X) → ¬beautiful(X)]

If there were other objects then there would be more atoms and so the set of models would be larger; e.g. with objects misspiggy and kermit the possible models are all combinations of the atoms ugly(misspiggy), beautiful(misspiggy), ugly(kermit), beautiful(kermit). Now the two sentences are no longer equivalent.

1). In every model in which ∀X.[ugly(X) → ¬beautiful(X)] holds, ugly(misspiggy) → ¬beautiful(misspiggy) also holds.
2). There are models in which ugly(misspiggy) → ¬beautiful(misspiggy) holds but ∀X.[ugly(X) → ¬beautiful(X)] does not hold; e.g. M = {ugly(kermit), beautiful(kermit)}. What about M = {ugly(misspiggy), beautiful(misspiggy)}?

Clausal Form for Predicate Calculus

In order to prove a formula in the predicate calculus by resolution, we:

1. Negate the formula.
2. Put the negated formula into CNF, by doing the following:
   i. Get rid of all → operators.
   ii. Push the ¬ operators in as far as possible.
   iii. Rename variables as necessary (see the step below).
   iv. Move all of the quantifiers to the left (the outside) of the expression using the following rules (where Q is either ∀ or ∃ and G is a formula that does not contain x):

   (Qx.F(x)) ∧ G ≡ Qx.(F(x) ∧ G)
   (Qx.F(x)) ∨ G ≡ Qx.(F(x) ∨ G)
• 26. This leaves the formula in what is called prenex form, which consists of a series of quantifiers followed by a quantifier-free formula, called the matrix.

   v. Remove all quantifiers from the formula. First we remove the existentially quantified variables by using Skolemization. Each existentially quantified variable, say x, is replaced by a function term which begins with a new, n-ary function symbol, say f, where n is the number of universally quantified variables that occur before x is quantified in the formula. The arguments to the function term are precisely these variables. For example, if we have the formula

   ∀x.∀y.∃z. p(x, y, z)

   then z would be replaced by a function term f(x, y), where f is a new function symbol. The result is:

   ∀x.∀y. p(x, y, f(x, y))

   This new formula is satisfiable if and only if the original formula is satisfiable. The new function symbol is called a Skolem function. If the existentially quantified variable has no preceding universally quantified variables, then the function is a 0-ary function and is often called a Skolem constant. After removing all existential quantifiers, we simply drop all the universal quantifiers, as we assume that any variable appearing in a formula is universally quantified.

   vi. The remaining formula (the matrix) is put in CNF by moving the ∧ operators outside of the ∨ operators (distributing ∨ over ∧).

3. Finally, the CNF formula is written in clausal format by writing each conjunct as a set of literals (a clause), and the whole formula as a set of clauses (the clause set).

For example, beginning with a proposition to prove, we would negate the theorem; push the ¬ operators in; rename variables if necessary; move the quantifiers to the outside; Skolemize the existentially quantified variables (where they have no universally quantified variables to their left, they are replaced by Skolem constants); drop the universal quantifiers; put the matrix into CNF; and write the result in clausal form.
• 27. Inference Rules

Complex deductive arguments can be judged valid or invalid based on whether or not the steps in that argument follow the nine basic rules of inference. These rules of inference are all relatively simple, although when presented in formal terms they can look overly complex.

Conjunction:
1. P
2. Q
3. Therefore, P and Q.

1. It is raining in New York.
2. It is raining in Boston.
3. Therefore, it is raining in both New York and Boston.
• 28. Simplification:
1. P and Q.
2. Therefore, P.

1. It is raining in both New York and Boston.
2. Therefore, it is raining in New York.

Addition:
1. P
2. Therefore, P or Q.

1. It is raining.
2. Therefore, either it is raining or the sun is shining.

Absorption:
1. If P, then Q.
2. Therefore, if P then P and Q.

1. If it is raining, then I will get wet.
2. Therefore, if it is raining, then it is raining and I will get wet.

Modus Ponens:
1. If P then Q.
2. P.
3. Therefore, Q.

1. If it is raining, then I will get wet.
2. It is raining.
3. Therefore, I will get wet.

Modus Tollens:
1. If P then Q.
2. Not Q. (~Q)
3. Therefore, not P. (~P)

1. If it had rained this morning, I would have gotten wet.
2. I did not get wet.
3. Therefore, it did not rain this morning.

Hypothetical Syllogism:
• 29. 1. If P then Q.
2. If Q then R.
3. Therefore, if P then R.

1. If it rains, then I will get wet.
2. If I get wet, then my shirt will be ruined.
3. Therefore, if it rains, then my shirt will be ruined.

Disjunctive Syllogism:
1. Either P or Q.
2. Not P. (~P)
3. Therefore, Q.

1. Either it rained or I took a cab to the movies.
2. It did not rain.
3. Therefore, I took a cab to the movies.

Constructive Dilemma:
1. (If P then Q) and (If R then S).
2. P or R.
3. Therefore, Q or S.

1. If it rains, then I will get wet, and if it is sunny, then I will be dry.
2. Either it will rain or it will be sunny.
3. Therefore, either I will get wet or I will be dry.

The above rules of inference, when combined with the rules of replacement, mean that propositional calculus is "complete." Propositional calculus is simply another name for formal logic.

Resolution

Resolution is a rule of inference leading to a refutation theorem-proving technique for sentences in propositional logic and first-order logic. In other words, iteratively applying the resolution rule in a suitable way allows for telling whether a propositional formula is satisfiable and for proving that a first-order formula is unsatisfiable; this method may prove the satisfiability of a first-order satisfiable formula, but not always, as is the case for all methods for first-order logic. Resolution was introduced by John Alan Robinson in 1965.

Resolution in propositional logic

The resolution rule in propositional logic is a single valid inference rule that produces a new clause implied by two clauses containing complementary literals. A literal is a propositional variable or the negation of a propositional variable. Two literals are said to be complements if
• 30. one is the negation of the other (in the following, ai is taken to be the complement to bj). The resulting clause contains all the literals that do not have complements. Formally:

    a1 ∨ ... ∨ ai ∨ ... ∨ an,    b1 ∨ ... ∨ bj ∨ ... ∨ bm
    ---------------------------------------------------------------------
    a1 ∨ ... ∨ a(i-1) ∨ a(i+1) ∨ ... ∨ an ∨ b1 ∨ ... ∨ b(j-1) ∨ b(j+1) ∨ ... ∨ bm

where all the a's and b's are literals, ai is the complement to bj, and the dividing line stands for "entails". The clause produced by the resolution rule is called the resolvent of the two input clauses.

When the two clauses contain more than one pair of complementary literals, the resolution rule can be applied (independently) for each such pair. However, only the pair of literals that are resolved upon can be removed: all other pairs of literals remain in the resolvent clause.

A resolution technique

When coupled with a complete search algorithm, the resolution rule yields a sound and complete algorithm for deciding the satisfiability of a propositional formula, and, by extension, the validity of a sentence under a set of axioms. This resolution technique uses proof by contradiction and is based on the fact that any sentence in propositional logic can be transformed into an equivalent sentence in conjunctive normal form. The steps are as follows:

1). All sentences in the knowledge base and the negation of the sentence to be proved (the conjecture) are conjunctively connected.
2). The resulting sentence is transformed into conjunctive normal form, with the conjuncts viewed as elements in a set, S, of clauses.
3). The resolution rule is applied to all possible pairs of clauses that contain complementary literals. After each application of the resolution rule, the resulting sentence is simplified by removing repeated literals. If the sentence contains complementary literals, it is discarded (as a tautology). If not, and if it is not yet present in the clause set S, it is added to S, and is considered for further resolution inferences.
• 31. 4). If after applying a resolution rule the empty clause is derived, the complete formula is unsatisfiable (or contradictory), and hence it can be concluded that the initial conjecture follows from the axioms.
5). If, on the other hand, the empty clause cannot be derived, and the resolution rule cannot be applied to derive any more new clauses, the conjecture is not a theorem of the original knowledge base.

One instance of this algorithm is the original Davis–Putnam algorithm, which was later refined into the DPLL algorithm, removing the need for explicit representation of the resolvents.

This description of the resolution technique uses a set S as the underlying data structure to represent resolution derivations. Lists, trees and directed acyclic graphs are other possible and common alternatives. Tree representations are more faithful to the fact that the resolution rule is binary. Together with a sequent notation for clauses, a tree representation also makes it clear how the resolution rule is related to a special case of the cut rule, restricted to atomic cut-formulas. However, tree representations are not as compact as set or list representations, because they explicitly show redundant subderivations of clauses that are used more than once in the derivation of the empty clause. Graph representations can be as compact in the number of clauses as list representations, and they also store structural information regarding which clauses were resolved to derive each resolvent.

Example

(a ∨ b), (¬a ∨ c) ⊢ (b ∨ c)

In English: if a or b is true, and a is false or c is true, then either b or c is true. If a is true, then for the second premise to hold, c must be true. If a is false, then for the first premise to hold, b must be true. So regardless of a, if both premises hold, then b or c is true.
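A hedged Python sketch of this refutation procedure follows, with clauses represented as frozensets of string literals ("~" marking negation). The clause set encodes the example above, with the negated conjecture ¬b, ¬c added; the representation is an illustrative choice, not the only one.

# Propositional resolution by refutation: derive the empty clause from
# the knowledge base plus the negated conjecture.
def complement(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    # All clauses obtained by resolving c1 with c2 on one literal pair.
    return [(c1 - {lit}) | (c2 - {complement(lit)})
            for lit in c1 if complement(lit) in c2]

def unsatisfiable(clauses):
    clauses = set(clauses)
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                if c1 != c2:
                    for r in resolvents(c1, c2):
                        if not r:
                            return True   # empty clause: contradiction
                        new.add(r)
        if new <= clauses:
            return False                  # nothing new: no refutation
        clauses |= new

kb = [frozenset({"a", "b"}), frozenset({"~a", "c"}),
      frozenset({"~b"}), frozenset({"~c"})]  # negated conjecture: ~b, ~c
print(unsatisfiable(kb))  # True, so (b or c) follows from the premises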
• 32. Unification

We also need some way of binding variables to values in a consistent way so that components of sentences can be matched. This is the process of unification.

Knowledge Representation

Network Representations

Networks are often used in artificial intelligence as schemes for representation. One of the advantages of using a network representation is that theorists in computer science have studied such structures in detail, and there are a number of efficient and robust algorithms that may be used to manipulate the representations.

Trees and Graphs

A tree is a collection of nodes in which each node may be expanded into one or more unique subnodes until termination occurs. There may be no termination, in which case an infinite tree results. A graph is a more general structure in which non-unique nodes may be generated; in other words, a tree is a graph with no loops. The representation of the nodes and links is arbitrary. In a computer chess player, for example, nodes might represent individual board positions and the links from each node the legal moves from that position. This is a specific instance of a problem space. In general, problem spaces are graphs in which the nodes represent states and the connections between states are represented by operators that make the state transformations.

IS-A Links and Semantic Networks

In constructing concept hierarchies, often the most important means of showing inclusion in a set is to use what is called an IS-A link, in which X is a member of some more general set Y. For example, a DOG ISA MAMMAL. As one travels up the link, the more general concept is reached. This is generally the simplest type of link between concepts in concept or semantic hierarchies. The combination of instances and classes connected by ISA links in a graph or tree is generally known as a semantic network. Semantic networks are useful, in part, because they provide a natural structure for inheritance. For instance, if a DOG ISA MAMMAL then those properties that are true for MAMMALs need not be specified for the DOG; instead they may be derived via an inheritance procedure. This greatly reduces the amount of information that must be stored explicitly, although there is an increase in the time required to access knowledge through the inheritance mechanism. Frames are a special type of semantic network representation.

Associative Network

A means of representing relational knowledge as a labeled directed graph. Each vertex of the graph represents a concept and each label represents a relation between concepts. Access and updating procedures traverse and manipulate the graph. A semantic network is sometimes regarded as a graphical notation for logical formulas.

Conceptual Graphs

A conceptual graph (CG) is a graph representation for logic based on the semantic networks of artificial intelligence.
• 33. A conceptual graph consists of concept nodes and relation nodes.
• The concept nodes represent entities, attributes, states, and events.
• The relation nodes show how the concepts are interconnected.

Conceptual graphs are finite, connected, bipartite graphs.

Finite: because any graph (in 'human brain' or 'computer storage') can only have a finite number of concepts and conceptual relations.
Connected: because two parts that are not connected would simply be called two conceptual graphs.
Bipartite: because there are two different kinds of nodes, concepts and conceptual relations, and every arc links a node of one kind to a node of the other kind.

Example

The following describes the CG display form for "John is going to Boston by bus." The conceptual graph in the figure represents a typed or sorted version of logic. Each of the four concepts has a type label, which represents the type of entity the concept refers to: Person, Go, Boston, or Bus. Two of the concepts have names, which identify the referent: John or Boston. Each of the three conceptual relations has a type label that represents the type of relation: agent (Agnt), destination (Dest), or instrument (Inst). The CG as a whole indicates that the person John is the agent of some instance of going, the city Boston is the destination, and a bus is the instrument. Figure 1 can be translated to the following formula:

∃x.∃y.(Go(x) ∧ Person(John) ∧ City(Boston) ∧ Bus(y) ∧ Agnt(x, John) ∧ Dest(x, Boston) ∧ Inst(x, y))

As this translation shows, the only logical operators used in the figure are conjunction and the existential quantifier. Those two operators are the most common in translations from natural languages, and many of the early semantic networks could not represent any others.
• 34. Structured Representation

Structured representation can be done in various ways, such as:
• Frames
• Scripts

Frames

A frame is a method of representation in which a particular class is defined by a number of attributes (or slots) with certain values (the attributes are filled in for each instance). Thus, frames are also known as slot-and-filler structures. Frame systems are roughly equivalent to semantic networks, although frames are usually associated with more defined structure than the networks. Like a semantic network, one of the chief properties of frames is that they provide a natural structure for inheritance. ISA links connect classes to larger parent classes, and properties of the subclasses may be determined both at the level of the class itself and from parent classes.

This leads into the idea of defaults. Frames may indicate specific values for some attributes or instead indicate a default. This is especially useful when values are not always known but can generally be assumed to be true for most of the class. For example, the class BIRD may have a default value of FLIES set to TRUE even though instances below it (say, for example, an OSTRICH) have FLIES values of FALSE.

In addition, the value of a particular attribute need not necessarily be filled in but may instead indicate a procedure to run to obtain a value. This is known as an attached procedure. Attached procedures are especially useful when there is a high cost associated with computing a particular value, when the value changes with time, or when the expected access frequency is low. Instead of computing the value for each instance, the values are computed only when needed. However, this computation is run during execution (rather than during the establishment of the frame network) and may be costly.
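A small Python sketch of frames as slot-and-filler structures with inheritance and defaults, mirroring the BIRD/OSTRICH example above; the class and slot names are illustrative assumptions.

# Frames with ISA inheritance: a slot is looked up locally first, then
# up the parent chain, so defaults can be overridden by subclasses.
class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        if slot in self.slots:          # value stored on this frame
            return self.slots[slot]
        if self.parent is not None:     # otherwise inherit via ISA link
            return self.parent.get(slot)
        raise KeyError(slot)

bird = Frame("BIRD", flies=True, legs=2)              # default: FLIES = TRUE
ostrich = Frame("OSTRICH", parent=bird, flies=False)  # override the default
tweety = Frame("TWEETY", parent=bird)

print(tweety.get("flies"))   # True, inherited from BIRD
print(ostrich.get("flies"))  # False, overridden locally
print(ostrich.get("legs"))   # 2, inherited from BIRD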
• 35. Scripts

A script is a remembered precedent, consisting of tightly coupled, expectation-suggesting primitive-action and state-change frames. A script is a structured representation describing a stereotyped sequence of events in a particular context. That is, scripts extend frames by explicitly representing expectations of actions and state changes.

Why represent knowledge in this way?

1) Because real-world events do follow stereotyped patterns. Human beings use previous experiences to understand verbal accounts; computers can use scripts instead.
2) Because people, when relating events, do leave large amounts of assumed detail out of their accounts. People don't find it easy to converse with a system that can't fill in missing conversational detail.

Min-Max Algorithm

There are plenty of applications for AI, but games are the most interesting to the public. Nowadays every major OS comes with some games. So it is no surprise that there are some algorithms that were devised with games in mind.

The Min-Max algorithm is applied in two-player games, such as tic-tac-toe, checkers, chess, go, and so on. All these games have at least one thing in common: they are logic games. This means that they can be described by a set of rules and premises. With them, it is possible to know, from a given point in the game, what the next available moves are. So they also share another characteristic: they are "full information games". Each player knows everything about the possible moves of the adversary.

Before explaining the algorithm, a brief introduction to search trees is required. Search trees are a way to represent searches. The squares are known as nodes and they represent points of decision in the search. The nodes are connected with branches. The search starts at the root node, the one at the top of the figure. At each decision point, nodes for the available search paths are generated, until no more decisions are possible. The nodes that represent the end of the search are known as leaf nodes.

There are two players involved, MAX and MIN. A search tree is generated, depth-first, starting with the current game position up to the end game position. Then, the final game positions are evaluated from MAX's point of view, as shown in Figure 1. Afterwards, the inner node values of the tree are filled bottom-up with the evaluated values. The nodes that belong to the MAX player receive the maximum value of their children. The nodes for the MIN player will select the minimum value of their children. The algorithm in pseudocode:
• 36. MinMax (GamePosition game) {
    return MaxMove(game);
}

MaxMove (GamePosition game) {
    if (GameEnded(game)) {
        return EvalGameState(game);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach moves {
            move <- MinMove(ApplyMove(game));
            if (Value(move) > Value(best_move)) {
                best_move <- move;
            }
        }
        return best_move;
    }
}

MinMove (GamePosition game) {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach moves {
        move <- MaxMove(ApplyMove(game));
        // MIN keeps the move with the lowest value
        if (Value(move) < Value(best_move)) {
            best_move <- move;
        }
    }
    return best_move;
}

So what is happening here? The values represent how good a game move is. So the MAX player will try to select the move with the highest value in the end. But the MIN player also has something to say about it, and he will try to select the moves that are better for him, thus minimizing MAX's outcome.

Optimisation

However, only very simple games can have their entire search tree generated in a short time. For most games this isn't possible; the universe would probably vanish first. So there are a few optimizations to add to the algorithm.

First a word of caution: optimization comes with a price. When optimizing we are trading the full information about the game's events for probabilities and shortcuts. Instead of knowing the full path that leads to victory, the decisions are made with the path that might lead to victory. If the optimization isn't well chosen, or it is badly applied, then we could end up with a dumb AI, and it would have been better to use random moves.
• 37. One basic optimization is to limit the depth of the search tree. Why does this help? Generating the full tree could take ages. If a game has a branching factor of 3, which means that each node has three children, the tree will have 1, 3, 9, 27, ... nodes per depth, i.e. at depth n the tree has 3^n nodes. To know the total number of generated nodes, we need to sum the node count at each level, so the total number of nodes for a tree with depth n is 3^0 + 3^1 + ... + 3^n. For many games, like chess, that have a very big branching factor, this means that the tree might not fit into memory. Even if it did, it would take too long to generate. If each node took 1 s to be analyzed, then for a search tree with depth 5 that would mean 1 + 3 + 9 + 27 + 81 + 243 = 364 nodes, hence 364 s, about 6 minutes! This is too long for a game. The player would give up playing the game if he had to wait 6 minutes for each move from the computer.

The second optimization is to use a function that evaluates the current game position from the point of view of some player. It does this by giving a value to the current state of the game, like counting the number of pieces on the board, for example, or the number of moves left to the end of the game, or anything else that we might use to give a value to the game position. Instead of evaluating the current game position, the function might calculate how the current game position might help in ending the game, or, in other words, how probable it is that, given the current game position, we might win the game. In this case the function is known as an estimation function.

This function will have to take into account some heuristics. Heuristics are knowledge that we have about the game, and they can help generate better evaluation functions. For example, in checkers, pieces at corners and side positions can't be captured. So we can create an evaluation function that gives higher values to pieces that lie on those board positions, thus giving higher outcomes for game moves that place pieces in those positions.

One of the reasons that the evaluation function must be able to evaluate game positions for both players is that you don't know to which player the depth limit belongs. However, having two functions can be avoided if the game is symmetric, meaning that the loss of a player equals the gains of the other. Such games are also known as ZERO-SUM games. For these games one evaluation function is enough; one of the players just has to negate the return value of the function.
The revised algorithm is:

MinMax (GamePosition game) {
  return MaxMove (game);
}

MaxMove (GamePosition game) {
  if (GameEnded(game) || DepthLimitReached()) {
    return EvalGameState(game, MAX);
  } else {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach move in moves {
      move <- MinMove(ApplyMove(game, move));
      if (Value(move) > Value(best_move)) {
        best_move <- move;
      }
    }
    return best_move;
  }
}

MinMove (GamePosition game) {
  if (GameEnded(game) || DepthLimitReached()) {
    return EvalGameState(game, MIN);
  } else {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach move in moves {
      move <- MaxMove(ApplyMove(game, move));
      // MIN selects the move with the lowest value
      if (Value(move) < Value(best_move)) {
        best_move <- move;
      }
    }
    return best_move;
  }
}

Even so, the algorithm has a few flaws; some of them can be fixed, while others can only be solved by choosing another algorithm. One of the flaws is that if the game is too complex, the answer will always take too long, even with a depth limit. One solution is to limit the time available for the search: if the time runs out, choose the best move found so far.
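The time-limit idea can be layered on top of the depth limit by searching progressively deeper and keeping the best answer from the last completed depth, a scheme often called iterative deepening. The sketch below is an assumption-laden illustration that reuses the assumed game interface and the evaluate() function from the earlier sketches.

import math
import time

def depth_limited(game, depth, maximizing):
    # Depth-limited MinMax, mirroring the revised pseudocode above;
    # evaluate() and the game interface are the same assumed names
    # used in the earlier sketches.
    if game.is_over() or depth == 0:
        return evaluate(game), None
    best = -math.inf if maximizing else math.inf
    best_move = None
    for move in game.legal_moves():
        value, _ = depth_limited(game.apply_move(move), depth - 1,
                                 not maximizing)
        if (maximizing and value > best) or (not maximizing and value < best):
            best, best_move = value, move
    return best, best_move

def timed_best_move(game, seconds):
    """Search depth 1, 2, 3, ...; when time runs out, keep the best so far.
    A real implementation would also abort mid-search at the deadline."""
    deadline = time.monotonic() + seconds
    best, depth = None, 1
    while time.monotonic() < deadline:
        _, best = depth_limited(game, depth, True)
        depth += 1
    return best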
A big flaw is the limited-horizon problem. A game position that appears to be very good might turn out to be very bad. This happens because the algorithm wasn't able to see that, a few game moves ahead, the adversary would be able to make a move with a great outcome for him. The algorithm missed that fatal move because it was blinded by the depth limit.

Speeding Up the Algorithm

There are a few things that can still be done to reduce the search time. Take a look at figure 2. The value for node A is 3, and the first value found for the subtree starting at node B is 2. Since node B is at a MIN level, the selected value for node B must be less than or equal to 2. But we also know that node A has the value 3, and both the A and B nodes share the same parent at a MAX level. This means that the game path starting at node B would never be selected, because 3 is better than 2 for the MAX node. So it isn't worth pursuing the search for the children of node B, and we can safely ignore all its remaining children.

In other words, sometimes the search can be aborted because we find out that the search subtree won't lead us to any viable answer. This optimization is known as alpha-beta cutoffs, and the algorithm is as follows:

1. Pass two values around the tree nodes:
   i) the alpha value, which holds the best MAX value found;
   ii) the beta value, which holds the best MIN value found.
2. At a MAX level, before evaluating each child path, compare the value returned by the previous path with the beta value. If the value is greater than or equal to beta, abort the search for the current node.
3. At a MIN level, before evaluating each child path, compare the value returned by the previous path with the alpha value. If the value is less than or equal to alpha, abort the search for the current node.

Full pseudocode for MinMax with alpha-beta cutoffs:

MinMax (GamePosition game) {
  return MaxMove (game, -INFINITY, +INFINITY);
}
MaxMove (GamePosition game, Integer alpha, Integer beta) {
  if (GameEnded(game) || DepthLimitReached()) {
    return EvalGameState(game, MAX);
  } else {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach move in moves {
      move <- MinMove(ApplyMove(game, move), alpha, beta);
      if (Value(move) > Value(best_move)) {
        best_move <- move;
        alpha <- Value(move);
      }
      // Cutoff: ignore the remaining moves
      if (alpha >= beta) {
        return best_move;
      }
    }
    return best_move;
  }
}

MinMove (GamePosition game, Integer alpha, Integer beta) {
  if (GameEnded(game) || DepthLimitReached()) {
    return EvalGameState(game, MIN);
  } else {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach move in moves {
      move <- MaxMove(ApplyMove(game, move), alpha, beta);
      // MIN selects the move with the lowest value
      if (Value(move) < Value(best_move)) {
        best_move <- move;
        beta <- Value(move);
      }
      // Cutoff: ignore the remaining moves
      if (beta <= alpha) {
        return best_move;
      }
    }
    return best_move;
  }
}
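For comparison, here is a minimal runnable sketch of depth-limited alpha-beta in Python. The game interface (legal_moves, apply_move, is_over) and the evaluate() heuristic are the same assumed names used in the earlier sketches, not part of the original pseudocode; the pruning tests mirror steps 2 and 3 above.

import math

def alphabeta(game, depth, alpha, beta, maximizing):
    """Return (value, move), pruning branches that cannot affect the result."""
    if game.is_over() or depth == 0:
        return evaluate(game), None   # evaluate() is the assumed heuristic
    best_move = None
    if maximizing:
        best = -math.inf
        for move in game.legal_moves():
            value, _ = alphabeta(game.apply_move(move), depth - 1,
                                 alpha, beta, False)
            if value > best:
                best, best_move = value, move
            alpha = max(alpha, best)
            if alpha >= beta:        # beta cutoff: MIN will never allow this
                break
        return best, best_move
    else:
        best = math.inf
        for move in game.legal_moves():
            value, _ = alphabeta(game.apply_move(move), depth - 1,
                                 alpha, beta, True)
            if value < best:
                best, best_move = value, move
            beta = min(beta, best)
            if beta <= alpha:        # alpha cutoff: MAX will never allow this
                break
        return best, best_move

# Typical call:
# alphabeta(start, depth=6, alpha=-math.inf, beta=math.inf, maximizing=True)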
How much better does MinMax with alpha-beta cutoffs behave compared with a plain MinMax? It depends on the order in which the moves are searched. If the way the game positions are generated doesn't create situations where the algorithm can take advantage of alpha-beta cutoffs, then the improvements won't be noticeable. However, if the evaluation function and the generation of game positions lead to alpha-beta cutoffs, then the improvements can be great.

Alpha-Beta Cutoff

With all this talk about search speed, many of you might be wondering what this is all about. Well, search speed is very important in AI, because if an algorithm takes too long to give a good answer it may not be usable at all. For example, a good MinMax implementation with an evaluation function capable of giving very good estimates might be able to search 1000 positions a second. In tournament chess each player has around 150 seconds to make a move, so such a program would be able to analyze about 150 000 positions in that period. But in chess each move has around 35 possible branches! In the end the program would only be able to analyze around 3 to 4 moves ahead in the game; even humans with very little practice in chess can do better than that. But if we use MinMax with alpha-beta cutoffs, again with a decent implementation and a good evaluation function, the resulting behavior can be much better: the program might be able to double the number of analyzed positions, becoming a much tougher adversary.

Example

Example of a board with the values estimated for each position. The game uses MinMax with alpha-beta cutoffs for the computer moves. The evaluation function is a weighted average of the positions occupied by the checker pieces. The figure shows the values for each board position; the value of each board position is multiplied by the type of the piece that rests on it, as described in the first table.

Rule based Expert System
Expert System

"An expert system is an interactive computer-based decision tool that uses both facts and heuristics to solve difficult decision problems based on knowledge acquired from an expert."

An expert system is a computer program that simulates the thought process of a human expert to solve complex decision problems in a specific domain. This chapter addresses the characteristics of expert systems that make them different from conventional programming and traditional decision support tools. The growth of expert systems is expected to continue for several years, and with that growth many new and exciting applications will emerge. An expert system operates as an interactive system that responds to questions, asks for clarification, makes recommendations, and generally aids the decision-making process. Expert systems provide expert advice and guidance in a wide variety of activities, such as computer diagnosis.

An expert system may be viewed as a computer simulation of a human expert. Expert systems are an emerging technology with many areas of potential application. Past applications range from MYCIN, used in the medical field to diagnose infectious blood diseases, to XCON, used to configure computer systems. These expert systems have proven to be quite successful. Most applications of expert systems will fall into one of the following categories:

• Interpreting and identifying
• Predicting
• Diagnosing
• Designing
• Planning
• Monitoring
• Debugging and testing
• Instructing and training
• Controlling

Applications that are computational or deterministic in nature are not good candidates for expert systems. Traditional decision support systems such as spreadsheets are very mechanistic in the way they solve problems: they operate under mathematical and Boolean operators and arrive at one and only one static solution for a given set of data. Calculation-intensive applications with very exacting requirements are better handled by traditional decision support tools or conventional programming. The best application candidates for expert systems are those dealing with expert heuristics for solving problems. Conventional computer programs are based on factual knowledge, an indisputable strength of computers. Humans, by contrast, solve problems on the basis of a mixture of factual and heuristic knowledge. Heuristic knowledge, composed of intuition, judgment, and logical inferences, is an indisputable strength of humans. Successful expert systems are those that combine facts and heuristics, merging human knowledge with computer power to solve problems. To be effective, an expert system must focus on a particular problem domain, as discussed below.

Domain Specificity
Expert systems are typically very domain specific. For example, a diagnostic expert system for troubleshooting computers must actually perform all the necessary data manipulation as a human expert would. The developer of such a system must limit the scope of the system to just what is needed to solve the target problem. Special tools or programming languages are often needed to accomplish the specific objectives of the system.

Special Programming Languages

Expert systems are typically written in special programming languages. The use of languages like LISP and PROLOG in the development of an expert system simplifies the coding process. The major advantage of these languages, compared to conventional programming languages, is the ease of adding, eliminating, or substituting rules, together with their memory-management capabilities. Some of the distinguishing characteristics of programming languages needed for expert systems work are:

• Efficient mix of integer and real variables
• Good memory-management procedures
• Extensive data-manipulation routines
• Incremental compilation
• Tagged memory architecture
• Optimization of the systems environment
• Efficient search procedures

Architecture of Expert System

Expert systems typically contain the following four components:

• Knowledge-Acquisition Interface
• User Interface
• Knowledge Base
• Inference Engine

This architecture differs considerably from that of traditional computer programs, resulting in several characteristics of expert systems.
Expert System Components

Knowledge-Acquisition Interface

The knowledge-acquisition interface controls how the expert and the knowledge engineer interact with the program to incorporate knowledge into the knowledge base. It includes features to assist experts in expressing their knowledge in a form suitable for reasoning by the computer. This process of expressing knowledge in the knowledge base is called knowledge acquisition. Knowledge acquisition turns out to be quite difficult in many cases -- so difficult that some authors refer to the "knowledge acquisition bottleneck" to indicate that it is this aspect of expert system development which often requires the most time and effort.
Debugging faulty knowledge bases is facilitated by traces (lists of rules in the order they were fired), probes (commands to find and edit specific rules, facts, and so on), and bookkeeping functions and indexes (which keep track of various features of the knowledge base, such as variables and rules). Some rule-based expert system shells for personal computers monitor data entry, checking the syntactic validity of rules. Expert systems are typically validated by testing their predictions for several cases against those of human experts. Case facilities -- permitting a file of such cases to be stored and automatically evaluated after the program is revised -- can greatly speed the validation process. Many features that are useful for the user interface, such as on-screen help and explanations, are also of benefit to the developer of expert systems and are also part of knowledge-acquisition interfaces. Expert systems in the literature demonstrate a wide range of modes of knowledge acquisition (Buchanan, 1985). Expert system shells on microcomputers typically require the user either to enter rules explicitly or to enter several examples of cases with appropriate conclusions, from which the program will infer a rule.

User Interface

The user interface is the part of the program that interacts with the user. It prompts the user for information required to solve a problem, displays conclusions, and explains its reasoning. Features of the user interface often include:

• Doesn't ask "dumb" questions
• Explains its reasoning on request
• Provides documentation and references
• Defines technical terms
• Permits sensitivity analyses, simulations, and what-if analyses
• Detailed report of recommendations
• Justifies recommendations
• Online help
• Graphical displays of information
• Trace or step through reasoning

The user interface can be judged by how well it reproduces the kind of interaction one might expect between a human expert and someone consulting that expert.

Knowledge Base

The knowledge base consists of specific knowledge about some substantive domain. A knowledge base differs from a database in that the knowledge base includes both explicit knowledge and implicit knowledge. Much of the knowledge in the knowledge base is not stated explicitly but is inferred by the inference engine from explicit statements in the knowledge base. This gives knowledge bases more compact storage than databases, together with the power to represent all the knowledge implied by the explicit statements. A small sketch of this idea follows below.
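As a hedged illustration of explicit versus inferred knowledge, the sketch below implements a tiny forward-chaining inference step over if-then rules. The rules, facts, and function names are invented for the example and are not from any particular expert system shell.

# Facts are explicit knowledge; the loop derives the implicit knowledge
# that the rules imply. Rules are (set_of_premises, conclusion) pairs.

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                       # keep firing until nothing new
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)    # inferred (implicit) knowledge
                changed = True
    return facts

rules = [
    ({"has_fever", "has_rash"}, "suspect_measles"),   # invented toy rules
    ({"suspect_measles"}, "recommend_lab_test"),
]
print(forward_chain({"has_fever", "has_rash"}, rules))
# -> {'has_fever', 'has_rash', 'suspect_measles', 'recommend_lab_test'}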
There are several important ways in which knowledge is represented in a knowledge base. For more information, see knowledge representation strategies. Knowledge bases can contain many different types of knowledge, and the process of acquiring knowledge for the knowledge base (often called knowledge acquisition) often needs to be quite different depending on the type of knowledge sought.

Types of Knowledge

There are many different kinds of knowledge considered in expert systems. Many of these form dimensions of contrasting knowledge:

• explicit knowledge
• implicit knowledge
• domain knowledge
• common sense or world knowledge
• heuristics
• algorithms
• procedural knowledge
• declarative or semantic knowledge
• public knowledge
• private knowledge
• shallow knowledge
• deep knowledge
• metaknowledge

Inference Engine

The inference engine uses general rules of inference to reason from the knowledge base and draw conclusions that are not explicitly stated but can be inferred from it. Inference engines are capable of symbolic reasoning, not just mathematical reasoning; hence they expand the scope of fruitful applications of computer programs. The specific forms of inference permitted by different inference engines vary, depending on several factors, including the knowledge representation strategies employed by the expert system.

Expert System Development

Most expert systems are developed by a team of people, with the number of members varying with the complexity and scope of the project. Of course, a single individual can develop a very simple system, but usually at least two people are involved. There are two essential roles that must be filled by the development team:
• The Knowledge Engineer
• The Substantive Expert

The Knowledge Engineer

Criteria for selecting the knowledge engineer:
• Competent
• Organized
• Patient

Problems with the knowledge engineer:
• Technician with little social skill
• Sociable with low technical skill
• Disorganized
• Unwilling to challenge the expert to produce clarity
• Unable to listen carefully to the expert
• Undiplomatic when discussing flaws in the system or the expert's knowledge
• Unable to quickly understand diverse substantive areas

The Substantive Expert

Criteria for selecting the expert:
• Competent
• Available
• Articulate
• Self-Confident
• Open-Minded

Varieties of experts:
• No expert
• Multiple experts
• Book knowledge only
• The knowledge engineer is also the expert

Problem experts:
• The unavailable expert
• The reluctant expert
• The cynical expert
• The arrogant expert
• The rambling expert
• The uncommunicative expert
• The too-cooperative expert
• The would-be-knowledge-engineer expert

Development Process

The systems development process often used for traditional software such as management information systems employs a process described as the "System Development Life Cycle" or "Waterfall" model. While this model identifies a number of important tasks in the development process, many developers of expert systems have found it inadequate for expert systems for a number of important reasons. Instead, many expert systems are developed using a process called "Rapid Prototyping and Incremental Development."

System Development Life-Cycle

Problem Analysis
  Is the problem solvable? Is it feasible with this approach? Cost-benefit analysis.
Requirement Specification
  What are the desired features and goals of the proposed system? Who are the users? What constraints must be considered? What development and delivery environments will be used?
Design
  Preliminary design -- overall structure, data flow diagram, perhaps language. Detailed design -- details of each module.
Implementation
  Writing and debugging code, integrating modules, creating interfaces.
Testing
  Comparing the system to its specifications and assessing validity.
Maintenance
  Corrections, modifications, enhancements.

Managing Uncertainty in Expert Systems

Sources of uncertainty in expert systems:
• Weak implication
• Imprecise language
• Unknown data
• Difficulty in combining the views of different experts
Uncertainty in AI:
• Information is partial
• Information is not fully reliable
• The representation language is inherently imprecise
• Information comes from multiple sources and is conflicting
• Information is approximate
• Non-absolute cause-effect relationships exist

Representing uncertain information in expert systems:
• Probabilistic methods
• Certainty factors
• Theory of evidence
• Fuzzy logic
• Neural networks
• Genetic algorithms
• Rough sets

Bayesian Probability Theory

Bayesian probability is one of the most popular interpretations of the concept of probability. The Bayesian interpretation of probability can be seen as an extension of logic that enables reasoning with uncertain statements. To evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated in the light of new relevant data. The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.

Bayesian probability interprets the concept of probability as "a measure of a state of knowledge", in contrast to interpreting it as a frequency or a physical property of a system. Its name is derived from the 18th-century statistician Thomas Bayes, who pioneered some of the concepts. Broadly speaking, there are two views on Bayesian probability that interpret the state-of-knowledge concept in different ways. According to the objectivist view, the rules of Bayesian statistics can be justified by requirements of rationality and consistency and interpreted as an extension of logic. According to the subjectivist view, the state of knowledge measures a "personal belief". Many modern machine learning methods are based on objectivist Bayesian principles.

One of the crucial features of the Bayesian view is that a probability is assigned to a hypothesis, whereas under the frequentist view a hypothesis is typically rejected or not rejected without directly assigning it a probability. The probability of a hypothesis given the data (the posterior) is proportional to the product of the likelihood and the prior probability (often just called the prior).
The likelihood brings in the effect of the data, while the prior specifies the belief in the hypothesis before the data was observed.

More formally, Bayesian inference uses Bayes' formula for conditional probability:

  P(H | D) = P(D | H) * P(H) / P(D)

where H is a hypothesis and D is the data.

P(H) is the prior probability of H: the probability that H is correct before the data D was seen.
P(D | H) is the conditional probability of seeing the data D given that the hypothesis H is true; it is called the likelihood.
P(D) is the marginal probability of D.
P(H | D) is the posterior probability: the probability that the hypothesis is true, given the data and the previous state of belief about the hypothesis.

Stanford Certainty Factor

Uncertainty is represented as a degree of belief in two steps:
• Express the degree of belief
• Manipulate the degrees of belief during the use of knowledge-based systems

It is also based on evidence (or the expert's assessment). Certainty factors in an expert system take the form:

  IF <evidence> THEN <hypothesis> {cf}

where cf represents the belief in hypothesis H given that evidence E has occurred. It is based on two functions:

i) the measure of belief, MB(H, E);
ii) the measure of disbelief, MD(H, E).

These indicate the degree to which belief (or disbelief) in hypothesis H is increased if evidence E is observed.

Uncertain terms and their interpretation: the original slide shows a table mapping verbal certainty terms to certainty-factor values. The total strength of belief and disbelief in a hypothesis is combined into a single certainty factor, commonly defined as

  CF(H, E) = MB(H, E) - MD(H, E)

which ranges from -1 (total disbelief) to +1 (total belief). A sketch of how such factors are combined follows below.
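The following is a minimal sketch of how MYCIN-style certainty factors are commonly combined when two independent rules support the same hypothesis; the combination formulas are the standard ones from the MYCIN literature, but the rule and the numbers are invented for illustration.

# Combining two certainty factors for the same hypothesis drawn from two
# independent pieces of evidence (standard MYCIN-style combination).

def cf_combine(cf1, cf2):
    if cf1 >= 0 and cf2 >= 0:            # both confirm the hypothesis
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:              # both disconfirm the hypothesis
        return cf1 + cf2 * (1 + cf1)
    # Conflicting evidence: dampen by the weaker factor.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Invented example: two rules support the same hypothesis with
# cf = 0.6 and cf = 0.4; combined belief is 0.6 + 0.4 * (1 - 0.6) = 0.76.
print(cf_combine(0.6, 0.4))   # -> 0.76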
Nonmonotonic logic and Reasoning with Beliefs

A non-monotonic logic is a formal logic whose consequence relation is not monotonic. Most studied formal logics have a monotonic consequence relation, meaning that adding a formula to a theory never produces a reduction of its set of consequences. Intuitively, monotonicity indicates that learning a new piece of knowledge cannot reduce the set of what is known. A monotonic logic cannot handle various reasoning tasks such as reasoning by default (consequences may be derived only because of lack of evidence to the contrary), abductive reasoning (consequences are only deduced as most likely explanations), some important approaches to reasoning about knowledge (the ignorance of a consequence must be retracted when the consequence becomes known), and, similarly, belief revision (new knowledge may contradict old beliefs).

Default reasoning

An example of a default assumption is that the typical bird flies. As a result, if a given animal is known to be a bird, and nothing else is known, it can be assumed to be able to fly. The default assumption must however be retracted if it is later learned that the considered animal is a penguin. This example shows that a logic that models default reasoning should not be monotonic. Logics formalizing default reasoning can be roughly divided into two categories: logics able to deal with arbitrary default assumptions (default logic, defeasible logic/defeasible reasoning/argument (logic), and answer set programming) and logics that formalize the specific
default assumption that facts that are not known to be true can be assumed false by default (closed world assumption and circumscription).

Abductive reasoning

Abductive reasoning is the process of deriving the most likely explanations of the known facts. An abductive logic should not be monotonic because the most likely explanations are not necessarily correct. For example, the most likely explanation for seeing wet grass is that it rained; however, this explanation has to be retracted when learning that the real cause of the grass being wet was a sprinkler. Since the old explanation (it rained) is retracted because of the addition of a piece of knowledge (a sprinkler was active), any logic that models explanations is non-monotonic.

Reasoning about knowledge

If a logic includes formulae that mean that something is not known, this logic should not be monotonic. Indeed, learning something that was previously not known leads to the removal of the formula specifying that this piece of knowledge is not known. This second change (a removal caused by an addition) violates the condition of monotonicity. A logic for reasoning about knowledge is the autoepistemic logic.

Belief revision

Belief revision is the process of changing beliefs to accommodate a new belief that might be inconsistent with the old ones. Assuming that the new belief is correct, some of the old ones have to be retracted in order to maintain consistency. This retraction in response to the addition of a new belief makes any logic for belief revision non-monotonic. The belief revision approach is an alternative to paraconsistent logics, which tolerate inconsistency rather than attempting to remove it. What makes belief revision non-trivial is that several different ways of performing the operation may be possible. For example, if the current knowledge includes the three facts “A is true”, “B is true” and “if A and B are true then C is true”, the introduction of the new information “C is false” can be done preserving consistency only by removing at least one of the three facts. In this case, there are at least three different ways of performing the revision; in general, there may be several different ways of changing knowledge.

Fuzzy Logic

The concept of Fuzzy Logic (FL) was conceived by Lotfi Zadeh, a professor at the University of California at Berkeley, and presented not as a control methodology but as a way of processing data by allowing partial set membership rather than crisp set membership or non-membership. This approach to set theory was not applied to control systems until the 70s, due to insufficient small-computer capability prior to that time. Professor Zadeh reasoned that people do not require precise, numerical information input, and yet they are capable of highly adaptive control. If feedback controllers could be programmed to accept noisy, imprecise input, they would be much
more effective and perhaps easier to implement. Unfortunately, U.S. manufacturers have not been quick to embrace this technology, while the Europeans and Japanese have been aggressively building real products around it.

WHAT IS FUZZY LOGIC?

In this context, FL is a problem-solving control system methodology that lends itself to implementation in systems ranging from simple, small, embedded micro-controllers to large, networked, multi-channel PC- or workstation-based data acquisition and control systems. It can be implemented in hardware, software, or a combination of both. FL provides a simple way to arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy, or missing input information. FL's approach to control problems mimics how a person would make decisions, only much faster.

HOW IS FL DIFFERENT FROM CONVENTIONAL CONTROL METHODS?

FL incorporates a simple, rule-based IF X AND Y THEN Z approach to solving a control problem, rather than attempting to model the system mathematically. The FL model is empirically based, relying on an operator's experience rather than on a technical understanding of the system. For example, rather than dealing with temperature control in terms such as "SP = 500F", "T < 1000F", or "210C < TEMP < 220C", terms like "IF (process is too cool) AND (process is getting colder) THEN (add heat to the process)" or "IF (process is too hot) AND (process is heating rapidly) THEN (cool the process quickly)" are used. These terms are imprecise and yet very descriptive of what must actually happen. Consider what you do in the shower if the temperature is too cold: you make the water comfortable very quickly with little trouble. FL is capable of mimicking this type of behavior, but at a very high rate.

HOW DOES FL WORK?

FL requires some numerical parameters in order to operate, such as what is considered a significant error and a significant rate of change of error, but exact values of these numbers are usually not critical unless very responsive performance is required, in which case empirical tuning would determine them. For example, a simple temperature control system could use a single temperature feedback sensor, whose data is subtracted from the command signal to compute "error" and then time-differentiated to yield the error slope or rate of change of error, hereafter called "error-dot". Error might have units of degrees F, with a small error considered to be 2F and a large error 5F. The "error-dot" might then have units of degrees per minute, with a small error-dot being 5F/min and a large one 15F/min. These values don't have to be symmetrical and can be "tweaked" once the system is operating in order to optimize performance. Generally, FL is so forgiving that the system will probably work the first time without any tweaking. A minimal sketch of this error/error-dot scheme is given below.
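As a hedged illustration of the error/error-dot scheme just described, here is a minimal fuzzy controller sketch. The 2F/5F and 5F-per-minute/15F-per-minute breakpoints follow the text; the triangular membership functions, the two rules, and the heater-output scale are invented for the example.

# Toy fuzzy temperature controller using the error / error-dot scheme.

def tri(x, lo, peak, hi):
    """Triangular membership: 1 at peak, falling to 0 at lo and hi."""
    if x <= lo or x >= hi:
        return 0.0
    return (x - lo) / (peak - lo) if x < peak else (hi - x) / (hi - peak)

def fuzzy_heat(error, error_dot):
    # Fuzzify: degrees of "small" and "large" for error and error-dot.
    err_small = tri(abs(error), 0, 2, 5)
    err_large = tri(abs(error), 2, 5, 10)
    dot_small = tri(abs(error_dot), 0, 5, 15)
    dot_large = tri(abs(error_dot), 5, 15, 30)
    # Two toy rules: IF error is large AND changing fast THEN heat hard;
    # IF error is small AND changing slowly THEN heat gently.
    heat_hard = min(err_large, dot_large)
    heat_soft = min(err_small, dot_small)
    # Defuzzify with a weighted average of the rule outputs (0..1 scale).
    total = heat_hard + heat_soft
    return 0.0 if total == 0 else (1.0 * heat_hard + 0.3 * heat_soft) / total

print(fuzzy_heat(error=4, error_dot=10))  # mostly "large": strong heating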
Dempster/Shafer Theory

The Dempster-Shafer theory, also known as the theory of belief functions, is a generalization of the Bayesian theory of subjective probability. Whereas the Bayesian theory requires probabilities for each question of interest, belief functions allow us to base degrees of belief for one question on probabilities for a related question. These degrees of belief may or may not have the mathematical properties of probabilities; how much they differ from probabilities will depend on how closely the two questions are related.

The Dempster-Shafer theory owes its name to work by A. P. Dempster (1968) and Glenn Shafer (1976), but the kind of reasoning the theory uses can be found as far back as the seventeenth century. The theory came to the attention of AI researchers in the early 1980s, when they were trying to adapt probability theory to expert systems. Dempster-Shafer degrees of belief resemble the certainty factors in MYCIN, and this resemblance suggested that they might combine the rigor of probability theory with the flexibility of rule-based systems. Subsequent work has made clear that the management of uncertainty inherently requires more structure than is available in simple rule-based systems, but the Dempster-Shafer theory remains attractive because of its relative flexibility.

The Dempster-Shafer theory is based on two ideas: the idea of obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster's rule for combining such degrees of belief when they are based on independent items of evidence. To illustrate the idea of obtaining degrees of belief for one question from subjective probabilities for another, suppose I have subjective probabilities for the reliability of my friend Jon. My probability that he is reliable is 0.9, and my probability that he is unreliable is 0.1. Suppose he tells me a limb fell on my car. This statement, which must be true if he is reliable, is not necessarily false if he is unreliable. So his testimony alone justifies a 0.9 degree of belief that a limb fell on my car, but only a zero degree of belief (not a 0.1 degree of belief) that no limb fell on my car. This zero does not mean that I am sure that no limb fell on my car, as a zero probability would; it merely means that Jon's testimony gives me no reason to believe that no limb fell on my car. The 0.9 and the zero together constitute a belief function.
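The testimony example above can be written down as a small mass function over subsets of possible answers. The sketch below is a hedged illustration using the 0.9/0.1 numbers from the text; the function and variable names are invented for the example.

# Belief from Jon's testimony. Mass is assigned to SETS of answers:
# 0.9 to {"limb fell"} (he is reliable) and 0.1 to the whole frame
# {"limb fell", "no limb fell"} (he is unreliable, so anything goes).

mass = {
    frozenset({"limb fell"}): 0.9,
    frozenset({"limb fell", "no limb fell"}): 0.1,
}

def belief(hypothesis, mass):
    """Bel(H): total mass of the subsets that entail H."""
    return sum(m for s, m in mass.items() if s <= hypothesis)

print(belief(frozenset({"limb fell"}), mass))     # -> 0.9
print(belief(frozenset({"no limb fell"}), mass))  # -> 0.0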
Knowledge Acquisition

Knowledge acquisition is concerned with the development of knowledge bases based on the expertise of a human expert. This requires expressing knowledge in a formalism suitable for automatic interpretation. Within this field, research at UNSW focuses on incremental knowledge acquisition techniques, which allow a human expert to provide explanations of their decisions that are automatically integrated into sophisticated knowledge bases.

Types of Learning

Learning is acquiring new knowledge, behaviors, skills, values, preferences or understanding, and may involve synthesizing different types of information. The ability to learn is possessed by humans, animals and some machines. Progress over time tends to follow learning curves. Human learning may occur as part of education or personal development. It may be goal-oriented and may be aided by motivation. The study of how learning occurs is part of neuropsychology, educational psychology, learning theory, and pedagogy.

Learning may occur as a result of habituation or classical conditioning, seen in many animal species, or as a result of more complex activities such as play, seen only in relatively intelligent animals and humans. Learning may occur consciously or without conscious awareness. There is evidence for human behavioral learning prenatally, in which habituation has been observed as early as 32 weeks into gestation, indicating that the central nervous system is sufficiently developed and primed for learning and memory to occur very early in development. Play has been approached by several theorists as the first form of learning. Children play, experiment with the world, learn the rules, and learn to interact; Vygotsky agrees that play is pivotal for children's development, since they make meaning of their environment through play.

Habituation

In psychology, habituation is an example of non-associative learning in which there is a progressive diminution of behavioral response probability with repetition of a stimulus. It is another form of integration. An animal first responds to a stimulus, but if it is neither rewarding nor harmful the animal reduces subsequent responses. One example of this can be seen in small songbirds: if a stuffed owl (or similar predator) is put into the cage, the birds initially react to it as though it were a real predator, but soon react less, showing habituation. If another stuffed owl is introduced (or the same one is removed and re-introduced), the birds react to it again as though it were a predator, demonstrating that only a very specific stimulus is habituated to (namely, one particular unmoving owl in one place). Habituation has been shown in essentially every species of animal, including the large protozoan Stentor coeruleus.

Sensitization

Sensitization is an example of non-associative learning in which the progressive amplification of a response follows repeated administrations of a stimulus (Bell et al., 1995). An everyday example of this mechanism is the repeated tonic stimulation of peripheral nerves that occurs if a person rubs his arm continuously. After a while, this stimulation creates a warm sensation that eventually turns painful. The pain is the result of the progressively amplified synaptic response of the peripheral nerves, warning the person that the stimulation is harmful. Sensitization is thought to underlie both adaptive and maladaptive learning processes in the organism.

Associative learning

Associative learning is the process by which an element is learned through association with a separate, pre-occurring element.

Operant conditioning

Operant conditioning is the use of consequences to modify the occurrence and form of behavior. Operant conditioning is distinguished from Pavlovian conditioning in that operant conditioning