Chapter 1: Introduction

What is artificial intelligence?

        It is the science and engineering of making intelligent machines, especially intelligent
        computer programs. It is related to the similar task of using computers to understand human
        intelligence, but AI does not have to confine itself to methods that are biologically
        observable.

It is the duplication of human thought processes by machines:
         Learning from experience
         Interpreting ambiguities
         Rapid response to varying situations
         Applying reasoning to problem-solving
         Manipulating environment by applying knowledge
         Thinking and reasoning


Yes, but what is intelligence?

        Intelligence is the computational part of the ability to achieve goals in the world. Varying
        kinds and degrees of intelligence occur in people, many animals and some machines.

Isn't there a solid definition of intelligence that doesn't depend on relating it to human
intelligence?

        Not yet. The problem is that we cannot yet characterize in general what kinds of
        computational procedures we want to call intelligent. We understand some of the mechanisms
        of intelligence and not others.

Acting humanly: The Turing Test approach




                                        Fig. The imitation game


Abridged history of AI (summary)

      1943    McCulloch & Pitts: Boolean circuit model of brain
      1950    Turing's "Computing Machinery and Intelligence"
      1956            Dartmouth meeting: "Artificial Intelligence" adopted
      1950s           Early AI programs, including Samuel's checkers program, Newell & Simon's
                      Logic Theorist, Gelernter's Geometry Engine
      1965            Robinson's complete algorithm for logical reasoning
      1966—73         AI discovers computational complexity, neural network research almost
                      disappears
      1969—79 early development of knowledge-based systems
      1980--          AI becomes an industry
      1986--          Neural networks return to popularity
      1987--          AI becomes a science
      1995--          The emergence of intelligent agents

Goals of AI
        Replicate human intelligence

        "AI is the study of complex information processing problems that often have their roots in
        some aspect of biological information processing. The goal of the subject is to identify
        solvable and interesting information processing problems, and solve them." -- David Marr

        Solve knowledge-intensive tasks

        "AI is the design, study and construction of computer programs that behave intelligently." --
        Tom Dean

        "... to achieve their full impact, computer systems must have more than processing power--
        they must have intelligence. They need to be able to assimilate and use large bodies of
        information and collaborate with and help people find new ways of working together
        effectively. The technology must become more responsive to human needs and styles of
        work, and must employ more natural means of communication." -- Barbara Grosz and
        Randall Davis

        Intelligent connection of perception and action

        AI is not centered around representation of the world, but around action in the world: behavior-
        based intelligence. (See Rod Brooks in the movie Fast, Cheap and Out of Control.)

        Enhance human-human, human-computer and computer-computer interaction/communication

        Computer can sense and recognize its users, see and recognize its environment, respond
        visually and audibly to stimuli. New paradigms for interacting productively with computers
        using speech, vision, natural language, 3D virtual reality, 3D displays, more natural and
        powerful user interfaces, etc. (See, for example, projects in Microsoft's "Advanced
        Interactivity and Intelligence" group.)

Some Application Areas of AI

        Game Playing
        Deep Blue Chess program beat world champion Garry Kasparov
        Speech Recognition
        PEGASUS spoken language interface to American Airlines' EAASY SABRE reservation
        system, which allows users to obtain flight information and make reservations over the
        telephone. The 1990s saw significant advances in speech recognition, so that limited
        systems are now successful.
        Computer Vision
        Face recognition programs in use by banks, government, etc. The ALVINN system from
        CMU autonomously drove a van from Washington, D.C. to San Diego (all but 52 of 2,849
        miles), averaging 63 mph day and night, and in all weather conditions. Handwriting
        recognition, electronics and manufacturing inspection, photointerpretation, baggage
        inspection, reverse engineering to automatically construct a 3D geometric model.
        Expert Systems
        Application-specific systems that rely on obtaining the knowledge of human experts in an
        area and programming that knowledge into a system.
            o Diagnostic Systems
                 Microsoft Office Assistant in Office 97 provides customized help by decision-
                 theoretic reasoning about an individual user. MYCIN system for diagnosing bacterial
                 infections of the blood and suggesting treatments. Intellipath pathology diagnosis

system (AMA approved). Pathfinder medical diagnosis system, which suggests tests
                 and makes diagnoses. Whirlpool customer assistance center.
            o System Configuration
                 DEC's XCON system for custom hardware configuration. Radiotherapy treatment
                 planning.
            o Financial Decision Making
                 Credit card companies, mortgage companies, banks, and the U.S. government employ
                 AI systems to detect fraud and expedite financial transactions. For example, AMEX
                 credit check. Systems often use learning algorithms to construct profiles of customer
                 usage patterns, and then use these profiles to detect unusual patterns and take
                 appropriate action.
            o Classification Systems
                 Put information into one of a fixed set of categories using several sources of
                 information. E.g., financial decision making systems. NASA developed a system for
                 classifying very faint areas in astronomical images into either stars or galaxies with
                 very high accuracy by learning from human experts' classifications.
        Mathematical Theorem Proving
        Use inference methods to prove new theorems.
        Natural Language Understanding
        AltaVista's translation of web pages. Translation of Caterpillar Truck manuals into 20
        languages. (Note: One early system translated the English sentence "The spirit is willing but
        the flesh is weak" into the Russian equivalent of "The vodka is good but the meat is rotten.")
        Scheduling and Planning
        Automatic scheduling for manufacturing. DARPA's DART system used in Desert Storm and
        Desert Shield operations to plan logistics of people and supplies. American Airlines rerouting
        contingency planner. European space agency planning and scheduling of spacecraft assembly,
        integration and verification.

Some AI "Grand Challenge" Problems

        Translating telephone
        Accident-avoiding car
        Aids for the disabled
        Smart clothes
        Intelligent agents that monitor and manage information by filtering, digesting, abstracting
        Tutors
        Self-organizing systems, e.g., that learn to assemble something by observing a human do it.

A Framework for Building AI Systems
 Perception
Intelligent biological systems are physically embodied in the world and experience the world through
their sensors (senses). For an autonomous vehicle, input might be images from a camera and range
information from a rangefinder. For a medical diagnosis system, perception is the set of symptoms
and test results that have been obtained and input to the system manually. Includes areas of vision,
speech processing, natural language processing, and signal processing (e.g., market data and acoustic
data).

 Reasoning
Inference, decision-making, classification from what is sensed and what the internal "model" is of the
world. Might be a neural network, logical deduction system, Hidden Markov Model induction,
heuristic searching a problem space, Bayes Network inference, genetic algorithms, etc. Includes areas
of knowledge representation, problem solving, decision theory, planning, game theory, machine
learning, uncertainty reasoning, etc.


 Action
Biological systems interact within their environment by actuation, speech, etc. All behavior is
centered around actions in the world. Examples include controlling the steering of a Mars rover or
autonomous vehicle, or suggesting tests and making diagnoses for a medical diagnosis system.
Includes areas of robot actuation, natural language generation, and speech synthesis.

Some Fundamental Issues for Most AI Problems

        Representation
        Facts about the world have to be represented in some way, e.g., mathematical logic is one
        language that is used in AI. Deals with the questions of what to represent and how to
        represent it. How to structure knowledge? What is explicit, and what must be inferred? How
        to encode "rules" for inferencing so as to find information that is only implicitly known? How
        to deal with incomplete, inconsistent, and probabilistic knowledge? Epistemology issues
        (what kinds of knowledge are required to solve problems).

        Example: "The fly buzzed irritatingly on the window pane. Jill picked up the newspaper."
        Inference: Jill has malicious intent; she is not intending to read the newspaper, or use it to
        start a fire, or ...

        Example: Given 17 sticks in a 3 x 2 grid, remove 5 sticks to leave exactly 3 squares.

        Search
        Many tasks can be viewed as searching a very large problem space for a solution. For
        example, Checkers has about 10^40 states, and Chess has about 10^120 states in a typical game.
        Use of heuristics (meaning "serving to aid discovery") and constraints.
        Inference
        From some facts others can be inferred. Related to search. For example, knowing "All
        elephants have trunks" and "Clyde is an elephant," can we answer the question "Does Clyde
        have a trunk?" What about "Peanuts has a trunk, is it an elephant?" Or "Peanuts lives in a tree
        and has a trunk, is it an elephant?" Deduction, abduction, non-monotonic reasoning, reasoning
        under uncertainty.
        Learning
        Inductive inference, neural networks, genetic algorithms, artificial life, evolutionary
        approaches.
        Planning
        Starting with general facts about the world, facts about the effects of basic actions, facts about
        a particular situation, and a statement of a goal, generate a strategy for achieving that goal in
        terms of a sequence of primitive steps or actions.




    The State of the Art
        Computer beats human in a chess game.
        Computer-human conversation using speech recognition.
        Computer program can chat with human
        Expert system controls a spacecraft.
        Robot can walk on stairs and hold a cup of water.
        Language translation for webpages.
        Home appliances use fuzzy logic.




Agent and Environment
        An agent is anything that can be viewed as perceiving its environment through
        sensors and acting upon that environment through actuators. A human agent has
        eyes, ears, and other organs for sensors and hands, legs, mouth, and other body parts
        for actuators. A robotic agent might have cameras and infrared range finders for
        sensors and various motors for actuators. A software agent receives keystrokes, file
        contents, and network packets as sensory inputs and acts on the environment by
        displaying on the screen, writing files, and sending network packets. We will make
        the general assumption that every agent can perceive its own actions (but not always
        the effects).
        We use the term percept to refer to the agent's perceptual inputs at any given instant.
        An agent's percept sequence is the complete history of everything the agent has ever
        perceived. In general, an agent's choice of action at any given instant can depend on
        the entire percept sequence observed to date

        If we can specify the agent's choice of action for every possible percept sequence,
        then we have said more or less everything there is to say about the agent.
        Mathematically speaking, we say that an agent's behavior is described by the agent
        function that maps any given percept sequence to an action.
                                                  f : P* → A
        The agent program runs on the physical architecture to produce f




Fig. Agents interact with environments through sensors and actuators




                                                        Fig. Vacuum cleaner world

    Percepts: location and contents, e.g., [A, Dirty]
    Actions: Left, Right, Suck, NoOp
    For Vacuum Cleaner Agent:

        Percept sequence                                               Action

        [A, Clean]                                                     Right

        [A, Dirty]                                                     Suck

        [B, Clean]                                                     Left

        [B, Dirty]                                                     Suck

        [A, Clean], [A, Clean]                                         Right

        [A, Clean], [A, Dirty]                                         Suck

        …

        function Reflex-Vacuum-Agent( [location,status]) returns an action

        if status = Dirty then return Suck
        else if location =A then return Right
        else if location = B then return Left
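
The table and the rule above can be written directly as a short program. The following is a minimal
Python sketch (the location labels A and B and the action names follow the example above; everything
else is illustrative):

# A minimal Python sketch of the reflex vacuum agent described above.
def reflex_vacuum_agent(percept):
    """Map a (location, status) percept directly to an action."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    else:  # location == "B"
        return "Left"

# Example percepts from the table above:
print(reflex_vacuum_agent(("A", "Dirty")))   # Suck
print(reflex_vacuum_agent(("A", "Clean")))   # Right
print(reflex_vacuum_agent(("B", "Clean")))   # Left
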
Rationality

Definition of Rational Agent:

For each possible percept sequence, a rational agent should select an action that is expected to maximize its
performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge
the agent has.

Rational ≠ omniscient (percepts may not supply all relevant information)
Rational ≠ clairvoyant (action outcomes may not be as expected)
Hence, rational ≠ successful

Rational exploration, learning, autonomy


PEAS (Performance measure, Environment, Actuators, Sensors)

To design a rational agent, we must specify the task environments. Task environments are
essentially the "problems" to which rational agents are the "solutions." Those task
environments come in a variety of flavors and the flavor of the task environment directly
affects the appropriate design for the agent program.

Consider, e.g., the task of designing an automated taxi:

Agent Type: Taxi driver
    Performance Measure: Safe, fast, legal, comfortable trip, maximize profits
    Environment: Roads, other traffic, pedestrians, customers
    Actuators: Steering, accelerator, brake, signal, horn, display
    Sensors: Cameras, sonar, speedometer, GPS, odometer, accelerometer, engine sensors, keyboard

                        Figure: PEAS description of the task environment for an automated taxi.

Agent Type: Medical diagnosis system
    Performance Measure: Healthy patient, minimize costs, lawsuits
    Environment: Patient, hospital, staff
    Actuators: Display questions, tests, diagnoses, treatments, referrals
    Sensors: Keyboard entry of symptoms, findings, patient's answers

Agent Type: Internet shopping agent
    Performance Measure: Price, quality, appropriateness, efficiency
    Environment: WWW sites, vendors, shippers
    Actuators: Display to user, follow URL, fill in form
    Sensors: HTML pages (text, graphics, scripts)


The range of task environments that might arise in AI is obviously vast. We can, however,
identify a fairly small number of dimensions along which task environments can be catego-
rized.

Fully observable vs. partially observable:
If an agent's sensors give it access to the complete state of the environment at each point in
time, then we say that the task environment is fully observable. An environment might be
partially observable because of noisy and inaccurate sensors or because parts of the state are
simply missing from the sensor data. For example, a vacuum agent with only a local dirt
sensor cannot tell whether there is dirt in other squares, and an automated taxi cannot see
what other drivers are thinking.

Deterministic vs. stochastic.
If the next state of the environment is completely determined by the current state and the
action executed by the agent, then we say the environment is deterministic; otherwise, it is
stochastic.
Episodic vs. sequential.
In an episodic task environment, the agent's experience is divided into atomic episodes.


Each episode consists of the agent perceiving and then performing a single action. Crucially,
the next episode does not depend on the actions taken in previous episodes. In episodic
environments, the choice of action in each episode depends only on the episode itself. In
sequential environments, on the other hand, the current decision could affect all future
decisions. Chess and taxi driving are sequential: in both cases, short-term actions can have
long-term consequences. Episodic environments are much simpler than sequential
environments because the agent does not need to think ahead.

Static vs. dynamic.
If the environment can change while an agent is deliberating, then we say the environ-
ment is dynamic for that agent; otherwise, it is static. If the environment itself does not
change with the passage of time but the agent's performance score does, then we say the
environment is semidynamic. Taxi driving is clearly dynamic: the other cars and the taxi
itself keep moving while the driving algorithm dithers about what to do next. Chess, when
played with a clock, is semidynamic. Crossword puzzles are static.


Discrete vs. continuous.
The discrete/continuous distinction can be applied to the state of the environment, to the way
time is handled, and to the percepts and actions of the agent. For example, a discrete-state
environment such as a chess game has a finite number of distinct states. Chess also has a
discrete set of percepts and actions. Taxi driving is a continuous-state


Single agent vs. multiagent.
Single-agent and multiagent environments are distinguished by the number of agents in the
environment. For example, an agent solving a crossword puzzle by itself is clearly in a
single-agent environment, whereas an agent playing chess is in a two-agent environment.

As one might expect, the hardest case is partially observable, stochastic, sequential, dynamic,
continuous, and multiagent. The real world is partially observable, stochastic, sequential,
dynamic, continuous, multi-agent.

There are four basic kinds of agent program that embody the principles underlying almost all
intelligent systems:
    •   Simple reflex agents;
    •   Model-based reflex agents;
    •   Goal-based agents; and
    •   Utility-based agents.
All these can be turned into learning agents.


Agent types; simple reflex
        Select action on the basis of only the current percept. E.g. the vacuum-agent
         Large reduction in possible percept/action situations.
         Implemented through condition-action rules
        If dirty then suck

function REFLEX-VACUUM-AGENT ([location, status]) return an action
if status == Dirty then return Suck
else if location == A then return Right
else if location == B then return Left
Reduction from 4^T to 4 entries

Agent types; reflex and state
    To tackle partially observable environments.
        Maintain internal state
    Over time update state using world knowledge
           How does the world change.
           How do actions affect world.
           ⇒Model of World




Agent types; goal-based
        The agent needs a goal to know which situations are desirable.
            o Things become difficult when long sequences of actions are required to find
               the goal.
        Typically investigated in search and planning research.
        Major difference: future is taken into account
        Is more flexible since knowledge is represented explicitly and can be manipulated.




Agent types; utility-based
        Certain goals can be reached in different ways.
            o Some are better, have a higher utility.
        Utility function maps a (sequence of) state(s) onto a real number.
        Improves on goals:
            o Selecting between conflicting goals
            o Select appropriately between several goals based on likelihood of success.




Agent types; learning
        All previous agent-programs describe methods for selecting actions.
            o Yet they do not explain the origin of these programs.
            o Learning mechanisms can be used to perform this task.
            o Teach them instead of instructing them.
            o Advantage is the robustness of the program toward initially unknown
               environments.




        Learning element: introduce improvements in performance element.
        Critic provides feedback on agents performance based on fixed performance standard.
        Performance element: selecting actions based on percepts.
        Corresponds to the previous agent programs

Problem generator: suggests actions that will lead to new and informative
        experiences.
        Exploration vs. exploitation




        KNOWLEDGE
    •   Data = collection of facts, measurements, statistics
    •   Information = organized data
    •   Knowledge = contextual, relevant, actionable information
            – Strong experiential and reflective elements
            – Good leverage and increasing returns
            – Dynamic
            – Branches and fragments with growth
            – Difficult to estimate impact of investment
            – Uncertain value in sharing
            – Evolves over time with experience
    •   Explicit knowledge
            – Objective, rational, technical
            – Policies, goals, strategies, papers, reports
            – Codified
            – Leaky knowledge
    •   Tacit knowledge
            – Subjective, cognitive, experiential learning
            – Highly personalized
            – Difficult to formalize
            – Sticky knowledge



Chapter 2: Problem Solving


Problem-solving agent
Four general steps in problem solving:
      Goal formulation
          o What are the successful world states?
      Problem formulation
          o What actions and states to consider, given the goal?
      Search
          o Determine the possible sequences of actions that lead to states of known
              value, and then choose the best sequence.
      Execute
          o Given the solution, perform the actions.
function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action
    static: seq, an action sequence
            state, some description of the current world state
            goal, a goal
            problem, a problem formulation

    state ← UPDATE-STATE(state, percept)
    if seq is empty then
        goal ← FORMULATE-GOAL(state)
        problem ← FORMULATE-PROBLEM(state, goal)
        seq ← SEARCH(problem)
    action ← FIRST(seq)
    seq ← REST(seq)
    return action
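
As a rough illustration only, the pseudocode above can be rendered in Python. The callables
update_state, formulate_goal, formulate_problem and search are placeholders the caller must supply;
they stand in for UPDATE-STATE, FORMULATE-GOAL, FORMULATE-PROBLEM and SEARCH and are not defined
by these notes:

# A rough, illustrative Python rendering of SIMPLE-PROBLEM-SOLVING-AGENT.
def make_problem_solving_agent(update_state, formulate_goal, formulate_problem, search):
    state = None          # some description of the current world state
    seq = []              # an action sequence, initially empty

    def agent(percept):
        nonlocal state, seq
        state = update_state(state, percept)
        if not seq:                               # no plan left: formulate and search again
            goal = formulate_goal(state)
            problem = formulate_problem(state, goal)
            seq = list(search(problem) or [])     # SEARCH may fail and return nothing
        if not seq:
            return None                           # no action could be found
        action, seq = seq[0], seq[1:]             # FIRST(seq), REST(seq)
        return action

    return agent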

EXAMPLE:




        On holiday in Romania; currently in Arad
            o Flight leaves tomorrow from Bucharest
        Formulate goal
            o Be in Bucharest
        Formulate problem
            o States: various cities
            o Actions: drive between cities
        Find solution
            o Sequence of cities; e.g. Arad, Sibiu, Fagaras, Bucharest, …

Selecting a state space
        Real world is absurdly complex.
        State space must be abstracted for problem solving.
         (Abstract) state = set of real states.
        (Abstract) action = complex combination of real actions.
            o e.g. Arad → Zerind represents a complex set of possible routes, detours, rest stops, etc.
            o The abstraction is valid if the path between two states is reflected in the real world.
         (Abstract) solution = set of real paths that are solutions in the real world.
        Each abstract action should be "easier" than the real problem.


Formulating Problem as a Graph
In the graph

        each node represents a possible state;
        a node is designated as the initial state;
        one or more nodes represent goal states, states in which the agent‘s goal is considered
        accomplished.
        each edge represents a state transition caused by specific agent action;
        associated to each edge is the cost of performing that transition.

State space graph of vacuum world
Example: vacuum world
        States?? two locations with or without dirt: 2 x 2^2 = 8 states.
        Initial state?? Any state can be initial
        Actions?? {Left, Right, Suck}
        Goal test?? Check whether squares are clean.
            o Path cost?? Number of actions to reach goal.




Example: 8-puzzle
      States?? Integer location of each tile
      Initial state?? Any state can be initial
      Actions?? {Left, Right, Up, Down}
      Goal test?? Check whether goal configuration is reached
          o Path cost?? Number of actions to reach goal

Problem Solving as Search
Search space: set of states reachable from an initial state S0 via a (possibly empty/finite/infinite)
sequence of state transitions.

To achieve the problem's goal

        search the space for a (possibly optimal) sequence of transitions starting from S0 and leading
        to a goal state;
        execute (in order) the actions associated to each transition in the identified sequence.

Depending on the features of the agent's world the two steps above can be interleaved.

How do we reach a goal state?




There may be several possible ways. Or none!

Factors to consider:

        cost of finding a path;
        cost of traversing a path.

Problem Solving as Search
        Reduce the original problem to a search problem.
        A solution for the search problem is a path initial state–goal state.
        The solution for the original problem is either
            o the sequence of actions associated with the path
            o Or the description of the goal state.
Example: The 8-puzzle
It can be generalized to 15-puzzle, 24-puzzle, or (n^2 − 1)-puzzle for n ≥ 6.




                 States: configurations of tiles

Operators: move one tile Up/Down/Left/Right
        There are 9! = 362,880 possible states (all permutations of {blank, 1, 2, 3, 4, 5, 6, 7,
        8}).
        There are 16! possible states for 15-puzzle.
        Not all states are directly reachable from a given state.
        (In fact, exactly half of them are reachable from a given state.)

How can an artificial agent represent the states and the state
space for this problem?

Go from state S to state G.




Problem formulation
        A problem is defined by:
            o An initial state, e.g. Arad
            o Successor function S(X)= set of action-state pairs
                       e.g. S(Arad)={<Arad → Zerind, Zerind>,…}
                 intial state + successor function = state space
            o Goal test, can be
                       Explicit, e.g. x = 'at Bucharest'
                       Implicit, e.g. checkmate(x)
            o Path cost (additive)
                       e.g. sum of distances, number of actions executed, …
                       c(x,a,y) is the step cost, assumed to be >= 0
A solution is a sequence of actions from initial to goal state.
Optimal solution has the lowest path cost.
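
The four components above can be collected into one small data structure. The sketch below is purely
illustrative, not a standard API; it covers only the roads out of Arad, using the step costs that are
quoted in the A* example later in these notes (140, 118, 75):

# A sketch of the problem components listed above, using a tiny piece of the Romania map.
romania_roads = {
    "Arad": [("Sibiu", 140), ("Timisoara", 118), ("Zerind", 75)],
}

problem = {
    "initial_state": "Arad",
    # successor function S(x): set of <action, state> pairs
    "successors": lambda x: [("go to " + city, city) for city, _ in romania_roads.get(x, [])],
    # goal test (explicit here)
    "goal_test": lambda x: x == "Bucharest",
    # additive step cost c(x, a, y), assumed >= 0
    "step_cost": lambda x, a, y: dict(romania_roads.get(x, []))[y],
}

print(problem["successors"]("Arad"))
# [('go to Sibiu', 'Sibiu'), ('go to Timisoara', 'Timisoara'), ('go to Zerind', 'Zerind')]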

Problem formulation
1. Choose an appropriate data structure to represent the world states.
2. Define each operator as a precondition/effects pair where the precondition holds exactly in the
    states the operator applies to, effects describe how a state changes into a successor state by the
    application of the operator.
3. Specify an initial state.

4.   Provide a description of the goal (used to check if a reached state is a goal state).

Formulating the 8-puzzle Problem
States: each represented by a 3 × 3 array of numbers in [0 . . . 8], where value 0 is for the empty cell.




        Operators: 24 operators of the form Op(r,c,d) where r, c ∈ {1, 2, 3}, d ∈ {L,R,U,D}.
        Op(r,c,d) moves the empty space at position (r, c) in the direction d.




Example: Op(3,2,R)




We have 24 operators in this problem formulation . . .
20 too many!
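
One concrete (and purely illustrative) way for an agent to represent 8-puzzle states and moves in
Python is a 3 x 3 list of lists with 0 for the blank, together with a single move-the-blank operator;
this corresponds to the more economical 4-operator formulation rather than the 24 operators above:

# One possible 8-puzzle representation: a 3x3 list of lists, 0 for the empty cell,
# with a single move_blank(state, d) operator.
def find_blank(state):
    for r in range(3):
        for c in range(3):
            if state[r][c] == 0:
                return r, c

def move_blank(state, d):
    """Move the blank in direction d in {'L','R','U','D'}; return the new state, or None if illegal."""
    dr, dc = {"L": (0, -1), "R": (0, 1), "U": (-1, 0), "D": (1, 0)}[d]
    r, c = find_blank(state)
    nr, nc = r + dr, c + dc
    if not (0 <= nr < 3 and 0 <= nc < 3):
        return None                         # move would leave the board
    new = [row[:] for row in state]         # copy, then swap blank with its neighbour
    new[r][c], new[nr][nc] = new[nr][nc], new[r][c]
    return new

start = [[1, 2, 3],
         [4, 0, 5],
         [6, 7, 8]]
print(move_blank(start, "R"))   # [[1, 2, 3], [4, 5, 0], [6, 7, 8]]
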
Problem types
         Deterministic, fully observable ⇒single state problem
            o Agent knows exactly which state it will be in; solution is a sequence.
         Partial knowledge of states and actions:
            o Non-observable ⇒sensorless or conformant problem
                     Agent may have no idea where it is; solution (if any) is a sequence.
            o Nondeterministic and/or partially observable ⇒contingency problem
                     Percepts provide new information about current state; solution is a tree or
                         policy; often interleave search and execution.
                     If uncertainty is caused by actions of another agent: adversarial problem
            o Unknown state space ⇒ exploration problem ("online")
                     When states and actions of the environment are unknown.



Problem solutions need well-defined problems, and a well-defined problem must define the space
        of possible solutions. We use searching to solve well-defined problems.



Constraint satisfaction problems
What is a CSP?

                Finite set of variables V1, V2, …, Vn
                Finite set of constraints C1, C2, …, Cm
                Nonempty domain of possible values for each variable: DV1, DV2, … DVn
                Each constraint Ci limits the values that variables can take,
                               e.g., V1 ≠ V2
 A state is defined as an assignment of values to some or all variables.
 Consistent assignment: an assignment that does not violate the constraints.
 An assignment is complete when every variable is mentioned.
 A solution to a CSP is a complete assignment that satisfies all constraints.
 Some CSPs require a solution that maximizes an objective function.
 Applications: Scheduling the time of observations on the Hubble Space Telescope, Floor planning,
  Map coloring, Cryptography
 CSPs are a special kind of problem: states defined by values of a fixed set of variables, goal test
  defined by constraints on variable values

Varieties of Constraints
 Unary constraints involve a single variable.
                                          e.g. SA ≠ green
 Binary constraints involve pairs of variables.
                                          e.g. SA ≠ WA
 Higher-order constraints involve 3 or more variables.
                                          e.g. cryptarithmetic column constraints.
 Preference (soft constraints), e.g. red is better than green; often representable by a cost for each
                                  variable assignment -> constrained optimization problems.

CSP example: map coloring




        Variables: WA, NT, Q, NSW, V, SA, T
        Domains: Di={red,green,blue}
        Constraints: adjacent regions must have different colors.
                    o E.g. WA ≠ NT (if the language allows this)
                    o E.g. (WA,NT) ∈ {(red,green),(red,blue),(green,red),…}




Solutions are assignments satisfying all constraints, e.g.
                     {WA=red,NT=green,Q=red,NSW=green,V=red,SA=blue,T=green}
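
A small backtracking solver makes the map-coloring CSP concrete. The sketch below is illustrative
only; the neighbour lists encode the "adjacent regions must have different colors" constraints for the
standard Australia map assumed in this example:

# A minimal backtracking solver for the map-coloring CSP above.
variables = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbours = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "T": [],
}

def consistent(var, value, assignment):
    """A value is consistent if no already-assigned neighbour has the same value."""
    return all(assignment.get(n) != value for n in neighbours[var])

def backtrack(assignment=None):
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):            # complete assignment: a solution
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value})
            if result is not None:
                return result
    return None                                       # dead end: backtrack

print(backtrack())
# e.g. {'WA': 'red', 'NT': 'green', 'Q': 'red', 'NSW': 'green', 'V': 'red', 'SA': 'blue', 'T': 'red'}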

Constraint graph
CSP benefits
        Standard representation pattern
        Generic goal and successor functions
        Generic heuristics (no domain specific expertise).
Constraint graph = nodes are variables, edges show constraints.
        Graph can be used to simplify search.
            o e.g. Tasmania is an independent subproblem.


Cryptarithmetic conventions

        Each letter or symbol represents only one digit throughout the problem;
        When letters are replaced by their digits, the resultant arithmetical operation must be
        correct;
        The numerical base, unless specifically stated, is 10;
        Numbers must not begin with a zero;
        There must be only one solution to the problem.
            1.
                 S E N D
           +     M O R E
           ------------
              M O N E Y

We see at once that M in the total must be 1, since the total of the column SM cannot reach as
high as 20. Now if M in this column is replaced by 1, how can we make this column total as
much as 10 to provide the 1 carried over to the left below? Only by making S very large: 9 or
8. In either case the letter O must stand for zero: the summation of SM could produce only 10
or 11, but we cannot use 1 for letter O as we have already used it for M.

If letter O is zero, then in column EO we cannot reach a total as high as 10, so that there will
be no 1 to carry over from this column to SM. Hence S must positively be 9.

Since the summation EO gives N, and letter O is zero, N must be 1 greater than E and the
column NR must total over 10. To put it into an equation: E + 1 = N

From the NR column we can derive the equation: N + R + (+ 1) = E + 10

We have to insert the expression (+ 1) because we don‘t know yet whether 1 is carried over
from column DE. But we do know that 1 has to be carried over from column NR to EO.

Subtract the first equation from the second: R + (+1) = 9
We cannot let R equal 9, since we already have S equal to 9. Therefore we will have to make
R equal to 8; hence we know that 1 has to be carried over from column DE.

Column DE must total at least 12, since Y cannot be 1 or zero. What values can we give D
and E to reach this total? We have already used 9 and 8 elsewhere. The only digits left that
are high enough are 7, 6 and 7, 5. But remember that one of these has to be E, and N is 1
greater than E. Hence E must be 5, N must be 6, while D is 7. Then Y turns out to be 2, and
the puzzle is completely solved.

                S E N D
                9 5 6 7
              + M O R E
                1 0 8 5
              ---------
              M O N E Y
              1 0 6 5 2
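
The deduction above can also be checked mechanically. The following brute-force sketch (illustrative,
not efficient) tries every assignment of distinct digits to the eight letters and prints the unique
solution:

# A brute-force check of SEND + MORE = MONEY: try every assignment of distinct digits
# to the eight letters, with no leading zeros.
from itertools import permutations

def word_value(word, digit_of):
    value = 0
    for ch in word:
        value = value * 10 + digit_of[ch]
    return value

letters = "SENDMORY"
for digits in permutations(range(10), len(letters)):
    digit_of = dict(zip(letters, digits))
    if digit_of["S"] == 0 or digit_of["M"] == 0:        # numbers must not begin with a zero
        continue
    if word_value("SEND", digit_of) + word_value("MORE", digit_of) == word_value("MONEY", digit_of):
        print(digit_of)   # S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2
        break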




2.
          T W O
+         T W O
          _____
        F O U R


 Let us first check with F as 0. Now take O with its highest possible value, 9. Then R must be 8
and T must be 4. Checking among the remaining numbers then gives U as 3, and thus W must
be 6:

          T W O
          4 6 9
+         T W O
          4 6 9
          _____
        F O U R
        0 9 3 8


Game Playing
Summary
        Games are fun (and dangerous)
        They illustrate several important points about AI
        Perfection is unattainable -> approximation
        Good idea to think about what to think about
        Uncertainty constrains the assignment of values to states
        Games are to AI as grand prix racing is to automobile design.

   Games are a form of multi-agent environment
           o What do other agents do and how do they affect our success?
           o Cooperative vs. competitive multi-agent environments.
            o Competitive multi-agent environments give rise to adversarial problems, a.k.a. games
   Why study games?
           o Fun; historically entertaining
            o Interesting subject of study because they are hard
                     Chess game:
                     average branch factor: 35, each player: 50 moves -> search tree: 35^100 nodes

   Relation of Search and Games
   Search – no adversary
         Solution is (heuristic) method for finding goal
         Heuristics and CSP techniques can find optimal solution
         Evaluation function: estimate of cost from start to goal through given node
         Examples: path planning, scheduling activities
   Games – adversary
         Solution is strategy (strategy specifies move for every possible opponent reply).
         Time limits force an approximate solution
         Evaluation function: evaluate ―goodness‖ of game position
         Examples: chess, checkers, Othello, backgammon



Types Of Games




           Multiplayer Games allow more than one player

Game setup
        Two players: MAX and MIN
        MAX moves first and they take turns until the game is over. The winner gets an award, the loser
        gets a penalty.
        Games as search:
           o Initial state: e.g. board configuration of chess
           o Successor function: list of (move,state) pairs specifying legal moves.
           o Terminal test: Is the game finished?
            o Utility function: Gives numerical value of terminal states.
                     E.g. win (+1), lose (-1) and draw (0) in tic-tac-toe (next)
        MAX uses search tree to determine next move.
Partial Game Tree for Tic Tac Toe




Optimal strategies
       Find the contingent strategy for MAX assuming an infallible MIN opponent.
       Assumption: Both players play optimally !!
       Given a game tree, the optimal strategy can be determined by using the minimax value of
       each node:
MINIMAX-VALUE(n) =
        UTILITY(n)                                           if n is a terminal state
        max of MINIMAX-VALUE(s) over all successors s of n   if n is a MAX node
        min of MINIMAX-VALUE(s) over all successors s of n   if n is a MIN node

                                          Two-Ply Game Tree




                       Minimax maximizes the worst-case outcome for max.




Production System
Production systems are applied to problem solving programs that must perform a wide range of
searches. Production systems are symbolic AI systems. The difference between these two terms is
only one of semantics: a symbolic AI system may not be restricted to the very definition of production
systems, but it cannot be much different either.


Production systems are composed of three parts, a global database, production rules and a control
structure.


A production system (or production rule system) is a computer program typically used to provide
some form of artificial intelligence, which consists primarily of a set of rules about behavior. These
rules, termed productions, are a basic representation found useful in automated planning, expert
systems and action selection. A production system provides the mechanism necessary to execute
productions in order to achieve some goal for the system.
Productions consist of two parts: a sensory precondition (or "IF" statement) and an action (or
"THEN"). If a production's precondition matches the current state of the world, then the production is
said to be triggered. If a production's action is executed, it is said to have fired.

The first production systems were developed by Newell and Simon in the 1950s, and the idea was
written up in their 1972 book.

"Production" in the title of these notes (or "production rule") is a synonym for "rule", i.e. for a
condition-action rule (see below). The term seems to have originated with the term used for
rewriting rules in the Chomsky hierarchy of grammar types, where for example context-free
grammar rules are sometimes referred to as context-free productions.

Rules

These are also called condition-action rules.
These components of a rule-based system have the form:
if <condition> then <conclusion>

or
if <condition> then <action>

Example:
if patient has high levels of the enzyme ferritin in their blood
   and patient has the Cys282→Tyr mutation in HFE gene
then conclude patient has haemochromatosis*

* medical validity of this rule is not asserted here

Rules can be evaluated by:

         backward chaining
         forward chaining




Backward Chaining

        To determine if a decision should be made, work backwards looking for justifications for the
        decision.
        Eventually, a decision must be justified by facts.




Forward Chaining

        Given some facts, work forward through inference net.
        Discovers what conclusions can be derived from data.




Forward Chaining 2
Until a problem is solved or no rule's 'if' part is satisfied by the current situation:

    1. Collect rules whose 'if' parts are satisfied.
    2. If more than one rule's 'if' part is satisfied, use a conflict resolution strategy to eliminate all
       but one.
    3. Do what the rule's 'then' part says to do.

Production Rules
A production rule system consists of

a set of rules
            working memory that stores temporary data
            a forward chaining inference engine

Match-Resolve-Act Cycle

The match-resolve-act cycle is what the inference engine does.

loop
    match conditions of rules with contents of working memory
    if no rule matches then stop
    resolve conflicts
    act (i.e. perform conclusion part of rule)
end loop
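
A minimal Python sketch of this cycle is shown below. Rules are (condition set, conclusion) pairs,
working memory is a set of facts, and the conflict-resolution strategy is simply "take the first
matching rule" — one possible choice among many. The example facts reuse the haemochromatosis rule
above and are illustrative only:

# A minimal sketch of the match-resolve-act cycle above.
def forward_chain(rules, facts):
    working_memory = set(facts)
    while True:
        # match: rules whose conditions all hold and whose conclusion is not yet known
        conflict_set = [(conds, concl) for conds, concl in rules
                        if conds <= working_memory and concl not in working_memory]
        if not conflict_set:              # no rule matches: stop
            return working_memory
        conds, concl = conflict_set[0]    # resolve conflicts: take the first rule
        working_memory.add(concl)         # act: perform the conclusion part of the rule

# Illustrative rules (medical validity not asserted):
rules = [
    ({"high ferritin", "Cys282Tyr mutation in HFE"}, "haemochromatosis"),
    ({"haemochromatosis"}, "refer for treatment"),
]
print(forward_chain(rules, {"high ferritin", "Cys282Tyr mutation in HFE"}))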

Chapter 3

3.1. Uninformed Search
3.1.1 Breadth-first search (BFS)
   Description
       A simple strategy in which the root is expanded first, then all the root's successors are expanded next, then
        their successors.
       We visit the search tree level by level, so that all nodes are expanded at a given depth before any nodes at
        the next level are expanded.
       Order in which nodes are expanded.




   Performance Measure:
       Completeness:
            it is easy to see that breadth-first search is complete in that it visits all levels; given that the depth
             factor d is finite, it will find a solution at some depth d.
       Optimality:
            breadth-first search is not optimal unless all actions have the same cost.
       Space complexity and Time complexity:
            Consider a state space where each node has a branching factor b: the root of the tree generates b
             nodes, each of which generates b nodes yielding b^2, each of these generates b^3, and so on.
            In the worst case, suppose that our solution is at depth d, and we expand all nodes but the last node
             at level d; then the total number of generated nodes is b + b^2 + b^3 + ... + b^d + (b^(d+1) - b) = O(b^(d+1)),
             which is the time complexity of BFS.
            As all the nodes must be retained in memory while we expand our search, the space complexity is like
             the time complexity plus the root node: O(b^(d+1)).
   Conclusion:
       We see that space complexity is a bigger problem for BFS than its exponential execution time.
       Time complexity is still a major problem; to convince yourself, look at the table below.



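For concreteness, a minimal breadth-first search sketch in Python follows; the successors function,
the toy graph and the goal test are placeholders, and the frontier is a FIFO queue of paths so the
shallowest nodes are always expanded first:

# A minimal breadth-first search sketch.
from collections import deque

def breadth_first_search(start, goal_test, successors):
    frontier = deque([[start]])              # queue of paths
    explored = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if goal_test(node):
            return path
        for child in successors(node):
            if child not in explored:
                explored.add(child)
                frontier.append(path + [child])
    return None                              # no solution found

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}
print(breadth_first_search("A", lambda n: n == "E", lambda n: graph[n]))   # ['A', 'C', 'E']
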

3.1.2. Depth-first search (DFS)
   Description:
       DFS progresses by expanding the first child node of the search tree that appears and thus going deeper
        and deeper until a goal node is found, or until it hits a node that has no children. Then the
        search backtracks, returning to the most recent node it hasn’t finished exploring.
       Order in which nodes are expanded




   Performance Measure:
       Completeness:
           DFS is not complete; to convince yourself, consider that our search starts expanding the left sub tree of
            the root along a very long path (maybe infinite) when a different choice near the root could lead to a
            solution; now suppose that the left sub tree of the root has no solution and is unbounded, then the
            search will continue going deep infinitely; in this case we say that DFS is not complete.
       Optimality:
           Consider the scenario where there is more than one goal node, and our search decides to first expand
            the left sub tree of the root, where there is a solution at a very deep level of this left sub tree, while at
            the same time the right sub tree of the root has a solution near the root; here comes the non-optimality
            of DFS: it is not guaranteed that the first goal found is the optimal one, so we conclude that DFS is
            not optimal.
       Time Complexity:
           Consider a state space that is identical to that of BFS, with branching factor b, and we start the search
            from the root.
             In the worst case, the goal may lie at the shallowest level of the search tree yet be found only after
              generating all tree nodes, which are O(b^m), where m is the maximum depth.
           Space Complexity:
                  Unlike BFS, our DFS has very modest memory requirements; it needs to store only the path from
                   the root to the leaf node, besides the siblings of each node on the path; remember that BFS needs to
                   store all the explored nodes in memory.
                  DFS removes a node from memory once all of its descendants have been expanded.
                  With branching factor b and maximum depth m, DFS requires storage of only bm + 1 nodes, which
                   are O(bm), compared to the O(b^(d+1)) of BFS.
   Conclusion:
           DFS may suffer from non-termination when the length of a path in the search tree is infinite, so we
            perform DFS to a limited depth which is called Depth-limited Search.


3.1.3 Depth Limited Search

• Breadth-first search has computational problems, especially space problems. Depth-first search can run off
  down a very long (or infinite) path.
• Idea: introduce a depth limit on branches to be expanded.
• Don't expand a branch below this depth.
• Most useful if you know the maximum depth of the solution.

                Perform depth first search but only to a pre-specified depth limit L.
                No node on a path that is more than L steps from the initial state is placed on the Frontier.
                We "truncate" the search by looking only at paths of length L or less.
        Description:
                 The unbounded tree problem that appeared in DFS can be fixed by imposing a limit on the depth that
                         DFS can reach; this limit we will call the depth limit l. This solves the infinite path problem.
        Performance Measure:
                 Completeness:
                          The depth limit introduces another problem, which is the case when we choose l < d, in
                            which case our DLS will never reach a goal; in this case we can say that DLS is not complete.
                 Optimality:
                          One can view DFS as a special case of DLS: DFS is DLS with l = infinity.
                          DLS is not optimal even if l > d.
                 Time Complexity: O(b^l)
                 Space Complexity: O(bl)
        Conclusion:
                 DLS can be used when there is prior knowledge about the problem, which is not always the case;
                 typically, we will not know the depth of the shallowest goal of a problem unless we have solved the
                 problem before.

It is depth-first search with depth limit l.
               i.e. nodes at depth l have no successors.
               Problem knowledge can be used
         Solves the infinite-path problem.
         If l < d then incompleteness results.
         If l > d then not optimal.
         Time complexity: O(b^l)
         Space complexity: O(bl)


Advantages
       Will always terminate
       Will find solution if there is one in the depth bound
Disadvantages
   • Too small a depth bound misses solutions
   • Too large a depth bound may find poor solutions when there are better ones
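
A recursive depth-limited search can be sketched as below; the graph and goal are placeholders, and
paths deeper than the limit L are simply not expanded:

# A recursive depth-limited search sketch: ordinary DFS, cut off below the depth limit.
def depth_limited_search(node, goal_test, successors, limit, path=None):
    path = [node] if path is None else path + [node]
    if goal_test(node):
        return path
    if limit == 0:                       # depth bound reached: do not expand further
        return None
    for child in successors(node):
        if child not in path:            # avoid trivial loops along the current path
            result = depth_limited_search(child, goal_test, successors, limit - 1, path)
            if result is not None:
                return result
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
print(depth_limited_search("A", lambda n: n == "E", lambda n: graph[n], limit=2))  # ['A', 'C', 'E']
print(depth_limited_search("A", lambda n: n == "E", lambda n: graph[n], limit=1))  # None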

3.1.4. Search Strategies’ Comparison:
Here is a table that compares the performance measures of each search strategy.




3.2. Informed Search
- more powerful than uninformed
- Informed = use problem-specific knowledge




3.2.1. Hill Climbing
     Here feedback from the test procedure is used to help the generator decide which direction to
         move in search space.
     The test function is augmented with a heuristic function that provides an estimate of how
         close a given state is to the goal state.
     Computation of heuristic function can be done with negligible amount of computation.
     Greedy local search
Hill climbing is often used when a good heuristic function is available for evaluating states but when
no other useful knowledge is available

 Loop that continuously moves in the direction of increasing value
     Terminates when it reaches a ―Peak‖
     Problem: depending on initial state, can get stuck in local maxima




   This simple policy has three well-known drawbacks:

1. Local Maxima: a local maximum is a peak
   that is higher than its neighbors but lower
   than the global maximum.

2. Plateaus: An area of the search
   space where evaluation function is
   flat, thus requiring random walk.

3. Ridge: Where there are steep
   slopes and the search direction is
   not towards the top but towards the
   side.
Variations of Hill Climbing
         Stochastic hill-climbing
             o Random selection among the uphill moves.
             o The selection probability can vary with the steepness of the uphill move.
         First-choice hill-climbing
             o cfr. stochastic hill climbing by generating successors randomly until a better one is
                 found.
         Random-restart hill-climbing
             o Tries to avoid getting stuck in local maxima.
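
The basic hill-climbing loop is easy to sketch in Python; the one-dimensional objective below is a
placeholder with a single peak, so the loop reaches the global maximum, but with a multimodal
objective the same code can get stuck on a local maximum:

# A minimal hill-climbing loop: keep moving to the best neighbour while it improves the
# heuristic value, and stop at a "peak" (which may be only a local maximum).
def hill_climbing(start, neighbours, value):
    current = start
    while True:
        candidates = neighbours(current)
        if not candidates:
            return current
        best = max(candidates, key=value)
        if value(best) <= value(current):    # no uphill move left: peak reached
            return current
        current = best

# Placeholder 1-D objective over the integers 0..10 with a single peak at x = 7.
value = lambda x: -(x - 7) ** 2
neighbours = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 10]
print(hill_climbing(0, neighbours, value))   # 7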


3.2.2. Best First Search
        General approach of informed search:
            o Best-first search: node is selected for expansion based on an evaluation function f(n)
        Idea: evaluation function measures distance to the goal.
            o Choose node which appears best
        Implementation:
            o fringe is queue sorted in decreasing order of desirability.
            o Special cases: greedy search, A* search
        Best First Search is a general search strategy
        Uses an evaluation function f(n) in deciding which node (in queue) to expand next
        Note: ―best‖ could be misleading (it is relative, not absolute)
        Greedy search is one type of Best First Search

3.2.2.1.Greedy Search

        Use a heuristic h() (cost estimate to goal) as the evaluation function
        Example: straight-line distance in finding a path from one city to another
        Evaluation function f(n) = h(n) (heuristic) = (estimate of cost from n to goal)
        e.g., hSLD(n) = straight-line distance from n to Bucharest
        Greedy best-first search expands the node that appears to be closest to goal
        Complete? No – can get stuck in loops, e.g., Iasi -> Neamt -> Iasi -> Neamt -> ...
        Time? O(b^m), but a good heuristic can give dramatic improvement
        Space? O(b^m) -- keeps all nodes in memory
        Optimal? No
        But can be acceptable in practice




3.2.2. A* Search
        Best-known form of best-first search.
        Idea: avoid expanding paths that are already expensive.
        Evaluation function f(n)=g(n) + h(n)
            o g(n) the cost (so far) to reach the node.
            o h(n) estimated cost to get from the node to the goal.
            o f(n) estimated total cost of path through n to goal.
        A* search uses an admissible heuristic
            o A heuristic is admissible if it never overestimates the cost to reach the goal
            o Are optimistic

           Formally:
               1. h(n) <= h*(n) where h*(n) is the true cost from n
               2. h(n) >= 0 so h(G)=0 for any goal G.
               e.g. hSLD(n) never overestimates the actual road distance
example:
 Find Bucharest starting at Arad
f(Arad) = c(??,Arad)+h(Arad)=0+366=366
Initial State:




Expand Arad and determine f(n) for each node
         f(Sibiu)=c(Arad,Sibiu)+h(Sibiu)=140+253=393
         f(Timisoara)=c(Arad,Timisoara)+h(Timisoara)=118+329=447
         f(Zerind)=c(Arad,Zerind)+h(Zerind)=75+374=449
         Best choice is Sibiu




And so on…




Admissible Heuristic
      A heuristic h(n) is admissible if for every node n,
         h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n.
      An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic
      Example: hSLD(n) (never overestimates the actual road distance)
      Theorem: If h(n) is admissible, A* using TREE-SEARCH is optimal

A* Search Evaluation
       Completeness: YES
       Time complexity: (exponential with path length)
       Space complexity:(all nodes are stored)
       Optimality: YES
           Cannot expand fi+1 until fi is finished.
           A* expands all nodes with f(n)< C*
           A* expands some nodes with f(n)=C*

 A* expands no nodes with f(n)>C*
        Also optimally efficient (not including ties)

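A compact A* sketch follows; the toy graph, edge costs and heuristic table are placeholders chosen so
that the heuristic is admissible (it never overestimates the remaining cost):

# A compact A* sketch: always expand the node with the smallest f(n) = g(n) + h(n).
import heapq

def a_star(start, goal, successors, h):
    frontier = [(h(start), 0, start, [start])]          # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for child, cost in successors(node):
            g2 = g + cost
            if g2 < best_g.get(child, float("inf")):
                best_g[child] = g2
                heapq.heappush(frontier, (g2 + h(child), g2, child, path + [child]))
    return None, float("inf")

graph = {"S": [("A", 2), ("B", 5)], "A": [("G", 6)], "B": [("G", 2)], "G": []}
h = {"S": 4, "A": 5, "B": 2, "G": 0}                    # never overestimates the true cost
print(a_star("S", "G", lambda n: graph[n], lambda n: h[n]))   # (['S', 'B', 'G'], 7)
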
3.2.3. Adversarial Search
MINMAX procedure
     Perfect play for deterministic games
     Idea: choose move to position with highest minimax value = best achievable payoff against
      best play
     E.g., 2-ply game:




        MINMAX Algorithm
      minimax(player,board)
          if(game over in current board position)
                return winner
          children = all legal moves for player from this board
          if(max's turn)
                return maximal score of calling minimax on all the children
          else (min's turn)
                return minimal score of calling minimax on all the children
        Complete? Yes (if the tree is finite)
        Optimal? Yes (against an optimal opponent)
        Time complexity? O(b^m)
        Space complexity? O(bm) (depth-first exploration)
        For chess, b ≈ 35, m ≈ 100 for "reasonable" games
         → exact solution is completely infeasible

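The minimax pseudocode above can be made runnable; in this sketch the game itself is abstracted behind three caller-supplied functions (is_terminal, utility and moves), which are illustrative assumptions rather than part of the notes.

    def minimax(state, is_terminal, utility, moves, max_turn=True):
        """Return the minimax value of state for the player to move."""
        if is_terminal(state):
            return utility(state)                        # payoff of the finished position
        values = [minimax(child, is_terminal, utility, moves, not max_turn)
                  for child in moves(state, max_turn)]   # all legal successor positions
        return max(values) if max_turn else min(values)  # MAX maximises, MIN minimises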
Alpha Beta Pruning

        ALPHA-BETA pruning is a method that reduces the number of nodes explored in the Minimax
        strategy.
        It reduces the time required for the search by ensuring that no time is wasted searching moves
        that are obviously bad for the current player.
        The exact implementation of alpha-beta keeps track of the best move for each side as it moves
        throughout the tree.




Properties of α-β
        Pruning does not affect final result
        Good move ordering improves effectiveness of pruning

With "perfect ordering," time complexity = O(b^(m/2))
                   → doubles the solvable depth of search
Why is it called alpha-beta?
       A simple example of the value of reasoning about which computations are relevant
       α is the value of the best (i.e., highest-value) choice found so far at any choice point along the
        path for max
       If v is worse than α, max will avoid it
             prune that branch
       Define β similarly for min

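A sketch of the same search with alpha-beta pruning added, using the same assumed callbacks as the minimax sketch above: alpha carries the best value found so far for MAX along the path, beta the best for MIN, and a branch is abandoned as soon as alpha >= beta.

    def alphabeta(state, is_terminal, utility, moves,
                  alpha=float("-inf"), beta=float("inf"), max_turn=True):
        """Minimax value of state, skipping branches that cannot affect the final decision."""
        if is_terminal(state):
            return utility(state)
        if max_turn:
            value = float("-inf")
            for child in moves(state, True):
                value = max(value, alphabeta(child, is_terminal, utility, moves, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:            # MIN already has a better option elsewhere: prune
                    break
            return value
        else:
            value = float("inf")
            for child in moves(state, False):
                value = min(value, alphabeta(child, is_terminal, utility, moves, alpha, beta, True))
                beta = min(beta, value)
                if alpha >= beta:            # MAX already has a better option elsewhere: prune
                    break
            return value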



Chapter 4

4.1.1 Logics are formal languages for formalizing reasoning, in particular for representing
information such that conclusions can be drawn
Logic involves:
            – A language with a syntax for specifying what is a legal expression in the language;
                syntax defines well formed sentences in the language
            – Semantics for associating elements of the language with elements of some subject
                matter. Semantics defines the "meaning" of sentences (link to the world); i.e.,
                semantics defines the truth of a sentence with respect to each possible world
            – Inference rules for manipulating sentences in the language

4.1.2. Syntax (grammar, internal structure of the language)
            – Vocabulary: grammatical categories
            – Identifying Well-Formed Formulae ("WFFs")
4.1.3 Semantics (pertaining to meaning and truth value)
            – Translation
            – Truth functions
            – Truth tables for the connectives

4.1.4. Connectives (“Sentence-Forming Operators”)
~     negation       "not," "it is not the case that"
⋅     conjunction    "and"
∨     disjunction    "or" (inclusive)
⊃     conditional    "if – then," "implies"
≣     biconditional  "if and only if," "iff"
    • Connect to sentences to make new sentences
    • Negation attaches to one sentence
             – It is not raining ∼ R

•   Conjunction, disjunction, conditional and biconditional attach two sentences together
              – It is raining and it is cold R ∙ C
              – If it rains then it pours     R⊃P

4.1.5. Well-Formed Formulae
Rules for WFF
     1. A sentence letter by itself is a WFF
          A      B            Z
     2. The result of putting ~ immediately in front of a WFF is a WFF
           ~A           ~B           ~~B           ~(A ⋅ B)           ~(~C ∨ D)
     3. The result of putting ⋅ , ∨ , ⊃ , or ≣ between two WFFs and surrounding the whole thing with
         parentheses is a WFF
           (A ⋅ B)           (~C ∨ D)           ((~C ∨ D) ⊃ (E ≣ (F ⋅ G)))
     4. Outside parentheses may be dropped
           A ⋅ B          ~C ∨ D          (~C ∨ D) ⊃ (E ≣ (F ⋅ G))
A sentence that can be constructed by applying the rules for constructing WFFs one at a time is a
WFF
A sentence which can't be so constructed is not a WFF.
              – Atomic sentences are wffs:
                          Propositional symbol (atom)
                          Examples: P, Q, R, BlockIsRed, SeasonIsWinter
              – Complex or compound wffs.
                          Given w1 and w2 wffs:
                             ¬w1              (negation)
                          (w1 ∧ w2)           (conjunction)
                          (w1 ∨ w2)           (disjunction)
                          (w1 ⇒ w2)           (implication; w1 is the antecedent;
                                                             w2 is the consequent)
                          (w1 ⇔ w2)           (biconditional)
4.1.6. Tautology
If a wff is True under all the interpretations of its constituent atoms, we say that
the wff is valid or it is a tautology.
Examples:
             1. P ∨ ~P     2. ~(P ⋅ ~P)     3. [P ⊃ (Q ⊃ P)]     4. [((P ⊃ Q) ⊃ P) ⊃ P]
An inconsistent sentence or contradiction is a sentence that is False under all interpretations. The
world is never like what it describes, as in "It's raining and it's not raining."


4.1.7.Validity
An argument is valid whenever the truth of all its premises implies the truth of its conclusion.

An argument is a sequence of propositions. The final proposition is called the conclusion of the argument, while the other
propositions are called the premises or hypotheses of the argument.
One can use the rules of inference to show the validity of an argument.
Note that p1, p2, … q are generally compound propositions or wffs.



4.2.

Intelligent agents should have capacity for:

   Perceiving: acquiring information from environment,
               Knowledge Representation: representing its understanding of the world,
               Reasoning: inferring the implications of what it knows and of the choices it has, and
               Acting: choosing what it wants to do and carrying it out.
4.2.1. Knowledge Base
      Representation of knowledge and the reasoning processes that bring knowledge to life are
       central to the entire field of AI
      Knowledge and reasoning also play a crucial role in dealing with partially observable
       environments
     Central component of Knowledge-based agent is its knowledge base.




     Knowledge base = set of sentences in a formal language
     Declarative approach to building an agent (or other system):
           TELL it what it needs to know
     Then it can Ask itself what to do
        - answers should follow from the KB
4.2.2.Entailment
       Entailment means that one thing follows from another:
       KB ╞ α
       Knowledge base KB entails sentence α if and only if α is true in all worlds where KB is true
            o e.g., the KB containing "the Giants won" and "the Reds won" entails "Either the
               Giants won or the Reds won"
           o E.g., x+y = 4 entails 4 = x+y
           o Entailment is a relationship between sentences (i.e., syntax) that is based on
              semantics
Inference
       Notation :KB ├i α = sentence α can be derived from KB by procedure i
       Soundness: i is sound if whenever KB ├i α, it is also true that KB╞ α
       Completeness: i is complete if whenever KB╞ α, it is also true that KB ├i α

Sound Rules of Inference
Here are some examples of sound rules of inference
            A rule is sound if its conclusion is true whenever the premise is true
Each can be shown to be sound using a truth table
RULE                   PREMISE                    CONCLUSION
Modus Ponens           A,  A ⇒ B                  B
And Introduction       A,  B                      A ∧ B
And Elimination        A ∧ B                      A
Double Negation        ¬¬A                        A
Unit Resolution        A ∨ B,  ¬B                 A
Resolution             A ∨ B,  ¬B ∨ C             A ∨ C




Soundness of Modus Ponens


A       B        A → B          OK?

True    True     True           yes – both premises (A and A → B) are true and the conclusion B is true

True    False    False          premise A → B is false, so the rule does not apply

False   True     True           premise A is false, so the rule does not apply

False   False    True           premise A is false, so the rule does not apply

Since B is true in the only row where both premises are true, Modus Ponens is sound.


Horn Clause
A Horn sentence or Horn clause has the form:
P1 ∧ P2 ∧ P3 ∧ ... ∧ Pn ⇒ Q
or alternatively
¬P1 ∨ ¬P2 ∨ ¬P3 ∨ ... ∨ ¬Pn ∨ Q
where Ps and Q are non-negated atoms
    • To get a proof for Horn sentences, apply Modus Ponens repeatedly until nothing can be done
    • We will use the Horn clause form later

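A small sketch of that repeated application of Modus Ponens (forward chaining) over propositional Horn clauses; the (premises, conclusion) encoding below is an assumption made purely for illustration.

    def forward_chain(rules, facts):
        """rules: list of (set_of_premise_atoms, conclusion_atom); facts: set of atoms known true."""
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if premises <= facts and conclusion not in facts:   # all premises known: apply Modus Ponens
                    facts.add(conclusion)
                    changed = True
        return facts

    # Example: from P1 ∧ P2 ⇒ Q and facts {P1, P2} we derive Q.
    # forward_chain([({"P1", "P2"}, "Q")], {"P1", "P2"}) == {"P1", "P2", "Q"}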

4.2.3.Propositional Logic
Propositional Logic Syntax
     Propositional logic is the simplest logic – it illustrates the basic ideas
     All objects described are fixed or unique
             E.g. "John is a student": student(john). Here John refers to one unique person.
     In propositional logic (PL) a user defines a set of propositional symbols, like P and Q, and
        defines the semantics of each of these symbols. For example,
             P means "It is hot"
             Q means "It is humid"
             R means "It is raining"
     The proposition symbols:
                 S, S1, S2 etc. are sentences
                 If S is a sentence, ¬S is a sentence (negation)
                 If S1 and S2 are sentences, S1 ∧ S2 is a sentence (conjunction)
                 If S1 and S2 are sentences, S1 ∨ S2 is a sentence (disjunction)
                 If S1 and S2 are sentences, S1 ⇒ S2 is a sentence (implication)
                 If S1 and S2 are sentences, S1 ⇔ S2 is a sentence (biconditional)




Propositional Logic Semantics
    Each model specifies true/false for each proposition symbol
       With these symbols, 8 possible models, can be enumerated automatically.
       Rules for evaluating truth with respect to a model m:
                 ¬S               is true iff      S is false
                 S1 ∧ S2          is true iff      S1 is true   and    S2 is true
                 S1 ∨ S2          is true iff      S1 is true   or     S2 is true
                 S1 ⇒ S2          is true iff      S1 is false  or     S2 is true
                 i.e., S1 ⇒ S2    is false iff     S1 is true   and    S2 is false

S1 ⇔ S2          is true iff     S1 ⇒ S2 is true and S2 ⇒ S1 is true
     Simple recursive process evaluates an arbitrary sentence, e.g.,
                ¬P1,2 ∧ (P2,2 ∨ P3,1) = true ∧ (true ∨ false) = true ∧ true = true

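The recursive evaluation can be written out directly; here a sentence is a nested tuple such as ("and", s1, s2) and a model is a dict from proposition symbols to truth values, an illustrative encoding that is not part of the notes.

    def pl_true(sentence, model):
        """Evaluate a propositional sentence with respect to a model."""
        if isinstance(sentence, str):                # atomic sentence: look it up in the model
            return model[sentence]
        op, *args = sentence
        if op == "not":
            return not pl_true(args[0], model)
        if op == "and":
            return pl_true(args[0], model) and pl_true(args[1], model)
        if op == "or":
            return pl_true(args[0], model) or pl_true(args[1], model)
        if op == "implies":
            return (not pl_true(args[0], model)) or pl_true(args[1], model)
        if op == "iff":
            return pl_true(args[0], model) == pl_true(args[1], model)
        raise ValueError("unknown connective: " + op)

    # The example above, with P1,2 false, P2,2 true and P3,1 false:
    # pl_true(("and", ("not", "P12"), ("or", "P22", "P31")),
    #         {"P12": False, "P22": True, "P31": False})   # -> True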
Truth Table for Connectives

P        Q        ¬P       P ∧ Q      P ∨ Q      P ⇒ Q      P ⇔ Q
True     True     False    True       True       True       True
True     False    False    False      True       False      False
False    True     True     False      True       True       False
False    False    True     False      False      True       True

Validity and satisfiability
A sentence is valid if it is true in all models,
e.g., True,   A ∨ ¬A,   A ⇒ A,   (A ∧ (A ⇒ B)) ⇒ B
Validity is connected to inference via the Deduction Theorem:
KB ╞ α if and only if (KB ⇒ α) is valid
A sentence is satisfiable if it is true in some model
e.g., A ∨ B,   C
A sentence is unsatisfiable if it is true in no models
e.g., A ∧ ¬A
Satisfiability is connected to inference via the following:
KB ╞ α if and only if (KB ∧ ¬α) is unsatisfiable

Logical Equivalence
    Two sentences are logically equivalent iff true in same models: α ≡ ß iff α╞ β and β╞ α




Resolution
     Conjunctive Normal Form (CNF)
            o conjunction of disjunctions of literals (clauses)
     E.g., (A ∨ ¬B) ∧ (B ∨ ¬C ∨ ¬D)
    Resolution is sound and complete for propositional logic
    Conversion to CNF

    B1,1 ⇔ (P1,2 ∨ P2,1)
    1. Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α).
    (B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1)
2. Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β.
           (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬(P1,2 ∨ P2,1) ∨ B1,1)
    3. Move ¬ inwards using de Morgan's rules and double-negation:
           (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ ((¬P1,2 ∧ ¬P2,1) ∨ B1,1)
    4. Apply the distributivity law (∨ over ∧) and flatten:
           (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1)
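The same conversion can be checked mechanically. Assuming the sympy library is available, its to_cnf function should reproduce the three clauses derived by hand above.

    from sympy import symbols
    from sympy.logic.boolalg import Equivalent, to_cnf

    B11, P12, P21 = symbols("B11 P12 P21")
    sentence = Equivalent(B11, P12 | P21)            # B1,1 ⇔ (P1,2 ∨ P2,1)
    print(to_cnf(sentence))
    # expected, up to ordering: (B11 | ~P12) & (B11 | ~P21) & (P12 | P21 | ~B11)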
      Resolution Algorithm
              Proof by contradiction, i.e., show that KB ∧ ¬α is unsatisfiable




     Propositional Resolution

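A minimal sketch of a single propositional resolution step, with a clause represented as a frozenset of (symbol, polarity) literals; this encoding is an illustrative assumption.

    def resolve(c1, c2):
        """Return every resolvent of clauses c1 and c2 (clauses are frozensets of literals)."""
        resolvents = []
        for symbol, polarity in c1:
            if (symbol, not polarity) in c2:         # found a literal and its negation
                rest = (c1 - {(symbol, polarity)}) | (c2 - {(symbol, not polarity)})
                resolvents.append(frozenset(rest))   # an empty frozenset is the empty clause
        return resolvents

    # Resolving (A ∨ B) with (¬B ∨ C) yields (A ∨ C):
    # resolve(frozenset({("A", True), ("B", True)}), frozenset({("B", False), ("C", True)}))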



Advantages of propositional logic:
   · Simple.
   · No decidability problems.

Limitations of Propositional Calculus
     An argument may not be provable using propositional logic, but may be provable using
      predicate logic.
     e.g. All horses are animals.
      Therefore, the head of a horse is the head of an animal.


We know that this argument is correct and yet it cannot be proved under propositional logic,
        but it can be proved under predicate logic.
      Limited representational power.
      Simple statements may require large and awkward representations.
4.2.4.First Order Predicate Logic (FOPL)

Predicate Logic (FOPL) provides
i) A language to express assertions (axioms) about certain "worlds ".
ii) An inference system or deductive apparatus whereby we may draw conclusions from
such assertions and
iii) A semantics based on set theory.

The language of FOPL consists of
i) A set of constant symbols (to name particular individuals such as table, a,b,c,d,e etc. - these depend
on the application)
ii) A set of variables (to refer to arbitrary individuals)
iii) A set of predicate symbols (to represent relations such as On, Above etc. -these depend on the
application)
iv) A set of function symbols (to represent functions - these depend on the application)
v) The logical connectives ∧, ∨, ⇒, ⇔, ¬ (to capture and, or, implies, iff and not)
vi) The Universal Quantifier, ∀, and the Existential Quantifier, ∃ (to capture "all", "every", "some",
"few", "there exists" etc.)
vii) Normally a special binary relation of equality (=) is considered (at least in mathematics) as part of
the language.
Quantification
Universal Quantification
     ∀ <variables> <sentence>
        Everyone at KEC is smart:
              ∀x At(x,KEC) ⇒ Smart(x)
    ∀x P is true in a model m iff P is true with x being each possible object in the model
     Roughly speaking, equivalent to the conjunction of instantiations of P
            (At(KingJohn,KEC) ⇒ Smart(KingJohn))
          ∧ (At(Richard,KEC) ⇒ Smart(Richard))
          ∧ ...

     Common mistake to avoid:
            Typically, ⇒ is the main connective with ∀
            Common mistake: using ∧ as the main connective with ∀:
 ∀x At(x,KEC) ∧ Smart(x) means "Everyone is at KEC and everyone is smart"
Existential Quantification
     ∃ <variables> <sentence>
                                  Someone at KEC is smart:
     ∃x At(x,KEC) ∧ Smart(x)
                                  ∃x P is true in a model m iff P is true with x being some
                                     possible object in the model
     Typically, ∧ is the main connective with ∃
                                  Common mistake: using ⇒ as the main connective with ∃:
 ∃x At(x,KEC) ⇒ Smart(x) is true if there is anyone who is not at KEC



Properties of Quantifiers
     ∀x ∀y is the same as ∀y ∀x
     ∃x ∃y is the same as ∃y ∃x
     ∃x ∀y is not the same as ∀y ∃x
          ∃x ∀y Loves(x,y)
              "There is a person who loves everyone in the world"
     ∀y ∃x Loves(x,y)
              "Everyone in the world is loved by at least one person"
     Quantifier duality: each can be expressed using the other
     ∀x Likes(x,IceCream)                   ¬∃x ¬Likes(x,IceCream)
     ∃x Likes(x,Broccoli)                   ¬∀x ¬Likes(x,Broccoli)
Example 1
For example, suppose we wish to represent in FOPL the following sentences
a) "Everyone loves Janet"
b) "Not everyone loves Daphne"
c) "Everyone is loved by their mother"
Introducing constant symbols j and d to represent Janet and Daphne respectively; a binary
predicate symbol L to represent loves; and the unary function symbol m to represent the
mother of a person given as argument.
The above sentences may now be represented in FOPL by
a) ∀x.L(x,j)
b) ∃x.¬L(x,d)
c) ∀x.L(m(x),x)

Example 2
We will express the following in first order predicate calculus
"sam is Kind"
"Every kind person has someone who loves them"
"sam loves someone"
The non-logical symbols of our language are
the constant sam and
the unary predicate (or property) Kind and
the binary predicate Loves.
We may represent the above sentences as
1. Kind(sam)
2. ∀x.(Kind(x) ⇒ ∃y.Loves(y,x))
3. ∃y Loves(sam,y)

Some Semantic Issues
An interpretation (of the language of FOPL) consists of
a) a non empty set of objects (the Universe of Discourse, D) containing designated
individuals named by the constant symbols
b) for each function symbol in the language of FOPL, a corresponding function over D.
c) for each predicate symbol in the language of FOPL, a corresponding relation over D.

An interpretation is said to be a model for a set of sentences Γ, if each sentence of Γ is true
under the given interpretation.


 The interpretation of a formula F in first order predicate logic consists of fixing a
      domain of values (non empty) D and of an association of values for every constant,
      function and predicate in the formula F as follows:
     (1)     Every constant has an associated value in D.
     (2)     Every function f, of arity n, is defined by the correspondence f : D^n → D,
      where D^n = {(x1, ..., xn) | x1 ∈ D, ..., xn ∈ D}
     (3)     Every predicate of arity n, is defined by the correspondence P : D^n → {true, false}
     Interpretation Example




Using FOL
     Brothers are siblings
           ∀x,y Brother(x,y) ⇒ Sibling(x,y)
     One's mother is one's female parent
           ∀m,c Mother(c) = m ⇔ (Female(m) ∧ Parent(m,c))
     "Sibling" is symmetric
           ∀x,y Sibling(x,y) ⇔ Sibling(y,x)
     Marcus was a man
           Man(Marcus)
     Marcus was a Pompeian
           Pompeian(Marcus)
     All Pompeians were Romans
           ∀x: Pompeian(x) ⇒ Roman(x)
     All Romans were either loyal to Caesar or hated him
           ∀x: Roman(x) ⇒ loyalto(x,Caesar) ∨ hate(x,Caesar)
     Everyone is loyal to someone
           ∀x: ∃y: loyalto(x,y)
     People only try to assassinate rulers they are not loyal to
        ∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x,y) ⇒ ¬loyalto(x,y)




4.3

Inference Rules
Complex deductive arguments can be judged valid or invalid based on whether or not the steps in that
argument follow the nine basic rules of inference. These rules of inference are all relatively simple,
although when presented in formal terms they can look overly complex.
Conjunction:
1. P
2. Q
3. Therefore, P and Q.
1. It is raining in New York.
2. It is raining in Boston
3. Therefore, it is raining in both New York and Boston
Simplification
1. P and Q.
2. Therefore, P.
1. It is raining in both New York and Boston.
2. Therefore, it is raining in New York.
Addition
1. P
2. Therefore, P or Q.
1. It is raining
2. Therefore, either it is raining or the sun is shining.
Absorption
1. If P, then Q.
2. Therefore, if P then P and Q.
1. If it is raining, then I will get wet.
2. Therefore, if it is raining, then it is raining and I will get wet.
Modus Ponens
1. If P then Q.
2. P.
3. Therefore, Q.
1. If it is raining, then I will get wet.
2. It is raining.
3. Therefore, I will get wet.
Modus Tollens
1. If P then Q.
2. Not Q. (~Q).
3. Therefore, not P (~P).

1. If it had rained this morning, I would have gotten wet.
2. I did not get wet.
3. Therefore, it did not rain this morning.
Hypothetical Syllogism
1. If P then Q.
2. If Q then R.
3. Therefore, if P then R.
1. If it rains, then I will get wet.
2. If I get wet, then my shirt will be ruined.
3. If it rains, then my shirt will be ruined.
Disjunctive Syllogism
1. Either P or Q.
2. Not P (~P).
3. Therefore, Q.
1. Either it rained or I took a cab to the movies.
2. It did not rain.
3. Therefore, I took a cab to the movies.
Constructive Dilemma
1. (If P then Q) and (If R then S).
2. P or R.
3. Therefore, Q or S.
1. If it rains, then I will get wet and if it is sunny, then I will be dry.
2. Either it will rain or it will be sunny.
3. Therefore, either I will get wet or I will be dry.

The above rules of inference, when combined with the rules of replacement, mean that propositional
calculus is "complete." Propositional calculus is simply another name for formal logic


Unification
Unification, in computer science and logic, is an algorithmic process by which one attempts to solve
the satisfiability problem. The goal of unification is to find a substitution which demonstrates that two
seemingly different terms are in fact either identical or just equal. Unification is widely used
in automated reasoning, logic programming and programming language type system implementation.

Several kinds of unification are commonly studied: that for theories without any equations (the empty
theory) is referred to as syntactic unification: one wishes to show that (pairs of) terms are identical.
If one has a non-empty equational theory, then one is typically interested in showing the equality of (a
pair of) terms; this is referred to as semantic unification. Since substitutions can be ordered into
a partial order, unification can be understood as the procedure of finding a join on a lattice.




We also need some way of binding variables to values in a consistent way so that components of
sentences can be matched. This is the process of Unification.


Binding


A binding list is a set of entries of the form v = e, where v is a variable and e is an object. Given an
expression p and a binding list σ, we write pσ for the instantiation of p using the bindings in σ.
Unifier


    Given two expressions p and q, a unifier is a binding list σ such that
        pσ = qσ.
    Most General Unifier


    MGU is a unifier that binds the fewest variables or binds them to less specific expressions.


    Most General Unifier (MGU) Algorithm for expressions p and q


    1. If either p or q is either an object constant or a variable, then:


    i). If p=q, then p and q already unify and we return { }.
    ii). If either p or q is a variable, then return the result binding that variable to the other expression.
    iii). Otherwise return failure.
    2. If neither p nor q is an object constant or a variable, then they must both be compound expressions
    (suppose each is made up of p1,...,pn and q1,...,qm) and must be unified one component at a time.
    i). If the types and any function/relation constant are not equal, return failure.

    ii). If n ≠ m, then return failure.
    iii). Otherwise n = m, and do the following:
    a). Set σ = { }, k = 0.
    b). If k = n then stop and return σ as the mgu of p and q.

    c). Otherwise, increment k and apply mgu recursively to pkσ and qkσ.

    If pkσ and qkσ unify, add the new bindings to σ and return to step b).

    If pkσ and qkσ fail to unify, then return failure for the unification of p and q.
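A compact Python sketch of the unification idea; variables are written as strings starting with "?" and compound expressions as tuples (both conventions are illustrative assumptions), and the occurs-check is omitted for brevity.

    def is_var(t):
        return isinstance(t, str) and t.startswith("?")

    def substitute(t, subst):
        """Apply a binding list (dict) to an expression."""
        while is_var(t) and t in subst:
            t = subst[t]
        if isinstance(t, tuple):
            return tuple(substitute(a, subst) for a in t)
        return t

    def unify(p, q, subst=None):
        """Return a most general unifier for p and q as a dict, or None on failure."""
        subst = {} if subst is None else subst
        p, q = substitute(p, subst), substitute(q, subst)
        if p == q:
            return subst
        if is_var(p):
            return {**subst, p: q}
        if is_var(q):
            return {**subst, q: p}
        if isinstance(p, tuple) and isinstance(q, tuple) and len(p) == len(q):
            for a, b in zip(p, q):                   # unify one component at a time
                subst = unify(a, b, subst)
                if subst is None:
                    return None
            return subst
        return None                                  # mismatched constants or arities

    # unify(("Loves", "?x", "janet"), ("Loves", "marcus", "?y")) -> {"?x": "marcus", "?y": "janet"}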



    Resolution Refutation System
               Resolution is a technique for proving theorems in predicate calculus
               Resolution is a sound inference rule that, when used to produce a refutation, is also complete
               In an important practical application, resolution theorem proving, particularly the resolution
                refutation system, has made the current generation of Prolog interpreters possible
                The resolution principle, describes a way of finding contradictions in a data base of clauses
                 with minimum substitution
                Resolution Refutation proves a theorem by negating the statement to be proved and adding
                 the negated goal to the set of axioms that are known or have been assumed to be true
                It then uses the resolution rule of inference to show that this leads to a contradiction
                Steps in Resolution Refutation Proof
                   1. Put the premises or axioms into clause form
                   2. Add the negations of what is to be proved in clause form, to the set of axioms
                   3. Resolve these clauses together, producing new clauses that logically follow from them
                   4. Produce a contradiction by generating the empty clause
                 Discussion on Steps
                Resolution Refutation proofs require that the axioms and the negation of the goal be placed in
                 a normal form called the clause form

   Clausal form represents the logical database as a set of disjunctions of literals
       Resolution is applied to two clauses when one contains a literal and the other its negation
       The substitutions used to produce the empty clause are those under which the opposite of the
        negated goal is true
         If these literals contain variables, they must be unified to make them equivalent
       A new clause is then produced consisting of the disjunction of all the predicates in the two
         clauses minus the literal and its negative instance (which are said to have been "resolved
         away")
        Example:
         We wish to prove that "Fido will die" from the statements that
         "Fido is a dog" and "all dogs are animals" and "all animals will die"
         Convert these predicates to clause form
                             Predicate Form                            Clause Form

                      ∀x: [dog(x) ⇒ animal(x)]                     ¬dog(x) ∨ animal(x)

                              dog(fido)                                dog(fido)

                      ∀y: [animal(y) ⇒ die(y)]                     ¬animal(y) ∨ die(y)


    Apply Resolution

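One possible refutation, sketching the derivation that the resolution figure would show: add the negated goal ¬die(fido) to the clause set, then
   1. Resolve ¬dog(x) ∨ animal(x) with dog(fido) under {x/fido}, giving animal(fido).
   2. Resolve animal(fido) with ¬animal(y) ∨ die(y) under {y/fido}, giving die(fido).
   3. Resolve die(fido) with the negated goal ¬die(fido), giving the empty clause.
Since the negated goal leads to a contradiction, "Fido will die" follows from the axioms.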



Q.1. Anyone passing the Artificial Intelligence exam and winning the lottery is happy. But anyone
who studies or is lucky can pass all their exams. Ali did not study but he is lucky. Anyone who is lucky
wins the lottery. Is Ali happy?
Anyone passing the AI Exam and winning the lottery is happy
   ∀X: [pass(X,AI) ∧ win(X,lottery) ⇒ happy(X)]
Anyone who studies or is lucky can pass all their exams
   ∀X ∀Y: [study(X) ∨ lucky(X) ⇒ pass(X,Y)]
Ali did not study but he is lucky
¬study(ali) ∧ lucky(ali)
Anyone who is lucky wins the lottery
   ∀X: [lucky(X) ⇒ win(X,lottery)]

Change to clausal form
   1. ¬pass(X,AI) ∨ ¬win(X,lottery) ∨ happy(X)
   2. ¬study(Y) ∨ pass(Y,Z)
   3. ¬lucky(W) ∨ pass(W,V)

   4. ¬study(ali)
   5. lucky(ali)
   6. ¬lucky(U) ∨ win(U,lottery)
   7. Add the negation of the conclusion: ¬happy(ali)
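A sketch of one way the refutation can then proceed from clauses 1–7:
   a. Resolve 5 and 6 under {U/ali}: win(ali,lottery).
   b. Resolve 5 and 3 under {W/ali, V/AI}: pass(ali,AI).
   c. Resolve 7 and 1 under {X/ali}: ¬pass(ali,AI) ∨ ¬win(ali,lottery).
   d. Resolve c with pass(ali,AI) from b: ¬win(ali,lottery).
   e. Resolve d with win(ali,lottery) from a: the empty clause.
The contradiction means ¬happy(ali) cannot hold, so Ali is happy.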




4.4.
Symbolic versus statistical reasoning
The symbolic methods basically represent belief as being
         True,
         False, or
         Neither True nor False.
Some methods also had problems with
         Incomplete Knowledge
         Contradictions in the knowledge.
Statistical methods provide a method for representing beliefs that are not certain (or uncertain) but for
which there may be some supporting (or contradictory) evidence.
Statistical methods offer advantages in two broad scenarios:
Genuine Randomness
         -- Card games are a good example. We may not be able to predict any outcomes with
         certainty but we have knowledge about the likelihood of certain items (e.g. like being dealt an
         ace) and we can exploit this.
Exceptions
        -- Symbolic methods can represent exceptions. However, if the number of exceptions is large, such
        systems tend to break down; many common-sense and expert reasoning tasks are like this.
        Statistical techniques can summarise large numbers of exceptions without resorting to enumeration.

Basic Statistical methods -- Probability
The basic approach statistical methods adopt to deal with uncertainty is via the axioms of probability:
       Probabilities are (real) numbers in the range 0 to 1.
       A probability of P(A) = 0 indicates that A is certain not to occur, P(A) = 1 that A is certain to
       occur, and values in between indicate some degree of (un)certainty.
       Probabilities can be calculated in a number of ways.
Very Simply
        Probability = (number of desired outcomes) / (total number of outcomes)

        So given a pack of playing cards the probability of being dealt an ace from a full normal deck
        is 4 (the number of aces) / 52 (number of cards in deck) which is 1/13. Similarly the
        probability of being dealt a spade suit is 13 / 52 = 1/4.

        If you have a choice of k items from a set of n items then the formula
        C(n, k) = n! / (k! (n − k)!) is applied to find the number of ways of making this choice (! = factorial).

        So the chance of winning the national lottery (choosing 6 from 49) is C(49, 6) = 13,983,816 to
        1.
        Conditional probability, P(A|B), indicates the probability of event A given that we know
        event B has occurred.
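These counts are easy to check in Python (math.comb is the standard n-choose-k function):

    import math

    print(4 / 52)                  # probability of being dealt an ace  = 1/13
    print(13 / 52)                 # probability of being dealt a spade = 1/4
    print(math.comb(49, 6))        # ways of choosing 6 numbers from 49 = 13983816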

Bayes Theorem
        This states:

                 P(Hi | E)  =  P(E | Hi) P(Hi)  /  Σk P(E | Hk) P(Hk)      (sum over all hypotheses Hk)


            o    This reads: given some evidence E, the probability that hypothesis Hi is true is
                 equal to the ratio of the probability that E will be true given Hi times the a
                 priori probability of Hi, to the sum, over the set of all hypotheses, of the probability of E
                 given each hypothesis times the probability of that hypothesis.
             o The set of all hypotheses must be mutually exclusive and exhaustive.
             o Thus, to use this to diagnose an illness from medical evidence, we must know
                 the prior probability of each illness and also the probability of observing the
                 symptoms given each illness.
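A small numeric sketch of the rule; the illness/symptom figures below are invented purely for illustration.

    # Hypotheses with prior probabilities P(Hi) and likelihoods P(E | Hi) for evidence E = "fever".
    priors      = {"flu": 0.10, "cold": 0.30, "healthy": 0.60}     # assumed, illustrative values
    likelihoods = {"flu": 0.90, "cold": 0.60, "healthy": 0.05}     # assumed P(fever | Hi)

    evidence = sum(likelihoods[h] * priors[h] for h in priors)     # denominator: sum over all hypotheses
    posterior = {h: likelihoods[h] * priors[h] / evidence for h in priors}
    print(posterior)               # e.g. P(flu | fever) = 0.09 / 0.30 = 0.30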
Bayesian statistics lie at the heart of most statistical reasoning systems.
How is Bayes theorem exploited?
        The key is to formulate the problem correctly:
        P(A|B) states the probability of A given only B's evidence. If there is other relevant evidence
        then it must also be considered.
Herein lies a problem:
        All events must be mutually exclusive. However in real world problems events are not
        generally unrelated. For example in diagnosing measles, the symptoms of spots and a fever
        are related. This means that computing the conditional probabilities gets complex.
        In general, if we have prior evidence p and some new observation N, then computing the
        conditional probability of a hypothesis given both N and p

        grows exponentially for large sets of p.
        All events must be exhaustive. This means that in order to compute all probabilities the set of
        possible events must be closed. Thus if new information arises the set must be created afresh
        and all probabilities recalculated.
Thus Simple Bayes rule-based systems are not suitable for uncertain reasoning.
        Knowledge acquisition is very hard.
        Too many probabilities needed -- too large a storage space.
        Computation time is too large.
        Updating new information is difficult and time consuming.
        Exceptions like ``none of the above'' cannot be represented.
        Humans are not very good probability estimators.
However, Bayesian statistics still provide the core to reasoning in many uncertain reasoning systems
with suitable enhancement to overcome the above problems.
We will look at three broad categories:

Certainty factors,
        Dempster-Shafer models,
        Bayesian networks.

Belief Models and Certainty Factors
This approach has been suggested by Shortliffe and Buchanan and used in their famous medical
diagnosis MYCIN system.
MYCIN is essentially an expert system. Here we only concentrate on the probabilistic reasoning
aspects of MYCIN.
        MYCIN represents knowledge as a set of rules.
        Associated with each rule is a certainty factor
        A certainty factor is based on measures of belief B and disbelief D of a hypothesis H given
        evidence E. In the standard MYCIN-style formulation (with P the usual probability) these are:

                 MB(H, E) = 1                                            if P(H) = 1
                          = (max[P(H|E), P(H)] − P(H)) / (1 − P(H))      otherwise

                 MD(H, E) = 1                                            if P(H) = 0
                          = (P(H) − min[P(H|E), P(H)]) / P(H)            otherwise

        The certainty factor C of some hypothesis H given evidence E is defined as:

                 CF(H, E) = MB(H, E) − MD(H, E)
Reasoning with Certainty factors
        Rules are expressed as: if <evidence list> then there is suggestive evidence with
        probability p for the symptom (hypothesis).
        MYCIN uses rules to reason backward to clinical data evidence from its goal of predicting a
        disease-causing organism.
        Certainty factors are initially supplied by experts and are updated according to the previous formulae.
        How do we perform reasoning when several rules are chained together?
        Measures of belief and disbelief given several observations are calculated incrementally; in the
        standard MYCIN-style formulation:

                 MB(H, E1 ∧ E2) = 0                                            if MD(H, E1 ∧ E2) = 1
                                = MB(H, E1) + MB(H, E2) (1 − MB(H, E1))        otherwise

        How about our belief about several hypotheses taken together? Measures of belief given
        several hypotheses H1 and H2 to be combined logically are calculated as follows:

                 MB(H1 ∧ H2, E) = min(MB(H1, E), MB(H2, E))
                 MB(H1 ∨ H2, E) = max(MB(H1, E), MB(H2, E))

        Disbelief is calculated similarly.
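A hedged sketch of the incremental update, assuming the standard MYCIN-style combination MB_new = MB1 + MB2(1 − MB1) and CF = MB − MD given above.

    def combine_mb(mb1, mb2):
        """Combine two measures of belief in the same hypothesis from independent evidence."""
        return mb1 + mb2 * (1 - mb1)

    def certainty_factor(mb, md):
        """CF(H, E) = MB(H, E) - MD(H, E), so CF lies in [-1, 1]."""
        return mb - md

    mb = combine_mb(0.6, 0.4)          # two rules supporting H with beliefs 0.6 and 0.4 -> 0.76
    print(mb, certainty_factor(mb, 0.1))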

Bayesian networks
These are also called Belief Networks or Probabilistic Inference Networks. Initially developed by
Pearl (1988).
The basic idea is:
        Knowledge in the world is modular -- most events are conditionally independent of most
        other events.
        Adopt a model that can use a more local representation to allow interactions between events
        that only affect each other.

Some influences may only be unidirectional, others may be bidirectional -- make a distinction
        between these in the model.
        Events may be causal and thus get chained together in a network.


Implementation
       A Bayesian Network is a directed acyclic graph:
            o A directed graph whose links indicate the dependencies that exist
                 between nodes.
            o Nodes represent propositions about events or events themselves.
            o Conditional probabilities quantify the strength of dependencies.
Consider the following example:
        The probability that my car won't start.
        If my car won't start then it is likely that
             o The battery is flat or
             o The starting motor is broken.
In order to decide whether to fix the car myself or send it to the garage I make the following decision:
        If the headlights do not work then the battery is likely to be flat, so I fix it myself.
        If the starting motor is defective then send car to garage.
        If battery and starting motor both gone send car to garage.
The network to represent this is as follows:




Fig. A simple Bayesian network

Reasoning in Bayesian(belief) nets
        Probabilities in links obey standard conditional probability axioms.
        Therefore follow links in reaching hypothesis and update beliefs accordingly.
        A few broad classes of algorithms have been used to help with this:
            o Pearl's message-passing method.
            o Clique triangulation.
            o Stochastic methods.
            o Basically they all take advantage of clusters in the network and use their limits on the
                 influence to constrain the search through the net.
            o They also ensure that probabilities are updated correctly.
        Since information is local, it can be readily added and deleted with minimal effect
        on the whole network; only the affected nodes need updating.
        Example
            o Consider the problem: "block-lifting"
            o B: the battery is charged.
            o L: the block is liftable.
            o M: the arm moves.
            o G: the gauge indicates that the battery is charged




            o    p(G,M,B,L) = p(G|M,B,L)p(M|B,L)p(B|L)p(L)= p(G|B)p(M|B,L)p(B)p(L)
            o     Specification:
                      Traditional (full joint distribution): 16 rows
                         Bayesian networks: 8 rows
        Reasoning: top-down
              o Example:
              o if the block is liftable, compute the probability of arm moving.
              o I.e., Compute p(M | L)
              o Solution:
                 Insert parent nodes:
                 p(M|L) = p(M,B|L) + p(M,¬B|L)
                 Use chain rule:
                 p(M|L) = p(M|B,L)p(B|L) + p(M|¬B,L)p(¬B|L)
                 Remove independent node:
                 p(B|L) = p(B) : B does not have a PARENT
                 p(¬B|L) = p(¬B) = 1 – p(B)
                 p(M|L) = p(M|B,L)p(B) + p(M|¬B,L)(1 – p(B))
                          = 0.9 × 0.95 + 0.0 × (1 – 0.95)
                          = 0.855
        Reasoning: bottom-up
        Example:
        If the arm cannot move
        Compute the probability that the block is not liftable.
         I.e., Compute: p(¬L|¬M)
        Use Bayes' rule:
               p(¬L|¬M) = p(¬M|¬L) p(¬L) / p(¬M)
        Compute p(¬M|¬L) by top-down reasoning: p(¬M|¬L) = 0.9525 (exercise)
        p(¬L) = 1 − p(L) = 1 − 0.7 = 0.3
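A small Python sketch of the block-lifting computation; p(B) = 0.95, p(L) = 0.7 and the two quoted entries of p(M | B, L) follow the figures used above, and everything else is an assumption for illustration.

    p_B = 0.95                                   # probability the battery is charged
    p_L = 0.70                                   # probability the block is liftable
    p_M_given = {(True, True): 0.9,              # p(M | B, L) as used above
                 (False, True): 0.0}             # p(M | ~B, L)

    # Top-down: p(M | L) = p(M | B, L) p(B) + p(M | ~B, L) (1 - p(B))
    p_M_given_L = p_M_given[(True, True)] * p_B + p_M_given[(False, True)] * (1 - p_B)
    print(p_M_given_L)                           # 0.9 * 0.95 + 0.0 * 0.05 = 0.855

    # Bottom-up reasoning then applies Bayes' rule:
    # p(~L | ~M) = p(~M | ~L) * p(~L) / p(~M), with p(~M | ~L) obtained top-down (0.9525 in the notes).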




Chapter-5
Knowledge Representation.
Solving complex AI problems requires large amounts of knowledge and mechanisms for manipulating
that knowledge. The inference mechanisms that operate on knowledge rely on the ways knowledge
is represented. A good knowledge representation model allows for more powerful inference
mechanisms that operate on it. While representing knowledge one has to consider two things.
 1. Facts, which are truths in some relevant world.
 2. Representation of facts in some chosen formalism. These are the things which are actually
manipulated by the inference mechanism.

Knowledge representation schemes are useful only if there are functions that map facts to
representations and vice versa. AI is more concerned with a natural language representation of facts
and the functions which map natural language sentences into some representational formalism. An
appealing way of representing facts is using the language of logic. Logical formalism provides a way
of deriving new knowledge from the old through mathematical deduction. In this formalism, we can
conclude that a new statement is true by proving that it follows from the statements already known to
be facts.

STRUCTURED REPRESENTATION OF KNOWLEDGE
Representing knowledge using logical formalism, like predicate logic, has several advantages. They
can be combined with powerful inference mechanisms like resolution, which makes reasoning with
facts easy. But using logical formalism, complex structures of the world (objects and their
relationships, events, sequences of events etc.) cannot be described easily.

A good system for the representation of structured knowledge in a particular domain should possess
the following four properties:

(i) Representational Adequacy:- The ability to represent all kinds of knowledge that are needed in that
domain.

(ii) Inferential Adequacy :- The ability to manipulate the represented structure and infer new
structures.

(iii) Inferential Efficiency:- The ability to incorporate additional information into the knowledge
structure that will aid the inference mechanisms.

(iv) Acquisitional Efficiency :- The ability to acquire new information easily, either by direct insertion
or by program control.

The techniques that have been developed in AI systems to accomplish these objectives fall under two
categories:

1. Declarative Methods:- In these, knowledge is represented as a static collection of facts which are
manipulated by general procedures. Here the facts need to be stored only once and they can be used in
any number of ways. Facts can be easily added to declarative systems without changing the general
procedures.

2. Procedural Methods:- In these, knowledge is represented as procedures. Default reasoning and
probabilistic reasoning are examples of procedural methods. In these, heuristic knowledge of "how to
do things efficiently" can be easily represented.

In practice most knowledge representations employ a combination of both. Most of the
knowledge representation structures have been developed to handle programs that process natural
language input. One of the reasons that knowledge structures are so important is that they provide a
way to represent information about commonly occurring patterns of things. Such descriptions are
sometimes called schemas. One definition of schema is
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

Ai complete note

        Enhance human-human, human-computer and computer-computer interaction/communication

        Computers that can sense and recognize their users, see and recognize their environment, and
        respond visually and audibly to stimuli. New paradigms for interacting productively with
        computers using speech, vision, natural language, 3D virtual reality, 3D displays, and more
        natural and powerful user interfaces. (See, for example, projects in Microsoft's "Advanced
        Interactivity and Intelligence" group.)

Some Application Areas of AI

    Game Playing
        The Deep Blue chess program beat world champion Garry Kasparov.

    Speech Recognition
        PEGASUS, a spoken-language interface to American Airlines' EAASY SABRE reservation system,
        allows users to obtain flight information and make reservations over the telephone. The
        1990s saw significant advances in speech recognition, so that limited systems are now
        successful.

    Computer Vision
        Face recognition programs are in use by banks, government, etc. The ALVINN system from CMU
        autonomously drove a van from Washington, D.C. to San Diego (all but 52 of 2,849 miles),
        averaging 63 mph day and night, and in all weather conditions. Other applications:
        handwriting recognition, electronics and manufacturing inspection, photointerpretation,
        baggage inspection, and reverse engineering to automatically construct a 3D geometric model.

    Expert Systems
        Application-specific systems that rely on obtaining the knowledge of human experts in an
        area and programming that knowledge into a system.
        o Diagnostic Systems: the Microsoft Office Assistant in Office 97 provides customized help
          by decision-theoretic reasoning about an individual user. MYCIN diagnoses bacterial
          infections of the blood and suggests treatments. Intellipath is a pathology diagnosis
          system (AMA approved). Pathfinder is a medical diagnosis system that suggests tests and
          makes diagnoses. Whirlpool runs a customer assistance center.
        o System Configuration: DEC's XCON system for custom hardware configuration; radiotherapy
          treatment planning.
        o Financial Decision Making: credit card companies, mortgage companies, banks, and the U.S.
          government employ AI systems to detect fraud and expedite financial transactions (for
          example, the AMEX credit check). Systems often use learning algorithms to construct
          profiles of customer usage patterns, and then use these profiles to detect unusual
          patterns and take appropriate action.
        o Classification Systems: put information into one of a fixed set of categories using
          several sources of information, e.g. financial decision-making systems. NASA developed a
          system that classifies very faint areas in astronomical images as either stars or
          galaxies with very high accuracy by learning from human experts' classifications.

    Mathematical Theorem Proving
        Use inference methods to prove new theorems.

    Natural Language Understanding
        AltaVista's translation of web pages; translation of Caterpillar truck manuals into 20
        languages. (Note: one early system translated the English sentence "The spirit is willing
        but the flesh is weak" into the Russian equivalent of "The vodka is good but the meat is
        rotten.")

    Scheduling and Planning
        Automatic scheduling for manufacturing. DARPA's DART system was used in the Desert Storm
        and Desert Shield operations to plan the logistics of people and supplies. American
        Airlines' rerouting contingency planner. The European Space Agency's planning and
        scheduling of spacecraft assembly, integration and verification.

Some AI "Grand Challenge" Problems
    Translating telephone
    Accident-avoiding car
    Aids for the disabled
    Smart clothes
    Intelligent agents that monitor and manage information by filtering, digesting, abstracting
    Tutors
    Self-organizing systems, e.g. systems that learn to assemble something by observing a human do it

A Framework for Building AI Systems

    Perception
        Intelligent biological systems are physically embodied in the world and experience the
        world through their sensors (senses). For an autonomous vehicle, the input might be images
        from a camera and range information from a rangefinder. For a medical diagnosis system,
        perception is the set of symptoms and test results that have been obtained and input to the
        system manually. Includes the areas of vision, speech processing, natural language
        processing, and signal processing (e.g. market data and acoustic data).

    Reasoning
        Inference, decision-making, and classification from what is sensed and from the internal
        "model" of the world. The mechanism might be a neural network, a logical deduction system,
        Hidden Markov Model induction, heuristic search of a problem space, Bayes network
        inference, genetic algorithms, etc. Includes the areas of knowledge representation, problem
        solving, decision theory, planning, game theory, machine learning, uncertainty reasoning,
        etc.
    Action
        Biological systems interact with their environment by actuation, speech, etc. All behavior
        is centered around actions in the world. Examples include controlling the steering of a
        Mars rover or autonomous vehicle, or suggesting tests and making diagnoses for a medical
        diagnosis system. Includes the areas of robot actuation, natural language generation, and
        speech synthesis.

Some Fundamental Issues for Most AI Problems

    Representation
        Facts about the world have to be represented in some way; mathematical logic, for example,
        is one language used in AI. This deals with the questions of what to represent and how to
        represent it. How should knowledge be structured? What is explicit, and what must be
        inferred? How should "rules" for inferencing be encoded so as to find information that is
        only implicitly known? How should incomplete, inconsistent, and probabilistic knowledge be
        handled? There are also epistemological issues (what kinds of knowledge are required to
        solve problems).
        Example: "The fly buzzed irritatingly on the window pane. Jill picked up the newspaper."
        Inference: Jill has malicious intent; she is not intending to read the newspaper, or use it
        to start a fire, or ...
        Example: given 17 sticks in a 3 x 2 grid, remove 5 sticks to leave exactly 3 squares.

    Search
        Many tasks can be viewed as searching a very large problem space for a solution. For
        example, Checkers has about 10^40 states, and Chess has about 10^120 states in a typical
        game. Search uses heuristics (meaning "serving to aid discovery") and constraints.

    Inference
        From some facts, others can be inferred; this is related to search. For example, knowing
        "All elephants have trunks" and "Clyde is an elephant," can we answer the question "Does
        Clyde have a trunk?" What about "Peanuts has a trunk; is it an elephant?" Or "Peanuts lives
        in a tree and has a trunk; is it an elephant?" Deduction, abduction, non-monotonic
        reasoning, reasoning under uncertainty.

    Learning
        Inductive inference, neural networks, genetic algorithms, artificial life, evolutionary
        approaches.

    Planning
        Starting with general facts about the world, facts about the effects of basic actions,
        facts about a particular situation, and a statement of a goal, generate a strategy for
        achieving that goal in terms of a sequence of primitive steps or actions.

The State of the Art
    A computer beats a human in a chess game.
    Computer-human conversation using speech recognition.
    Computer programs can chat with humans.
    Expert systems control spacecraft.
    Robots can walk on stairs and hold a cup of water.
    Language translation for web pages.
    Home appliances use fuzzy logic.
Agent and Environment

    An agent is anything that can be viewed as perceiving its environment through sensors and
    acting upon that environment through actuators.
    - A human agent has eyes, ears, and other organs as sensors, and hands, legs, mouth, and other
      body parts as actuators.
    - A robotic agent might have cameras and infrared range finders as sensors and various motors
      as actuators.
    - A software agent receives keystrokes, file contents, and network packets as sensory inputs
      and acts on the environment by displaying on the screen, writing files, and sending network
      packets.

    We make the general assumption that every agent can perceive its own actions (but not always
    their effects). We use the term percept to refer to the agent's perceptual inputs at any given
    instant. An agent's percept sequence is the complete history of everything the agent has ever
    perceived. In general, an agent's choice of action at any given instant can depend on the
    entire percept sequence observed to date. If we can specify the agent's choice of action for
    every possible percept sequence, then we have said more or less everything there is to say
    about the agent. Mathematically speaking, we say that an agent's behavior is described by the
    agent function that maps any given percept sequence to an action:

        f : P* → A

    The agent program runs on the physical architecture to produce f.
Fig. Agents interact with environments through sensors and actuators
Fig. The vacuum-cleaner world

    Percepts: location and contents, e.g. [A, Dirty]
    Actions: Left, Right, Suck, NoOp

    For the vacuum-cleaner agent:

        Percept sequence                 Action
        [A, Clean]                       Right
        [A, Dirty]                       Suck
        [B, Clean]                       Left
        [B, Dirty]                       Suck
        [A, Clean], [A, Clean]           Right
        [A, Clean], [A, Dirty]           Suck
        …                                …

    function REFLEX-VACUUM-AGENT([location, status]) returns an action
        if status = Dirty then return Suck
        else if location = A then return Right
        else if location = B then return Left

Rationality

    Definition of a rational agent: for each possible percept sequence, a rational agent should
    select an action that is expected to maximize its performance measure, given the evidence
    provided by the percept sequence and whatever built-in knowledge the agent has.

    Rational ≠ omniscient (percepts may not supply all relevant information)
    Rational ≠ clairvoyant (action outcomes may not be as expected)
    Hence, rational ≠ successful
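    The Reflex-Vacuum-Agent above can be written as a short program. Below is a minimal sketch in
    Python; the two-square world (locations A and B) and the toy simulation loop are assumptions
    made here for illustration, not part of the original notes.

        # Simple reflex agent for the two-square vacuum world (locations A and B).
        # Percept = (location, status); the agent ignores the percept history.

        def reflex_vacuum_agent(percept):
            location, status = percept
            if status == "Dirty":
                return "Suck"
            elif location == "A":
                return "Right"
            else:  # location == "B"
                return "Left"

        def run(environment, steps=8):
            """Tiny hand-rolled simulator: environment maps location -> status."""
            location = "A"
            for _ in range(steps):
                action = reflex_vacuum_agent((location, environment[location]))
                print((location, environment[location]), "->", action)
                if action == "Suck":
                    environment[location] = "Clean"
                elif action == "Right":
                    location = "B"
                elif action == "Left":
                    location = "A"

        run({"A": "Dirty", "B": "Dirty"})

    Under the performance measure "amount of dirt cleaned over time", this simple agent is rational
    in the two-square world even though it never looks at its percept history.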
    Rational ⇒ exploration, learning, autonomy.

PEAS (Performance measure, Environment, Actuators, Sensors)

    To design a rational agent, we must specify the task environment. Task environments are
    essentially the "problems" to which rational agents are the "solutions." Task environments come
    in a variety of flavors, and the flavor of the task environment directly affects the
    appropriate design for the agent program. Consider, e.g., the task of designing an automated
    taxi:

        Agent type:          Taxi driver
        Performance measure: safe, fast, legal, comfortable trip, maximize profits
        Environment:         roads, other traffic, pedestrians, customers
        Actuators:           steering, accelerator, brake, signal, horn, display
        Sensors:             cameras, sonar, speedometer, GPS, odometer, accelerometer,
                             engine sensors, keyboard

    Figure: PEAS description of the task environment for an automated taxi.

        Agent type:          Medical diagnosis system
        Performance measure: healthy patient, minimize costs, minimize lawsuits
        Environment:         patient, hospital, staff
        Actuators:           display of questions, tests, diagnoses, treatments, referrals
        Sensors:             keyboard entry of symptoms, findings, patient's answers

        Agent type:          Internet shopping agent
        Performance measure: price, quality, appropriateness, efficiency
        Environment:         WWW sites, vendors, shippers
        Actuators:           display to user, follow URL, fill in form
        Sensors:             HTML pages (text, graphics, scripts)

    The range of task environments that might arise in AI is obviously vast. We can, however,
    identify a fairly small number of dimensions along which task environments can be categorized.

    Fully observable vs. partially observable
        If an agent's sensors give it access to the complete state of the environment at each point
        in time, the task environment is fully observable. An environment might be partially
        observable because of noisy and inaccurate sensors, or because parts of the state are
        simply missing from the sensor data. For example, a vacuum agent with only a local dirt
        sensor cannot tell whether there is dirt in other squares, and an automated taxi cannot see
        what other drivers are thinking.

    Deterministic vs. stochastic
        If the next state of the environment is completely determined by the current state and the
        action executed by the agent, the environment is deterministic; otherwise, it is
        stochastic.

    Episodic vs. sequential
        In an episodic task environment, the agent's experience is divided into atomic episodes.
        Each episode consists of the agent perceiving and then performing a single action, and,
        crucially, the next episode does not depend on the actions taken in previous episodes: the
        choice of action in each episode depends only on the episode itself. In sequential
        environments, on the other hand, the current decision could affect all future decisions.
        Chess and taxi driving are sequential: in both cases, short-term actions can have long-term
        consequences. Episodic environments are much simpler than sequential environments because
        the agent does not need to think ahead.

    Static vs. dynamic
        If the environment can change while an agent is deliberating, the environment is dynamic
        for that agent; otherwise, it is static. If the environment itself does not change with the
        passage of time but the agent's performance score does, the environment is semidynamic.
        Taxi driving is clearly dynamic: the other cars and the taxi itself keep moving while the
        driving algorithm dithers about what to do next. Chess, when played with a clock, is
        semidynamic. Crossword puzzles are static.

    Discrete vs. continuous
        The discrete/continuous distinction can be applied to the state of the environment, to the
        way time is handled, and to the percepts and actions of the agent. For example, a
        discrete-state environment such as a chess game has a finite number of distinct states;
        chess also has a discrete set of percepts and actions. Taxi driving is a continuous-state
        and continuous-time problem.

    Single agent vs. multiagent
        Single-agent and multi-agent environments are distinguished by the number of agents in the
        environment. For example, an agent solving a crossword puzzle by itself is in a
        single-agent environment, whereas an agent playing chess is in a two-agent environment.

    As one might expect, the hardest case is partially observable, stochastic, sequential, dynamic,
    continuous, and multiagent. The real world is partially observable, stochastic, sequential,
    dynamic, continuous, and multi-agent.

    There are four basic kinds of agent program that embody the principles underlying almost all
    intelligent systems, and all of them can be turned into learning agents:
    • Simple reflex agents
    • Model-based reflex agents
    • Goal-based agents
    • Utility-based agents

Agent types; simple reflex
    Select an action on the basis of only the current percept, e.g. the vacuum agent. This gives a
    large reduction in possible percept/action situations. Implemented through condition-action
    rules, e.g. "if dirty then suck".
    function REFLEX-VACUUM-AGENT([location, status]) returns an action
        if status = Dirty then return Suck
        else if location = A then return Right
        else if location = B then return Left

    This is a reduction from 4^T table entries (one per possible percept sequence of length T) to
    just 4 rules.

Agent types; reflex and state (model-based)
    To tackle partially observable environments, maintain an internal state. Over time, update the
    state using world knowledge:
    o How does the world change?
    o How do actions affect the world?
    ⇒ a model of the world.

Agent types; goal-based
    The agent needs a goal to know which situations are desirable.
    o Things become difficult when long sequences of actions are required to reach the goal; this
      is typically investigated in search and planning research.
    Major difference: the future is taken into account.
    This type of agent is more flexible, since knowledge is represented explicitly and can be
    manipulated.
Agent types; utility-based
    Certain goals can be reached in different ways; some ways are better, i.e. have a higher
    utility. A utility function maps a (sequence of) state(s) onto a real number. Utility improves
    on goals by:
    o selecting between conflicting goals, and
    o selecting appropriately between several goals based on the likelihood of success.

Agent types; learning
    All the previous agent programs describe methods for selecting actions, but they do not explain
    the origin of these programs. Learning mechanisms can be used to perform this task: teach the
    agents instead of instructing them. The advantage is the robustness of the program toward
    initially unknown environments.
    o Learning element: introduces improvements in the performance element.
    o Critic: provides feedback on the agent's performance based on a fixed performance standard.
    o Performance element: selects actions based on percepts; corresponds to the previous agent
      programs.
    o Problem generator: suggests actions that will lead to new and informative experiences
      (exploration vs. exploitation).

KNOWLEDGE
    • Data = collection of facts, measurements, statistics
    • Information = organized data
    • Knowledge = contextual, relevant, actionable information
      - Strong experiential and reflective elements
      - Good leverage and increasing returns
      - Dynamic
      - Branches and fragments with growth
      - Difficult to estimate impact of investment
      - Uncertain value in sharing
      - Evolves over time with experience
    • Explicit knowledge
      - Objective, rational, technical
      - Policies, goals, strategies, papers, reports
      - Codified
      - "Leaky" knowledge
    • Tacit knowledge
      - Subjective, cognitive, experiential learning
      - Highly personalized
      - Difficult to formalize
      - "Sticky" knowledge
Chapter 2: Problem Solving

Problem-solving agent
    Four general steps in problem solving:
    Goal formulation
        o What are the successful world states?
    Problem formulation
        o What actions and states should be considered to reach the goal?
    Search
        o Determine the possible sequences of actions that lead to states of known value, and
          choose the best sequence.
    Execute
        o Given the solution, perform the actions.

    function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action
        static: seq, an action sequence
                state, some description of the current world state
                goal, a goal
                problem, a problem formulation
        state ← UPDATE-STATE(state, percept)
        if seq is empty then
            goal ← FORMULATE-GOAL(state)
            problem ← FORMULATE-PROBLEM(state, goal)
            seq ← SEARCH(problem)
        action ← FIRST(seq)
        seq ← REST(seq)
        return action

Example
    On holiday in Romania; currently in Arad.
    o The flight leaves tomorrow from Bucharest.
    Formulate goal
    o Be in Bucharest.
    Formulate problem
    o States: various cities
    o Actions: drive between cities
    Find solution
    o Sequence of cities, e.g. Arad, Sibiu, Fagaras, Bucharest, …
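    A minimal sketch of this problem formulation in Python is shown below. The road map is
    abbreviated to a few cities; the Arad-Sibiu, Arad-Timisoara and Arad-Zerind distances match the
    figures used later in these notes, while the Sibiu-Fagaras and Fagaras-Bucharest distances are
    assumed values for illustration.

        # Problem formulation for the Romania route-finding example:
        # initial state, successor function, goal test and path cost.

        ROADS = {                       # successor function as a map: city -> {neighbour: step cost}
            "Arad":      {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
            "Sibiu":     {"Arad": 140, "Fagaras": 99},
            "Fagaras":   {"Sibiu": 99, "Bucharest": 211},
            "Timisoara": {"Arad": 118},
            "Zerind":    {"Arad": 75},
            "Bucharest": {"Fagaras": 211},
        }

        INITIAL_STATE = "Arad"

        def successors(state):
            """S(X) = set of (action, next state) pairs; an action is 'drive to <city>'."""
            return [(f"drive to {city}", city) for city in ROADS[state]]

        def goal_test(state):
            return state == "Bucharest"

        def path_cost(path):
            """Additive cost: sum of step costs along a path given as a list of cities."""
            return sum(ROADS[a][b] for a, b in zip(path, path[1:]))

        print(successors("Arad"))
        print(goal_test("Bucharest"), path_cost(["Arad", "Sibiu", "Fagaras", "Bucharest"]))

    Once the problem is formulated this way, any of the search strategies from the next chapter can
    be applied to it without changing the formulation.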
Selecting a state space
    The real world is absurdly complex, so the state space must be abstracted for problem solving.
    o (Abstract) state = set of real states.
    o (Abstract) action = complex combination of real actions,
      e.g. Arad → Zerind represents a complex set of possible routes, detours, rest stops, etc.
      The abstraction is valid if the path between two states is reflected in the real world.
    o (Abstract) solution = set of real paths that are solutions in the real world.
    Each abstract action should be "easier" than the real problem.

Formulating a Problem as a Graph
    In the graph:
    - each node represents a possible state;
    - one node is designated as the initial state;
    - one or more nodes represent goal states, states in which the agent's goal is considered
      accomplished;
    - each edge represents a state transition caused by a specific agent action;
    - associated with each edge is the cost of performing that transition.

State space graph of the vacuum world

Example: vacuum world
    States?? Two locations, each with or without dirt: 2 × 2^2 = 8 states.
    Initial state?? Any state can be initial.
    Actions?? {Left, Right, Suck}
    Goal test?? Check whether the squares are clean.
    Path cost?? Number of actions to reach the goal.

Example: 8-puzzle
    States?? Integer location of each tile.
    Initial state?? Any state can be initial.
    Actions?? {Left, Right, Up, Down}
    Goal test?? Check whether the goal configuration is reached.
    Path cost?? Number of actions to reach the goal.
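    A minimal sketch of an 8-puzzle state representation and successor function in Python.
    Representing a state as a tuple of nine numbers with 0 for the blank, and interpreting the
    actions as moves of the blank, are implementation choices made here, not prescribed by the
    notes.

        # 8-puzzle: a state is a tuple of 9 values read row by row, 0 marks the blank.
        GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

        MOVES = {"Up": -3, "Down": +3, "Left": -1, "Right": +1}   # how the blank index shifts

        def successors(state):
            """Return a list of (action, next state) pairs for moving the blank."""
            blank = state.index(0)
            row, col = divmod(blank, 3)
            result = []
            for action, delta in MOVES.items():
                if (action == "Up" and row == 0) or (action == "Down" and row == 2) \
                   or (action == "Left" and col == 0) or (action == "Right" and col == 2):
                    continue                      # move would push the blank off the board
                target = blank + delta
                board = list(state)
                board[blank], board[target] = board[target], board[blank]
                result.append((action, tuple(board)))
            return result

        def goal_test(state):
            return state == GOAL

        start = (1, 2, 3, 4, 5, 6, 7, 0, 8)      # one move away from the goal
        print(successors(start))
        print(any(goal_test(s) for _, s in successors(start)))   # True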
Problem Solving as Search
    Search space: the set of states reachable from an initial state S0 via a (possibly
    empty/finite/infinite) sequence of state transitions. To achieve the problem's goal:
    - search the space for a (possibly optimal) sequence of transitions starting from S0 and
      leading to a goal state;
    - execute (in order) the actions associated with each transition in the identified sequence.
    Depending on the features of the agent's world, the two steps above can be interleaved.

    How do we reach a goal state? There may be several possible ways, or none! Factors to consider:
    the cost of finding a path and the cost of traversing a path.

    Reduce the original problem to a search problem. A solution for the search problem is a path
    from the initial state to a goal state. The solution for the original problem is either
    o the sequence of actions associated with the path, or
    o the description of the goal state.

Example: The 8-puzzle
    It can be generalized to the 15-puzzle, the 24-puzzle, or the (n^2 - 1)-puzzle for n ≥ 6.
    States: configurations of tiles.
    Operators: move one tile Up/Down/Left/Right.
    There are 9! = 362,880 possible states (all permutations of {blank, 1, 2, 3, 4, 5, 6, 7, 8}),
    and 16! possible states for the 15-puzzle. Not all states are directly reachable from a given
    state (in fact, exactly half of them are reachable from a given state).
    How can an artificial agent represent the states and the state space for this problem?
    Go from state S to state G.

Problem formulation
    A problem is defined by:
    o An initial state, e.g. Arad
    o A successor function S(X) = set of action-state pairs,
      e.g. S(Arad) = {<Arad → Zerind, Zerind>, …}
      The initial state together with the successor function define the state space.
    o A goal test, which can be
      - explicit, e.g. x = 'at Bucharest'
      - implicit, e.g. checkmate(x)
    o A path cost (additive),
      e.g. the sum of distances, or the number of actions executed;
      c(x, a, y) is the step cost, assumed to be >= 0.
    A solution is a sequence of actions from the initial state to a goal state. An optimal solution
    has the lowest path cost.

    Formulating a problem:
    1. Choose an appropriate data structure to represent the world states.
    2. Define each operator as a precondition/effects pair, where the precondition holds exactly in
       the states the operator applies to, and the effects describe how a state changes into a
       successor state when the operator is applied.
    3. Specify an initial state.
    4. Provide a description of the goal (used to check whether a reached state is a goal state).

Formulating the 8-puzzle Problem
    States: each state is represented by a 3 × 3 array of numbers in [0 … 8], where the value 0 is
    the empty cell.
    Operators: 24 operators of the form Op(r, c, d), where r, c ∈ {1, 2, 3} and d ∈ {L, R, U, D}.
    Op(r, c, d) moves the empty space at position (r, c) in the direction d. Example: Op(3, 2, R).
    We have 24 operators in this problem formulation … 20 too many!

Problem types
    Deterministic, fully observable ⇒ single-state problem
    o The agent knows exactly which state it will be in; the solution is a sequence.
    Partial knowledge of states and actions:
    o Non-observable ⇒ sensorless or conformant problem
      - The agent may have no idea where it is; the solution (if any) is a sequence.
    o Nondeterministic and/or partially observable ⇒ contingency problem
      - Percepts provide new information about the current state; the solution is a tree or policy;
        search and execution are often interleaved.
      - If the uncertainty is caused by the actions of another agent: adversarial problem.
    o Unknown state space ⇒ exploration problem ("online")
      - The states and actions of the environment are unknown.
    Problem solutions need well-defined problems, and well-defined problems need to embody explicit
    constraints on possible solutions: a well-defined problem must define the space of possible
    solutions. We use search to solve well-defined problems.

Constraint Satisfaction Problems
    What is a CSP?
    - A finite set of variables V1, V2, …, Vn
    - A nonempty domain of possible values for each variable: DV1, DV2, …, DVn
    - A finite set of constraints C1, C2, …, Cm
    - Each constraint Ci limits the values that the variables can take, e.g. V1 ≠ V2

    A state is defined as an assignment of values to some or all variables. A consistent assignment
    does not violate the constraints. An assignment is complete when every variable is assigned a
    value. A solution to a CSP is a complete assignment that satisfies all constraints. Some CSPs
    also require a solution that maximizes an objective function.

    Applications: scheduling the time of observations on the Hubble Space Telescope, floor
    planning, map coloring, cryptography.

    CSPs are a special kind of problem: states are defined by the values of a fixed set of
    variables, and the goal test is defined by constraints on the variable values.

Varieties of Constraints
    Unary constraints involve a single variable, e.g. SA ≠ green.
    Binary constraints involve pairs of variables, e.g. SA ≠ WA.
    Higher-order constraints involve 3 or more variables, e.g. cryptarithmetic column constraints.
    Preferences (soft constraints), e.g. "red is better than green", are often representable by a
    cost for each variable assignment, giving constrained optimization problems.

CSP example: map coloring
    Variables: WA, NT, Q, NSW, V, SA, T
    Domains: Di = {red, green, blue}
    Constraints: adjacent regions must have different colors,
    o e.g. WA ≠ NT (if the language allows this),
    o e.g. (WA, NT) ∈ {(red, green), (red, blue), (green, red), …}
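    A minimal backtracking sketch for this map-colouring CSP in Python; the adjacency relation
    below is an assumption reconstructed from the usual Australia map and should be treated as
    illustrative.

        # Map-colouring CSP: variables, domains and binary "different colour" constraints.
        VARIABLES = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]
        DOMAIN = ["red", "green", "blue"]
        NEIGHBOURS = {                 # adjacency of the regions (assumed here)
            "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
            "SA": ["WA", "NT", "Q", "NSW", "V"],
            "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
        }

        def consistent(var, colour, assignment):
            """A value is consistent if no already-coloured neighbour has the same colour."""
            return all(assignment.get(n) != colour for n in NEIGHBOURS[var])

        def backtrack(assignment):
            if len(assignment) == len(VARIABLES):          # complete assignment = solution
                return assignment
            var = next(v for v in VARIABLES if v not in assignment)
            for colour in DOMAIN:
                if consistent(var, colour, assignment):
                    result = backtrack({**assignment, var: colour})
                    if result is not None:
                        return result
            return None                                    # no value worked: backtrack

        print(backtrack({}))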
    Solutions are assignments satisfying all constraints, e.g.
    {WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = green}

Constraint graph

CSP benefits
    - Standard representation pattern
    - Generic goal and successor functions
    - Generic heuristics (no domain-specific expertise)
    Constraint graph: nodes are variables, edges show constraints. The graph can be used to
    simplify the search, e.g. Tasmania is an independent subproblem.

Cryptarithmetic conventions
    - Each letter or symbol represents only one digit throughout the problem.
    - When letters are replaced by their digits, the resulting arithmetical operation must be
      correct.
    - The numerical base, unless specifically stated, is 10.
    - Numbers must not begin with a zero.
    - There must be only one solution to the problem.

    1.    S E N D
        + M O R E
        ---------
        M O N E Y

    We see at once that M in the total must be 1, since the total of the column SM cannot reach as
    high as 20. Now if M in this column is replaced by 1, how can we make this column total as much
    as 10 to provide the 1 carried over to the left below? Only by making S very large: 9 or 8. In
    either case the letter O must stand for zero: the summation of SM could produce only 10 or 11,
    but we cannot use 1 for letter O as we have already used it for M. If letter O is zero, then in
    column EO we cannot reach a total as high as 10, so there will be no 1 to carry over from this
    column to SM. Hence S must positively be 9.

    Since the summation EO gives N, and letter O is zero, N must be 1 greater than E and the column
    NR must total over 10. To put it into an equation:
        E + 1 = N
    From the NR column we can derive the equation:
        N + R + (+1) = E + 10
    We have to insert the expression (+1) because we do not yet know whether 1 is carried over from
    column DE; but we do know that 1 has to be carried over from column NR to EO. Subtracting the
    first equation from the second:
        R + (+1) = 9
    We cannot let R equal 9, since we already have S equal to 9. Therefore we have to make R equal
    to 8, and hence we know that 1 has to be carried over from column DE.

    Column DE must total at least 12, since Y cannot be 1 or zero. What values can we give D and E
    to reach this total? We have already used 9 and 8 elsewhere. The only digits left that are high
    enough are 7, 6 and 7, 5. But remember that one of these has to be E, and N is 1 greater than
    E. Hence E must be 5, N must be 6, while D is 7. Then Y turns out to be 2, and the puzzle is
    completely solved:

          S E N D        9 5 6 7
        + M O R E      + 1 0 8 5
        ---------      ---------
        M O N E Y      1 0 6 5 2

    2.    T W O
        + T W O
        -------
        F O U R

    Let us first check with F as 0. Now imagine O with its highest possible value, 9. Then R must
    be 8 and T should be 4. Checking among the remaining digits gives U as 3, and thus W must be 6:

          T W O        4 6 9
        + T W O      + 4 6 9
        -------      -------
        F O U R      0 9 3 8

Game Playing Summary
    Games are fun (and dangerous).
    They illustrate several important points about AI.
    Perfection is unattainable -> approximation.
    A good idea of what to think about.
    Uncertainty constrains the assignment of values to states.
    Games are to AI as grand prix racing is to automobile design.

    Games are a form of multi-agent environment:
    o What do other agents do and how do they affect our success?
    o Cooperative vs. competitive multi-agent environments.
    o Competitive multi-agent environments give rise to adversarial problems, a.k.a. games.

    Why study games?
    o Fun; historically entertaining.
    o An interesting subject of study because they are hard. For chess, the average branching
      factor is about 35 and each player makes about 50 moves, so the search tree has about 35^100
      nodes.

Relation of Search and Games
    Search - no adversary
        The solution is a (heuristic) method for finding a goal.
        Heuristics and CSP techniques can find the optimal solution.
        Evaluation function: estimate of cost from start to goal through a given node.
        Examples: path planning, scheduling activities.
    Games - adversary
        The solution is a strategy (a strategy specifies a move for every possible opponent reply).
        Time limits force an approximate solution.
        Evaluation function: evaluates the "goodness" of a game position.
        Examples: chess, checkers, Othello, backgammon.

Types of Games
    Multiplayer games allow more than one player.

Game setup
    Two players: MAX and MIN. MAX moves first and they take turns until the game is over. The
    winner gets an award, the loser gets a penalty.
    Games as search:
    o Initial state: e.g. the board configuration in chess
    o Successor function: list of (move, state) pairs specifying legal moves
    o Terminal test: is the game finished?
    o Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1) and
      draw (0) in tic-tac-toe (next)
    MAX uses the search tree to determine the next move.

Partial Game Tree for Tic-Tac-Toe

Optimal strategies
    Find the contingent strategy for MAX assuming an infallible MIN opponent. Assumption: both
    players play optimally. Given a game tree, the optimal strategy can be determined by using the
    minimax value of each node:

    MINIMAX-VALUE(n) =
        UTILITY(n)                                      if n is a terminal state
        max over successors s of MINIMAX-VALUE(s)       if n is a MAX node
        min over successors s of MINIMAX-VALUE(s)       if n is a MIN node

Two-Ply Game Tree
    Minimax maximizes the worst-case outcome for MAX.
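    A minimal recursive sketch of the MINIMAX-VALUE computation in Python, applied to a small
    two-ply tree; the leaf utilities below are illustrative assumptions, not values taken from the
    figure in the notes.

        # MINIMAX-VALUE on an explicit game tree: internal nodes are lists of children,
        # leaves are utility values (from MAX's point of view).

        def minimax_value(node, maximizing):
            if not isinstance(node, list):                 # terminal node: return its utility
                return node
            values = [minimax_value(child, not maximizing) for child in node]
            return max(values) if maximizing else min(values)

        # Two-ply example: MAX to move at the root, MIN at the three successors.
        tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

        print(minimax_value(tree, maximizing=True))        # MIN values are 3, 2, 2 -> MAX picks 3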
Production System
    Production systems are applied to problem-solving programs that must perform a wide range of
    searches. Production systems are symbolic AI systems; the difference between these two terms is
    only one of semantics. A symbolic AI system may not be restricted to the very definition of
    production systems, but it cannot be much different either. Production systems are composed of
    three parts: a global database, production rules, and a control structure.

    A production system (or production rule system) is a computer program typically used to provide
    some form of artificial intelligence, and it consists primarily of a set of rules about
    behavior. These rules, termed productions, are a basic representation found useful in automated
    planning, expert systems and action selection. A production system provides the mechanism
    necessary to execute productions in order to achieve some goal for the system.

    Productions consist of two parts: a sensory precondition (the "IF" part) and an action (the
    "THEN" part). If a production's precondition matches the current state of the world, the
    production is said to be triggered. If a production's action is executed, it is said to have
    fired.

    The first production systems were built by Newell and Simon in the 1950s, and the idea was
    written up in their 1972 book. "Production" (or "production rule") is a synonym for "rule",
    i.e. for a condition-action rule (see below). The term seems to have originated with the term
    used for rewriting rules in the Chomsky hierarchy of grammar types, where, for example,
    context-free grammar rules are sometimes referred to as context-free productions.

Rules
    These are also called condition-action rules. These components of a rule-based system have the
    form:
        if <condition> then <conclusion>
    or
        if <condition> then <action>

    Example:
        if   patient has high levels of the enzyme ferritin in their blood
        and  patient has the Cys282→Tyr mutation in the HFE gene
        then conclude patient has haemochromatosis*
        (* the medical validity of this rule is not asserted here)

    Rules can be evaluated by backward chaining or forward chaining; both are described below, and
    a small code sketch of the forward-chaining cycle follows.
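    A minimal sketch in Python of such a rule base together with the forward-chaining
    match-resolve-act cycle described in the following paragraphs. The first rule paraphrases the
    example above; the facts and the second rule are illustrative assumptions.

        # Tiny forward-chaining production system: rules are (conditions, conclusion) pairs,
        # working memory is a set of facts; the loop matches, resolves conflicts, and acts.

        RULES = [
            ({"high ferritin", "Cys282Tyr mutation"}, "haemochromatosis"),   # illustrative only
            ({"haemochromatosis"}, "refer to specialist"),                   # assumed extra rule
        ]

        def forward_chain(working_memory):
            working_memory = set(working_memory)
            while True:
                # match: rules whose conditions all hold and whose conclusion is not yet known
                triggered = [r for r in RULES
                             if r[0] <= working_memory and r[1] not in working_memory]
                if not triggered:
                    return working_memory                  # no rule matches: stop
                conditions, conclusion = triggered[0]      # conflict resolution: take first rule
                working_memory.add(conclusion)             # act: the rule "fires"

        print(forward_chain({"high ferritin", "Cys282Tyr mutation"}))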
Backward Chaining
    To determine whether a decision should be made, work backwards looking for justifications for
    the decision. Eventually, a decision must be justified by facts.

Forward Chaining
    Given some facts, work forward through the inference net. This discovers what conclusions can
    be derived from the data.

    Until the problem is solved or no rule's 'if' part is satisfied by the current situation:
    1. Collect the rules whose 'if' parts are satisfied.
    2. If more than one rule's 'if' part is satisfied, use a conflict resolution strategy to
       eliminate all but one.
    3. Do what the rule's 'then' part says to do.

Production Rules
    A production rule system consists of:
    - a set of rules,
    - working memory that stores temporary data, and
    - a forward-chaining inference engine.

Match-Resolve-Act Cycle
    The match-resolve-act cycle is what the inference engine does:

        loop
            match conditions of rules with contents of working memory
            if no rule matches then stop
            resolve conflicts
            act (i.e. perform the conclusion part of the rule)
        end loop

Chapter 3

3.1. Uninformed Search

3.1.1. Breadth-first search (BFS)
    Description
        A simple strategy in which the root is expanded first, then all the root's successors are
        expanded next, then their successors. We visit the search tree level by level, so that all
        nodes at a given depth are expanded before any nodes at the next level are expanded. This
        fixes the order in which nodes are expanded.
    Performance measure
        Completeness: breadth-first search is complete, since it visits every level; given that the
        solution depth d is finite, it will find a solution at some depth d.
        Optimality: breadth-first search is not optimal unless all actions have the same cost.
        Space and time complexity: consider a state space where each node has branching factor b.
        The root of the tree generates b nodes, each of which generates b more nodes, giving b^2
        nodes at the second level; each of these generates b more, giving b^3, and so on. In the
        worst case, the solution is at depth d and we expand all nodes but the last one at level d,
        so the total number of generated nodes is
            b + b^2 + b^3 + … + b^d + (b^(d+1) - b) = O(b^(d+1)),
        which is the time complexity of BFS.
        Since all the generated nodes must be retained in memory while we expand the search, the
        space complexity is the same as the time complexity plus the root node, i.e. O(b^(d+1)).
    Conclusion
        Space complexity is an even bigger problem for BFS than its exponential execution time.
        Time complexity is still a major problem; to convince yourself, look at the comparison
        table below.

3.1.2. Depth-first search (DFS)
    Description
        DFS progresses by expanding the first child node of the search tree that appears, going
        deeper and deeper until a goal node is found or until it hits a node that has no children.
        The search then backtracks, returning to the most recent node it has not finished
        exploring. This fixes the order in which nodes are expanded.
    Performance measure
        Completeness: DFS is not complete. Suppose the search starts expanding the left subtree of
        the root along a very long (possibly infinite) path, while a different choice near the root
        could lead to a solution. If the left subtree has no solution and is unbounded, the search
        will continue going deeper forever; in this case DFS is not complete.
        Optimality: suppose there is more than one goal node, and the search first expands the left
        subtree of the root, where there is a solution at a very deep level, while the right
        subtree has a solution near the root. The first goal found is then not guaranteed to be the
        optimal one, so DFS is not optimal.
        Time complexity: consider a state space identical to that of BFS, with branching factor b,
        and start the search from the root. In the worst case the goal is found only after
        generating all the nodes of the tree down to the maximum depth m, which is O(b^m).
        Space complexity: unlike BFS, DFS has very modest memory requirements. It needs to store
        only the path from the root to the current leaf node, together with the siblings of each
        node on the path (remember that BFS must store all explored nodes in memory). DFS removes a
        node from memory once all of its descendants have been expanded. With branching factor b
        and maximum depth m, DFS requires storage of only bm + 1 nodes, which is O(bm), compared to
        the O(b^(d+1)) of BFS.
    Conclusion
        DFS may suffer from non-termination when the length of a path in the search tree is
        infinite, so we perform DFS only to a limited depth; this is called depth-limited search.

3.1.3. Depth-Limited Search (DLS)
    Breadth-first search has computational and, especially, space problems; depth-first search can
    run off down a very long (or infinite) path. Idea: introduce a depth limit on the branches to
    be expanded, and do not expand any branch below this depth. This is most useful if you know the
    maximum depth of the solution.

    Perform depth-first search, but only to a pre-specified depth limit L. No node on a path that
    is more than L steps from the initial state is placed on the frontier; we "truncate" the search
    by looking only at paths of length L or less.

    Description
        The unbounded-tree problem that appears in DFS can be fixed by imposing a limit l on the
        depth that DFS can reach; this solves the infinite-path problem.
    Performance measure
        Completeness: the depth limit introduces another problem, namely the case where we choose
        l < d, in which case DLS will never reach a goal; so DLS is not complete.
        Optimality: one can view DFS as a special case of DLS with l = infinity. DLS is not optimal
        even if l > d.
        Time complexity: O(b^l)
        Space complexity: O(bl)
    Conclusion
        DLS can be used when there is prior knowledge about the problem, which is not always the
        case; typically we do not know the depth of the shallowest goal until we have solved the
        problem. It is depth-first search with depth limit l, i.e. nodes at depth l have no
        successors. Problem knowledge can be used. It solves the infinite-path problem; if l < d,
        incompleteness results; if l > d, the result is not optimal. Time complexity is O(b^l) and
        space complexity is O(bl).

    (A short code sketch of breadth-first and depth-limited search follows; the advantages and
    disadvantages of depth-limited search are listed after it.)
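    A minimal sketch of breadth-first and depth-limited search in Python over an explicit toy
    graph; the graph itself is an assumption for illustration.

        from collections import deque

        # Toy state space as an explicit graph (illustrative only).
        GRAPH = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"],
                 "C": [], "D": ["G"], "E": ["G"], "G": []}

        def breadth_first_search(start, goal):
            """Expand the shallowest unexpanded node first; returns a path or None."""
            frontier = deque([[start]])
            explored = set()
            while frontier:
                path = frontier.popleft()
                state = path[-1]
                if state == goal:
                    return path
                if state not in explored:
                    explored.add(state)
                    for child in GRAPH[state]:
                        frontier.append(path + [child])
            return None

        def depth_limited_search(state, goal, limit, path=None):
            """Depth-first search that never goes deeper than `limit` steps from the start."""
            path = path or [state]
            if state == goal:
                return path
            if limit == 0:
                return None                       # cut off: do not expand below the depth limit
            for child in GRAPH[state]:
                result = depth_limited_search(child, goal, limit - 1, path + [child])
                if result is not None:
                    return result
            return None

        print(breadth_first_search("S", "G"))     # ['S', 'A', 'D', 'G']
        print(depth_limited_search("S", "G", 2))  # None: the goal lies at depth 3
        print(depth_limited_search("S", "G", 3))  # ['S', 'A', 'D', 'G']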
    Advantages
        Will always terminate.
        Will find a solution if there is one within the depth bound.
    Disadvantages
        Too small a depth bound misses solutions.
        Too large a depth bound may find poor solutions when there are better ones.

3.1.4. Comparison of search strategies
    Here is a table that compares the performance measures of each search strategy.

3.2. Informed Search
    Informed search is more powerful than uninformed search: informed = use problem-specific
    knowledge.

3.2.1. Hill Climbing
    Feedback from the test procedure is used to help the generator decide which direction to move
    in the search space. The test function is augmented with a heuristic function that provides an
    estimate of how close a given state is to the goal state. Computation of the heuristic function
    can be done with a negligible amount of computation. Hill climbing is a greedy local search; it
    is often used when a good heuristic function is available for evaluating states but no other
    useful knowledge is available.

    It is a loop that continuously moves in the direction of increasing value and terminates when
    it reaches a "peak". Problem: depending on the initial state, it can get stuck in local maxima.

    This simple policy has three well-known drawbacks:
    1. Local maxima: a local maximum as opposed to the global maximum.
    2. Plateaus: an area of the search space where the evaluation function is flat, thus requiring
       a random walk.
    3. Ridges: where there are steep slopes and the search direction is not towards the top but
       towards the side.

    Variations of hill climbing:
    Stochastic hill-climbing
        o Random selection among the uphill moves.
        o The selection probability can vary with the steepness of the uphill move.
    First-choice hill-climbing
        o Like stochastic hill-climbing, but generates successors randomly until a better one is
          found.
    Random-restart hill-climbing
        o Tries to avoid getting stuck in local maxima.

3.2.2. Best-First Search
    General approach of informed search:
    o Best-first search: a node is selected for expansion based on an evaluation function f(n).
    Idea: the evaluation function measures distance to the goal.
    o Choose the node which appears best.
    Implementation:
    o The fringe is a queue sorted in decreasing order of desirability.
    o Special cases: greedy search, A* search.
    Best-first search is a general search strategy that uses an evaluation function f(n) to decide
    which node (in the queue) to expand next. Note: "best" could be misleading (it is relative, not
    absolute). Greedy search is one type of best-first search.

3.2.2.1. Greedy Search
    Use a heuristic h() (a cost estimate to the goal) as the evaluation function.
    Example: straight-line distance when finding a path from one city to another.
    Evaluation function f(n) = h(n) = estimate of cost from n to the goal,
    e.g. hSLD(n) = straight-line distance from n to Bucharest.
    Greedy best-first search expands the node that appears to be closest to the goal.

    Complete? No - it can get stuck in loops, e.g. Iasi → Neamt → Iasi → Neamt → …
    Time? O(b^m), but a good heuristic can give dramatic improvement.
    Space? O(b^m) - keeps all nodes in memory.
    Optimal? No, but it can be acceptable in practice.

3.2.2.2. A* Search
    The best-known form of best-first search. Idea: avoid expanding paths that are already
    expensive.
    Evaluation function f(n) = g(n) + h(n)
    o g(n): the cost (so far) to reach the node
    o h(n): the estimated cost to get from the node to the goal
    o f(n): the estimated total cost of the path through n to the goal

    A* search uses an admissible heuristic:
    o A heuristic is admissible if it never overestimates the cost to reach the goal; admissible
      heuristics are optimistic.
    o Formally: 1. h(n) <= h*(n), where h*(n) is the true cost from n; 2. h(n) >= 0, so h(G) = 0
      for any goal G.
    o E.g. hSLD(n) never overestimates the actual road distance.

    Example: find Bucharest starting at Arad.
    Initial state: f(Arad) = g(Arad) + h(Arad) = 0 + 366 = 366
    Expand Arad and determine f(n) for each successor:
        f(Sibiu)     = c(Arad, Sibiu)     + h(Sibiu)     = 140 + 253 = 393
        f(Timisoara) = c(Arad, Timisoara) + h(Timisoara) = 118 + 329 = 447
        f(Zerind)    = c(Arad, Zerind)    + h(Zerind)    =  75 + 374 = 449
    The best choice is Sibiu, and so on.

Admissible Heuristic
    A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where h*(n) is the true cost
    to reach the goal state from n. An admissible heuristic never overestimates the cost to reach
    the goal, i.e. it is optimistic. Example: hSLD(n) never overestimates the actual road distance.
    Theorem: if h(n) is admissible, A* using TREE-SEARCH is optimal.

A* Search Evaluation
    Completeness: yes.
    Time complexity: exponential in the path length.
    Space complexity: all nodes are stored.
    Optimality: yes.
        A* cannot expand the f(i+1) contour until the f(i) contour is finished.
        A* expands all nodes with f(n) < C*.
        A* expands some nodes with f(n) = C*.
        A* expands no nodes with f(n) > C*.
    A* is also optimally efficient (not counting ties).

3.2.3. Adversarial Search

MINIMAX procedure
    Perfect play for deterministic games. Idea: choose the move to the position with the highest
    minimax value, i.e. the best achievable payoff against best play. E.g., a 2-ply game.

MINIMAX algorithm
    minimax(player, board)
        if game over in current board position
            return winner
        children = all legal moves for player from this board
        if it is MAX's turn
            return the maximal score of calling minimax on all the children
        else (MIN's turn)
            return the minimal score of calling minimax on all the children

    Complete? Yes (if the tree is finite).
    Optimal? Yes (against an optimal opponent).
    Time complexity? O(b^m).
    Space complexity? O(bm) (depth-first exploration).
    For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so an exact solution is completely
    infeasible.

Alpha-Beta Pruning
    Alpha-beta pruning is a method that reduces the number of nodes explored by the minimax
    strategy. It reduces the time required for the search: the search is restricted so that no time
    is wasted examining moves that are obviously bad for the current player. The exact
    implementation of alpha-beta keeps track of the best move for each side as it moves throughout
    the tree.
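    A minimal sketch of minimax with alpha-beta pruning in Python, again on an explicit tree whose
    leaf utilities are illustrative assumptions.

        import math

        # Alpha-beta pruning on an explicit game tree: lists are internal nodes, numbers are leaves.

        def alpha_beta(node, maximizing, alpha=-math.inf, beta=math.inf):
            if not isinstance(node, list):
                return node                                    # terminal: utility value
            if maximizing:
                value = -math.inf
                for child in node:
                    value = max(value, alpha_beta(child, False, alpha, beta))
                    alpha = max(alpha, value)
                    if alpha >= beta:                          # MIN will never allow this branch
                        break                                  # prune remaining children
                return value
            else:
                value = math.inf
                for child in node:
                    value = min(value, alpha_beta(child, True, alpha, beta))
                    beta = min(beta, value)
                    if alpha >= beta:                          # MAX already has a better option
                        break
                return value

        tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
        print(alpha_beta(tree, maximizing=True))               # 3, the same result as plain minimax

    On this tree the pruning skips the leaves 4 and 6 entirely, which is exactly the point: the
    final value is unchanged, only the number of nodes examined goes down.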
Properties of α-β
    Pruning does not affect the final result.
    Good move ordering improves the effectiveness of pruning.
  • 33. With "perfect ordering," time complexity = O(bm/2)  doubles depth of search Why it is called alpha-beta?  A simple example of the value of reasoning about which computations are relevant  α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for max  If v is worse than α, max will avoid it  prune that branch  Define β similarly for min Chapter 4 4.1.1 Logics are formal languages for formalizing reasoning, in particular for representing information such that conclusions can be drawn Logic involves: – A language with a syntax for specifying what is a legal expression in the language; syntax defines well formed sentences in the language – Semantics for associating elements of the language with elements of some subject matter. Semantics defines the "meaning" of sentences (link to the world); i.e., semantics defines the truth of a sentence with respect to each possible world – Inference rules for manipulating sentences in the language 4.1.2. Syntax (grammar, internal structure of the language) – Vocabulary: grammatical categories – Identifying Well-Formed Formulae (―WFFs‖) 4.1.3 Semantics (pertaining to meaning and truth value) – Translation – Truth functions – Truth tables for the connectives 4.1.4. Connectives (“Sentence-Forming Operators”) ~ negation ―not,‖ ―it is not the case that‖ ⋅ conjunction ―and‖ ∨ disjunction ―or‖ (inclusive) ⊃ conditional ―if – then,‖ ―implies‖ ≣ biconditional ―if and only if,‖ ―iff‖ • Connect to sentences to make new sentences • Negation attaches to one sentence – It is not raining ∼ R Prepared By: Najar Aryal, BCT(III/II), KEC,Kalimati Page 33
• Conjunction, disjunction, the conditional and the biconditional attach two sentences together
  – It is raining and it is cold: R ⋅ C
  – If it rains then it pours: R ⊃ P

4.1.5. Well-Formed Formulae
Rules for WFFs:
1. A sentence letter by itself is a WFF: A, B, Z
2. The result of putting ~ immediately in front of a WFF is a WFF, e.g.: ~A, ~~B, ~(A ⋅ B), ~(~C ∨ D)
3. The result of putting ⋅, ∨, ⊃ or ≡ between two WFFs and surrounding the whole thing with parentheses is a WFF, e.g.: (A ⋅ B), ((~C ∨ D) ⊃ (E ≡ (F ⋅ G)))
4. Outside parentheses may be dropped, e.g.: A ⋅ B, (~C ∨ D) ⊃ (E ≡ (F ⋅ G))
A sentence that can be constructed by applying the rules for constructing WFFs one at a time is a WFF; a sentence which cannot be so constructed is not a WFF.
– Atomic sentences are WFFs: propositional symbols (atoms). Examples: P, Q, R, BlockIsRed, SeasonIsWinter
– Complex or compound WFFs, given WFFs w1 and w2:
  ¬w1 (negation)
  (w1 ∧ w2) (conjunction)
  (w1 ∨ w2) (disjunction)
  (w1 ⇒ w2) (implication; w1 is the antecedent, w2 is the consequent)
  (w1 ⇔ w2) (biconditional)

4.1.6. Tautology
If a WFF is true under all interpretations of its constituent atoms, we say that the WFF is valid or that it is a tautology.
Examples:
1. P ∨ ~P
2. ~(P ⋅ ~P)
3. [P ⊃ (Q ⊃ P)]
4. [((P ⊃ Q) ⊃ P) ⊃ P]
An inconsistent sentence or contradiction is a sentence that is false under all interpretations. The world is never like what it describes, as in "It's raining and it's not raining."

4.1.7. Validity
An argument is valid whenever the truth of all its premises implies the truth of its conclusion. An argument is a sequence of propositions; the final proposition is called the conclusion of the argument, while the other propositions are called the premises or hypotheses of the argument. One can use the rules of inference to show the validity of an argument. Note that p1, p2, ..., q are generally compound propositions or WFFs.

4.2. Intelligent agents should have the capacity for:
 Perceiving: acquiring information from the environment
 Knowledge representation: representing its understanding of the world
 Reasoning: inferring the implications of what it knows and of the choices it has
 Acting: choosing what it wants to do and carrying it out

4.2.1. Knowledge Base
 Representation of knowledge and the reasoning processes that bring knowledge to life are central to the entire field of AI
 Knowledge and reasoning also play a crucial role in dealing with partially observable environments
 The central component of a knowledge-based agent is its knowledge base
 Knowledge base = set of sentences in a formal language
 Declarative approach to building an agent (or other system):
   TELL it what it needs to know
   Then it can ASK itself what to do; answers should follow from the KB

4.2.2. Entailment
Entailment means that one thing follows from another: KB ╞ α
A knowledge base KB entails sentence α if and only if α is true in all worlds where KB is true.
 o e.g., the KB containing "the Giants won" and "the Reds won" entails "Either the Giants won or the Reds won"
 o e.g., x + y = 4 entails 4 = x + y
 o Entailment is a relationship between sentences (i.e., syntax) that is based on semantics

Inference
Notation: KB ├i α means sentence α can be derived from KB by procedure i
Soundness: i is sound if whenever KB ├i α, it is also true that KB ╞ α
Completeness: i is complete if whenever KB ╞ α, it is also true that KB ├i α

Sound Rules of Inference
Here are some examples of sound rules of inference. A rule is sound if its conclusion is true whenever its premises are true; each can be shown to be sound using a truth table.

RULE                 PREMISE           CONCLUSION
Modus Ponens         A, A ⇒ B          B
And Introduction     A, B              A ∧ B
And Elimination      A ∧ B             A
Double Negation      ¬¬A               A
Unit Resolution      A ∨ B, ¬B         A
Resolution           A ∨ B, ¬B ∨ C     A ∨ C
Soundness of Modus Ponens

A       B       A ⇒ B    OK?
True    True    True     ✓
True    False   False
False   True    True
False   False   True

The rule is sound because in the only row where both premises (A and A ⇒ B) are true, the conclusion B is also true.

Horn Clause
A Horn sentence or Horn clause has the form:
  ¬P1 ∨ ¬P2 ∨ ¬P3 ∨ ... ∨ ¬Pn ∨ Q
or alternatively
  P1 ∧ P2 ∧ P3 ∧ ... ∧ Pn ⇒ Q
where the Ps and Q are non-negated atoms.
• To get a proof for Horn sentences, apply Modus Ponens repeatedly until nothing more can be done
• We will use the Horn clause form later

4.2.3. Propositional Logic

Propositional Logic Syntax
 Propositional logic is the simplest logic; it illustrates the basic ideas
 All objects described are fixed or unique, e.g. "John is a student": student(john); here John refers to one unique person
 In propositional logic (PL) the user defines a set of propositional symbols, like P and Q, and defines the semantics of each of these symbols, for example:
   P means "It is hot"
   Q means "It is humid"
   R means "It is raining"
 The propositional symbols S, S1, S2, etc. are sentences
   – If S is a sentence, ¬S is a sentence (negation)
   – If S1 and S2 are sentences, S1 ∧ S2 is a sentence (conjunction)
   – If S1 and S2 are sentences, S1 ∨ S2 is a sentence (disjunction)
   – If S1 and S2 are sentences, S1 ⇒ S2 is a sentence (implication)
   – If S1 and S2 are sentences, S1 ⇔ S2 is a sentence (biconditional)

Propositional Logic Semantics
 Each model specifies true/false for each propositional symbol. With these three symbols there are 8 possible models, which can be enumerated automatically.
 Rules for evaluating truth with respect to a model m:
   ¬S is true iff S is false
   S1 ∧ S2 is true iff S1 is true and S2 is true
   S1 ∨ S2 is true iff S1 is true or S2 is true
   S1 ⇒ S2 is true iff S1 is false or S2 is true (i.e., it is false iff S1 is true and S2 is false)
   S1 ⇔ S2 is true iff S1 ⇒ S2 is true and S2 ⇒ S1 is true
 A simple recursive process evaluates an arbitrary sentence, e.g.,
   ¬P1,2 ∧ (P2,2 ∨ P3,1) = true ∧ (true ∨ false) = true ∧ true = true

Truth Table for Connectives

Validity and Satisfiability
A sentence is valid if it is true in all models, e.g., True, A ∨ ¬A, A ⇒ A, (A ∧ (A ⇒ B)) ⇒ B
Validity is connected to inference via the Deduction Theorem:
  KB ╞ α if and only if (KB ⇒ α) is valid
A sentence is satisfiable if it is true in some model, e.g., A ∨ B, C
A sentence is unsatisfiable if it is true in no models, e.g., A ∧ ¬A
Satisfiability is connected to inference via the following:
  KB ╞ α if and only if (KB ∧ ¬α) is unsatisfiable

Logical Equivalence
 Two sentences are logically equivalent iff they are true in the same models:
   α ≡ β iff α ╞ β and β ╞ α
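These semantic notions can be checked mechanically by enumerating models. The following Python sketch is illustrative (the nested-tuple sentence representation and the helper names are not from the text):

    from itertools import product

    def is_true(sentence, model):
        # A sentence is either a proposition symbol (a string, looked up in the model)
        # or a tuple ("not", s), ("and", s1, s2), ("or", s1, s2), ("implies", s1, s2).
        if isinstance(sentence, str):
            return model[sentence]
        op, *args = sentence
        if op == "not":     return not is_true(args[0], model)
        if op == "and":     return is_true(args[0], model) and is_true(args[1], model)
        if op == "or":      return is_true(args[0], model) or is_true(args[1], model)
        if op == "implies": return (not is_true(args[0], model)) or is_true(args[1], model)
        raise ValueError(op)

    def valid(sentence, symbols):
        # Valid = true in every model: enumerate all 2^n truth assignments.
        return all(is_true(sentence, dict(zip(symbols, values)))
                   for values in product([True, False], repeat=len(symbols)))

    # (A and (A implies B)) implies B is a tautology
    s = ("implies", ("and", "A", ("implies", "A", "B")), "B")
    print(valid(s, ["A", "B"]))   # True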
Resolution
 Conjunctive Normal Form (CNF): a conjunction of disjunctions of literals (clauses),
   e.g., (A ∨ ¬B) ∧ (B ∨ ¬C ∨ ¬D)
 Resolution is sound and complete for propositional logic

Conversion to CNF: B1,1 ⇔ (P1,2 ∨ P2,1)
1. Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α):
   (B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1)
2. Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β:
   (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬(P1,2 ∨ P2,1) ∨ B1,1)
3. Move ¬ inwards using de Morgan's rules and double negation:
   (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ ((¬P1,2 ∧ ¬P2,1) ∨ B1,1)
4. Apply the distributivity law (∨ over ∧) and flatten:
   (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1)

Resolution Algorithm
 Proof by contradiction, i.e., show that KB ∧ ¬α is unsatisfiable (a small code sketch of this refutation loop follows at the end of this subsection)

Propositional Resolution

Advantages of propositional logic:
· Simple.
· No decidability problems.

Limitations of Propositional Calculus
 An argument may not be provable using propositional logic but may be provable using predicate logic, e.g.: All horses are animals; therefore, the head of a horse is the head of an animal. We know that this argument is correct, and yet it cannot be proved under propositional logic, although it can be proved under predicate logic.
 Limited representational power.
 Simple statements may require large and awkward representations.
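The refutation loop mentioned under the resolution algorithm can be sketched in Python as follows; the clause representation (frozensets of string literals, with "~" marking negation) and the tiny knowledge base (P ⇒ Q together with P, asked whether Q follows) are illustrative assumptions:

    def resolve(ci, cj):
        # All clauses obtainable by cancelling one complementary pair of literals.
        resolvents = []
        for lit in ci:
            neg = lit[1:] if lit.startswith("~") else "~" + lit
            if neg in cj:
                resolvents.append(frozenset((ci - {lit}) | (cj - {neg})))
        return resolvents

    def pl_resolution(kb_clauses, query_literal):
        # Refutation: add the negation of the query and search for the empty clause.
        neg = query_literal[1:] if query_literal.startswith("~") else "~" + query_literal
        clauses = set(kb_clauses) | {frozenset([neg])}
        while True:
            new = set()
            for ci in clauses:
                for cj in clauses:
                    if ci is cj:
                        continue
                    for r in resolve(ci, cj):
                        if not r:            # empty clause: contradiction found
                            return True
                        new.add(r)
            if new <= clauses:               # nothing new: no contradiction derivable
                return False
            clauses |= new

    kb = {frozenset(["~P", "Q"]), frozenset(["P"])}   # P => Q, and P
    print(pl_resolution(kb, "Q"))                     # True: the KB entails Q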
4.2.4. First Order Predicate Logic (FOPL)
Predicate logic (FOPL) provides:
i) A language to express assertions (axioms) about certain "worlds",
ii) An inference system or deductive apparatus whereby we may draw conclusions from such assertions, and
iii) A semantics based on set theory.

The language of FOPL consists of:
i) A set of constant symbols (to name particular individuals such as table, a, b, c, d, e, etc.; these depend on the application)
ii) A set of variables (to refer to arbitrary individuals)
iii) A set of predicate symbols (to represent relations such as On, Above, etc.; these depend on the application)
iv) A set of function symbols (to represent functions; these depend on the application)
v) The logical connectives ∧, ∨, →, ↔, ¬ (to capture and, or, implies, iff and not)
vi) The universal quantifier ∀ and the existential quantifier ∃ (to capture "all", "every", "some", "few", "there exists", etc.)
vii) Normally a special binary relation of equality (=) is considered part of the language (at least in mathematics).

Quantification

Universal Quantification
∀ <variables> <sentence>
Everyone at KEC is smart: ∀x At(x,KEC) ⇒ Smart(x)
∀x P is true in a model m iff P is true with x being each possible object in the model.
 Roughly speaking, this is equivalent to the conjunction of the instantiations of P:
   (At(KingJohn,KEC) ⇒ Smart(KingJohn)) ∧ (At(Richard,KEC) ⇒ Smart(Richard)) ∧ ...
 Typically, ⇒ is the main connective with ∀.
 Common mistake to avoid: using ∧ as the main connective with ∀:
   ∀x At(x,KEC) ∧ Smart(x) means "Everyone is at KEC and everyone is smart"

Existential Quantification
∃ <variables> <sentence>
Someone at KEC is smart: ∃x At(x,KEC) ∧ Smart(x)
∃x P is true in a model m iff P is true with x being some possible object in the model.
 Typically, ∧ is the main connective with ∃.
 Common mistake: using ⇒ as the main connective with ∃:
   ∃x At(x,KEC) ⇒ Smart(x) is true if there is anyone who is not at KEC
Properties of Quantifiers
 ∀x ∀y is the same as ∀y ∀x
 ∃x ∃y is the same as ∃y ∃x
 ∃x ∀y is not the same as ∀y ∃x
   ∃x ∀y Loves(x,y): "There is a person who loves everyone in the world"
   ∀y ∃x Loves(x,y): "Everyone in the world is loved by at least one person"
 Quantifier duality: each can be expressed using the other
   ∀x Likes(x,IceCream) ≡ ¬∃x ¬Likes(x,IceCream)
   ∃x Likes(x,Broccoli) ≡ ¬∀x ¬Likes(x,Broccoli)

Example 1
Suppose we wish to represent the following sentences in FOPL:
a) "Everyone loves Janet"
b) "Not everyone loves Daphne"
c) "Everyone is loved by their mother"
Introduce the constant symbols j and d to represent Janet and Daphne respectively, a binary predicate symbol L to represent loves, and the unary function symbol m to represent the mother of the person given as argument. The above sentences may now be represented in FOPL by:
a) ∀x.L(x,j)
b) ∃x.¬L(x,d)
c) ∀x.L(m(x),x)

Example 2
We will express the following in first order predicate calculus:
"Sam is kind"
"Every kind person has someone who loves them"
"Sam loves someone"
The non-logical symbols of our language are the constant sam, the unary predicate (or property) Kind and the binary predicate Loves. We may represent the above sentences as:
1. Kind(sam)
2. ∀x.(Kind(x) → ∃y.Loves(y,x))
3. ∃y.Loves(sam,y)

Some Semantic Issues
An interpretation (of the language of FOPL) consists of:
a) a non-empty set of objects (the universe of discourse, D) containing designated individuals named by the constant symbols,
b) for each function symbol in the language of FOPL, a corresponding function over D, and
c) for each predicate symbol in the language of FOPL, a corresponding relation over D.
An interpretation is said to be a model for a set of sentences Γ if each sentence of Γ is true under the given interpretation.
 The interpretation of a formula F in first order predicate logic consists of fixing a (non-empty) domain of values D and an association of values for every constant, function and predicate in the formula F as follows:
 (1) Every constant has an associated value in D.
 (2) Every function f of arity n is defined by a correspondence f : Dⁿ → D, where Dⁿ = {(x1, ..., xn) | x1 ∈ D, ..., xn ∈ D}.
 (3) Every predicate P of arity n is defined by a correspondence P : Dⁿ → {T, F}.

Interpretation Example Using FOL
 Brothers are siblings: ∀x,y Brother(x,y) ⇒ Sibling(x,y)
 One's mother is one's female parent: ∀m,c Mother(c) = m ⇔ (Female(m) ∧ Parent(m,c))
 "Sibling" is symmetric: ∀x,y Sibling(x,y) ⇔ Sibling(y,x)
 Marcus was a man: Man(Marcus)
 Marcus was a Pompeian: Pompeian(Marcus)
 All Pompeians were Romans: ∀x: Pompeian(x) ⇒ Roman(x)
 All Romans were either loyal to Caesar or hated him: ∀x: Roman(x) ⇒ loyalto(x,Caesar) ∨ hate(x,Caesar)
 Everyone is loyal to someone: ∀x ∃y: loyalto(x,y)
 People only try to assassinate rulers they are not loyal to:
   ∀x ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x,y) ⇒ ¬loyalto(x,y)
4.3. Inference Rules
Complex deductive arguments can be judged valid or invalid based on whether or not the steps in the argument follow the nine basic rules of inference. These rules of inference are all relatively simple, although when presented in formal terms they can look overly complex.

Conjunction
1. P
2. Q
3. Therefore, P and Q.
  1. It is raining in New York.
  2. It is raining in Boston.
  3. Therefore, it is raining in both New York and Boston.

Simplification
1. P and Q.
2. Therefore, P.
  1. It is raining in both New York and Boston.
  2. Therefore, it is raining in New York.

Addition
1. P
2. Therefore, P or Q.
  1. It is raining.
  2. Therefore, either it is raining or the sun is shining.

Absorption
1. If P, then Q.
2. Therefore, if P then P and Q.
  1. If it is raining, then I will get wet.
  2. Therefore, if it is raining, then it is raining and I will get wet.

Modus Ponens
1. If P then Q.
2. P.
3. Therefore, Q.
  1. If it is raining, then I will get wet.
  2. It is raining.
  3. Therefore, I will get wet.

Modus Tollens
1. If P then Q.
2. Not Q (~Q).
3. Therefore, not P (~P).
  1. If it had rained this morning, I would have gotten wet.
  2. I did not get wet.
  3. Therefore, it did not rain this morning.

Hypothetical Syllogism
1. If P then Q.
2. If Q then R.
3. Therefore, if P then R.
  1. If it rains, then I will get wet.
  2. If I get wet, then my shirt will be ruined.
  3. Therefore, if it rains, then my shirt will be ruined.

Disjunctive Syllogism
1. Either P or Q.
2. Not P (~P).
3. Therefore, Q.
  1. Either it rained or I took a cab to the movies.
  2. It did not rain.
  3. Therefore, I took a cab to the movies.

Constructive Dilemma
1. (If P then Q) and (If R then S).
2. P or R.
3. Therefore, Q or S.
  1. If it rains, then I will get wet, and if it is sunny, then I will be dry.
  2. Either it will rain or it will be sunny.
  3. Therefore, either I will get wet or I will be dry.

The above rules of inference, when combined with the rules of replacement, mean that the propositional calculus is "complete." Propositional calculus is simply another name for formal logic.

Unification
Unification, in computer science and logic, is an algorithmic process by which one attempts to solve the satisfiability problem. The goal of unification is to find a substitution which demonstrates that two seemingly different terms are in fact either identical or just equal. Unification is widely used in automated reasoning, logic programming and programming language type system implementation. Several kinds of unification are commonly studied: unification for theories without any equations (the empty theory) is referred to as syntactic unification, where one wishes to show that (pairs of) terms are identical. If one has a non-empty equational theory, then one is typically interested in showing the equality of (a pair of) terms; this is referred to as semantic unification. Since substitutions can be ordered into a partial order, unification can be understood as the procedure of finding a join on a lattice. We also need some way of binding variables to values in a consistent way so that components of sentences can be matched; this is the process of unification.

Binding
A binding list is a set of entries of the form v = e, where v is a variable and e is an object. Given an expression p and a binding list σ, we write pσ for the instantiation of p using the bindings in σ.
Unifier
Given two expressions p and q, a unifier is a binding list σ such that pσ = qσ.

Most General Unifier
An MGU is a unifier that binds the fewest variables or binds them to the least specific expressions.

Most General Unifier (MGU) Algorithm for expressions p and q
1. If either p or q is an object constant or a variable, then:
   i) If p = q, then p and q already unify and we return { }.
   ii) If either p or q is a variable, then return the binding of that variable to the other expression.
   iii) Otherwise return failure.
2. If neither p nor q is an object constant or a variable, then they must both be compound expressions (suppose they are made up of p1, ..., pn and q1, ..., qm) and must be unified one component at a time:
   i) If the types and any function/relation constants are not equal, return failure.
   ii) If n ≠ m, then return failure.
   iii) Otherwise do the following:
      a) Set σ = { } and k = 0.
      b) If k = n, then stop and return σ as the MGU of p and q.
      c) Otherwise, increment k and apply MGU recursively to pkσ and qkσ.
          If pkσ and qkσ unify, add the new bindings to σ and return to step (b).
          If pkσ and qkσ fail to unify, then return failure for the unification of p and q.
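A compact Python sketch of syntactic unification in this spirit; the term representation (tuples for compound expressions, strings beginning with "?" for variables) and the loves/mother example are illustrative assumptions, and the occurs check is omitted for brevity:

    def is_var(t):
        return isinstance(t, str) and t.startswith("?")

    def substitute(t, s):
        # Apply binding list s (a dict) to term t.
        if is_var(t):
            return substitute(s[t], s) if t in s else t
        if isinstance(t, tuple):
            return tuple(substitute(a, s) for a in t)
        return t

    def unify(p, q, s=None):
        # Returns a most general unifier (dict) or None on failure.
        s = {} if s is None else s
        p, q = substitute(p, s), substitute(q, s)
        if p == q:
            return s
        if is_var(p):
            return {**s, p: q}
        if is_var(q):
            return {**s, q: p}
        if isinstance(p, tuple) and isinstance(q, tuple) and len(p) == len(q):
            for a, b in zip(p, q):        # unify one component at a time
                s = unify(a, b, s)
                if s is None:
                    return None
            return s
        return None                        # mismatched constants or arities

    # loves(?y, mother(?y)) unified with loves(john, ?z)
    print(unify(("loves", "?y", ("mother", "?y")), ("loves", "john", "?z")))
    # {'?y': 'john', '?z': ('mother', 'john')}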
Resolution Refutation System
 Resolution is a technique for proving theorems in the predicate calculus
 Resolution is a sound inference rule that, when used to produce a refutation, is also complete
 In an important practical application, resolution theorem proving, particularly the resolution refutation system, has made the current generation of Prolog interpreters possible
 The resolution principle describes a way of finding contradictions in a database of clauses with minimum substitution
 Resolution refutation proves a theorem by negating the statement to be proved and adding the negated goal to the set of axioms that are known, or have been assumed, to be true
 It then uses the resolution rule of inference to show that this leads to a contradiction

Steps in a Resolution Refutation Proof
1. Put the premises or axioms into clause form
2. Add the negation of what is to be proved, in clause form, to the set of axioms
3. Resolve these clauses together, producing new clauses that logically follow from them
4. Produce a contradiction by generating the empty clause

Discussion on the Steps
 Resolution refutation proofs require that the axioms and the negation of the goal be placed in a normal form called clause form
 Clause form represents the logical database as a set of disjunctions of literals
 Resolution is applied to two clauses when one contains a literal and the other its negation
 The substitutions used to produce the empty clause are those under which the opposite of the negated goal is true
 If these literals contain variables, they must be unified to make them equivalent
 A new clause is then produced, consisting of the disjunction of all the literals in the two clauses minus the literal and its negated instance (which are said to have been "resolved away")

Example: We wish to prove that "Fido will die" from the statements "Fido is a dog", "all dogs are animals" and "all animals will die".
Convert these predicates to clause form:

Predicate Form                     Clause Form
∀x: [dog(x) → animal(x)]          ¬dog(x) ∨ animal(x)
dog(fido)                          dog(fido)
∀y: [animal(y) → die(y)]          ¬animal(y) ∨ die(y)

Apply resolution.

Q.1. Anyone passing the Artificial Intelligence exam and winning the lottery is happy. But anyone who studies or is lucky can pass all their exams. Ali did not study but he is lucky. Anyone who is lucky wins the lottery. Is Ali happy?

Anyone passing the AI exam and winning the lottery is happy:
  ∀x: [pass(x,AI) ∧ win(x,lottery) → happy(x)]
Anyone who studies or is lucky can pass all their exams:
  ∀x ∀y: [studies(x) ∨ lucky(x) → pass(x,y)]
Ali did not study but he is lucky:
  ¬study(ali) ∧ lucky(ali)
Anyone who is lucky wins the lottery:
  ∀x: [lucky(x) → win(x,lottery)]

Change to clausal form:
1. ¬pass(X,AI) ∨ ¬win(X,lottery) ∨ happy(X)
2. ¬study(Y) ∨ pass(Y,Z)
3. ¬lucky(W) ∨ pass(W,V)
4. ¬study(ali)
5. lucky(ali)
6. ¬lucky(U) ∨ win(U,lottery)
7. Add the negation of the conclusion: ¬happy(ali)
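One possible refutation from these clauses follows as a worked sketch; the order of resolution steps shown is just one of several that reach the empty clause:

8.  ¬pass(ali,AI) ∨ ¬win(ali,lottery)      (resolve 7 with 1, substitution {X/ali})
9.  ¬pass(ali,AI) ∨ ¬lucky(ali)            (resolve 8 with 6, substitution {U/ali})
10. ¬pass(ali,AI)                          (resolve 9 with 5)
11. ¬lucky(ali)                            (resolve 10 with 3, substitution {W/ali, V/AI})
12. empty clause                           (resolve 11 with 5)

Since the empty clause has been derived, the negated goal ¬happy(ali) is contradictory, so Ali is happy.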
4.4. Symbolic versus Statistical Reasoning
Symbolic methods basically represent an uncertain belief as being true, false, or neither true nor false. Some of these methods also have problems with incomplete knowledge and with contradictions in the knowledge.
Statistical methods provide a way of representing beliefs that are not certain (or uncertain) but for which there may be some supporting (or contradictory) evidence. Statistical methods offer advantages in two broad scenarios:
Genuine randomness: card games are a good example. We may not be able to predict any outcome with certainty, but we have knowledge about the likelihood of certain outcomes (e.g., being dealt an ace) and we can exploit this.
Exceptions: symbolic methods can represent exceptions, but if the number of exceptions is large such systems tend to break down; many common-sense and expert reasoning tasks are examples. Statistical techniques can summarise large numbers of exceptions without resorting to enumeration.

Basic Statistical Methods: Probability
The basic approach statistical methods adopt to deal with uncertainty is via the axioms of probability:
 Probabilities are (real) numbers in the range 0 to 1.
 A probability of P(A) = 0 indicates total uncertainty in A, P(A) = 1 total certainty, and values in between some degree of (un)certainty.
 Probabilities can be calculated in a number of ways.
Very simply,
  Probability = (number of desired outcomes) / (total number of outcomes)
So, given a pack of playing cards, the probability of being dealt an ace from a full normal deck is 4 (the number of aces) / 52 (the number of cards in the deck), which is 1/13. Similarly, the probability of being dealt a spade is 13/52 = 1/4.
If you choose k items from a set of n items, then the number of ways of making this choice is n! / (k!(n−k)!) (! = factorial). So the chance of winning the national lottery (choosing 6 from 49) is 13,983,816 to 1.
Conditional probability, P(A|B), indicates the probability of event A given that we know event B has occurred.

Bayes Theorem
This states:
  P(Hi|E) = P(E|Hi) P(Hi) / Σk P(E|Hk) P(Hk)
 o This reads: given some evidence E, the probability that hypothesis Hi is true equals the probability that E will be true given Hi, times the a priori probability of Hi, divided by the sum over all hypotheses of the probability of E given each hypothesis times the probability of that hypothesis.
 o The set of all hypotheses must be mutually exclusive and exhaustive.
 o Thus, to diagnose an illness from medical evidence, we must know the prior probabilities of the illnesses and the probability of observing the symptoms given each illness.
Bayesian statistics lie at the heart of most statistical reasoning systems.

How is Bayes theorem exploited?
The key is to formulate the problem correctly: P(A|B) states the probability of A given only B's evidence. If there is other relevant evidence then it must also be considered. Herein lie some problems:
 All events must be mutually exclusive. However, in real-world problems events are not generally unrelated. For example, in diagnosing measles, the symptoms of spots and fever are related. This means that computing the conditional probabilities gets complex: in general, given prior evidence p and some new observation N, computing the updated probability grows exponentially for large sets of p.
 All events must be exhaustive. This means that in order to compute all probabilities the set of possible events must be closed. Thus, if new information arises, the set must be created afresh and all probabilities recalculated.
Thus simple Bayes-rule-based systems are not suitable for uncertain reasoning:
 Knowledge acquisition is very hard.
 Too many probabilities are needed, requiring too large a storage space.
 Computation time is too large.
 Updating new information is difficult and time consuming.
 Exceptions like "none of the above" cannot be represented.
 Humans are not very good probability estimators.
However, Bayesian statistics still provide the core of reasoning in many uncertain-reasoning systems, with suitable enhancements to overcome the above problems. We will look at three broad categories: certainty factors, Dempster-Shafer models, and Bayesian networks.
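Before turning to those, a small numeric sketch of Bayes' rule itself may help; the flu/cold/healthy numbers below are invented purely for illustration:

    def bayes(priors, likelihoods):
        # priors[h]      = P(h)      over mutually exclusive, exhaustive hypotheses
        # likelihoods[h] = P(E | h)  probability of the observed evidence under h
        evidence = sum(likelihoods[h] * priors[h] for h in priors)   # P(E)
        return {h: likelihoods[h] * priors[h] / evidence for h in priors}

    priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}
    p_fever_given = {"flu": 0.9, "cold": 0.4, "healthy": 0.05}
    print(bayes(priors, p_fever_given))
    # {'flu': 0.375, 'cold': 0.5, 'healthy': 0.125}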
Belief Models and Certainty Factors
This approach was suggested by Shortliffe and Buchanan and used in their famous medical-diagnosis system MYCIN. MYCIN is essentially an expert system; here we concentrate only on its probabilistic reasoning aspects.
 MYCIN represents knowledge as a set of rules.
 Associated with each rule is a certainty factor.
 A certainty factor is based on measures of belief MB and disbelief MD of a hypothesis H given evidence E. In terms of standard probability P, the usual MYCIN definitions are:
   MB(H,E) = 1 if P(H) = 1, otherwise (max(P(H|E), P(H)) − P(H)) / (1 − P(H))
   MD(H,E) = 1 if P(H) = 0, otherwise (P(H) − min(P(H|E), P(H))) / P(H)
 The certainty factor CF of a hypothesis H given evidence E is then defined as:
   CF(H,E) = MB(H,E) − MD(H,E)

Reasoning with Certainty Factors
 Rules are expressed as: if <evidence list> then there is suggestive evidence with probability p for <symptom>.
 MYCIN uses rules to reason backward from its goal of predicting a disease-causing organism to the clinical data as evidence.
 Certainty factors initially supplied by experts are then updated according to the previous formulae.
How do we perform reasoning when several rules are chained together? Measures of belief (and, analogously, disbelief) given several observations are combined incrementally; the standard combination rule is:
   MB(H, E1 ∧ E2) = MB(H,E1) + MB(H,E2) × (1 − MB(H,E1))
with disbelief combined in the same way.
How about our belief in several hypotheses taken together? Measures of belief in hypotheses combined logically are calculated as:
   MB(H1 ∧ H2, E) = min(MB(H1,E), MB(H2,E))
   MB(H1 ∨ H2, E) = max(MB(H1,E), MB(H2,E))
Disbelief is calculated similarly.
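A small sketch of combining certainty factors under the incremental rule above; the numbers 0.6, 0.4 and 0.1 are illustrative only:

    def combine_mb(mb1, mb2):
        # Incrementally combine belief from two pieces of evidence for the same hypothesis.
        return mb1 + mb2 * (1 - mb1)

    def certainty_factor(mb, md):
        return mb - md

    mb = combine_mb(0.6, 0.4)         # 0.76: two rules each lend some belief
    cf = certainty_factor(mb, 0.1)    # 0.66
    print(mb, cf)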
Bayesian Networks
These are also called belief networks or probabilistic inference networks, and were initially developed by Pearl (1988). The basic idea is:
 Knowledge in the world is modular: most events are conditionally independent of most other events.
 Adopt a model that uses a more local representation, allowing interactions only between events that actually affect each other.
 Some influences may be unidirectional, others bidirectional; make a distinction between these in the model.
 Events may be causal and thus get chained together in a network.

Implementation
A Bayesian network is a directed acyclic graph:
 o The directed links indicate dependencies that exist between nodes.
 o Nodes represent propositions about events, or events themselves.
 o Conditional probabilities quantify the strength of the dependencies.

Consider the following example: the probability that my car won't start. If my car won't start then it is likely that
 o the battery is flat, or
 o the starting motor is broken.
In order to decide whether to fix the car myself or send it to the garage, I make the following decision:
 If the headlights do not work then the battery is likely to be flat, so I fix it myself.
 If the starting motor is defective, then send the car to the garage.
 If the battery and the starting motor have both gone, send the car to the garage.
The network to represent this is as follows:
Fig. A simple Bayesian network

Reasoning in Bayesian (Belief) Nets
 Probabilities on the links obey the standard conditional probability axioms, so we follow links in reaching a hypothesis and update beliefs accordingly.
 A few broad classes of algorithms have been used to help with this:
   o Pearl's message-passing method.
   o Clique triangulation.
   o Stochastic methods.
   o Basically they all take advantage of clusters in the network and use the limits on influence to constrain the search through the net.
   o They also ensure that probabilities are updated correctly.
 Since information is local, information can readily be added and deleted with minimal effect on the whole network: only the affected nodes need updating.

Example
 o Consider the "block-lifting" problem:
 o B: the battery is charged.
 o L: the block is liftable.
 o M: the arm moves.
 o G: the gauge indicates that the battery is charged.
 o The joint distribution factors according to the network:
   p(G,M,B,L) = p(G|M,B,L) p(M|B,L) p(B|L) p(L) = p(G|B) p(M|B,L) p(B) p(L)
 o Specification:
   Traditional joint table: 16 rows
   Bayesian network: 8 rows

Reasoning: top-down
 o Example: if the block is liftable, compute the probability of the arm moving, i.e., compute p(M|L).
 o Solution:
   Insert the parent node B: p(M|L) = p(M,B|L) + p(M,¬B|L)
   Use the chain rule: p(M|L) = p(M|B,L) p(B|L) + p(M|¬B,L) p(¬B|L)
   Remove the independent node: p(B|L) = p(B), since B has no parent, and p(¬B|L) = p(¬B) = 1 − p(B)
   p(M|L) = p(M|B,L) p(B) + p(M|¬B,L)(1 − p(B)) = 0.9 × 0.95 + 0.0 × (1 − 0.95) = 0.855

Reasoning: bottom-up
Example: if the arm cannot move, compute the probability that the block is not liftable, i.e., compute p(¬L|¬M).
Use Bayes' rule: p(¬L|¬M) = p(¬M|¬L) p(¬L) / p(¬M)
Compute p(¬M|¬L) by top-down reasoning: p(¬M|¬L) = 0.9525 (exercise)
p(¬L) = 1 − p(L) = 1 − 0.7 = 0.3
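A short Python check of these two computations; the values p(M|B,¬L) = 0.05 and p(M|¬B,¬L) = 0.0 are assumptions chosen to be consistent with the quoted 0.9525 (they are not stated in the example), while the other numbers come from the text:

    p_B, p_L = 0.95, 0.7
    p_M_given = {              # p(M | B, L) for each combination of the parents (B, L)
        (True, True): 0.9,
        (True, False): 0.05,   # assumed
        (False, True): 0.0,
        (False, False): 0.0,   # assumed
    }

    def p_M_given_L(l):
        # Sum out the parent B: p(M|L) = p(M|B,L) p(B) + p(M|~B,L) (1 - p(B))
        return p_M_given[(True, l)] * p_B + p_M_given[(False, l)] * (1 - p_B)

    p_M_L = p_M_given_L(True)                 # 0.855  (top-down)
    p_notM_notL = 1 - p_M_given_L(False)      # 0.9525 (the "exercise" value)

    # Bottom-up with Bayes' rule: p(~L|~M) = p(~M|~L) p(~L) / p(~M)
    p_notM = (1 - p_M_L) * p_L + p_notM_notL * (1 - p_L)
    p_notL_notM = p_notM_notL * (1 - p_L) / p_notM
    print(round(p_M_L, 4), round(p_notM_notL, 4), round(p_notL_notM, 4))   # 0.855 0.9525 0.7379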
Chapter 5: Knowledge Representation
Solving complex AI problems requires large amounts of knowledge and mechanisms for manipulating that knowledge. The inference mechanisms that operate on knowledge rely on the way knowledge is represented: a good knowledge representation model allows for more powerful inference mechanisms to operate on it. While representing knowledge, one has to consider two things:
1. Facts, which are truths in some relevant world.
2. Representations of facts in some chosen formalism; these are the things which are actually manipulated by the inference mechanisms.
Knowledge representation schemes are useful only if there are functions that map facts to representations and vice versa. AI is more concerned with a natural-language representation of facts and with functions which map natural-language sentences into some representational formalism. An appealing way of representing facts is the language of logic. Logical formalism provides a way of deriving new knowledge from old through mathematical deduction; in this formalism, we can conclude that a new statement is true by proving that it follows from the statements already known to be facts.

STRUCTURED REPRESENTATION OF KNOWLEDGE
Representing knowledge using a logical formalism, like predicate logic, has several advantages: it can be combined with powerful inference mechanisms like resolution, which makes reasoning with facts easy. But using logical formalism, complex structures of the world, such as objects and their relationships, events, sequences of events, etc., cannot be described easily.
A good system for the representation of structured knowledge in a particular domain should possess the following four properties:
(i) Representational Adequacy: the ability to represent all kinds of knowledge that are needed in that domain.
(ii) Inferential Adequacy: the ability to manipulate the represented structure and infer new structures.
(iii) Inferential Efficiency: the ability to incorporate additional information into the knowledge structure that will aid the inference mechanisms.
(iv) Acquisitional Efficiency: the ability to acquire new information easily, either by direct insertion or by program control.
The techniques that have been developed in AI systems to accomplish these objectives fall into two categories:
1. Declarative methods: knowledge is represented as a static collection of facts, which are manipulated by general procedures. Here the facts need to be stored only once, and they can be used in any number of ways. Facts can easily be added to declarative systems without changing the general procedures.
2. Procedural methods: knowledge is represented as procedures. Default reasoning and probabilistic reasoning are examples of procedural methods. In these, heuristic knowledge of "how to do things efficiently" can be easily represented.
In practice, most knowledge representations employ a combination of both. Most knowledge representation structures have been developed to handle programs that process natural-language input. One of the reasons that knowledge structures are so important is that they provide a way to represent information about commonly occurring patterns of things; such descriptions are sometimes called schemas. One definition of schema is