Whole basic

Artificial Intelligence
Amit Purohit

Definition
AI is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable. Intelligence is the computational part of the ability to achieve goals in the world. Varying kinds and degrees of intelligence occur in people, many animals, and some machines.

Objectives
1) To formally define AI.
2) To discuss the characteristic features of AI.
3) To acquaint the student with the essence of AI.
4) To be able to distinguish between human intelligence and AI.
5) To give an overview of the applications where AI technology can be used.
6) To impart knowledge about representation schemes such as Production Systems and Problem Reduction.

Turing Test
Alan Turing's 1950 article "Computing Machinery and Intelligence" [Tur50] discussed conditions for considering a machine to be intelligent. He argued that if a machine could successfully pretend to be human to a knowledgeable observer, then you certainly should consider it intelligent. This test would satisfy most people but not all philosophers.
The observer could interact with the machine and a human by teletype (to avoid requiring that the machine imitate the appearance or voice of the person); the human would try to persuade the observer that he or she was human, and the machine would try to fool the observer.
The Turing test is a one-sided test. A machine that passes the test should certainly be considered intelligent, but a machine could still be considered intelligent without knowing enough about humans to imitate a human. Daniel Dennett's book Brainchildren [Den98] has an excellent discussion of the Turing test and the various partial Turing tests that have been implemented, i.e. with restrictions on the observer's knowledge of AI and on the subject matter of questioning. It turns out that some people are easily led into believing that a rather dumb program is intelligent.

Background and History
Evidence of artificial-intelligence folklore can be traced back to ancient Egypt, but with the development of the electronic computer in 1941 the technology finally became available to create machine intelligence. The term "artificial intelligence" was first coined in 1956 at the Dartmouth conference, and since then the field has expanded because of the theories and principles developed by its dedicated researchers. Through its short modern history, advancement in AI has been slower than first estimated, but progress continues to be made. Since its birth some decades ago there have been a variety of AI programs, and they have influenced other technological advances.

In 1941 an invention revolutionized every aspect of the storage and processing of information. That invention, developed in both the US and Germany, was the electronic computer. The first computers required large, separate air-conditioned rooms and were a programmer's nightmare, involving the separate configuration of thousands of wires just to get a program running. The 1949 innovation, the stored-program computer, made the job of entering a program easier, and advances in computer theory led to computer science and eventually artificial intelligence.
With the invention of an electronic means of processing data came a medium that made AI possible. Although the computer provided the technology necessary for AI, it was not until the early 1950s that the link between human intelligence and machines was really observed. Norbert Wiener was one of the first Americans to make observations on the principle of feedback theory. The most familiar example of feedback theory is the thermostat: it controls the temperature of an environment by measuring the actual temperature of the house, comparing it to the desired temperature, and responding by turning the heat up or down. What was so important about his research into feedback loops was that Wiener theorized that all intelligent behavior is the result of feedback mechanisms: mechanisms that could possibly be simulated by machines. This idea influenced much of the early development of AI.

In late 1955, Newell and Simon developed the Logic Theorist, considered by many to be the first AI program. The program, representing each problem as a tree model, would attempt to solve it by selecting the branch most likely to lead to the correct conclusion. The impact the Logic Theorist made on both the public and the field of AI has made it a crucial stepping stone in the development of the field.
In 1956 John McCarthy, regarded as the father of AI, organized a conference to draw on the talent and expertise of others interested in machine intelligence for a month of brainstorming. He invited them to New Hampshire for "The Dartmouth Summer Research Project on Artificial Intelligence." From that point on, because of McCarthy, the field would be known as artificial intelligence. Although not a huge success, the Dartmouth conference did bring together the founders of AI and served to lay the groundwork for the future of AI research.

In the seven years after the conference, AI began to pick up momentum. Although the field was still undefined, ideas formed at the conference were re-examined and built upon. Centers for AI research began forming at Carnegie Mellon and MIT, and new challenges were faced: first, creating systems that could efficiently solve problems by limiting the search, as the Logic Theorist did; and second, making systems that could learn by themselves.

In 1957 the first version of a new program, the General Problem Solver (GPS), was tested. The program was developed by the same pair who developed the Logic Theorist. The GPS was an extension of Wiener's feedback principle and was capable of solving a greater range of common-sense problems. A couple of years after the GPS, IBM contracted a team to research artificial intelligence; Herbert Gelernter spent three years working on a program for solving geometry theorems.

While more programs were being produced, McCarthy was busy developing a major breakthrough in AI history. In 1958 McCarthy announced his new development, the LISP language, which is still used today. LISP stands for LISt Processing, and it was soon adopted as the language of choice among most AI developers.

During the 1970s many new methods in the development of AI were tested, notably Minsky's frames theory.
Also, David Marr proposed new theories about machine vision: for example, how it would be possible to distinguish an image based on its shading, basic information on shapes, color, edges, and texture. With analysis of this information, frames of what an image might be could then be referenced. Another development during this time was the PROLOG language, proposed in 1972.

During the 1980s AI moved at a faster pace and further into the corporate sector. In 1986, US sales of AI-related hardware and software surged to $425 million. Expert systems were in particular demand because of their efficiency. Companies such as Digital Equipment were using XCON, an expert system designed to configure the large VAX computers. DuPont, General Motors, and Boeing relied heavily on expert systems. Indeed, to keep up with the demand for computer experts, companies such as Teknowledge and Intellicorp, specializing in software to aid in producing expert systems, were formed. Other expert systems were designed to find and correct flaws in existing expert systems.

Overview of AI Application Areas

Game Playing
You can buy machines that play master-level chess for a few hundred dollars. There is some AI in them, but they play well against people mainly through brute-force computation, looking at hundreds of thousands of positions. To beat a world champion by brute force and known reliable heuristics requires being able to look at 200 million positions per second.

Speech Recognition
In the 1990s, computer speech recognition reached a practical level for limited purposes. Thus United Airlines replaced its keyboard tree for flight information with a system using speech recognition of flight numbers and city names. It is quite convenient. On the other hand, while it is possible to instruct some computers using speech, most users have gone back to the keyboard and the mouse as still more convenient.

Understanding Natural Language
Just getting a sequence of words into a computer is not enough. Parsing sentences is not enough either. The computer has to be provided with an understanding of the domain the text is about, and this is presently possible only for very limited domains.

Computer Vision
The world is composed of three-dimensional objects, but the inputs to the human eye and to computers' TV cameras are two-dimensional. Some useful programs can work solely in two dimensions, but full computer vision requires partial three-dimensional information that is not just a set of two-dimensional views. At present there are only limited ways of representing three-dimensional information directly, and they are not as good as what humans evidently use.

Expert Systems
A "knowledge engineer" interviews experts in a certain domain and tries to embody their knowledge in a computer program for carrying out some task. How well this works depends on whether the intellectual mechanisms required for the task are within the present state of AI. When this turned out not to be so, there were many disappointing results.
One of the first expert systems was MYCIN in 1974, which diagnosed bacterial infections of the blood and suggested treatments. It did better than medical students or practicing doctors, provided its limitations were observed. Namely, its ontology included bacteria, symptoms, and treatments, but did not include patients, doctors, hospitals, death, recovery, or events occurring in time. Its interactions depended on a single patient being considered. Since the experts consulted by the knowledge engineers knew about patients, doctors, death, recovery, etc., it is clear that the knowledge engineers forced what the experts told them into a predetermined framework. In the present state of AI, this has to be true. The usefulness of current expert systems depends on their users having common sense.

Heuristic Classification
One of the most feasible kinds of expert system, given the present knowledge of AI, is one that puts information into one of a fixed set of categories using several sources of information. An example is advising whether to accept a proposed credit card purchase. Information is available about the owner of the credit card, his record of payment, about the item he is buying, and about the establishment from which he is buying it (e.g., whether there have been previous credit card frauds at this establishment).

Production System
Production systems are applied to problem-solving programs that must perform a wide range of searches. Production systems are symbolic AI systems. The difference between these two terms is only one of semantics: a symbolic AI system may not be restricted to the strict definition of a production system, but it cannot be much different either.

Production systems are composed of three parts: a global database, production rules, and a control structure.

The global database is the system's short-term memory. It is a collection of facts that are to be analyzed. Part of the global database represents the current state of the system's environment. In a game of chess, for example, the current state could represent all the positions of the pieces.

Production rules (or simply productions) are conditional if-then branches. In a production system, whenever a condition is satisfied, the system is allowed to execute or perform a specific action, which may be specified under that rule. If the rule's condition is not fulfilled, the system may perform another action.
This can be simply paraphrased:

WHEN (condition) IS SATISFIED, PERFORM (action)

A Production System Algorithm

DATA <- initial global database
until DATA satisfies the halting condition do
begin
    select some rule R that can be applied to DATA
    DATA <- result of applying R to DATA
end
return DATA

Types of Production System
There are two basic types of production system:
  • Commutative production system
  • Decomposable production system

Commutative Production System
A production system is commutative if it has the following properties with respect to a database D:
1. Each member of the set of rules applicable to D is also applicable to any database produced by applying an applicable rule to D.
2. If the goal condition is satisfied by D, then it is also satisfied by any database produced by applying any applicable rule to D.
3. The database that results from applying to D any sequence of rules applicable to D is invariant under permutations of the sequence.

Decomposable Production System
The initial database can be decomposed, or split, into separate components that can be processed independently.

Search Process
Searching is defined as a sequence of steps that transforms the initial state into the goal state. To carry out a search, the following are needed:
  • The initial state description of the problem
  • A set of legal operators that change the state
  • The final or goal state

The searching process in AI can be classified into two types:
1. Uninformed search / blind search
2. Heuristic search / informed search

Uninformed / Blind Search
An uninformed search algorithm is one that does not have any domain-specific knowledge. It uses only information such as the initial state, the final state, and a set of legal operators, and it proceeds in a systematic way by exploring nodes in some predetermined order. It can be classified into two search techniques:
1. Breadth-first search
2. Depth-first search
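Both techniques can be sketched as one open-list routine that differs only in where newly discovered neighbors are placed. The sketch below uses, as an illustration, the same example tree that the walkthroughs which follow are based on (A's neighbors are B and C, B's are D and E, E's are F and G):

```python
# Example tree from the depth-first / breadth-first walkthroughs:
# A's neighbors are B and C; B's are D and E; E's are F and G.
GRAPH = {"A": ["B", "C"], "B": ["D", "E"], "C": [],
         "D": [], "E": ["F", "G"], "F": [], "G": []}

def search(start, goal, depth_first=True):
    """Generic open-list search; returns the Closed list (expansion order)."""
    open_list = [start]          # nodes still to be explored
    closed_list = []             # nodes already expanded
    while open_list:
        node = open_list.pop(0)  # always take the first open node
        closed_list.append(node)
        if node == goal:         # stop when the target is expanded
            return closed_list
        new = [n for n in GRAPH[node] if n not in open_list + closed_list]
        if depth_first:
            open_list = new + open_list   # add to the beginning
        else:
            open_list = open_list + new   # add to the end
    return None
```

With depth_first=True the final Closed list is A, B, D, E, F; with depth_first=False it is A, B, C, D, E, F.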
Depth-First Search
Depth-first search works by taking a node, checking its neighbors, expanding the first node it finds among the neighbors, checking whether that expanded node is our destination, and, if not, continuing to explore more nodes.

The explanation above may be confusing if this is your first exposure to depth-first search, so the following demonstration should help. (In the example search tree, A's neighbors are B and C, B's neighbors are D and E, and E's neighbors are F and G.) Using this tree, let's find a path between nodes A and F.

Step 0
Let's start with our root node. We will be using two lists to keep track of what we are doing: an Open list and a Closed list. The Open list keeps track of what you still need to do, and the Closed list keeps track of what you have already done. Right now we only have our starting point, node A. We haven't done anything with it yet, so let's add it to our Open list.

Open List: A
Closed List: <empty>

Step 1
Now let's explore the neighbors of our A node; to put it another way, let's take the first item from our Open list and explore its neighbors. Node A's neighbors are the B and C nodes. Because we are now done with node A, we can remove it from the Open list and add it to the Closed list. We aren't done with this step though: we now have two new nodes, B and C, that need exploring, so we add them to our Open list.

Our current Open and Closed lists contain the following data:

Open List: B, C
Closed List: A

Step 2
Our Open list contains two items. For depth-first search and breadth-first search, you always explore the first item on the Open list. The first item on our Open list is the B node. B is not our destination, so let's explore its neighbors. Because we have now expanded B, we remove it from the Open list and add it to the Closed list. Our new nodes are D and E, and we add these nodes to the beginning of our Open list:

Open List: D, E, C
Closed List: A, B

Step 3
You should start to see a pattern forming. Because D is at the beginning of our Open list, we expand it. D isn't our destination, and it does not have any neighbors, so all we do in this step is remove D from the Open list and add it to the Closed list:
Open List: E, C
Closed List: A, B, D

Step 4
We now expand the E node from our Open list. E is not our destination, so we explore its neighbors and find that it has the neighbors F and G. Remember, F is our target, but we don't stop here: we only stop when we are about to expand our target node, F in this case. Our Open list has the E node removed and the F and G nodes added; the removed E node is added to our Closed list:

Open List: F, G, C
Closed List: A, B, D, E

Step 5
We now expand the F node. Since it is our intended destination, we stop. We remove F from our Open list and add it to our Closed list. Since we are at our destination, there is no need to expand F to find its neighbors. Our final Open and Closed lists contain the following data:
Open List: G, C
Closed List: A, B, D, E, F

The path taken by our depth-first search is the final value of our Closed list: A, B, D, E, F.

Breadth-First Search
In depth-first search, newly explored nodes were added to the beginning of the Open list. In breadth-first search, newly explored nodes are added to the end of the Open list. Using the same search tree, let's again find a path between nodes A and F.

Step 0
As before, we start with only our root node on the Open list:

Open List: A
Closed List: <empty>

Step 1
We take the first item from our Open list and explore its neighbors, B and C. We remove A from the Open list, add it to the Closed list, and add B and C to the end of the Open list:

Open List: B, C
Closed List: A

Step 2
The first item on our Open list is B. B is not our destination, so we expand it, moving it to the Closed list. Its neighbors D and E are added to the end of the Open list; this is where breadth-first search first differs from depth-first search:

Open List: C, D, E
Closed List: A, B

Step 3
We expand C. C is not our destination and has no neighbors, so we simply move it to the Closed list:

Open List: D, E
Closed List: A, B, C

Step 4
We expand D, which likewise has no neighbors:

Open List: E
Closed List: A, B, C, D

Step 5
We expand E and add its neighbors F and G to the end of the Open list. Remember, F is our target, but we only stop when we are about to expand the target node:

Open List: F, G
Closed List: A, B, C, D, E

Step 6
We now expand the F node. Since it is our intended destination, we stop and move F to the Closed list:

Open List: G
Closed List: A, B, C, D, E, F

The path taken by our breadth-first search is the final value of our Closed list: A, B, C, D, E, F. Notice that, unlike depth-first search, it finished with C before descending to D and E.

Iterative Deepening Depth-First Search
Iterative deepening depth-first search (IDDFS) is a state-space search strategy in which a depth-limited search is run repeatedly, increasing the depth limit with each iteration until it reaches d, the depth of the shallowest goal state. On each iteration, IDDFS visits the nodes in the search tree in the same order as depth-first search, but the cumulative order in which nodes are first visited, assuming no pruning, is effectively breadth-first.

IDDFS combines depth-first search's space efficiency and breadth-first search's completeness (when the branching factor is finite). It is optimal when the path cost is a non-decreasing function of the depth of the node.

The space complexity of IDDFS is O(bd), where b is the branching factor and d is the depth of the shallowest goal. Since iterative deepening visits states multiple times it may seem wasteful, but it turns out not to be very costly: in a tree most of the nodes are in the bottom level, so it does not matter much if the upper levels are visited multiple times.

The main advantage of IDDFS in game-tree searching is that the earlier searches tend to improve the commonly used heuristics, such as the killer heuristic and alpha-beta pruning, so that a more accurate estimate of the score of various nodes at the final depth search can occur, and the search completes more quickly since it is done in a better order. For example, alpha-beta pruning is most efficient if it searches the best moves first.

A second advantage is the responsiveness of the algorithm.
Because early iterations use small values for d, they execute extremely quickly. This allows the algorithm to supply early indications of the result almost immediately, followed by refinements as d increases. When used in an interactive setting, such as in a chess-playing program, this facility allows the program to play at any time with the current best move found in the search it has completed so far. This is not possible with a traditional depth-first search.

The time complexity of IDDFS in well-balanced trees works out to be the same as that of depth-first search: O(b^d). In an iterative deepening search, the nodes on the bottom level are expanded once, those on the next-to-bottom level are expanded twice, and so on, up to the root of the search tree, which is expanded d + 1 times. So the total number of expansions in an iterative deepening search is

(d + 1)·1 + d·b + (d - 1)·b^2 + ... + 2·b^(d-1) + 1·b^d

All together, an iterative deepening search from depth 1 to depth d expands only about 11% more nodes than a single breadth-first or depth-limited search to depth d when b = 10. The higher the branching factor, the lower the overhead of repeatedly expanded states; even when the branching factor is 2, iterative deepening search only takes about twice as long as a complete breadth-first search. This means that the time complexity of iterative deepening is still O(b^d), while the space complexity is O(bd), linear in the depth. In general, iterative deepening is the preferred search method when there is a large search space and the depth of the solution is not known.

Informed Search
It is not difficult to see that uninformed search will pursue options that lead away from the goal as readily as options that lead towards the goal. For any but the smallest problems this leads to searches that take unacceptable amounts of time and/or space. Informed search tries to reduce the amount of search that must be done by making intelligent choices about the nodes selected for expansion. This implies the existence of some way of evaluating the likelihood that a given node is on the solution path. In general this is done using a heuristic function.

Hill Climbing
Hill climbing is a mathematical optimization technique which belongs to the family of local search.
It is relatively simple to implement, making it a popular first choice. Although more advanced algorithms may give better results, in some situations hill climbing works just as well. Hill climbing can be used to solve problems that have many solutions, some of which are better than others. It starts with a random (potentially poor) solution and iteratively makes small changes to the solution, each time improving it a little. When the algorithm can no longer find any improvement, it terminates. Ideally, at that point the current solution is close to optimal, but it is not guaranteed that hill climbing will ever come close to the optimal solution.

For example, hill climbing can be applied to the travelling salesman problem. It is easy to find a solution that visits all the cities, but it will be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually a much better route is obtained.

Hill climbing is used widely in artificial intelligence for reaching a goal state from a starting node. The choice of next node and of starting node can be varied to give a family of related algorithms.

Mathematical Description
Hill climbing attempts to maximize (or minimize) a function f(x), where x ranges over discrete states. These states are typically represented by vertices in a graph, where edges encode nearness or similarity of states. Hill climbing follows the graph from vertex to vertex, always locally increasing (or decreasing) the value of f, until a local maximum (or local minimum) x_m is reached. Hill climbing can also operate on a continuous space: in that case the algorithm is called gradient ascent (or gradient descent if the function is minimized).

Variants
In simple hill climbing the first closer node is chosen, whereas in steepest-ascent hill climbing all successors are compared and the one closest to the solution is chosen. Both forms fail if there is no closer node, which may happen if there are local maxima in the search space which are not solutions. Steepest-ascent hill climbing is similar to best-first search, which tries all possible extensions of the current path instead of only one.

Stochastic hill climbing does not examine all neighbors before deciding how to move.
Rather, it selects a neighbour at random and decides (based on the amount of improvement in that neighbour) whether to move to that neighbour or to examine another.
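As a minimal sketch, the steepest-ascent variant described above might look as follows; the objective function and neighbor generator here are hypothetical examples, not part of the original text:

```python
def hill_climb(start, f, neighbors):
    """Steepest-ascent hill climbing: repeatedly move to the best
    neighbor until no neighbor improves on the current state."""
    current = start
    while True:
        best = max(neighbors(current), key=f, default=current)
        if f(best) <= f(current):
            return current        # local maximum reached
        current = best

# Hypothetical example: maximize f(x) = -(x - 3)^2 over the integers,
# where each state's neighbors are x - 1 and x + 1.
f = lambda x: -(x - 3) ** 2
neighbors = lambda x: [x - 1, x + 1]
print(hill_climb(0, f, neighbors))  # prints 3
```

Starting from 0, the sketch climbs through 1 and 2 and stops at x = 3, which here is both a local and the global maximum.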
Random-restart hill climbing is a meta-algorithm built on top of the hill climbing algorithm; it is also known as shotgun hill climbing. It iteratively does hill climbing, each time with a random initial condition x_0. The best x_m is kept: if a new run of hill climbing produces a better x_m than the stored state, it replaces the stored state. Random-restart hill climbing is a surprisingly effective algorithm in many cases. It turns out that it is often better to spend CPU time exploring the space than carefully optimizing from a single initial condition.

Local Maxima
A problem with hill climbing is that it will find only local maxima. Unless the heuristic is convex, it may not reach a global maximum. Other local search algorithms, such as stochastic hill climbing, random walks, and simulated annealing, try to overcome this problem.

Ridges
A ridge is a curve in the search space that leads to a maximum, but the orientation of the ridge compared to the available moves used to climb is such that each single move leads to a smaller point. In other words, each point on a ridge looks to the algorithm like a local maximum, even though the point is part of a curve leading to a better optimum.

Plateau
Another problem with hill climbing is that of a plateau, which occurs when we get to a "flat" part of the search space, i.e. we have a path where the heuristic values are all very close together. This kind of flatness can cause the algorithm to cease progress and wander aimlessly.

Pseudocode
Hill Climbing Algorithm

currentNode = startNode;
loop do
    L = NEIGHBORS(currentNode);
    nextEval = -INF;
    nextNode = NULL;
    for all x in L
        if (EVAL(x) > nextEval)
            nextNode = x;
            nextEval = EVAL(x);
    if nextEval <= EVAL(currentNode)
        // Return current node since no better neighbors exist
        return currentNode;
    currentNode = nextNode;

Best-First Search
Best-first search is a search algorithm which explores a graph by expanding the most promising node chosen according to a specified rule.

Judea Pearl described best-first search as estimating the promise of node n by a "heuristic evaluation function f(n) which, in general, may depend on the description of n, the description of the goal, the information gathered by the search up to that point, and, most important, on any extra knowledge about the problem domain."

Some authors have used "best-first search" to refer specifically to a search with a heuristic that attempts to predict how close the end of a path is to a solution, so that paths judged to be closer to a solution are extended first. This specific type of search is called greedy best-first search.

Efficient selection of the current best candidate for extension is typically implemented using a priority queue.

Examples of best-first search algorithms include the A* search algorithm and, in turn, Dijkstra's algorithm (which can be considered a specialization of A*). Best-first algorithms are often used for path finding in combinatorial search.

Code

open = initial state
while open != null do
    1. Pick the best node on open.
    2. Create the node's successors.
    3. For each successor do:
       a. If it has not been generated before: evaluate it, add it to OPEN, and record its parent.
       b. Otherwise: change the parent if this new path is better than the previous one.
done
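The loop above can be made concrete with a priority queue, as the text suggests. Here is a minimal greedy best-first sketch; the graph and heuristic values h(n) are hypothetical examples:

```python
import heapq

# Greedy best-first search: always expand the open node with the lowest
# heuristic estimate h(n). Graph and h values are hypothetical examples.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
H = {"A": 3, "B": 2, "C": 1, "D": 0}   # estimated distance to the goal D

def best_first(start, goal):
    open_heap = [(H[start], start)]      # priority queue keyed on h(n)
    parent = {start: None}               # records each node's parent
    while open_heap:
        _, node = heapq.heappop(open_heap)  # pick the best open node
        if node == goal:                    # reconstruct the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for succ in GRAPH[node]:
            if succ not in parent:          # not generated before
                parent[succ] = node
                heapq.heappush(open_heap, (H[succ], succ))
    return None

print(best_first("A", "D"))  # prints ['A', 'C', 'D']
```

Because h(C) = 1 beats h(B) = 2, the search expands C before B and reaches D through C.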
Syntax of Propositional Logic
Logic is used to represent properties of objects in the world about which we are going to reason. When we say "Miss Piggy is plump" we are talking about the object Miss Piggy and a property, plump. Similarly, when we say "Kermit's voice is high-pitched", the object is Kermit's voice and the property is high-pitched. It is normal to write these in logic as:

plump(misspiggy)
highpitched(voiceof(kermit))

So misspiggy and kermit are constants representing objects in our domain. Notice that plump and highpitched are different from voiceof: plump and highpitched represent properties and so are boolean-valued functions; they are often called predicates or relations. voiceof is a function that returns an object (not true/false). To help us differentiate, we shall use "of" at the end of a function name.

The predicates plump and highpitched are unary predicates, but of course we can have binary or n-ary predicates, e.g. loves(misspiggy, voiceof(kermit)).

Simple Sentences
The fundamental components of logic are:
  • object constants, e.g. misspiggy, kermit
  • function constants, e.g. voiceof
  • predicate constants, e.g. plump, highpitched, loves

Predicate and function constants take arguments which are objects in our domain. Predicate constants are used to describe relationships among the objects and return the value true/false. Function constants return values that are objects.

More Complex Sentences
We need to apply operators to construct more complex sentences from atoms.

Negation applied to an atom negates the atom:
¬loves(kermit, voiceof(misspiggy))
Kermit does not love Miss Piggy's voice

Conjunction combines two conjuncts:
loves(misspiggy, kermit) ∧ loves(misspiggy, voiceof(kermit))
Miss Piggy loves Kermit and Miss Piggy loves Kermit's voice

Notice it is not correct syntax to write in logic
loves(misspiggy, kermit) ∧ voiceof(kermit)
because we have tried to conjoin a sentence (truth valued) with an object. Logic operators must apply to truth-valued sentences.

Disjunction combines two disjuncts:
loves(misspiggy, kermit) ∨ loves(misspiggy, voiceof(kermit))
Miss Piggy loves Kermit or Miss Piggy loves Kermit's voice

Implication combines a condition and a conclusion:
loves(misspiggy, voiceof(kermit)) → loves(misspiggy, kermit)
If Miss Piggy loves Kermit's voice then Miss Piggy loves Kermit

The language we have described so far contains atoms and the connectives ¬, ∧, ∨ and →. This defines the syntax of propositional logic. It is normal to represent atoms in propositional logic as single upper-case letters, but here we have used a more meaningful terminology for the atoms that extends easily to predicate logic.

Semantics of Propositional Logic

We have defined the syntax of propositional logic. However, this is of no use without talking about the meaning, or semantics, of the sentences. Suppose our logic contained only atoms, i.e. no logical connectives. This logic is very silly because any subset of these atoms is consistent; e.g. beautiful(misspiggy) and ugly(misspiggy) are consistent because we cannot represent
ugly(misspiggy) → ¬beautiful(misspiggy). So we now need a way in our logic to define which sentences are true.

Example: Models Define Truth

Suppose a language contains only one object constant misspiggy and two relation constants ugly and beautiful. The following models define different facts about Miss Piggy.

M=Ø: In this model Miss Piggy is neither ugly nor beautiful.
M={ugly(misspiggy)}: In this model Miss Piggy is ugly and not beautiful.
M={beautiful(misspiggy)}: In this model Miss Piggy is beautiful and not ugly.
M={ugly(misspiggy), beautiful(misspiggy)}: In this model Miss Piggy is both ugly and beautiful. The last statement is intuitively wrong, but the model selected determines the truth of the atoms in the language.

Compound Sentences

So far we have restricted our attention to the semantics of atoms: an atom is true if it is a member of the model M; otherwise it is false. Extending the semantics to compound sentences is easy. Notice that in the definitions below p and q do not need to be atoms, because these definitions work recursively until atoms are reached.

Conjunction
p ∧ q is true in M iff p and q are true in M individually.
So the conjunction
loves(misspiggy, kermit) ∧ loves(misspiggy, voiceof(kermit))
is true only when both
Miss Piggy loves Kermit; and
Miss Piggy loves Kermit's voice.

Disjunction
p ∨ q is true in M iff at least one of p or q is true in M.
So the disjunction
loves(misspiggy, kermit) ∨ loves(misspiggy, voiceof(kermit))
is true whenever
Miss Piggy loves Kermit;
Miss Piggy loves Kermit's voice; or
Miss Piggy loves both Kermit and his voice.
Therefore the disjunction is weaker than either disjunct and than the conjunction of these disjuncts.

Negation
¬p is true in M iff p is not true in M.

Implication
p → q is true in M iff p is not true in M or q is true in M.

We have been careful about the definition of →. When people use an implication p → q they normally imply that p causes q. So if p is true we are happy to say that p → q is true iff q is true. But if p is false the causal link causes confusion because we can't tell whether q should be true or not. Logic requires that the connectives are truth functional, and so the truth of a compound sentence must be determined from the truth of its component parts. Logic defines that if p is false then p → q is true regardless of the truth of q.

So both of the following implications are true (provided you believe pigs do not fly!):
fly(pigs) → beautiful(misspiggy)
fly(pigs) → ¬beautiful(misspiggy)

Example: Implications and Models

In which of the following models is
ugly(misspiggy) → ¬beautiful(misspiggy) true?

M=Ø
Miss Piggy is not ugly and so the antecedent fails. Therefore the implication holds. (Miss Piggy is also not beautiful in this model.)

M={beautiful(misspiggy)}
Again, Miss Piggy is not ugly and so the implication holds.

M={ugly(misspiggy)}
Miss Piggy is not beautiful and so the conclusion holds, and hence the implication holds.

M={ugly(misspiggy), beautiful(misspiggy)}
Miss Piggy is ugly and so the antecedent holds. But she is also beautiful, and so ¬beautiful(misspiggy) is not true. Therefore the conclusion does not hold, and so the implication fails in this (and only this) model.

Truth Tables

Truth tables are often used to calculate the truth of complex propositional sentences. A truth table represents all possible combinations of truths of the atoms and so contains all possible models. A column is created for each of the atoms in the sentence, and all combinations of truth values for these atoms are assigned one per row. So if there are n atoms then there are n initial columns and 2^n rows. The final column contains the truth of the sentence for each combination of truths for the atoms. Intervening columns can be added to store intermediate truth calculations. Below are two sample truth tables.

Equivalence

Two sentences are equivalent if they hold in exactly the same models. Therefore we can determine equivalence by drawing truth tables that represent the sentences in the various models. If the initial and final columns of the truth tables are identical then the sentences are equivalent. Examples of equivalences include De Morgan's laws, ¬(p ∧ q) ≡ ¬p ∨ ¬q and ¬(p ∨ q) ≡ ¬p ∧ ¬q, and the rewriting of implication, p → q ≡ ¬p ∨ q.
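Truth tables and the equivalence test can be mechanized by enumerating all 2^n assignments. A minimal Python sketch (illustrative; the helper names `truth_table`, `equivalent` and `implies` are assumptions, not from the original text):

```python
from itertools import product

def truth_table(atoms, sentence):
    """Enumerate all 2^n assignments (models) and evaluate the sentence."""
    rows = []
    for values in product([True, False], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        rows.append((model, sentence(model)))
    return rows

def equivalent(atoms, s1, s2):
    """Two sentences are equivalent iff they agree in every model."""
    return all(s1(m) == s2(m) for m, _ in truth_table(atoms, s1))

# p -> q is false only when p is true and q is false.
implies = lambda p, q: (not p) or q

# Check the classic equivalence  p -> q  ==  (not p) or q
print(equivalent(["p", "q"],
                 lambda m: implies(m["p"], m["q"]),
                 lambda m: (not m["p"]) or m["q"]))   # True
```

Swapping the second sentence for `q -> p` makes the call return `False`, which is the non-commutativity of implication discussed next.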
Unlike ∧ and ∨, → is not commutative:
loves(misspiggy, voiceof(kermit)) → loves(misspiggy, kermit)
is very different from
loves(misspiggy, kermit) → loves(misspiggy, voiceof(kermit))
Similarly, → is not associative.

Syntax & Semantics for Predicate Logic

Syntax of Predicate Logic

Propositional logic is fairly powerful, but we must add variables and quantification to be able to reason about objects in atoms and express properties of a set of objects without listing the atom corresponding to each object.

We shall adopt the Prolog convention that variables have an initial capital letter. (This is contrary to many mathematical logic books, where variables are lower case and constants have an initial capital.)

When we include variables we must specify their scope or quantification. The first quantifier we want is the universal quantifier ∀ (for all).

∀X.loves(misspiggy, X)

This allows X to range over all the objects and asserts that Miss Piggy loves each of them. We have introduced one variable but any number is allowed:

∀X∀Y.loves(X, Y)

Each of the objects loves all of the objects, even itself! Therefore ∀XY. is the same as ∀X.∀Y.

Quantifiers, like connectives, act on sentences. So if Miss Piggy loves all cute things (not just Kermit!) we would write

∀C.[cute(C) → loves(misspiggy, C)]

rather than

loves(misspiggy, ∀C.cute(C))

because the second argument to loves must be an object, not a sentence.
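Over a finite domain, ∀ behaves like a conjunction and ∃ like a disjunction, which Python's `all()` and `any()` make directly executable. A small sketch using the document's Muppet objects (the `domain`, `loves` and `cute` sets are illustrative assumptions):

```python
# A tiny finite domain of object constants.
domain = ["misspiggy", "kermit", "animal"]

# Relations as sets of tuples: loves(misspiggy, X) for every X.
loves = {("misspiggy", x) for x in domain}
cute = {"kermit"}

# forall X. loves(misspiggy, X)  --  a conjunction over the whole domain
forall_loved = all(("misspiggy", x) in loves for x in domain)

# exists X. cute(X) and loves(misspiggy, X)  --  a disjunction over the domain
exists_cute_loved = any(x in cute and ("misspiggy", x) in loves
                        for x in domain)

print(forall_loved, exists_cute_loved)   # True True
```

This only works because the domain is finite; the next paragraph makes the same point about expanding a universally quantified sentence into a conjunction.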
When the world contains a finite set of objects then a universally quantified sentence can be converted into a sentence without the quantifier; e.g. ∀X.loves(misspiggy, X) becomes

loves(misspiggy, misspiggy) ∧ loves(misspiggy, kermit) ∧ loves(misspiggy, animal) ∧ ...

Contrast this with the infinite set of positive integers and the sentence ∀N.[odd(N) ∨ even(N)].

The other quantifier is the existential quantifier ∃ (there exists).

∃X.loves(misspiggy, X)

This allows X to range over all the objects and asserts that Miss Piggy loves (at least) one of them. Similarly

∃X∃Y.loves(X, Y)

asserts that there is at least one loving couple (or self-loving object).

We shall be using first-order predicate logic, where quantified variables range over object constants only. We are defining second-order predicate logic if we allow quantified variables to range over functions or predicates as well; e.g.

∃X.loves(misspiggy, X(kermit)) includes loves(misspiggy, voiceof(kermit))
∃X.X(misspiggy, kermit) (there exists some relationship linking Miss Piggy and Kermit!)

Semantics of First Order Predicate Logic

Now we must deal with quantification.
∀: ∀X.p(X) holds in a model iff p(z) holds for all objects z in our domain.
∃: ∃X.p(X) holds in a model iff there is some object z in our domain so that p(z) holds.

Example: Available Objects affect Quantification

If misspiggy is the only object in our domain then

ugly(misspiggy) → ¬beautiful(misspiggy)

is equivalent to
∀X.[ugly(X) → ¬beautiful(X)]

If there were other objects then there would be more atoms and so the set of models would be larger; e.g. with objects misspiggy and kermit the possible models are all combinations of the atoms ugly(misspiggy), beautiful(misspiggy), ugly(kermit), beautiful(kermit). Now the two sentences are no longer equivalent.

1) In every model in which ∀X.[ugly(X) → ¬beautiful(X)] holds, ugly(misspiggy) → ¬beautiful(misspiggy) also holds.
2) There are models in which ugly(misspiggy) → ¬beautiful(misspiggy) holds but ∀X.[ugly(X) → ¬beautiful(X)] does not hold; e.g. M = {ugly(kermit), beautiful(kermit)}. What about M = {ugly(misspiggy), beautiful(misspiggy)}?

Clausal Form for Predicate Calculus

In order to prove a formula in the predicate calculus by resolution, we
1. Negate the formula.
2. Put the negated formula into CNF, by doing the following:
i. Get rid of all → operators.
ii. Push the ¬ operators in as far as possible.
iii. Rename variables as necessary (see the step below).
iv. Move all of the quantifiers to the left (the outside) of the expression using the following rules, where Q is either ∀ or ∃ and G is a formula that does not contain x:

(Qx.F) ∧ G ≡ Qx.(F ∧ G)
(Qx.F) ∨ G ≡ Qx.(F ∨ G)
This leaves the formula in what is called prenex form, which consists of a series of quantifiers followed by a quantifier-free formula, called the matrix.

v. Remove all quantifiers from the formula. First we remove the existentially quantified variables by using Skolemization. Each existentially quantified variable, say x, is replaced by a function term which begins with a new, n-ary function symbol, say f, where n is the number of universally quantified variables that occur before x is quantified in the formula. The arguments to the function term are precisely these variables. For example, if we have a formula of the shape

∀x∀y∃z.p(x, y, z)

then z would be replaced by a function term f(x, y), where f is a new function symbol. The result is:

∀x∀y.p(x, y, f(x, y))

This new formula is satisfiable if and only if the original formula is satisfiable. The new function symbol is called a Skolem function. If the existentially quantified variable has no preceding universally quantified variables, then the function is a 0-ary function and is often called a Skolem constant.

After removing all existential quantifiers, we simply drop all the universal quantifiers, as we assume that any variable appearing in a formula is universally quantified.

vi. The remaining formula (the matrix) is put in CNF by distributing ∨ over ∧, so that no ∨ operator remains outside an ∧.

3. Finally, the CNF formula is written in clausal format by writing each conjunct as a set of literals (a clause), and the whole formula as a set of clauses (the clause set).

For example, if we begin with the proposition
we have:
1. Negate the theorem.
i. Push the ¬ operators in. No change.
ii. Rename variables if necessary.
iii. Move the quantifiers to the outside.
iv. Remove the quantifiers, first by Skolemizing the existentially quantified variables. As these have no universally quantified variables to their left, they are replaced by Skolem constants. Then drop the universal quantifiers.
v. Put the matrix into CNF. No change.
2. Write the formula in clausal form.

Inference Rules

Complex deductive arguments can be judged valid or invalid based on whether or not the steps in that argument follow the nine basic rules of inference. These rules of inference are all relatively simple, although when presented in formal terms they can look overly complex.

Conjunction
1. P
2. Q
3. Therefore, P and Q.

1. It is raining in New York.
2. It is raining in Boston.
3. Therefore, it is raining in both New York and Boston.
Simplification
1. P and Q.
2. Therefore, P.

1. It is raining in both New York and Boston.
2. Therefore, it is raining in New York.

Addition
1. P
2. Therefore, P or Q.

1. It is raining.
2. Therefore, either it is raining or the sun is shining.

Absorption
1. If P, then Q.
2. Therefore, if P then P and Q.

1. If it is raining, then I will get wet.
2. Therefore, if it is raining, then it is raining and I will get wet.

Modus Ponens
1. If P then Q.
2. P.
3. Therefore, Q.

1. If it is raining, then I will get wet.
2. It is raining.
3. Therefore, I will get wet.

Modus Tollens
1. If P then Q.
2. Not Q (~Q).
3. Therefore, not P (~P).

1. If it had rained this morning, I would have gotten wet.
2. I did not get wet.
3. Therefore, it did not rain this morning.

Hypothetical Syllogism
1. If P then Q.
2. If Q then R.
3. Therefore, if P then R.

1. If it rains, then I will get wet.
2. If I get wet, then my shirt will be ruined.
3. Therefore, if it rains, then my shirt will be ruined.

Disjunctive Syllogism
1. Either P or Q.
2. Not P (~P).
3. Therefore, Q.

1. Either it rained or I took a cab to the movies.
2. It did not rain.
3. Therefore, I took a cab to the movies.

Constructive Dilemma
1. (If P then Q) and (If R then S).
2. P or R.
3. Therefore, Q or S.

1. If it rains, then I will get wet, and if it is sunny, then I will be dry.
2. Either it will rain or it will be sunny.
3. Therefore, either I will get wet or I will be dry.

The above rules of inference, when combined with the rules of replacement, mean that propositional calculus is "complete." Propositional calculus is simply another name for formal logic.

Resolution

Resolution is a rule of inference leading to a refutation theorem-proving technique for sentences in propositional logic and first-order logic. In other words, iteratively applying the resolution rule in a suitable way allows for telling whether a propositional formula is satisfiable and for proving that a first-order formula is unsatisfiable; this method may prove the satisfiability of a first-order satisfiable formula, but not always, as is the case for all methods for first-order logic. Resolution was introduced by John Alan Robinson in 1965.

Resolution in propositional logic

The resolution rule in propositional logic is a single valid inference rule that produces a new clause implied by two clauses containing complementary literals. A literal is a propositional variable or the negation of a propositional variable. Two literals are said to be complements if
one is the negation of the other (in the following, ai is taken to be the complement to bj). The resulting clause contains all the literals that do not have complements. Formally:

a1 ∨ ... ∨ ai ∨ ... ∨ an,   b1 ∨ ... ∨ bj ∨ ... ∨ bm
---------------------------------------------------------------
a1 ∨ ... ∨ ai-1 ∨ ai+1 ∨ ... ∨ an ∨ b1 ∨ ... ∨ bj-1 ∨ bj+1 ∨ ... ∨ bm

where
all the a's and b's are literals,
ai is the complement to bj, and
the dividing line stands for "entails".

The clause produced by the resolution rule is called the resolvent of the two input clauses.

When the two clauses contain more than one pair of complementary literals, the resolution rule can be applied (independently) for each such pair. However, only the pair of literals that are resolved upon can be removed: all other pairs of literals remain in the resolvent clause.

A resolution technique

When coupled with a complete search algorithm, the resolution rule yields a sound and complete algorithm for deciding the satisfiability of a propositional formula, and, by extension, the validity of a sentence under a set of axioms.

This resolution technique uses proof by contradiction and is based on the fact that any sentence in propositional logic can be transformed into an equivalent sentence in conjunctive normal form. The steps are as follows:

1) All sentences in the knowledge base and the negation of the sentence to be proved (the conjecture) are conjunctively connected.
2) The resulting sentence is transformed into a conjunctive normal form with the conjuncts viewed as elements in a set, S, of clauses. For example, (a ∨ b) ∧ (¬a ∨ c) would give rise to the set S = {{a, b}, {¬a, c}}.
3) The resolution rule is applied to all possible pairs of clauses that contain complementary literals. After each application of the resolution rule, the resulting sentence is simplified by removing repeated literals. If the sentence contains complementary literals, it is discarded (as a
tautology). If not, and if it is not yet present in the clause set S, it is added to S and considered for further resolution inferences.
4) If after applying a resolution rule the empty clause is derived, the complete formula is unsatisfiable (or contradictory), and hence it can be concluded that the initial conjecture follows from the axioms.
5) If, on the other hand, the empty clause cannot be derived, and the resolution rule cannot be applied to derive any more new clauses, the conjecture is not a theorem of the original knowledge base.

One instance of this algorithm is the original Davis–Putnam algorithm, which was later refined into the DPLL algorithm, removing the need for explicit representation of the resolvents.

This description of the resolution technique uses a set S as the underlying data structure to represent resolution derivations. Lists, trees and directed acyclic graphs are other possible and common alternatives. Tree representations are more faithful to the fact that the resolution rule is binary. Together with a sequent notation for clauses, a tree representation also makes it clear how the resolution rule is related to a special case of the cut rule, restricted to atomic cut formulas. However, tree representations are not as compact as set or list representations, because they explicitly show redundant subderivations of clauses that are used more than once in the derivation of the empty clause. Graph representations can be as compact in the number of clauses as list representations, and they also store structural information regarding which clauses were resolved to derive each resolvent.

Example

Resolving a ∨ b and ¬a ∨ c yields the resolvent b ∨ c. In English: if a or b is true, and a is false or c is true, then either b or c is true. If a is true, then for the second premise to hold, c must be true.
If a is false, then for the first premise to hold, b must be true. So regardless of a, if both premises hold, then b or c is true.

Unification

We also need some way of binding variables to values in a consistent way so that components of sentences can be matched. This is the process of unification.

Knowledge Representation
Network Representations

Networks are often used in artificial intelligence as schemes for representation. One of the advantages of using a network representation is that theorists in computer science have studied such structures in detail, and there are a number of efficient and robust algorithms that may be used to manipulate the representations.

Trees and Graphs

A tree is a collection of nodes in which each node may be expanded into one or more unique subnodes until termination occurs. There may be no termination, in which case an infinite tree results. A graph is a tree in which non-unique nodes may be generated; in other words, a tree is a graph with no loops. The representation of the nodes and links is arbitrary. In a computer chess player, for example, nodes might represent individual board positions and the links from each node the legal moves from that position. This is a specific instance of a problem space. In general, problem spaces are graphs in which the nodes represent states and the connections between states are represented by operators that make the state transformations.

IS-A Links and Semantic Networks

In constructing concept hierarchies, often the most important means of showing inclusion in a set is to use what is called an IS-A link, in which X is a member of some more general set Y. For example, a DOG IS-A MAMMAL. As one travels up the link, the more general concept is reached. This is generally the simplest type of link between concepts in concept or semantic hierarchies. The combination of instances and classes connected by IS-A links in a graph or tree is generally known as a semantic network. Semantic networks are useful, in part, because they provide a natural structure for inheritance. For instance, if a DOG IS-A MAMMAL then those properties that are true for MAMMALs and DOGs need not be specified for the DOG; instead they may be derived via an inheritance procedure.
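The inheritance procedure just described can be sketched in a few lines of Python: property lookup walks up the IS-A chain until a value is found. The `isa` and `properties` tables below are illustrative assumptions.

```python
# A tiny semantic network: IS-A links plus properties stored locally.
isa = {"dog": "mammal", "mammal": "animal"}
properties = {
    "animal": {"alive": True},
    "mammal": {"has_fur": True, "legs": 4},
    "dog":    {"barks": True},
}

def lookup(node, prop):
    """Search the node itself, then its ancestors, for the property."""
    while node is not None:
        if prop in properties.get(node, {}):
            return properties[node][prop]
        node = isa.get(node)        # follow the IS-A link upward
    return None                     # property unknown anywhere in the chain

print(lookup("dog", "has_fur"))    # True  (inherited from mammal)
print(lookup("dog", "barks"))      # True  (stored locally)
```

Nothing about fur or being alive is stored at the DOG node; it is derived at lookup time, which is exactly the space-for-time trade-off the next paragraph describes.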
This greatly reduces the amount of information that must be stored explicitly, although there is an increase in the time required to access knowledge through the inheritance mechanism. Frames are a special type of semantic network representation.

Associative Network

A means of representing relational knowledge as a labeled directed graph. Each vertex of the graph represents a concept and each label represents a relation between concepts. Access and updating procedures traverse and manipulate the graph. A semantic network is sometimes regarded as a graphical notation for logical formulas.

Conceptual Graphs

A conceptual graph (CG) is a graph representation for logic based on the semantic networks of artificial intelligence.
• A conceptual graph consists of concept nodes and relation nodes.
• The concept nodes represent entities, attributes, states, and events.
• The relation nodes show how the concepts are interconnected.

Conceptual graphs are finite, connected, bipartite graphs.

Finite: because any graph (in a human brain or computer storage) can only have a finite number of concepts and conceptual relations.
Connected: because two parts that are not connected would simply be called two conceptual graphs.
Bipartite: because there are two different kinds of nodes, concepts and conceptual relations, and every arc links a node of one kind to a node of the other kind.

Example

The following figure shows the CG display form for "John is going to Boston by bus."

The conceptual graph in the figure represents a typed or sorted version of logic. Each of the four concepts has a type label, which represents the type of entity the concept refers to: Person, Go, Boston, or Bus. Two of the concepts have names, which identify the referent: John or Boston. Each of the three conceptual relations has a type label that represents the type of relation: agent (Agnt), destination (Dest), or instrument (Inst). The CG as a whole indicates that the person John is the agent of some instance of going, the city Boston is the destination, and a bus is the instrument. Figure 1 can be translated to the following formula:

(∃x)(∃y)(Go(x) ∧ Person(John) ∧ City(Boston) ∧ Bus(y) ∧ Agnt(x, John) ∧ Dest(x, Boston) ∧ Inst(x, y))

As this translation shows, the only logical operators used in the figure are conjunction and the existential quantifier. Those two operators are the most common in translations from natural languages, and many of the early semantic networks could not represent any others.
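The CG-to-formula translation can be sketched directly: concept nodes without a name become existential variables, and every node contributes one conjunct. The data layout below (ids `c1`–`c4`, a `referent` of `None` for unnamed concepts) is an illustrative assumption, not CG standard notation.

```python
# Concept nodes: id -> (type label, referent); None = unnamed concept.
concepts = {
    "c1": ("Person", "John"),
    "c2": ("Go", None),
    "c3": ("City", "Boston"),
    "c4": ("Bus", None),
}
# Relation nodes link concepts (bipartite: arcs join relations to concepts).
relations = [("Agnt", "c2", "c1"), ("Dest", "c2", "c3"), ("Inst", "c2", "c4")]

def to_formula():
    """Translate the graph into a formula using only ∃ and ∧."""
    name, variables = {}, []
    for cid, (_, ref) in concepts.items():
        if ref is None:              # unnamed concept -> existential variable
            ref = cid
            variables.append(ref)
        name[cid] = ref
    atoms = [f"{typ}({name[cid]})" for cid, (typ, _) in concepts.items()]
    atoms += [f"{r}({name[a]},{name[b]})" for r, a, b in relations]
    prefix = "".join(f"∃{v}." for v in variables)
    return prefix + " ∧ ".join(atoms)

print(to_formula())
```

Because only conjunction and existential quantification are emitted, the sketch also illustrates why early semantic networks could not express negation or disjunction.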
Structured Representation

Structured representation can be done in various ways, for example:
• Frames
• Scripts

Frames

A frame is a method of representation in which a particular class is defined by a number of attributes (or slots) with certain values (the attributes are filled in for each instance). Thus, frames are also known as slot-and-filler structures. Frame systems are also somewhat equivalent to semantic networks, although frames are usually associated with more defined structure than the networks.

Like a semantic network, one of the chief properties of frames is that they provide a natural structure for inheritance. IS-A links connect classes to larger parent classes, and properties of the subclasses may be determined both at the level of the class itself and from parent classes.

This leads to the idea of defaults. Frames may indicate specific values for some attributes or instead indicate a default. This is especially useful when values are not always known but can generally be assumed to be true for most of the class. For example, the class BIRD may have a default value of FLIES set to TRUE even though instances below it (say, for example, an OSTRICH) have FLIES values of FALSE.

In addition, the value of a particular attribute need not necessarily be filled in but may instead indicate a procedure to run to obtain a value. This is known as an attached procedure. Attached procedures are especially useful when there is a high cost associated with computing a particular value, when the value changes with time, or when the expected access frequency is low. Instead of computing the value for each instance, the values are computed only when needed.
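The three frame ideas just described — inheritance, defaults, and attached procedures — can be sketched together in one small class. This is an illustrative sketch, not a full frame system; the `Frame` class and the BIRD/OSTRICH/TWEETY instances are assumptions built from the examples in the text.

```python
class Frame:
    """A slot-and-filler frame: slots may hold values (defaults when set
    on a class) or callables (attached procedures, run only on access);
    missing slots are inherited from the parent via the IS-A link."""
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        frame = self
        while frame is not None:
            if slot in frame.slots:
                value = frame.slots[slot]
                # Attached procedure: compute the value only when needed.
                return value() if callable(value) else value
            frame = frame.parent        # inherit from the parent class
        return None

bird = Frame("BIRD", flies=True, legs=2)              # default: birds fly
ostrich = Frame("OSTRICH", parent=bird, flies=False)  # override the default
tweety = Frame("TWEETY", parent=bird,
               age=lambda: 2024 - 2020)               # attached procedure

print(ostrich.get("flies"), tweety.get("flies"), tweety.get("age"))
# False True 4
```

The OSTRICH overrides the inherited FLIES default, while TWEETY's age slot is computed on demand rather than stored.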
However, this computation is run during execution (rather than during the establishment of the frame network) and may be costly.

Scripts

A script is a remembered precedent, consisting of tightly coupled, expectation-suggesting primitive-action and state-change frames.

A script is a structured representation describing a stereotyped sequence of events in a particular context. That is, scripts extend frames by explicitly representing expectations of actions and state changes.

Why represent knowledge in this way?

1) Because real-world events do follow stereotyped patterns. Human beings use previous experiences to understand verbal accounts; computers can use scripts instead.
2) Because people, when relating events, do leave large amounts of assumed detail out of their accounts. People don't find it easy to converse with a system that can't fill in missing conversational detail.

Min Max Algorithm

There are plenty of applications for AI, but games are the most interesting to the public. Nowadays every major OS comes with some games. So it is no surprise that there are some algorithms that were devised with games in mind.

The Min-Max algorithm is applied in two-player games, such as tic-tac-toe, checkers, chess, go, and so on. All these games have at least one thing in common: they are logic games. This means that they can be described by a set of rules and premises, so it is possible to know, from a given point in the game, what the next available moves are. They also share another characteristic: they are "full information games". Each player knows everything about the possible moves of the adversary.

Before explaining the algorithm, a brief introduction to search trees is required. Search trees are a way to represent searches. The squares are known as nodes and they represent points of decision in the search. The nodes are connected with branches. The search starts at the root node, the one at the top of the figure. At each decision point, nodes for the available search paths are generated, until no more decisions are possible. The nodes that represent the end of the search are known as leaf nodes.

There are two players involved, MAX and MIN. A search tree is generated, depth-first, starting with the current game position up to the end game positions. Then, the final game positions are evaluated from MAX's point of view, as shown in Figure 1. Afterwards, the inner node values of the tree are filled bottom-up with the evaluated values. The nodes that belong to the MAX player receive the maximum value of their children.
The nodes for the MIN player will select the minimum value of their children.

MinMax (GamePosition game) {
    return MaxMove (game);
}

MaxMove (GamePosition game) {
    if (GameEnded(game)) {
        return EvalGameState(game);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach moves {
            move <- MinMove(ApplyMove(game));
            if (Value(move) > Value(best_move)) {
                best_move <- move;
            }
        }
        return best_move;
    }
}

MinMove (GamePosition game) {
    if (GameEnded(game)) {
        return EvalGameState(game);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach moves {
            move <- MaxMove(ApplyMove(game));
            if (Value(move) < Value(best_move)) {
                best_move <- move;
            }
        }
        return best_move;
    }
}

So what is happening here? The values represent how good a game move is. So the MAX player will try to select the move with the highest value in the end. But the MIN player also has something to say about it, and he will try to select the moves that are better for him, thus minimizing MAX's outcome.

Optimisation

However, only very simple games can have their entire search tree generated in a short time. For most games this isn't possible; the universe would probably vanish first. So there are a few optimizations to add to the algorithm.

First a word of caution: optimization comes with a price. When optimizing, we are trading the full information about the game's events for probabilities and shortcuts. Instead of knowing the full path that leads to victory, the decisions are made with the path that might lead to victory. If the optimization isn't well chosen, or it is badly applied, then we could end up with a dumb AI, and it would have been better to use random moves.
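The MinMax pseudocode above can be run directly on a hand-built game tree, which makes the bottom-up value propagation easy to see. The tree below is an illustrative assumption: inner nodes are lists of children, and leaves are integers standing in for `EvalGameState` from MAX's point of view.

```python
def minmax(node, is_max):
    """Return the game value of `node`: leaves are evaluated positions,
    inner nodes take the max (MAX to move) or min (MIN to move) of
    their children's values."""
    if isinstance(node, int):        # leaf: already-evaluated game position
        return node
    values = [minmax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# A 2-ply example: MAX picks one of three moves, then MIN replies.
tree = [[3, 5, 2],   # if MAX picks move 0, MIN answers with min -> 2
        [9, 1, 2],   # move 1 -> MIN answers with 1
        [7, 4, 8]]   # move 2 -> MIN answers with 4
print(minmax(tree, True))   # 4: MAX's best guaranteed outcome
```

MAX chooses move 2: even though move 1 contains the largest leaf (9), MIN would never allow it, which is exactly the minimizing behavior described above.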
One basic optimization is to limit the depth of the search tree. Why does this help? Generating the full tree could take ages. If a game has a branching factor of 3, which means that each node has three children, the tree will have the following number of nodes per depth: 1, 3, 9, 27, ...

The sequence shows that at depth n the tree will have 3^n nodes. To know the total number of generated nodes, we need to sum the node count at each level. So the total number of nodes for a tree with depth n is sum(0, n, 3^n). For many games, like chess, that have a very big branching factor, this means that the tree might not fit into memory. Even if it did, it would take too long to generate. If each node took 1 s to be analyzed, then for the previous example each search tree would take sum(0, n, 3^n) * 1 s. For a search tree with depth 5, that would mean 1+3+9+27+81+243 = 364 nodes, i.e. 364 s ≈ 6 min! This is too long for a game. The player would give up playing if he had to wait 6 minutes for each move from the computer.

The second optimization is to use a function that evaluates the current game position from the point of view of some player. It does this by giving a value to the current state of the game: counting the number of pieces on the board, for example, or the number of moves left to the end of the game, or anything else that we might use to give a value to the game position.

Instead of evaluating the current game position, the function might calculate how the current game position might help in ending the game — in other words, how probable it is that, given the current game position, we might win the game. In this case the function is known as an estimation function.

This function will have to take into account some heuristics. Heuristics are knowledge that we have about the game, and they can help generate better evaluation functions. For example, in checkers, pieces at corners and side positions can't be captured.
So we can create an evaluation function that gives higher values to pieces that lie on those board positions, thus giving higher outcomes for game moves that place pieces in those positions.

One of the reasons that the evaluation function must be able to evaluate game positions for both players is that you don't know to which player the depth limit belongs.

However, having two functions can be avoided if the game is symmetric, meaning that the loss of one player equals the gain of the other. Such games are also known as ZERO-SUM games. For these games one evaluation function is enough; one of the players just has to negate the return value of the function.
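A toy evaluation function in the spirit just described: a weighted piece count where safer squares weigh more. The board layout and weights are made-up illustrations, not a real checkers evaluator; note the zero-sum property that negating the player negates the score.

```python
def evaluate(board, player):
    """Sum positional weights for `player`'s pieces minus the opponent's.
    Edge squares get a higher weight (pieces there can't be captured).
    Zero-sum: evaluate(board, 'x') == -evaluate(board, 'o')."""
    score, size = 0, len(board)
    for r, row in enumerate(board):
        for c, piece in enumerate(row):
            if piece == ".":
                continue                       # empty square
            on_edge = r in (0, size - 1) or c in (0, size - 1)
            weight = 2 if on_edge else 1       # heuristic: edges are safer
            score += weight if piece == player else -weight
    return score

board = ["x..",
         ".o.",
         "..x"]
print(evaluate(board, "x"), evaluate(board, "o"))   # 3 -3
```

Because the game is zero-sum, a single function serves both players; MIN can simply negate MAX's evaluation instead of keeping a second one.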
The revised algorithm is:

MinMax (GamePosition game) {
    return MaxMove (game);
}

MaxMove (GamePosition game) {
    if (GameEnded(game) || DepthLimitReached()) {
        return EvalGameState(game, MAX);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach moves {
            move <- MinMove(ApplyMove(game));
            if (Value(move) > Value(best_move)) {
                best_move <- move;
            }
        }
        return best_move;
    }
}

MinMove (GamePosition game) {
    if (GameEnded(game) || DepthLimitReached()) {
        return EvalGameState(game, MIN);
    }
    else {
        best_move <- {};
        moves <- GenerateMoves(game);
        ForEach moves {
            move <- MaxMove(ApplyMove(game));
            if (Value(move) < Value(best_move)) {
                best_move <- move;
            }
        }
        return best_move;
    }
}

Even so, the algorithm has a few flaws; some of them can be fixed, while others can only be solved by choosing another algorithm.

One of the flaws is that if the game is too complex the answer will always take too long, even with a depth limit. One solution is to limit the time for the search: if the time runs out, choose the best move found so far.
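The time-limit idea combines naturally with the depth limit: search depth 1, then 2, and so on, keeping the best move found so far, and stop when the budget runs out (iterative deepening). A minimal sketch; `search_to_depth` is a hypothetical stand-in for a depth-limited MaxMove, not a real search.

```python
import time

def search_to_depth(game, depth):
    """Placeholder for a depth-limited MinMax search; here it just
    pretends that deeper searches yield better-informed results."""
    return {"move": depth, "value": depth}

def timed_search(game, budget_seconds, max_depth=50):
    """Re-search at increasing depth; when time runs out, return the
    best (deepest completed) result found so far."""
    best = None
    deadline = time.monotonic() + budget_seconds
    depth = 1
    while time.monotonic() < deadline and depth <= max_depth:
        best = search_to_depth(game, depth)   # keep latest complete result
        depth += 1
    return best

print(timed_search(None, 0.01)["move"] >= 1)   # True
```

Each completed iteration gives a usable answer, so the player never waits longer than the budget, at the cost of re-searching the shallow plies.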
A big flaw is the limited horizon problem. A game position that appears to be very good might turn out very bad. This happens because the algorithm wasn't able to see that a few game moves ahead the adversary would be able to make a move that brings him a great outcome. The algorithm missed that fatal move because it was blinded by the depth limit.

Speeding the Algorithm

There are a few things that can still be done to reduce the search time. Take a look at Figure 2. The value for node A is 3, and the first value found for the subtree starting at node B is 2. Since the B node is at a MIN level, we know that the selected value for the B node must be less than or equal to 2. But we also know that the A node has the value 3, and both A and B nodes share the same parent at a MAX level. This means that the game path starting at the B node wouldn't be selected, because 3 is better than 2 for the MAX node. So it isn't worth pursuing the search for children of the B node, and we can safely ignore all the remaining children.

This all means that sometimes the search can be aborted because we find out that the search subtree won't lead us to any viable answer.

This optimization is known as alpha-beta cutoffs, and the algorithm is as follows:

1. Have two values passed around the tree nodes:
i) the alpha value, which holds the best MAX value found;
ii) the beta value, which holds the best MIN value found.
2. At a MAX level, after evaluating each child path, compare the best value found so far with the beta value. If it is greater than or equal to beta, abort the search for the current node.
3. At a MIN level, after evaluating each child path, compare the best value found so far with the alpha value. If it is less than or equal to alpha, abort the search for the current node.

Full pseudocode for MinMax with alpha-beta cutoffs:

MinMax (GamePosition game) {
    return MaxMove (game, -INFINITY, INFINITY);
}
MaxMove (GamePosition game, Integer alpha, Integer beta) {
  if (GameEnded(game) || DepthLimitReached()) {
    return EvalGameState(game, MAX);
  }
  else {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach moves {
      move <- MinMove(ApplyMove(game), alpha, beta);
      if (Value(move) > Value(best_move)) {
        best_move <- move;
        alpha <- Max(alpha, Value(move));
      }
      // Cutoff: ignore the remaining moves
      if (alpha >= beta)
        return best_move;
    }
    return best_move;
  }
}

MinMove (GamePosition game, Integer alpha, Integer beta) {
  if (GameEnded(game) || DepthLimitReached()) {
    return EvalGameState(game, MIN);
  }
  else {
    best_move <- {};
    moves <- GenerateMoves(game);
    ForEach moves {
      move <- MaxMove(ApplyMove(game), alpha, beta);
      if (Value(move) < Value(best_move)) {
        best_move <- move;
        beta <- Min(beta, Value(move));
      }
      // Cutoff: ignore the remaining moves
      if (beta <= alpha)
        return best_move;
    }
    return best_move;
  }
}

How much better does MinMax with alpha-beta cutoffs behave when compared with plain MinMax? It depends on the order in which the tree is searched. If the way the game positions are
generated does not create situations where the algorithm can take advantage of alpha-beta cutoffs, then the improvement will not be noticeable. However, if the evaluation function and the generation of game positions lead to alpha-beta cutoffs, then the improvement can be great.

Alpha-Beta Cutoff

With all this talk about search speed, many of you might be wondering what this is all about. Search speed is very important in AI because, if an algorithm takes too long to give a good answer, the algorithm may not be usable at all.

For example, a good MinMax implementation with an evaluation function capable of giving very good estimates might be able to search 1000 positions a second. In tournament chess each player has around 150 seconds to make a move, so the program would be able to analyze about 150,000 positions during that period. But in chess each move has around 35 possible branches! In the end the program would only be able to look about 3 to 4 moves ahead in the game (35^3 is already about 43,000 positions). Even humans with very little practice in chess can do better than that.

But if we use MinMax with alpha-beta cutoffs (again, a decent implementation with a good evaluation function), the resulting behaviour can be much better. In this case, the program might be able to double the number of analyzed positions, becoming a much tougher adversary.

Example

Example of a board with the values estimated for each position. The game uses MinMax with alpha-beta cutoffs for the computer moves. The evaluation function is a weighted average of the positions occupied by the checker pieces. The figure shows the values for each board position. The value of each board position is multiplied by the type of the piece that rests on that position, as described in the first table.

Rule-Based Expert Systems
Expert System

"An expert system is an interactive computer-based decision tool that uses both facts and heuristics to solve difficult decision problems based on knowledge acquired from an expert."

An expert system is a computer program that simulates the thought process of a human expert to solve complex decision problems in a specific domain. This chapter addresses the characteristics of expert systems that make them different from conventional programming and traditional decision support tools. The growth of expert systems is expected to continue for several years, and with that growth many new and exciting applications will emerge. An expert system operates as an interactive system that responds to questions, asks for clarification, makes recommendations, and generally aids the decision-making process. Expert systems provide expert advice and guidance in a wide variety of activities, computer diagnosis among them.

An expert system may be viewed as a computer simulation of a human expert. Expert systems are an emerging technology with many areas of potential application. Past applications range from MYCIN, used in the medical field to diagnose infectious blood diseases, to XCON, used to configure computer systems. These expert systems have proven to be quite successful. Most applications of expert systems fall into one of the following categories:

• Interpreting and identifying
• Predicting
• Diagnosing
• Designing
• Planning
• Monitoring
• Debugging and testing
• Instructing and training
• Controlling

Applications that are computational or deterministic in nature are not good candidates for expert systems. Traditional decision support systems such as spreadsheets are very mechanistic in the way they solve problems. They operate under mathematical and Boolean operators in their execution and arrive at one and only one static solution for a given set of data.
Calculation-intensive applications with very exacting requirements are better handled by traditional decision support tools or conventional programming. The best application candidates for expert systems are those dealing with expert heuristics for solving problems. Conventional computer programs are based on factual knowledge, an indisputable strength of computers. Humans, by contrast, solve problems on the basis of a mixture of factual and heuristic knowledge. Heuristic knowledge, composed of intuition, judgment, and logical inferences, is an indisputable strength of humans. Successful expert systems will be those that combine facts and heuristics and thus merge human knowledge with computer power in solving problems. To be effective, an expert system must focus on a particular problem domain, as discussed below.

Domain Specificity
Expert systems are typically very domain specific. For example, a diagnostic expert system for troubleshooting computers must actually perform all the necessary data manipulation as a human expert would. The developer of such a system must limit its scope to just what is needed to solve the target problem. Special tools or programming languages are often needed to accomplish the specific objectives of the system.

Special Programming Languages

Expert systems are typically written in special programming languages. The use of languages like LISP and PROLOG in the development of an expert system simplifies the coding process. The major advantage of these languages, compared to conventional programming languages, is the simplicity with which rules can be added, eliminated, or substituted, together with their memory-management capabilities. Some of the distinguishing characteristics of programming languages needed for expert systems work are:

• Efficient mix of integer and real variables
• Good memory-management procedures
• Extensive data-manipulation routines
• Incremental compilation
• Tagged memory architecture
• Optimization of the system's environment
• Efficient search procedures

Architecture of Expert Systems

Expert systems typically contain the following four components:

• Knowledge-Acquisition Interface
• User Interface
• Knowledge Base
• Inference Engine

This architecture differs considerably from that of traditional computer programs, and it gives expert systems several of their distinctive characteristics.
Expert System Components

Knowledge-Acquisition Interface

The knowledge-acquisition interface controls how the expert and knowledge engineer interact with the program to incorporate knowledge into the knowledge base. It includes features to assist experts in expressing their knowledge in a form suitable for reasoning by the computer.

This process of expressing knowledge in the knowledge base is called knowledge acquisition. Knowledge acquisition turns out to be quite difficult in many cases, so difficult that some authors refer to the knowledge-acquisition bottleneck to indicate that it is this aspect of expert system development which often requires the most time and effort.
Debugging faulty knowledge bases is facilitated by traces (lists of rules in the order they were fired), probes (commands to find and edit specific rules, facts, and so on), and bookkeeping functions and indexes (which keep track of various features of the knowledge base, such as variables and rules). Some rule-based expert system shells for personal computers monitor data entry, checking the syntactic validity of rules. Expert systems are typically validated by testing their predictions for several cases against those of human experts. Case facilities, which permit a file of such cases to be stored and automatically evaluated after the program is revised, can greatly speed the validation process. Many features that are useful for the user interface, such as on-screen help and explanations, also benefit the developer of expert systems and are therefore part of knowledge-acquisition interfaces as well.

Expert systems in the literature demonstrate a wide range of modes of knowledge acquisition (Buchanan, 1985). Expert system shells on microcomputers typically require the user either to enter rules explicitly or to enter several examples of cases with appropriate conclusions, from which the program will infer a rule.

User Interface

The user interface is the part of the program that interacts with the user.
It prompts the user for information required to solve a problem, displays conclusions, and explains its reasoning. Features of the user interface often include:

• Doesn't ask "dumb" questions
• Explains its reasoning on request
• Provides documentation and references
• Defines technical terms
• Permits sensitivity analyses, simulations, and what-if analyses
• Gives a detailed report of its recommendations
• Justifies its recommendations
• Online help
• Graphical displays of information
• Trace or step through the reasoning

The user interface can be judged by how well it reproduces the kind of interaction one might expect between a human expert and someone consulting that expert.

Knowledge Base

The knowledge base consists of specific knowledge about some substantive domain. A knowledge base differs from a database in that the knowledge base includes both explicit knowledge and implicit knowledge. Much of the knowledge in the knowledge base is not stated explicitly, but is inferred by the inference engine from explicit statements in the knowledge base. This makes knowledge bases more efficient at storing information than databases and gives them the power to represent all the knowledge implied by explicit statements of knowledge.
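The distinction between explicit and implicit knowledge can be made concrete with a tiny forward-chaining sketch: only two facts and two rules are stated explicitly, yet the engine derives a further fact that is merely implied by them. The facts and rules below are invented for illustration, not drawn from any real shell.

```python
# A minimal forward-chaining inference sketch: rules are
# (premises, conclusion) pairs, and the engine repeatedly fires any
# rule whose premises are all known until no new facts appear.

facts = {"has_feathers", "lays_eggs"}            # explicit knowledge
rules = [
    ({"has_feathers", "lays_eggs"}, "is_bird"),  # illustrative rules
    ({"is_bird"}, "has_beak"),
]

def forward_chain(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)   # implicit knowledge made explicit
                changed = True
    return derived

print(sorted(forward_chain(facts, rules)))
# "is_bird" and "has_beak" were never stated, only implied.
```

Nothing in the stored facts mentions birds or beaks; both conclusions are implicit knowledge recovered by the engine.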
There are several important ways in which knowledge is represented in a knowledge base. For more information, see knowledge representation strategies.

Knowledge bases can contain many different types of knowledge, and the process of acquiring knowledge for the knowledge base (often called knowledge acquisition) often needs to be quite different depending on the type of knowledge sought.

Types of Knowledge

There are many different kinds of knowledge considered in expert systems. Many of these form dimensions of contrasting knowledge:

• explicit knowledge
• implicit knowledge
• domain knowledge
• common sense or world knowledge
• heuristics
• algorithms
• procedural knowledge
• declarative or semantic knowledge
• public knowledge
• private knowledge
• shallow knowledge
• deep knowledge
• metaknowledge

Inference Engine

The inference engine uses general rules of inference to reason from the knowledge base and draw conclusions that are not explicitly stated but can be inferred from it. Inference engines are capable of symbolic reasoning, not just mathematical reasoning; hence, they expand the scope of fruitful applications of computer programs. The specific forms of inference permitted by different inference engines vary, depending on several factors, including the knowledge representation strategies employed by the expert system.

Expert System Development

Most expert systems are developed by a team of people, with the number of members varying with the complexity and scope of the project. A single individual can develop a very simple system, but usually at least two people are involved. There are two essential roles that must be filled by the development team: knowledge engineer and substantive expert.
• The Knowledge Engineer
• The Substantive Expert

The Knowledge Engineer

Criteria for selecting the knowledge engineer:
• Competent
• Organized
• Patient

Problems with the knowledge engineer:
• Technician with little social skill
• Sociable with low technical skill
• Disorganized
• Unwilling to challenge the expert to produce clarity
• Unable to listen carefully to the expert
• Undiplomatic when discussing flaws in the system or in the expert's knowledge
• Unable to quickly understand diverse substantive areas

The Substantive Expert

Criteria for selecting the expert:
• Competent
• Available
• Articulate
• Self-confident
• Open-minded

Varieties of experts:
• No expert
• Multiple experts
• Book knowledge only
• The knowledge engineer is also the expert

Problem experts:
• The unavailable expert
• The reluctant expert
• The cynical expert
• The arrogant expert
• The rambling expert
• The uncommunicative expert
• The too-cooperative expert
• The would-be-knowledge-engineer expert

Development Process

The systems development process often used for traditional software, such as management information systems, typically follows the "System Development Life Cycle" or "Waterfall" model. While this model identifies a number of important tasks in the development process, many developers of expert systems have found it inadequate for expert systems, for a number of important reasons. Instead, many expert systems are developed using a process called "Rapid Prototyping and Incremental Development."

System Development Life Cycle

Problem Analysis
Is the problem solvable? Is it feasible with this approach? Cost-benefit analysis.

Requirement Specification
What are the desired features and goals of the proposed system? Who are the users? What constraints must be considered? What development and delivery environments will be used?

Design
Preliminary design: overall structure, data flow diagram, perhaps the language.
Detailed design: details of each module.

Implementation
Writing and debugging code, integrating modules, creating interfaces.

Testing
Comparing the system to its specifications and assessing validity.

Maintenance
Corrections, modifications, enhancements.

Managing Uncertainty in Expert Systems

Sources of uncertainty in expert systems:
• Weak implication
• Imprecise language
• Unknown data
• Difficulty in combining the views of different experts

Uncertainty in AI:
• Information is partial
• Information is not fully reliable
• The representation language is inherently imprecise
• Information comes from multiple sources and is conflicting
• Information is approximate
• Non-absolute cause-effect relationships exist

Ways of representing uncertain information in expert systems:
• Probability
• Certainty factors
• Theory of evidence
• Fuzzy logic
• Neural networks
• Genetic algorithms
• Rough sets

Bayesian Probability Theory

Bayesian probability is one of the most popular interpretations of the concept of probability. The Bayesian interpretation of probability can be seen as an extension of logic that enables reasoning with uncertain statements. To evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated in the light of new relevant data. The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.

Bayesian probability interprets the concept of probability as "a measure of a state of knowledge", in contrast to interpreting it as a frequency or a physical property of a system. Its name is derived from the 18th-century statistician Thomas Bayes, who pioneered some of the concepts. Broadly speaking, there are two views on Bayesian probability that interpret the state-of-knowledge concept in different ways. According to the objectivist view, the rules of Bayesian statistics can be justified by requirements of rationality and consistency and interpreted as an extension of logic. According to the subjectivist view, the state of knowledge measures a "personal belief". Many modern machine learning methods are based on objectivist Bayesian principles.
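The prior-to-posterior updating described above can be illustrated with a small numeric sketch. The disease-and-test numbers below are invented for illustration (1% prevalence, 95% detection rate, 5% false-positive rate).

```python
# Bayesian updating: posterior = likelihood * prior / evidence.

def posterior(prior, likelihood, likelihood_given_not_h):
    # P(H|D) = P(D|H) P(H) / P(D), where the evidence is expanded as
    # P(D) = P(D|H) P(H) + P(D|not H) P(not H).
    evidence = likelihood * prior + likelihood_given_not_h * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: a rare condition and a fairly accurate test.
p = posterior(prior=0.01, likelihood=0.95, likelihood_given_not_h=0.05)
print(round(p, 3))  # 0.161: a positive test raises belief from 1% to about 16%
```

The point of the example is the size of the jump: even a strong likelihood cannot overwhelm a very small prior in a single update.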
One of the crucial features of the Bayesian view is that a probability is assigned to a hypothesis, whereas under the frequentist view a hypothesis is typically rejected or not rejected without directly assigning a probability to it.

The probability of a hypothesis given the data (the posterior) is proportional to the product of the likelihood and the prior probability (often just called the prior). The likelihood brings in the effect of the data, while the prior specifies the belief in the hypothesis before the data was observed.

More formally, Bayesian inference uses Bayes' formula for conditional probability:

P(H | D) = P(D | H) P(H) / P(D)

where
H is a hypothesis and D is the data.
P(H) is the prior probability of H: the probability that H is correct before the data D was seen.
P(D | H) is the conditional probability of seeing the data D given that the hypothesis H is true; it is called the likelihood.
P(D) is the marginal probability of D.
P(H | D) is the posterior probability: the probability that the hypothesis is true, given the data and the previous state of belief about the hypothesis.

Stanford Certainty Factor

Uncertainty is represented as a degree of belief in two steps:
• Express the degree of belief
• Manipulate the degrees of belief during the use of knowledge-based systems

It is also based on evidence (or the expert's assessment).

Form of certainty factors in expert systems:

IF <evidence> THEN <hypothesis> {cf}

cf represents belief in hypothesis H given that evidence E has occurred.

It is based on two functions:
i) the measure of belief, MB(H, E);
ii) the measure of disbelief, MD(H, E).

These indicate the degree to which belief (or disbelief) in hypothesis H is increased if evidence E is observed.
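As a sketch of how these factors behave in practice: in the MYCIN tradition the net certainty factor is the difference between belief and disbelief, and two independent pieces of positive evidence are combined so that certainty grows but never exceeds 1. The numeric values below are invented, and the formulas are the standard textbook conventions rather than a specification of any particular shell.

```python
# Certainty factors in the MYCIN style (standard textbook form).

def certainty_factor(mb, md):
    # Net belief: measure of belief minus measure of disbelief.
    return mb - md

def combine_positive(cf1, cf2):
    # Two independent pieces of positive evidence for the same
    # hypothesis reinforce each other without exceeding 1.
    return cf1 + cf2 * (1 - cf1)

cf_rule1 = certainty_factor(mb=0.7, md=0.1)   # net belief 0.6
cf_rule2 = certainty_factor(mb=0.5, md=0.0)   # net belief 0.5
print(combine_positive(cf_rule1, cf_rule2))   # approximately 0.8
```

Note the asymmetry with probabilities: 0.6 and 0.5 combine to 0.8, not to anything that must sum to 1 with a complementary hypothesis.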
Uncertain terms and their interpretation (see table).

The total strength of belief and disbelief in a hypothesis is the certainty factor:

CF(H, E) = MB(H, E) - MD(H, E)

Nonmonotonic Logic and Reasoning with Beliefs

A non-monotonic logic is a formal logic whose consequence relation is not monotonic. Most studied formal logics have a monotonic consequence relation, meaning that adding a formula to a theory never produces a reduction of its set of consequences. Intuitively, monotonicity indicates that learning a new piece of knowledge cannot reduce the set of what is known. A monotonic logic cannot handle various reasoning tasks, such as reasoning by default (consequences may be derived only because of lack of evidence to the contrary), abductive reasoning (consequences are only deduced as most likely explanations), some important approaches to reasoning about knowledge (the ignorance of a consequence must be retracted when the consequence becomes known), and, similarly, belief revision (new knowledge may contradict old beliefs).

Default Reasoning

An example of a default assumption is that the typical bird flies. As a result, if a given animal is known to be a bird, and nothing else is known, it can be assumed to be able to fly. The default assumption must, however, be retracted if it is later learned that the considered animal is a penguin. This example shows that a logic that models default reasoning should not be monotonic. Logics formalizing default reasoning can be roughly divided into two categories: logics able to deal with arbitrary default assumptions (default logic, defeasible logic/defeasible reasoning/argument (logic), and answer set programming) and logics that formalize the specific
default assumption that facts not known to be true can be assumed false by default (the closed-world assumption and circumscription).

Abductive Reasoning

Abductive reasoning is the process of deriving the most likely explanations of the known facts. An abductive logic should not be monotonic because the most likely explanations are not necessarily correct. For example, the most likely explanation for seeing wet grass is that it rained; however, this explanation has to be retracted when we learn that the real cause of the grass being wet was a sprinkler. Since the old explanation (it rained) is retracted because of the addition of a piece of knowledge (a sprinkler was active), any logic that models explanations is non-monotonic.

Reasoning About Knowledge

If a logic includes formulae that mean that something is not known, this logic should not be monotonic. Indeed, learning something that was previously not known leads to the removal of the formula specifying that this piece of knowledge is not known. This second change (a removal caused by an addition) violates the condition of monotonicity. A logic for reasoning about knowledge is autoepistemic logic.

Belief Revision

Belief revision is the process of changing beliefs to accommodate a new belief that might be inconsistent with the old ones. Assuming the new belief is correct, some of the old ones have to be retracted in order to maintain consistency. This retraction in response to the addition of a new belief makes any logic for belief revision non-monotonic. The belief revision approach is an alternative to paraconsistent logics, which tolerate inconsistency rather than attempting to remove it.

What makes belief revision non-trivial is that several different ways of performing this operation may be possible.
For example, if the current knowledge includes the three facts "A is true", "B is true" and "if A and B are true then C is true", the introduction of the new information "C is false" can be done while preserving consistency only by removing at least one of the three facts. In this case, there are at least three different ways of performing revision. In general, there may be several different ways of changing knowledge.

Fuzzy Logic

The concept of Fuzzy Logic (FL) was conceived by Lotfi Zadeh, a professor at the University of California at Berkeley, and presented not as a control methodology, but as a way of processing data by allowing partial set membership rather than crisp set membership or non-membership. This approach to set theory was not applied to control systems until the 1970s, due to insufficient small-computer capability prior to that time. Professor Zadeh reasoned that people do not require precise, numerical information input, and yet they are capable of highly adaptive control. If feedback controllers could be programmed to accept noisy, imprecise input, they would be much
more effective and perhaps easier to implement. Unfortunately, U.S. manufacturers have not been quick to embrace this technology, while the Europeans and Japanese have been aggressively building real products around it.

WHAT IS FUZZY LOGIC?

In this context, FL is a problem-solving control-system methodology that lends itself to implementation in systems ranging from simple, small, embedded microcontrollers to large, networked, multi-channel PC- or workstation-based data acquisition and control systems. It can be implemented in hardware, software, or a combination of both. FL provides a simple way to arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy, or missing input information. FL's approach to control problems mimics how a person would make decisions, only much faster.

HOW IS FL DIFFERENT FROM CONVENTIONAL CONTROL METHODS?

FL incorporates a simple, rule-based "IF X AND Y THEN Z" approach to solving a control problem, rather than attempting to model the system mathematically. The FL model is empirically based, relying on an operator's experience rather than their technical understanding of the system. For example, rather than dealing with temperature control in terms such as "SP = 500F", "T < 1000F", or "210C < TEMP < 220C", terms like "IF (process is too cool) AND (process is getting colder) THEN (add heat to the process)" or "IF (process is too hot) AND (process is heating rapidly) THEN (cool the process quickly)" are used. These terms are imprecise and yet very descriptive of what must actually happen. Consider what you do in the shower if the temperature is too cold: you will make the water comfortable very quickly, with little trouble.
FL is capable of mimicking this type of behavior, but at a very high rate.

HOW DOES FL WORK?

FL requires some numerical parameters in order to operate, such as what is considered a significant error and a significant rate of change of error, but exact values of these numbers are usually not critical unless very responsive performance is required, in which case empirical tuning would determine them. For example, a simple temperature control system could use a single temperature feedback sensor whose data is subtracted from the command signal to compute "error", and then time-differentiated to yield the error slope or rate of change of error, hereafter called "error-dot". Error might have units of degrees F, with a small error considered to be 2F and a large error 5F. The "error-dot" might then have units of degrees/min, with a small error-dot being 5F/min and a large one being 15F/min. These values don't have to be symmetrical and can be "tweaked" once the system is operating in order to optimize performance. Generally, FL is so forgiving that the system will probably work the first time without any tweaking.

Dempster-Shafer Theory

The Dempster-Shafer theory, also known as the theory of belief functions, is a generalization of the Bayesian theory of subjective probability. Whereas the Bayesian theory requires probabilities for each question of interest, belief functions allow us to base degrees of belief for one question
on probabilities for a related question. These degrees of belief may or may not have the mathematical properties of probabilities; how much they differ from probabilities will depend on how closely the two questions are related.

The Dempster-Shafer theory owes its name to work by A. P. Dempster (1968) and Glenn Shafer (1976), but the kind of reasoning the theory uses can be found as far back as the seventeenth century. The theory came to the attention of AI researchers in the early 1980s, when they were trying to adapt probability theory to expert systems. Dempster-Shafer degrees of belief resemble the certainty factors in MYCIN, and this resemblance suggested that they might combine the rigor of probability theory with the flexibility of rule-based systems. Subsequent work has made clear that the management of uncertainty inherently requires more structure than is available in simple rule-based systems, but the Dempster-Shafer theory remains attractive because of its relative flexibility.

The Dempster-Shafer theory is based on two ideas: the idea of obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster's rule for combining such degrees of belief when they are based on independent items of evidence.

To illustrate the idea of obtaining degrees of belief for one question from subjective probabilities for another, suppose I have subjective probabilities for the reliability of my friend Jon. My probability that he is reliable is 0.9, and my probability that he is unreliable is 0.1. Suppose he tells me a limb fell on my car. This statement, which must be true if he is reliable, is not necessarily false if he is unreliable. So his testimony alone justifies a 0.9 degree of belief that a limb fell on my car, but only a zero degree of belief (not a 0.1 degree of belief) that no limb fell on my car.
This zero does not mean that I am sure that no limb fell on my car, as a zero probability would; it merely means that Jon's testimony gives me no reason to believe that no limb fell on my car. The 0.9 and the zero together constitute a belief function.

Knowledge Acquisition

Knowledge acquisition is concerned with the development of knowledge bases based on the expertise of a human expert. This requires expressing knowledge in a formalism suitable for automatic interpretation. Within this field, research at UNSW focuses on incremental knowledge acquisition techniques, which allow a human expert to provide explanations of their decisions that are automatically integrated into sophisticated knowledge bases.

Types of Learning

Learning is acquiring new knowledge, behaviors, skills, values, preferences or understanding, and may involve synthesizing different types of information. The ability to learn is possessed by humans, animals and some machines. Progress over time tends to follow learning curves. Human learning may occur as part of education or personal development. It may be goal-oriented and may be aided by motivation. The study of how learning occurs is part of neuropsychology, educational psychology, learning theory, and pedagogy.
Learning may occur as a result of habituation or classical conditioning, seen in many animal species, or as a result of more complex activities such as play, seen only in relatively intelligent animals and humans. Learning may occur consciously or without conscious awareness. There is evidence for human behavioral learning prenatally, in which habituation has been observed as early as 32 weeks into gestation, indicating that the central nervous system is sufficiently developed and primed for learning and memory to occur very early in development.

Play has been approached by several theorists as the first form of learning. Children play, experiment with the world, learn the rules, and learn to interact. Vygotsky agrees that play is pivotal for children's development, since they make meaning of their environment through play.

Habituation

In psychology, habituation is an example of non-associative learning in which there is a progressive diminution of the probability of a behavioral response with repetition of a stimulus. It is another form of integration. An animal first responds to a stimulus, but if the stimulus is neither rewarding nor harmful the animal reduces subsequent responses. One example of this can be seen in small songbirds: if a stuffed owl (or similar predator) is put into the cage, the birds initially react to it as though it were a real predator. Soon the birds react less, showing habituation. If another stuffed owl is introduced (or the same one is removed and re-introduced), the birds react to it again as though it were a predator, demonstrating that only a very specific stimulus is habituated to (namely, one particular unmoving owl in one place). Habituation has been shown in essentially every species of animal, including the large protozoan Stentor coeruleus.

Sensitization

Sensitization is an example of non-associative learning in which the progressive amplification of a response follows repeated administrations of a stimulus (Bell et al., 1995).
An everyday example of this mechanism is the repeated tonic stimulation of peripheral nerves that occurs if a person rubs his arm continuously. After a while, this stimulation creates a warm sensation that eventually turns painful. The pain is the result of the progressively amplified synaptic response of the peripheral nerves, warning the person that the stimulation is harmful. Sensitization is thought to underlie both adaptive and maladaptive learning processes in the organism.

Associative Learning

Associative learning is the process by which an element is learned through association with a separate, pre-occurring element.

Operant Conditioning

Operant conditioning is the use of consequences to modify the occurrence and form of behavior. Operant conditioning is distinguished from Pavlovian conditioning in that operant conditioning
deals with the modification of voluntary behavior. Discrimination learning is a major form of operant conditioning. One form of it is called errorless learning.

Classical Conditioning

The typical paradigm for classical conditioning involves repeatedly pairing an unconditioned stimulus (which unfailingly evokes a particular response) with another, previously neutral stimulus (which does not normally evoke the response). Following conditioning, the response occurs both to the unconditioned stimulus and to the other, unrelated stimulus (now referred to as the "conditioned stimulus"). The response to the conditioned stimulus is termed a conditioned response.

Imprinting

Imprinting is the term used in psychology and ethology to describe any kind of phase-sensitive learning (learning occurring at a particular age or life stage) that is rapid and apparently independent of the consequences of behavior. It was first used to describe situations in which an animal or person learns the characteristics of some stimulus, which is therefore said to be "imprinted" onto the subject.

Observational Learning

The learning process most characteristic of humans is imitation: one's personal repetition of an observed behaviour, such as a dance. Humans can copy three types of information simultaneously: the demonstrator's goals, actions, and environmental outcomes (results; see Emulation (observational learning)). Through copying these types of information, (most) infants will tune into their surrounding culture.

Multimedia Learning

Multimedia learning is learning in which the learner uses multimedia learning environments. This type of learning relies on dual-coding theory.

e-Learning and Augmented Learning

Electronic learning or e-learning is a general term used to refer to Internet-based, networked, computer-enhanced learning.
A specific and increasingly widespread form of e-learning is mobile learning (m-learning), which uses mobile telecommunication equipment such as cellular phones. When a learner interacts with the e-learning environment, it is called augmented learning. By adapting to the needs of individuals, context-driven instruction can be dynamically tailored to the learner's natural environment. Augmented digital content may include text, images, video, and audio (music and voice). By personalizing instruction, augmented learning has been shown to improve learning performance for a lifetime.

Rote learning
Rote learning is a technique that avoids understanding the inner complexities and inferences of the subject being learned and instead focuses on memorizing the material so that it can be recalled by the learner exactly the way it was read or heard. The major practice involved in rote learning is learning by repetition, based on the idea that one will be able to recall the material more quickly the more it is repeated. Rote learning is used in diverse areas, from mathematics to music to religion. Although it has been criticized by some schools of thought, rote learning is a necessity in many situations.

Informal learning
Informal learning occurs through the experience of day-to-day situations (for example, one learns to look ahead while walking because of the danger inherent in not paying attention to where one is going). It is learning from life: during a meal at the table with parents, through play, through exploring.

Formal learning
Formal learning is learning that takes place within a teacher-student relationship, such as in a school system.

Learning Automata
An automaton is a machine or control mechanism designed to automatically follow a predetermined sequence of operations or respond to encoded instructions. The term stochastic emphasizes the adaptive nature of the automaton we describe here. The automaton described here does not follow predetermined rules, but adapts to changes in its environment.
This adaptation is the result of the learning process described in this chapter.

"The concept of the learning automaton grew out of a fusion of the work of psychologists in modeling observed behavior, the efforts of statisticians to model the choice of experiments based on past observations, the attempts of operations researchers to implement optimal strategies in the context of the two-armed bandit problem, and the endeavors of system theorists to make rational decisions in random environments."

In classical control theory, the control of a process is based on complete knowledge of the process/system. The mathematical model is assumed to be known, and the inputs to the process are deterministic functions of time. Later developments in control theory considered the uncertainties present in the system. Stochastic control theory assumes that some of the characteristics of the uncertainties are known. However, all these assumptions about uncertainties and/or input functions may be insufficient to successfully control the system if it changes. It is then necessary to observe the process in operation and obtain further knowledge of the system; that is, additional information must be acquired on-line, since a priori assumptions are not sufficient. One approach is to view these as problems in learning.

Rule-based systems, although performing well on many control problems, have the disadvantage of requiring modifications even for a minor change in the problem space. Furthermore, the rule-based approach, especially expert systems, cannot handle unanticipated situations. The idea behind designing a learning system is to guarantee robust behavior without complete knowledge, if any, of the system/environment to be controlled. A crucial advantage of reinforcement learning compared to other learning approaches is that it requires no information about the environment except for the reinforcement signal.

A reinforcement learning system is slower than other approaches for most applications, since every action needs to be tested a number of times for satisfactory performance. Either the learning process must be much faster than the environment changes, or reinforcement learning must be combined with an adaptive forward model that anticipates the changes in the environment.

Learning is defined as any permanent change in behavior as a result of past experience, and a learning system should therefore have the ability to improve its behavior with time, toward a final goal. In a purely mathematical context, the goal of a learning system is the optimization of a functional that is not known explicitly.

In the 1960s, Y. Z. Tsypkin [Tsypkin71] introduced a method to reduce the problem to the determination of an optimal set of parameters, to which stochastic hill-climbing techniques can then be applied. M. L. Tsetlin and colleagues [Tsetlin73] started the work on learning automata during the same period. An alternative to applying stochastic hill-climbing techniques, introduced by Narendra and Viswanathan, is to regard the problem as one of finding an optimal action out of a set of allowable actions and to achieve this using stochastic automata. The difference between the two approaches is that the former updates the parameter space at each iteration, while the latter updates the probability space.

The stochastic automaton attempts a solution of the problem without any information on the optimal action (initially, equal probabilities are attached to all the actions).
One action is selected at random, the response from the environment is observed, the action probabilities are updated based on that response, and the procedure is repeated. A stochastic automaton acting as described to improve its performance is called a learning automaton.

Genetic algorithms
Genetic algorithms are one of the best ways to solve a problem about which little is known. They are a very general algorithm and so will work in any search space. All you need to know is what you need the solution to be able to do, and a genetic algorithm will be able to create a high-quality solution. Genetic algorithms use the principles of selection and evolution to produce several solutions to a given problem.

Genetic algorithms tend to thrive in environments in which there is a very large set of candidate solutions and in which the search space is uneven, with many hills and valleys. True, genetic algorithms will do well in any environment, but they will be greatly outclassed by more situation-specific algorithms in simpler search spaces. Therefore you must keep in mind that genetic algorithms are not always the best choice. Sometimes they can take quite a while to run and are therefore not always feasible for real-time use. They are, however, one of the most
powerful methods with which to (relatively) quickly create high-quality solutions to a problem. Now, before we start, I'm going to provide you with some key terms so that this article makes sense.

 • Individual - Any possible solution
 • Population - Group of all individuals
 • Search Space - All possible solutions to the problem
 • Chromosome - Blueprint for an individual
 • Trait - Possible aspect of an individual
 • Allele - Possible settings for a trait
 • Locus - The position of a gene on the chromosome
 • Genome - Collection of all chromosomes for an individual

Basics of Genetic Algorithms
The most common type of genetic algorithm works like this: a population is created from a group of randomly generated individuals. The individuals in the population are then evaluated. The evaluation function is provided by the programmer and gives each individual a score based on how well it performs at the given task. Two individuals are then selected based on their fitness; the higher the fitness, the higher the chance of being selected. These individuals then "reproduce" to create one or more offspring, after which the offspring are mutated randomly. This continues until a suitable solution has been found or a certain number of generations have passed, depending on the needs of the programmer.

Selection
While there are many different types of selection, I will cover the most common type: roulette wheel selection. In roulette wheel selection, individuals are given a probability of being selected that is directly proportional to their fitness. Two individuals are then chosen randomly based on these probabilities and produce offspring.
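As a concrete illustration, the selection scheme just described might be implemented as follows. This is a sketch in Python; the function name, the example population, and the fitness values are my own illustrative choices, not part of the original text.

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    # Spin the wheel: a uniformly random point on the interval [0, total)
    spin = random.random() * total
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if spin < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off

# Example: individuals with fitness 1, 3, and 6 are picked roughly
# 10%, 30%, and 60% of the time, respectively.
population = ["a", "b", "c"]
fitnesses = [1.0, 3.0, 6.0]
parents = [roulette_wheel_select(population, fitnesses) for _ in range(2)]
```

Note that instead of precomputing a table of cumulative probabilities, this version accumulates fitness on the fly; the two approaches are equivalent.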
Pseudo-code for a roulette wheel selection algorithm is shown below.

for all members of population
    sum += fitness of this individual
end for
for all members of population
    probability = sum of probabilities + (fitness / sum)
    sum of probabilities = probability
end for
loop until new population is full
    do this twice
        number = Random between 0 and 1
        for all members of population
            if number > probability but less than next probability
                then you have been selected
        end for
    end
    create offspring
end loop

Crossover
So now you have selected your individuals, and you know that you are supposed to somehow produce offspring with them, but how should you go about doing it? The most common solution is something called crossover, and while there are many different kinds of crossover, the most common type is single-point crossover. In single-point crossover, you choose a locus at which you swap the remaining alleles from one parent to the other. This is complex and is best understood visually.

As you can see, the children take one section of the chromosome from each parent. The point at which the chromosome is broken depends on the randomly selected crossover point. This particular method is called single-point crossover because only one crossover point exists. Sometimes only child 1 or child 2 is created, but often both offspring are created and put into the new population. Crossover does not always occur, however. Sometimes, based on a set probability, no crossover occurs and the parents are copied directly into the new population. The probability of crossover occurring is usually 60% to 70%.

Mutation
After selection and crossover, you now have a new population full of individuals. Some are directly copied, and others are produced by crossover. In order to ensure that the individuals are not all exactly the same, you allow for a small chance of mutation. You loop through all the alleles of all the individuals, and if an allele is selected for mutation, you can either change it by a small amount or replace it with a new value. The probability of mutation is usually between one and two tenths of a percent. A visual for mutation is shown below.

As you can see, mutation is fairly simple. You just change the selected alleles based on what you feel is necessary and move on. Mutation is, however, vital to ensuring genetic diversity within the population.

Applications
Genetic algorithms are a very effective way of quickly finding a reasonable solution to a complex problem. Granted, they aren't instantaneous, or even close, but they do an excellent job of searching through a large and complex search space. Genetic algorithms are most effective in a search space about which little is known. You may know exactly what you want a solution to do but have no idea how you want it to go about doing it. This is where genetic algorithms thrive. They produce solutions that solve the problem in ways you may never have even considered. Then again, they can also produce solutions that only work within the test environment and flounder once you try to use them in the real world. Put simply: use genetic algorithms for everything you cannot easily do with another algorithm.
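Tying the pieces together, a minimal genetic algorithm along the lines described above might look like this. This is a hedged Python sketch: the bit-string encoding, the toy "count the 1s" fitness function, and the parameter values are illustrative assumptions of mine, though the crossover and mutation rates follow the ranges given in the text.

```python
import random

CHROMOSOME_LEN = 16      # illustrative chromosome length
POP_SIZE = 30
CROSSOVER_RATE = 0.7     # "usually 60% to 70%"
MUTATION_RATE = 0.002    # "between one and two tenths of a percent"

def fitness(chromosome):
    # Toy objective: the number of 1-bits (the "OneMax" problem).
    return sum(chromosome)

def select(population):
    # Roulette wheel: probability proportional to fitness.
    total = sum(fitness(c) for c in population) or 1
    spin = random.random() * total
    cumulative = 0
    for c in population:
        cumulative += fitness(c)
        if spin < cumulative:
            return c
    return population[-1]

def crossover(p1, p2):
    # Single-point crossover: swap the alleles after a random locus.
    if random.random() < CROSSOVER_RATE:
        point = random.randrange(1, len(p1))
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1[:], p2[:]  # no crossover: copy the parents directly

def mutate(chromosome):
    # Flip each allele with a small, independent probability.
    return [1 - g if random.random() < MUTATION_RATE else g
            for g in chromosome]

def evolve(generations=60):
    population = [[random.randint(0, 1) for _ in range(CHROMOSOME_LEN)]
                  for _ in range(POP_SIZE)]
    for _ in range(generations):
        next_gen = []
        while len(next_gen) < POP_SIZE:
            child1, child2 = crossover(select(population), select(population))
            next_gen += [mutate(child1), mutate(child2)]
        population = next_gen[:POP_SIZE]
    return max(population, key=fitness)

best = evolve()
```

On this toy problem the population converges toward the all-ones chromosome; swapping in a different `fitness` function is all that is needed to point the same loop at another problem.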