Ch 6 final



Unit 6: Structured Knowledge Representation

Learning Objectives

After reading this unit you should appreciate the following:
• Semantic Nets
• Frames
• Slots Exceptions
• Handling Uncertainties
• Probabilistic Reasoning
• Use of Certainty Factors
• Fuzzy Logic

Semantic Nets

In a semantic net, information is represented as a set of nodes connected to each other by a set of labelled arcs, which represent relationships among the nodes. A fragment of a typical semantic net is shown in Figure 6.1.
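The node-and-labelled-arc structure just described can be sketched in code. In the sketch below (our own illustrative encoding, not part of the original text), each arc is stored as a (source, relation, target) triple; the node names follow Figure 6.1, and the helper names `arcs_from` and `inherited_value` are our own choices.

```python
# A tiny semantic net: each labelled arc is a (source, relation, target)
# triple. Node names follow Figure 6.1.
facts = [
    ("Person", "isa", "Mammal"),
    ("Pee-Wee-Reese", "instance", "Person"),
    ("Pee-Wee-Reese", "team", "Brooklyn-Dodgers"),
    ("Pee-Wee-Reese", "uniform-color", "Blue"),
    ("Person", "has-part", "Nose"),
]

def arcs_from(node):
    """All labelled arcs leaving a node."""
    return [(rel, dst) for src, rel, dst in facts if src == node]

def inherited_value(node, rel):
    """Look for rel on the node itself, then up instance/isa links."""
    for r, dst in arcs_from(node):
        if r == rel:
            return dst
    for r, parent in arcs_from(node):
        if r in ("instance", "isa"):
            found = inherited_value(parent, rel)
            if found is not None:
                return found
    return None

# Pee-Wee-Reese has no has-part arc of his own, but inherits one
# from Person through the instance link:
print(inherited_value("Pee-Wee-Reese", "has-part"))  # Nose
```

Looking up an attribute directly on the node first, then climbing instance and isa links, is exactly the inheritance step used in the text to derive has-part (Pee-Wee-Reese, Nose).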
ARTIFICIAL INTELLIGENCE

Figure 6.1: A Semantic Network

This network contains examples of both the isa and instance relations, as well as some other, more domain-specific relations like team and uniform-color. In this network we would use inheritance to derive the additional relation

has-part (Pee-Wee-Reese, Nose)

Intersection Search

Intersection search is one of the early ways that semantic nets were used to find relationships among objects: spread activation out from each of two nodes and see where the activation meets. Using this process, it is possible to use the network of Figure 6.1 to answer questions such as "What is the connection between the Brooklyn Dodgers and blue?" This kind of reasoning exploits one of the important advantages that slot-and-filler structures have over purely logical representations: it takes advantage of the entity-based organization of knowledge that slot-and-filler representations provide.

To answer more structured questions, however, requires networks that are themselves more highly structured. In the next few sections we expand and refine our notion of a network in order to support more sophisticated reasoning.

Representing Nonbinary Predicates

Semantic nets are a natural way to represent relationships that would appear as ground instances of binary predicates in predicate logic. For example, some of the arcs from Figure 6.1 could be represented in logic as
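Intersection search can be sketched as a pair of breadth-first searches, one from each node, that spread activation along the arcs in either direction; any node reached from both sides is a meeting point. The arc set and function names below are our own illustrative choices, not from the text.

```python
from collections import deque

# Labelled arcs of a small fragment in the spirit of Figure 6.1.
arcs = [
    ("Pee-Wee-Reese", "team", "Brooklyn-Dodgers"),
    ("Pee-Wee-Reese", "instance", "Person"),
    ("Pee-Wee-Reese", "uniform-color", "Blue"),
    ("Brooklyn-Dodgers", "instance", "ML-Baseball-Team"),
]

def neighbours(node):
    # Activation spreads along arcs in either direction.
    out = [(rel, dst) for src, rel, dst in arcs if src == node]
    out += [(rel, src) for src, rel, dst in arcs if dst == node]
    return out

def intersection_search(a, b):
    """Spread activation out from both nodes; report where it meets."""
    def reachable(start):
        paths = {start: [start]}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for rel, nxt in neighbours(node):
                if nxt not in paths:
                    paths[nxt] = paths[node] + [rel, nxt]
                    queue.append(nxt)
        return paths
    from_a, from_b = reachable(a), reachable(b)
    return {m: (from_a[m], from_b[m]) for m in set(from_a) & set(from_b)}

# "What is the connection between the Brooklyn Dodgers and blue?"
meeting = intersection_search("Brooklyn-Dodgers", "Blue")
print(meeting["Pee-Wee-Reese"])
# (['Brooklyn-Dodgers', 'team', 'Pee-Wee-Reese'],
#  ['Blue', 'uniform-color', 'Pee-Wee-Reese'])
```

The two returned paths, read back to back, answer the question: the Dodgers connect to blue through Pee-Wee-Reese's team and uniform-color arcs.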
isa (Person, Mammal)
instance (Pee-Wee-Reese, Person)
team (Pee-Wee-Reese, Brooklyn-Dodgers)
uniform-color (Pee-Wee-Reese, Blue)

But the knowledge expressed by predicates of other arities can also be expressed in semantic nets. We have already seen that many unary predicates in logic can be thought of as binary predicates using some very general-purpose predicates, such as isa and instance. So, for example,

man(Marcus)

could be rewritten as

instance (Marcus, Man)

thereby making it easy to represent in a semantic net.

Three or more place predicates can also be converted to a binary form by creating one new object representing the entire predicate statement and then introducing binary predicates to describe the relationship to this new object of each of the original arguments. For example, suppose we know that

score (Cubs, Dodgers, 5-3)

This can be represented in a semantic net by creating a node to represent the specific game and then relating each of the three pieces of information to it. Doing this produces the network shown in Figure 6.2.

Figure 6.2: A Semantic Net for an n-Place Predicate

This technique is particularly useful for representing the contents of a typical declarative sentence that describes several aspects of a particular event. Such a sentence could be represented by the network shown in Figure 6.3.
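This reification step can be sketched as follows. The game node's name G23 and the role-arc names visiting-team and home-team are illustrative assumptions in the spirit of Figure 6.2, not taken from the text.

```python
# score(Cubs, Dodgers, 5-3) re-expressed with binary relations only:
# create a node G23 for the game itself (the name is arbitrary) and
# attach each original argument with its own labelled arc.
game_net = [
    ("G23", "instance", "Game"),
    ("G23", "visiting-team", "Cubs"),
    ("G23", "home-team", "Dodgers"),
    ("G23", "score", "5-3"),
]

def roles_of(event):
    """Recover the original n-ary fact from the reified event node."""
    return {rel: val for src, rel, val in game_net if src == event}

print(roles_of("G23"))
# {'instance': 'Game', 'visiting-team': 'Cubs',
#  'home-team': 'Dodgers', 'score': '5-3'}
```

Nothing about the net itself changes: the three-place predicate has simply become one new node plus several ordinary binary arcs.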
John gave the book to Mary

Figure 6.3: A Semantic Net Representing a Sentence

Making Some Important Distinctions

In the networks we have described so far, we have glossed over some distinctions that are important in reasoning. For example, there should be a difference between a link that defines a new entity and one that relates two existing entities. Consider the net

John ──height──► 72

Both nodes represent objects that exist independently of their relationship to each other. But now suppose we want to represent the fact that John is taller than Bill, using the net

John ──height──► H1 ──greater-than──► H2 ◄──height── Bill

The nodes H1 and H2 are new concepts representing John's height and Bill's height, respectively. They are defined by their relationships to the nodes John and Bill. Using these defined concepts, it is possible to represent such facts as that John's height increased, which we could not do before. (The number 72 cannot increase.)

Sometimes it is useful to introduce the arc value to make this distinction clear. Thus we might use the following net to represent the fact that John is 6 feet tall and that he is taller than Bill:
The procedures that operate on nets such as this can exploit the fact that some arcs, such as height, define new entities, while others, such as greater-than and value, merely describe relationships among existing entities.

Another distinction we have glossed over is the difference between the properties of a node itself and the properties that a node simply holds and passes on to its instances. For example, it is a property of the node Person that it is a subclass of the node Mammal. But the node Person does not have as one of its parts a nose. Instances of the node Person do, and we want them to inherit it.

It is easier said than done to capture these distinctions without assigning more structure to our notions of node, link, and value. But first, we discuss a network-oriented solution to a simpler problem; this solution illustrates what can be done in the network model, but at what price in complexity.

Partitioned Semantic Nets

Suppose we want to represent simple quantified expressions in semantic nets. One way to do this is to partition the semantic net into a hierarchical set of spaces, each of which corresponds to the scope of one or more variables. To see how this works, consider first the simple net shown in Figure 6.4(a). This net corresponds to the statement

The dog bit the mail carrier.

The nodes Dogs, Bite, and Mail-Carrier represent the classes of dogs, bitings, and mail carriers, respectively, while the nodes d, b, and m represent a particular dog, a particular biting, and a particular mail carrier. A single net with no partitioning can easily represent this fact.

But now suppose that we want to represent the fact

Every dog has bitten a mail carrier.
Figure 6.4: Using Partitioned Semantic Nets

∀x : Dog(x) → ∃y : Mail-Carrier(y) ∧ Bite(x, y)

It is necessary to encode the scope of the universally quantified variable x in order to represent this fact. This can be done using partitioning, as shown in Figure 6.4(b). The node g stands for the assertion given above. Node g is an instance of the special class GS of general statements about the world (i.e., those with universal quantifiers). Every element of GS has at least two attributes: a form, which states the relation that is being asserted, and one or more ∀ connections, one for each of the universally quantified variables. In this example, there is only one such variable, d, which can stand for any element of the class Dogs. The other two variables in the form, b and m, are understood to be existentially quantified. In other words, for every dog d, there exists a biting event b and a mail carrier m, such that d is the assailant of b and m is the victim.

To see how partitioning makes variable quantification explicit, consider next the similar sentence:

Every dog in town has bitten the constable.

The representation of this sentence is shown in Figure 6.4(c). In this net, the node representing the victim lies outside the form of the general statement. Thus it is not viewed as an existentially quantified variable whose value may depend on the value of d. Instead it is interpreted as standing for a specific entity (in this case, a particular constable), just as do other nodes in a standard, nonpartitioned net. Figure 6.4(d) shows how yet another similar sentence,

Every dog has bitten every mail carrier,
should be represented. In this case, g has two ∀ links, one pointing to d, which represents any dog, and one pointing to m, representing any mail carrier.

The spaces of a partitioned semantic net are related to each other by an inclusion hierarchy. For example, in Figure 6.4(d), space S1 is included in space SA. Whenever a search process operates in a partitioned semantic net, it can explore nodes and arcs in the space from which it starts and in other spaces that contain the starting point, but it does not go downward, except in special circumstances, such as when a form arc is being traversed. So, returning to Figure 6.4(d), from node d it can be determined that d must be a dog. But if we were to start at the node Dogs and search for all known instances of dogs by traversing isa links, we would not find d, since it and the link to it are in space S1, which is at a lower level than space SA, which contains Dogs. This is consistent, since d does not stand for a particular dog; it is merely a variable that can be instantiated with a value that represents a dog.

Student Activity 6.1

Before reading the next section, answer the following questions.
1. How are semantic nets useful in knowledge representation?
2. Draw semantic nets for the following facts:
   a. Every player hits the ball.
   b. Every dog eats a cat.
If your answers are correct, then proceed to the next section.

The Evolution into Frames

The idea of a semantic net started out simply as a way to represent labelled connections among entities. But, as we have just seen, as we expand the range of problem-solving tasks that the representation must support, the representation itself necessarily begins to become more complex. In particular, it becomes useful to assign more structure to nodes as well as to links. Although there is no clear distinction between a semantic net and a frame system, the more structure the system has, the more likely it is to be termed a frame system. In the next section we continue our discussion of structured slot-and-filler representations by describing some of the most important capabilities that frame systems offer.
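The upward-only visibility rule for partitioned nets can be sketched as below. The space layout mirrors Figure 6.4(b), but the dictionary-based encoding and helper names are our own illustrative assumptions.

```python
# Partitioned-net visibility: space S1 (the form of the general
# statement g) sits inside the top-level space SA.
space_parent = {"S1": "SA", "SA": None}
node_space = {"Dogs": "SA", "Mail-Carrier": "SA", "Bite": "SA",
              "g": "SA", "d": "S1", "b": "S1", "m": "S1"}

def visible_spaces(start_space):
    """A search starting in a space may look upward, never downward."""
    seen = []
    while start_space is not None:
        seen.append(start_space)
        start_space = space_parent[start_space]
    return seen

def visible_nodes(from_node):
    spaces = visible_spaces(node_space[from_node])
    return {n for n, s in node_space.items() if s in spaces}

# From d (inside S1) the search can climb out to reach Dogs in SA...
print("Dogs" in visible_nodes("d"))    # True
# ...but from Dogs (in SA) it cannot descend into S1 to find d:
print("d" in visible_nodes("Dogs"))    # False
```

This captures the asymmetry discussed above: d can see that it must be a dog, while a search over all known dogs starting at Dogs never encounters the variable d.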
Frames

A frame is a collection of attributes called slots and associated values (and possibly constraints on values) that describe some entity in the world. Sometimes a frame describes an entity in some absolute sense; sometimes it represents the entity from a particular point of view. A single frame taken alone is rarely useful. Instead, we build frame systems out of collections of frames that are connected to each other by virtue of the fact that the value of an attribute of one frame may be another frame. In the rest of this section, we expand on this simple definition and explore ways that frame systems can be used to encode knowledge and support reasoning.

Default Frames

Set theory provides a good basis for understanding frame systems. Although not all frame systems are defined this way, we do so here. In this view, each frame represents either a class (a set) or an instance (an element of a class). To see how this works, consider the frame system shown in Figure 6.5. In this example, the frames Person, Adult-Male, ML-Baseball-Player (corresponding to major league baseball players), Pitcher, and ML-Baseball-Team (for major league baseball teams) are all classes. The frames Pee-Wee-Reese and Brooklyn-Dodgers are instances.

The isa relation that we have been using without a precise definition is in fact the subset relation. The set of adult males is a subset of the set of people. The set of major league baseball players is a subset of the set of adult males, and so forth. Our instance relation corresponds to the relation element-of: Pee Wee Reese is an element of the set of fielders. Thus he is also an element of all of the supersets of fielders, including major league baseball players and people. The transitivity of isa that we have taken for granted in our description of property inheritance follows directly from the transitivity of the subset relation.

Both the isa and instance relations have inverse attributes, which we call subclasses and all-instances. We do not bother to write them explicitly in our examples unless we need to refer to them. We assume that the frame system maintains them automatically, either explicitly or by computing them if necessary.

Since a class represents a set, there are two kinds of attributes that can be associated with it. There are attributes about the set itself, and there are attributes that are to be inherited by each element of the set. We indicate the difference between these two by prefixing the latter with an asterisk (*). For example, consider the class ML-Baseball-Player. We have shown only two properties of it as a set: It is a subset of the set of adult males. And it has cardinality 624 (i.e.,
there are 624 major league baseball players). We have listed five properties that all major league baseball players have (height, bats, batting-average, team, and uniform-colour), and we have specified default values for the first three of them. By providing both kinds of slots, we allow a class both to define a set of objects and to describe a prototypical object of the set.

Sometimes, the distinction between a set and an individual instance may not be seen clearly. For example, the team Brooklyn-Dodgers, which we have described as an instance of the class of major league baseball teams, could be thought of as a set of players. In fact, notice that the value of the slot players is a set. Suppose, instead, that we want to represent the Dodgers as a class instead of an instance. Then its instances would be the individual players. It cannot stay where it is in the isa hierarchy; it cannot be a subclass of ML-Baseball-Team, because if it were, then its elements, namely the players, would also, by the transitivity of subclass, be elements of ML-Baseball-Team, which is not what we want to say. We have to put it somewhere else in the isa hierarchy. For example, we could make it a subclass of major league baseball players. Then its elements, the players, are also elements of ML-Baseball-Player, Adult-Male, and Person. That is acceptable. But if we do that, we lose the ability to inherit properties of the Dodgers from general information about baseball teams. We can still inherit attributes for the elements of the team, but we cannot inherit properties of the team as a whole, i.e., of the set of players. For example, we might like to know what the default size of the team is, or that it has a manager, and so on. The easiest way to allow for this is to go back to the idea of the Dodgers as an instance of ML-Baseball-Team, with the set of players given as a slot value.
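A minimal sketch of this kind of frame system is shown below. The dictionary encoding and the helpers `inherited_default` and `get_value` are our own; the slot values follow Figure 6.5, with "*"-prefixed slots marking attributes that a class's instances (not the class itself) inherit.

```python
# Frames as dictionaries, after Figure 6.5.
frames = {
    "Mammal": {},
    "Person": {"isa": ["Mammal"]},
    "Adult-Male": {"isa": ["Person"], "*height": "5-10"},
    "ML-Baseball-Player": {"isa": ["Adult-Male"], "cardinality": 624,
                           "*height": "6-1", "*batting-average": .252},
    "Fielder": {"isa": ["ML-Baseball-Player"], "cardinality": 36,
                "*batting-average": .262},
    "Pee-Wee-Reese": {"instance": ["Fielder"],
                      "team": "Brooklyn-Dodgers"},
}

def inherited_default(cls, attr):
    """Search a class, then its superclasses, for a '*' default."""
    if "*" + attr in frames[cls]:
        return frames[cls]["*" + attr]
    for sup in frames[cls].get("isa", []):
        value = inherited_default(sup, attr)
        if value is not None:
            return value
    return None

def get_value(frame, attr):
    """Own slot value first, then the nearest inherited default."""
    if attr in frames[frame]:
        return frames[frame][attr]
    for cls in frames[frame].get("instance", []):
        value = inherited_default(cls, attr)
        if value is not None:
            return value
    return None

# The default nearest to the instance wins: Pee-Wee-Reese gets the
# Fielder batting average, not the general ML-Baseball-Player one.
print(get_value("Pee-Wee-Reese", "batting-average"))  # 0.262
print(get_value("Pee-Wee-Reese", "height"))           # 6-1
```

Because the search visits nearer classes first, the specialized Fielder default (.262) shadows the general ML-Baseball-Player default (.252), which is the behavior multiple inheritance is later required to preserve.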
Person
    isa: Mammal
    cardinality: 6,000,000,000
    *handed: Right
Adult-Male
    isa: Person
    cardinality: 2,000,000,000
    *height: 5-10
ML-Baseball-Player
    isa: Adult-Male
    cardinality: 624
    *height: 6-1
    *bats: equal to handed
    *batting-average: .252
    *team:
    *uniform-color:
Fielder
    isa: ML-Baseball-Player
    cardinality: 36
    *batting-average: .262
Johan
    instance: Fielder
    height: 5-10
    bats: Right
    batting-average: .309
    team: Brooklyn-Dodgers
    uniform-color: Blue
ML-Baseball-Team
    isa: Team
    cardinality: 26
    *team-size: 24
    *manager:
Brooklyn-Dodgers
    instance: ML-Baseball-Team
    team-size: 24
    manager: Leo-Durocher
    players: (Johan, Pee-Wee-Reese, …)

Figure 6.5: A Simplified Frame System

But what we have encountered here is an example of a more general problem. A class is a set, and we want to be able to talk about properties that its elements possess. We want to use inheritance to infer those properties from general knowledge about the set. But a class is also an entity in itself. It may possess properties that belong not to the individual instances but rather to the class as a whole. In the case of Brooklyn-Dodgers, such properties included team size and the existence of a manager. We may even want to inherit some of these properties from a more general kind of set. For example, the Dodgers can inherit a default team size from the set of all major league baseball teams. To support this, we need to view a class as two things simultaneously: a subset (isa) of a larger class that also contains its elements, and an instance (instance) of a class of sets, from which it inherits its set-level properties.

To make this distinction clear, it is useful to distinguish between regular classes, whose elements are individual entities, and metaclasses, which are special classes whose elements are themselves classes. A class is now an element of (instance) some class (or classes) as well as a subclass (isa) of one or more classes. A class inherits properties from the class of which it is an
instance, just as any instance does. In addition, a class passes inheritable properties down from its superclasses to its instances.

Let's consider an example. Figure 6.6 shows how we would represent teams as classes using this distinction. Figure 6.7 shows a graphic view of the same classes. The most basic metaclass is Class; it represents the set of all classes. All classes are instances of it, either directly or through one of its subclasses. In the example, Team is a subclass (subset) of Class and ML-Baseball-Team is a subclass of Team. Class introduces the attribute cardinality, which is to be inherited by all instances of Class (including itself). This makes sense since all the instances of Class are sets and all sets have cardinality.

Team represents a subset of the set of all sets, namely those whose elements are sets of players on a team. It inherits the property of having a cardinality from Class. Team introduces the attribute team-size, which all its elements possess. Notice that team-size is like cardinality in that it measures the size of a set. But it applies to something different: cardinality applies to sets of sets and is inherited by all elements of Class, while team-size applies to the elements of those sets that happen to be teams. Those elements are sets of individuals.

ML-Baseball-Team is also an instance of Class, since it is a set. It inherits the property of having a cardinality from the set of which it is an instance, namely Class. But it is a subclass of Team. All of its instances will have the property of having a team-size, since they are also instances of the superclass Team. We have added at this level the additional fact that the default team size is 24, so all instances of ML-Baseball-Team will inherit that as well. In addition, we have added the inheritable slot manager.

Brooklyn-Dodgers is an instance of ML-Baseball-Team. It is not an instance of Class, because its elements are individuals, not sets.
Brooklyn-Dodgers is a subclass of ML-Baseball-Player, since all of its elements are also elements of that set. Since it is an instance of ML-Baseball-Team, it inherits the properties team-size and manager, as well as their default values. It specifies a new attribute uniform-colour, which is to be inherited by all of its instances (who will be individual players).

Class
    instance: Class
    isa: Class
    *cardinality:
Team
    instance: Class
    isa: Class
    cardinality: {the number of teams that exist}
    *team-size: {each team has a size}
ML-Baseball-Team
    instance: Class
    isa: Team
    cardinality: 26 {the number of baseball teams that exist}
    *team-size: 24 {default 24 players on a team}
    *manager:
Brooklyn-Dodgers
    instance: ML-Baseball-Team
    isa: ML-Baseball-Player
    team-size: 24
    manager: Leo-Durocher
    *uniform-color: Blue
Pee-Wee-Reese
    instance: Brooklyn-Dodgers
    instance: Fielder
    uniform-color: Blue
    batting-average: .309

Figure 6.6: Representing the Class of All Teams as a Metaclass

Figure 6.7: Classes and Metaclasses

Finally, Pee-Wee-Reese is an instance of Brooklyn-Dodgers. That makes him also, by transitivity up isa links, an instance of ML-Baseball-Player. But recall that in an earlier example we also used the class Fielder, to which we attached the fact that fielders have above-average batting averages. To allow that here, we simply make Pee Wee an instance of Fielder as well. He will thus inherit properties
from both Brooklyn-Dodgers and from Fielder, as well as from the classes above these. We need to guarantee that when multiple inheritance occurs, as it does here, it works correctly. Specifically, in this case we need to ensure that batting-average gets inherited from Fielder and not from ML-Baseball-Player through Brooklyn-Dodgers.

In all the frame systems we illustrate, all classes are instances of the metaclass Class. As a result, they all have the attribute cardinality. We leave the attribute cardinality out of our descriptions of our examples, though, unless there is some particular reason to include it.

Every class is a set. But not every set should be described as a class. A class describes a set of entities that share significant properties. In particular, the default information associated with a class can be used as a basis for inferring values for the properties of its individual elements. So there is an advantage to representing as a class those sets for which membership serves as a basis for nonmonotonic inheritance. Typically, these are sets in which membership is not highly ephemeral. Instead, membership is based on some fundamental structural or functional properties. To see the difference, consider the following sets:
• People
• People who are major league baseball players
• People who are on my plane to New York
The first two sets can be advantageously represented as classes, with which a substantial number of inheritable attributes can be associated. The last, though, is different. The only properties that all the elements of that set probably share are the definition of the set itself and some other properties that follow from the definition (e.g., they are being transported from one place to another).
A simple set, with some associated assertions, is adequate to represent these facts; nonmonotonic inheritance is not necessary.

Other Ways of Relating Classes to Each Other

We have talked up to this point about two ways in which classes (sets) can be related to each other. Class1 can be a subset of Class2. Or, if Class2 is a metaclass, then Class1 can be an instance of Class2. But there are other ways that classes can be related to each other, corresponding to ways that sets of objects in the world can be related.

One such relationship is mutually-disjoint-with, which relates a class to one or more other classes that are guaranteed to have no elements in common with it. Another important relationship is is-covered-by, which relates a class to a set of subclasses, the union of which is equal to it. If a class is covered by a set S of mutually disjoint classes, then S is called a partition of the class.

For examples of these relationships, consider the classes shown in Figure 6.8, which represent two orthogonal ways of decomposing the class of major league baseball players. Everyone is either a pitcher, a catcher, or a fielder (and no one is more than one of these). In addition, everyone plays in either the National League or the American League, but not both.

Slots Exceptions

We have provided a way to describe sets of objects and individual objects, both in terms of attributes and values. Thus we have made extensive use of attributes, which we have represented as slots attached to frames. But it turns out that there are several reasons why we would like to be able to represent attributes explicitly and describe their properties. Some of the properties we would like to be able to represent and use in reasoning include:
• The classes to which the attribute can be attached, i.e., for what classes does it make sense? For example, weight makes sense for physical objects but not for conceptual ones (except in some metaphorical sense).
• A value that all instances of a class must have by the definition of the class.
• A default value for the attribute.
• Rules for inheriting values for the attribute. The usual rule is to inherit down isa and instance links. But some attributes inherit in other ways. For example, last-name inherits down the child-of link.

ML-Baseball-Player
    is-covered-by: {Pitcher, Catcher, Fielder}
                   {American-Leaguer, National-Leaguer}
Pitcher
    isa: ML-Baseball-Player
    mutually-disjoint-with: {Catcher, Fielder}
Catcher
    isa: ML-Baseball-Player
    mutually-disjoint-with: {Pitcher, Fielder}
Fielder
    isa: ML-Baseball-Player
    mutually-disjoint-with: {Pitcher, Catcher}
American-Leaguer
    isa: ML-Baseball-Player
    mutually-disjoint-with: {National-Leaguer}
National-Leaguer
    isa: ML-Baseball-Player
    mutually-disjoint-with: {American-Leaguer}
Three-Finger-Brown
    instance: Pitcher
    instance: National-Leaguer

Figure 6.8: Representing Relationships among Classes

• Rules for computing a value separately from inheritance. One extreme form of such a rule is a procedure written in some procedural programming language such as LISP.
• An inverse attribute.
• Whether the slot is single-valued or multivalued.

In order to be able to represent these attributes of attributes, we need to describe attributes (slots) as frames. These frames will be organized into an isa hierarchy, just as any other frames are, and that hierarchy can then be used to support inheritance of values for attributes of slots. Before we can describe such a hierarchy in detail, we need to formalize our notion of a slot.

A slot is a relation. It maps from elements of its domain (the classes for which it makes sense) to elements of its range (its possible values). A relation is a set of ordered pairs. Thus it makes sense to say that one relation (R1) is a subset of another (R2). In that case, R1 is a specialization of R2, so in our terminology isa (R1, R2). Since a slot is a set, the set of all slots, which we will call Slot, is a metaclass. Its instances are slots, which may have subslots.

Figures 6.9 and 6.10 illustrate several examples of slots represented as frames. Slot is a metaclass. Its instances are slots (each of which is a set of ordered pairs). Associated with the metaclass are attributes that each instance (i.e., each actual slot) will inherit. Each slot, since it is a relation, has a domain and a range. We represent the domain in the slot labelled domain.
We break up the representation of the range into two parts: range gives the class of which all elements of the range must be members, and range-constraint contains logical expressions that further constrain the range. The advantage of breaking the description apart into these two pieces is that type checking is much cheaper than arbitrary constraint checking, so it is useful to be able to do it separately and early during some reasoning processes.

The other slots do what you would expect from their names. If there is a value for definition, it must be propagated to all instances of the slot. If there is a value for default, that value is inherited by all instances of the slot unless there is an overriding value. The attribute transfers-through lists other slots from which values for this slot can be derived through inheritance. The to-compute slot contains a procedure for deriving its value. The inverse attribute contains the inverse of the slot; although in principle all slots have inverses, sometimes they are not useful enough in reasoning to be worth representing. And single-valued is used to mark the special cases in which the slot is a function and so can have only one value.

Of course, there is no advantage to representing these properties of slots unless there is a reasoning mechanism that exploits them. In the rest of our discussion, we assume that the frame-system interpreter knows how to reason with all of these slots of slots as part of its built-in reasoning capability. In particular, we assume that it is capable of performing the following reasoning actions:
• Consistency checking to verify that when a slot value is added to a frame:
  - The slot makes sense for the frame. This relies on the domain attribute of the slot.
  - The value is a legal value for the slot. This relies on the range and range-constraint attributes.
• Maintenance of consistency between the values for slots and their inverses whenever one is updated.
• Propagation of definition values along isa and instance links.

Slot
    isa: Class
    instance: Class
    domain:
    range:
    range-constraint:
    definition:
    default:
    transfers-through:
    to-compute:
    inverse:
    single-valued:
manager
    instance: Slot
    domain: ML-Baseball-Team
    range: Person
    range-constraint: λx (baseball-experience x.manager)
    default:
    inverse: manager-of
    single-valued: TRUE

Figure 6.9: Representing Slots as Frames, I

my-manager
    instance: Slot
    domain: ML-Baseball-Player
    range: Person
    range-constraint: λx (baseball-experience x.my-manager)
    to-compute: λx (x.team).manager
    single-valued: TRUE
colour
    instance: Slot
    domain: Physical-Object
    range: Colour-Set
    transfers-through: top-level-part-of
    visual-salience: High
    single-valued: FALSE
uniform-colour
    instance: Slot
    isa: colour
    domain: Team-Player
    range: Colour-Set
    range-constraint: not-Pink
    visual-salience: High
    single-valued: FALSE
bats
    instance: Slot
    domain: ML-Baseball-Player
    range: (Left, Right, Switch)
    to-compute: λx x.handed
    single-valued: TRUE

Figure 6.10: Representing Slots as Frames, II
• Inheritance of default values along isa and instance links.
• Computation of a value of a slot as needed. This relies on the to-compute and transfers-through attributes.
• Checking that only a single value is asserted for single-valued slots. This is usually done by replacing an old value with the new one when it is asserted. An alternative is to force explicit retraction of the old value and to signal a contradiction if a new value is asserted when another is already there.

There is something slightly counterintuitive about this way of defining slots. We have defined properties such as range-constraint and default as parts of a slot. But we then think of them as being properties of a slot associated with a particular class. For example, in Figure 6.11, we list two defaults for the batting-average slot, one associated with major league baseball players and one associated with fielders. Figure 6.11 shows how this can be represented correctly, by creating a specialization of batting-average that can be associated with a specialization of ML-Baseball-Player to represent the more specific information that is known about the specialized class. This seems cumbersome. It is natural, though, given our definition of a slot as a relation. There are really two relations here, one a specialization of the other. And below we will define inheritance so that it looks for values of either the slot it is given or any of that slot's generalizations.

Unfortunately, although this model of slots is simple and internally consistent, it is not easy to use. So we introduce some notational shorthand that allows the four most important properties of a slot (domain, range, definition, and default) to be defined implicitly by how the slot is used in the definitions of the classes in its domain. We define the domain implicitly to be the class in which the slot appears. We describe the range and any range constraints with the clause MUST BE, as the value of an inherited slot.
Figure 6.12 shows an example of this notation. And we describe thedefinition and the default. If they are present by inserting them as the value of the slot when oneappears. The two will be distinguished by perplexing a definitional value with an assts (“). Wethen let the underlying book keeping of the frame system create the frames to represent slots asthey are needed.Now let’s look at examples of how these slots can be used. The slots bats and my managerillustrate the use of the to-compute attribute of a slot. The variable x will be bound to the frame towhich the slot is attached. We are the notation to specify the value of a slot of a frame. Specially,x, y describes the value (s) of the y slot it frame x. So we know that to compute a frame a valuefor any manager, it is necessary find the frame’s value for team, then find the resulting team’smanager. We have simply composed two slots to form a new one. Computing the value of thebats slots is a even simpler. Just go get the value of the hand slot. Batting average instance : Slot domain : ML-Baseball Player range : Number range-constraint : kx( 0 < x range-constraint < 1 ) default : 252
  single-valued : TRUE

Fielder-Batting-Average
  instance : Slot
  isa : Batting-Average
  domain : Fielder
  range : Number
  range-constraint : λx (0 < x < 1)
  default : .262
  single-valued : TRUE

Figure 6.11: Associating Defaults with Slots

ML-Baseball-Player
  bats : MUST BE (Left, Right, Switch)

Figure 6.12: A Shorthand Notation for Slot-Range Specification

The manager slot illustrates the use of a range constraint. It is stated in terms of a variable x, which is bound to the frame whose manager slot is being described. It requires that any manager be not only a person but someone with baseball experience. It relies on the domain-specific function baseball-experience, which must be defined somewhere in the system.

The slots colour and uniform-colour illustrate the arrangement of slots in an isa hierarchy. The relation colour is a fairly general one that holds between physical objects and their colours. The attribute uniform-colour is a restricted form of colour that applies only to team players and to colours that are allowed for team uniforms (anything but pink). Arranging slots in a hierarchy is useful for the same reason that arranging anything else in a hierarchy is: it supports inheritance. In this example, the general slot colour is known to have high visual salience. The more specific slot uniform-colour then inherits this property, so it too is known to have high visual salience.

The slot colour also illustrates the use of the transfers-through slot, which defines a way of computing a slot's value by retrieving it from the same slot of a related object. In this example, we used transfers-through to capture the fact that if you take an object and chop it up into several top-level parts (in other words, parts that are not contained in each other), then they will all be the same colour. For example, the arm of a sofa is the same colour as the sofa.
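The default-inheritance behaviour summarized in Figure 6.11 can be sketched in a few lines of code. This is a minimal sketch, not a full frame system: the Frame class and get method are hypothetical helpers, while the class names and the .252/.262 defaults mirror the figure.

```python
# Minimal sketch of frames with slot defaults inherited along isa links.
# Frame and get() are illustrative names, not from any standard library.

class Frame:
    def __init__(self, name, isa=None, **slots):
        self.name, self.isa, self.slots = name, isa, slots

    def get(self, slot):
        """Look up a slot value, walking isa links until a default is found."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.isa
        return None

ml_player = Frame("ML-Baseball-Player", batting_average=0.252, bats="Right")
fielder   = Frame("Fielder", isa=ml_player, batting_average=0.262)
pee_wee   = Frame("Pee-Wee-Reese", isa=fielder)

print(pee_wee.get("batting_average"))  # 0.262: the more specific Fielder default wins
print(pee_wee.get("bats"))             # Right: inherited from ML-Baseball-Player
```

Note how the lookup finds the value of the most specific class first, which is exactly the behaviour required when a slot (here, Fielder-Batting-Average) is a specialization of a more general slot.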
John
  Height : 72
Bill
  Height :

Figure 6.13: Representing Slot Values

Formally, what transfers-through means in this example is
color(x, y) ∧ top-level-part-of(z, x) → color(z, y)

In addition to these domain-independent slot attributes, slots may have domain-specific properties that support problem solving in a particular domain. Since these slots are not treated explicitly by the frame-system interpreter, they will be useful precisely to the extent that the domain problem solver exploits them.

Slot-Values as Objects

We have already reified the notion of a slot by making it an explicit object that we could make assertions about. In some sense this was not necessary. A finite relation can be completely described by listing its elements. But in a practical knowledge-based system one often does not have that list. So it can be very important to be able to make assertions about the list without knowing all of its elements. Reification gave us a way to do this.

The next step along this path is to do the same thing to a particular attribute-value (an instance of a relation) that we did to the relation itself. We can reify it and make it an object about which assertions can be made. To see why we might want to do this, let us return to the example of John and Bill's heights that we discussed in the previous section. Figure 6.13 shows a frame-based representation of some of the facts. We could easily record Bill's height if we knew it. Suppose that we do not know it. All we know is that John is taller than Bill. We need a way to make an assertion about the slot and its value as an object.

We could attempt to do this the same way we made slots themselves into objects, namely by representing them explicitly as frames. There seems little advantage to doing that in this case, though, because the main advantage of frames does not apply to slot values: frames are organized into an isa hierarchy and thus support inheritance. There is no basis for such an organization of slot values.
So instead we augment our value representation language to allow the value of a slot to be stated as either or both:

• A value of the type required by the slot.
• A logical constraint on the value. This constraint may relate the slot's value to the values of other slots or to domain constants.

John
  Height : 72 ; λx (x.height > Bill.height)
Bill
  Height : λx (x.height < John.height)

Figure 6.14: Representing Slot Values with Lambda Notation

If we do this to the frames of Figure 6.13, then we get the frames of Figure 6.14. We again use the lambda notation as a way to pick up the name of the frame that is being described.

Student Activity 6.2

Before reading the next section, answer the following questions.

1. Describe the database of a cricket team using frames.
2. What is the difference between frames and meta-frames?
3. Write a short note on slot exceptions.

If your answers are correct, then proceed to the next section.

Handling Uncertainties

So far it has been implicitly assumed that a sufficient amount of reliable knowledge (facts, rules, and the like) was available to defend confident conclusions. While this form of reasoning is important, it suffers from several limitations:

• It is not possible to describe many envisioned or real-world concepts; that is, it is limited in expressive power.
• There is no way to express uncertain, imprecise, hypothetical, or vague knowledge, or the truth or falsity of such statements.
• Available inference methods are known to be inefficient.
• There is no way to produce new knowledge about the world. It is only possible to add what is derivable from the axioms and theorems in the knowledge base.

In other words, strict classical logic formalisms do not provide realistic representations of the world in which we live. On the contrary, intelligent beings are continuously required to make decisions under a veil of uncertainty.
Uncertainty can arise from a variety of sources. For one thing, the information we have available may be incomplete or highly volatile. Important facts and details that have a bearing on the problems at hand may be missing or may change rapidly. In addition, many of the "facts" available may be imprecise, vague, or fuzzy. Indeed, some of the available information may be contradictory or even unbelievable. Despite these shortcomings, however, we humans miraculously deal with uncertainties on a daily basis and usually arrive at reasonable solutions. If it were otherwise, we would not be able to cope with the continually changing situations of our world.

In the sections that follow, we shall discuss methods with which to accurately represent and deal with different forms of inconsistency, uncertainty, possibility, and belief. In other words, we shall be interested in representations and inference methods related to what is known as commonsense reasoning.

Consider the following real-life situation. Timothy enjoys shopping in the mall only when the stores are not crowded. He has agreed to accompany Sue there on the following Friday evening, since this is normally a time when few people shop. Before the given date, several of the larger stores announce a one-time special sale starting on that Friday evening. Timothy, fearing large crowds, now retracts the offer to accompany Sue, promising to go on some future date. On the Thursday before the sale was to commence, weather forecasts predicted heavy snow. Now, believing the weather would discourage most shoppers, Timothy once again agreed to join Sue. But, unexpectedly, on the given Friday, the forecasts proved to be false, so Timothy once again declined to go.

This anecdote illustrates how one's beliefs can change in a dynamic environment. And, while one's beliefs may not fluctuate as much as Timothy's in most situations, this form of belief revision is not uncommon.
Indeed, it is common enough that we label it as a form of commonsense reasoning, that is, reasoning with uncertain knowledge.

Non-monotonic Reasoning

The logics we studied earlier are known as monotonic logics. The conclusions derived using such logics are valid deductions, and they remain so. Adding new axioms increases the amount of knowledge contained in the knowledge base. Therefore, the set of facts and inferences in such systems can only grow larger; it cannot be reduced. That is, it increases monotonically. The form of reasoning performed above by Timothy, on the other hand, is non-monotonic. New facts became known which contradicted and invalidated old knowledge. The invalidated knowledge was
retracted, causing other dependent knowledge to become invalid and thereby requiring further retractions. The retractions led, at times, to a shrinkage, or non-monotonic growth, in the knowledge.

More formally, let KBL be a formal first-order system consisting of a knowledge base and some logic L. Then, if KB1 and KB2 are knowledge bases where

KB1 = KBL
KB2 = KBL ∪ {F}, for some wff F,

then KB1 ⊆ KB2. In other words, a first-order KB system can only grow monotonically with added knowledge.

When building knowledge-based systems, it is not reasonable to expect that all the knowledge needed for a set of tasks could be acquired, validated, and loaded into the system at the outset. More typically, the initial knowledge will be incomplete and will contain redundancies, inconsistencies, and other sources of uncertainty. Even if it were possible to assemble complete, valid knowledge initially, it probably would not remain valid forever, not in a continually changing environment.

In an attempt to model real-world, commonsense reasoning, researchers have proposed extensions and alternatives to traditional logics such as PL and FOPL. The extensions accommodate different forms of uncertainty and non-monotonicity. In some cases, the proposed methods have been implemented. In other cases they are still topics of research. We will examine some of the more important of these methods here.

We begin in the next section with a description of truth maintenance systems (TMS), systems that have been implemented to permit a form of non-monotonic reasoning by permitting the addition of changing (even contradictory) statements to a knowledge base. This is followed by a description of other methods that accommodate non-monotonic reasoning through default assumptions for incomplete knowledge bases. The assumptions are plausible most of the time, but may have to be retracted if other conflicting facts are learned.
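The default-and-retraction pattern just described can be made concrete with a small sketch. The predicate names below (crowded, timothy_goes, and so on) are invented to echo the Timothy anecdote; the point is only that adding a fact can shrink the derived belief set, which no monotonic logic allows.

```python
# Sketch of non-monotonic belief revision: a conclusion resting on a
# default assumption is withdrawn when a new fact contradicts the default.
# All predicate names are illustrative.

def current_beliefs(facts):
    """Derive beliefs from the given facts plus one default rule:
    assume the mall is uncrowded unless some fact says otherwise."""
    beliefs = set(facts)
    if "crowded" not in facts:        # default assumption holds
        beliefs.add("uncrowded")
    if "uncrowded" in beliefs:
        beliefs.add("timothy_goes")   # conclusion depends on the default
    return beliefs

before = current_beliefs({"friday"})
after  = current_beliefs({"friday", "crowded"})   # new fact arrives

print("timothy_goes" in before)  # True
print("timothy_goes" in after)   # False: adding a fact removed a conclusion
```

Here the derived belief set is recomputed from scratch; the truth maintenance systems described in the next section achieve the same effect incrementally, by recording justifications instead of rederiving everything.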
Methods to constrain the knowledge that must be considered for a given problem are considered next; these methods also relate to non-monotonic reasoning. A brief treatment of modal and temporal logics follows, which extend the expressive power of classical logics to permit representation of, and reasoning about, necessary and possible situations, temporal situations, and other related notions. The unit concludes with a brief presentation of a relatively new method for dealing with vague and imprecise information, namely fuzzy logic.
Truth Maintenance Systems

Truth maintenance systems (also known as belief revision or revision maintenance systems) are companion components to inference systems. The main job of the TMS is to maintain consistency of the knowledge being used by the problem solver, not to perform any inference functions. As such, it frees the problem solver from any concerns about consistency and allows it to concentrate on the problem-solution aspects. The TMS also gives the inference component the latitude to perform non-monotonic inferences. When new discoveries are made, this more recent information can displace previous conclusions that are no longer valid. In this way, the set of beliefs available to the problem solver will continue to be current and consistent.

Figure 6.15 illustrates the role played by the TMS as part of the problem solver. The inference engine (IE) solves domain problems based on its current belief set, while the TMS maintains the currently active belief set. The updating process is incremental. After each inference, information is exchanged between the two components. The IE tells the TMS what deductions it has made. The TMS, in turn, asks questions about current beliefs and reasons for failures. It maintains a consistent set of beliefs for the IE to work with even as new knowledge is added and removed.

For example, suppose the knowledge base (KB) contained only the propositions P, P → Q, and modus ponens. From this, the IE would rightfully conclude Q and add this conclusion to the KB. Later, if it was learned that ~P was appropriate, it would be added to the KB, resulting in a contradiction. Consequently, it would be necessary to remove P to eliminate the inconsistency. But, with P now removed, Q is no longer a justified belief. It too should be removed. This type of belief revision is the job of the TMS.

Actually, the TMS does not discard conclusions like Q, as suggested.
That could be wasteful, since P may again become valid, which would require that Q and facts justified by Q be rederived. Instead, the TMS maintains dependency records for all such conclusions. These records determine which sets of beliefs are current (which are to be used by the IE). Thus, Q would be removed from the current belief set by making appropriate updates to the records and not by erasing Q. Since Q would not be lost, its rederivation would not be necessary if P became valid once again.

The TMS maintains complete records of reasons or justifications for beliefs. Each proposition or statement having at least one valid justification is made a part of the current belief set. Statements lacking acceptable justifications are excluded from this set. When a contradiction is
discovered, the statements responsible for the contradiction are identified and an appropriate one is retracted. This in turn may result in other retractions and additions. The procedure used to perform this process is called dependency-directed backtracking, and it is described later.

Figure 6.15: Architecture of the problem solver with a TMS

The TMS maintains records to reflect retractions and additions so that the IE will always know its current belief set. The records are maintained in the form of a dependency network. The nodes in the network represent KB entries such as premises, conclusions, inference rules, and the like. Attached to the nodes are justifications that represent the inference steps from which the node was derived. Nodes in the belief set must have valid justifications. A premise is a fundamental belief that is assumed to be always true. Premises need no justifications. They form a base from which all other currently active nodes can be explained in terms of valid justifications.

There are two types of justification records maintained for nodes: support lists (SL) and conditional proofs (CP). SLs are the most common type. They provide the supporting justifications for nodes. The data structure used for the SL contains two lists of other dependent node names, an in-list and an out-list. It has the form

(SL <in-list> <out-list>)

In order for a node to be active, and hence labelled as IN the belief set, its SL must have at least one valid node in its in-list, and all nodes named in its out-list, if any, must be marked OUT of the belief set. For example, a current belief set that represents Cybil as a nonflying bird (an ostrich) might have the nodes and justifications listed in Table 6.1.
Each IN node given in Table 6.1 is part of the current belief set. Nodes n1 and n5 are premises; they have empty support lists since they do not require justifications. Node n2, the belief that Cybil can fly, is OUT because n3, a valid node, is in the out-list of n2.

Suppose it is discovered that Cybil is not an ostrich, thereby causing n5 to be retracted (marking its status as OUT). Node n3, which depends on n5, must also be retracted. This, in turn, changes the status of n2 to that of a justified node. The resultant belief set now holds that the bird Cybil can fly.

To represent a belief network, the symbol conventions shown in Figure 6.16 are sometimes used. The meanings of the nodes shown in the figure are: (1) a premise is a true proposition requiring no justification, (2) an assumption is a current belief that could change, (3) a datum is either a currently assumed or IE-derived belief, and (4) justifications are the belief (node) supports, consisting of supporting antecedent node links and a consequent node link.

Table 6.1: Example Nodes in a Dependency Network

Node  Status  Meaning               Support list     Comments
n1    IN      Cybil is a bird       (SL ( ) ( ))     a premise
n2    OUT     Cybil can fly         (SL (n1) (n3))   unjustified belief
n3    IN      Cybil cannot fly      (SL (n5) (n4))   justified belief
n4    OUT     Cybil has wings       (SL ( ) ( ))     retracted premise
n5    IN      Cybil is an Ostrich   (SL ( ) ( ))     a premise

Figure 6.16: Belief network node meanings

An example of a typical network representation is given in Figure 6.17. The nodes T, U, and W are OUT. If the node labelled P is made IN for some reason, the TMS would update the network by propagating the "INness" support provided by node P to make T and W IN.

As noted earlier, when a contradiction is discovered, the TMS locates the source of the contradiction and corrects it by retracting one of the contributing sources. It does this by checking
the support lists of the contradictory node and going directly to the source of the contradiction. It goes directly to the source by examining the dependency structure supporting the justification and determining the offending nodes. This is in contrast to the naive backtracking approach, which would search a deduction tree sequentially, node by node, until the contradictory node is reached. Backtracking directly to the node causing the contradiction is known as dependency-directed backtracking (DDB). This is clearly a more efficient search strategy than chronological backtracking.

Figure 6.17: Typical Fragment of a Belief Network

Probabilistic Reasoning

Here we will examine methods that use probabilistic representations for all knowledge and which reason by propagating the uncertainties from evidence and assertions to conclusions. As before, the uncertainties can arise from an inability to predict outcomes due to unreliable, vague, incomplete, or inconsistent knowledge.

The probability of an uncertain event A is a measure of the degree of likelihood of occurrence of that event. The set of all possible events is called the sample space, S. A probability measure is a function P(·) which maps event outcomes E1, E2, … from S into real numbers and which satisfies the following axioms of probability:

1. 0 ≤ P(A) ≤ 1 for any event A ⊆ S.
2. P(S) = 1, a certain outcome.
3. For Ei ∩ Ej = ∅ for all i ≠ j (the Ei are mutually exclusive), P(E1 ∪ E2 ∪ E3 ∪ …) = P(E1) + P(E2) + P(E3) + …

From these three axioms and the rules of set theory, the basic laws of probability can be derived. Of course, the axioms are not sufficient to compute the probability of an outcome. That requires an understanding of the underlying distributions, which must be established through one of the following approaches:

1. Use of a theoretical argument that accurately characterizes the processes.
2. Use of one's familiarity and understanding of the basic processes to assign subjective probabilities.
3. Collection of experimental data from which statistical estimates of the underlying distributions can be made.

Since much of the knowledge we deal with is uncertain in nature, a number of our beliefs must be tenuous. Our conclusions are often based on available evidence and past experience, which is often far from complete. The conclusions are, therefore, no more than educated guesses. In a great many situations it is possible to obtain only partial knowledge concerning the possible outcome of some event. But, given that knowledge, one's ability to predict the outcome is certainly better than with no knowledge at all. We manage quite well in drawing plausible conclusions from incomplete knowledge and past experience.

Probabilistic reasoning is sometimes used when outcomes are unpredictable. For example, when a physician examines a patient, the patient's history, symptoms, and test results provide some, but not conclusive, evidence of possible ailments. This knowledge, together with the physician's experience with previous patients, improves the likelihood of predicting the unknown (disease) event, but there is still much uncertainty in most diagnoses. Likewise, weather forecasters "guess" at tomorrow's weather based on available evidence such as temperature, humidity, barometric pressure, and cloud coverage observations.
The physical relationships that govern these phenomena are not fully understood, so predictability is far from certain. Even a business manager must make decisions based on uncertain predictions when the market for a new product is considered. Many interacting factors influence the market, including the target consumer's lifestyle, population growth, potential consumer income, the general economic climate, and many other dependent factors.
In all of the above cases, the level of confidence placed in the hypothesized conclusions is dependent on the availability of reliable knowledge and the experience of the human prognosticator. Our objective here is to describe some approaches taken in AI systems to deal with reasoning under similar types of uncertain conditions.

Bayesian Probabilistic Inference

The form of probabilistic reasoning described in this section is based on the Bayesian method introduced by the clergyman Bayes in the eighteenth century. This form of reasoning depends on the use of conditional probabilities of specified events when it is known that other events have occurred. For two events H and E with the probability P(E) > 0, the conditional probability of event H, given that event E has occurred, is defined as

P(H | E) = P(H & E) / P(E)   (6.1)

This expression can be given a frequency interpretation by considering a random experiment which is repeated a large number of times, n. The number of occurrences of the event E, say No.(E), and of the joint event H and E, No.(H & E), are recorded, and their relative frequencies rf are computed as
rf(H & E) = No.(H & E) / n,    rf(E) = No.(E) / n   (6.2)

When n is large, the two expressions (6.2) approach the corresponding probabilities, and the ratio

rf(H & E) / rf(E) = P(H & E) / P(E)

then represents the proportion of times event H occurs relative to the occurrence of E, that is, the approximate conditional occurrence of H with E.

The conditional probability of event E, given that event H occurred, can likewise be written as

P(E | H) = P(H & E) / P(H)   (6.3)

Solving 6.3 for P(H & E) and substituting this in equation 6.1, we obtain one form of Bayes Rule:

P(H | E) = P(E | H) P(H) / P(E)   (6.4)

This equation expresses the notion that the probability of event H occurring when it is known that event E occurred is the same as the probability that E occurs when it is known that H
occurred, multiplied by the ratio of the probabilities of the two events H and E occurring. As an example of the use of equation 6.4, consider the problem of determining the probability that a patient has a certain disease D1, given that a symptom E was observed. We wish to find P(D1 | E).

Suppose now it is known from previous experience that the prior (unconditional) probabilities P(D1) and P(E) for randomly chosen patients are P(D1) = 0.05 and P(E) = 0.15, respectively. Also, we assume that the conditional probability of the observed symptom, given that a patient has disease D1, is known from experience to be P(E | D1) = 0.95. Then we easily determine the value of P(D1 | E) as

P(D1 | E) = P(E | D1) P(D1) / P(E) = (0.95 × 0.05) / 0.15 = 0.32

It may be the case that the probability P(E) is difficult to obtain. If that is the case, a different form of Bayes Rule may be used. To see this, we write equation 6.4 with ¬H substituted in place of H to obtain

P(¬H | E) = P(E | ¬H) P(¬H) / P(E)

Next, we divide equation 6.4 by this result to eliminate P(E) and get

P(H | E) / P(¬H | E) = [P(E | H) P(H)] / [P(E | ¬H) P(¬H)]   (6.5)

Note that equation 6.5 has two terms that are ratios of the probability of an event to the probability of its negation, P(H | E) / P(¬H | E) and P(H) / P(¬H). The ratio of the probability of an event E divided by the probability of its negation is called the odds of the event and is denoted as O(E). The remaining ratio P(E | H) / P(E | ¬H) in equation 6.5 is known as the likelihood ratio of E with respect to H. We denote this quantity by L(E | H). Using these two new terms, the odds-likelihood form of Bayes Rule for equation 6.5 may be written as

O(H | E) = L(E | H) · O(H)

This form of Bayes Rule suggests how to compute the posterior odds O(H | E) from the prior odds on H, O(H). That value is proportional to the likelihood L(E | H). When L(E | H) is equal to one, the knowledge that E is true has no effect on the odds of H.
Values of L(E | H) less than or greater than one decrease or increase the odds correspondingly. When L(E | H) cannot be computed,
estimates may still be made by an expert having some knowledge of H and E. Estimating the ratio rather than the individual probabilities appears to be easier for individuals in such cases. This is sometimes done when developing expert systems where more reliable probabilities are not available.

In the example cited above, D1 is either true or false, and P(D1 | E) is given the interpretation of a measure of confidence that D1 is true when it is known that E is true. There is a similarity between P(D1 | E) and modus ponens. For example, when E is known to be true and D1 and E are known to be related, one concludes the truth of D1 with a confidence level P(D1 | E).

One might wonder if it would not be simpler to assign probabilities to as many ground atoms E1, E2, …, Ek as possible, and compute inferred probabilities directly from these. The answer is that in general this is not possible. To compute an inferred probability requires knowledge of the joint distributions of the ground predicates participating in the inference. From the joint distributions, the required marginal distributions may then be computed. The distributions and computations required for that approach are, in general, much more complex than the computations involving the use of Bayes Rule.

Consider now two events A and ¬A which are mutually exclusive (A ∩ ¬A = ∅) and exhaustive (A ∪ ¬A = S). The probability of an arbitrary event B can always be expressed as

P(B) = P(B & A) + P(B & ¬A) = P(B | A) P(A) + P(B | ¬A) P(¬A)

Using this result, equation 6.4 can be written as

P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | ¬H) P(¬H)]   (6.6)

Equation 6.6 can be generalized for an arbitrary number of hypotheses Hi, i = 1, …, k. Thus, suppose the Hi partition the universe; that is, the Hi are mutually exclusive and exhaustive. Then for any evidence E, we have

P(E) = Σ (i = 1 to k) P(E & Hi) = Σ (i = 1 to k) P(E | Hi) P(Hi)

and hence,
P(Hi | E) = P(E | Hi) P(Hi) / Σ (j = 1 to k) P(E | Hj) P(Hj)   (6.7)

Finally, to be more realistic and to accommodate multiple sources of evidence E1, E2, …, Em, we generalize equation 6.7 further to obtain

P(Hi | E1, E2, …, Em) = P(E1, E2, …, Em | Hi) P(Hi) / Σ (j = 1 to k) P(E1, E2, …, Em | Hj) P(Hj)   (6.8)

If there are several plausible hypotheses and a number of evidence sources, equation 6.8 can be fairly complex to compute. This is one of the serious drawbacks of the Bayesian approach. A large number of probabilities must be known in advance in order to apply an equation such as 6.8. If there were k hypotheses Hi and m sources of evidence Ej, then k + m prior probabilities must be known in addition to the k likelihood probabilities. The real question, then, is where one obtains such a large number of reliable probabilities.

To simplify equation 6.8, it is sometimes assumed that the Ei are statistically independent. In that case, the numerator and denominator probability terms P(E1, E2, …, Em | Hj) factor into

P(E1 | Hj) P(E2 | Hj) … P(Em | Hj)

resulting in a somewhat simpler form. But, even though the computations are straightforward, the number of probabilities required in a moderately large system can still be prohibitive, and one may be forced to simply use subjective probabilities when more reliable values are not available. Furthermore, the Ej are almost never completely independent. Consequently, this assumption may introduce intolerable errors.

The formulas presented above suggest how probabilistic evidence may be combined to produce a likelihood estimate for any given hypothesis. When a number of individual hypotheses are possible and several sources of evidence are available, it may be necessary to compute two or more alternative probabilities and select among them. This may mean that none, one, or possibly more than one of the hypotheses could be chosen.
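Equations 6.4 and 6.7 can be checked numerically. The sketch below uses the disease example's own figures (P(D1) = 0.05, P(E) = 0.15, P(E | D1) = 0.95); the priors and likelihoods in the multi-hypothesis part are made-up numbers for illustration only.

```python
# Bayes Rule (equation 6.4) applied to the disease/symptom example.
p_d1, p_e, p_e_given_d1 = 0.05, 0.15, 0.95
p_d1_given_e = p_e_given_d1 * p_d1 / p_e
print(round(p_d1_given_e, 2))  # 0.32, as computed in the text

# Equation 6.7: a posterior over hypotheses H1..Hk that partition the
# space. Priors and likelihoods here are invented for illustration.
priors      = [0.05, 0.20, 0.75]   # P(Hi)
likelihoods = [0.95, 0.30, 0.05]   # P(E | Hi)

joint = [l * p for l, p in zip(likelihoods, priors)]   # P(E | Hi) P(Hi)
posteriors = [j / sum(joint) for j in joint]           # normalize by P(E)
print([round(p, 3) for p in posteriors])  # [0.328, 0.414, 0.259]
```

Note that the denominator of equation 6.7 is just the sum of the numerators, so the posteriors necessarily sum to one; this is why a partition of mutually exclusive, exhaustive hypotheses is required.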
Normally, the one having the largest probability of occurrence would be selected, provided no other values were close. Before accepting such a choice, however, it may be desirable to require that the value exceed some threshold to avoid selecting weakly supported hypotheses. In the previous section we described
a typical system that combines similar values and chooses only those conclusions that exceed a threshold of 0.2.

Bayesian Networks

Network representations of knowledge have been used to graphically exhibit the interdependencies that exist between related pieces of knowledge. Much work has been done in this area to develop a formal syntax and semantics for such representations, as when one considers associative networks and conceptual graphs. Here, however, we are more interested in network representations that depict the degrees of belief of propositions and the causal dependencies that exist between them. Inferencing in a network amounts to propagating the probabilities of given and related information through the network to one or more conclusion nodes.

Network representations for uncertain dependencies are further motivated by observations made earlier. If we wish to represent uncertain knowledge related to a set of propositional variables x1, …, xn by their joint distribution P(x1, …, xn), it will require some 2^n entries to store the distribution explicitly. Furthermore, a determination of any of the marginal probabilities P(xi) requires summing P(x1, …, xn) over the remaining n − 1 variables. Clearly, the time and storage requirements for such computations quickly become impractical. Inferring with such large numbers of probabilities does not appear to model the human process either. On the contrary, humans tend to single out only a few propositions that are known to be causally linked when reasoning with uncertain beliefs. This metaphor leads quite naturally to a form of network representation.

One useful way to portray the problem domain is with a network of nodes that represent propositional variables xi, connected by arcs that represent causal influences or dependencies among the nodes. The strengths of the influences are quantified by conditional probabilities of each variable.
For example, to represent the causal relationships between the propositional variables x1, …, x6 illustrated in Figure 6.18, one can write the joint probability P(x1, …, x6) by inspection as a product of (chain) conditional probabilities:

P(x1, …, x6) = P(x6 | x5) P(x5 | x2, x3) P(x4 | x1, x2) P(x3 | x1) P(x2 | x1) P(x1)
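This factorization can be exercised directly. The sketch below uses the parent structure read off the formula above (x6 depends on x5; x5 on x2, x3; x4 on x1, x2; and so on) with invented Boolean conditional probabilities, and checks that the factored joint is a proper distribution.

```python
# Sketch of the chain-rule factorization for the network of Figure 6.18.
# Variables are Boolean; all conditional probability values are made up.
from itertools import product

p_x1 = 0.6                                        # P(x1 = True)
def p_x2(x1):     return 0.7 if x1 else 0.2       # P(x2 = True | x1)
def p_x3(x1):     return 0.4 if x1 else 0.1
def p_x4(x1, x2): return 0.9 if (x1 and x2) else 0.3
def p_x5(x2, x3): return 0.8 if (x2 or x3) else 0.05
def p_x6(x5):     return 0.5 if x5 else 0.25

def bern(p_true, value):
    """P(value) for a Boolean variable whose chance of being True is p_true."""
    return p_true if value else 1 - p_true

def joint(x1, x2, x3, x4, x5, x6):
    """P(x1..x6) as the product of chain conditionals, as in the text."""
    return (bern(p_x6(x5), x6) * bern(p_x5(x2, x3), x5) *
            bern(p_x4(x1, x2), x4) * bern(p_x3(x1), x3) *
            bern(p_x2(x1), x2) * bern(p_x1, x1))

# Sanity check: the factored joint sums to 1 over all 2**6 assignments.
total = sum(joint(*a) for a in product([True, False], repeat=6))
print(round(total, 10))  # 1.0
```

The payoff is storage: instead of 2^6 = 64 joint entries, the network needs only the handful of conditional values listed above, one small table per node.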
Figure 6.18: Example of a Bayesian belief network

Once such a network is constructed, an inference engine can use it to maintain and propagate beliefs. When new information is received, the effects can be propagated throughout the network until equilibrium probabilities are reached. Pearl (1986, 1987) has proposed simplified methods for updating networks (trees and, more generally, graphs) of this type by fusing and propagating the effects of new evidence and beliefs such that equilibrium is reached in time proportional to the longest path through the network. At equilibrium, all propositions will have consistent probability assignments. Methods for graphs are more difficult: they require the use of dummy variables to transform them into equivalent tree structures, which are then easier to work with.

To use the type of probabilistic inference we have been considering, it is first necessary to assign probabilities to all basic facts in the knowledge base. This requires the definition of an appropriate sample space and the assignment of a priori and conditional probabilities. In addition, some method must be chosen to compute the combined probabilities when pooling evidence in a sequence of inference steps (such as Pearl's method). Finally, when the outcome of an inference chain results in one or more proposed conclusions, the alternatives must be compared, and one or more selected on the basis of their likelihood.

Use of Certainty Factors

MYCIN uses measures of both belief and disbelief to represent degrees of confirmation and disconfirmation, respectively, in a given hypothesis. The basic measure of belief, denoted by MB(H, E), is actually a measure of the increased belief in hypothesis H due to the evidence E. This is roughly equivalent to the estimated increase in the probability P(H | E) over P(H) given by an expert as a result of the knowledge gained by E. A value of 0 corresponds to no increase in belief
and 1 corresponds to maximum increase or absolute belief. Likewise, MD(H, E) is a measure of the increased disbelief in hypothesis H due to evidence E. MD ranges from 0 to +1, with +1 representing maximum increase in disbelief (total disbelief) and 0 representing no increase. In both measures, the evidence E may be absent or may be replaced with another hypothesis, as in MB(H1, H2). This represents the increased belief in H1 given that H2 is true.

In an attempt to formalize the uncertainty measure in MYCIN, definitions of MB and MD have been given in terms of prior and conditional probabilities. It should be remembered, however, that the actual values are often subjective probability estimates provided by a physician. We have for the definitions:

MB(H, E) = 1                                            if P(H) = 1
           [max(P(H|E), P(H)) - P(H)] / [1 - P(H)]      otherwise        (6.11)

MD(H, E) = 1                                            if P(H) = 0
           [min(P(H|E), P(H)) - P(H)] / [0 - P(H)]      otherwise        (6.12)

Note that when 0 < P(H) < 1, and E and H are independent (so P(H | E) = P(H)), then MB = MD = 0. This would be the case if E provided no useful information.

The two measures MB and MD are combined into a single measure called the certainty factor (CF), defined by

CF(H, E) = MB(H, E) - MD(H, E)                                           (6.13)

Note that the value of CF ranges from -1 (certain disbelief) to +1 (certain belief). Furthermore, a value of CF = 0 will result if E neither confirms nor disconfirms H (E and H are independent).
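Definitions (6.11)-(6.13) translate almost line for line into code. The following is a minimal sketch; the function names and the probability values in the usage example are my own choices for illustration, not MYCIN's:

```python
def mb(p_h, p_h_given_e):
    """Measure of increased belief MB(H, E), Eq. (6.11)."""
    if p_h == 1:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1 - p_h)

def md(p_h, p_h_given_e):
    """Measure of increased disbelief MD(H, E), Eq. (6.12)."""
    if p_h == 0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0 - p_h)

def cf(p_h, p_h_given_e):
    """Certainty factor CF(H, E) = MB(H, E) - MD(H, E), Eq. (6.13)."""
    return mb(p_h, p_h_given_e) - md(p_h, p_h_given_e)

# Hypothetical estimates: evidence raises P(H) from 0.4 to 0.7,
# so MB is positive, MD is 0, and CF is positive.
print(round(cf(0.4, 0.7), 3))  # confirming evidence, CF about 0.5

# Independent evidence (P(H|E) = P(H)) gives CF = 0:
print(round(cf(0.4, 0.4), 3))  # neither confirms nor disconfirms
```

Note how the denominators normalize the change in probability by the room left for belief (1 - P(H)) or disbelief (P(H)), so both measures stay in [0, 1].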
Top

Fuzzy Logic

In the techniques we have discussed so far, we have not modified the mathematical underpinnings provided by set theory and logic. We have instead augmented those ideas with additional constructs provided by probability theory. In this section, we take a different approach and briefly consider what happens if we make fundamental changes to our idea of set membership and corresponding changes to our definitions of logical operations.

The motivation for fuzzy sets is provided by the need to represent such propositions as:

John is very tall.
Mary is slightly ill.
Sue and Linda are close friends.
Exceptions to the rule are nearly impossible.
Most Frenchmen are not very tall.

While traditional set theory defines set membership as a Boolean predicate, fuzzy set theory allows us to represent set membership as a possibility distribution. Once set membership has been redefined in this way, it is possible to define a reasoning system based on techniques for combining distributions (see, for example, the papers in the journal Fuzzy Sets and Systems). Such reasoners have been applied in control systems for devices as diverse as trains and washing machines.

Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth: truth values between "completely true" and "completely false". Dr. Lotfi Zadeh of UC Berkeley introduced it in the 1960s as a means to model the uncertainty of natural language. Zadeh says that rather than regarding fuzzy theory as a single theory, we should regard the process of "fuzzification" as a methodology to generalize ANY specific theory from a crisp (discrete) to a continuous (fuzzy) form.

Fuzzy Subsets

Just as there is a strong relationship between Boolean logic and the concept of a subset, there is a similar strong relationship between fuzzy logic and fuzzy subset theory.

In classical set theory, a subset U of a set S can be defined as a mapping from the elements of S to the elements of the set {0, 1},

U: S → {0, 1}

This mapping may be represented as a set of ordered pairs, with exactly one ordered pair present for each element of S.
The first element of the ordered pair is an element of the set S, and the
second element is an element of the set {0, 1}. The value zero is used to represent non-membership, and the value one is used to represent membership. The truth or falsity of the statement "x is in U" is determined by finding the ordered pair whose first element is x. The statement is true if the second element of the ordered pair is 1, and false if it is 0.

Similarly, a fuzzy subset F of a set S can be defined as a set of ordered pairs, each with the first element from S and the second element from the interval [0, 1], with exactly one ordered pair present for each element of S. This defines a mapping between elements of the set S and values in the interval [0, 1]. The value zero is used to represent complete non-membership, the value one is used to represent complete membership, and values in between are used to represent intermediate DEGREES OF MEMBERSHIP. The set S is referred to as the UNIVERSE OF DISCOURSE for the fuzzy subset F. Frequently, the mapping is described as a function, the MEMBERSHIP FUNCTION of F. The degree to which the statement "x is in F" is true is determined by finding the ordered pair whose first element is x. The DEGREE OF TRUTH of the statement is the second element of the ordered pair.

In practice, the terms "membership function" and "fuzzy subset" get used interchangeably.

That's a lot of mathematical baggage, so here's an example. Let's talk about people and "tallness". In this case the set S (the universe of discourse) is the set of people. Let's define a fuzzy subset TALL, which will answer the question "to what degree is person x tall?" Zadeh describes TALL as a LINGUISTIC VARIABLE, which represents our cognitive category of "tallness". To each person in the universe of discourse, we have to assign a degree of membership in the fuzzy subset TALL. The easiest way to do this is with a membership function based on the person's height.
tall(x) = { 0,                           if height(x) < 5 ft.
            (height(x) - 5 ft.) / 2 ft., if 5 ft. <= height(x) <= 7 ft.
            1,                           if height(x) > 7 ft. }
A graph of this looks like:

[Graph: tall(x) rises linearly from 0 at a height of 5.0 ft to 1.0 at 7.0 ft; x-axis: Height (ft), y-axis: tall(x)]

Given this definition, here are some example values:

Person    Height    Degree of tallness
--------------------------------------------------
Billy     3'2"      0.00   [I think]
Yoke      5'5"      0.21
Drew      5'9"      0.38
Erik      5'10"     0.42
Mark      6'1"      0.54
Kareem    7'2"      1.00   [depends on who you ask]

Expressions like "A is X" can be interpreted as degrees of truth, e.g., "Drew is TALL" = 0.38.
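The membership function tall(x) is simple enough to transcribe directly; this sketch (function and variable names are my own) reproduces the degrees of tallness in the table, taking heights in decimal feet:

```python
def tall(height_ft):
    """Degree of membership in the fuzzy subset TALL (height in feet)."""
    if height_ft < 5.0:
        return 0.0
    if height_ft <= 7.0:
        return (height_ft - 5.0) / 2.0  # linear ramp from 5 ft to 7 ft
    return 1.0

# Heights from the table, converted to decimal feet:
people = {"Billy": 3 + 2/12, "Yoke": 5 + 5/12, "Drew": 5 + 9/12,
          "Erik": 5 + 10/12, "Mark": 6 + 1/12, "Kareem": 7 + 2/12}
for name, h in people.items():
    print(f"{name}: {tall(h):.2f}")  # matches the table's third column
```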
Note: Membership functions used in most applications almost never have as simple a shape as tall(x). At minimum, they tend to be triangles pointing up, and they can be much more complex than that. Also, the discussion characterizes membership functions as if they are always based on a single criterion, but this isn't always the case, although it is quite common. One could, for example, want the membership function for TALL to depend on both a person's height and their age ("he's tall for his age"). This is perfectly legitimate, and occasionally used in practice. It's referred to as a two-dimensional membership function, or a "fuzzy relation". It's also possible to have even more criteria, or to have the membership function depend on elements from two completely different universes of discourse.

Logic Operations

Now that we know what a statement like "X is LOW" means in fuzzy logic, how do we interpret a statement like

X is LOW and Y is HIGH or (not Z is MEDIUM)?

The standard definitions in fuzzy logic are:

truth(not x) = 1.0 - truth(x)
truth(x and y) = minimum(truth(x), truth(y))
truth(x or y) = maximum(truth(x), truth(y))

Some researchers in fuzzy logic have explored the use of other interpretations of the AND and OR operations, but the definition for the NOT operation seems to be safe.

Note that if you plug just the values zero and one into these definitions, you get the same truth tables as you would expect from conventional Boolean logic. This is known as the EXTENSION PRINCIPLE, which states that the classical results of Boolean logic are recovered from fuzzy logic operations when all fuzzy membership grades are restricted to the traditional set {0, 1}. This effectively establishes fuzzy subsets and logic as a true generalization of classical set theory and logic.
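The three standard definitions, together with the extension-principle check that they collapse to Boolean logic on {0, 1}, can be sketched as follows (function names are my own):

```python
def f_not(x):
    return 1.0 - x

def f_and(x, y):
    return min(x, y)

def f_or(x, y):
    return max(x, y)

# Extension principle: restricted to truth values {0, 1}, the fuzzy
# operations reproduce the classical Boolean truth tables.
for x in (0.0, 1.0):
    assert f_not(x) == (not x)
    for y in (0.0, 1.0):
        assert f_and(x, y) == bool(x and y)
        assert f_or(x, y) == bool(x or y)

# Evaluate "X is LOW and Y is HIGH or (not Z is MEDIUM)" for some
# hypothetical degrees of truth:
x_low, y_high, z_medium = 0.3, 0.8, 0.6
result = f_or(f_and(x_low, y_high), f_not(z_medium))
print(result)  # max(min(0.3, 0.8), 1 - 0.6) = 0.4
```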
In fact, by this reasoning all crisp (traditional) subsets are fuzzy subsets of this very special type, and there is no conflict between fuzzy and crisp methods.

Some Examples

Assume the same definition of TALL as above, and in addition, assume that we have a fuzzy subset OLD defined by the membership function:
old(x) = { 0,                          if age(x) < 18 yr.
           (age(x) - 18 yr.) / 42 yr., if 18 yr. <= age(x) <= 60 yr.
           1,                          if age(x) > 60 yr. }

And for compactness, let

a = X is TALL and X is OLD
b = X is TALL or X is OLD
c = not (X is TALL)

Then we can compute the following values (for height 6.0 ft and age 50 years):

height   age   X is TALL   X is OLD   a     b      c
6.0      50    0.5         0.76       0.5   0.76   0.5

Uses of Fuzzy Logic

Fuzzy logic is used directly in very few applications. The Sony Palmtop apparently uses a fuzzy logic decision tree algorithm to perform handwritten (well, computer light pen) Kanji character recognition.

A Fuzzy Expert System

A fuzzy expert system is an expert system that uses a collection of fuzzy membership functions and rules, instead of Boolean logic, to reason about data. The rules in a fuzzy expert system are usually of a form similar to the following:

if x is low and y is high then z = medium

where x and y are input variables (names for known data values), z is an output variable (a name for a data value to be computed), low is a membership function (fuzzy subset) defined on x, high is a membership function defined on y, and medium is a membership function defined on z.

The antecedent (the rule's premise) describes to what degree the rule applies, while the conclusion (the rule's consequent) assigns a membership function to each of one or more output variables. Most tools for working with fuzzy expert systems allow more than one conclusion per rule. The set of rules in a fuzzy expert system is known as the rule base or knowledge base.

The general inference process proceeds in three (or four) steps.
1. Under FUZZIFICATION, the membership functions defined on the input variables are applied to their actual values, to determine the degree of truth for each rule premise.

2. Under INFERENCE, the truth value for the premise of each rule is computed and applied to the conclusion part of that rule. This results in one fuzzy subset being assigned to each output variable for each rule. Usually only MIN or PRODUCT is used as the inference rule. In MIN inferencing, the output membership function is clipped off at a height corresponding to the rule premise's computed degree of truth (fuzzy logic AND). In PRODUCT inferencing, the output membership function is scaled by the rule premise's computed degree of truth.

3. Under COMPOSITION, all of the fuzzy subsets assigned to each output variable are combined to form a single fuzzy subset for each output variable. Again, usually MAX or SUM is used. In MAX composition, the combined output fuzzy subset is constructed by taking the pointwise maximum over all of the fuzzy subsets assigned to the variable by the inference rule (fuzzy logic OR). In SUM composition, the combined output fuzzy subset is constructed by taking the pointwise sum over all of the fuzzy subsets assigned to the output variable by the inference rule.

4. Finally, there is the (optional) DEFUZZIFICATION, which is used when it is useful to convert the fuzzy output set to a crisp number. There are more defuzzification methods than you can shake a stick at (at least 30). Two of the more common techniques are the CENTROID and MAXIMUM methods. In the CENTROID method, the crisp value of the output variable is computed by finding the variable value of the center of gravity of the membership function for the fuzzy value.
In the MAXIMUM method, one of the variable values at which the fuzzy subset has its maximum truth value is chosen as the crisp value for the output variable.

Extended Example

Assume that the variables x, y, and z all take on values in the interval [0, 10], and that the following membership functions and rules are defined:

low(t) = 1 - (t / 10)
high(t) = t / 10

rule 1: if x is low and y is low then z is high
rule 2: if x is low and y is high then z is low
rule 3: if x is high and y is low then z is low
rule 4: if x is high and y is high then z is high

Notice that instead of assigning a single value to the output variable z, each rule assigns an entire fuzzy subset (low or high).

Notes:

1. In this example, low(t) + high(t) = 1.0 for all t. This is not required, but it is fairly common.
2. The value of t at which low(t) is maximum is the same as the value of t at which high(t) is minimum, and vice versa. This is also not required, but fairly common.
3. The same membership functions are used for all variables. This isn't required, and is also not common.

In the fuzzification subprocess, the membership functions defined on the input variables are applied to their actual values, to determine the degree of truth for each rule premise. The degree of truth for a rule's premise is sometimes referred to as its ALPHA. If a rule's premise has a nonzero degree of truth (if the rule applies at all...) then the rule is said to FIRE. For example,

x      y      low(x)  high(x)  low(y)  high(y)  alpha1  alpha2  alpha3  alpha4
0.0    0.0    1.0     0.0      1.0     0.0      1.0     0.0     0.0     0.0
0.0    3.2    1.0     0.0      0.68    0.32     0.68    0.32    0.0     0.0
0.0    6.1    1.0     0.0      0.39    0.61     0.39    0.61    0.0     0.0
0.0    10.0   1.0     0.0      0.0     1.0      0.0     1.0     0.0     0.0
3.2    0.0    0.68    0.32     1.0     0.0      0.68    0.0     0.32    0.0
6.1    0.0    0.39    0.61     1.0     0.0      0.39    0.0     0.61    0.0
10.0   0.0    0.0     1.0      1.0     0.0      0.0     0.0     1.0     0.0
3.2    3.1    0.68    0.32     0.69    0.31     0.68    0.31    0.32    0.31
3.2    3.3    0.68    0.32     0.67    0.33     0.67    0.33    0.32    0.32
10.0   10.0   0.0     1.0      0.0     1.0      0.0     0.0     0.0     1.0
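Putting the four steps together for this four-rule system, here is a minimal sketch using MIN inference, MAX composition, and centroid defuzzification over a sampled universe for z. The sampling resolution and function names are my own choices, not prescribed by the text:

```python
def low(t):
    return 1 - t / 10

def high(t):
    return t / 10

def infer(x, y, samples=101):
    """MIN inference + MAX composition + centroid defuzzification
    for the four rules, with z sampled over [0, 10]."""
    # FUZZIFICATION: degree of truth (alpha) of each rule premise.
    alpha1 = min(low(x), low(y))    # rule 1 -> z is high
    alpha2 = min(low(x), high(y))   # rule 2 -> z is low
    alpha3 = min(high(x), low(y))   # rule 3 -> z is low
    alpha4 = min(high(x), high(y))  # rule 4 -> z is high

    num = den = 0.0
    for i in range(samples):
        z = 10 * i / (samples - 1)
        # INFERENCE (MIN): clip each rule's output set at its alpha.
        # COMPOSITION (MAX): combine the clipped sets pointwise.
        mu = max(min(alpha1, high(z)), min(alpha2, low(z)),
                 min(alpha3, low(z)), min(alpha4, high(z)))
        # DEFUZZIFICATION (CENTROID): accumulate center-of-gravity sums.
        num += z * mu
        den += mu
    return num / den

print(round(infer(0.0, 0.0), 2))  # only rule 1 fires; centroid right of center, about 6.7
print(round(infer(5.0, 5.0), 2))  # all alphas equal; flat output set, centroid at 5.0
```

With x = y = 0 only rule 1 fires fully, so the output set is the HIGH ramp and the centroid lands above the midpoint of [0, 10]; symmetric inputs like x = y = 5 clip all four rules equally, the composed set is flat, and the centroid falls at 5.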