Hpkb year 1 results


AI Magazine Volume 19 Number 4 (1998) (© AAAI) Articles

The DARPA High-Performance Knowledge Bases Project

Paul Cohen, Robert Schrag, Eric Jones, Adam Pease, Albert Lin, Barbara Starr, David Gunning, and Murray Burke

Now completing its first year, the High-Performance Knowledge Bases Project promotes technology for developing very large, flexible, and reusable knowledge bases. The project is supported by the Defense Advanced Research Projects Agency and includes more than 15 contractors in universities, research laboratories, and companies. The evaluation of the constituent technologies centers on two challenge problems, in crisis management and battlespace reasoning, each demanding powerful problem solving with very large knowledge bases. This article discusses the challenge problems, the constituent technologies, and their integration and evaluation.

Copyright © 1998, American Association for Artificial Intelligence. All rights reserved.
0738-4602-1998 / $2.00 WINTER 1998 25

Although a computer has beaten the world chess champion, no computer has the common sense of a six-year-old child. Programs lack knowledge about the world sufficient to understand and adjust to new situations as people do. Consequently, programs have been poor at interpreting and reasoning about novel and changing events, such as international crises and battlefield situations. These problems are more open ended than chess. Their solution requires shallow knowledge about motives, goals, people, countries, adversarial situations, and so on, as well as deeper knowledge about specific political regimes, economies, geographies, and armies.

The High-Performance Knowledge Base (HPKB) Project is sponsored by the Defense Advanced Research Projects Agency (DARPA) to develop new technology for knowledge-based systems.¹ It is a three-year program, ending in fiscal year 1999, with funding totaling $34 million. HPKB technology will enable developers to rapidly build very large knowledge bases (on the order of 10^6 rules, axioms, or frames), enabling a new level of intelligence for military systems. These knowledge bases should be comprehensive and reusable across many applications, and they should be maintained and modified easily. Clearly, these goals require innovation in many areas, from knowledge representation to formal reasoning and special-purpose problem solving, from knowledge acquisition to information gathering on the web to machine learning, from natural language understanding to semantic integration of disparate knowledge bases.

For roughly one year, HPKB researchers have been developing knowledge bases containing tens of thousands of axioms concerning crises and battlefield situations. Recently, the technology was tested in a month-long evaluation involving sets of open-ended test items, most of which were similar to sample (training) items but otherwise novel. Changes to the crisis and battlefield scenarios were introduced during the evaluation to test the comprehensiveness and flexibility of knowledge in the HPKB systems. The requirement for comprehensive, flexible knowledge about general scenarios forces knowledge bases to be large. Challenge problems, which define the scenarios and thus drive knowledge base development, are a central innovation of HPKB. This article discusses HPKB challenge problems, technologies and integrated systems, and the evaluation of these systems.

The challenge problems require significant developments in three broad areas of knowledge-based technology. First, the overriding goal of HPKB (to be able to select, compose, extend, specialize, and modify components from a library of reusable ontologies, common domain theories, and generic problem-solving strategies) is not immediately achievable and requires some research into the foundations of very large knowledge bases, particularly research in knowledge representation and ontological engineering. Second, there is the problem of building on these foundations to populate very large knowledge bases. The goal is for collaborating teams of domain experts (who might lack training in computer science) to easily extend the foundation theories, define additional domain theories and problem-solving strategies, and acquire domain facts. Third, because knowledge is not enough, one also requires efficient problem-solving methods. HPKB supports research on efficient, general inference methods and optimized task-specific methods.

HPKB is a timely impetus for knowledge-based technology, although some might think it overdue. Some of the tenets of HPKB were voiced in 1987 by Doug Lenat and Ed Feigenbaum (Lenat and Feigenbaum 1987), and some have been around for longer. Lenat’s CYC Project has also contributed much to our understanding of large knowledge bases and ontologies. Now, 13 years into the CYC Project and more than a decade after Lenat and Feigenbaum’s paper, there seems to be consensus on the following points:

The first and most intellectually taxing task when building a large knowledge base is to design an ontology. If you get it wrong, you can expect ongoing trouble organizing the knowledge you acquire in a natural way. Whenever two or more systems are built for related tasks (for example, medical expert systems, planning, modeling of physical processes, scheduling and logistics, natural language understanding), the architects of the systems realize, often too late, that someone else has already done, or is in the process of doing, the hard ontological work. HPKB challenges the research community to share, merge, and collectively develop large ontologies for significant military problems. However, an ontology alone is not sufficient. Axioms are required to give meaning to the terms in an ontology. Without them, users of the ontology can interpret the terms differently.

Most knowledge-based systems have no common sense; so, they cannot be trusted. Suppose you have a knowledge-based system for scheduling resources such as heavy-lift helicopters, and none of its knowledge concerns noncombatant evacuation operations. Now, suppose you have to evacuate a lot of people. Lacking common sense, your system is literally useless. With a little common sense, it could not only support human planning but might be superior to it because it could think outside the box and consider using the helicopters in an unconventional way. Common sense is needed to recognize and exploit opportunities as well as avoid foolish mistakes.

Often, one will accept an answer that is roughly correct, especially when the alternatives are no answer at all or a very specific but wrong answer. This is Lenat and Feigenbaum’s breadth hypothesis: “Intelligent performance often requires the problem solver to fall back on increasingly general knowledge, and/or to analogize to specific knowledge from far-flung domains” (Lenat and Feigenbaum 1987, p. 1173). We must, therefore, augment high-power knowledge-based systems, which give specific and precise answers, with weaker but adequate knowledge and inference. The inference methods might not all be sound and complete. Indeed, one might need a multitude of methods to implement what Polya called plausible inference. HPKB encompasses work on a variety of logical, probabilistic, and other inference methods.

It is one thing to recognize the need for commonsense knowledge, another to integrate it seamlessly into knowledge-based systems. Lenat observes that ontologies often are missing a middle level, the purpose of which is to connect very general ontological concepts such as human and activity with domain-specific concepts such as the person who is responsible for navigating a B-52 bomber. Because HPKB is grounded in domain-specific tasks, the focus of much ontological engineering is this middle layer.

The Participants

The HPKB participants are organized into three groups: (1) technology developers, (2) integration teams, and (3) challenge problem developers. Roughly speaking, the integration teams build systems with the new technologies to solve challenge problems. The integration teams are led by SAIC and Teknowledge. Each integration team fields systems to solve challenge problems in an annual evaluation. University participants include Stanford University, Massachusetts Institute of Technology (MIT), Carnegie Mellon University (CMU), Northwestern University, University of Massachusetts (UMass), George Mason University (GMU), and the University of Edinburgh (AIAI). In addition, SRI International, the University of Southern California Information Sciences Institute (USC-ISI), the Kestrel Institute, and TextWise, Inc., have developed important components. Information Extraction and Transport (IET), Inc., with Pacific Sierra Research (PSR), Inc., developed and evaluated the crisis-management challenge problem, and Alphatech, Inc., is responsible for the battlespace challenge problem.
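The middle-level ontology and the breadth hypothesis discussed in the introduction can be made concrete with a toy is-a hierarchy. The Python sketch below is illustrative only: the concept names, the facts, and the `lookup` fallback strategy are hypothetical and are not taken from CYC or any HPKB system. A query about a B-52 navigator is answered from the most specific concept that holds relevant knowledge, falling back on increasingly general concepts when nothing specific is known; the middle-layer concepts (`Aircrew`, `MilitaryPerson`) are what connect the domain-specific concept to the general ones.

```python
# Hypothetical toy ontology: concept names and facts are invented for illustration.

# Child -> parent ("is-a") links. Thing/Agent/Human form a very general upper
# level; Aircrew and MilitaryPerson are the middle layer that ties the
# domain-specific concept B52Navigator to it.
IS_A = {
    "B52Navigator": "Aircrew",
    "Aircrew": "MilitaryPerson",
    "MilitaryPerson": "Human",
    "Human": "Agent",
    "Agent": "Thing",
}

# Facts stored at different levels of generality, keyed by (concept, attribute).
FACTS = {
    ("Aircrew", "works_in"): "aircraft",
    ("Human", "needs"): "food and sleep",
    ("Agent", "can_perform"): "activities",
}


def ancestors(concept):
    """Yield the concept itself, then each more general concept up the is-a chain."""
    while concept is not None:
        yield concept
        concept = IS_A.get(concept)


def lookup(concept, attribute):
    """Answer from the most specific concept that has a fact for `attribute`,
    falling back on increasingly general concepts. Returns (value, source_concept),
    or None when no level of the hierarchy knows anything about `attribute`."""
    for level in ancestors(concept):
        if (level, attribute) in FACTS:
            return FACTS[(level, attribute)], level
    return None


print(lookup("B52Navigator", "works_in"))  # ('aircraft', 'Aircrew')
print(lookup("B52Navigator", "needs"))     # ('food and sleep', 'Human')
print(lookup("B52Navigator", "salary"))    # None
```

If `B52Navigator` were not linked into the hierarchy through the middle-layer concepts, the general fact about human needs would be unreachable from it, which is the gap Lenat describes.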
Challenge Problems

A programmatic innovation of HPKB is challenge problems. The crisis-management challenge problem, developed by IET and PSR, is designed to exercise broad, relatively shallow knowledge about international tensions. The battlespace challenge problem, developed by Alphatech, Inc., has two parts, each designed to exercise relatively specific knowledge about activities in armed conflicts. Movement analysis involves interpreting vehicle movements detected and tracked by idealized sensors. The workaround problem is concerned with finding military engineering solutions to traffic-obstruction problems, such as destroyed bridges and blocked tunnels.

Good challenge problems must satisfy several, often conflicting, criteria. A challenge problem must be challenging: It must raise the bar for both technology and science. A problem that requires only technical ingenuity will not hold the attention of the technology developers, nor will it help the United States maintain its preeminence in science. Equally important, a challenge problem for a DARPA program must have clear significance to the United States Department of Defense (DoD). Challenge problems should serve for the duration of the program, becoming more challenging each year. This continuity is preferable to designing new problems every year because the infrastructure to support challenge problems is expensive.

A challenge problem should require little or no access to military subject-matter experts. It should not introduce a knowledge-acquisition bottleneck that results in delays and low productivity from the technology developers. As much as possible, the problem should be solvable with accessible, open-source material. A challenge problem should exercise all (or most) of the contributions of the technology developers, and it should exercise an integration of these technologies. A challenge problem should have unambiguous criteria for evaluating its solutions. These criteria need not be so objective that one can write algorithms to score performance (for example, human judgment might be needed to assess scores), but they must be clear, and they must be published early in the program. In addition, although performance is important, challenge problems that value performance above all else encourage “one-off” solutions (a solution developed for a specific problem, once only) and discourage researchers from trying to understand why their technologies work well and poorly. A challenge problem should provide a steady stream of results, so progress can be assessed not only by technology developers but also by DARPA management and involved members of the DoD community.

The HPKB challenge problems are designed to support new and ongoing DARPA initiatives in intelligence analysis and battlespace information systems. Crisis-management systems will assist strategic analysts by evaluating the political, economic, and military courses of action available to nations engaged at various levels of conflict. Battlespace systems will support operations officers and intelligence analysts by inferring militarily significant targets and sites, reasoning about road network trafficability, and anticipating responses to military strikes.

Crisis-Management Challenge Problem

The crisis-management challenge problem is intended to drive the development of broad, relatively shallow commonsense knowledge bases to facilitate intelligence analysis. The client program at DARPA for this problem is Project GENOA (Collaborative Crisis Understanding and Management). GENOA is intended to help analysts more rapidly understand emerging international crises to preserve U.S. policy options. Proactive crisis management, undertaken before a situation has evolved into a crisis that might engage the U.S. military, enables more effective responses than reactive management. Crisis-management systems will assist strategic analysts by evaluating the political, economic, and military courses of action available to nations engaged at various levels of conflict.

The challenge problem development team worked with GENOA representatives to identify areas for the application of HPKB technology. This work took three or four months, but the crisis-management challenge problem specification has remained fairly stable since its initial release in draft form in July 1997.

The first step in creating the challenge problem was to develop a scenario to provide context for intelligence analysis in time of crisis. To ensure that the problem would require development of real knowledge about the world, the scenario includes real national actors with a fictional yet plausible story line. The scenario, which takes place in the Persian Gulf, involves hostilities between Saudi Arabia and Iran that culminate in closing the Strait of Hormuz to international shipping.

Next, IET worked with experts at PSR to develop a description of the intelligence analysis process, which involves the following tasks: information gathering (what happened), situation assessment (what does it mean), and scenario development (what might happen next). Situation assessment (or interpretation) includes factors that pertain to the specific situation at hand, such as motives, intents, risks, rewards, and ramifications, and factors that make up a general context, or “strategic culture,” for a state actor’s behavior in international relations, such as capabilities, interests, policies, ideologies, alliances, and enmities. Scenario development, or speculative prediction, starts with the generation of plausible actions for each actor. Then, options are evaluated with respect to the same factors used for situation assessment, and a likelihood rating is produced. The most plausible actions are reported back to policy makers.

These analytic tasks afford many opportunities for knowledge-based systems. One is to use knowledge bases to retain or multiply corporate expertise; another is to use knowledge and reasoning to “think outside the box,” to generate analytic possibilities that a human analyst might overlook. The latter task requires extensive commonsense knowledge, or “analyst’s sense,” about the domain to rule out implausible options.

The crisis-management challenge problem includes an informal specification for a prototype crisis-management assistant to support analysts. The assistant is tested by asking questions. Some are simple requests for factual information; others require the assistant to interpret the actions of nations in the context of strategic culture. Actions are motivated by interests, balancing risks and rewards. They have impacts and require capabilities. Interests drive the formation of alliances, the exercise of influence, and the generation of tensions among actors. These factors play out in a current and a historical context. Crises can be represented as events or as larger episodes tracking the evolution of a conflict over time, from inception or trigger, through any escalation, to eventual resolution or stasis. The representations being developed in HPKB are intended to serve as a crisis corporate memory to help analysts discover historical precedents and analogies for actions. Much of the challenge-problem specification is devoted to sample questions that are intended to drive the development of general models for reasoning about crisis events.

Sample questions are embedded in an analytic context. The question “What might happen next?” is instantiated as “What might happen following the Saudi air strikes?” as shown in figure 1. Q51 is refined to Q83 in a way that is characteristic of the analytic process; that is, higher-level questions are refined into sets of lower-level questions that provide detail.

III. What of significance might happen following the Saudi air strikes?
B. Options evaluation
Evaluate the options available to Iran.
Close the Strait of Hormuz to shipping.
Evaluation: Probable
Motivation: Respond to Saudi air strikes and deter future strikes.
Capability:
  (Q51) Can Iran close the Strait of Hormuz to international shipping?
  (Q83) Is Iran capable of firing upon tankers in the Strait of Hormuz? With what weapons?
Negative outcomes:
  (Q53) What risks would Iran face in closing the strait?
Figure 1. Sample Questions Pertaining to the Responses to an Event.

The challenge-problem developers (IET with PSR) developed an answer key for sample questions, a fragment of which is shown in figure 2. Although simple factual questions (for example, “What is the gross national product of the United States?”) have just one answer, questions such as Q53 usually have several. The answer key actually lists five answers, two of which are shown in figure 2. Each is accompanied by suitable explanations, including source material. The first source (Convention on the Law of the Sea) is electronic. IET maintains a web site with links to pages that are expected to be useful in answering the questions. The second source is a fragment of a model developed by IET and published in the challenge-problem specification. IET developed these fragments to indicate the kinds of reasoning they would be testing in the challenge problem.

Answer(s):
1. Economic sanctions from {Saudi Arabia, GCC, U.S., U.N.}
  • The closure of the Strait of Hormuz would violate an international norm promoting freedom of the seas and would jeopardize the interests of many states.
  • In response, states might act unilaterally or jointly to impose economic sanctions on Iran to compel it to reopen the strait.
  • The United Nations Security Council might authorize economic sanctions against Iran.
2. Limited military response from {Saudi Arabia, GCC, U.S., others}…
Source(s):
  • The Convention on the Law of the Sea.
  • (B5) States may act unilaterally or collectively to isolate and/or punish a group or state that violates international norms. Unilateral and collective action can involve a wide range of mechanisms, such as intelligence collection, military retaliation, economic sanction, and diplomatic censure/isolation.
Figure 2. Part of the Answer Key for Question 53.

For the challenge-problem evaluation, held in June 1998, IET developed a way to generate test questions through parameterization. Test questions deviate from sample questions in specified, controlled ways, so the teams participating in the challenge problem know the space of questions from which test items will be selected. This space includes billions of questions, so the challenge problem cannot be solved by relying on question-specific knowledge. The teams must rely on general knowledge to perform well in the evaluation. (Semantics provide practical constraints on the number of reasonable instantiations of parameterized questions, as do online sources provided by IET.) To illustrate, Q53 is parameterized in figure 3. Parameterized question 53, PQ53, actually covers 8 of the roughly 100 sample questions in the specification.

PQ53 [During/After <TimeInterval>,] what {risks, rewards} would <InternationalAgent> face in <InternationalActionType>?
<InternationalActionType> =
  {[exposure of its] {supporting, sponsoring} <InternationalAgentType> in <InternationalAgent2>,
  successful terrorist attacks against <InternationalAgent2>’s <EconomicSector>,
  <InternationalActionType>(PQ51),
  taking hostage citizens of <InternationalAgent2>,
  attacking targets <SpatialRelationship> <InternationalAgent2> with <Force>}
<InternationalAgentType> =
  {terrorist group, dissident group, political party, humanitarian organization}
Figure 3. A Parameterized Question Suitable for Generating Sample Questions and Test Questions.

Parameterized questions and associated class definitions are based on natural language, giving the integration teams responsibility for developing (potentially different) formal representations of the questions. This decision was made at the request of the teams. An instance of a parameterized question, say, PQ53, is mechanically generated; then the teams must create a formal representation and reason with it, without human intervention.

Battlespace Challenge Problems

The second challenge-problem domain for HPKB is battlespace reasoning. Battlespace is an abstract notion that includes not only the physical geography of a conflict but also the plans, goals, and activities of all combatants prior to, and during, a battle and during the activities leading to the battle. Three battlespace programs within DARPA were identified as potential users of HPKB technologies: (1) the dynamic multiinformation fusion program, (2) the dynamic database program, and (3) the joint forces air-component commander (JFACC) program. Two battlespace challenge problems have been developed.

The Movement-Analysis Challenge Problem

The movement-analysis challenge problem concerns high-level analysis of idealized sensor data, particularly the airborne JSTARS moving target indicator radar. This Doppler radar can generate vast quantities of information: one reading every minute for each vehicle in motion within a 10,000-square-mile area.² The movement-analysis scenario involves an enemy mobilizing a full division of ground forces (roughly 200 military units and 2000 vehicles) to defend against a possible attack. A simulation of the vehicle movements of this division was developed, the output of which includes reports of the positions of all the vehicles in the division at 1-minute intervals over a 4-day period for 18 hours each day. These military vehicle movements were then interspersed with plausible civilian traffic to add the problem of distinguishing military from nonmilitary traffic. The movement-analysis task is to monitor the movements of the enemy to detect and identify types of military site and convoy.

Because HPKB is not concerned with signal processing, the inputs are not real JSTARS data but are instead generated by a simulator and preprocessed into vehicle tracks. There is no uncertainty in vehicle location and no radar shadowing, and each vehicle is always accurately identified by a unique bumper number. However, vehicle tracks do not precisely identify vehicle type but instead define each vehicle as either light wheeled, heavy wheeled, or tracked. Low-speed and stationary vehicles are not reported.

Vehicle-track data are supplemented by small quantities of high-value intelligence data, including accurate identification of a few key enemy sites, electronic intelligence reports of locations and times at which an enemy radar is turned on, communications intelligence reports that summarize information obtained by monitoring enemy communications, and human intelligence reports that provide detailed information about the numbers and types of vehicle passing a given location. Other inputs include a detailed road network in electronic form and an order of battle that describes the structure and composition of the enemy forces in the scenario region.

Given these inputs, movement analysis comprises the following tasks:

First is to distinguish military from nonmilitary traffic. Almost all military traffic travels in convoys, which makes this task fairly straightforward except for very small convoys of two or three vehicles.

Second is to identify the sites between which military convoys travel, determine which of these sites are militarily significant, and determine the type of each militarily significant site. Site types include battle positions, command posts, support areas, air-defense sites, artillery sites, and assembly-staging areas.

Third is to identify which units (or parts of units) in the enemy order of battle are participating in each military convoy.

Fourth is to determine the purpose of each convoy movement. Purposes include reconnaissance, movement of an entire unit toward a battle position, activities by command elements, and support activities.

Fifth is to infer the exact types of the vehicles that make up each convoy. About 20 types of military vehicle are distinguished in the enemy order of battle, all of which show up in the scenario data.

To help the technology base and the integration teams develop their systems, a portion of the simulation data was released in advance of the evaluation phase, accompanied by an answer key that supplied model answers for each of the inference tasks listed previously.

Movement analysis is currently carried out manually by human intelligence analysts, who appear to rely on models of enemy behavior at several levels of abstraction. These include models of how different sites or convoys are structured for different purposes and models of military systems such as logistics (supply and resupply). For example, in a logistics model, one might find the following fragment: “Each echelon in a military organization is responsible for resupplying its subordinate echelons. Each echelon, from battalion on up, has a designated area for storing supplies. Supplies are provided by higher echelons and transshipped to lower echelons at these areas.” Model fragments such as these are thought to constitute the knowledge of intelligence analysts and, thus, should be the content of HPKB movement-analysis systems. Some such knowledge was elicited from military intelligence analysts during programwide meetings. These same analysts also scripted the simulation scenario.
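The logistics model fragment quoted above can be written down as an explicit rule. The Python sketch below is a hypothetical encoding, not the representation used by any HPKB team: it assumes a simple echelon tree in which every echelon has a named supply area (the unit and site names such as `SA-1` are invented placeholders), derives the resupply movements the rule predicts, and uses them to suggest a purpose for a convoy observed between two sites.

```python
from dataclasses import dataclass, field


@dataclass
class Echelon:
    """A unit in a (hypothetical) order of battle with its designated supply area."""
    name: str
    supply_area: str
    subordinates: list = field(default_factory=list)


def expected_resupply_routes(echelon):
    """Apply the model fragment recursively: each echelon resupplies each of its
    subordinate echelons, moving supplies from its own supply area to theirs.
    Returns (from_area, to_area, supplier, receiver) tuples."""
    routes = []
    for sub in echelon.subordinates:
        routes.append((echelon.supply_area, sub.supply_area, echelon.name, sub.name))
        routes.extend(expected_resupply_routes(sub))
    return routes


def explain_convoy(origin, destination, routes):
    """Return the (supplier, receiver) pairs whose predicted resupply movement
    matches a convoy observed traveling from `origin` to `destination`."""
    return [(sup, rec) for (a, b, sup, rec) in routes if (a, b) == (origin, destination)]


# Invented two-brigade division; site names are placeholders, not scenario data.
div = Echelon("Division", "SA-D", [
    Echelon("1st Brigade", "SA-1", [Echelon("1st Battalion", "SA-11")]),
    Echelon("2nd Brigade", "SA-2", [Echelon("2nd Battalion", "SA-21")]),
])
routes = expected_resupply_routes(div)

print(len(routes))                              # 4
print(explain_convoy("SA-1", "SA-11", routes))  # [('1st Brigade', '1st Battalion')]
print(explain_convoy("SA-11", "SA-2", routes))  # []
```

A convoy that matches no predicted route is not thereby ruled out; it simply needs a different model fragment (reconnaissance, unit movement, and so on) to explain its purpose.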
  7. 7. ArticlesThe Workaround Challenge Problem battle damage are carried out by Army engi-The workaround challenge problem supports air- neers; so, this description takes the form of acampaign planning by the JFACC and his/her detailed engineering order of battle.staff. One task for the JFACC is to determine All input are provided in a formal represen-suitable targets for air strikes. Good targets tation language.allow one to achieve maximum military effect The workaround generator is expected towith minimum risk to friendly forces and min- provide three output: First is a reconstitutionimum loss of life on all sides. Infrastructure schedule, giving the capacity of the damagedoften provides such targets: It can be sufficient link as a function of time since the damage wasto destroy supplies at a few key sites or critical inflicted. For example, the workaround gener-nodes in a transportation network, such as ator might conclude that the capacity of thebridges along supply routes. However, bridges link is zero for the first 48 hours, but thereafter,and other targets can be repaired, and there is a temporary bridge will be in place that canlittle point in destroying a bridge if an avail- sustain a capacity of 170 vehicles an hour. Sec-able fording site is nearby. If a plan requires an ond is a time line of engineering actions that theinterruption in traffic of several days, and the enemy might carry out to implement thebridge can be repaired in a few hours, then repair, the time these actions require, and theanother target might be more suitable. Target temporal constraints among them. If thereselection, then, requires some reasoning about appears to be more than one viable repair strat-how an enemy might “work around” the dam- egy, a time line should be provided for each.age to the target. 
The task of the workaround challenge problem is to automatically assess how rapidly and by what method an enemy can reconstitute or bypass damage to a target and, thereby, help air-campaign planners rapidly choose effective targets. The focus of the workaround problem in the first year of HPKB is automatic workaround generation.

The workaround task involves detailed representation of targets and the local terrain around the target and detailed reasoning about actions the enemy can take to reconstitute or bypass this damage. Thus, the input to workaround systems includes the following elements:

First is a description of a target (for example, a bridge or a tunnel), the damage to it (for example, one span of a bridge is dropped; the bridge and vicinity are mined), and key features of the local terrain (for example, the slope and soil types of a terrain cross section coincident with the road near the bridge, together with the maximum depth and the speed of any river or stream the bridge crosses).

Second is a specific enemy unit or capability to be interdicted, such as a particular armored battalion or supply trucks carrying ammunition.

Third is a time period over which this unit or capability is to be denied access to the targeted route. The presumption is that the enemy will try to repair the damage within this time period; a target is considered to be effective if there appears to be no way for the enemy to make this repair.

Fourth is a detailed description of the enemy resources in the area that could be used to repair the damage. For the most part, repairs to …

Third is a set of required assets for each time line of actions, a description of the engineering resources that are used to repair the damage and pointers to the actions in the time line that utilize these assets. The reconstitution schedule provides the minimal information required to evaluate the suitability of a given target. The time line of actions provides an integrated explanation to justify the reconstitution schedule. The set of required assets is easily derived from the time line of actions and can be used to suggest further targets for preemptive air strikes against the enemy to frustrate its repair efforts.

A training data set was provided to help developers build their systems. It supplied input and output for several sample problems, together with detailed descriptions of the calculations carried out to compute action durations; lists of simplifying assumptions made to facilitate these calculations; and pointers to text sources for information on engineering resources and their use (mainly Army field manuals available on the World Wide Web).

Workaround generation requires detailed knowledge about what the capabilities of the enemy’s engineering equipment are and how it is typically used by enemy forces. For example, repairing damage to a bridge typically involves mobile bridging equipment, such as armored vehicle-launched bridges (AVLBs), medium girder bridges, Bailey bridges, or float bridges such as ribbon bridges or M4T6 bridges, together with a range of earthmoving equipment such as bulldozers. Each kind of mobile bridge takes a characteristic amount of time to deploy, requires different kinds of bank preparation, and is “owned” by different echelons in the military hierarchy, all of which
affect the time it takes to bring the bridge to a damage site and effect a repair. Because HPKB operates in an entirely unclassified environment, U.S. engineering resources and doctrine were used throughout. Information from Army field manuals was supplemented by a series of programwide meetings with an Army combat engineer, who also helped construct sample problems and solutions.

Integrated Systems

The challenge problems are solved by integrated systems fielded by integration teams led by Teknowledge and SAIC. Teknowledge favors a centralized architecture that contains a large commonsense ontology (CYC); SAIC has a distributed architecture that relies on sharing specialized domain ontologies and knowledge bases, including a large upper-level ontology based on the merging of CYC, SENSUS, and other knowledge bases.

Teknowledge Integration

The Teknowledge integration team comprises Teknowledge, Cycorp, and Kestrel Institute. Its focus is on semantic integration and the creation of massive amounts of knowledge.

Semantic Integration

Three issues make software integration difficult. Transport issues concern mechanisms to get data from one process or machine to another. Solutions include sockets, remote-method invocation (RMI), and CORBA. Syntactic issues concern how to convert number formats, “syntactic sugar,” and the labels of data. The more challenging issues concern semantic integration: To integrate elements properly, one must understand the meaning of each. The database community has addressed this issue (Wiederhold 1996); it is even more pressing in knowledge-based systems.

The current state of practice in software integration consists largely of interfacing pairs of systems, as needed. Pairwise integration of this kind does not scale up, unanticipated uses are hard to cover later, and chains of integrated systems at best evolve into stovepipe systems. Each integration is only as general as it needs to be to solve the problem at hand.

Some success has been achieved in low-level integration and reuse; for example, systems that share scientific subroutine libraries or graphics packages are often forced into similar representational choices for low-level data. DARPA has invested in early efforts to create reuse libraries for integrating large systems at higher levels. Some effort has gone into expressing a generic semantics of plans in an object-oriented format (Pease and Carrico 1997a, 1997b), and applications of this generic semantics to domain-specific tasks are promising (Pease and Albericci 1998). The development of ontologies for integrating manufacturing planning applications (Tate 1998) and work flow (Lee et al. 1996) is ongoing.

Another option for semantic integration is software mediation (Park, Gennari, and Musen 1997). This software mediation can be seen as a variant on pairwise integration, but because integration is done by knowledge-based means, one has an explicit expression of the semantics of the conversion. Researchers at Kestrel Institute have successfully defined formal specifications for data and used these theories to integrate formally specified software. In addition, researchers at Cycorp have successfully applied CYC to the integration of multiple databases.

The Teknowledge approach to integration is to share knowledge among applications and create new knowledge to support the challenge problems. Teknowledge is defining formal semantics for the input and output of each application and the information in the challenge problems.

Many concepts defy simple definitions. Although there has been much success in defining the semantics of mathematical concepts, it is harder to be precise about the semantics of the concepts people use every day. These concepts seem to acquire meaning through their associations with other concepts, their use in situations and communication, and their relations to instances. To give the concepts in our integrated system real meaning, we must provide a rich set of associations, which requires an extremely large knowledge base. CYC offers just such a knowledge base.

CYC (Lenat 1995; Lenat and Guha 1990) consists of an immense, multicontextual knowledge base; an efficient inference engine; and associated tools and interfaces for acquiring, browsing, editing, and combining knowledge. Its premise is that knowledge-based software will be less brittle if and only if it has access to a foundation of basic commonsense knowledge. This semantic substratum of terms, rules, and relations enables application programs to cope with unforeseen circumstances and situations.

The CYC knowledge base represents millions of hand-crafted axioms entered during the 13 years since CYC’s inception. Through careful policing and generalizing, there are now slightly fewer than 1 million axioms in the knowledge base, interrelating roughly 50,000
atomic terms. Fewer than two percent of these axioms represent simple facts about proper nouns of the sort one might find in an almanac. Most embody general consensus information about the concepts. For example, one axiom says one cannot perform volitional actions while one sleeps, another says one cannot be in two places at once, and another says you must be at the same place as a tool to use it. The knowledge base spans human capabilities and limitations, including information on emotions, beliefs, expectations, dreads, and goals; common everyday objects, processes, and situations; and the physical universe, including such phenomena as time, space, causality, and motion.

The CYC inference engine comprises an epistemological and a heuristic level. The epistemological level is an expressive nth-order logical language with clean formal semantics. The heuristic level is a set of some three dozen special-purpose modules that each contains its own algorithms and data structures and can recognize and handle some commonly occurring sorts of inference. For example, one heuristic-level module handles temporal reasoning efficiently by converting temporal relations into a before-and-after graph and then doing graph searching rather than theorem proving to derive an answer. A truth maintenance system and an argumentation-based explanation and justification system are tightly integrated into the system and are efficient enough to be in operation at all times. In addition to these inference engines, CYC includes numerous browsers, editors, and consistency checkers. A rich interface has been defined.

Crisis-Management Integration

The crisis-management challenge problem involves answering test questions presented in a structured grammar. The first step in answering a test question is to convert it to a form that CYC can reason with, a declarative decision tree. When the tree is applied to the test question input, a CYC query is generated and sent to CYC.

Answering the challenge problem questions takes a great deal of knowledge. For the first year’s challenge problem alone, the Cycorp and Teknowledge team added some 8,000 concepts and 80,000 assertions to CYC. To meet the needs of this challenge problem, the team created significant amounts of new knowledge, some developed by collaborators and merged into CYC, some added by automated processes.

The Teknowledge integrated system includes two natural language components: START and TextWise. The START system was created by Boris Katz (1997) and his group at MIT. For each of the crisis-management questions, Teknowledge has developed a template into which user-specified parameters can be inserted. START parses English queries for a few of the crisis-management questions to fill in these templates. Each filled template is a legal CYC query. TextWise Corporation has been developing natural language information-retrieval software primarily for news articles (Liddy, Paik, and McKenna 1995). Teknowledge intends to use the TextWise knowledge base information tools (KNOW-IT) to supply many instances to CYC of facts discovered from news stories. The system can parse English text and return a series of binary relations that express the content of the sentences. There are several dozen relation types, and the constants that instantiate each relation are WORDNET synset mappings (Miller et al. 1993). Each of the concepts has been mapped to a CYC expression, and a portion of WORDNET has been mapped to CYC. For those synsets not in CYC, the WORDNET hyponym links are traversed until a mapped CYC term is found.

Battlespace Integration

Teknowledge supported the movement-analysis and workaround problems.

Movement analysis: Several movement-analysis systems were to be integrated, and much preliminary integration work was done. Ultimately, the time pressure of the challenge problem evaluation precluded a full integration. The MIT and UMass movement-analysis systems are described briefly here; the SMI and SRI systems are described in the section entitled SAIC Integrated Systems.

The MIT MAITA system provides tools for constructing and controlling networks of distributed-monitoring processes. These tools provide access to large knowledge bases of monitoring methods, organized around the hierarchies of tasks performed, knowledge used, contexts of application, the alerting of utility models, and other dimensions. Individual monitoring processes can also make use of knowledge bases representing commonsense or expert knowledge in conducting their reasoning or reporting their findings. MIT built monitoring processes for sites and convoys with these tools.

The UMass group tried to identify convoys and sites with very simple rules. Rules were developed for three site types: (1) battle positions, (2) command posts, and (3) assembly-staging areas. The convoy detector simply looked for vehicles traveling at fixed distances from one another. Initially, UMass was going to recognize convoys from their dynamics, in which the distances between vehicles fluctuate in a characteristic way, but in the simulated
data, the distances between vehicles remained fixed. UMass also intended to detect sites by the dynamics of vehicle movements between them, but no significant dynamic patterns could be found in the movement data.

Workarounds: Teknowledge developed two workaround integrations, one an internal Teknowledge system, the other from AIAI at the University of Edinburgh.

Teknowledge developed a planning tool based on CYC, essentially a wrapper around CYC’s existing knowledge base and inference engine. A plan is a proof that there is a path from the final goal to the initial situation through a partially ordered set of actions. The rules in the knowledge base driving the planner are rules about action preconditions and about which actions can bring about a certain state of affairs. There is no explicit temporal reasoning; the partial order of temporal precedence between actions is established on the basis of the rules about preconditions and effects.

The planner is a new kind of inference engine, performing its own search but in a much smaller search space. However, each step in the search involves interaction with the existing inference engine by hypothesizing actions and microtheories and doing asks and asserts in these microtheories. This hypothesizing and asserting on the fly in effect amounts to dynamically updating the knowledge base in the course of inference; this capability is new for the CYC inference engine.

Consistent with the goals of HPKB, the Teknowledge workaround planner reused CYC’s knowledge, although it was not knowledge specific to workarounds. In fact, CYC had never been the basis of a planner before, so even stating things in terms of an action’s preconditions was new. What CYC provided, however, was a rich basis on which to build workaround knowledge. For example, the Teknowledge team needed to write only one rule to state “to use something as a device you must have control over that device,” and this rule covered the cases of using an M88 to clear rubble, a mine plow to breach a minefield, a bulldozer to cut into a bank or narrow the gap, and so on. The reason one rule can cover so many cases is that clearing rubble, demining an area, narrowing a gap, and cutting into a bank are all specializations of IntrinsicStateChangeEvent, an extant part of the CYC ontology.

The AIAI workaround planner was also implemented in CYC and took data from Teknowledge’s FIRE&ISE-TO-MELD translator as its input. The central idea was to use the scriptlike structure of workaround plans to guide the reasoning process. For this reason, a hierarchical task network (HTN) approach was taken. A planning-specific ontology was defined within the larger CYC ontology, and planning rules only referenced concepts within this more constrained context. The planning application was essentially embedded in CYC.

CYC had to be extended to represent composite actions that have several alternative decompositions and complex preconditions-effects. Although it is not a commonsense approach, AIAI decided to explore HTN planning because it appeared suitable for the workaround domain. It was possible to represent actions, their conditions and effects, the plan-node network, and plan resources in a relational style. The structure of a plan was implicitly represented in the proof that the corresponding composite action was a relation between particular sets of conditions and effects. Once proved, action relations are retained by CYC and are potentially reusable.

An advantage of implementing the AIAI planner in CYC was the ability to remove brittleness from the planner-input knowledge format; for example, it was not necessary to account for all the possible permutations of argument order in predicates such as bordersOn and between.

SAIC Integrated System

SAIC built an HPKB integrated knowledge environment (HIKE) to support both crisis-management and battlespace challenge problems. The architecture of HIKE for crisis management is shown in figure 4. For battlespace, the architecture is similar in that it is distributed and relies on the open knowledge base connectivity (OKBC) protocol, but of course, the components integrated by the battlespace architecture are different. HIKE’s goals are to address the distributed communications and interoperability requirements among the HPKB technology components (knowledge servers, knowledge-acquisition tools, question-and-answering systems, problem solvers, process monitors, and so on) and provide a graphic user interface (GUI) tailored to the end users of the HPKB environment.

HIKE provides a distributed computing infrastructure that addresses two types of communications needs: First are input and output data-transportation and software connectivities. These include connections between the HIKE server and technology components, connections between components, and connections between servers. HIKE encapsulates information content and data transportation through JAVA objects, hypertext transfer protocol (HTTP), remote-method invocation (JAVA
RMI), and database access (JDBC). Second, HIKE provides for knowledge-content assertion and distribution and query requests to knowledge services.

The OKBC protocol proved essential. SRI used it to interface the theorem prover SNARK to an OKBC server storing the Central Intelligence Agency (CIA) World Fact Book knowledge base. Because this knowledge base is large, SRI did not want to incorporate it into SNARK but instead used the procedural attachment feature of SNARK to look up facts that were available only in the World Fact Book. MIT’s START system used OKBC to connect to SRI’s OCELOT-SNARK OKBC server. This connection will eventually give users the ability to pose questions in English, which are then transformed to a formal representation by START and shipped to SNARK using OKBC; the result is returned using OKBC. ISI built an OKBC server for their LOOM system for GMU. SAIC built a front end to the OKBC server for LOOM that was extensively used by the members of the battlespace challenge problem team.

With OKBC and other methods, the HIKE infrastructure permits the integration of new technology components (either clients or servers) in the integrated end-to-end HPKB system without introducing major changes, provided that the new components adhere to the specified protocols.

Figure 4. SAIC Crisis-Management Challenge Problem Architecture. (Diagram labels: Electronic WWW archival sources; Real time news feeds; Question Answering; User Interface; KNOW-IT; SKC; START; ATP; SNARK; SPOOK; HIKE Servers; OKBC (Open Knowledge Base Connectivity) BUS; HIKE Clients; Analyst; GKB-Editor; webKB; Ontolingua; WWW; Knowledge Services; Training Data; Knowledge Engineer.)

Crisis-Management Integration

The SAIC crisis-management architecture is focused around a central OKBC bus, as shown in figure 4. The technology components provide user interfaces, question answering, and knowledge services. Some components have overlapping roles. For example, MIT’s START system serves both as a user interface and a question-answering component. Similarly, CMU’s
WEBKB supports both question answering and knowledge services.

HIKE provides a form-based GUI with which users can construct queries with pull-down menus. Query-construction templates correspond to the templates defined in the crisis-management challenge problem specification. Questions also can be entered in natural language. START and the TextWise component accept natural language queries and then attempt to answer the questions. To answer questions that involve more complex types of reasoning, START generates a formal representation of the query and passes it to one of the theorem provers.

The Stanford University Knowledge Systems Laboratory ONTOLINGUA, SRI International’s graphic knowledge base (GKB) editor, WEBKB, and TextWise provide the knowledge service components. The GKB editor is a graphic tool for browsing and editing large knowledge bases, used primarily for manual knowledge acquisition. WEBKB supports semiautomatic knowledge acquisition. Given some training data and an ontology as input, a web spider searches in a directed manner and populates instances of classes and relations defined in the ontology. Probabilistic rules are also extracted. TextWise extracts information from text and newswire feeds, converting them into knowledge interchange format (KIF) triples, which are then loaded into ONTOLINGUA. ONTOLINGUA is SAIC’s central knowledge server and information repository for the crisis-management challenge problem. ONTOLINGUA supports KIF as well as compositional modeling language (CML). Flow models developed by Northwestern University (NWU) answer challenge problem questions related to world oil-transportation networks and reside within ONTOLINGUA. Stanford’s system for probabilistic object-oriented knowledge (SPOOK) provides a language for class frames to be annotated with probabilistic information, representing uncertainty about the properties of instances in this class. SPOOK is capable of reasoning with the probabilistic information based on Bayesian networks.

Question answering is implemented in several ways. SRI International’s SNARK and Stanford’s abstract theorem prover (ATP) are first-order theorem provers. WEBKB answers questions based on the information it gathers. Question answering is also accomplished by START and TextWise taking a query in English as input and using information retrieval to extract the answers from text-based sources (such as the web, newswire feeds).

The guiding philosophy during knowledge base development for crisis management was to reuse knowledge whenever it made sense. The SAIC team reused three knowledge bases: (1) the HPKB upper-level ontology developed by Cycorp, (2) the World Fact Book knowledge base from the CIA, and (3) the Units and Measures Ontology from Stanford. Reusing the upper-level ontology required translation, comprehension, and reformulation. The ontology was released in MELD (a language used by Cycorp) and was not directly readable by the SAIC system. In conjunction with Stanford, SRI developed a translator to load the upper-level ontology into any OKBC-compliant server. Once the ontology was loaded into the OCELOT server, the GKB editor was used to comprehend it. The graphic nature of the GKB editor illuminated the interrelationships between classes and predicates of the upper-level ontology. Because the upper-level ontology represents functional relationships as predicates but SNARK reasons efficiently with functions, it was necessary to reformulate the ontology to use functions whenever a functional relationship existed.

Battlespace Integration

The distributed HIKE infrastructure is well suited to support an integrated battlespace challenge problem as it was originally designed: a single information system for movement analysis, trafficability, and workaround reasoning. However, the trafficability problem (establishing routes for various kinds of vehicle given the characteristics) was dropped, and the integration of the other problems was delayed. The components that solved these problems are described briefly later.

Movement analysis: The movement-analysis problem is solved by MIT’s monitoring, analysis, and interpretation tools arsenal (MAITA); Stanford University’s Section on Medical Informatics’ (SMI) problem-solving methods; and SRI International’s Bayesian networks. The MIT effort was described briefly in the section entitled Teknowledge Integration. Here, we focus on the SMI and SRI movement-analysis systems.

For scalability, SMI adopted a three-layered approach to the challenge problem: The first layer consisted primarily of simple, context-free data processing that attempted to find important preliminary abstractions in the data set. The most important of these were traffic centers (locations that were either the starting or stopping points for a significant number of vehicles) and convoy segments (a number of vehicles that depart from the same traffic center at roughly the same time, going in roughly the same direction). Spotting these abstractions required setting a number of parameters (for example, how big a traffic center is). Once trained, these first-layer algorithms are linear in
the size of the data set and enabled SMI to use knowledge-intensive techniques on the resulting (much smaller) set of data abstractions.

The second layer was a repair layer, which used knowledge of typical convoy behaviors and locations on the battlespace to construct a “map” of militarily significant traffic and traffic centers. The end result was a network of traffic connected by traffic. Three main tasks remain: (1) classify the traffic centers, (2) figure out what the convoys are doing, and (3) identify which units are involved. SMI iteratively answered these questions by using repeated layers of heuristic classification and constraint satisfaction. The heuristic-classification components operated independently of the network, using known (and deduced) facts about single convoys or traffic centers. Consider the following rule for trying to identify a main supply brigade (MSB) site (paraphrased into English, with abstractions in boldface):

If we have a current site which is unclassified
and it’s in the Division support area,
and the traffic is high enough
and the traffic is balanced
and the site is persistent with no major deployments emanating from it
then it’s probably an MSB

SMI used similar rules for the constraint-satisfaction component of its system, allowing information to propagate through the network in a manner similar to Waltz’s (1975) well-known constraint-satisfaction algorithm for edge labeling.

The goal of the SRI group was to induce a knowledge base to characterize and identify types of site such as command posts and battle positions. Its approach was to induce a Bayesian classifier and use a generative model approach, producing a Bayesian network that could serve as a knowledge base. This modeling required transforming raw vehicle tracks into features (for example, the frequency of certain vehicles at sites, number of stops) that could be used to predict sites. Thus, it was also necessary to have hypothetical sites to test. SRI relied on SMI to provide hypothetical sites, and it also used some of the features that SMI computed. As a classifier, SRI used tree-augmented naive (TAN) Bayes (Friedman, Geiger, and Goldszmidt 1997).

Workarounds: The SAIC team integrated two approaches to workaround generation, one developed by USC-ISI, the other by GMU.

ISI developed course-of-action–generation problem solvers to create alternative solutions to workaround problems. In fact, two alternative course-of-action–generation problem solvers were developed. One is a novel planner whose knowledge base is represented in the ontologies, including its operators, state descriptions, and problem-specific information. It uses a novel partial-match capability developed in LOOM (MacGregor 1991). The other is based on a state-of-the-art planner (Veloso et al. 1995). Each solution lists several engineering actions for this workaround (for example, deslope the banks of the river, install a temporary bridge), includes information about the sources used (for example, what kind of earthmoving equipment or bridge is used), and asserts temporal constraints among the individual actions to indicate which can be executed in parallel. A temporal estimation-assessment problem solver evaluates each of the alternatives and selects one as the most likely choice for an enemy workaround. This problem solver was developed in EXPECT (Swartout and Gil 1995; Gil 1994).

Several general battlespace ontologies (for example, military units, vehicles), anchored on the HPKB upper ontology, were used and augmented with ontologies needed to reason about workarounds (for example, engineering equipment). Besides these ontologies, the knowledge bases used included a number of problem-solving methods to represent knowledge about how to solve the task. Both ontologies and problem-solving knowledge were used by two main problem solvers.

EXPECT’s knowledge-acquisition tools were used throughout the evaluation to detect missing knowledge. EXPECT uses problem-solving knowledge and ontologies to analyze which information is needed to solve the task. This capability allows EXPECT to alert a user when there is missing knowledge about a problem (for example, unspecified bridge lengths) or a situation. It also helps debug and refine ontologies by detecting missing axioms and overgeneral definitions.

GMU developed the DISCIPLE98 system. DISCIPLE is an apprenticeship multistrategy learning system that learns from examples, from explanations, and by analogy and can be taught by an expert how to perform domain-specific tasks through examples and explanations in a way that resembles how experts teach apprentices (Tecuci 1998). For the workaround domain, DISCIPLE was extended into a baseline-integrated system that creates an ontology by acquiring concepts from a domain expert or importing them (through OKBC) from shared ontologies. It learns task-decomposition rules from a domain expert and uses this knowledge to solve workaround problems through hierarchical nonlinear planning.
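A minimal hierarchical task network (HTN) sketch in Python can illustrate how task-decomposition rules turn a workaround task into engineering actions. It is not DISCIPLE's rule language or knowledge base. The task names, the `mined` and `gap_m` state fields, and the `SHORT_GAP_LIMIT_M` threshold are hypothetical. For simplicity the sketch also returns a totally ordered action list, whereas DISCIPLE's planner is nonlinear (partially ordered).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]

# Hypothetical threshold separating short gaps from long ones (illustration only).
SHORT_GAP_LIMIT_M = 18.0


@dataclass
class Method:
    """A task-decomposition rule: when applies(state) holds, task expands into subtasks."""
    task: str
    applies: Callable[[State], bool]
    subtasks: List[str]


# Hypothetical primitive engineering actions; these are not decomposed further.
PRIMITIVES = {"clear-mines", "deslope-banks", "emplace-avlb", "emplace-ribbon-bridge"}

# Hypothetical rules for reconstituting a crossing after a bridge span is dropped.
METHODS = [
    Method("reconstitute-crossing", lambda s: s["mined"], ["clear-mines", "bridge-gap"]),
    Method("reconstitute-crossing", lambda s: not s["mined"], ["bridge-gap"]),
    Method("bridge-gap", lambda s: s["gap_m"] <= SHORT_GAP_LIMIT_M, ["emplace-avlb"]),
    Method("bridge-gap", lambda s: s["gap_m"] > SHORT_GAP_LIMIT_M,
           ["deslope-banks", "emplace-ribbon-bridge"]),
]


def decompose(task: str, state: State) -> Optional[List[str]]:
    """Expand task depth-first into primitive actions; return None if no rule applies."""
    if task in PRIMITIVES:
        return [task]
    for method in METHODS:
        if method.task != task or not method.applies(state):
            continue
        plan: List[str] = []
        for sub in method.subtasks:
            subplan = decompose(sub, state)
            if subplan is None:
                break  # this rule cannot be completed; try the next applicable rule
            plan.extend(subplan)
        else:
            return plan  # every subtask was decomposed successfully
    return None


if __name__ == "__main__":
    print(decompose("reconstitute-crossing", {"mined": True, "gap_m": 30.0}))
```

For a mined site with a 30-meter gap, the sketch prints `['clear-mines', 'deslope-banks', 'emplace-ribbon-bridge']`. A nonlinear planner would instead keep ordering constraints only where one action depends on another, so independent actions could run in parallel.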
First, with DISCIPLE’s ontology-building tools, a domain expert assisted by a knowledge engineer built the object ontology from several sources, including expert’s manuals, Alphatech’s FIRE&ISE document and ISI’s LOOM ontology. Second, a task taxonomy was defined by refining the task taxonomy provided by Alphatech. This taxonomy indicates principled decompositions of generic workaround tasks into subtasks but does not indicate the conditions under which such decompositions should be performed. Third, the examples of hierarchical workaround plans provided by Alphatech were used to teach DISCIPLE. Each such plan provided DISCIPLE with specific examples of decompositions of tasks into subtasks, and the expert guided DISCIPLE to “understand” why each task decomposition was appropriate in a particular situation. From these examples and the explanations of why they are appropriate in the given situations, DISCIPLE learned general task-decomposition rules. After a knowledge base consisting of an object ontology and task-decomposition rules was built, the hierarchical nonlinear planner of DISCIPLE was used to automatically generate workaround plans for new workaround problems.

Evaluation

The SAIC and Teknowledge integrated systems for crisis management, movement analysis, and workarounds were tested in an extensive study in June 1998. The study followed a two-phase, test-retest schedule. In the first phase, the systems were tested on problems similar to those used for system development, but in the second, the problems required significant modifications to the systems. Within each phase, the systems were tested and retested on the same problems. The test at the beginning of each phase established a baseline level of performance, but the test at the end measured improvement during the phase.

We claim that HPKB technology facilitates rapid modification of knowledge-based systems. This claim was tested in both phases of the experiment because each phase allows time to improve performance on test problems. Phase 2 provides a more stringent test: Only some of the phase 2 problems can be solved by the phase 1 systems, so the systems were expected to perform poorly in the test at the beginning of phase 2. The improvement in performance on these problems during phase 2 is a direct measure of how well HPKB technology facilitates knowledge capture, representation, merging, and modification.

Each challenge problem is evaluated by different metrics. The test items for crisis management were questions, and the test was similar to an exam. Overall competence is a function of the number of questions answered correctly, but the crisis-management systems are also expected to “show their work” and provide justifications (including sources) for their answers. Examples of questions, answers, and justifications for crisis management are shown in the section entitled Crisis-Management Challenge Problem.

Performance metrics for the movement-analysis problem are related to recall and precision. The basic problem is to identify sites, vehicles, and purposes given vehicle track data; so, performance is a function of how many of these entities are correctly identified and how many incorrect identifications are made. In general, identifications can be marked down on three dimensions: First, the identified entity can be more or less like the actual entity; second, the location of the identified entity can be displaced from the actual entity’s true location; and third, the identification can be more or less timely.

The workaround problem involves generating workarounds to military actions such as bombing a bridge. Here, the criteria for successful performance include coverage (the generation of all workarounds generated), appropriateness (the generation of workarounds appropriate given the action), specificity (the exact implementation of the workaround), and accuracy of timing inferences (the length each step in the workaround takes to implement).

Performance evaluation, although essential, tells us relatively little about the HPKB integrated systems, still less about the component technologies. We also want to know why the systems perform well or poorly. Answering this question requires credit assignment because the systems comprise many technologies. We also want to gather evidence pertinent to several important, general claims. One claim is that HPKB facilitates rapid construction of knowledge-based systems because ontologies and knowledge bases can be reused. The challenge problems by design involve broad, relatively shallow knowledge in the case of crisis management and deep, fairly specific knowledge in the battlespace problems. It is unclear which kind of problem most favors the reuse claim and why. We are developing analytic models of reuse. Although the predictions of these models will not be directly tested in the first year’s evaluation, we will gather data to calibrate these models for a later evaluation.
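The recall-and-precision framing for movement analysis, and the test-retest measure of improvement, can be sketched in a few lines of Python. This is a simplification under stated assumptions, not the HPKB scoring procedure. An identification counts as correct only when it exactly matches a ground-truth identifier, and the site identifiers in the example are hypothetical. As a result, the three mark-down dimensions described for movement analysis (type similarity, location displacement, timeliness) are ignored. Improvement is taken simply as the retest score minus the baseline score.

```python
from typing import Set, Tuple


def recall_precision(identified: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Recall: share of actual entities that were identified.
    Precision: share of identifications that are correct."""
    correct = len(identified & actual)
    recall = correct / len(actual) if actual else 0.0
    precision = correct / len(identified) if identified else 0.0
    return recall, precision


def improvement(baseline: float, retest: float) -> float:
    """Gain within a phase: score on the retest minus score on the initial test."""
    return retest - baseline


if __name__ == "__main__":
    actual_sites = {"cp-1", "cp-2", "msb-1", "bp-1"}    # hypothetical ground truth
    identified_sites = {"cp-1", "msb-1", "bp-7"}        # hypothetical system output
    r, p = recall_precision(identified_sites, actual_sites)
    print(f"recall={r:.2f} precision={p:.2f}")
```

Here two of the four actual sites are found (recall 0.50) and two of the three identifications are correct (precision 0.67). The two numbers are kept separate because a system can raise recall simply by reporting more sites, at the cost of precision.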
Results of the Challenge Problem Evaluation

We present the results of the crisis-management evaluation first, followed by the results of the battlespace evaluation.

Crisis Management

The evaluation of the SAIC and Teknowledge crisis-management systems involved seven trials, or batches, of roughly 110 questions each. Because both systems answered every batch, more than 1500 answers were manually graded by the challenge problem developer, IET, and by subject-matter experts at PSR, on criteria ranging from correctness, to completeness of source material, to the quality of the representation of the question. Each question in a batch was posed in English, accompanied by the syntax of the corresponding parameterized question (figure 3). The crisis-management systems were supposed to translate these questions into an internal representation: MELD for the Teknowledge system and KIF for the SAIC system. The MELD translator was operational for all the trials; the KIF translator was used to a limited extent on later trials.

The first trial involved testing the systems on the sample questions that had been available for several months for training. The remaining trials implemented the “test and retest with scenario modification” strategy discussed earlier. The first batch of test questions, TQA, was repeated four days later as a retest; it was designated TQA’ for scoring purposes. The difference in scores between TQA and TQA’ represents improvements in the systems. After solving the questions in TQA’, the systems tackled a new set, TQB, designed to be “close to” TQA. The purpose of TQB was to check whether the improvements to the systems generalized to new questions. After a short break, a modification was introduced into the crisis-management scenario, and new fragments of knowledge about the scenario were released. Then the cycle repeated: A new batch of questions, TQC, tested how well the systems coped with the scenario modification; after four days, the systems were retested on the same questions, TQC’, and on the same day, a final batch, TQD, was released and answered.

Each question in a trial was scored according to several criteria, some official and others optional. The four official criteria were (1) the correctness of the answer, (2) the quality of the explanation of the answer, (3) the completeness and quality of cited sources, and (4) the quality of the representation of the question. The optional criteria included lay intelligibility of explanations, novelty of assumptions, quality of the presentation of the explanation, the automatic production by the system of a representation of the question, source novelty, and reconciliation of multiple sources. Each question could garner a score between 0 and 3 on each criterion, and the criteria were themselves weighted. Some questions had multiple parts, and the number of parts was a further weighting criterion. In retrospect, it might have been clearer to assign each question a percentage of the points available, thus standardizing all scores, but in the data that follow, scores are on an open-ended scale. Subject-matter experts were assisted with scoring the quality of knowledge representations when necessary.

A web-based form was developed for scoring, with clear instructions on how to assign scores. For example, on the correct-answer criterion, the subject-matter expert was instructed to award “zero points if no top-level answer is provided and you cannot infer an intended answer; one point for a wrong answer without any convincing arguments, or most required answer elements; two points for a partially correct answer; three points for a correct answer addressing most required elements.”

When you consider the difficulty of the task, both systems performed remarkably well. Scores on the sample questions were relatively high, which is not surprising because these questions had been available for training for several months (figure 5). It is also not surprising that scores on the first batch of test questions (TQA) were not high. It is gratifying, however, to see how scores improve steadily between test and retest (TQA and TQA’, TQC and TQC’) and that these gains are general: The scores on TQA’ and TQB, and on TQC’ and TQD, were similar.

The scores designated “auto” in figure 5 refer to questions that were translated automatically from English into a formal representation. The Teknowledge system translated all questions automatically; the SAIC system translated very few. Initially, the Teknowledge team did not manipulate the resulting representations, but in later batches, they permitted themselves minor modifications. The effects of these can be seen in the differences between TQB and TQB-Auto, TQC and TQC-Auto, and TQD and TQD-Auto.

Although the scores of the Teknowledge and SAIC systems appear close in figure 5, differences between the systems appear in other views of the data. Figure 6 shows the performance of the systems on all official questions plus a few optional questions. Although these extra questions widen the gap between the systems, the real effect comes from adding optional components to the scores. Here,