Semantic alignment paper



The AQUA Question-Answering System: Bi-Directional Dynamic Semantic Alignment in a Multiple-Ontology Environment

Maureen Caudill, Barbara Starr, Teresa Buss
SAIC Enterprise Information Systems Division
10260 Campus Point Drive, MS C-2
San Diego, CA 92121

ABSTRACT
The AQUA Question Answering System uses two separate ontologically based systems in its operation. The first system, a knowledge-based information extraction system, derives the content from text documents (and queries) and converts it into an internal text meaning representation (TMR). The second ontologically based system is the answer formulation unit, which maintains a separate ontology in a different form from the first. Answers produced by the answer formulation system are in Knowledge Interchange Format (KIF). These answers must be converted back into the text meaning representation, where an answer generation unit can translate them into natural language responses to the user query. The key technical challenge of this system is translating bi-directionally between the two ontologies while each is changing dynamically in the course of operations. This paper describes our technical solution to this problem. Furthermore, the approach we have taken can be extended to other ontologically based systems, or to systems with more than two ontologies. It is generic in that the specific ontologies involved in the solution do not determine the nature of the semantic alignment process.

Keywords: Semantic Alignment, Knowledge-Based System Development, Question-Answering Systems, Event Extraction, Event Understanding, Automatic Event Extraction.

1. BACKGROUND
The AQUA Question Answering System is based on two separate ontologically based systems using independently developed ontologies. Ontological development is developer-specific: it depends strongly on the perspective of the person or persons developing the ontology. As a result, two ontologies developed independently by different people are unlikely to have direct correspondences.

Yet at the same time, ontologies reflect the reality of the world. Thus there typically must be some ontological term that reflects basic concepts such as “seeing,” “saying,” “feeling,” “eating,” and so on. The challenge is to find a single method for aligning these multiple world views without requiring changes in
either ontology in the system. The system itself must be able to adapt to the ontologies used in its subparts.

In developing the AQUA system, we have tried to achieve exactly this type of semantic alignment in a flexible, semi-automated way that enables independent development of the text extraction unit, the reasoner unit, and the answer generation unit.

In the sections that follow, we describe the dynamic semantic alignment system developed for AQUA. First, we describe the conversion of text data sources into knowledge; next, we present the question-answering process to illustrate the operation of the dynamic semantic alignment system; finally, we present the architecture of the dynamic semantic alignment system, along with commentary on how it can be extended for use with more than two ontologies.

2. CONVERTING TEXT INTO KNOWLEDGE
A knowledge-based question-answering system such as AQUA bases its answers on the knowledge contained within its knowledge bases. These answers are developed by encoding the contents of text documents and other sources into knowledge representations that can be used to infer answers to queries. Thus, before any query can be answered, the knowledge bases must be populated with significant, relevant data. This process is commonly called “information extraction,” though in fact it is much more complex than simply extracting nouns and verbs from sentences.

In the case of AQUA, we make use of a sophisticated knowledge-based information extraction system that is based on a narrow-but-deep ontology concept. Figure 1 illustrates how this process works. A set of documents relevant to a specific knowledge domain is collected. Each of these documents is processed by the information extraction system, which generates an intermediate, Interlingua representation of the knowledge. This extraction requires the use of the first ontology in the system.

Figure 1. Converting text into knowledge. [Figure labels: Large Number of Possible Data Sources; Select Relevant Data Sources; Retrieve Data From Sources; Convert Data Into Knowledge; Ontology Translator; KIF Knowledge Representations; Knowledge Stored into OKS; Domain Knowledge KBs. The example representation shown is a KIF defobject for an attack event performed by Iran, with Iraq as defender, taking place in May 1993, documented from the CIA World Fact Book.]

As with other ontologically based systems, the “ontology” actually consists of three interconnected sets of definitions. The first is the conceptual ontology (the class structure) of the concepts known to the system. For example, a “sedan” is a subclass of the ontological concept “automobile,” and so on. The second set of constructs is the lexicon: the specific words that are understood by the system and classified as lexical terms in the ontological system. Thus, “Ford-Taurus” might be a lexical term that is an instance of the class “sedan.” Finally, the system has relations that define the linkages between two or more lexical constructs. Thus the relation “owned-by” connects the lexical terms “Ford-Taurus” and “Tom,” and would be derived from a sentence such as “Tom owns a Ford Taurus.” Correctly translating between two ontological systems therefore requires semantically aligning all three types of constructs: ontological classes, lexical terms, and lexical relations.

The text-to-knowledge conversion process first considers the relations established by the information extraction system. Within those relations, slots are filled by various lexical terms that are instances of various ontological classes. Each term must independently be considered and translated into the corresponding term in the other
ontology. If no such lexical term exists, one must be created for it in the other ontological class structure. This process requires considering the ontological inheritance of the term in the first ontology and comparing it to the ontological class structure in the second ontology. When the most specific possible match is found, that match becomes the basis for generating a new lexical term for the missing item. More details of this knowledge conversion process are presented in the companion paper to this one, [12] “Event Templates for Improved Narrative Understanding in Question Answering Systems,” also presented in these proceedings.

Once the information extracted from the documents has been translated into the new ontological system and stored as knowledge representations, the system is ready to answer questions about the documents it has processed.

3. PROCESSING A QUERY
Query processing in AQUA follows the dataflow shown in Figure 2. It requires two translations: the first translates the query from the information extraction system’s Interlingua into KIF, and the second translates the answer from KIF back into Interlingua (from which, in a separate process, it is converted into a natural language response to the user).

The user begins the question-answering operation by entering a natural language question. The query processor translates the query into an Interlingua form, which is then translated into KIF. The KIF query is presented to the Answer Formulation system, which infers the correct answer based on the knowledge content of the documents it has processed combined with the world knowledge contained within its knowledge bases. That answer is provided in KIF, which is then translated back into the Interlingua form. The Interlingua response subsequently goes to an answer generation system that generates a natural language reply to the original question.

Figure 2. Processing a query. [Figure labels, in dataflow order: NL Query; Query Processor; Interlingua Query; Interlingua-to-KIF Query Translation; KIF Query; Answer Formulation System; KIF Answer; KIF-to-Interlingua Answer Translation; Interlingua Answer; Answer Generation System; NL Answer.]

This query response process thus requires two translations, one from the Interlingua representation to KIF (the query) and one from KIF to the Interlingua representation (the answer).

As with translating between natural languages, reversing the direction of translation (i.e., KIF to Interlingua rather than Interlingua to KIF) is not as simple as merely substituting the reverse concepts. The answer to a complex query may easily contain concepts that do not appear in any previously processed document or query. To complete these translations accurately, it is necessary to maintain separate Interlingua-to-KIF and KIF-to-Interlingua mappings. Moreover, each of these ontological systems is dynamic and mutable, so the process of mapping between them must be similarly dynamic.

The following section describes the dynamic semantic alignment system.

4. DYNAMIC SEMANTIC ALIGNMENT SYSTEM
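The requirement just stated (separate, dynamic mappings for each translation direction) and the lookup-then-backtrack procedure walked through in this section can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not AQUA's actual implementation: the names `DirectionalMap`, `align`, and `to_kif_defobject`, the default `max_generations` limit of 3, the breadth-first climb over parent links, and the `review_queue` list standing in for flagging a knowledge engineer are all hypothetical. It does not model how a specific subclass (such as “saying”) is chosen beneath a mapped ancestor; in the usage lines that choice is entered by hand.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DirectionalMap:
    """Hypothetical one-direction mapping table (e.g. Interlingua -> KIF).

    The reverse direction gets its own instance, because mappings are not
    necessarily bi-directional.
    """
    entries: dict[str, set[str]] = field(default_factory=dict)

    def add(self, source_term: str, target_term: str) -> None:
        # A term may map to several targets, so each value is a set.
        self.entries.setdefault(source_term, set()).add(target_term)

    def lookup(self, source_term: str) -> set[str]:
        # Return a copy so callers cannot mutate the stored table.
        return set(self.entries.get(source_term, ()))


def align(term, parents, mapping, max_generations=3, review_queue=None):
    """Return (target terms, matched source term) for `term`.

    1. Reuse a stored mapping for `term` if one exists.
    2. Otherwise climb the source ontology's parent links breadth first,
       at most `max_generations` levels, until a mapped ancestor is found;
       its targets are the classes under which `term` belongs.
    3. If nothing is found, queue `term` for a knowledge engineer.
    """
    known = mapping.lookup(term)
    if known:
        return known, term
    frontier = deque((p, 1) for p in parents.get(term, []))
    seen = set()
    while frontier:
        ancestor, depth = frontier.popleft()
        if ancestor in seen or depth > max_generations:
            continue
        seen.add(ancestor)
        targets = mapping.lookup(ancestor)
        if targets:
            return targets, ancestor
        frontier.extend((p, depth + 1) for p in parents.get(ancestor, []))
    if review_queue is not None:
        review_queue.append(term)
    return set(), None


def to_kif_defobject(object_id, classes):
    """Emit one instance-of clause per mapped class (a "double definition"
    when there are two), in sorted order for stable output."""
    lines = [f"(defobject {object_id}"]
    lines += [f"  (instance-of {object_id} {c})" for c in sorted(classes)]
    lines[-1] += ")"
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical Interlingua fragment: speech-act's parent is communication-act.
    interlingua_parents = {"speech-act": ["communication-act"],
                           "communication-act": ["event"]}
    il_to_kif = DirectionalMap()
    il_to_kif.add("communication-act", "communicating")  # e.g. a batch-mode seed

    targets, matched = align("speech-act", interlingua_parents, il_to_kif)
    print(targets, matched)  # {'communicating'} communication-act

    # Once the specific KIF terms are confirmed, store both for speech-act.
    il_to_kif.add("speech-act", "saying")
    il_to_kif.add("speech-act", "inform-speech-act")
    print(to_kif_defobject("Event-E1", il_to_kif.lookup("speech-act")))
```

Keeping one `DirectionalMap` per direction reflects the point that Interlingua-to-KIF and KIF-to-Interlingua mappings are not necessarily inverses; a KIF-to-Interlingua table would be a second, independently populated instance.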
The dynamic semantic alignment system uses databases to keep track of the correct translations across ontologies. Five key databases are used in this process, illustrated in Figure 3.

Figure 3. Dynamic semantic alignment system. [Figure components: Ontology Definition Tables, which define the ontologies that the dynamic semantic alignment system knows about; Ontology Tables (Interlingua) and Ontology Tables (KIF), which include separate tables for ontological classes, lexical terms, and relations; Ontology Mapping Tables, which contain mappings from one ontology to the other (mappings are not necessarily bi-directional); Processed Document Tables, which maintain information about which lexical and ontological terms were defined in which documents.]

The first database, the Ontology Definition database, keeps track of which ontological systems are known. It contains key information about the ontology tables and other data needed to access the ontological information.

Each of the known ontologies is stored separately in a corresponding Ontology Database, which is partitioned into ontological classes, lexical terms, and relations. The Ontology Databases also record the immediate parent class of each term and the documentation for those terms. This information is critical because processing new documents will often generate new lexical instances. When new terms are added to the lexicon (or to the ontology), they are annotated with the identification number of the document in which they were first encountered. See [12] for several examples of lexical terms defined in the process of representing the knowledge of a document.

The key translation database is the Ontology Mapping database. As mappings from one ontological system to another are discovered, they are stored here. This allows much more efficient processing of the same term if it is encountered again.

The final database is the Processed Documents database. This keeps track of those documents that have been translated from one system to another. This database also keeps track of what new terms, if any, are defined in each document. While not essential for the translation process, keeping track of where new terms are encountered allows the system to quickly identify which documents are relevant to a specific query using that term. For example, if a document defines a term “us-special-forces-in-philippines” and a later query asks about the purpose of having U.S. Special Forces troops in the Philippines, it will be easy to identify exactly which document is relevant to that query. This significantly reduces the burden on the inferencing system by providing automatic dynamic partitioning of the knowledge base.

The dynamic semantic alignment process consists of a multi-stage comparison between the two ontological systems. A step-by-step example will illuminate the methodology. The example below assumes that the semantic alignment process has just begun and the system has few mappings between ontologies to work with.

When a term must be translated from Interlingua to KIF, the first step is to look up the term in the Interlingua-to-KIF Ontology Mapping table. If the term has been translated before, the result of that translation is stored in this table. For example, suppose the term is “speech-act” in Interlingua. What is the corresponding term in KIF for that concept?

If the Interlingua term has never been encountered before, the next step is to consider the class structure for “speech-act” in the Interlingua ontology. Since each ontological term includes information on its immediate parent(s), this is easy to look up in the Interlingua ontology tables. The parent of “speech-act” might be, for example, “communication-act.”

Again, the parent is compared to the mapping tables to see if a correlation has been discovered between
“communication-act” and some concept in the KIF ontology. This comparison might discover a link between the Interlingua “communication-act” and the KIF “communicating” action. This implies that the proper link to KIF would be some subclass or instance of “communicating.” Under “communicating” there are several possible correlations, including “saying.” An alternative discovery may also identify the correct linkage as an “inform-speech-act.”

If no mappings are discovered after backtracking through the inheritance tree for a specified number of generations, or until a specific ontological level is reached, a message is flagged to a knowledge engineer, who must enter an appropriate mapping by hand. This permits oversight of the ontological mapping method, and the hand entry need be done only once for any given mapping.

Although this makes the process semi-automated, allowing knowledge engineers to oversee the mapping process ensures that the correct mappings are made at fundamental levels. Preliminary mappings can also be made in a basic batch mode when the ontologies are first entered. It is not at all necessary to map every concept to every other concept; only the most basic concepts and actions need be mapped by hand to give the system a head start in mapping between ontologies.

In this example, two potential mappings exist, one to “saying” and one to “inform-speech-act.” Both mappings would be entered into the mapping table, ensuring that as little information as possible is lost from the original document. This generates mapping table entries for “speech-act” → “saying” and “speech-act” → “inform-speech-act.” (In this instance, the “inform-speech-act” mapping would very likely be discovered automatically by the system without human intervention. Assuming no mappings exist between parent ontological entries, discovery of the “saying” mapping is less likely without intervention.) When the Interlingua is converted to a knowledge representation, the result is a double definition:

    (defobject Event-E1
      (instance-of Event-E1 inform-speech-act)
      (instance-of Event-E1 saying))

The double definition has the advantage that if a query about this event is phrased so that it maps only to “inform-speech-act,” this event will still be identified as a match. Similarly, a query that maps only to “saying” will also be identified as a match.

If the entry is a locally defined term (i.e., defined within this document by a variable name in Interlingua together with a local definition of that variable), and if no corresponding entry can be found, the dynamic semantic alignment system will generate a new lexical term based on the definition of the Interlingua term. The new term will be an instance of some ontological class, so a mapping is first made between its parent class in Interlingua and the corresponding class in KIF. Once that mapping is ascertained, a new lexical entry corresponding to the Interlingua entry is made and the mapping table is updated to reflect the new entry. This also updates the lexical tables for KIF, as well as the processed document tables, to note that the new term was defined in this document.

Because of the generic methods used to identify mappings, it would be entirely possible to implement a third mapping system if it were deemed appropriate. For example, if the answer generation system used an ontology separate from those of the answer formulation system and the information extraction system, the dynamic semantic alignment system could easily be extended to handle that set of translations.

The system could equally be extended to map to more than one other ontological system at any stage of the effort. This would permit, for example, a direct comparison between the answers produced by two or more separate answer formulation systems, or a comparison of the quality of the natural language answers generated by two or more answer generation systems.

5. CONCLUSIONS
The semi-automated dynamic semantic alignment system provides a methodology for translating between two ontologies in a highly consistent manner. Although knowledge engineer input is required when the system is first brought online, fewer and fewer instances must be hand-mapped as corresponding concepts are discovered between the systems. In addition, the system can process changes to the ontologies because new terms and classes are noted whenever an ontological system changes. This in turn invalidates any previous mappings for the changed terms so that the mapping process can begin again for them. A separate mapping tool keeps knowledge engineers apprised of any changes to the ontologies and presents them with a list of previously existing mappings for those terms, if any, for editorial review and alteration. The knowledge engineer can then determine whether changes to those mappings are appropriate or whether the ontological changes do not justify changing the mappings. This is much more efficient than having a person review all ontological mappings every time there is an ontology update: only the mappings of changed terms (in either ontology) are reviewed for editing.

The AQUA Question Answering system is under development and has shown preliminary results of being
able to accurately and efficiently map across two ontological systems.

6. REFERENCES
[1] A. Farquhar, R. Fikes, and J. P. Rice. “A Collaborative Tool for Ontology Construction.” International Journal of Human Computer Studies, 46:707-727, 1997.
[2] P. D. Karp, V. K. Chaudhri, and S. M. Paley. “A Collaborative Environment for Authoring Large Knowledge Bases.” Journal of Intelligent Information Systems, 1998.
[3] P. Cohen, R. Schrag, E. Jones, A. Pease, A. Lin, B. Starr, D. Gunning, and M. Burke. “The DARPA High Performance Knowledge Bases Project.” AI Magazine, Winter 1998, pp. 25-49.
[4] B. Katz. “From Sentence Processing to Information Access on the World Wide Web.” AAAI Spring Symposium on Natural Language Processing for the World Wide Web, Stanford University, Stanford, CA, 1997.
[5] D. Gentner and K. Forbus. “MAC/FAC: A Model of Similarity-based Retrieval.” Proceedings of the Cognitive Science Society, 1991.
[6] K. Forbus and D. Oblinger. “Making SME Greedy and Pragmatic.” Proceedings of the Cognitive Science Society, 1990.
[7] V. K. Chaudhri, J. D. Lowrance, M. E. Stickel, J. F. Thomere, and R. J. Waldinger. “Ontology Construction Toolkit.” Technical Note Ontology, AI Center, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025, 2000.
[8] D. McGuinness. “Description Logics Emerge from Ivory Tower.” Stanford Knowledge Systems Laboratory Technical Report KSL-01-08, 2001. In Proceedings of the International Workshop on Description Logics, Stanford, CA, August 2001.
[9] D. McGuinness. “Conceptual Modeling for Distributed Ontology Environments.” To appear in Proceedings of the Eighth International Conference on Conceptual Structures: Logical, Linguistic, and Computational Issues (ICCS 2000), Darmstadt, Germany, August 14-18, 2000.
[10] S. McIlraith and E. Amir. “Theorem Proving with Structured Theories.” Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), pp. 624-631, August 2001.
[11] E. Amir and S. McIlraith. “Partition-Based Logical Reasoning for First-Order and Propositional Theories.” Submitted for publication.
[12] M. Caudill and B. Starr. “Event Templates for Improved Narrative Understanding in Question Answering Systems.” In Proceedings of Systemics, Cybernetics, and Informatics, July 14-18, 2002.
[13] V. Raskin and S. Nirenburg. “An Applied Ontological Semantic Microtheory of Adjectival Meaning for Natural Language Processing.” Machine Translation, 1999.
[14] V. Raskin and S. Nirenburg. “Ten Choices for Lexical Semantics.” NMSU CRL MCCS-96-304, 1996.
[15] E. Viegas, K. Mahesh, S. Nirenburg, and S. Beale. “Semantics in Action.” In P. Saint-Dizier (ed.), Predicative Forms in Natural Language and in Lexical Knowledge Bases. Dordrecht: Kluwer Academic Press, 1999.
[16] M. Genesereth and R. Fikes. Knowledge Interchange Format, Version 3.0 Reference Manual. Technical Report Logic-92-1, Computer Science Department, Stanford University, Stanford, CA, 1992. Also KSL Technical Report 92-86.

This work was supported by the Advanced Research and Development Activity (ARDA) as part of its AQUAINT Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the U.S. Government.