SAIC AQUA summary



Question Answering for AQUAINT, by Barbara Starr and Maureen Caudill

Published in: Education, Technology


  1. AQUAINT Question & Answering System (AQUA)
     Science Applications International Corporation
     Stanford University Knowledge Systems Laboratory

     AQUAINT Question Answering (AQUA) System Project Summary

     Prime Contractor: Science Applications International Corporation
     With Subcontractor: Stanford University – Knowledge Systems Laboratory

     Technical Points of Contact:

     Ms. Maureen Caudill
     Phone: (858) 826-5743
     E-Mail:
     Fax: (858) 826-5517
     10260 Campus Point Court, San Diego, CA 92121

     Ms. Barbara Starr
     Phone: (858) 826-3047
     E-Mail:
     Fax: (858) 826-5517
     10260 Campus Point Court, San Diego, CA 92121
  2. AQUAINT QUESTION ANSWERING SYSTEM PROJECT SUMMARY

     The main goal of the AQUAINT Question Answering (AQUA) System’s technical approach is to incorporate breakthrough advancements in question-answering technologies that ultimately can be transitioned for use at a variety of U.S. government agencies. We will develop a question-answering system that will seek sources of information across a variety of information genres, assimilate that knowledge into sophisticated knowledge bases, and produce relevant, timely, and helpful answers to complex questions.

     We will create advanced technologies to search and retrieve relevant text from unstructured text, structured databases, and metadata markups, as well as provide a capability to interpret and integrate highly colloquial and informal text such as chat room or message board text. We will provide source reliability assessments for that new data, and will translate their contents from text to knowledge base representations. The knowledge bases (KBs) will have a sophisticated suite of tools to automatically partition them into context- and query-sensitive segments, to reason efficiently across those segments, and to identify and resolve conflicts between new information and that already stored. Furthermore, we will provide answer explanations that are carefully pruned and edited for readability, conciseness, and interestingness.

     The SAIC Team is confident that the new technology components that comprise the AQUA system will achieve the goals of this AQUAINT Program Phase 1 effort. The AQUA system will be delivered as an integrated component solution, using a variety of technology approaches. We will advance the existing state-of-the-art in question answering, focusing on important opportunities to leverage multiple synergistic approaches and encompassing a variety of promising research topics.

     In the course of our AQUA system development efforts, we will operate on multidimensional data, with a focus primarily on unstructured text, but also including structured data sources (including numerical and statistical sources), and metadata sources, especially in terms of DARPA Agent Markup Language (DAML) metadata. As a third data dimension, the AQUA system will provide access to degraded text, such as that derived from closed-captioning video sources. The use of message board text, complete with its misspellings and ungrammatical and colloquial forms, will provide text that is midway between “clean” and “degraded.”

     The SAIC Team is dedicated to making the AQUAINT Program a success. To that end, we have brought together a talented, experienced staff, developed a unique and innovative solution, and defined a radical new research and development approach that will allow us to achieve the goals of the Determining the answer technical area of AQUAINT for this first phase of the program. We look forward to the opportunity of developing the AQUA system for ARDA and of working with the other contractors and contractor teams not only through this first phase, but also in succeeding phases of the AQUAINT Program.
  3. The key innovations the SAIC Team expects to contribute toward the AQUAINT Program goals include the following:

     • Context-relevant search and retrieval removes the largest inefficiency in a question-answering system, and thus reduces the level of effort required by all other system components.

     • Using novel information search and retrieval techniques specially designed to handle large volumes of documents ensures that the AQUA system will provide robust functionality and real-world practicality.

     • Novel source reliability assessment techniques restrict the use of incorrect information, allowing a broader and deeper overall understanding of the answer. Allowing unreliable but potentially information-rich sources gives the AQUA system the ability to extend beyond formally written documents and interface with informal text.

     • Conflict resolution automates logical consistency checking for data items extracted and retrieved from unstructured sources.

     • Context-aware KB partitioning means the AQUA system must only reference material within a single KB context, which significantly narrows the potential answer space that must be processed to determine the answer to a question.

     • New algorithms to reason across multiple partitioned KBs will improve the efficiency of query answering in partitioned KBs. Reasoning efficiently across multiple KB partitions means that query context can be more closely tracked.

     • Techniques to markedly improve and shorten explanation proof trees and the output of theorem provers will significantly increase overall system ease of use, believability, and usefulness to analysts.

     • The AQUA system’s use of generated markup with embedded semantic content increases the precision of search results. More accurate search means that returned documents are more relevant to the search intent, which reduces the effort needed to determine the final answer.

     • The AQUA system in this Phase I effort will lay the groundwork for the future development of techniques offering reliable detection of misstatements in source documents. This will raise question-answering systems to a new threshold of achievement and provide a real-world capability that does not exist today. Not all documents and data sources are truthful, and if misstatements can be flagged early, it not only prevents corruption of KBs, but also sets the stage for advanced behavioral modeling and predictions of goals, actions, and intentions of untruthful sources.