From TREC to Watson: is open domain question answering a solved problem?
Constantin Orasan
Research Group in Computational Linguistics, University of Wolverhampton, UK
http://www.wlv.ac.uk/~in6093/
Structure of the talk
4 July 2011, Constantin Orasan - KEPT 2011
- Brief introduction to QA
- Video 1: Where are we now – IBM Watson
- The structure of a QA system
- Video 2: Watson vs. humans
- Overview of Watson
- QA from the point of view of users/companies
- Conclusions
Information overload
“Getting information off the Internet is like taking a drink from a fire hydrant” – Mitchell Kapor
What is question answering?
- A way to address the problem of information overload
- Question answering aims at identifying the answer to a question posed in natural language in a large collection of documents
- The information provided by QA is more focused than information retrieval
- The output can be the exact answer or a text snippet which contains the answer
- The domain took off as a result of the introduction of the QA track in TREC, whilst cross-lingual QA took off as a result of CLEF
Types of QA systems
- Open-domain QA systems: can answer any question from any collection
  (+) can potentially answer any question
  (-) very low accuracy (especially in cross-lingual settings)
- Canned QA systems: rely on a very large repository of questions for which the answer is known
  (+) very little language processing necessary
  (-) limited to the answers in the database
- Closed-domain QA systems: are built for very specific domains and exploit expert knowledge in them
  (+) very high accuracy
  (-) can require extensive language processing and are limited to one domain
Evolution of the QA domain
- Early QA systems date back to the 1960s and were mainly front ends to databases
  - had limited usability
- Open-domain QA emerged as a result of the increasing amount of data available
  - to answer a question, the system needs to find and extract the answer
  - developed in the late 1990s as a result of the QA track at the Text REtrieval Conference (TREC)
  - emphasis on factoid questions, but other types of questions were also explored
- CLEF competitions have encouraged the development of cross-lingual systems
Where are we now?
- IBM and the Jeopardy! Challenge
- Jeopardy! is an American quiz show where participants are given clues and need to guess the question (e.g. if the clue is “The Father of Our Country; he didn't really chop down a cherry tree”, the contestant would respond “Who is George Washington?”)
- Watson is a QA system developed by IBM
- http://www.youtube.com/watch?v=FC3IryWr4c8
Structure of an open domain QA system
A typical open domain QA system consists of:
- Question processor
- Document processor
- Answer extractor (and validation)
It can have components for cross-lingual processing and has access to several external resources.
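The three-stage pipeline above can be sketched in a few lines of Python. All function names, the stopword list and the stub logic are illustrative, not taken from any actual system:

```python
# Minimal sketch of the classic three-stage QA pipeline: question processing,
# document retrieval, answer extraction. Real systems plug full NLP components
# into each stage; here each stage is a deliberately crude stand-in.

STOPWORDS = {"who", "is", "the", "of", "in", "a", "what", "where"}

def process_question(question):
    """Question processor: derive a keyword query (EAT detection omitted)."""
    return [w for w in question.rstrip("?").split() if w.lower() not in STOPWORDS]

def retrieve_passages(query, collection):
    """Document processor: keep passages containing any query keyword."""
    return [p for p in collection if any(w.lower() in p.lower() for w in query)]

def extract_answer(passages):
    """Answer extractor: stubbed to return the best passage as a snippet."""
    return passages[0] if passages else None

def answer(question, collection):
    return extract_answer(retrieve_passages(process_question(question), collection))
```

Returning a snippet rather than an exact answer mirrors the output options mentioned earlier: a real extractor would narrow the passage down to an entity of the expected answer type.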
Question processor
- Produces an interpretation of the question
  - Determines the Question Type (e.g. factoid, definition, procedure, etc.)
  - Determines the Expected Answer Type (EAT)
- On the basis of the question it produces a query
  - Determines syntactic and semantic relations between the words of the question
  - Expands the query with synonyms
- May perform translation of the keywords in the query in the case of cross-lingual QA
Expected answer type calculation
- Relies on the existence of an answer type taxonomy
- This taxonomy can be made open-domain by linking it to general ontologies such as WordNet
- The EAT can be determined using rule-based as well as machine learning approaches
  - Who is the president of Romania?
  - Where is Paris?
- Knowledge of the domain can greatly improve the identification of the EAT and help deal with ambiguities
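A rule-based EAT classifier of the kind mentioned above can be as simple as matching question prefixes. The taxonomy labels and rules below are a toy illustration; real systems use a much richer taxonomy, often linked to WordNet, or a machine-learned classifier:

```python
# Toy rule-based Expected Answer Type (EAT) classifier: the wh-word of the
# question selects a node from a (here: flat, invented) answer type taxonomy.

EAT_RULES = [
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "DATE"),
    ("how many", "NUMBER"),
    ("what", "OTHER"),  # 'what' is highly ambiguous without domain knowledge
]

def expected_answer_type(question):
    q = question.lower().strip()
    for prefix, eat in EAT_RULES:
        if q.startswith(prefix):
            return eat
    return "UNKNOWN"
```

The ambiguity of 'what' questions is exactly where the domain knowledge mentioned on the slide helps: in a closed domain, 'what' can be mapped to a small set of concepts.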
Query formulation
- Produces a query from the question
  - as a list of keywords
  - as a list of phrases
- Identifies entities present in the question
- Produces variants of the query by introducing morphological, lexical and semantic variations
- Domain knowledge is very important for the identification of entities and the generation of valid variations, and vital in cross-lingual scenarios
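Variant generation can be sketched as follows. The tiny synonym table stands in for a lexical resource such as WordNet, and the suffix rule stands in for a real morphological analyser; both are illustrative only:

```python
# Sketch of query variant generation: expand each keyword with lexical
# (synonym) and crude morphological alternatives, then combine them into
# alternative queries.

from itertools import product

SYNONYMS = {"movie": ["film"], "buy": ["purchase"]}  # invented sample entries

def word_variants(word):
    variants = [word] + SYNONYMS.get(word, [])
    if word.endswith("s"):               # crude singular/plural variation
        variants.append(word[:-1])
    return variants

def query_variants(keywords):
    """All combinations of per-keyword variants, as alternative queries."""
    return [list(combo) for combo in product(*(word_variants(w) for w in keywords))]
```

The slide's caveat applies directly: without domain knowledge such blind expansion produces invalid variants, which is why entity identification matters before expansion.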
Document processing
- Uses the query produced in the previous step to retrieve paragraphs which may contain the answer
- It is largely domain independent as it relies on text retrieval engines
- Ranks results, but this is largely independent of the QA task
- For limited collections of texts it is possible to enrich the index with various linguistic information which can help further processing
- When the domain is known, characteristics of the input files can improve the retrieval (e.g. presence of metadata)
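A minimal version of this retrieval step scores each paragraph by keyword overlap with the query. Real systems delegate this to a retrieval engine with tf-idf or BM25 weighting; this sketch only illustrates the rank-and-cut shape of the component:

```python
# Minimal passage retrieval: score paragraphs by query keyword overlap and
# return the top-ranked ones that match at all.

def rank_passages(query, paragraphs, top_k=3):
    terms = {w.lower() for w in query}
    scored = [(sum(w in terms for w in p.lower().split()), p) for p in paragraphs]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]
```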
Answer extraction
- Uses a variety of techniques to identify the answer to a question
- The answer should have the type of the EAT
- Very often relies on previously created patterns (e.g. When was the telephone invented? can be answered if there is a sentence that matches the pattern The telephone was invented in <date>)
- Many patterns can express the same answer (e.g. the telephone, invented in <date>)
- Relations identified in the question between the expected answer and entities from the question can be exploited by patterns
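The telephone example above maps directly onto regular expressions: the <date> slot becomes a capture group, and both surface patterns fill the same answer slot. A minimal sketch:

```python
# Pattern-based answer extraction for "When was the telephone invented?":
# several surface patterns, one answer slot (the <date> capture group).

import re

PATTERNS = [
    re.compile(r"The telephone was invented in (\d{4})"),
    re.compile(r"the telephone, invented in (\d{4})"),
]

def extract_date_answer(sentence):
    for pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            return match.group(1)
    return None
```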
Answer extraction (II)
- Potential answers are ranked according to functions which are usually learned from the data
- The ranking and validation of answers can be done using external sources such as the Internet
- QA for well defined domains can rely on better patterns
- The learned functions usually work well only on the type of data used for training
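A common form for such a ranking function is a weighted combination of candidate features. In a real system the weights are learned from training data; the feature names and values below are invented for illustration:

```python
# Sketch of answer ranking: each candidate carries a feature vector (EAT type
# match, pattern confidence, redundancy across sources) and is scored by a
# weighted sum. The weights here are illustrative, not learned.

WEIGHTS = {"type_match": 2.0, "pattern_score": 1.5, "redundancy": 0.5}

def score_candidate(features):
    return sum(WEIGHTS[name] * value for name, value in features.items())

def rank_candidates(candidates):
    """candidates: list of (answer_string, feature_dict) pairs."""
    return [ans for ans, feats in
            sorted(candidates, key=lambda c: score_candidate(c[1]), reverse=True)]
```

The slide's last point shows up here too: weights fitted to one collection transfer poorly to another, which is one reason results across evaluations are hard to compare.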
Open domain QA - evaluation
- Great coverage, but low accuracy. For example:
  - The EPHYRA QA system at TREC 2007 reports an accuracy of 0.20 for factoid questions (Schlaefer et al. 2007)
  - OpenEphyra was used for a cross-lingual Romanian–English QA system and we obtained 0.11 accuracy for factoid questions (Dornescu et al. 2008) – the best performing system for all cross-lingual QA tasks in CLEF 2008
- The results are not directly comparable (different QA engines, tuned differently, different collections, different tasks)
- But does it make sense to do open domain question answering?
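The accuracy figures quoted above are simply the fraction of questions whose top-ranked answer is judged correct:

```python
# Accuracy as used in the TREC/CLEF factoid figures: proportion of questions
# for which the system's top answer matches the gold answer.

def accuracy(system_answers, gold_answers):
    assert len(system_answers) == len(gold_answers)
    correct = sum(s == g for s, g in zip(system_answers, gold_answers))
    return correct / len(gold_answers)
```

(In the actual evaluations, correctness is decided by human assessors against answer keys rather than by exact string match.)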
How did Watson perform?
http://www.youtube.com/watch?v=Puhs2LuO3Zc
How was this achieved?
- The starting point was the Practical Intelligent Question Answering Technology (PIQUANT) developed by IBM to participate in TREC
  - had been under development at IBM for more than 6 years by a team of 4 full-time researchers
  - was one of the top three to five systems in many TRECs
  - was performing at around 0.33 accuracy on the TREC data
  - used a standard architecture for QA
How was this achieved? (II)
- Lots of extra work was put into the system: a core team of 20 researchers working for almost 4 years
- The PIQUANT system was enriched with a large number of modules for language processing
- The processing was parallelised heavily
- Lots of components were developed to deal with specific problems (lots of experts)
- Watson tries to combine deep and shallow knowledge
- It had access to large data sets and very good hardware
Overview of Watson’s structure
Hardware used
“Watson is a workload optimized system designed for complex analytics, made possible by integrating massively parallel POWER7 processors and the IBM DeepQA software to answer Jeopardy! questions in under three seconds. Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM. Each Power 750 server uses a 3.5 GHz POWER7 eight-core processor, with four threads per core. The POWER7 processor's massively parallel processing capability is an ideal match for Watson's IBM DeepQA software, which is embarrassingly parallel (that is, a workload that is easily split up into multiple parallel tasks).
According to John Rennie, Watson can process 500 gigabytes, the equivalent of a million books, per second. IBM's master inventor and senior consultant Tony Pearson estimated Watson's hardware cost at about $3 million and that with 80 TeraFLOPs it would be placed 94th on the Top 500 Supercomputers list.”
From: http://en.wikipedia.org/wiki/Watson_(computer)
Speed of answer
- In Jeopardy! an answer needs to be provided in 3-5 seconds
- In initial experiments running Watson on a single processor, an answer was obtained in about 2 hours
- The system was implemented using Apache UIMA Asynchronous Scaleout
- Massively parallel architecture
- Indexes used to answer the questions had to be pre-processed using Hadoop
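The workload is embarrassingly parallel because each candidate answer can be scored independently. A minimal fan-out/merge sketch of that pattern (the scoring function is a stand-in for the expensive evidence scoring a real pipeline performs):

```python
# Fan out independent candidate scoring across workers, then merge by taking
# the top-scoring candidate. The score() function is a trivial stand-in.

from concurrent.futures import ThreadPoolExecutor

def score(candidate):
    # Stand-in for expensive, independent evidence scoring of one candidate.
    return candidate, float(len(candidate))

def best_answer(candidates, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score, candidates))
    return max(scored, key=lambda cs: cs[1])[0]
```

Watson applied the same idea at a vastly larger scale, distributing candidate generation and scoring across thousands of cores via UIMA Asynchronous Scaleout.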
Watson was not only NLP
- Betting strategy
- http://www.youtube.com/watch?v=vA9aqAd2iso
To sum up, Watson is:
- An amazing engineering project
- A massive investment
- Research in many domains of NLP
- A big PR stunt
- A way to improve IBM's position in text analytics
- But it is not really a technology ready to be deployed
But was it real progress in open-domain QA?
So is open domain QA a solved problem?
- Can we really solve open domain QA?
- Do we really need open domain QA?
- Do we care?
QA from the user perspective
Real user questions:
- are rarely open domain
- can rarely be formulated in one go
- do not always contain answers from only one source
Companies:
- have very well defined needs
- have access to previously asked questions
- need very high accuracy
- most of them cannot afford to invest millions of dollars

The QALL-ME project
- Question Answering Learning technologies in a multiLingual and Multimodal Environment (QALL-ME) – an FP6 funded project on Multilingual and Multimodal Question Answering
- Partners: FBK, Trento, Italy (coordinator); University of Wolverhampton, UK; DFKI, Germany; University of Alicante, Spain; Comdata, Italy; Ubiest, Italy; Waycom, Italy
- http://qallme.fbk.eu
- Has established an infrastructure for multilingual and multimodal question answering
The QALL-ME project (II)
- Demonstrators in the domain of tourism – can answer questions in the domain of cinema/movies and accommodation, e.g. What movies can I see in Wolverhampton this week? How can I get to Novotel Hotel, Wolverhampton?
- The questions can be asked in any of the four languages of the consortium
- A small-scale demonstrator was built for Romanian
QALL-ME framework
The QALL-ME ontology
- All the reasoning and processing is done using a domain ontology
- The ontology also provides the means of achieving cross-lingual QA
- It determines the way data is stored in the database
- Ontologies need to be developed for each domain
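The general idea of ontology-driven closed-domain QA can be illustrated with a toy example: a question pattern is grounded in a domain concept and answered from structured data. The data, pattern and concept names below are all invented; they only mirror the shape of the approach, not the actual QALL-ME implementation:

```python
# Toy ontology-driven QA: a question pattern grounded in (hypothetical)
# Movie/City concepts is mapped to a query over structured domain data.

import re

CINEMA_DB = [  # stand-in for instances stored under the domain ontology
    {"title": "Up", "city": "Wolverhampton"},
    {"title": "Rio", "city": "Birmingham"},
]

PATTERN = re.compile(r"What movies can I see in (?P<city>[A-Za-z]+)")

def answer_movie_question(question):
    match = PATTERN.search(question)
    if not match:
        return []
    city = match.group("city")  # entity grounded in the City concept
    return [m["title"] for m in CINEMA_DB if m["city"] == city]
```

Because both the question interpretation and the database schema hang off the same ontology, adding a new language means mapping its question patterns to the same concepts, which is how the cross-lingual behaviour falls out.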
Part of the tourism ontology
Evaluation of the QALL-ME prototype
- For the cinema domain the accuracy ranged between 60% and 85% depending on the language
- The system was tested on real questions posed by users which were completely independent from the ones used to develop the system
- The errors were mainly caused by wrongly identified named entities, missing patterns and mistakes of the entailment engine
- In a commercial environment this system can be revised every day in order to obtain much higher performance
Closed domain QA for commercial companies
- Closed domain QA has a certain appeal for companies
- These companies normally have large databases of questions and answers from customers
- The domain can be very clearly defined
- In some cases the systems needed are actually canned QA systems
Interactive QA
- It is easy to ask: Where can I eat paella tonight?
- But what about: What mobile phones are smart phones with a camera, have GPS, have a touch screen, are produced by HTC or Nokia, but definitely not Samsung, have 1G internal memory ….
- Interactivity is needed to deal with ambiguities, wrong answers, user feedback, etc.
Interactive QA (II)
- “A process where the user is a continual part of the information loop”
- At the intersection of Dialogue Systems and Question Answering
- Initiates a dialogue with the user in cases where there are too many or too few answers, or there is some ambiguity in the request
- Can suggest follow-up questions
- Provides a more natural way of locating needed information
- Companies are interested in developing IQA systems as a way of providing customer services
Hypothetical IQA session
[1] SYS: Hi!
[2] USER: Can you find me a smart phone with a camera?
[3] SYS: There are [NUMBER][big number] of smart phones featuring a camera, would you like to search for some [BRAND]?
[4] USER: No, which have GPS?
[5] SYS: [STILL BIG NUMBER] have GPS, are you interested in ones having TOUCH SCREEN? [Suggesting new constraints]
[6] USER: Yes, it would be great.
[7] SYS: [NUMBER] of Nokia phones, [NUMBER] HTC phones, [NUMBER] Samsung phones, [NUMBER] of other brands.
[8] USER: Ok, what about HTC phones?
[9] SYS: [NUMBER] of them have [feature], [NUMBER] of them have [feature].
[10] USER: What are the ones with [feature]?
[11] SYS: Here you are: [RESULTS].
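The session above is, at its core, a constraint-narrowing loop: each user turn adds a filter and the system reports how many candidates remain, offering new constraints while the set is large. A minimal sketch with invented data:

```python
# Constraint-narrowing loop behind the hypothetical IQA session: every user
# turn filters the remaining candidates, and the system reports the count
# (the [NUMBER] slots in the dialogue).

PHONES = [  # invented sample data
    {"brand": "Nokia", "camera": True, "gps": True},
    {"brand": "HTC", "camera": True, "gps": True},
    {"brand": "HTC", "camera": True, "gps": False},
    {"brand": "Samsung", "camera": False, "gps": True},
]

class IQASession:
    def __init__(self, items):
        self.items = list(items)

    def add_constraint(self, key, value):
        """One user turn: narrow the candidate set and report its size."""
        self.items = [i for i in self.items if i.get(key) == value]
        return len(self.items)
```

A full IQA system adds the dialogue-management layer on top: deciding when to ask, which constraint to suggest next, and how to recover when a constraint empties the set.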
Answers from more than one source
- Many complex questions need the answer to be composed from several sources:
  - List questions: List all the cantons in Switzerland which border Germany
  - Sentiment questions: What features do people like in Vista?
- This is part of the new trend in “deep QA”
- Even though users probably really need such answers, the technology is still at the stage of research projects
To sum up…
- Some researchers believe that search is dead and “deep QA” is the future
- This was largely fuelled by IBM's Watson winning the Jeopardy! challenge
- Watson is a fantastic QA system, but it does not solve the problem of open domain QA
- For real applications we still want to focus on very well defined domains
- We still want to have the user in the loop to facilitate asking questions
- Watson may have revived the interest in QA
Watson is not always right
- … but it kind of knows this
- http://www.youtube.com/watch?v=7h4baBEi0iA
Thank you for your attention