From TREC to Watson: is open domain question answering a solved problem?

Invited talk at KEPT 2011, Cluj-Napoca, Romania discussing the current state-of-the-art in question answering.

Transcript

  • 1. Constantin Orasan
    Research Group in Computational Linguistics,
    University of Wolverhampton, UK
    http://www.wlv.ac.uk/~in6093/
    From TREC to Watson: is open domain question answering a solved problem?
  • 2. Structure of the talk
    Brief introduction to QA
    Video 1: Where are we now – IBM Watson
    The structure of a QA system
    Video 2: Watson vs. humans
    Overview of Watson
    QA from the point of view of users/companies
    Conclusions
  • 3. Information overload
    “Getting information off the Internet is like taking a drink from a fire hydrant”
    Mitchell Kapor
  • 4. What is question answering?
    A way to address the problem of information overload
    Question answering aims to identify the answer to a question posed in natural language in a large collection of documents
    The information provided by QA is more focused than that returned by information retrieval
    The output can be the exact answer or a text snippet which contains the answer
    The domain took off as a result of the introduction of the QA track at TREC, whilst cross-lingual QA developed as a result of CLEF
  • 5. Types of QA systems
    • open-domain QA systems: can answer any question from any collection
    + can potentially answer any question
    - very low accuracy (especially in cross-lingual settings)
    • canned QA systems: rely on a very large repository of questions for which the answer is known (a lookup sketch follows this slide)
    + very little language processing necessary
    - limited to the answers in the database
    • closed-domain QA systems: are built for very specific domains and exploit expert knowledge in them
    + very high accuracy
    - can require extensive language processing and is limited to one domain
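
    The "canned QA" idea above amounts to nearest-neighbour lookup in a repository of answered questions. Below is a minimal sketch of that lookup in Python; the repository contents, the Jaccard similarity and the acceptance threshold are all illustrative assumptions, not a production design.

```python
import re

# Canned QA: match the incoming question against a repository of
# previously answered questions and return the stored answer for the
# closest match. Repository, similarity and threshold are illustrative.
CANNED = {
    "what are your opening hours": "We are open 9am-5pm, Monday to Friday.",
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
}

def tokens(text):
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def canned_answer(question, threshold=0.5):
    best = max(CANNED, key=lambda q: jaccard(tokens(q), tokens(question)))
    if jaccard(tokens(best), tokens(question)) >= threshold:
        return CANNED[best]
    return None  # no sufficiently similar question in the repository

print(canned_answer("How do I reset my password?"))
```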
  • 6. Evolution of QA domain
    Early QA systems
    date back to the 1960s and were mainly front ends to databases
    had limited usability
    Open-domain QA
    emerged as a result of the increasing amount of data available
    to answer a question, the system needs to find and extract the answer
    developed in the late 1990s as a result of the QA track at the Text REtrieval Conference (TREC)
    emphasis on factoid questions, but other types of questions were also explored
    CLEF competitions have encouraged development of cross-lingual systems.
  • 7. Where are we now?
    IBM and the Jeopardy Challenge
    Jeopardy! is an American quiz show where participants are given clues and need to guess the question (e.g. for the clue "The Father of Our Country; he didn't really chop down a cherry tree" the contestant would respond "Who is George Washington?")
    Watson is a QA system developed by IBM
    http://www.youtube.com/watch?v=FC3IryWr4c8
  • 8. Structure of an open domain QA system
    A typical open domain QA system consists of:
    Question processor
    Document processor
    Answer extractor (and validation)
    Can have components for cross-lingual processing
    Has access to several external resources
  • 9. Question processor
    Produces an interpretation of the question
    Determines the Question Type (e.g. factoid, definition, procedure, etc.)
    Determines the Expected Answer Type (EAT)
    On the basis of the question it produces a query
    Determines syntactic and semantic relations between the words of the question
    Expands the query with synonyms
    May perform translation of the keywords in the query in the case of cross-lingual QA
  • 10. Expected answer type calculation
    Relies on the existence of an answer type taxonomy
    This taxonomy can be made open-domain by linking to general ontologies such as WordNet
    The EAT can be determined using rule-based as well as machine learning approaches
    Who is the president of Romania?
    Where is Paris?
    Knowledge of domain can greatly improve the identification of EAT and help deal with ambiguities
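
    A minimal rule-based sketch of EAT detection for questions like the two examples above. The patterns and type labels are illustrative assumptions rather than a real answer-type taxonomy; the comment at the end shows the kind of ambiguity the slide mentions.

```python
import re

# Rule-based Expected Answer Type (EAT) detection. In a real system the
# labels would come from an answer-type taxonomy (possibly linked to
# WordNet); these patterns and labels are illustrative only.
EAT_RULES = [
    (r"^who\b", "PERSON"),
    (r"^where\b", "LOCATION"),
    (r"^when\b", "DATE"),
    (r"^how many\b", "NUMBER"),
    (r"^what (is|are)\b", "DEFINITION"),
]

def expected_answer_type(question):
    q = question.lower().strip()
    for pattern, eat in EAT_RULES:
        if re.match(pattern, q):
            return eat
    return "OTHER"

print(expected_answer_type("Who is the president of Romania?"))  # PERSON
print(expected_answer_type("Where is Paris?"))                   # LOCATION
# "What is the capital of France?" would wrongly yield DEFINITION --
# exactly the kind of ambiguity that domain knowledge helps resolve.
```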
  • 11. Query formulation
    Produces a query from the question
    As a list of keywords
    As a list of phrases
    Identifies entities present in the question
    Produces variants of the query by introducing morphological, lexical and semantic variations
    Domain knowledge is
    very important for identification of entities and generation of valid variations and
    vital in cross-lingual scenarios
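
    As a sketch of query formulation, the snippet below extracts content words and expands them with WordNet synonyms via NLTK (assuming nltk and its wordnet corpus are installed); the stopword list is a small illustrative one.

```python
# Keyword extraction and synonym expansion for query formulation.
# Requires nltk with the 'wordnet' corpus downloaded; the stopword
# list here is deliberately tiny and illustrative.
from nltk.corpus import wordnet as wn

STOPWORDS = {"the", "a", "an", "of", "is", "are", "was", "were",
             "in", "on", "who", "what", "where", "when", "how"}

def formulate_query(question):
    words = [w.strip("?.,!").lower() for w in question.split()]
    keywords = [w for w in words if w and w not in STOPWORDS]
    query = {}
    for word in keywords:
        synonyms = {lemma.name().replace("_", " ")
                    for synset in wn.synsets(word)
                    for lemma in synset.lemmas()}
        query[word] = sorted(synonyms | {word})
    return query

# e.g. 'telephone' expands to variants such as 'phone', 'telephone set'
print(formulate_query("When was the telephone invented?"))
```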
  • 12. Document processing
    Uses the query produced in the previous step to retrieve paragraphs which may contain the answer
    It is largely domain independent as it relies on text retrieval engines
    Ranks results, but this is largely independent of the QA task
    For limited collections of texts it is possible to enrich the index with various kinds of linguistic information which can help further processing
    When the domain is known, characteristics of the input files can improve the retrieval (e.g. presence of metadata)
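
    A toy stand-in for the retrieval step, ranking paragraphs by TF-IDF cosine similarity with scikit-learn; a real system would use a dedicated text retrieval engine, and the paragraph collection here is invented.

```python
# Toy paragraph retrieval: rank paragraphs against the query by TF-IDF
# cosine similarity. A production system would use a proper retrieval
# engine; this only illustrates the step. Requires scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

paragraphs = [
    "The telephone was invented in 1876 by Alexander Graham Bell.",
    "Paris is the capital and largest city of France.",
    "Bell also experimented with early metal detectors.",
]

def retrieve(query, texts, top_k=2):
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(texts)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_matrix)[0]
    ranked = sorted(zip(scores, texts), reverse=True)
    return [text for score, text in ranked[:top_k] if score > 0]

print(retrieve("telephone invented", paragraphs))
```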
  • 13. Answer extraction
    Uses a variety of techniques to identify the answer of a question
    The extracted answer should match the EAT
    Systems very often rely on previously created patterns (e.g. When was the telephone invented? can be answered if there is a sentence that matches the pattern The telephone was invented in <date>)
    Many patterns can express the same answer (e.g. the telephone, invented in <date>)
    Relations identified in the question between the expected answer and entities from the question can be exploited by patterns
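
    A minimal sketch of pattern-based extraction for the slide's example question, with both surface patterns capturing the same <date> slot; the patterns are hand-written for illustration.

```python
import re

# Pattern-based answer extraction for the example question
# "When was the telephone invented?": several surface patterns can
# realise the same relation, each capturing the <date> slot.
PATTERNS = [
    r"the telephone was invented in (\d{4})",
    r"the telephone, invented in (\d{4})",
]

def extract_answer(sentence):
    lowered = sentence.lower()
    for pattern in PATTERNS:
        match = re.search(pattern, lowered)
        if match:
            return match.group(1)
    return None

print(extract_answer("The telephone was invented in 1876 by Bell."))     # 1876
print(extract_answer("The telephone, invented in 1876, changed life."))  # 1876
```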
  • 14. Answer extraction (II)
    Potential answers are ranked according to functions which are usually learned from the data
    The ranking and validation of answers can be done using external sources such as the Internet
    QA for well-defined domains can rely on better patterns
    The functions learned usually work well only on the type of data used for training
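
    A sketch of candidate ranking with a linear scoring function; in a real system the weights would be learned from training data (hence the caveat about transfer above). The features, weights and candidates below are invented.

```python
# Ranking candidate answers with a linear scoring function over simple
# features. In practice the weights are learned from training data;
# everything here is an illustrative assumption.

def score(candidate, weights):
    features = {
        "matches_eat": 1.0 if candidate["type"] == candidate["eat"] else 0.0,
        "retrieval_score": candidate["retrieval_score"],
        "pattern_matched": 1.0 if candidate["pattern_matched"] else 0.0,
    }
    return sum(weights[name] * value for name, value in features.items())

weights = {"matches_eat": 2.0, "retrieval_score": 1.0, "pattern_matched": 1.5}
candidates = [
    {"answer": "1876", "type": "DATE", "eat": "DATE",
     "retrieval_score": 0.8, "pattern_matched": True},
    {"answer": "Bell", "type": "PERSON", "eat": "DATE",
     "retrieval_score": 0.9, "pattern_matched": False},
]
print(max(candidates, key=lambda c: score(c, weights))["answer"])  # 1876
```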
  • 15. Open domain QA - evaluation
    Great coverage, but low accuracy
    For example:
    The EPHYRA QA system at TREC 2007 reported an accuracy of 0.20 for factoid questions (Schlaefer et al. 2007)
    OpenEphyra was used for a cross-lingual Romanian – English QA system and we obtained 0.11 accuracy for factoid questions (Dornescu et al. 2008) – the best performing system for all cross-lingual QA tasks in CLEF 2008
    The results are not directly comparable (different QA engines, tuned differently, different collections, different tasks)
    But does it make sense to do open domain question answering?
  • 16. How did Watson perform?
    http://www.youtube.com/watch?v=Puhs2LuO3Zc
  • 17. How was this achieved?
    The starting point was the Practical Intelligent Question Answering Technology (PIQUANT), developed by IBM to participate in TREC
    It had been under development at IBM for more than 6 years by a team of 4 full-time researchers
    It was among the top three to five systems in many TRECs
    PIQUANT achieved an accuracy of around 0.33 on the TREC data
    PIQUANT used a standard architecture for QA
  • 18. How was this achieved? (II)
    Lots of extra work was put into the system: a core team of 20 researchers working for almost 4 years
    The PIQUANT system was enriched with a large number of modules for language processing
    The processing was heavily parallelised
    Many components were developed to deal with specific problems (an ensemble of "experts")
    Watson tries to combine deep and shallow knowledge
    Had access to large data sets and very good hardware
  • 19. Overview of Watson’s structure
  • 20. Hardware used
    Watson is a workload optimized system designed for complex analytics, made possible by integrating massively parallel POWER7 processors and the IBM DeepQA software to answer Jeopardy! questions in under three seconds. Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM. Each Power 750 server uses a 3.5 GHz POWER7 eight core processor, with four threads per core. The POWER7 processor's massively parallel processing capability is an ideal match for Watson's IBM DeepQA software which is embarrassingly parallel (that is a workload that is easily split up into multiple parallel tasks).
    According to John Rennie, Watson can process 500 gigabytes, the equivalent of a million books, per second. IBM's master inventor and senior consultant Tony Pearson estimated Watson's hardware cost at about $3 million and with 80 TeraFLOPs would be placed 94th on the Top 500 Supercomputers list.
    From: http://en.wikipedia.org/wiki/Watson_(computer)
  • 21. Speed of answer
    In Jeopardy! an answer needs to be provided in 3-5 seconds
    In initial experiments running Watson on a single processor, an answer took about 2 hours
    The system was implemented using Apache UIMA Asynchronous Scaleout
    Massively parallel architecture
    Indexes used to answer the questions had to be pre-processed using Hadoop
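
    Candidate scoring is the "embarrassingly parallel" part: each candidate can be processed independently. Watson's actual stack was UIMA Asynchronous Scaleout on a large POWER7 cluster, not Python; the toy sketch below only shows the shape of the computation, with a dummy scoring function.

```python
# Fan candidate scoring out over local processes. The scoring function
# is a stand-in for an expensive evidence-gathering pipeline; nothing
# here reflects Watson's actual implementation.
from multiprocessing import Pool

def score_candidate(candidate):
    # stand-in for an expensive evidence-scoring computation
    return candidate, len(candidate)

if __name__ == "__main__":
    candidates = ["George Washington", "Abraham Lincoln", "John Adams"]
    with Pool() as pool:
        scored = pool.map(score_candidate, candidates)
    print(max(scored, key=lambda pair: pair[1]))
```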
  • 22. Watson was not only NLP
    Betting strategy: http://www.youtube.com/watch?v=vA9aqAd2iso
  • 23. To sum up, Watson is:
    An amazing engineering project
    A massive investment
    Research in many domains of NLP
    A big PR stunt
    A way to improve the IBM position in text analytics
    But it is not really a technology ready to be deployed
    And was it real progress in open-domain QA?
  • 24. So is open domain QA a solved problem?
    Can we really solve open domain QA?
    Do we really need open domain QA?
    Do we care?
  • 25. QA from user perspective
    Real user questions:
    • are rarely open domain
    • can rarely be formulated in one go
    • do not always have answers in only one source
    Companies:
    • have very well defined needs
    • have access to previously asked questions
    • need very high accuracy
    • mostly cannot afford to invest millions of dollars
  • The QALL-ME project
    Question Answering Learning technologies in a multiLingual and Multimodal Environment (QALL-ME) – an FP6-funded project on Multilingual and Multimodal Question Answering
    FBK, Trento, Italy – coordinator
    University of Wolverhampton, UK
    DFKI, Germany
    University of Alicante, Spain
    Comdata, Italy
    Ubiest, Italy
    Waycom, Italy
    http://qallme.fbk.eu
    Has established an infrastructure for multilingual and multimodal question answering
  • 34. The QALL-ME project
    Demonstrators in the domain of tourism – can answer questions about cinema/movies and accommodation
    E.g.
    What movies can I see in Wolverhampton this week?
    How can I get to Novotel Hotel, Wolverhampton?
    Questions can be asked in any of the four languages of the consortium
    A small-scale demonstrator was built for Romanian
  • 35. QALL-ME framework
  • 36. The QALL-ME ontology
    All the reasoning and processing is done using a domain ontology
    The ontology also provides the means of achieving cross-lingual QA
    Determines the way data is stored in the database
    Ontologies need to be developed for each domain
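
    Frameworks of this kind typically map a processed question to a structured (e.g. SPARQL) query over the ontology-backed database. The sketch below builds such a query string in Python; every class and property name in it is invented for illustration and is not the actual QALL-ME vocabulary.

```python
# Sketch of the ontology-driven step: a processed question ("What
# movies can I see in Wolverhampton this week?") becomes a structured
# query. All class/property names below are invented for illustration.
def movies_query(town, date):
    return f"""
    SELECT ?movieTitle ?cinemaName WHERE {{
        ?showing a :MovieShowing ;
                 :movieTitle ?movieTitle ;
                 :atCinema ?cinema ;
                 :onDate "{date}" .
        ?cinema :name ?cinemaName ;
                :inTown "{town}" .
    }}
    """

print(movies_query("Wolverhampton", "2011-07-04"))
```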
  • 37. Part of the tourism ontology
  • 38. Evaluation of the QALL-ME prototype
    For the cinema domain the accuracy ranged between 60% and 85% depending on the language
    The system was tested on real questions posed by users which were completely independent from the ones used to develop the system
    The errors were mainly caused by wrongly identified named entities, missing patterns and mistakes of the entailment engine
    In a commercial environment such a system can be revised every day in order to obtain much higher performance
  • 39. Closed domain QA for commercial companies
    Closed domain QA has a certain appeal to companies
    These companies normally have large databases of questions and answers from customers
    The domain can be very clearly defined
    In some cases the systems needed are actually canned QA systems
  • 40. Interactive QA
    It is easy to ask:
    Where can I eat paella tonight?
    but what about:
    What mobile phones are smart phones with a camera, have GPS, have touch screen, are produced by HTC or Nokia, but definitely not Samsung, have 1G internal memory ….
    Interactivity is needed to deal with ambiguities, wrong answers, user feedback, etc.
  • 41. Interactive QA
    “process where the user is a continual part of the information loop”
    At intersection of Dialogue Systems and Question Answering
    Initiate dialogue with the user in cases where there are too many or too few answers, or there is some ambiguity in the request
    Can suggest follow up questions
    Provide a more natural way of locating needed information
    Companies are interested in developing IQA systems as a way for providing customer services
  • 42. Hypothetical IQA session
    [1] SYS: Hi!
    [2] USER: Can you find me a smart phone with a camera?
    [3] SYS: There are [NUMBER] [a big number] of smart phones featuring a camera; would you like to search for some [BRAND]?
    [4] USER: No, which have GPS?
    [5] SYS: [STILL BIG NUMBER] have GPS; are you interested in ones having a TOUCH SCREEN? [suggesting new constraints]
    [6] USER: Yes, it would be great.
    [7] SYS: [NUMBER] of Nokia phones, [NUMBER] HTC phones, [NUMBER] Samsung phones, [NUMBER] of other brands.
    [8] USER: Ok, what about HTC phones?
    [9] SYS: [NUMBER] of them have [feature], [NUMBER] of them have [feature].
    [10] USER: What are the ones with [feature]?
    [11] SYS: Here you are: [RESULTS].
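
    A sketch of the constraint-narrowing logic behind the session above: each user turn adds a constraint, and the system either suggests a further constraint or presents the results. The phone catalogue and the result-count threshold are invented.

```python
# Constraint-narrowing loop for interactive QA: filter a catalogue by
# the constraints accumulated across dialogue turns. The catalogue and
# the threshold for "too many answers" are illustrative.
PHONES = [
    {"brand": "Nokia",   "camera": True, "gps": True,  "touch": False},
    {"brand": "HTC",     "camera": True, "gps": True,  "touch": True},
    {"brand": "HTC",     "camera": True, "gps": False, "touch": True},
    {"brand": "Samsung", "camera": True, "gps": False, "touch": True},
]

def narrow(phones, constraints):
    return [p for p in phones
            if all(p[key] == value for key, value in constraints.items())]

constraints = {}
for key, value in [("camera", True), ("gps", True), ("brand", "HTC")]:
    constraints[key] = value              # one user turn, one constraint
    matches = narrow(PHONES, constraints)
    if len(matches) > 1:
        print(f"{len(matches)} matches so far; suggest another constraint")
    else:
        print(f"Result: {matches}")
```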
  • 43. Answers from more than one source
    Many complex questions require composing the answer from several sources:
    List questions: List all the cantons in Switzerland which border Germany
    Sentiment questions: What features do people like in Vista?
    This is part of the new trend in “deep QA”
    Even though users probably really need such answers, the technology is still at the stage of research projects
  • 44. To sum up …
    Some researchers believe that search is dead and “deep QA” is the future
    This was largely fuelled by IBM's Watson winning Jeopardy!
    Watson is a fantastic QA system, but it does not solve the problem of open domain QA
    For real applications we still want to focus on very well defined domains
    We still want to have the user in the loop to facilitate asking questions
    Watson may have revived the interest in QA
  • 45. Watson is not always right
    but it kind of knows this ….
    http://www.youtube.com/watch?v=7h4baBEi0iA
  • 46. Thank you for your attention
