Question Answering System for Quiz Bowl


Presented at NAACL 2016 Workshop on Human-Computer Question Answering

Published in: Technology
  1. Question Answering System for Quiz Bowl. Ikuya Yamada, STUDIO OUSIA
  2. Our Background
     ‣ Semantic Kernel: Entity Linking System
       ✦ Winner of past two competitions:
         - NEEL Challenge @ WWW 2015
         - W-NUT Shared Task #1 @ ACL 2015
     ‣ OUSIA: Open Question Answering System
       ✦ #1 @ Human-Computer QA Shared Task @ NAACL 2016
       ✦ #6 @ Kaggle’s Allen AI Science Challenge
     Tokyo-based tech startup working on QA
  3. Summary of the Task
     ‣ Given question sentences, the task is to predict the answer that is implicitly described by the sentences
     ‣ Answers are Wikipedia entries
     ‣ The dataset of the shared task contains 20,693 questions and the corresponding answers
     Example question: "With the assistance of his chief minister, the Duc de Sully, he lowered taxes on peasantry, promoted economic recovery, and instituted a tax on the Paulette. Victor at Ivry and Arquet, he was excluded from succession by the Treaty of Nemours, but won a great victory at Coutras."
     Answer: Henry IV of France
  4. System Components
     ‣ Two models are combined to answer questions:
       ✦ QB Ranker selects the most relevant answer from the answer candidates in the shared task’s dataset
       ✦ Wikipedia Ranker selects an answer from popular Wikipedia entries using word matching features
  5. Overview
     ‣ Candidate answers are generated using the top-n results of a search engine that contains the Wikipedia articles of answer candidates
     ‣ Features are generated using two components: a convolutional neural network and an IR-based feature generator
     ‣ A random forest is used to learn the ranking function (assigning a point-wise relevance score to a candidate answer)
     ‣ The model reads the question sequentially and buzzes an answer if its relevance score surpasses a threshold
     [Figure: (Question, Candidate Answer) → Convolutional Neural Network and IR-based Feature Generator → Random Forest → Relevance Score]
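The sequential buzzing rule from the Overview slide can be sketched as follows. This is a minimal illustration, not the system's actual implementation: `score_candidates` is a hypothetical stand-in for the full scorer (CNN and IR features fed to a random forest), `toy_scorer` simply counts keyword overlap, and the 0.5 threshold is an arbitrary assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical scorer type: maps the question text seen so far to
# {candidate answer: relevance score}. In the slides this role is played
# by the random forest over CNN-based and IR-based features.
Scorer = Callable[[str], Dict[str, float]]


def buzz(sentences: List[str], score_candidates: Scorer,
         threshold: float) -> Tuple[Optional[str], int]:
    """Read the question one sentence at a time and buzz as soon as the
    best candidate's score exceeds the threshold.

    Returns (answer, sentences_read); answer is None if the threshold
    is never exceeded.
    """
    seen = ""
    for i, sentence in enumerate(sentences, start=1):
        seen = (seen + " " + sentence).strip()
        scores = score_candidates(seen)
        if not scores:
            continue
        best, best_score = max(scores.items(), key=lambda kv: kv[1])
        if best_score > threshold:
            return best, i
    return None, len(sentences)


def toy_scorer(text: str) -> Dict[str, float]:
    """Toy stand-in: fraction of each candidate's keywords found in the text."""
    keywords = {
        "Henry IV of France": {"sully", "ivry", "coutras", "nemours"},
        "Louis XIV of France": {"versailles", "colbert", "fronde", "nantes"},
    }
    words = {w.strip(".,").lower() for w in text.split()}
    return {cand: len(keys & words) / len(keys)
            for cand, keys in keywords.items()}


question = [
    "With the assistance of his chief minister, the Duc de Sully, he lowered taxes.",
    "Victor at Ivry, he was excluded from succession by the Treaty of Nemours.",
    "He won a great victory at Coutras.",
]
answer, n_read = buzz(question, toy_scorer, threshold=0.5)
```

A lower threshold makes the system buzz earlier on less evidence; a higher one waits for more of the question at the risk of never answering.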
  6. Our Neural Network Model
     ‣ Words and candidate answers are jointly mapped into the same vector space using skip-gram trained on Wikipedia text and anchors
     ‣ A question text is encoded into a vector q using a convolutional neural network [Kim 2014] with max pooling and dropout
     ‣ Given a question vector q and a candidate answer vector a, the probability of the answer being correct is defined following [Yu 2014] [equation shown as an image on the slide; not included in this transcript]
     ‣ The model is trained on the dataset using the Adam optimizer
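The equation on this slide did not survive in the transcript. As a hedged sketch, assuming it is the bilinear form used in [Yu 2014], p(correct | q, a) = sigmoid(qᵀ M a + b), the encoding and scoring steps might look like the code below. The ReLU activation, the omission of filter biases and dropout, and all parameter values are illustrative assumptions, not details taken from the slides.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def conv_max_pool(word_vecs: List[Vector], filters: List[Matrix]) -> Vector:
    """Encode a question: slide each filter over consecutive word windows,
    apply ReLU, and keep the maximum response per filter (max pooling).

    Assumes the question has at least as many words as the widest filter.
    """
    encoded = []
    for filt in filters:  # each filter has shape (window, embedding_dim)
        window = len(filt)
        responses = []
        for start in range(len(word_vecs) - window + 1):
            total = sum(
                w * x
                for f_row, v_row in zip(filt, word_vecs[start:start + window])
                for w, x in zip(f_row, v_row)
            )
            responses.append(max(0.0, total))  # ReLU (assumed activation)
        encoded.append(max(responses))
    return encoded


def answer_probability(q: Vector, a: Vector, M: Matrix, b: float) -> float:
    """Bilinear match (assumed form): sigmoid(q^T M a + b).

    M has shape (len(q), len(a)).
    """
    Ma = [sum(m_ij * a_j for m_ij, a_j in zip(row, a)) for row in M]
    logit = sum(q_i * v for q_i, v in zip(q, Ma)) + b
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs: three 2-dimensional word vectors and two filters of width 2.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
q = conv_max_pool(word_vecs, filters)
a = [0.5, -0.5]                  # candidate answer (entity) vector
M = [[1.0, 0.0], [0.0, 1.0]]     # learned in practice; identity here
p = answer_probability(q, a, M, b=0.0)
```

In a trained model, M and b would be learned jointly with the filters (the slide mentions the Adam optimizer), and the entity vector a would come from the skip-gram embeddings.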
  7. Machine Learning Features
     ✦ CNN-based features: Features based on the estimated probability using the convolutional neural network model (10-fold cross validation is used for stacking)
     ✦ IR-based features: BM25 and TF-IDF-based matching scores between the question and the various texts (PPDB is used to improve these matching scores)
       - Past questions in the dataset
       - Candidate answer’s Wikipedia page, paragraphs, and sentences
       - Paragraphs and sentences that contain links to the candidate answer’s Wikipedia page
     ✦ Binary value representing whether the question text contains the candidate answer
     ✦ #words, #sentences, etc.
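To make the IR-based features concrete, here is a minimal sketch of a BM25 matching score between a tokenized question and a set of tokenized texts (for example, candidate answers' Wikipedia pages). It uses a common Okapi BM25 formulation with a non-negative IDF; the parameter values k1 = 1.2 and b = 0.75 are conventional defaults assumed here, and the slides do not say which variant or settings the system uses. The PPDB-based expansion is not shown.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for the given query tokens."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Toy texts standing in for two candidate answers' Wikipedia pages.
docs = [
    "henry iv of france won the battle of ivry and the battle of coutras".split(),
    "louis xiv of france built the palace of versailles".split(),
]
query = "victor at ivry he won a great victory at coutras".split()
scores = bm25_scores(query, docs)
```

Each score would be one column in the feature vector passed to the random forest, alongside the CNN-based probability and the other features listed on the slide.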
  8. Results
     ‣ The system was evaluated on 85 questions
     ‣ It answered 64 questions correctly (accuracy: 75.3%)
     ‣ The system ranked #1 on the leaderboard