Playing Trivia with a Bot

A short description of my "Watson" Python bot that plays trivia on IRC - and wins!



  1. Playing Trivia with a Bot
     Jose Nazario <jose@monkey.org>
  2. History
     ● ~2005? Created "#trivia" for my wife
     ● Uses Blitzed Trivia bot, "brainiac", and a 110k question/answer DB
     ● Winter 2012: had an interest in NLP for a potential project
     ● Decided to tackle a "toy problem"
     ● "Let's play trivia!"
  3. Goals
     ● Learn NLP via NLTK
     ● Build a bot that can play trivia "competitively"
  4. Natural Language Processing (NLP)
     Algorithms that can parse and process human language
     Major field of study related to AI, useful in:
     ● Machine translation
     ● Grammar induction
     ● Information extraction
     ● Sentence understanding
  5. Challenges & Advantages
     Unlike Jeopardy:
     ● Can answer a question wrong without being penalized, and can try multiple times
     ● No puns or wordplay, straightforward questions
     Still ...
     ● Have to have a knowledge base - Google
     ● Have to be able to figure out the right answer
  6. Watson Components
     ● Simple IRC library (not irclib)
     ● NLTK - Natural Language Toolkit
     ● Logic: hand crafted
  7. Base Assumptions
     ● "Google knows all" - no need to make a local knowledge database
     ● The right answer will be commonly seen, exploit that repetition
  8. Watson 1.0
     ~100 LoC, "an evening of futzing around"
     "Strategy"
     1. Read the question
     2. Throw it at Google, get a result page
     3. Find all the proper names (via NLTK) from page titles, rank by frequency
     4. Guess those sequentially
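Steps 3 and 4 of the Watson 1.0 strategy can be sketched roughly as follows. This is not the bot's actual code: a capitalized-word regex stands in for NLTK's proper-name detection, the titles are made-up sample data, and `rank_candidates` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: runs of capitalized words approximate "proper names".
# The real bot used NLTK for this step; the regex is a stand-in.
NAME_RUN = re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def rank_candidates(titles):
    """Return candidate answers from result-page titles, most frequent first."""
    counts = Counter()
    for title in titles:
        # Count each candidate once per title so a single page cannot dominate.
        counts.update(set(NAME_RUN.findall(title)))
    return [name for name, _ in counts.most_common()]


# Made-up titles for a question such as "Who was the first man on the Moon?"
titles = [
    "Neil Armstrong - first man on the Moon",
    "Biography of Neil Armstrong",
    "Apollo 11 and Neil Armstrong",
]
guesses = rank_candidates(titles)
print(guesses[0])
```

The bot would then send `guesses` to the channel one at a time, which is cheap here because wrong answers are not penalized.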
  9. Watson 1.0 Results
     ● Very poor performance
     ● Not surprising
  10. Watson 2.0
      Written a few days later
      ~300 LoC, "actually had to think this time"
      Strategy
      ● Check a DB of cached questions and answers (from observations), use similar ones if possible
      ● Read question, throw at Google (or Bing)
      ● Figure out what kind of answer is expected, extract matching text via NLTK and scoring
      ● If we get a hint, use it (as a regex)
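The last bullet, using a hint as a regex, might look something like this. The slides do not show the hint format, so this sketch assumes a hint that masks unrevealed letters with underscores (for example `N_il A____r_ng`); `hint_to_regex` and `filter_by_hint` are hypothetical names.

```python
import re


def hint_to_regex(hint, mask="_"):
    """Turn a masked hint into a case-insensitive regex."""
    # Assumption: each mask character hides exactly one non-space character;
    # every other character in the hint is matched literally.
    parts = [r"\S" if ch == mask else re.escape(ch) for ch in hint]
    return re.compile("".join(parts), re.IGNORECASE)


def filter_by_hint(candidates, hint):
    """Keep only the candidates that match the whole hint."""
    pattern = hint_to_regex(hint)
    return [c for c in candidates if pattern.fullmatch(c)]


candidates = ["Neil Armstrong", "Buzz Aldrin", "Nell Armstrung", "Moon"]
matches = filter_by_hint(candidates, "N_il A____r_ng")
print(matches)
```

Filtering the ranked guesses this way keeps their order, so the most frequent candidate that fits the hint is still tried first.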
  11. Extracting Answers from Web Pages
      Challenge
      ● Web pages contain a lot of junk around the answer
      ● How do we find the answer in the sea of words?
      Simple strategy - extract proper names!
      (The trivia DB often has proper names for answers)
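One way to "extract proper names" is to part-of-speech tag the text and join adjacent proper-noun tokens. The sketch below assumes the input is already a list of `(word, tag)` pairs with Penn Treebank tags, which is the shape that `nltk.pos_tag(nltk.word_tokenize(text))` returns. The sample is hand-tagged so the example runs without NLTK, and whether the bot used this exact approach is not stated in the slides.

```python
def proper_names(tagged_tokens):
    """Join consecutive proper-noun tokens (tags NNP/NNPS) into names."""
    names, current = [], []
    for word, tag in tagged_tokens:
        if tag in ("NNP", "NNPS"):
            current.append(word)
        elif current:
            # A non-proper-noun token ends the current name.
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


# Hand-tagged sample; in practice the tags would come from a tagger.
tagged = [
    ("The", "DT"), ("first", "JJ"), ("man", "NN"), ("on", "IN"),
    ("the", "DT"), ("Moon", "NNP"), ("was", "VBD"),
    ("Neil", "NNP"), ("Armstrong", "NNP"), (".", "."),
]
names = proper_names(tagged)
print(names)
```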
  12. Where is Watson these days?
      00:08 < brainiac> Congratulations to rogueclown who has won this round! What a brain!
      00:08 < brainiac> Final scores:
      00:08 < brainiac> rogueclown: 10
      00:08 < brainiac> watson: 9
      00:08 < brainiac> purge: 2
      irc://coffee.ofdoom.org:6667/#trivia
  13. Additional Ideas for Watson 2.0
      ● New search engines
        ● Bing, Ask, Wolfram Alpha
      ● Prune knowledge base
        ● Weed out useless "answers"
      ● New/different named entity recognition engine
      ● Experiment with scoring algorithms for guesses
  14. Disappointments
      ● Only a minor increase in my knowledge of NLP
        ● I did not become an NLP maestro
      ● No one else built a bot
        ● Was hoping for a competition
  15. Watson 3 ... sorta in the works
      Ideas
      ● Natural language interface to semantic web (e.g. QuestIO, Quepy), SPARQL endpoints
        ● Wolfram Alpha-like UI, research prototypes available
      ● Teach the bot what kind of answer to look for
        ● Quantity, dates, names, etc.
      ● Probabilistic programming?
        ● Marry answers with confidence
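A very naive reading of "marry answers with confidence" is to turn candidate counts into relative frequencies and only guess above a cutoff. This is not probabilistic programming and is not taken from the slides; the counts are made up, and `with_confidence` and the 0.5 threshold are hypothetical.

```python
from collections import Counter


def with_confidence(counts, min_confidence=0.5):
    """Pair each candidate with its share of all observations, dropping weak ones."""
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() yields candidates from most to least frequent.
    scored = [(answer, n / total) for answer, n in counts.most_common()]
    return [(answer, conf) for answer, conf in scored if conf >= min_confidence]


# Made-up candidate counts for one question.
counts = Counter({"Neil Armstrong": 6, "Buzz Aldrin": 3, "Moon": 1})
confident = with_confidence(counts)
print(confident)
```

Since wrong guesses cost nothing in this game, a cutoff mostly matters for cutting channel noise; a lower `min_confidence` would let more guesses through.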
  16. IBM Watson Links
      ● http://www.kurzweilai.net/how-watson-works-a-conversation-with-eric-brown-ibm-research-manager
      ● http://researcher.ibm.com/researcher/view_project.php?id=2099
      ● (Special issue of IBM JR&D on Watson)
