Watson at RPI - Summer 2013
 

Implementing an open source version of the Watson program - with thanks to IBM and others.

Usage Rights

CC Attribution License

  • Our project differs from IBM's Watson in that it draws on the internet during each run to generate a question-specific corpus. We take only 5 documents because of the time it takes to parse them with the Stanford Parser. This is our main limiting factor, and as we parse
  • jsoup lets you navigate HTML pages much as jQuery does, selecting elements by ID, class, or attribute.
  • The cache stores the URL and the entire HTML body of each page. The passages are then used in different ways for Candidate Generation, Answer Scoring, and Supporting Evidence Retrieval.
  • DBpedia: an online knowledge base that stores information as RDF triples. We query DBpedia with SPARQL (SPARQL Protocol and RDF Query Language), using the Jena library from our Java application.
  • There is enough information to be confident that the entities we are asking about will appear in DBpedia. We don't have to check whether a URL is valid if we already have the Wikipedia URL. We can also learn other names for people; for example, Louis XIV is the Sun King.
  • Examples: verifying that we have the right length for the River Nile, or the correct birth date of Harrison Ford.
  • The DeepQA architecture, which both IBM Watson and RPI MiniDeepQA implement, is a QA (Question Answering) system that answers questions by generating as many potential answers as is practical, then filtering them with multiple evidence scorers in parallel.
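The generate-then-filter idea in the last note can be sketched in Java. This is a minimal sketch, not RPI or IBM code: the `EvidenceScorer` interface, the toy scorers, averaging as the way to combine evidence, and the 0.5 threshold are all hypothetical. For each candidate, every scorer runs in parallel via a parallel stream; weakly supported candidates are dropped and the rest are ranked by combined score.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;
import java.util.stream.Collectors;

/**
 * Sketch of the DeepQA pattern: generate many candidate answers, score each one
 * with several independent evidence scorers run in parallel, then filter and rank.
 * Scorers, the averaging rule and the threshold are hypothetical placeholders.
 */
public class DeepQaSketch {

    /** An evidence scorer rates (question, candidate) with a value in [0, 1]. */
    public interface EvidenceScorer extends ToDoubleBiFunction<String, String> {}

    public static List<Map.Entry<String, Double>> rank(String question, List<String> candidates,
                                                        List<EvidenceScorer> scorers, double threshold) {
        return candidates.stream()
            .distinct()
            // Run all scorers for this candidate in parallel and average their evidence.
            .map(c -> Map.entry(c, scorers.parallelStream()
                .mapToDouble(s -> s.applyAsDouble(question, c))
                .average()
                .orElse(0.0)))
            // Drop candidates without enough supporting evidence.
            .filter(e -> e.getValue() >= threshold)
            // Most confident answer first.
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String question = "He was a bank clerk in the Yukon before he published Songs of a Sourdough in 1907";
        List<String> candidates = List.of("Robert W. Service", "Jack London", "Yukon");
        // Hypothetical supporting passage and toy scorers, for illustration only.
        String passage = "Robert W. Service published Songs of a Sourdough in 1907.";
        EvidenceScorer passageSupport = (q, c) -> passage.contains(c) ? 1.0 : 0.0;
        EvidenceScorer notInQuestion = (q, c) -> q.contains(c) ? 0.0 : 0.5;
        System.out.println(rank(question, candidates, List.of(passageSupport, notInQuestion), 0.5));
        // prints [Robert W. Service=0.75]
    }
}
```

Averaging is only the simplest way to merge scores; IBM's DeepQA combines evidence with machine-learned models, which this sketch does not attempt.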

Presentation Transcript

  • TECHNICAL PROJECT REVIEW, 14TH AUGUST 2013
      Watson @ RPI, Watson Research Lab
      Professor Jim Hendler
      Simon Ellis, Kate McGuire, Nicole Negedly, Dillon Burns, Matt Klawonn, Avi Weinstock
  • WATSON RPI: INTRODUCTION (Simon Ellis)
  • IBM Watson
  • Watson is…
      - … a piece of software that will run on your laptop
          - Though very slowly
          - Specialised hardware and control platform
      - … an implementation of the DeepQA concept
      - … the first iteration of the ‘cognitive computing’ platform
      - … a very clever artificial intelligence
          - A very clever application of human intelligence
  • Background
      - IBM agrees to give RPI a version of Watson
      - A Watson team is set up to undertake summer research on the Watson system
      - The Watson hardware/software configuration is not ready at the beginning of the summer session
      - So what do we do with 10 weeks, 5 undergraduates and 1 graduate student?
  • Challenge accepted!
      - Build a new version of Watson
          - Based on research published in IBM J Res & Dev
          - With support and input from IBM Research
      - Use open source libraries wherever possible
          - Faster development
          - No IP issues
      - Turns out to be a very useful project
          - Trains the team in the details of how the Watson system operates
          - Can be used in education, training, testing, and evaluation
  • Sample output
      - Demo run of the RPI version of Watson
      - Shows output representing most of the "pipeline"
  • Inside Watson
      - Watson pipeline as published by IBM; see IBM J Res & Dev 56 (3/4), May/July 2012, p. 15:2
  • WATSON RPI: QUESTION ANALYSIS (Nicole Negedly)
  • Question Analysis
  • Question Analysis
      - What is the question asking for?
      - What structured information can be determined from the unstructured text of the question?
      - Topics
          - Parsers
          - Syntactic and Semantic Analysis Tools
          - Focus and Lexical Answer Type Detection
          - Future Work
  • Parsing
      - Open-source parsers
          - Stanford Parser
          - Berkeley Parser
      - Functions
          - Determine the grammatical structure of text
          - Parse trees, part-of-speech tags, dependency relations
  • Coreference Resolution
      - What terms in the question refer to the same entity?
  • Named Entity Extraction
      - Identifies people, places, organizations, and time spans.
  • Focus and Lexical Answer Type
      Category: POETS & POETRY
      Clue: He was a bank clerk in the Yukon before he published Songs of a Sourdough in 1907
      - Focus: "he"
      - LAT: "he", "clerk", "poet"
  • Future Work
      - Adding additional parsers to the system
          - Comparison of parser output
      - Relation extraction
          - Prolog code and database
      - Improved focus and LAT detection
          - Princeton WordNet
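A toy version of the focus and LAT example above, in Java, reproducing the slide's answer for that one clue. It is a hypothetical surface-pattern heuristic, not the team's method: the list of focus words, the "<focus> was a [modifier] <noun>" pattern and the naive singularisation of the category's first word are all assumptions made for illustration.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy focus and lexical answer type (LAT) detector for one Jeopardy!-style clue.
 * Hypothetical heuristics only: a fixed list of focus words, a
 * "<focus> was a [modifier] <noun>" pattern, and naive singularisation
 * of the first word of the category.
 */
public class FocusLatSketch {
    private static final List<String> FOCUS_WORDS = List.of("he", "she", "it", "they", "this", "these");

    /** First focus word in the clue, or null if there is none. */
    public static String findFocus(String clue) {
        for (String token : clue.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (FOCUS_WORDS.contains(token)) return token;
        }
        return null;
    }

    /** LATs: the focus itself, the predicate noun after "<focus> was a", and the category head. */
    public static List<String> findLats(String category, String clue, String focus) {
        List<String> lats = new ArrayList<>();
        if (focus == null) return lats;
        lats.add(focus);
        // "He was a bank clerk ..." -> "clerk" (one optional modifier word is skipped).
        Matcher m = Pattern.compile(
                "\\b" + Pattern.quote(focus) + "\\s+(?:was|is)\\s+an?\\s+(?:\\w+\\s+)?(\\w+)",
                Pattern.CASE_INSENSITIVE).matcher(clue);
        if (m.find()) lats.add(m.group(1).toLowerCase(Locale.ROOT));
        // "POETS & POETRY" -> "poet"
        String head = category.trim().split("\\s+")[0].toLowerCase(Locale.ROOT);
        if (head.length() > 1 && head.endsWith("s")) head = head.substring(0, head.length() - 1);
        lats.add(head);
        return lats;
    }

    public static void main(String[] args) {
        String category = "POETS & POETRY";
        String clue = "He was a bank clerk in the Yukon before he published Songs of a Sourdough in 1907";
        String focus = findFocus(clue);
        System.out.println("Focus: " + focus);                         // Focus: he
        System.out.println("LAT: " + findLats(category, clue, focus)); // LAT: [he, clerk, poet]
    }
}
```

Surface patterns break easily: for "He was a clerk in the Yukon" the optional modifier swallows "clerk" and the pattern returns "in". Working from the parse tree (for example, the head noun of the predicate) avoids that.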
  • WATSON RPI: PRIMARY SEARCH (Dillon Burns)
  • Primary Search
  • Primary Search & Corpus Generation
      - Primary search generates the corpus from which we take candidate answers, passages, supporting evidence, and essentially all textual input to the system.
      - Search Wikipedia for the focus identified during the Question Analysis phase.
      - Take the first 5 documents returned as the corpus.
      - Use the jsoup library to collect and parse HTML.
  • jsoup
      String[] results = {"/wiki/Snapple", "/wiki/Dr_Pepper_Snapple_Group", "/wiki/Snapple_Theater…
      → To Cache
  • DBpedia
  • DBpedia
      - As of 2011 it had 3.64 million things categorized in its database
      - Its URLs map directly to Wikipedia's
      - Wikipedia redirect lists help identify alternate names for entities, and concepts closely related to particular entities or people
  • Future Directions
      - Use DBpedia to fact-check answers about entities in the database
      - Make use of DBpedia subject matching
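The primary search steps above can be sketched in Java under stated assumptions. The search URL follows the format shown on the Wikipedia Title Candidate Generation slide, and the limit of 5 documents comes from the slides; everything else is hypothetical. Fetching is an injected function so the sketch runs offline (the team used jsoup for the real HTTP and HTML work), `dbpediaResource` relies on DBpedia URIs mirroring Wikipedia article names, and the SPARQL text is only an example of a birth-date fact check that a client such as Jena could send. It assumes Java 11 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Sketch of primary search and corpus generation; names and fetcher are hypothetical. */
public class CorpusSketch {
    static final String WIKIPEDIA = "http://en.wikipedia.org";
    static final int MAX_DOCUMENTS = 5; // parsing time limits the corpus size

    private final Map<String, String> cache = new HashMap<>(); // URL -> entire HTML body
    private final Function<String, String> fetcher;

    public CorpusSketch(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Wikipedia search URL for the question focus. */
    public static String searchUrl(String focus) {
        return WIKIPEDIA + "/w/index.php?title=Special%3ASearch&profile=default&search="
            + URLEncoder.encode(focus, StandardCharsets.UTF_8);
    }

    /** Keeps the first MAX_DOCUMENTS distinct article links such as "/wiki/Snapple". */
    public static List<String> topDocuments(List<String> resultHrefs) {
        return resultHrefs.stream()
            .filter(href -> href.startsWith("/wiki/"))
            .distinct()
            .limit(MAX_DOCUMENTS)
            .collect(Collectors.toList());
    }

    /** Returns URL -> HTML for the top documents, fetching each URL at most once. */
    public Map<String, String> buildCorpus(List<String> resultHrefs) {
        Map<String, String> corpus = new LinkedHashMap<>();
        for (String href : topDocuments(resultHrefs)) {
            String url = WIKIPEDIA + href;
            corpus.put(url, cache.computeIfAbsent(url, fetcher));
        }
        return corpus;
    }

    /** DBpedia resource URI for a Wikipedia article link. */
    public static String dbpediaResource(String wikiHref) {
        return "http://dbpedia.org/resource/" + wikiHref.substring("/wiki/".length());
    }

    /** Example SPARQL text for a birth-date fact check, using the DBpedia ontology. */
    public static String birthDateQuery(String wikiHref) {
        return "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
            + "SELECT ?date WHERE { <" + dbpediaResource(wikiHref) + "> dbo:birthDate ?date }";
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("Shirley Ann Jackson"));
        // http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=Shirley+Ann+Jackson
        // Hypothetical result list and offline fetcher, for illustration only.
        List<String> hrefs = List.of("/wiki/Doc_1", "/wiki/Doc_2", "#cite", "/wiki/Doc_3",
                                     "/wiki/Doc_4", "/wiki/Doc_5", "/wiki/Doc_6");
        CorpusSketch search = new CorpusSketch(url -> "<html>" + url + "</html>");
        System.out.println(search.buildCorpus(hrefs).keySet());
        System.out.println(birthDateQuery("/wiki/Harrison_Ford"));
    }
}
```

Keeping the cache separate from the per-question corpus means a page fetched for one question is reused by later questions, while each corpus still holds only its own top 5 documents.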
  • WATSON RPI: CANDIDATE GENERATION (Kate McGuire)
  • Search Result Processing and Candidate Generation
  • Search Result Processing
      - Passage Retrieval
          - Watson: Indri and Lucene
          - Identifies each HTML sentence and adds both the HTML and the clean text to the passage type
          - Adds information about each passage
      - Passage Parsing
          - Forms parse trees for each individual sentence
      - Adds an array of passages to each document
      Example: the raw HTML of the Wikipedia article "Shirley Ann Jackson" (including its table of contents, edit links, and citation markup) is reduced to individual sentences, each stored as:
      Text: <b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a href="/wiki/United_States" title="United States">American</a> <a href="/wiki/Physicist" title="Physicist">physicist</a>, and the 18th president of <a href="/wiki/Rensselaer_Polytechnic_Institute" title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic Institute</a>.
      Cleaned Text: <b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a href="/wiki/United_States" title="United States">American</a> <a href="/wiki/Physicist" title="Physicist">physicist</a>, and the 18th president of <a href="/wiki/Rensselaer_Polytechnic_Institute" title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic Institute</a>.
      Parse Tree: (ROOT (S (NP (NP (NNP ) (NNP Shirley) (NNP Ann) (NNP Jackson) (NNP )) (PRN (-LRB- -LRB-) (VP (VBN born) (NP (NNP August) (CD 5) (, ,) (CD 1946))) (-RRB- -RRB-))) (VP (VBZ is) (NP (NP (DT an) (JJ ) (NNP American) (NNP ) (NNP ) (NN physicist) (NNS )) (, ,) (CC and) (NP (NP (DT the) (JJ 18th) (NN president)) (PP (IN of) (NP (NNP ) (NNP Rensselaer) (NNP Polytechnic) (NNP Institute) (NNP )))))) (. .)))
  • Candidate Generation
      Using each document and the passages created by Search Result Processing, we generate candidates with three techniques:
      1. Title of Document (T.O.D.): adds the title of the document as a candidate.
      2. Wikipedia Title Candidate Generation: adds any noun phrases within the document's passage texts that are also titles of Wikipedia articles.
      3. Anchor Text Candidate Generation: adds candidates based on the hyperlinks and metadata within the document.
  • Wikipedia Title Candidate Generation
      - Runs on the passage array from each search result.
      - Uses the parse tree to retrieve all the noun phrases in each passage.
      - Checks whether each noun phrase is the title of a Wikipedia article.
      - Adds the verified candidates, along with an array of the passages that contained them.
      Flow: Array of Passages → Retrieving Noun Phrases → Check against Previous Data → Wikipedia URL Check → Candidate and Containing Passages
      Example passage: <b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a href="/wiki/United_States" title="United States">American</a> <a href="/wiki/Physicist" title="Physicist">physicist</a>, and the 18th president of <a href="/wiki/Rensselaer_Polytechnic_Institute" title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic Institute</a>.
      Parse tree: the same tree shown for this sentence under Search Result Processing.
      Noun phrases retrieved: Shirley Ann Jackson; Shirley Ann Jackson (born August 5, 1946); August 5, 1946; An American Physicist; An American Physicist, and the 18th president of Rensselaer Polytechnic Institute; The 18th president; The 18th president of Rensselaer Polytechnic Institute; Rensselaer Polytechnic Institute
      Wikipedia URL check: http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=Shirley+Ann+Jackson
  • Anchor Text Candidate Generation
      - Runs on the passage array from each search result.
      - Checks for hyperlinks within the HTML text of each passage.
      - Adds the title of each hyperlinked article as a candidate.
      - Adds each passage containing the candidate to an array.
      Example passages:
      <b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a href="/wiki/United_States" title="United States">American</a> <a href="/wiki/Physicist" title="Physicist">physicist</a>, and the 18th president of <a href="/wiki/Rensselaer_Polytechnic_Institute" title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic Institute</a>. She received her <a href="/wiki/Doctor_of_Philosophy" title="Doctor of Philosophy">Ph.D.</a> in physics from the <a href="/wiki/Massachusetts_Institute_of_Technology" title="Massachusetts Institute of Technology">Massachusetts Institute of Technology</a> in 1973, becoming the first <a href="/wiki/African_American" title="African American">African American</a> woman to earn a doctorate from MIT in nuclear physics.<sup id="cite_ref-profile_2-0" class="reference"><a href="#cite_note-profile-2"><span>[</span>2<span>]</span></a></sup><sup id="cite_ref-AAAS_3-0" class="reference"><a href="#cite_note-AAAS-3"><span>[</span>3<span>]</span></a></sup>
      Jackson was born in Washington D.C. Her parents, Beatrice and George Jackson, strongly valued education and encouraged her in school.<sup id="cite_ref-diaspora_4-0" class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup> Her father spurred on her interest in science by helping her with projects for her science classes. At Roosevelt High School, Jackson attended accelerated programs in both math and science, and graduated in 1964 as valedictorian.<sup id="cite_ref-diaspora_4-1" class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup>
  • Search Result Processing
    Example: raw HTML of a search result (Wikipedia, "Shirley Ann Jackson"):
    <p><b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a href="/wiki/United_States" title="United States">American</a> <a href="/wiki/Physicist" title="Physicist">physicist</a>, and the 18th president of <a href="/wiki/Rensselaer_Polytechnic_Institute" title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic Institute</a>. She received her <a href="/wiki/Doctor_of_Philosophy" title="Doctor of Philosophy">Ph.D.</a> in physics from the <a href="/wiki/Massachusetts_Institute_of_Technology" title="Massachusetts Institute of Technology">Massachusetts Institute of Technology</a> in 1973, becoming the first <a href="/wiki/African_American" title="African American">African American</a> woman to earn a doctorate from MIT in nuclear physics.<sup id="cite_ref-profile_2-0" class="reference"><a href="#cite_note-profile-2"><span>[</span>2<span>]</span></a></sup><sup id="cite_ref-AAAS_3-0" class="reference"><a href="#cite_note-AAAS-3"><span>[</span>3<span>]</span></a></sup></p>
    <div id="toc" class="toc">
    <div id="toctitle">
    <h2>Contents</h2>
    </div>
    <ul>
    <li class="toclevel-1 tocsection-1"><a href="#Early_life_and_schooling"><span class="tocnumber">1</span> <span class="toctext">Early life and schooling</span></a></li>
    <li class="toclevel-1 tocsection-2"><a href="#Career"><span class="tocnumber">2</span> <span class="toctext">Career</span></a>
    <ul>
    <li class="toclevel-2 tocsection-3"><a href="#Rensselaer_Polytechnic_Institute"><span class="tocnumber">2.1</span> <span class="toctext">Rensselaer Polytechnic Institute</span></a></li>
    </ul>
    </li>
    <li class="toclevel-1 tocsection-4"><a href="#Honors_and_distinctions"><span class="tocnumber">3</span> <span class="toctext">Honors and distinctions</span></a>
    <ul>
    <li class="toclevel-2 tocsection-5"><a href="#Boards_of_directors"><span class="tocnumber">3.1</span> <span class="toctext">Boards of directors</span></a></li>
    </ul>
    </li>
    <li class="toclevel-1 tocsection-6"><a href="#Personal"><span class="tocnumber">4</span> <span class="toctext">Personal</span></a></li>
    <li class="toclevel-1 tocsection-7"><a href="#References"><span class="tocnumber">5</span> <span class="toctext">References</span></a></li>
    <li class="toclevel-1 tocsection-8"><a href="#External_links"><span class="tocnumber">6</span> <span class="toctext">External links</span></a></li>
    </ul>
    </div>
    <h2><span class="mw-headline" id="Early_life_and_schooling">Early life and schooling</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=Shirley_Ann_Jackson&amp;action=edit&amp;section=1" title="Edit section: Early life and schooling">edit source</a><span class="mw-editsection-divider"> | </span><a href="/w/index.php?title=Shirley_Ann_Jackson&amp;veaction=edit&amp;section=1" title="Edit section: Early life and schooling" class="mw-editsection-visualeditor">edit</a><span class="mw-editsection-bracket">]</span></span></h2>
    <p>Jackson was born in Washington D.C. Her parents, Beatrice and George Jackson, strongly valued education and encouraged her in school.<sup id="cite_ref-diaspora_4-0" class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup> Her father spurred on her interest in science by helping her with projects for her science classes. At Roosevelt High School, Jackson attended accelerated programs in both math and science, and graduated in 1964 as valedictorian.<sup id="cite_ref-diaspora_4-1" class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup></p>
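Here is one way a raw page like this could be cut into passages. It rests on assumptions the slide does not state: passages are taken at plain `<p>` boundaries, and citation superscripts are dropped. Anchor tags are kept so that anchor-text candidate generation can still see them. The table of contents and edit links fall away because they are not inside `<p>`. A regex is used for brevity where a real system would more robustly use an HTML parser such as jsoup. It also misses `<p>` tags that carry attributes. `SearchResultProcessing`, `passages`, and `plainText` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SearchResultProcessing {
    // (?s) lets '.' span newlines; the lazy .*? stops at the first closing tag.
    private static final Pattern PARAGRAPH = Pattern.compile("(?s)<p>(.*?)</p>");
    // Wikipedia-style citation markers: <sup ... class="reference">...</sup>
    private static final Pattern CITATION =
            Pattern.compile("(?s)<sup\\b[^>]*class=\"reference\"[^>]*>.*?</sup>");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Splits page HTML into paragraph passages, keeping links but removing citation markers. */
    static List<String> passages(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(html);
        while (m.find()) {
            String passage = CITATION.matcher(m.group(1)).replaceAll("").trim();
            if (!passage.isEmpty()) {
                result.add(passage);
            }
        }
        return result;
    }

    /** Strips all remaining tags, e.g. before matching passage terms against the question. */
    static String plainText(String passageHtml) {
        return TAG.matcher(passageHtml).replaceAll("");
    }
}
```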
  • Future Work
    - Search Result Processing
      - Improve methods of imitating Indri or Lucene passage retrieval without a corpus.
      - Create a passage score and rank passages by it.
    - Candidate Generation
      - Continue to improve the speed and quality of Candidate Generation.
      - Research and implement Candidate Generation from structured sources (Prismatic, Answer Lookup).
      - Record and measure recall in comparison with Watson and other question-answering software.
  • WATSON RPI Matt Klawonn SCORING & RANKING
  • Scoring & Ranking
  • Differentiating between answers
    - Making sense of candidates
    - Filtering
    - Supporting Evidence Retrieval (SER)
    - Scoring (passage-based)
  • Scorers
    - Passage Term Match
    - Textual Alignment
    - Skip-Bigram
    - Each of these scores supporting evidence.
    - The scores are then merged to produce a single candidate score.
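The slides do not say how the per-scorer results are merged. For illustration only, the sketch below assumes a fixed weighted sum and then sorts candidates by the merged value. The weights and the names `ScoreMerger`, `merge`, and `rank` are hypothetical placeholders, not the project's values. A learned model would replace the hand-set weights.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class ScoreMerger {
    // Hypothetical hand-set weights, one per scorer; not taken from the project.
    static final Map<String, Double> WEIGHTS = Map.of(
            "passageTermMatch", 1.0,
            "textualAlignment", 0.5,
            "skipBigram", 2.0);

    /** Combines one candidate's per-scorer scores into a single score (weighted sum). */
    static double merge(Map<String, Double> scores) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            total += WEIGHTS.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return total;
    }

    /** Returns candidate names sorted from highest to lowest merged score. */
    static List<String> rank(Map<String, Map<String, Double>> candidateScores) {
        List<String> names = new ArrayList<>(candidateScores.keySet());
        names.sort(Comparator.comparingDouble(
                (String name) -> merge(candidateScores.get(name))).reversed());
        return names;
    }
}
```

Unknown scorer names get weight 0, so adding a new scorer does not change rankings until it is given a weight.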
  • Passage Term Match
    - Question terms are extracted.
    - The passage is searched for those terms.
    - A score is calculated for that passage; this is done per passage.
    - Example: "Where is Toronto?" gives the terms "Where", "is", "Toronto".
      The passage "Toronto is in Southern Ontario" contains "Toronto" and "is".
      Score = IDF(Toronto) + IDF(is)
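The Toronto arithmetic can be made concrete. IDF (inverse document frequency) gives rare terms more weight than common ones. This sketch assumes the common definition idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number that contain t. It computes IDF over the documents retrieved for the question and uses lowercase word tokens. The slide states none of these choices, and `PassageTermMatch` is a hypothetical name. Each distinct question term found in the passage adds its IDF once.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class PassageTermMatch {
    /** Lowercases and splits on anything that is not a letter or digit. */
    static Set<String> terms(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    /** idf(t) = ln(N / df(t)); a term found in no document gets 0 (an arbitrary choice). */
    static double idf(String term, List<String> documents) {
        long df = documents.stream().filter(d -> terms(d).contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) documents.size() / df);
    }

    /** Sums the IDF of every distinct question term that also appears in the passage. */
    static double score(String question, String passage, List<String> documents) {
        Set<String> passageTerms = terms(passage);
        double score = 0.0;
        for (String term : terms(question)) {
            if (passageTerms.contains(term)) {
                score += idf(term, documents);
            }
        }
        return score;
    }
}
```

A term like "is" that appears in every retrieved document gets an IDF of ln(1) = 0. In practice the match is carried by "Toronto".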
  • Textual Alignment
    - Finds an optimal alignment of a question and a passage.
    - Assigns "partial credit" for close matches.
    - Example:
      Question: "Who is the President of RPI?"
      Passage:  "Shirley Ann Jackson is the President of RPI."
      Aligned:  "Who is the President of RPI."
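A common way to find an optimal alignment is dynamic programming over the two token sequences. The sketch below is a weighted longest-common-subsequence variant, which is an assumption and not necessarily the project's algorithm. An exact (case-insensitive) token match earns 1 point. For partial credit it awards 0.5 when two tokens share a prefix of at least four characters, a crude stand-in for stemming that is also an assumption. Unmatched tokens cost nothing. `TextualAlignment`, `similarity`, and `align` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Locale;

final class TextualAlignment {
    static String[] tokenize(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    /** 1.0 for an exact match, 0.5 partial credit for a shared prefix of 4+ chars, else 0. */
    static double similarity(String a, String b) {
        if (a.equals(b)) {
            return 1.0;
        }
        int n = Math.min(a.length(), b.length());
        int p = 0;
        while (p < n && a.charAt(p) == b.charAt(p)) {
            p++;
        }
        return p >= 4 ? 0.5 : 0.0; // e.g. "authored" vs "author"
    }

    /** Best total similarity over order-preserving alignments (gaps are free). */
    static double align(String question, String passage) {
        String[] q = tokenize(question);
        String[] d = tokenize(passage);
        double[][] best = new double[q.length + 1][d.length + 1];
        for (int i = 1; i <= q.length; i++) {
            for (int j = 1; j <= d.length; j++) {
                double pairUp = best[i - 1][j - 1] + similarity(q[i - 1], d[j - 1]);
                double skip = Math.max(best[i - 1][j], best[i][j - 1]);
                best[i][j] = Math.max(pairUp, skip);
            }
        }
        return best[q.length][d.length];
    }

    public static void main(String[] args) {
        // "is the President of RPI" aligns exactly; "Who" has no match.
        System.out.println(align("Who is the President of RPI?",
                "Shirley Ann Jackson is the President of RPI.")); // prints 5.0
    }
}
```

For "Who authored The Good Earth?" against "Pearl Buck, author of the good earth", this scheme scores 3.5. That is 0.5 partial credit for "authored" and "author", plus 1 each for "the", "good", and "earth".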
  • Skip-Bigram
    - Constructs a graph:
      - Nodes represent terms (syntactic objects).
      - Edges represent relations between them.
    - Extracts skip-bigrams:
      - A skip-bigram is a pair of nodes that are either directly connected or connected through exactly one intermediate node.
      - Skip-bigrams represent close relationships between terms.
    - The score is based on the number of skip-bigrams the question and the passage have in common.
  • Example
    - Question: Who authored "The Good Earth"?
    - Passage: "Pearl Buck, author of the good earth…"
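The Skip-Bigram steps can be sketched on the "Good Earth" example. The slides do not show the graphs, so the edges below are hand-built and treated as undirected. They are lemmatized and leave out determiners, all of which are assumptions; in the real pipeline a syntactic parser would produce them. `SkipBigram` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SkipBigram {
    /** Builds an undirected adjacency map from (term, term) edges. */
    static Map<String, Set<String>> graph(String[][] edges) {
        Map<String, Set<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    /** Order-independent key for an unordered pair of terms. */
    static String pair(String a, String b) {
        return a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
    }

    /** All pairs joined by one edge, or by two edges through exactly one intermediate node. */
    static Set<String> skipBigrams(String[][] edges) {
        Map<String, Set<String>> adj = graph(edges);
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Set<String>> entry : adj.entrySet()) {
            String a = entry.getKey();
            for (String b : entry.getValue()) {
                result.add(pair(a, b));           // directly connected
                for (String c : adj.get(b)) {     // a - b - c, with b as the intermediate node
                    if (!c.equals(a)) {
                        result.add(pair(a, c));
                    }
                }
            }
        }
        return result;
    }

    /** Score = number of skip-bigrams shared by the question graph and the passage graph. */
    static int score(String[][] questionEdges, String[][] passageEdges) {
        Set<String> common = new HashSet<>(skipBigrams(questionEdges));
        common.retainAll(skipBigrams(passageEdges));
        return common.size();
    }

    public static void main(String[] args) {
        // Hand-built, lemmatized edges (an assumption; a parser would normally produce these).
        String[][] question = {{"who", "author"}, {"author", "earth"}, {"earth", "good"}};
        String[][] passage = {{"pearl", "buck"}, {"buck", "author"}, {"author", "earth"}, {"earth", "good"}};
        System.out.println(score(question, passage)); // prints 3
    }
}
```

The three shared skip-bigrams are author|earth, earth|good, and author|good. The last one links two terms through the intermediate node "earth".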
  • Future Directions
    - More algorithms
    - Logical form answer candidate scoring
    - Improved Type Coercion scoring
    - Begin implementing machine learning
    - Temporal/spatial reasoning
  • WATSON RPI Avi Weinstock UIMA PIPELINE
  • UIMA
    - The DeepQA architecture is built on top of another architecture, UIMA (Unstructured Information Management Architecture).
    - A UIMA CAS (Common Analysis Structure) contains a contiguous block of data (normally text) and annotations. Each annotation holds start and end indexes into the data, plus additional data (strings, integers, doubles, arrays, annotation references).
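To make the CAS description concrete, here is a toy stand-in in plain Java. It is not the UIMA API: real UIMA code works with classes such as `JCas` and `Annotation` generated from a type system. `MiniCas`, its nested `Annotation`, and the feature map are hypothetical simplifications that mirror the slide. There is one block of text, plus annotations that store only begin/end offsets and extra data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy analogue of a UIMA CAS: one block of text plus stand-off annotations. */
final class MiniCas {
    /** Covers text[begin, end) and carries extra features (strings, numbers, arrays, other annotations). */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        final Map<String, Object> features = new HashMap<>();

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    final String text;
    final List<Annotation> annotations = new ArrayList<>();

    MiniCas(String text) {
        this.text = text;
    }

    /** Adds an annotation after checking that its span lies inside the text. */
    Annotation annotate(String type, int begin, int end) {
        if (begin < 0 || end > text.length() || begin > end) {
            throw new IllegalArgumentException("span outside the text");
        }
        Annotation a = new Annotation(type, begin, end);
        annotations.add(a);
        return a;
    }

    /** Annotations store only offsets; the covered text is looked up in the shared data. */
    String coveredText(Annotation a) {
        return text.substring(a.begin, a.end);
    }
}
```

Usage: `new MiniCas("Who is the President of RPI?").annotate("NounPhrase", 7, 27)` marks "the President of RPI" without copying it. Stand-off annotation lets many analysis steps layer results over the same text.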
  • UIMA
    - CAS Multipliers output multiple CASes based on the data in the input CAS. This facilitates parallelization, which is the key to Watson's response time.
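The fan-out idea behind a CAS Multiplier can be illustrated without UIMA. One input (a question with its candidates) is split into independent per-candidate items, which are then scored concurrently. This sketch uses a Java parallel stream, not UIMA's actual CAS Multiplier interface or its scale-out deployment. `QuestionCas`, `CandidateCas`, `split`, and `scoreInParallel` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

final class MultiplierSketch {
    /** Hypothetical input: a question together with its generated candidates. */
    static final class QuestionCas {
        final String question;
        final List<String> candidates;

        QuestionCas(String question, List<String> candidates) {
            this.question = question;
            this.candidates = candidates;
        }
    }

    /** Hypothetical output: one independent unit of work per candidate. */
    static final class CandidateCas {
        final String question;
        final String candidate;

        CandidateCas(String question, String candidate) {
            this.question = question;
            this.candidate = candidate;
        }
    }

    /** The "multiplier" step: one input yields many outputs that share no mutable state. */
    static List<CandidateCas> split(QuestionCas in) {
        List<CandidateCas> out = new ArrayList<>();
        for (String c : in.candidates) {
            out.add(new CandidateCas(in.question, c));
        }
        return out;
    }

    /**
     * Scores the outputs concurrently on the common fork-join pool.
     * Results keep candidate order; the scorer must be thread-safe.
     */
    static List<Double> scoreInParallel(QuestionCas in, ToDoubleFunction<CandidateCas> scorer) {
        return split(in).parallelStream()
                .mapToDouble(scorer)
                .boxed()
                .collect(Collectors.toList());
    }
}
```

Because each `CandidateCas` is independent, adding threads (or machines) speeds up scoring without any coordination between candidates.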
  • DeepQA Architecture
    - The DeepQA architecture, which both IBM Watson and RPI MiniDeepQA implement, is a QA (question answering) system that answers questions by generating as many potential answers as is practical, then filtering them with multiple evidence scorers in parallel.
  • Data Cache
    - IBM Watson has a pre-processed corpus of information. A subset of the DeepQA pipeline generates this corpus automatically from an enormous volume of raw text, and the rest of the pipeline uses it at question time.
    - Because our system retrieves information from the internet on a per-question basis, it cannot practically process the whole corpus in advance.
  • Data Cache
    - Parsing the documents takes a long time, so for testing and demonstrating the system it is beneficial to store webpages and their parses locally. A question that has been asked before, or a candidate that comes up for multiple questions, can then be processed faster.
    - As a side benefit of caching, if a website is temporarily down, its data can still be used, provided it was cached on an earlier run.
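A minimal sketch of such a cache follows. It assumes a layout that the slides do not specify: one file per URL in a cache directory, named by the SHA-256 hash of the URL. Only the page body is stored here; parses could be cached the same way. `PageCache`, `get`, and the `fetcher` parameter are hypothetical names. A cached copy is returned without contacting the site, so a page that is down now can still be served if it was fetched on an earlier run.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Function;

final class PageCache {
    private final Path dir;

    PageCache(Path dir) {
        this.dir = dir;
    }

    /** File for a URL: hex SHA-256 of the URL, so any URL maps to a safe, fixed-length name. */
    private Path fileFor(String url) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return dir.resolve(hex + ".html");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform is required to support SHA-256
        }
    }

    /** Returns the cached body if present; otherwise fetches it, stores it, and returns it. */
    String get(String url, Function<String, String> fetcher) {
        Path file = fileFor(url);
        try {
            if (Files.exists(file)) {
                return Files.readString(file); // served locally, even if the site is now down
            }
            String body = fetcher.apply(url); // e.g. an HTTP download; may fail if the site is down
            Files.createDirectories(dir);
            Files.writeString(file, body);
            return body;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```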
  • Graphical User Interface
    - Early in the project, we ran our system using the Document Analyzer (a UIMA-provided tool).
    - While it was useful, once the entire pipeline was set up, testing the full system through it required more input than necessary.
    - Additionally, there was no convenient way to display just the intended output, or intermediate output at a level suitable for monitoring progress or giving demonstrations.
  • Graphical User Interface
    - The GUI addresses these concerns and has also been extensively tweaked to be demonstration-friendly.