1. TECHNICAL PROJECT REVIEW
14TH AUGUST, 2013
WATSON @ RPI
WATSON RESEARCH LAB
PROFESSOR JIM HENDLER
SIMON ELLIS
KATE MCGUIRE NICOLE NEGEDLY
DILLON BURNS MATT KLAWONN
AVI WEINSTOCK
4. Watson is…
… a piece of software that will run on your laptop
Though very slowly
Specialised hardware and control platform
… an implementation of the DeepQA concept
… the first iteration of the 'cognitive computing' platform
… a very clever artificial intelligence
A very clever application of human intelligence
5. Background
IBM agrees to give RPI a version of Watson
Watson team is set up to undertake summer research on the Watson system
Watson hardware/software configuration not ready at beginning of summer session
So what do we do with:
10 weeks, 5 undergraduates and 1 graduate…
6. Challenge accepted!
Build a new version of Watson
Based on research published in IBM J Res & Dev
With support and input from IBM Research
Use open source libraries wherever possible
Faster development
No IP issues
Turns out to be a very useful project
Trains team in the details of the operation of Watson system
Can be used in education, training, testing, evaluation
7. Sample output
Demo run of RPI version of Watson
Shows output representing most of the “pipeline”
11. Question Analysis
What is the question asking for?
What structured information can be determined from the unstructured text of the question?
Topics
Parsers
Syntactic and Semantic Analysis Tools
Focus and Lexical Answer Type Detection
Future Work
15. Focus and Lexical Answer Type
POETS & POETRY
He was a bank clerk in the Yukon before he published Songs of a Sourdough in 1907
Focus: “he”
LAT: “he”, “clerk”, “poet”
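A minimal sketch of focus detection for the clue above. The two rules used here (a "this/these" noun phrase, otherwise the first personal pronoun) are simplifying assumptions for illustration, not the team's actual rule set, and LAT detection is not covered.

```python
import re

# Hypothetical, simplified focus rules (assumptions, not the project's rules):
# a "this/these <word>" phrase wins, otherwise the first personal pronoun.
DEMONSTRATIVE = re.compile(r"\b(this|these)\s+([a-z]+)", re.IGNORECASE)
PRONOUN = re.compile(r"\b(he|she|it|they|him|her|them)\b", re.IGNORECASE)


def detect_focus(clue):
    """Return the lower-cased focus phrase of a clue, or None if no rule fires."""
    match = DEMONSTRATIVE.search(clue)
    if match:
        return match.group(0).lower()
    match = PRONOUN.search(clue)
    return match.group(1).lower() if match else None


clue = ("He was a bank clerk in the Yukon before "
        "he published Songs of a Sourdough in 1907")
print(detect_focus(clue))
```

A real detector would work on the parse tree rather than on raw text, which is why the parsers come first in the topic list.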
16. Future Work
Adding additional parsers to the system
Comparison of parser output
Relation extraction
Prolog code and database
Improved focus and LAT detection
Princeton WordNet
19. Primary Search & Corpus Generation
Primary search is used to generate our corpus of information from which to take candidate answers, passages, supporting evidence, and essentially all textual input to the system.
Search Wikipedia for the focus identified during the Question Analysis phase.
Grab the first 5 documents returned as the corpus.
Uses the Jsoup library to collect and parse HTML.
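The corpus step can be sketched roughly as follows. Python is used here only for illustration (the project itself uses the Java Jsoup library), and `search` and `fetch` are hypothetical callables standing in for the Wikipedia query and the HTML download, so the sketch runs without network access.

```python
from typing import Callable, Dict, List

CORPUS_SIZE = 5  # the slide states that the first 5 documents are kept


def generate_corpus(focus: str,
                    search: Callable[[str], List[str]],
                    fetch: Callable[[str], str]) -> List[Dict[str, str]]:
    """Build the corpus for one question.

    search(focus) is assumed to return result URLs in ranked order;
    fetch(url) is assumed to return that page's HTML.
    """
    hits = search(focus)[:CORPUS_SIZE]
    return [{"url": url, "html": fetch(url)} for url in hits]


# Usage with stand-in functions (no real search is performed):
fake_hits = ["result-%d" % i for i in range(7)]
corpus = generate_corpus("he", lambda q: fake_hits, lambda url: "<p>" + url + "</p>")
print(len(corpus))
```

Passing the search and download steps in as parameters keeps the selection logic testable on its own.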
23. DBpedia
As of 2011 it had 3.64 million things categorized in its database
URLs are a direct map to Wikipedia's
Wikipedia redirect lists help with alternate names for entities and with concepts closely related to certain entities or people
24. Future Directions
Use DBpedia to fact-check answers about entities in the database
Making use of the DBpedia subject matching
28. Search Result Processing
Passage Retrieval
Watson: Indri and Lucene
Identifies each HTML sentence and adds both the HTML and the clean text to the passage type
Adds information about each passage
Passage Parsing
Forms parse trees for each individual sentence
Add an array of passages to each document
<p><b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a href="/wiki/United_States" title="United States">American</a> <a
href="/wiki/Physicist" title="Physicist">physicist</a>, and the 18th president of <a href="/wiki/Rensselaer_Polytechnic_Institute"
title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic Institute</a>. She received her <a href="/wiki/Doctor_of_Philosophy"
title="Doctor of Philosophy">Ph.D.</a> in physics from the <a href="/wiki/Massachusetts_Institute_of_Technology" title="Massachusetts
Institute of Technology">Massachusetts Institute of Technology</a> in 1973, becoming the first <a href="/wiki/African_American"
title="African American">African American</a> woman to earn a doctorate from MIT in nuclear physics.<sup id="cite_ref-profile_2-0"
class="reference"><a href="#cite_note-profile-2"><span>[</span>2<span>]</span></a></sup><sup id="cite_ref-AAAS_3-0"
class="reference"><a href="#cite_note-AAAS-3"><span>[</span>3<span>]</span></a></sup></p>
<div id="toc" class="toc">
<div id="toctitle">
<h2>Contents</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1"><a href="#Early_life_and_schooling"><span class="tocnumber">1</span> <span class="toctext">Early
life and schooling</span></a></li>
<li class="toclevel-1 tocsection-2"><a href="#Career"><span class="tocnumber">2</span> <span class="toctext">Career</span></a>
<ul>
<li class="toclevel-2 tocsection-3"><a href="#Rensselaer_Polytechnic_Institute"><span class="tocnumber">2.1</span> <span
class="toctext">Rensselaer Polytechnic Institute</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-4"><a href="#Honors_and_distinctions"><span class="tocnumber">3</span> <span
class="toctext">Honors and distinctions</span></a>
<ul>
<li class="toclevel-2 tocsection-5"><a href="#Boards_of_directors"><span class="tocnumber">3.1</span> <span class="toctext">Boards
of directors</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-6"><a href="#Personal"><span class="tocnumber">4</span> <span
class="toctext">Personal</span></a></li>
<li class="toclevel-1 tocsection-7"><a href="#References"><span class="tocnumber">5</span> <span
class="toctext">References</span></a></li>
<li class="toclevel-1 tocsection-8"><a href="#External_links"><span class="tocnumber">6</span> <span class="toctext">External
links</span></a></li>
</ul>
</div>
<h2><span class="mw-headline" id="Early_life_and_schooling">Early life and schooling</span><span class="mw-editsection"><span
class="mw-editsection-bracket">[</span><a href="/w/index.php?title=Shirley_Ann_Jackson&action=edit&section=1" title="Edit
section: Early life and schooling">edit source</a><span class="mw-editsection-divider"> | </span><a
href="/w/index.php?title=Shirley_Ann_Jackson&veaction=edit&section=1" title="Edit section: Early life and schooling"
class="mw-editsection-visualeditor">edit</a><span class="mw-editsection-bracket">]</span></span></h2>
<p>Jackson was born in Washington D.C. Her parents, Beatrice and George Jackson, strongly valued education and encouraged her in
school.<sup id="cite_ref-diaspora_4-0" class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup>
Her father spurred on her interest in science by helping her with projects for her science classes. At Roosevelt High School, Jackson
attended accelerated programs in both math and science, and graduated in 1964 as valedictorian.<sup id="cite_ref-diaspora_4-1"
class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup></p>
31. Search Result Processing
<b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a
href="/wiki/United_States" title="United States">American</a> <a
href="/wiki/Physicist" title="Physicist">physicist</a>, and the 18th
president of <a href="/wiki/Rensselaer_Polytechnic_Institute"
title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic
Institute</a>. She received her <a href="/wiki/Doctor_of_Philosophy"
title="Doctor of Philosophy">Ph.D.</a> in physics from the <a
href="/wiki/Massachusetts_Institute_of_Technology" title="Massachusetts
Institute of Technology">Massachusetts Institute of Technology</a> in
1973, becoming the first <a href="/wiki/African_American" title="African
American">African American</a> woman to earn a doctorate from MIT in
nuclear physics.<sup id="cite_ref-profile_2-0" class="reference"><a
href="#cite_note-profile-
2"><span>[</span>2<span>]</span></a></sup><sup id="cite_ref-
AAAS_3-0" class="reference"><a href="#cite_note-AAAS-
3"><span>[</span>3<span>]</span></a></sup>Jackson was born in
Washington D.C. Her parents, Beatrice and George Jackson, strongly
valued education and encouraged her in school.<sup id="cite_ref-
diaspora_4-0" class="reference"><a href="#cite_note-diaspora-
4"><span>[</span>4<span>]</span></a></sup> Her father spurred on
her interest in science by helping her with projects for her science classes.
At Roosevelt High School, Jackson attended accelerated programs in
both math and science, and graduated in 1964 as valedictorian.<sup
id="cite_ref-diaspora_4-1" class="reference"><a href="#cite_note-
diaspora-4"><span>[</span>4<span>]</span></a></sup></p>
33. Search Result Processing
Text:
<b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a
href="/wiki/United_States" title="United States">American</a> <a
href="/wiki/Physicist" title="Physicist">physicist</a>, and the 18th president of <a
href="/wiki/Rensselaer_Polytechnic_Institute" title="Rensselaer Polytechnic
Institute">Rensselaer Polytechnic Institute</a>.
34. Search Result Processing
Cleaned Text:
Shirley Ann Jackson (born August 5, 1946) is an American physicist, and the 18th president of Rensselaer Polytechnic Institute.
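A rough sketch of how a passage could carry both the HTML and the clean text. It uses Python's standard `html.parser` in place of Jsoup; the `make_passage` helper, the dictionary layout, and the choice to drop `<sup>` citation markers are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <sup> (citation markers)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.sup_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            self.sup_depth += 1

    def handle_endtag(self, tag):
        if tag == "sup" and self.sup_depth:
            self.sup_depth -= 1

    def handle_data(self, data):
        if not self.sup_depth:
            self.parts.append(data)


def clean_text(html):
    """Strip tags from an HTML fragment and normalise whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def make_passage(html):
    """Hypothetical passage record holding the HTML and the clean text."""
    return {"html": html, "text": clean_text(html)}


sentence = ('<b>Shirley Ann Jackson</b> (born August 5, 1946) is an '
            '<a href="/wiki/United_States" title="United States">American</a> '
            '<a href="/wiki/Physicist" title="Physicist">physicist</a>, and the '
            '18th president of <a href="/wiki/Rensselaer_Polytechnic_Institute" '
            'title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic '
            'Institute</a>.')
print(make_passage(sentence)["text"])
```

Keeping the HTML next to the clean text matters later, because anchor text candidate generation reads the hyperlinks that cleaning throws away.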
36. Search Result Processing
Parse Tree:
(ROOT
(S
(NP
(NP (NNP ) (NNP Shirley) (NNP Ann) (NNP Jackson) (NNP ))
(PRN (-LRB- -LRB-)
(VP (VBN born)
(NP (NNP August) (CD 5) (, ,) (CD 1946)))
(-RRB- -RRB-)))
(VP (VBZ is)
(NP
(NP (DT an) (JJ ) (NNP American) (NNP ) (NNP ) (NN physicist) (NNS ))
(, ,)
(CC and)
(NP
(NP (DT the) (JJ 18th) (NN president))
(PP (IN of)
(NP (NNP ) (NNP Rensselaer) (NNP Polytechnic) (NNP Institute) (NNP ))))))
(. .)))
37. Candidate Generation
Using each document, and the passages created by Search Result Processing, we generate candidates using three techniques:
1. Title of Document (T.O.D.): Adds the title of the document as a candidate.
2. Wikipedia Title Candidate Generation: Adds any noun phrases within the document's passage texts that are also the titles of Wikipedia articles.
3. Anchor Text Candidate Generation: Adds candidates based on the hyperlinks and metadata within the document.
38. Wikipedia Title Candidate Generation
Runs on the passage array from each search result.
Using the parse tree, retrieves all the noun phrases in each passage.
Checks if each noun phrase is the title of a Wikipedia article.
Adds the verified candidates along with an array of the passages that contained them.
Flow: Array of Passages → Retrieving Noun Phrases → Check against Previous Data → Wikipedia URL Check → Candidate and Containing Passages
(ROOT (S (NP (NP (NNP ) (NNP Shirley) (NNP Ann) (NNP Jackson) (NNP )) (PRN (-LRB- -LRB-)
(VP (VBN born) (NP (NNP August) (CD 5) (, ,) (CD 1946))) (-RRB- -RRB-))) (VP (VBZ is)
(NP (NP (DT an) (JJ ) (NNP American) (NNP ) (NNP ) (NN physicist) (NNS )) (, ,) (CC and)
(NP (NP (DT the) (JJ 18th) (NN president)) (PP (IN of) (NP (NNP ) (NNP Rensselaer)
(NNP Polytechnic) (NNP Institute) (NNP )))))) (. .)))
<b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a href="/wiki/United_States"
title="United States">American</a> <a href="/wiki/Physicist"
title="Physicist">physicist</a>, and the 18th president of <a
href="/wiki/Rensselaer_Polytechnic_Institute" title="Rensselaer Polytechnic
Institute">Rensselaer Polytechnic Institute</a>.
39. Wikipedia Title Candidate Generation
Shirley Ann Jackson
Shirley Ann Jackson (born August 5, 1946)
August 5, 1946
An American Physicist
An American Physicist, and the 18th president of Rensselaer Polytechnic Institute
The 18th president
The 18th president of Rensselaer Polytechnic Institute
Rensselaer Polytechnic Institute
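Retrieving noun phrases from a bracketed parse tree could look roughly like this. The tuple-based tree representation and the function names are assumptions, and the example tree is a shortened, hand-written variant of the one on the slides.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse a bracketed tree into (label, children) tuples; leaves are strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


def noun_phrases(node):
    """Return the text of every NP subtree, outer phrases before inner ones."""
    if isinstance(node, str):
        return []
    found = []
    if node[0] == "NP":
        found.append(" ".join(leaves(node)))
    for child in node[1]:
        found.extend(noun_phrases(child))
    return found


tree = parse_tree(
    "(ROOT (S (NP (NNP Shirley) (NNP Ann) (NNP Jackson)) (VP (VBZ is) "
    "(NP (NP (DT the) (JJ 18th) (NN president)) (PP (IN of) "
    "(NP (NNP Rensselaer) (NNP Polytechnic) (NNP Institute))))) (. .)))")
for phrase in noun_phrases(tree):
    print(phrase)
```

Nested phrases are all kept, which matches the list above: both "The 18th president" and "The 18th president of Rensselaer Polytechnic Institute" are checked as possible titles.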
41. Wikipedia Title Candidate Generation
http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=Shirley+Ann+Jackson
Shirley Ann Jackson
Shirley Ann Jackson (born August 5, 1946)
August 5, 1946
An American Physicist
An American Physicist, and the 18th president of Rensselaer Polytechnic Institute
The 18th president
The 18th president of Rensselaer Polytechnic Institute
Rensselaer Polytechnic Institute
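The URL check can be sketched as below. The URL format is taken from the slide; the `is_title` callable (which would fetch the URL and decide whether it resolves to an article) and the cache used for the "check against previous data" step are assumptions for illustration.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://en.wikipedia.org/w/index.php"


def title_check_url(noun_phrase):
    """Build the Special:Search URL for one noun phrase."""
    query = urlencode({"title": "Special:Search",
                       "profile": "default",
                       "search": noun_phrase})
    return SEARCH_ENDPOINT + "?" + query


def verify_candidates(noun_phrases, is_title, cache=None):
    """Keep the noun phrases that is_title accepts.

    is_title(url) is a hypothetical callable; results are cached so a
    phrase seen before is not checked against Wikipedia again.
    """
    cache = {} if cache is None else cache
    verified = []
    for phrase in noun_phrases:
        if phrase not in cache:
            cache[phrase] = is_title(title_check_url(phrase))
        if cache[phrase]:
            verified.append(phrase)
    return verified


print(title_check_url("Shirley Ann Jackson"))
```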
43. Anchor Text Candidate Generation
Runs on the passage array from each search result.
Checks for hyperlinks within the HTML text of each passage.
Adds the title of the hyperlinked article as a candidate
Adds each passage containing the candidate to an array
<b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a
href="/wiki/United_States" title="United States">American</a> <a
href="/wiki/Physicist" title="Physicist">physicist</a>, and the 18th
president of <a href="/wiki/Rensselaer_Polytechnic_Institute"
title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic
Institute</a>. She received her <a href="/wiki/Doctor_of_Philosophy"
title="Doctor of Philosophy">Ph.D.</a> in physics from the <a
href="/wiki/Massachusetts_Institute_of_Technology"
title="Massachusetts Institute of Technology">Massachusetts
Institute of Technology</a> in 1973, becoming the first <a
href="/wiki/African_American" title="African American">African
American</a> woman to earn a doctorate from MIT in nuclear
physics.<sup id="cite_ref-profile_2-0" class="reference"><a
href="#cite_note-profile-
2"><span>[</span>2<span>]</span></a></sup><sup id="cite_ref-
AAAS_3-0" class="reference"><a href="#cite_note-AAAS-
3"><span>[</span>3<span>]</span></a></sup>Jackson was born
in Washington D.C. Her parents, Beatrice and George Jackson,
strongly valued education and encouraged her in school.<sup
id="cite_ref-diaspora_4-0" class="reference"><a href="#cite_note-
diaspora-4"><span>[</span>4<span>]</span></a></sup> Her
father spurred on her interest in science by helping her with projects
for her science classes. At Roosevelt High School, Jackson
attended accelerated programs in both math and science, and
graduated in 1964 as valedictorian.<sup id="cite_ref-diaspora_4-1"
class="reference"><a href="#cite_note-diaspora-
4"><span>[</span>4<span>]</span></a></sup>
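A minimal sketch of anchor text candidate generation over passage HTML, again using Python's `html.parser` rather than Jsoup. Treating only `/wiki/` links with a `title` attribute as candidates, and storing passages as plain HTML strings, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class AnchorTitleCollector(HTMLParser):
    """Collects the title attribute of every link that points at /wiki/..."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        title = attributes.get("title")
        # Citation links such as href="#cite_note-..." are skipped.
        if href.startswith("/wiki/") and title:
            self.titles.append(title)


def anchor_candidates(passages):
    """Map each linked article title to the passages that contain the link."""
    candidates = {}
    for passage_html in passages:
        collector = AnchorTitleCollector()
        collector.feed(passage_html)
        collector.close()
        for title in collector.titles:
            bucket = candidates.setdefault(title, [])
            if passage_html not in bucket:
                bucket.append(passage_html)
    return candidates


passages = [
    'is an <a href="/wiki/United_States" title="United States">American</a> '
    '<a href="/wiki/Physicist" title="Physicist">physicist</a>',
    'She received her <a href="/wiki/Doctor_of_Philosophy" '
    'title="Doctor of Philosophy">Ph.D.</a> in physics'
    '<sup><a href="#cite_note-profile-2">[2]</a></sup>',
]
print(list(anchor_candidates(passages)))
```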
45. Search Result Processing
<p><b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a href="/wiki/United_States" title="United States">American</a> <a href="/wiki/Physicist" title="Physicist">physicist</a>, and
the 18th president of <a href="/wiki/Rensselaer_Polytechnic_Institute" title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic Institute</a>. She received her <a
href="/wiki/Doctor_of_Philosophy" title="Doctor of Philosophy">Ph.D.</a> in physics from the <a href="/wiki/Massachusetts_Institute_of_Technology" title="Massachusetts Institute of
Technology">Massachusetts Institute of Technology</a> in 1973, becoming the first <a href="/wiki/African_American" title="African American">African American</a> woman to earn a
doctorate from MIT in nuclear physics.<sup id="cite_ref-profile_2-0" class="reference"><a href="#cite_note-profile-2"><span>[</span>2<span>]</span></a></sup><sup id="cite_ref-
AAAS_3-0" class="reference"><a href="#cite_note-AAAS-3"><span>[</span>3<span>]</span></a></sup></p>
<div id="toc" class="toc">
<div id="toctitle">
<h2>Contents</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1"><a href="#Early_life_and_schooling"><span class="tocnumber">1</span> <span class="toctext">Early life and schooling</span></a></li>
<li class="toclevel-1 tocsection-2"><a href="#Career"><span class="tocnumber">2</span> <span class="toctext">Career</span></a>
<ul>
<li class="toclevel-2 tocsection-3"><a href="#Rensselaer_Polytechnic_Institute"><span class="tocnumber">2.1</span> <span class="toctext">Rensselaer Polytechnic
Institute</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-4"><a href="#Honors_and_distinctions"><span class="tocnumber">3</span> <span class="toctext">Honors and distinctions</span></a>
<ul>
<li class="toclevel-2 tocsection-5"><a href="#Boards_of_directors"><span class="tocnumber">3.1</span> <span class="toctext">Boards of directors</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-6"><a href="#Personal"><span class="tocnumber">4</span> <span class="toctext">Personal</span></a></li>
<li class="toclevel-1 tocsection-7"><a href="#References"><span class="tocnumber">5</span> <span class="toctext">References</span></a></li>
<li class="toclevel-1 tocsection-8"><a href="#External_links"><span class="tocnumber">6</span> <span class="toctext">External links</span></a></li>
</ul>
</div>
<h2><span class="mw-headline" id="Early_life_and_schooling">Early life and schooling</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a
href="/w/index.php?title=Shirley_Ann_Jackson&action=edit&section=1" title="Edit section: Early life and schooling">edit source</a><span class="mw-editsection-divider"> |
</span><a href="/w/index.php?title=Shirley_Ann_Jackson&veaction=edit&section=1" title="Edit section: Early life and schooling" class="mw-editsection-
visualeditor">edit</a><span class="mw-editsection-bracket">]</span></span></h2>
<p>Jackson was born in Washington D.C. Her parents, Beatrice and George Jackson, strongly valued education and encouraged her in school.<sup id="cite_ref-diaspora_4-0"
class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup> Her father spurred on her interest in science by helping her with projects for her science
classes. At Roosevelt High School, Jackson attended accelerated programs in both math and science, and graduated in 1964 as valedictorian.<sup id="cite_ref-diaspora_4-1"
class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup></p>
46. Search Result Processing
<p><b>Shirley Ann Jackson</b> (born August 5, 1946) is an <a href="/wiki/United_States" title="United States">American</a> <a href="/wiki/Physicist" title="Physicist">physicist</a>, and
the 18th president of <a href="/wiki/Rensselaer_Polytechnic_Institute" title="Rensselaer Polytechnic Institute">Rensselaer Polytechnic Institute</a>. She received her <a
href="/wiki/Doctor_of_Philosophy" title="Doctor of Philosophy">Ph.D.</a> in physics from the <a href="/wiki/Massachusetts_Institute_of_Technology" title="Massachusetts Institute of
Technology">Massachusetts Institute of Technology</a> in 1973, becoming the first <a href="/wiki/African_American" title="African American">African American</a> woman to earn a
doctorate from MIT in nuclear physics.<sup id="cite_ref-profile_2-0" class="reference"><a href="#cite_note-profile-2"><span>[</span>2<span>]</span></a></sup><sup id="cite_ref-
AAAS_3-0" class="reference"><a href="#cite_note-AAAS-3"><span>[</span>3<span>]</span></a></sup></p>
<div id="toc" class="toc">
<div id="toctitle">
<h2>Contents</h2>
</div>
<ul>
<li class="toclevel-1 tocsection-1"><a href="#Early_life_and_schooling"><span class="tocnumber">1</span> <span class="toctext">Early life and schooling</span></a></li>
<li class="toclevel-1 tocsection-2"><a href="#Career"><span class="tocnumber">2</span> <span class="toctext">Career</span></a>
<ul>
<li class="toclevel-2 tocsection-3"><a href="#Rensselaer_Polytechnic_Institute"><span class="tocnumber">2.1</span> <span class="toctext">Rensselaer Polytechnic
Institute</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-4"><a href="#Honors_and_distinctions"><span class="tocnumber">3</span> <span class="toctext">Honors and distinctions</span></a>
<ul>
<li class="toclevel-2 tocsection-5"><a href="#Boards_of_directors"><span class="tocnumber">3.1</span> <span class="toctext">Boards of directors</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-6"><a href="#Personal"><span class="tocnumber">4</span> <span class="toctext">Personal</span></a></li>
<li class="toclevel-1 tocsection-7"><a href="#References"><span class="tocnumber">5</span> <span class="toctext">References</span></a></li>
<li class="toclevel-1 tocsection-8"><a href="#External_links"><span class="tocnumber">6</span> <span class="toctext">External links</span></a></li>
</ul>
</div>
<h2><span class="mw-headline" id="Early_life_and_schooling">Early life and schooling</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a
href="/w/index.php?title=Shirley_Ann_Jackson&action=edit&section=1" title="Edit section: Early life and schooling">edit source</a><span class="mw-editsection-divider"> |
</span><a href="/w/index.php?title=Shirley_Ann_Jackson&veaction=edit&section=1" title="Edit section: Early life and schooling" class="mw-editsection-
visualeditor">edit</a><span class="mw-editsection-bracket">]</span></span></h2>
<p>Jackson was born in Washington D.C. Her parents, Beatrice and George Jackson, strongly valued education and encouraged her in school.<sup id="cite_ref-diaspora_4-0"
class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup> Her father spurred on her interest in science by helping her with projects for her science
classes. At Roosevelt High School, Jackson attended accelerated programs in both math and science, and graduated in 1964 as valedictorian.<sup id="cite_ref-diaspora_4-1"
class="reference"><a href="#cite_note-diaspora-4"><span>[</span>4<span>]</span></a></sup></p>
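Markup like the sample above has to be reduced to plain text before passages can be taken from it. The editor's notes say the project used jsoup for this; the following is only a minimal sketch of the same idea using Python's standard html.parser. The class names it skips (reference, mw-editsection, toc) are taken from the sample markup, the names TextExtractor and html_to_text are hypothetical, and the sketch assumes well-nested HTML.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping citation markers, edit links and the TOC."""

    # Class names seen in the sample page; anything inside them is dropped.
    SKIP_CLASSES = {"reference", "mw-editsection", "toc"}
    # Tags that never have a closing tag, so they must not be pushed on the stack.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.stack = []  # one flag per open tag: True if it opens a skipped region

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(bool(classes & self.SKIP_CLASSES))

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS:
            return
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing tag is marked as skipped.
        if not any(self.stack):
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

Calling html_to_text on the first paragraph of the sample would keep the sentence text and link labels while dropping the bracketed citation markers.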
47. Future Work
Search Result Processing
Improve methods of imitating Indri or Lucene passage retrieval without a corpus.
Create a passage score and ranking.
Candidate Generation
Continue to improve the speed and quality of Candidate Generation.
Research and implement Candidate Generation from Structured Sources (Prismatic, Answer Lookup).
Record and measure recall in comparison with Watson and other question-answering software.
51. Scorers
Passage Term Match
Textual Alignment
Skip-Bigram
Each of these scores supporting evidence
These scores are then merged to produce a single candidate
score
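The slides do not say how the merge is performed. One common choice is a weighted sum of the scorer outputs squashed into the range 0 to 1 by a logistic function; the sketch below assumes that approach. The function name merge_scores, the scorer keys, and the weights are all hypothetical, and in practice the weights would have to be tuned or learned.

```python
import math


def merge_scores(scores, weights, bias=0.0):
    """Combine per-scorer values into one confidence in the range (0, 1).

    scores  -- mapping of scorer name to raw score for one candidate
    weights -- mapping of scorer name to weight (illustrative values only)
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in scores.items())
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Made-up weights for the three scorers named on the slide.
EXAMPLE_WEIGHTS = {
    "passage_term_match": 0.5,
    "textual_alignment": 0.3,
    "skip_bigram": 0.2,
}
```

A candidate with higher scores from every scorer receives a higher merged score, so the merged value can be used directly for ranking.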
52. Passage Term Search
Question terms extracted
Passage is searched for those terms
Score calculated for that passage
Done per passage
Example
Question: “Where is Toronto?”
Question terms: “Where”, “is”, “Toronto”
Passage: “Toronto is in Southern Ontario”
Matched terms: “Toronto”, “is”
Score = IDF(Toronto) + IDF(is)
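The steps above can be sketched as follows. This is a minimal illustration rather than the project's scorer: the tokenizer, the smoothed IDF formula, and the function names (tokenize, idf_table, passage_term_match) are assumptions made for the example.

```python
import math
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(documents):
    """Smoothed inverse document frequency for every term in the documents."""
    n = len(documents)
    document_frequency = {}
    for doc in documents:
        for term in set(tokenize(doc)):
            document_frequency[term] = document_frequency.get(term, 0) + 1
    return {
        term: math.log((n + 1) / (count + 1)) + 1.0
        for term, count in document_frequency.items()
    }


def passage_term_match(question, passage, idf, default_idf=0.0):
    """Sum the IDF of each distinct question term that occurs in the passage."""
    passage_terms = set(tokenize(passage))
    question_terms = dict.fromkeys(tokenize(question))  # de-duplicate, keep order
    matched = [term for term in question_terms if term in passage_terms]
    score = sum(idf.get(term, default_idf) for term in matched)
    return score, matched
```

With the slide's example, the matched terms are "is" and "toronto", and the rarer term "toronto" contributes more to the score than the common term "is".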
53. Textual Alignment
Finds an optimal alignment of a question and a passage
Assigns “partial credit” for close matches
“Who is the President of RPI?”
Shirley Ann Jackson is the President of RPI.
Who is the President of RPI.
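The slide does not name the alignment algorithm. The sketch below assumes a Smith-Waterman-style local alignment over tokens, with partial credit for tokens that are nearly identical. The similarity threshold, the gap penalty, and the function names are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def token_similarity(a, b, threshold=0.8):
    """1.0 for identical tokens, partial credit for close ones, a penalty otherwise."""
    if a == b:
        return 1.0
    ratio = SequenceMatcher(None, a, b).ratio()
    return ratio if ratio >= threshold else -1.0


def alignment_score(question_tokens, passage_tokens, gap=-0.5):
    """Best local alignment score between the two token sequences."""
    cols = len(passage_tokens) + 1
    previous = [0.0] * cols
    best = 0.0
    for q_token in question_tokens:
        current = [0.0] * cols
        for j in range(1, cols):
            current[j] = max(
                0.0,                                                     # start a new alignment
                previous[j - 1] + token_similarity(q_token, passage_tokens[j - 1]),
                previous[j] + gap,                                       # skip a question token
                current[j - 1] + gap,                                    # skip a passage token
            )
            best = max(best, current[j])
        previous = current
    return best
```

For the example on the slide, the five tokens "is the President of RPI" align exactly, while "Who" has no counterpart in the passage and contributes nothing.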
54. Skip-Bigram
Constructs a graph
Nodes represent terms (syntactic objects)
Edges represent relations
Extracts skip-bigrams
A skip-bigram is a pair of nodes either directly connected or
which have only one intermediate node
Skip-bigrams represent close relationships between terms
Scores based on number of common skip-bigrams
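As a rough sketch of the definition above, the code below builds an undirected graph from a list of term pairs, extracts every pair of nodes that is either directly connected or separated by exactly one intermediate node, and counts the pairs shared by question and passage. Treating relations as undirected and unlabeled is a simplifying assumption, and all function names are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Adjacency sets for an undirected graph given as (term, term) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def skip_bigrams(graph):
    """All node pairs at distance one, or at distance two via a single node."""
    pairs = set()
    for node, neighbours in graph.items():
        for neighbour in neighbours:
            if neighbour != node:
                pairs.add(frozenset((node, neighbour)))   # directly connected
        for a, b in combinations(neighbours, 2):
            pairs.add(frozenset((a, b)))                  # node is the one intermediate
    return pairs


def skip_bigram_score(question_edges, passage_edges):
    """Number of skip-bigrams common to the question and passage graphs."""
    question_pairs = skip_bigrams(build_graph(question_edges))
    passage_pairs = skip_bigrams(build_graph(passage_edges))
    return len(question_pairs & passage_pairs)
```

In a real pipeline the edges would come from a parser; here they are supplied by hand.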
58. UIMA
The DeepQA architecture is built on top of another
architecture, UIMA (Unstructured Information
Management Architecture).
A UIMA CAS (Common Analysis Structure) contains a
contiguous block of data (normally text), and annotations,
which contain start & end indexes into the data, and
additional data (strings, integers, doubles, arrays,
annotation references).
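To make the shape of that data structure concrete, here is a small Python model of it. This is not the UIMA API (UIMA itself is a Java and C++ framework); the class and method names below are hypothetical and only mirror the description above: one block of text plus annotations that carry begin and end offsets and extra feature values.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type_name: str          # e.g. "Focus" or "LAT"
    begin: int              # offset of the first covered character
    end: int                # offset one past the last covered character
    features: dict = field(default_factory=dict)  # strings, numbers, other annotations


@dataclass
class Cas:
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, type_name, begin, end, **features):
        """Attach a new annotation to the text and return it."""
        if not 0 <= begin <= end <= len(self.text):
            raise ValueError("annotation offsets fall outside the text")
        annotation = Annotation(type_name, begin, end, dict(features))
        self.annotations.append(annotation)
        return annotation

    def covered_text(self, annotation):
        """The slice of the text that the annotation refers to."""
        return self.text[annotation.begin:annotation.end]

    def select(self, type_name):
        """All annotations of the given type, in insertion order."""
        return [a for a in self.annotations if a.type_name == type_name]
```

An annotation can reference another annotation through its features, which is how later pipeline stages can build on the output of earlier ones.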
59. UIMA
CAS Multipliers output multiple CASes based on the data
in the input CAS; this facilitates parallelization, which is
the key to Watson's response time.
60. DeepQA Architecture
The DeepQA architecture, which both IBM Watson and RPI
MiniDeepQA implement, is a QA (Question Answering) system
that answers questions by generating as many potential
answers as is practical, then filtering them with multiple
evidence scorers in parallel.
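The generate-then-score flow described above can be sketched in a few lines. This is an illustration of the control flow only, not either system's implementation: candidate generation, the scorers, and the merge step are passed in as plain functions, and the name answer_question is hypothetical. A thread pool stands in for the parallel scoring; in Python it mainly helps when scorers wait on I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def answer_question(question, generate_candidates, scorers, merge, max_workers=4):
    """Generate many candidates, score each with every scorer, rank by merged score.

    generate_candidates -- function: question -> iterable of candidate answers
    scorers             -- mapping: scorer name -> function(question, candidate) -> float
    merge               -- function: {scorer name: score} -> single float
    """
    candidates = list(generate_candidates(question))

    def score_candidate(candidate):
        evidence = {name: scorer(question, candidate) for name, scorer in scorers.items()}
        return candidate, merge(evidence)

    # Candidates are independent of one another, so they can be scored concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scored = list(pool.map(score_candidate, candidates))

    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The first element of the returned list is the highest-scoring candidate together with its merged score.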
61. Data cache
IBM Watson has a pre-processed corpus of information,
generated automatically by a subset of the DeepQA
pipeline from an enormous volume of raw text, which the
remainder of the pipeline uses at question time.
As our system retrieves information from the internet on a
per-question basis, it cannot (practically) process the
whole corpus in advance.
62. Data cache
Since parsing the documents takes a large amount of
time, in order to test/demonstrate the system, it is
beneficial to store webpages and associated parses
locally. This allows a question that has been asked
before, and candidates that come up for multiple
questions, to be processed faster.
As a side-benefit of the caching, if a website is
temporarily down, its data can still be used (provided the
page was fetched successfully at some point in the past).
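A minimal sketch of such a cache, assuming one JSON file per URL on local disk. The class name PageCache, the file layout, and the injected fetch function are hypothetical; parses could be stored alongside the HTML in the same way.

```python
import hashlib
import json
from pathlib import Path


class PageCache:
    """Stores each page's URL and HTML body on disk, keyed by a hash of the URL."""

    def __init__(self, directory, fetch):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # function: url -> HTML string (performs the real download)

    def _path(self, url):
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / (digest + ".json")

    def get(self, url):
        """Return the cached HTML if present; otherwise fetch, store, and return it."""
        path = self._path(url)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))["html"]
        html = self.fetch(url)  # raises if the site is down and nothing is cached
        path.write_text(json.dumps({"url": url, "html": html}), encoding="utf-8")
        return html
```

Because the cache is consulted before the network, a page that was fetched once stays available even if the site later becomes unreachable.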
63. Graphical User Interface
Towards the start of the project, we ran our system using
the Document Analyzer (a UIMA-provided tool).
While it was useful, once we had the entire pipeline set
up, testing the full system required more input than
necessary.
Additionally, there wasn't a convenient way to display just
the intended output, nor intermediate output at a level
suitable for monitoring progress/giving demonstrations.
64. Graphical User Interface
The GUI addresses these concerns, and has additionally
been extensively tweaked to be demonstration-friendly.
Editor's Notes
Our project differs from IBM’s Watson in that it draws from the internet on each run to generate a question-specific corpus. We take only 5 documents because of the time it takes to parse them with the Stanford Parser. This is our main limiting factor, and as we parse
You can navigate HTML pages with jsoup much as you would with jQuery, selecting items by ID, class, or property.
The cache stores the URL and the entire HTML body of the page.
Passages are then used in different ways for Candidate Generation, Answer Scoring, and Supporting Evidence R