Elizabeth Walden and Xavier Young
University of Saint Mary
Outline:
Purpose of the Analysis
Athena—Many Hours of Research
Natural-language processing (NLP)
Analysis: Constraints, Limitations, and Assumptions
Co-reference Resolution Tool
Wikifier
Named Entity Recognizer
Next Steps
Purpose of the Analysis:
 The purpose of this analysis was to nominate NLP tools that will
help researchers and analysts at TRADOC G2 M&SD locate data
and information culled from documents to populate their human,
social, behavior, and culture simulation called Athena.
 In coordination with researchers from the TRADOC G2 Modeling
and Simulation Directorate (M&SD) housed at Fort Leavenworth,
students from the spring 2015 semester course titled “Software
Development and Design” at the University of Saint Mary in
Leavenworth, Kansas, analyzed several natural-language
processing (NLP) tools from the Cognitive Computation Group
(CCG) at the University of Illinois at Urbana-Champaign.
Athena—Many Hours of Research
 Athena is a software application that enables analysts and
commanders to simulate the Political, Military, Economic,
Social, Infrastructure, and Information (PMESII) entities and
processes within the context of a battlefield environment, a
wide-area security operation, or in support of a country study
to evaluate social evolution dynamics.
 Athena needs to be populated with entities such as actor, civilian
group, force group, message, belief system, and others.
 Athena researchers sometimes trawl through a hundred documents or
more looking for relevant entities and relationships, which takes MANY,
MANY HOURS.
[Diagram: PMESII-Related Data and Documents → Data-Mining Tool (POTENTIAL TIME SAVER) → Athena-relevant entities and relationships, categorized so that an analyst may inspect them for usefulness]
Natural-language Processing (NLP):
 SHORT VERSION: NLP is the ability of a computer to understand a
language the way a human does.
 LONG VERSION: NLP is a field of computer science, artificial
intelligence, and computational linguistics concerned with the
interactions between computers and human (natural) languages.
One important goal of NLP is trying to get computational systems to
“understand” the meaning (semantics) and context of words,
sentences, and other linguistic devices in much the same way that a
human mind is able to do so.
 Cognitive Computation Group (CCG) at the University of Illinois at
Urbana-Champaign
 Co-reference Resolution | Named Entity Recognizer | Wikifier
Analysis: Constraints, Limitations, and Assumptions
Constraints:
 Time: the project began near the end of the spring semester, giving the team only six weeks to do the work
 The team could only work together two days a week, even though the students from USM worked many more hours every week
 The team lead took on a new job halfway through the work
Limitations:
 The team could only access online interactive demonstration versions of CCG’s tools, not the complete tools
 Some of the tools were listed on DARPA’s DEFT site, but there was no way to access them, and the site administrator never responded to the team’s email requesting access
Assumptions:
 Most members of the team have limited experience with NLP tools
 Access to NLP tools will be limited
 It is not likely that any of the tools inspected in this round will serve the purpose of helping researchers and analysts at the TRADOC G2 M&SD locate data and information to populate Athena
Co-reference Resolution Tool:
Description and Purpose
 A given entity—representing a person, a location, or an organization, for
example—may be mentioned in a text in multiple, ambiguous ways. The
Co-reference Resolution Tool processes unannotated text, detecting
mentions of entities and showing which mentions are co-referential (i.e., all
words, phrases, or expressions that refer to the same entity in a text). The
purpose of this tool is to help parse documents for common entities and
represent the document in diagram form. The interactive demo consists
of a box into which text is placed and a button that says Submit. Once text
is entered and the Submit button is pressed, within a few seconds the
demo displays the parsed results.
“Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday, with F-16s based in Atlantic
City ready to be scrambled if an unauthorized aircraft does enter the restricted airspace. Down below, bomb-sniffing dogs will
patrol the trains and buses that are expected to take approximately 30,000 of the 80,000-plus spectators to Sunday's Super
Bowl between the Denver Broncos and Seattle Seahawks. The Transportation Security Administration said it has added about
two dozen dogs to monitor passengers coming in and out of the airport around the Super Bowl. On Saturday, TSA agents
demonstrated how the dogs can sniff out many different types of explosives. Once they do, they're trained to sit rather than
attack, so as not to raise suspicion or create a panic. TSA spokeswoman Lisa Farbstein said the dogs undergo 12 weeks of
training, which costs about $200,000, factoring in food, vehicles and salaries for trainers. Dogs have been used in cargo areas
for some time, but have just been introduced recently in passenger areas at Newark and JFK airports. JFK has one dog and
Newark has a handful, Farbstein said.”
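To make the idea of co-referential mentions concrete, the sketch below groups mentions from the passage above into chains, one chain per entity. It is a hand-labelled illustration only, not output from the CCG tool: the entity ids, the `MENTIONS` list, and the `build_chains` helper are all assumptions made for this example.

```python
from collections import defaultdict

# Hand-labelled (mention, entity_id) pairs taken from the sample passage.
# The ids are hypothetical; a real co-reference tool would infer them.
MENTIONS = [
    ("The Transportation Security Administration", 0),
    ("it", 0),
    ("TSA", 0),
    ("the dogs", 1),
    ("they", 1),
    ("TSA spokeswoman Lisa Farbstein", 2),
    ("Farbstein", 2),
]


def build_chains(mentions):
    """Group mentions that share an entity id into one chain per entity."""
    chains = defaultdict(list)
    for text, entity_id in mentions:
        chains[entity_id].append(text)
    return dict(chains)


if __name__ == "__main__":
    for entity_id, chain in build_chains(MENTIONS).items():
        print(entity_id, " = ".join(chain))
```

Each chain is the kind of grouping an analyst would inspect: every phrase in it is claimed to refer to the same entity.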
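[Screenshot: Co-reference Resolution demo output for the passage beginning “Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday…”, with the team's ratings of individual results: Bad; Really Good; Good; Bad; Not Sure, so Bad]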
Applicability, Usability, Recommendation
Wikifier: Parses a text and links terms to Wikipedia
“Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday,
with F-16s based in Atlantic City ready to be scrambled if an unauthorized aircraft does enter the
restricted airspace. Down below, bomb-sniffing dogs will patrol the trains and buses that are
expected to take approximately 30,000 of the 80,000-plus spectators to Sunday's Super Bowl
between the Denver Broncos and Seattle Seahawks.”
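As a rough illustration of what "linking terms to Wikipedia" means, the sketch below maps terms found in a text to Wikipedia article URLs. It swaps in exact string matching against a small hand-made table, which is an assumption for this example and not how the CCG Wikifier works; the real tool has to decide which article an ambiguous term refers to. The names `TITLE_FOR_TERM`, `wikipedia_url`, and `link_terms` are hypothetical.

```python
from urllib.parse import quote

# Hypothetical lookup table: surface term -> Wikipedia article title.
TITLE_FOR_TERM = {
    "MetLife Stadium": "MetLife Stadium",
    "Denver Broncos": "Denver Broncos",
    "Seattle Seahawks": "Seattle Seahawks",
    "Super Bowl": "Super Bowl",
}


def wikipedia_url(title):
    """Build an English Wikipedia URL; spaces in titles become underscores."""
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


def link_terms(text, table=TITLE_FOR_TERM):
    """Return {term: url} for every table term that appears verbatim in text."""
    return {
        term: wikipedia_url(title)
        for term, title in table.items()
        if term in text
    }


if __name__ == "__main__":
    sample = "Sunday's Super Bowl between the Denver Broncos and Seattle Seahawks"
    for term, url in link_terms(sample).items():
        print(term, "->", url)
```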
Applicability, Usability, Recommendation
Named Entity Recognizer:
 The Named Entity Recognizer tool labels eighteen predefined types of entities in plain text, all shown in the image to the left. The purpose of this tool is simply to tell you whether any terms in the parsed text fall under one of the eighteen entity types. To use the demo, add text in the box provided and press Submit.
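The labelling step can be pictured with a toy example. The sketch below tags terms by looking them up in a small hand-made list (a gazetteer), which is an assumption for illustration and not the method the CCG tool uses. The three labels shown are placeholders and do not reproduce the tool's eighteen entity types; `GAZETTEER` and `tag_entities` are hypothetical names.

```python
import re

# Hypothetical gazetteer: known term -> entity label.
# These labels are placeholders, not the tool's eighteen types.
GAZETTEER = {
    "Lisa Farbstein": "PERSON",
    "Transportation Security Administration": "ORGANIZATION",
    "Denver Broncos": "ORGANIZATION",
    "Atlantic City": "LOCATION",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return (term, label) pairs for gazetteer terms, in order of appearance."""
    found = []
    for term, label in gazetteer.items():
        for match in re.finditer(re.escape(term), text):
            found.append((match.start(), term, label))
    return [(term, label) for _, term, label in sorted(found)]


if __name__ == "__main__":
    sample = "F-16s based in Atlantic City; TSA spokeswoman Lisa Farbstein said"
    for term, label in tag_entities(sample):
        print(label, term)
```

A lookup like this only finds terms it already knows, which is why a trained recognizer is needed for documents full of unfamiliar names.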
Named Entity Recognizer:
Not Bad!
Named Entity Recognizer:
Applicability, Usability, Recommendation
[Diagram: PMESII-Related Data and Documents → Named Entity Recognizer, Wikifier, Co-reference Resolution Tool → Athena-relevant entities and relationships, categorized so that an analyst may inspect them for usefulness]
Website Addresses:
 Co-reference Resolution Tool: http://cogcomp.cs.illinois.edu/page/demo_view/Coref
 Wikifier: http://cogcomp.cs.illinois.edu/page/demo_view/Wikifier
 Named Entity Recognizer: http://cogcomp.cs.illinois.edu/page/demo_view/NERextended
Next Steps
 Get full access to these tools, not just the demos
 Look into other open-source tools
 Get access to DARPA tools (!)
Thank You
