2. Outline:
Purpose of the Analysis
Athena—Many Hours of Research
Natural-language processing (NLP)
Analysis: Constraints, Limitations, and Assumptions
Co-reference Resolution Tool
Wikifier
Named Entity Recognizer
Next Steps
3. Purpose of the Analysis:
The purpose of this analysis was to nominate NLP tools that will
help researchers and analysts at TRADOC G2 M&SD locate data
and information culled from documents to populate their human,
social, behavior, and culture simulation called Athena.
In coordination with researchers from the TRADOC G2 Modeling
and Simulation Directorate (M&SD) housed at Fort Leavenworth,
students from the spring 2015 semester course titled “Software
Development and Design” at the University of Saint Mary in
Leavenworth, Kansas analyzed several natural-language
processing (NLP) tools from the Cognitive Computation Group
(CCG) at the University of Illinois at Urbana-Champaign.
4. Athena—Many Hours of Research
Athena is a software application that enables analysts and
commanders to simulate the Political, Military, Economic,
Social, Infrastructure, and Information (PMESII) entities and
processes within the context of a battlefield environment, a
wide-area security operation, or in support of a country study
to evaluate social evolution dynamics.
Athena needs to be populated with entities such as actor, civilian
group, force group, message, belief system, and others.
Athena researchers sometimes trawl through a hundred documents or
more looking for relevant entities and relationships… MANY,
MANY HOURS…
6. Natural-language Processing (NLP):
SHORT VERSION: NLP is the ability of a computer to understand a
language the way a human does.
LONG VERSION: NLP is a field of computer science, artificial
intelligence, and computational linguistics concerned with the
interactions between computers and human (natural) languages.
One important goal of NLP is trying to get computational systems to
“understand” the meaning (semantics) and context of words,
sentences, and other linguistic devices in much the same way that a
human mind is able to do so.
Cognitive Computation Group (CCG) at the University of Illinois at
Urbana-Champaign
Co-reference Resolution | Named Entity Recognizer | Wikifier
8. Analysis: Constraints, Limitations, and Assumptions
Time: the project began near the end of the spring semester, giving the team only six weeks to do the work
The team could only work together two days a week, even though the students from USM worked many more hours every week
The team lead took on a new job halfway through the work
The team could only access online interactive demonstration versions of CCG’s tools, rather than the complete tools
Although some of the tools were found listed on DARPA’s DEFT site, there was no way to access them, and the site administrator never responded to the team’s email requesting access
Most members of the team have limited experience with NLP tools
Access to NLP tools will be limited
It is not likely that any of the tools inspected in this round will serve the purpose of helping researchers and analysts at the TRADOC G2 M&SD locate data and information to populate Athena
9. Co-reference Resolution Tool:
Description and Purpose
A given entity—representing a person, a location, or an organization, for
example—may be mentioned in a text in multiple, ambiguous ways. The
Co-reference Resolution Tool processes unannotated text, detecting
mentions of entities and showing which mentions are co-referential (i.e., all
words, phrases, or expressions that refer to the same entity in a text). The
purpose of this tool is to help parse documents for common entities and
represent the document in a diagram form. The interactive demo consists
of a box into which text is placed and a button that says Submit. Once text
is entered and the Submit button is pressed, within a few seconds the
demo displays the parsed results.
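To make the idea of co-referential mentions concrete, here is a minimal sketch in Python. It is not the CCG tool's interface or output format: the `Mention` type, the `group_mentions` helper, and the hand-assigned entity ids are all hypothetical, and a real resolver would infer the ids from the text rather than receive them as input.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    """One span of text that refers to some entity (hypothetical type)."""
    text: str
    entity_id: int  # hand-assigned here; a real resolver would predict this


def group_mentions(mentions):
    """Group mention texts into chains, one chain per entity id."""
    chains = defaultdict(list)
    for m in mentions:
        chains[m.entity_id].append(m.text)
    return dict(chains)


# Hand-labeled mentions taken from the TSA sample passage (an assumption,
# not output produced by the CCG demo).
mentions = [
    Mention("The Transportation Security Administration", 0),
    Mention("it", 0),
    Mention("TSA", 0),
    Mention("the dogs", 1),
    Mention("they", 1),
]

chains = group_mentions(mentions)
for entity_id, texts in chains.items():
    print(entity_id, texts)
```

Each chain lists every expression that points at the same entity, which is the information the demo presents in diagram form.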
10. “Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday, with F-16s based in Atlantic
City ready to be scrambled if an unauthorized aircraft does enter the restricted airspace. Down below, bomb-sniffing dogs will
patrol the trains and buses that are expected to take approximately 30,000 of the 80,000-plus spectators to Sunday's Super
Bowl between the Denver Broncos and Seattle Seahawks. The Transportation Security Administration said it has added about
two dozen dogs to monitor passengers coming in and out of the airport around the Super Bowl. On Saturday, TSA agents
demonstrated how the dogs can sniff out many different types of explosives. Once they do, they're trained to sit rather than
attack, so as not to raise suspicion or create a panic. TSA spokeswoman Lisa Farbstein said the dogs undergo 12 weeks of
training, which costs about $200,000, factoring in food, vehicles and salaries for trainers. Dogs have been used in cargo areas
for some time, but have just been introduced recently in passenger areas at Newark and JFK airports. JFK has one dog and
Newark has a handful, Farbstein said.”
11. “Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday…”
[Figure: demo output for this passage, annotated with the labels Bad, Really Good, Good, Bad, and “Not Sure, so Bad”]
13. Wikifier: Parses a text and links terms to Wikipedia
“Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday,
with F-16s based in Atlantic City ready to be scrambled if an unauthorized aircraft does enter the
restricted airspace. Down below, bomb-sniffing dogs will patrol the trains and buses that are
expected to take approximately 30,000 of the 80,000-plus spectators to Sunday's Super Bowl
between the Denver Broncos and Seattle Seahawks.”
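The linking step can be illustrated with a toy sketch in Python. This is not how the CCG Wikifier works internally: the `KNOWN_TITLES` list and both helper functions are hypothetical, the page titles are assumed to match the terms exactly, and a real wikifier would also disambiguate terms using the surrounding context.

```python
from urllib.parse import quote

# Tiny hand-built title list (hypothetical); a real wikifier would consult
# Wikipedia itself and disambiguate terms using the surrounding context.
KNOWN_TITLES = ["MetLife Stadium", "Atlantic City", "Super Bowl",
                "Denver Broncos", "Seattle Seahawks"]


def wikipedia_url(title):
    """Build an English Wikipedia URL; spaces in titles become underscores."""
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


def link_terms(text, titles=KNOWN_TITLES):
    """Return {term: url} for each known title that appears verbatim in text."""
    return {t: wikipedia_url(t) for t in titles if t in text}


# First sentence of the sample passage.
sample = ("Helicopters will patrol the temporary no-fly zone around "
          "New Jersey's MetLife Stadium Sunday, with F-16s based in "
          "Atlantic City ready to be scrambled if an unauthorized "
          "aircraft does enter the restricted airspace.")

links = link_terms(sample)
for term, url in links.items():
    print(term, "->", url)
```

Exact string matching keeps the sketch short, but it cannot tell apart terms that share a name, which is the hard part a real wikifier has to solve.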
15. Named Entity Recognizer:
The Named Entity Recognizer tool labels eighteen predefined types
of entities in plain text, all shown in the image to the left. The
purpose of this tool is simply to tell you whether any terms in the
parsed text fall under one of the eighteen entity types. To use the
demo, add text in the box provided and press Submit.
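As a rough illustration of what labeling entities in plain text means, the sketch below tags terms from a small lookup table. It is only a toy under stated assumptions: the `GAZETTEER` table and `tag_entities` function are hypothetical, the label names are placeholders rather than the tool's eighteen entity types, and the sample sentence is made up from names in the TSA passage. A real recognizer uses a trained model rather than a fixed list.

```python
import re

# Hypothetical lookup table; the label names are placeholders, not the
# eighteen entity types used by the CCG tool.
GAZETTEER = {
    "Denver Broncos": "ORGANIZATION",
    "Seattle Seahawks": "ORGANIZATION",
    "Newark": "LOCATION",
    "Lisa Farbstein": "PERSON",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return (start, end, surface, label) for every gazetteer match."""
    spans = []
    for surface, label in gazetteer.items():
        for match in re.finditer(re.escape(surface), text):
            spans.append((match.start(), match.end(), surface, label))
    return sorted(spans)  # order spans by position in the text


# Made-up sentence built from names in the sample passage.
sample = "TSA spokeswoman Lisa Farbstein said Newark has a handful of dogs."

entities = tag_entities(sample)
for start, end, surface, label in entities:
    print(f"{start}-{end}: {surface} [{label}]")
```

Returning character offsets alongside each label lets a caller highlight the matched terms in the original text, much as the demo does after Submit is pressed.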