Named Entity Recognition - ACL 2011 Presentation

The Web is not a PERSON, Berners-Lee is not an ORGANIZATION, and
African-Americans are not LOCATIONS:
An Analysis of the Performance of Named-Entity Recognition
Robert Krovetz (Lexicalresearch.com), Paul Deane, Nitin
Madnani (ETS)

A Review by Richard
Littauer (UdS)

The Background
 Named-Entity Recognition (NER) is
normally judged in the context of
Information Extraction (IE)

The Background
 Various competitions
 Recently:
◦ non-English languages
◦ improving unsupervised learning methods

The Background
 “There are no well-established
standards for evaluation of NER.”
◦ Criteria for NER systems change from one
competition to the next
◦ Proprietary software

The Background
 Krovetz, Deane and Madnani (KDM) wanted to identify multiword expressions (MWEs)…

The Background
… but false positives, tagging
inconsistencies stopped this.

 IE derives Recall and Precision from
Information Retrieval
 NER is just a small part of this, so is
rarely evaluated independently
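
As a reminder of what Recall and Precision measure, here is a minimal sketch. It assumes entities are represented as (start, end, label) spans and that only exact matches count; the function name and the toy data are illustrative and are not taken from the paper.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (start, end, label) spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # exact span and label match
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: token-offset spans with entity labels.
gold = {(0, 2, "PERSON"), (5, 6, "LOCATION"), (9, 11, "ORGANIZATION")}
predicted = {(0, 2, "PERSON"), (5, 6, "ORGANIZATION")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here one of the two predictions is correct (precision 1/2) and one of the three gold entities is found (recall 1/3). A span with the right boundaries but the wrong label counts as an error on both measures.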

The Background
 So, they want to test NER systems,
and provide a unit test based on the
problems encountered

Evaluation
Compared three NER taggers:
 Stanford:
◦ CRF, 100m training corpus;
 University of Illinois (LBJ):
◦ Regularized average perceptron, Reuters
1996 News Corpus;
 BBN IdentiFinder (IdentiFinder):
◦ HMMs, commercial

Evaluation
 Agreement on Classification
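
One way to measure agreement on classification is sketched below. It assumes each tagger's output has been reduced to a mapping from entity string to label; `label_agreement`, `tagger_a` and `tagger_b` are hypothetical names, and the data is made up rather than taken from the paper's results.

```python
def label_agreement(tags_a, tags_b):
    """Compare labels for entities that BOTH taggers recognized.

    Returns (agreement_rate, sorted list of entities labelled differently).
    """
    shared = tags_a.keys() & tags_b.keys()
    if not shared:
        return 0.0, []
    disagreements = sorted(e for e in shared if tags_a[e] != tags_b[e])
    return 1 - len(disagreements) / len(shared), disagreements


# Hypothetical outputs of two taggers on the same text.
tagger_a = {"Berners-Lee": "PERSON", "Amherst College": "ORGANIZATION", "Web": "PERSON"}
tagger_b = {"Berners-Lee": "ORGANIZATION", "Amherst College": "ORGANIZATION", "Boston": "LOCATION"}

rate, differing = label_agreement(tagger_a, tagger_b)
print(rate, differing)
```

Entities found by only one tagger ("Web", "Boston") are ignored here; they are a difference in recognition rather than in classification.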

Evaluation
 Ambiguity in Discourse

 Stanford vs. LBJ on internal ETS
425m corpus
 All three on American National Corpus
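
Ambiguity in discourse can be detected by looking for entity strings that receive more than one label within a single document. The sketch below assumes a document is a list of (entity text, label) mentions; the function name and the sample mentions are illustrative only.

```python
from collections import defaultdict


def ambiguous_entities(tagged_mentions):
    """Return {entity_text: sorted labels} for entities tagged with 2+ labels."""
    labels_seen = defaultdict(set)
    for text, label in tagged_mentions:
        labels_seen[text].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_seen.items()
        if len(labels) > 1
    }


# Hypothetical tagger output for one document.
document = [
    ("Paris", "LOCATION"),
    ("ETS", "ORGANIZATION"),
    ("Paris", "PERSON"),
    ("Paris", "LOCATION"),
]

print(ambiguous_entities(document))
```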

Stanford vs. LBJ
 NER reported as 85-95% accurate.
 Roughly the same number of entities for
both: 1.95m for Stanford, 1.8m for LBJ
(7.6% difference)
 However, errors:

Stanford vs. LBJ
 Agreement:

Stanford vs. LBJ
 Ambiguity:

Stanford vs. LBJ vs.
IdentiFinder
 Agreement:

IdentiFinder
 Differences:
◦ How they are tokenized
◦ Number of entities recognized overall

IdentiFinder
 Ambiguity:

Unit Test
 Created two documents that can be
used as texts
◦ Different cases for true positives of
PERSON, LOCATION, ORGANIZATION
◦ Entirely upper case not NE (Ex.
AAARGH)
◦ Punctuated terms not NE
◦ Terms with Initials
◦ Acronyms (some expanded, some not)
◦ Last names in close proximity to first
names

◦ Terms with prepositions (Mass. Inst. of
Tech.)
◦ Terms with location and organization
(Amherst College)

 Provided freely online.
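
To show how such cases could be turned into automated checks, here is a minimal sketch using Python's `unittest`. It does not reproduce the authors' test documents: `toy_tagger` is a hypothetical gazetteer lookup standing in for a real NER system, and the two test sentences are invented to mirror categories from the list above.

```python
import unittest


def toy_tagger(text):
    """Hypothetical stand-in for a real NER system: a tiny gazetteer lookup."""
    gazetteer = {"Amherst College": "ORGANIZATION", "Amherst": "LOCATION"}
    found = {}
    remaining = text
    # Match longer names first so "Amherst College" wins over "Amherst".
    for name in sorted(gazetteer, key=len, reverse=True):
        if name in remaining:
            found[name] = gazetteer[name]
            remaining = remaining.replace(name, " ")
    return found


class NerUnitTest(unittest.TestCase):
    # Replace with a wrapper around a real tagger that returns {entity: label}.
    tagger = staticmethod(toy_tagger)

    def test_upper_case_interjection_is_not_an_entity(self):
        self.assertEqual(self.tagger("AAARGH, not again!"), {})

    def test_location_inside_organization_name(self):
        self.assertEqual(
            self.tagger("She studied at Amherst College."),
            {"Amherst College": "ORGANIZATION"},
        )
```

Saved as a module, the tests can be run with `python -m unittest`. Each category in the list would get its own test method, so a failure points directly at the phenomenon the tagger mishandles.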

One NE Tag per Discourse
 Unusual for multiple occurrences of a
token in a document to be different
entities
 True for homonyms
 An exception: Location + sports team

One NE Tag per Discourse
 Stanford, LBJ have features for
non-local dependencies (NLD) to help with this.
 KDM: Two other uses for NLD:
◦ Source of error in evaluation
◦ A way to identify semantically related
entities

 These should be treated as
exceptions

Discussion
 There are guidelines for NER – but we
need standards.
 The community should focus on
PERSON, ORGANIZATION,
LOCATION, and MISC.
◦ Harder to deal with than Dates, Times.
◦ Disagreement between taggers.
◦ MISC is necessary.
◦ These have important value elsewhere.

Discussion
 To improve intrinsic evaluation for
NER:
1. Create test sets for diverse domains.
2. Use standardized sets for different
phenomena.
3. Report accuracy for PERSON, ORGANIZATION,
and LOCATION (POL) separately.
4. Establish uncertainty in the tagging
system.
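
Reporting accuracy per entity type, as in point 3, could look like the sketch below. It assumes the tagger's mentions have already been aligned with gold mentions as (gold label, predicted label) pairs; the function name and the pairs are hypothetical.

```python
from collections import Counter


def accuracy_by_type(pairs):
    """Return {gold_label: fraction of mentions of that type labelled correctly}."""
    total, correct = Counter(), Counter()
    for gold, predicted in pairs:
        total[gold] += 1
        if gold == predicted:
            correct[gold] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical aligned (gold, predicted) labels.
pairs = [
    ("PERSON", "PERSON"),
    ("PERSON", "ORGANIZATION"),
    ("LOCATION", "LOCATION"),
    ("ORGANIZATION", "ORGANIZATION"),
    ("ORGANIZATION", "LOCATION"),
    ("DATE", "DATE"),
]

for label, accuracy in accuracy_by_type(pairs).items():
    print(f"{label}: {accuracy:.0%}")
```

In this toy data the overall accuracy is 4/6, but PERSON and ORGANIZATION are each only 50% correct. A single pooled figure hides that gap.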

Conclusion
 The commonly reported 90% accuracy does not hold up in practice.
 We need to use only entities that are
agreed on by multiple taggers.
 Even in cases where they both
disagree (Hint: Future work.)

 Unit test downloadable.

Cheers/PERSON

Richard/ORGANISATION thanks the
Mword Class/LOCATION for listening to
his talk about Berners-Lee/MISC
