NERD: Evaluating Named Entity Recognition Tools in the Web of Data


Giuseppe Rizzo <giuseppe.rizzo@eurecom.fr>
Raphaël Troncy <raphael.troncy@eurecom.fr>

What is a Named Entity recognition task?

A task that aims to locate and classify named entities in a textual
document: names of persons, organizations, locations, brands and products,
as well as numeric expressions such as times, dates, amounts of money and
percentages

24 October 2011, Workshop on Web Scale Knowledge Extraction (WEKEX'11)

Named Entity recognition tools


Differences among those NER extractors


Granularity

extract NE from sentences vs from the entire document


Technologies used

algorithms used to extract NE

supported languages

taxonomy of type of NE recognized

disambiguation (dataset used to provide links)

content request size

Response format


And ...


What about precision and recall?

Which extractor best fits my needs?


Seeks to find pros and cons of
those extractors

What is NERD?
REST API: http://nerd.eurecom.fr/api/application.wadl
UI: http://nerd.eurecom.fr/
Ontology: http://nerd.eurecom.fr/ontology


Showcase

http://nerd.eurecom.fr

Science: "Google Cars Drive Themselves", http://bit.ly/oTj8md (part
of the original resource found at http://nyti.ms/9p19i8)


Evaluation

5 extractors using default configurations

Controlled experiment

4 human raters

10 English news articles (5 from BBC and 5 from The New York Times)

each rater evaluated each article for all the extractors

200 evaluations in total
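
The total follows directly from the design above, since every rater assessed every article with every extractor. A quick check of the arithmetic (variable names are only illustrative):

```python
raters = 4
articles = 10      # 5 from BBC + 5 from The New York Times
extractors = 5

# Each rater evaluates each article once per extractor.
evaluations = raters * articles * extractors
print(evaluations)
```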


Uncontrolled experiment

17 human raters

53 English news articles (sources: CNN, BBC, The New York Times and
Yahoo! News)

free selection of articles

Each human rater received training (see http://nerd.eurecom.fr/help)


Evaluation output

t = (NE, type, URI, relevant)

The assessment consists of rating each of these criteria with a Boolean value

If the extractor provides no type or no disambiguation URI, the corresponding
criterion is rated false by default
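
A minimal sketch of how one such assessment could be represented, with the default-false rule applied to missing fields. The class names, field names and the `precision` helper are hypothetical and not part of NERD. The sample entities and the example.org URI are made up for illustration. Precision is assumed here to be the fraction of assessments rated true for a given criterion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extraction:
    """What an extractor returned for one named entity (hypothetical structure)."""
    ne: str
    ne_type: Optional[str] = None   # None when the extractor gives no type
    uri: Optional[str] = None       # None when no disambiguation URI is given


@dataclass(frozen=True)
class Assessment:
    """Boolean ratings for t = (NE, type, URI, relevant)."""
    ne: bool
    ne_type: bool
    uri: bool
    relevant: bool


def assess(item: Extraction, ne_ok: bool, type_ok: bool,
           uri_ok: bool, relevant: bool) -> Assessment:
    """Combine a rater's judgments with the default-false rule for missing fields."""
    return Assessment(
        ne=ne_ok,
        ne_type=type_ok and item.ne_type is not None,
        uri=uri_ok and item.uri is not None,
        relevant=relevant,
    )


def precision(assessments, criterion: str) -> float:
    """Fraction of assessments rated True for one criterion (assumed definition)."""
    ratings = [getattr(a, criterion) for a in assessments]
    return sum(ratings) / len(ratings) if ratings else 0.0


# Made-up sample: the second extraction has no URI, so its URI rating is False.
rated = [
    assess(Extraction("Google", "Company", "http://example.org/Google"),
           True, True, True, True),
    assess(Extraction("Nevada", "Location"), True, True, True, False),
]
print(precision(rated, "ne_type"), precision(rated, "uri"))
```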


Controlled experiment - dataset

Dataset: http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz

Categories: World, Business, Sport, Science, Health

1 BBC article and 1 NYT article for each category

Average number of words per article: 981

The final number of unique entities detected is 4641, with an average
of 23.2 named entities per article

Some of the extractors (e.g. DBpedia Spotlight and Extractiv) return
duplicate NEs. We removed all duplicates so as not to bias the statistics
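
The duplicate removal mentioned above could look roughly like the sketch below. The slides do not say what counts as a duplicate, so treating two results as duplicates when their (NE, type, URI) triples are identical is an assumption. The function name and the sample data are hypothetical.

```python
def remove_duplicates(entities):
    """Keep the first occurrence of each (NE, type, URI) triple, preserving order."""
    seen = set()
    unique = []
    for entity in entities:
        if entity not in seen:
            seen.add(entity)
            unique.append(entity)
    return unique


# Made-up extractor output with one repeated triple.
raw = [
    ("Google", "Company", "http://example.org/Google"),
    ("Nevada", "Location", None),
    ("Google", "Company", "http://example.org/Google"),
]
cleaned = remove_duplicates(raw)
print(len(raw), len(cleaned))
```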


Controlled experiment – agreement score

Fleiss' kappa score (Fleiss, 1971)

[Figures: agreement scores grouped by extractor, by source and by category]

Reference: Joseph L. Fleiss. Measuring nominal scale agreement among many
raters. Psychological Bulletin, 76(5):378–382, 1971


Controlled experiment – statistical results

[Figures: overall statistics; results grouped by extractor; results grouped
by category]

Figure annotation: different behavior for different sources


Uncontrolled experiment - dataset

17 raters were free to select English news articles from CNN, BBC,
The New York Times and Yahoo! News

53 news articles selected

Total number of assessments: 94; average number of assessments per user: 5.2

Each article was assessed with at least 2 different tools

The final number of unique entities detected is 1616, with an average
of 34 named entities per article

Some of the extractors (e.g. DBpedia Spotlight and Extractiv) return
duplicate NEs. We removed all duplicates so as not to bias the statistics


Uncontrolled experiment – statistical results (I)

[Figures: overall precision; results grouped by extractor]


Uncontrolled experiment – statistical results (II)

[Figure: results grouped by category]


Conclusion

Q. Which are the best NER tools?
A. They are ...

AlchemyAPI has obtained the best results in NE extraction and
categorization

DBpedia Spotlight and Zemanta showed ability to disambiguate NE in the
LOD cloud

Experiments across categories of articles did not show significant
differences in the analysis.

The WEKEX'11 ground truth has been published:
http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz


Future Work (NERD Timeline)

Beginning: core application

Uncontrolled experiment

Controlled experiment

Today: REST API, release of the WEKEX'11 ground truth

Future: release of the ISWC'11 ground truth

Future: NERD "smart" service, combining the best of all NER tools


ISWC'11 golden-set

Do you believe it's easy to find
an agreement among all raters?

We would like to invite you to help create a new golden set during the
ISWC'2011 poster and demo session. We will kindly ask each rater to
evaluate two short excerpts from two English news articles with all
extractors supported by NERD


Thanks for your time and your attention

http://nerd.eurecom.fr

@giusepperizzo @rtroncy #nerd

http://www.slideshare.net/giusepperizzo


Fleiss' kappa

Measures agreement among raters beyond chance agreement

K = 1: full agreement among all raters
K ≤ 0: poor agreement (no better than chance)
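
The formula itself is not preserved in this text. As a sketch, the standard definition from Fleiss (1971) is K = (P - Pe) / (1 - Pe), where P is the mean observed agreement per item and Pe is the chance agreement. The function below follows that definition. The input layout (one row per rated item, one column per category, each cell the number of raters choosing that category) and the function name are assumptions for illustration, and the sample ratings are made up.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = raters assigning item i to category j.

    Every row must sum to the same number of raters n (n >= 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Proportion of all assignments that went to each category.
    p = [sum(row[j] for row in counts) / (n_items * n_raters)
         for j in range(n_categories)]
    # Observed agreement for each item, then its mean over items.
    per_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in counts]
    p_bar = sum(per_item) / n_items
    # Agreement expected by chance.
    p_e = sum(pj * pj for pj in p)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1 - p_e)


# Made-up Boolean ratings from 4 raters; columns are (true, false).
full_agreement = [[4, 0], [0, 4]]
split_votes = [[2, 2], [2, 2]]
print(fleiss_kappa(full_agreement), fleiss_kappa(split_votes))
```

With 4 raters and Boolean criteria, as in the controlled experiment, each row would hold the number of "true" and "false" votes for one rated item.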


Fleiss' kappa interpretation

Kappa          Interpretation
< 0            Poor agreement
0.01 – 0.20    Slight agreement
0.21 – 0.40    Fair agreement
0.41 – 0.60    Moderate agreement
0.61 – 0.80    Substantial agreement
0.81 – 1.00    Almost perfect agreement
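
The table above can be turned into a small lookup, sketched below. The function name is hypothetical. Two choices are assumptions: values between the printed bands (for example 0.205) are assigned to the lower band, and a kappa of exactly 0 is labelled as poor agreement.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the interpretation bands of the table."""
    if kappa <= 0:
        return "Poor agreement"          # assumption: 0 is treated as poor
    if kappa <= 0.20:
        return "Slight agreement"
    if kappa <= 0.40:
        return "Fair agreement"
    if kappa <= 0.60:
        return "Moderate agreement"
    if kappa <= 0.80:
        return "Substantial agreement"
    return "Almost perfect agreement"


for value in (-0.1, 0.15, 0.35, 0.55, 0.75, 0.9):
    print(value, interpret_kappa(value))
```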

