Authors: César de Pablo Sánchez and Paloma Martínez
31st European Conference on Information Retrieval, Toulouse, France (April 6-9, 2009)
Building a Graph of Names and Contextual Patterns for Named Entity Classification
1. Building a Graph of Names and Contextual Patterns for Named Entity Classification
César de Pablo Sánchez and Paloma Martínez
Computer Science Department
Universidad Carlos III de Madrid
31st European Conference on Information Retrieval
Toulouse, 6-9 April 2009
César de Pablo Sánchez and Paloma Martínez, Building a Graph of Names and Contextual Patterns for Named Entity Classification
2. Motivation and Objectives
NERC (named entity recognition and classification) for multilingual applications: annotation is the bottleneck
Bootstrap name lists and indicative patterns for classification, starting from:
  Large document collection
  NE classes of interest (PERSON, LOCATION, ORG, ...)
  Name seeds for every class
(Towards) language independence (regular expressions, stopwords)
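The language-independence goal relies on shallow, largely language-neutral cues: regular expressions and stopword lists. As an illustration only, here is a minimal Python sketch of how such cues might propose candidate names: maximal runs of capitalized tokens, skipping capitalized stopwords and allowing a lowercase connector such as "de" inside a name. The stopword list, the connector set and the function `candidate_names` are hypothetical, not the authors' resources.

```python
import re

# Hypothetical, tiny Spanish stopword list; a real system would load one list per language.
STOPWORDS = {"el", "la", "los", "las", "en", "de", "del", "y", "un", "una"}

# Hypothetical lowercase connectors allowed inside a multi-word name.
CONNECTORS = {"de", "del"}

# Words (Unicode-aware \w) or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def candidate_names(sentence: str) -> list[str]:
    """Return maximal runs of capitalized tokens, ignoring capitalized stopwords."""
    names: list[str] = []
    run: list[str] = []

    def flush() -> None:
        # Drop trailing connectors ("Madrid de" -> "Madrid") before emitting the run.
        while run and run[-1].lower() in CONNECTORS:
            run.pop()
        if run:
            names.append(" ".join(run))
        run.clear()

    for tok in TOKEN.findall(sentence):
        if tok[0].isupper() and tok.lower() not in STOPWORDS:
            run.append(tok)      # capitalized content word: extend the current name
        elif run and tok.lower() in CONNECTORS:
            run.append(tok)      # tentatively keep a connector inside a name
        else:
            flush()              # anything else closes the current name
    flush()
    return names


if __name__ == "__main__":
    print(candidate_names("El presidente de la Universidad Carlos III de Madrid viajó a Toulouse."))
    # ['Universidad Carlos III de Madrid', 'Toulouse']
```

Only capitalization, a tokenizing regular expression and per-language word lists are involved, which is what keeps such a component cheap to port to another language.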
3. Assumptions
Dual bootstrapping: entities → patterns → entities
One sense per name: each name belongs to a single entity type
Counter-training: learn several classes at once
Query-based exploration of the indexed collection
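These assumptions can be illustrated with a small Python sketch. Dual bootstrapping alternates pattern expansion (labeled names → contexts) and entity expansion (accepted contexts → new names); an inverted index stands in for query-based access to the collection; counter-training is approximated by letting all classes compete, keeping a pattern or a name only when one class holds at least a threshold share of its votes; one sense per name is enforced by never relabeling a name. The toy corpus, the one-word left/right context patterns, the 0.8 threshold and all function names are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (already tokenized); names are single capitalized tokens.
CORPUS = [s.split() for s in [
    "el presidente Aznar dijo ayer",
    "el presidente Clinton dijo hoy",
    "el ministro Solana llegó a Bruselas",
    "el gobierno de Aznar llegó a Madrid",
    "la ciudad de Madrid",
    "la ciudad de Lyon",
]]


def build_index(corpus):
    """Inverted index token -> sentence ids, so names and patterns are explored by query."""
    index = defaultdict(set)
    for sid, sent in enumerate(corpus):
        for tok in sent:
            index[tok].add(sid)
    return index


def contexts(sent, i):
    """One-sided patterns for the token at position i: ("L", left word), ("R", right word)."""
    pats = []
    if i > 0:
        pats.append(("L", sent[i - 1]))
    if i + 1 < len(sent):
        pats.append(("R", sent[i + 1]))
    return pats


def slot_fillers(corpus, index, pattern):
    """Yield capitalized tokens in the slot of a pattern, querying only its anchor word."""
    side, anchor = pattern
    for sid in sorted(index.get(anchor, ())):
        sent = corpus[sid]
        for j, tok in enumerate(sent):
            if tok == anchor:
                k = j + 1 if side == "L" else j - 1  # slot is right of a left anchor, and vice versa
                if 0 <= k < len(sent) and sent[k][:1].isupper():
                    yield sent[k]


def best_class(votes, threshold):
    """Counter-training: keep the top class only if it clearly beats the competing classes."""
    total = sum(votes.values())
    if total == 0:
        return None
    cls, n = votes.most_common(1)[0]
    return cls if n / total >= threshold else None


def bootstrap(corpus, seeds, iterations=2, threshold=0.8):
    index = build_index(corpus)
    labels = {name: cls for cls, names in seeds.items() for name in names}
    patterns = {}
    for _ in range(iterations):
        # Entities -> patterns: collect the contexts of every labeled name (name-pattern edges).
        pattern_votes = defaultdict(Counter)
        for name, cls in labels.items():
            for sid in index.get(name, ()):
                sent = corpus[sid]
                for i, tok in enumerate(sent):
                    if tok == name:
                        for p in contexts(sent, i):
                            pattern_votes[p][cls] += 1
        patterns = {}
        for p, votes in pattern_votes.items():
            cls = best_class(votes, threshold)
            if cls:
                patterns[p] = cls
        # Patterns -> entities: fill the slots of the accepted patterns.
        name_votes = defaultdict(Counter)
        for p, cls in patterns.items():
            for name in slot_fillers(corpus, index, p):
                name_votes[name][cls] += 1
        for name, votes in name_votes.items():
            cls = best_class(votes, threshold)
            if cls and name not in labels:  # one sense per name: never relabel
                labels[name] = cls
    return labels, patterns


if __name__ == "__main__":
    seeds = {"PERSON": {"Aznar"}, "LOCATION": {"Madrid"}}
    labels, patterns = bootstrap(CORPUS, seeds)
    print(labels)
    print(patterns)
```

With these seeds, two iterations label Clinton and Solana as PERSON and Bruselas as LOCATION. The pattern `("L", "de")` (a name right after "de") is discarded because it matches one PERSON seed (Aznar) and one LOCATION seed (Madrid), so Lyon, which only occurs after "de", is never acquired: letting the classes compete trades some recall for precision. The vote tables can be read as the weighted edges of a bipartite name-pattern graph.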
4. Pattern expansion
5. Entity expansion
6. Results: Direct Evaluation
Language: Spanish
Collection: EFE 1994-1995 newswire, 1 GB (CLEF)
NE classes: PLO (PERSON, LOCATION, ORG); PLOM adds MISC; PLOT adds TEAM
Fewer than 40 seeds per class (less than one hour of work per person)
Evaluation: samples drawn from the acquired name lists
7. Results: Direct Evaluation
Model   PER    LOC    ORG    MISC/TEAM   Mean
PLO     94.8   52.7   67.1   –           71.5
PLOM    93.0   44.8   79.3   75.0        73.0
PLOT    94.8   87.4   81.1   40.9        76.0
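The Mean column appears consistent with an unweighted (macro) average over the classes of each model; this is an inference from the numbers, not something the slides state. A short Python check using values copied from the table (the names `SCORES`, `REPORTED_MEAN` and `macro_mean` are ours):

```python
# Per-class scores copied from the direct-evaluation table (4th value = MISC or TEAM).
SCORES = {
    "PLO": [94.8, 52.7, 67.1],
    "PLOM": [93.0, 44.8, 79.3, 75.0],
    "PLOT": [94.8, 87.4, 81.1, 40.9],
}
REPORTED_MEAN = {"PLO": 71.5, "PLOM": 73.0, "PLOT": 76.0}


def macro_mean(scores: list[float]) -> float:
    """Unweighted average over classes: every class counts equally, whatever its size."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: computed {macro_mean(scores):.3f}, reported {REPORTED_MEAN[model]}")
```

The computed values match the reported ones up to rounding to one decimal. Under this reading, a low score on an added class (e.g. TEAM at 40.9) pulls the mean down exactly as much as an equally low PER score would.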
8. Results: Name Classification
Indirect Evaluation
Evaluation data: CoNLL 2002 Shared Task, Spanish (EFE 2000 newswire)
Setting            Model   P       R       F       Acc
baseline           CoNLL   26.27   56.48   35.86   –
baseline           ORG     –       –       –       39.34
entities           PLO     77.33   54.34   63.83   64.04
entities           PLOM    78.85   51.53   62.36   66.24
entities           PLOT    78.72   41.58   54.42   62.18
entities+patterns  PLO     66.12   57.97   61.78   63.17
entities+patterns  PLOM    73.65   61.73   67.17   71.29
entities+patterns  PLOT    66.35   56.62   61.10   62.50
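The F column is consistent with the balanced F-measure (F1), the harmonic mean of precision P and recall R; recomputing F from the rounded P and R values reproduces the reported numbers to within a few hundredths. A minimal Python check (the helper `f_measure` and the `ROWS` layout are ours):

```python
# (P, R, reported F) copied from the indirect-evaluation table (values in %).
ROWS = {
    ("baseline", "CoNLL"): (26.27, 56.48, 35.86),
    ("entities", "PLO"): (77.33, 54.34, 63.83),
    ("entities", "PLOM"): (78.85, 51.53, 62.36),
    ("entities", "PLOT"): (78.72, 41.58, 54.42),
    ("entities+patterns", "PLO"): (66.12, 57.97, 61.78),
    ("entities+patterns", "PLOM"): (73.65, 61.73, 67.17),
    ("entities+patterns", "PLOT"): (66.35, 56.62, 61.10),
}


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) P R / (beta^2 P + R); beta = 1 gives the harmonic mean F1."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    for (setting, model), (p, r, f) in ROWS.items():
        print(f"{setting:18} {model:6} computed F = {f_measure(p, r):.2f}, reported {f}")
```

Because F1 is a harmonic mean, it is dominated by the smaller of P and R: PLOT with entities only has the lowest F (54.42) even though its precision (78.72) is almost identical to PLOM's (78.85).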
9. Conclusions and Future Work
Efficient bootstrapping from large indexed collections with few seeds
Already useful for NERC, although performance is lower than that of supervised machine learning
More classes improve precision, but not always recall
Future work:
Other languages and domains
More complex semantic models
Language independence and NE recognition