Building a Graph of Names and Contextual Patterns
for Named Entity Classification
César de Pablo Sánchez and Paloma Martínez
Computer Science Department
Universidad Carlos III de Madrid
31st European Conference on Information Retrieval
Toulouse, 6-9 April
César de Pablo Sánchez and Paloma Martínez Building a Graph of Names and Contextual Patterns for Named Entity Cla
Motivation and Objectives
NERC for multilingual applications: annotation is the bottleneck
Bootstrap name lists and indicative patterns for classification
Large document collection
NE classes of interest (PERSON, LOCATION, ORG . . . )
Name seeds for every class
(Towards) language independence (regexp, stopwords)
Assumptions
Dual bootstrapping: entities → patterns → entities
One sense per entity type (name)
Counter-training: learn several classes at once
Query-based exploration of the indexed collection
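The assumptions above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function name `bootstrap`, the use of the preceding token as the only "pattern", and the capitalization test for name candidates are all assumptions made for illustration. An in-memory token list stands in for the indexed collection. Counter-training is approximated by discarding any pattern or candidate name that more than one class claims, which also enforces one sense per name.

```python
from collections import defaultdict

def bootstrap(sentences, seeds, rounds=2):
    """Toy dual bootstrapping: entities -> patterns -> entities.

    Hypothetical simplification: a pattern is just the token that
    precedes a name, and names are single capitalized tokens.
    """
    names = {cls: set(s) for cls, s in seeds.items()}
    patterns = {cls: set() for cls in seeds}
    for _ in range(rounds):
        # Pattern expansion: collect contexts of already known names.
        votes = defaultdict(set)  # pattern -> classes it was seen with
        for tokens in sentences:
            for prev, tok in zip(tokens, tokens[1:]):
                for cls, known in names.items():
                    if tok in known:
                        votes[prev].add(cls)
        # Counter-training: keep only patterns claimed by one class.
        patterns = {cls: set() for cls in seeds}
        for pat, classes in votes.items():
            if len(classes) == 1:
                patterns[next(iter(classes))].add(pat)
        # Entity expansion: capitalized tokens that follow a pattern.
        candidates = defaultdict(set)  # name -> classes proposing it
        for tokens in sentences:
            for prev, tok in zip(tokens, tokens[1:]):
                if not tok[0].isupper():
                    continue
                for cls, pats in patterns.items():
                    if prev in pats:
                        candidates[tok].add(cls)
        known_all = set().union(*names.values())
        for name, classes in candidates.items():
            # One sense per name: accept only unambiguous candidates.
            if name not in known_all and len(classes) == 1:
                names[next(iter(classes))].add(name)
    return names, patterns

# Invented Spanish-like toy corpus and seeds, for illustration only.
sentences = [
    "el presidente Aznar habló ayer".split(),
    "el presidente Clinton viajó".split(),
    "viajó a Madrid ayer".split(),
    "llegó a Lisboa".split(),
    "en Madrid y en Lisboa".split(),
]
seeds = {"PERSON": ["Aznar"], "LOCATION": ["Madrid"]}
names, patterns = bootstrap(sentences, seeds)
```

On this toy input the seed "Aznar" yields the context "presidente", which in turn proposes "Clinton" as a PERSON; "Madrid" yields "a" and "en", which propose "Lisboa" as a LOCATION.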
Pattern expansion
Entity expansion
Results: Direct Evaluation
Language: Spanish
Collection: EFE 1994-1995 newswire, 1 GB (CLEF)
NE classes: PLO (PERSON, LOCATION, ORG), +MISC, +TEAM
Seeds per class: < 40 (< 1 h of work per person)
Evaluation: on a sample of the acquired name lists
Results: Direct Evaluation
Model   PER    LOC    ORG    MISC / TEAM   Mean
PLO     94.8   52.7   67.1   –             71.5
PLOM    93.0   44.8   79.3   75.0          73.0
PLOT    94.8   87.4   81.1   40.9          76.0
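The Mean column appears to be the unweighted average of the per-class scores in each row. The short check below recomputes it under that assumption; the dictionary `rows` and the helper `unweighted_mean` are names introduced here, and the numbers are copied from the table above.

```python
# Per-class scores copied from the direct-evaluation table.
# The last value in PLOM is MISC, the last value in PLOT is TEAM.
rows = {
    "PLO": [94.8, 52.7, 67.1],
    "PLOM": [93.0, 44.8, 79.3, 75.0],
    "PLOT": [94.8, 87.4, 81.1, 40.9],
}
reported_mean = {"PLO": 71.5, "PLOM": 73.0, "PLOT": 76.0}

def unweighted_mean(values):
    """Average that gives every class the same weight."""
    return sum(values) / len(values)

for model, scores in rows.items():
    recomputed = unweighted_mean(scores)
    print(f"{model}: recomputed {recomputed:.2f}, reported {reported_mean[model]}")
```

The recomputed values agree with the reported ones to within rounding at one decimal place, which supports reading Mean as a per-class average rather than an average weighted by list size.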
Results: Name Classification
Indirect Evaluation
Evaluation: CoNLL 2002 Shared Task, Spanish EFE 2000
Features            Model   P       R       F       Acc
baseline            CoNLL   26.27   56.48   35.86   –
baseline            ORG     –       –       –       39.34
entities            PLO     77.33   54.34   63.83   64.04
entities            PLOM    78.85   51.53   62.36   66.24
entities            PLOT    78.72   41.58   54.42   62.18
entities+patterns   PLO     66.12   57.97   61.78   63.17
entities+patterns   PLOM    73.65   61.73   67.17   71.29
entities+patterns   PLOT    66.35   56.62   61.10   62.50
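The F column is consistent with the balanced F-measure, F = 2PR / (P + R). The sketch below recomputes it from the P and R values in the table; the function name `f1` and the tuple layout are choices made here, not part of the slides.

```python
def f1(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (label, P, R, reported F) copied from the indirect-evaluation table.
results = [
    ("baseline CoNLL", 26.27, 56.48, 35.86),
    ("entities PLO", 77.33, 54.34, 63.83),
    ("entities PLOM", 78.85, 51.53, 62.36),
    ("entities PLOT", 78.72, 41.58, 54.42),
    ("entities+patterns PLO", 66.12, 57.97, 61.78),
    ("entities+patterns PLOM", 73.65, 61.73, 67.17),
    ("entities+patterns PLOT", 66.35, 56.62, 61.10),
]

for label, p, r, reported in results:
    print(f"{label}: recomputed F = {f1(p, r):.2f}, reported F = {reported}")
```

The recomputed values match the reported F to within a few hundredths. Small differences are expected because P and R are themselves rounded in the table.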
Conclusions and Future Work
Efficient bootstrapping from large indexed collections with fewer seeds
Already useful for NERC, although performance is lower than with supervised machine learning
Adding more classes improves precision, but not always recall
Future work:
Other languages and domains
Complex semantic models
Language independence and NE Recognition
