Presentation at ACL-IJCNLP 2009 of Cristina Mota & Ralph Grishman (2009a). “Updating a name tagger using contemporary unlabeled data.” Proc. of the Joint conference of the 47th Annual Meeting of the
ACL and the 4th IJCNLP of the AFNLP, August, 2009, Singapore.
Updating a Name Tagger Using Contemporary Unlabeled Data
1. Updating a Name Tagger Using
Contemporary Unlabeled Data
ACL-IJCNLP 2009
Singapore, August 3rd - 5th
Cristina Mota1,2 and Ralph Grishman2
1 IST & L2F INESC-ID (Portugal)
2 New York University (USA)
(Advisors: Ralph Grishman & Nuno Mamede)
This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
2. Motivation
The performance of a co-trained named entity tagger decreases as the time gap between training and test sets increases (Mota & Grishman, 2008)
[Figure: F-measure vs. time gap (years, 0 to 7); linear fit y = −0.00391x + 0.82479, R² = 0.3647]
Do we need to update the seeds or the unlabeled data?
Does more older data help?
4. Related Work
“More data are better data” (Church & Mercer, 1993)
Enlarge labeled data as a way of improving performance
Contemporary (labeled) data reduces out-of-vocabulary rates
Time-adaptive language model (Auzanne et al., 2000)
Generation of offline name lists (Palmer & Ostendorf, 2005)
Daily adaptation of the language model of a broadcast news
transcription system (Martins et al., 2006)
5. Data Sets
Data sets were drawn from the Politics section of the CETEMPúblico
corpus (Santos & Rocha, 2001)
Language: Portuguese
Time span: 8 years (1991-1998)
Time gap: 1 unit = 6 months
For each six-month period:
Seeds (S): names collected from the first 192 extracts*
Test data (T): next 208 extracts
Unlabeled data (U): next 7856 extracts
* 1 extract = approx. 2 paragraphs
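The per-period split above can be sketched as follows. This is only an illustration under assumptions: the names `split_period`, `PeriodSplit` and `epoch_labels` are hypothetical, extracts are assumed to arrive as a list in corpus order, and the step that collects names from the seed extracts is not shown.

```python
from typing import List, NamedTuple, Sequence

# Slice sizes reported on the slide for each six-month period.
N_SEEDS, N_TEST, N_UNLABELED = 192, 208, 7856


class PeriodSplit(NamedTuple):
    seed_extracts: Sequence[str]       # names are collected from these to form S
    test_extracts: Sequence[str]       # T
    unlabeled_extracts: Sequence[str]  # U


def split_period(extracts: Sequence[str]) -> PeriodSplit:
    """Partition one period's extracts, in order, into the S / T / U slices."""
    needed = N_SEEDS + N_TEST + N_UNLABELED
    if len(extracts) < needed:
        raise ValueError(f"need at least {needed} extracts, got {len(extracts)}")
    s_end = N_SEEDS
    t_end = s_end + N_TEST
    u_end = t_end + N_UNLABELED
    return PeriodSplit(extracts[:s_end], extracts[s_end:t_end], extracts[t_end:u_end])


def epoch_labels(first_year: int = 91, last_year: int = 98) -> List[str]:
    """Semester labels in the style used on the slides: 91a, 91b, ..., 98b."""
    return [f"{year}{half}" for year in range(first_year, last_year + 1) for half in "ab"]
```

With 8 years and two semesters per year, `epoch_labels()` yields 16 periods, each of which would be split in this way.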
6. Named Entity Tagger
Based on a co-training classifier (Collins & Singer, 1999)
Includes a propagation step
Needs few seeds, and performance is high (above 80%)
Performance is parametrized by the combination of seeds, unlabeled set and test set: (S,U,T)
Tagger is evaluated after propagation with the HAREM scoring programs
[Diagram, training: Unlabeled text → Identification → Pairs (spelling features, contextual features) → Co-training (with Seeds) → Spelling + contextual rules]
7. Named Entity Tagger
Based on a co-training classifier (Collins & Singer, 1999)
Includes a propagation step
Needs few seeds, and performance is high (above 80%)
Performance is parametrized by the combination of seeds, unlabeled set and test set: (S,U,T)
Tagger is evaluated after propagation with the HAREM scoring programs
[Diagram, training: Unlabeled text → Identification → Pairs (spelling features, contextual features) → Co-training (with Seeds) → Spelling + contextual rules]
[Diagram, testing: Test text → Identification → Pairs → Classification (with the spelling + contextual rules) → Labeled pairs → Propagation → Text with classified NE]
8. Update seeds or unlabeled data?
[Diagram: timeline from 91a to 98b showing the test set Tn, the seeds Si and the unlabeled examples Ui]
Experiment 1: Baseline (vary seeds and unlabeled data synchronously, as in Mota & Grishman (2008))
16. Update seeds or unlabeled data?
Performance decays as the time gap increases (Mota & Grishman, 2008)
[Figure: F-measure (0.74 to 0.84) vs. training epoch (91a to 98b); legend: (i,i,98b), (98b,i,98b), (i,98b,98b)]
17. Update seeds or unlabeled data?
[Diagram: timeline from 91a to 98b showing the test set Tn, the seeds Sn and the unlabeled examples Ui]
Experiment 2: Update seeds (vary unlabeled data but use contemporary seeds)
25. Update seeds or unlabeled data?
Contemporary seeds slightly attenuate the decrease
[Figure: F-measure (0.74 to 0.84) vs. training epoch (91a to 98b); legend: (i,i,98b), (98b,i,98b), (i,98b,98b)]
26. Update seeds or unlabeled data?
[Diagram: timeline from 91a to 98b showing the test set Tn, the seeds Si and the unlabeled examples Un]
Experiment 3: Update unlabeled data (vary seeds but use contemporary unlabeled data)
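The three experiments differ only in which epoch supplies the seeds and the unlabeled data. A minimal sketch of the (S,U,T) triples, assuming the test epoch is fixed at 98b as in the plot legends (i,i,98b), (98b,i,98b) and (i,98b,98b); the function name `experiment_configs` is hypothetical:

```python
from typing import List, Tuple

Config = Tuple[str, str, str]  # (seed epoch, unlabeled epoch, test epoch)

# Semester labels 91a ... 98b, as on the slide timelines.
EPOCHS: List[str] = [f"{year}{half}" for year in range(91, 99) for half in "ab"]
CONTEMPORARY = EPOCHS[-1]  # "98b": assumed here to be the fixed test epoch


def experiment_configs(experiment: int) -> List[Config]:
    """Return one (S, U, T) triple per training epoch i for experiments 1 to 3."""
    if experiment == 1:   # baseline: seeds and unlabeled data vary together
        return [(i, i, CONTEMPORARY) for i in EPOCHS]
    if experiment == 2:   # contemporary seeds, varying unlabeled data
        return [(CONTEMPORARY, i, CONTEMPORARY) for i in EPOCHS]
    if experiment == 3:   # varying seeds, contemporary unlabeled data
        return [(i, CONTEMPORARY, CONTEMPORARY) for i in EPOCHS]
    raise ValueError("experiment must be 1, 2 or 3")
```

All three lists end in the same fully contemporary configuration (98b,98b,98b), which is why the curves meet at the right-hand edge of the plots.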
34. Updating the unlabeled data is better than updating the seeds
Contemporary unlabeled data maintain the performance
[Figure: F-measure (0.74 to 0.84) vs. training epoch (91a to 98b); legend: (i,i,98b), (98b,i,98b), (i,98b,98b)]
35. Augment unlabeled data?
[Diagram: timeline from 91a to 98b showing the test set Tn, the seeds Sn and the unlabeled examples Ui, with one more Ui set added at each step]
Experiment 4: Enlarge unlabeled data with older data and use contemporary seeds
42. Augment unlabeled data?
Larger amounts of older unlabeled data do not always result in better performance
[Figure: F-measure (0.74 to 0.84) vs. time frame (semester, 91a to 98b); legend: (i,98b,98b), (i,u[i,...,98a],98b), (98b,u[i,...,98a],98b)]
Green line: Same seeds for all taggers (98b); unlabeled data is enlarged backwards
Blue line: Different seeds for each tagger; same unlabeled data for all taggers (98b)
44. Augment unlabeled data?
[Diagram: timeline from 91a to 98b showing the test set Tn, the seeds Si and the unlabeled examples Ui, with one more Ui set added at each step]
Experiment 5: Enlarge the size of unlabeled data and vary seeds
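Experiments 4 and 5 pool unlabeled sets from several semesters, written u[i,...,98a] in the plot legends. The sketch below only enumerates which epochs end up in each pool. It follows the legend notation literally, so the newest pooled epoch defaults to 98a; whether the 98b set is pooled as well is not stated on the slides, which is why `newest` is a parameter. The function names are hypothetical.

```python
from typing import List

# Semester labels 91a ... 98b, as on the slide timelines.
EPOCHS: List[str] = [f"{year}{half}" for year in range(91, 99) for half in "ab"]


def accumulated_epochs(oldest: str, newest: str = "98a") -> List[str]:
    """Epochs whose unlabeled sets are pooled, i.e. u[oldest, ..., newest]."""
    start, end = EPOCHS.index(oldest), EPOCHS.index(newest)
    if start > end:
        raise ValueError(f"{oldest} is later than {newest}")
    return EPOCHS[start:end + 1]


def backward_growth(newest: str = "98a") -> List[List[str]]:
    """Pools obtained by reaching one semester further back at each step."""
    end = EPOCHS.index(newest)
    return [accumulated_epochs(EPOCHS[start], newest) for start in range(end, -1, -1)]
```

For example, `accumulated_epochs("97a")` gives the three epochs 97a, 97b and 98a. Experiment 4 would pair each pool with the 98b seeds, and experiment 5 with the seeds of the oldest epoch in the pool.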
51. Updating the unlabeled data is better than accumulating older unlabeled data
Larger amounts of unlabeled data are worse than training with contemporary unlabeled data
Larger amounts of unlabeled data do not outperform the tagger trained with contemporary seeds and unlabeled data
[Figure: F-measure (0.74 to 0.84) vs. time frame (semester, 91a to 98b); legend: (i,98b,98b), (i,u[i,...,98a],98b), (98b,u[i,...,98a],98b)]
Violet line: Seeds in the same time frame as the unlabeled set being added; unlabeled data is enlarged backwards
Blue line: Seeds are the same as in the violet line; same unlabeled data for all taggers (98b)
Green line: Same seeds for all taggers (98b); unlabeled data is enlarged backwards
54. Final remarks
Contemporary unlabeled data are better data
But...
Why doesn’t the labeled data impact the performance more?
Are other semi-supervised approaches also sensitive?
55. Acknowledgments
This research work was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
57. Example of (mis)classification
Test set 98b includes two instances of “Tizi Ouzou”:
Tizi Ouzou tem (en: Tizi Ouzou has)
manifestações em Tizi Ouzou (en: demonstrations in Tizi Ouzou)
It does not occur in u 91a (the 91a unlabeled data), so its classification depends on contexts:
("n v" "tem") ORGANIZATION 0.52
("type" "nprop v") PERSON 0.43
("len" 2) PERSON 0.62
But it occurs in u 98b:
noite em Tizi (en: night in Tizi)
ruas de Tizi Ouzou (en: streets of Tizi Ouzou)
ir a Tizi-Ouzou (en: go to Tizi Ouzou)
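The slide does not say how the three rules are combined. One plausible reading, assumed here, is a decision list in the style of Collins & Singer (1999), where the most confident applicable rule decides the label. The sketch below applies that assumption to the three rules listed for “Tizi Ouzou tem”; the function name `classify` is hypothetical.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str, float]  # (feature, label, confidence)

# The three rules listed on the slide for the instance "Tizi Ouzou tem".
RULES: List[Rule] = [
    ('("n v" "tem")', "ORGANIZATION", 0.52),
    ('("type" "nprop v")', "PERSON", 0.43),
    ('("len" 2)', "PERSON", 0.62),
]


def classify(rules: List[Rule]) -> Optional[Tuple[str, float]]:
    """Decision-list choice (an assumption): the most confident rule wins."""
    if not rules:
        return None
    _, label, confidence = max(rules, key=lambda rule: rule[2])
    return label, confidence
```

Under this assumption the name would be tagged PERSON by the length rule. Neither label offered by these rules is a location, which fits the slide's point that a tagger trained on 91a data has no direct evidence about this name.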
58. NE tagger: Identification
Identification designed with NooJ (Silberztein, 2004)
[Diagram: Raw text → Lexical analysis (Portuguese dictionary, priority dictionaries, morphological grammars) → Chunking (chunking grammars) → NE + context identification (NE + context grammars) → Text with unclassified NE → Pairs (NE, context)]
Example:
1 Elisa Ferreira começou por criticar Cavaco Silva
2 [Elisa Ferreira]SEQM [começou por criticar]V+Complexo+Pred=criticar [Cavaco Silva]SEQM
3 [Elisa Ferreira]nprop v+criticar começou por criticar [Cavaco Silva]v nprop+criticar
4 [Elisa Ferreira]nprop v+criticar [Cavaco Silva]v nprop+criticar
59. NE tagger: Classification
[Diagram: List of examples + Seeds → Label with name rules → Labeled examples → Infer context rules → Context rules → Label with context rules → Labeled examples → Infer name rules → Name rules → Label with name + context rules → Labeled examples → Infer name + context rules → Name + context rules]
Example:
Spelling features ← SEEDS: (Elisa Ferreira,PESSOA,0.9999)
1 LABEL: Elisa Ferreira,criticar ← PESSOA
2 INFER: (criticar,PESSOA,0.98)
3 LABEL: Cavaco Silva,criticar ← PESSOA
4 INFER: (Silva,PESSOA,0.97)
5 REPEAT
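The LABEL / INFER alternation can be sketched as a toy co-training loop. This is a simplified illustration, not the tagger's implementation: the feature set (full name plus its words, and the context string), the smoothed-precision confidence, the threshold and the number of labels are all assumptions, and the final "name + context rules" stage of the diagram is omitted. Confidence values therefore differ from those on the slide.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]                # (name, context)
Rules = Dict[str, Tuple[str, float]]  # feature -> (label, confidence)
NAME, CONTEXT = 0, 1                  # the two views of a pair


def features(pair: Pair, view: int) -> List[str]:
    """Toy features (assumed): the full name plus its words, or the context."""
    if view == NAME:
        return list(dict.fromkeys([pair[NAME]] + pair[NAME].split()))
    return [pair[CONTEXT]]


def label_pairs(pairs: List[Pair], rules: Rules, view: int) -> Dict[Pair, str]:
    """LABEL step: tag each pair with its most confident applicable rule."""
    labeled: Dict[Pair, str] = {}
    for pair in pairs:
        hits = [rules[f] for f in features(pair, view) if f in rules]
        if hits:
            labeled[pair] = max(hits, key=lambda rule: rule[1])[0]
    return labeled


def infer_rules(labeled: Dict[Pair, str], view: int, n_labels: int,
                threshold: float, alpha: float = 0.1) -> Rules:
    """INFER step: keep features whose smoothed precision reaches the threshold."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for pair, label in labeled.items():
        for feature in features(pair, view):
            counts[feature][label] += 1
    rules: Rules = {}
    for feature, by_label in counts.items():
        label, hits = by_label.most_common(1)[0]
        confidence = (hits + alpha) / (sum(by_label.values()) + n_labels * alpha)
        if confidence >= threshold:
            rules[feature] = (label, confidence)
    return rules


def co_train(pairs: List[Pair], seeds: Rules, n_labels: int = 3,
             rounds: int = 2, threshold: float = 0.5) -> Tuple[Rules, Rules]:
    """Alternate between the name view and the context view (REPEAT step)."""
    name_rules: Rules = dict(seeds)
    context_rules: Rules = {}
    for _ in range(rounds):
        by_name = label_pairs(pairs, name_rules, NAME)
        context_rules.update(infer_rules(by_name, CONTEXT, n_labels, threshold))
        by_context = label_pairs(pairs, context_rules, CONTEXT)
        for feature, rule in infer_rules(by_context, NAME, n_labels, threshold).items():
            name_rules.setdefault(feature, rule)  # never overwrite a seed rule
    return name_rules, context_rules
```

Run on the two pairs from the slide with the single seed (Elisa Ferreira, PESSOA), the loop first learns a context rule for "criticar" and then, through it, a name rule for "Silva", mirroring steps 1 to 4 above.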
60. NE tagger performance decreases over time (Mota & Grishman, 2008)
Detailed analysis using six-month periods (instead of periods of 1 year)
[Figure: F-measure vs. time gap (1 = 6 months, 0 to 15); linear fit y = −0.00232x + 0.79906, R² = 0.2376]
Linear fits y = a + bx for the (Si, Ui, Tj) configurations:
     a       b         R²
P    0.827   -0.0024   0.24824
R    0.773   -0.0022   0.19393
F    0.799   -0.0023   0.23765
The performance decreases at an estimated rate of 0.00232 in F-measure every 6 months (0.0348 after 8 years)
The low R² values show that not all of the variation is attributable to the increasing time gap
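The 0.0348 figure follows from the fitted slope: 16 semesters give time gaps from 0 to 15, and 15 × 0.00232 = 0.0348. A small check using only the coefficients reported on the slide (the names below are illustrative):

```python
INTERCEPT = 0.79906  # a: fitted F-measure at time gap 0, from the slide
SLOPE = -0.00232     # b: change in F-measure per six-month gap, from the slide
MAX_GAP = 15         # 16 semesters (91a to 98b) give gaps 0 .. 15


def fitted_f(gap: int) -> float:
    """F-measure predicted by the linear fit y = a + b * gap."""
    return INTERCEPT + SLOPE * gap


# Estimated loss between a contemporary tagger and one trained at the largest gap.
total_decay = fitted_f(0) - fitted_f(MAX_GAP)
```

These are values of the fitted line, not measured scores; with R² around 0.24, individual configurations scatter widely around it.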
61. Updating the unlabeled data is better than updating the seeds (Complete training-test configurations)
[Figure: F-measure vs. time gap (1 = 6 months, 0 to 15) with no update; linear fit y = −0.00232x + 0.79906, R² = 0.2376]
Linear fits y = a + bx:
Update?     a       b         R²
No          0.799   -0.0023   0.238
Seeds       0.800   -0.0019   0.192
Unlabeled   0.807   -0.0005   0.019
62. Updating the unlabeled data is better than updating the seeds (Complete training-test configurations)
[Figure: F-measure vs. time gap (1 = 6 months, 0 to 15) with updated seeds; linear fit y = −0.00189x + 0.80025, R² = 0.1917]
Linear fits y = a + bx:
Update?     a       b         R²
No          0.799   -0.0023   0.238
Seeds       0.800   -0.0019   0.192
Unlabeled   0.807   -0.0005   0.019
63. Updating the unlabeled data is better than updating the seeds (Complete training-test configurations)
[Figure: F-measure vs. time gap (1 = 6 months, 0 to 15) with updated unlabeled data; linear fit y = −0.00051x + 0.80769, R² = 0.0189]
Linear fits y = a + bx:
Update?     a       b         R²
No          0.799   -0.0023   0.238
Seeds       0.800   -0.0019   0.192
Unlabeled   0.807   -0.0005   0.019
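To compare the three update strategies, the fitted lines can be evaluated at the largest gap and their slopes set against the no-update slope. The sketch uses the full-precision coefficients printed on the three plots; the names `FITS`, `fitted_f` and `slope_reduction` are illustrative.

```python
from typing import Dict, Tuple

# (intercept a, slope b) of each linear fit, as printed on the plots.
FITS: Dict[str, Tuple[float, float]] = {
    "No": (0.79906, -0.00232),
    "Seeds": (0.80025, -0.00189),
    "Unlabeled": (0.80769, -0.00051),
}
MAX_GAP = 15  # largest time gap on the x-axis (1 = 6 months)


def fitted_f(update: str, gap: int) -> float:
    """F-measure predicted by the fit for the given update strategy."""
    intercept, slope = FITS[update]
    return intercept + slope * gap


def slope_reduction(update: str) -> float:
    """Fraction by which the decay rate shrinks relative to no update."""
    return 1.0 - FITS[update][1] / FITS["No"][1]
```

At gap 15 the fitted values are about 0.764 (no update), 0.772 (updated seeds) and 0.800 (updated unlabeled data). Updating the seeds lowers the fitted decay rate by roughly 19%, updating the unlabeled data by roughly 78%. These are properties of the fitted lines only, and the low R² values mean they should be read as trends rather than as measured scores.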