Updating a Name Tagger Using Contemporary Unlabeled Data

1. Updating a Name Tagger Using Contemporary Unlabeled Data. ACL-IJCNLP 2009, Singapore, August 3rd - 5th. Cristina Mota (1,2) and Ralph Grishman (2). (1) IST & L2F INESC-ID (Portugal); (2) New York University (USA). (Advisors: Ralph Grishman & Nuno Mamede.) This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000).

2. Motivation. The performance of a co-trained named entity tagger decreases as the time gap between training and test sets increases (Mota & Grishman, 2008). Do we need to update the seeds or the unlabeled data? Does more older data help? [Figure: F-measure (0.79 to 0.85) against time gap in years (0 to 7), with linear fit y = −0.00391x + 0.82479, R² = 0.3647.]

4. Related Work. “More data are better data” (Church & Mercer, 1993): enlarge labeled data as a way of improving performance. Contemporary (labeled) data reduces out-of-vocabulary rates: time-adaptive language model (Auzanne et al., 2000); generation of offline name lists (Palmer & Ostendorf, 2005); daily adaptation of the language model of a broadcast news transcription system (Martins et al., 2006).

5. Data Sets. Data sets were drawn from the Politics section of the CETEMPúblico corpus (Santos & Rocha, 2001). Language: Portuguese. Time span: 8 years (1991-1998). Time gap: 1 = 6 months. For each six-month period: seeds (S): names collected from the first 192 extracts*; test data (T): next 208 extracts; unlabeled data (U): next 7856 extracts. * 1 extract = approx. 2 paragraphs.
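The per-period split described on this slide can be sketched as follows. This is only an illustration under assumptions: the function and key names are hypothetical, extracts are taken to be an ordered list of strings, and the step that collects names from the seed extracts is not shown.

```python
from typing import Dict, List, Sequence

# Sizes taken from the slide; everything else here is an illustrative assumption.
SEED_EXTRACTS = 192
TEST_EXTRACTS = 208
UNLABELED_EXTRACTS = 7856


def split_period(extracts: Sequence[str]) -> Dict[str, List[str]]:
    """Split the ordered extracts of one six-month period into S, T and U portions."""
    needed = SEED_EXTRACTS + TEST_EXTRACTS + UNLABELED_EXTRACTS
    if len(extracts) < needed:
        raise ValueError(f"need at least {needed} extracts, got {len(extracts)}")
    seed_end = SEED_EXTRACTS
    test_end = seed_end + TEST_EXTRACTS
    unlabeled_end = test_end + UNLABELED_EXTRACTS
    return {
        "seed_source": list(extracts[:seed_end]),         # seed names are collected from these
        "test": list(extracts[seed_end:test_end]),        # T
        "unlabeled": list(extracts[test_end:unlabeled_end]),  # U
    }


def period_labels(first_year: int = 91, last_year: int = 98) -> List[str]:
    """Semester labels in the slides' notation: 91a, 91b, ..., 98b."""
    return [f"{year}{half}" for year in range(first_year, last_year + 1) for half in "ab"]
```

With 8 years split into semesters, `period_labels()` yields 16 periods, so the largest possible time gap between two periods is 15.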

7. Named Entity Tagger. Based on a co-training classifier (Collins & Singer, 1999). Includes a propagation step. Needs few seeds, and performance is high (above 80%). Performance is parametrized by the combination of seeds, unlabeled set and test set: (S,U,T). The tagger is evaluated after propagation with the HAREM scoring programs. [Diagram, training: unlabeled text → identification → pairs (spelling features, contextual features) → co-training with seeds → spelling + contextual rules. Diagram, testing: test text → identification → pairs → classification → labeled pairs → propagation → text with classified NE.]

8. Update seeds or unlabeled data? Experiment 1: Baseline (vary seeds and unlabeled data synchronously, as in Mota & Grishman (2008)). [Timeline from 91a to 98b: test set Tn; seeds Si; unlabeled examples Ui.]

16. Update seeds or unlabeled data? Performance decays as the time gap increases (Mota & Grishman, 2008). [Figure: F-measure (0.74 to 0.84) against training epoch (91a to 98b); curves (i,i,98b), (98b,i,98b) and (i,98b,98b).]

17. Update seeds or unlabeled data? Experiment 2: Update seeds (vary unlabeled data but use contemporary seeds). [Timeline from 91a to 98b: test set Tn; seeds Sn; unlabeled examples Ui.]

25. Update seeds or unlabeled data? Contemporary seeds slightly attenuate the decrease. [Figure: F-measure (0.74 to 0.84) against training epoch (91a to 98b); curves (i,i,98b), (98b,i,98b) and (i,98b,98b).]

26. Update seeds or unlabeled data? Experiment 3: Update unlabeled data (vary seeds but use contemporary unlabeled data). [Timeline from 91a to 98b: test set Tn; seeds Si; unlabeled examples Un.]

34. Updating the unlabeled data is better than updating the seeds. Contemporary unlabeled data maintain the performance. [Figure: F-measure (0.74 to 0.84) against training epoch (91a to 98b); curves (i,i,98b), (98b,i,98b) and (i,98b,98b).]
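The three curves correspond to three families of (S,U,T) configurations, all tested on 98b. A minimal sketch of how they could be enumerated, assuming hypothetical function names and the epoch labels used on the plot axis:

```python
from typing import List, Tuple

# (seed epoch, unlabeled epoch, test epoch), following the slides' (S,U,T) notation.
Config = Tuple[str, str, str]

EPOCHS: List[str] = [f"{year}{half}" for year in range(91, 99) for half in "ab"]
TEST_EPOCH = "98b"


def baseline_configs() -> List[Config]:
    """Experiment 1: seeds and unlabeled data move together, (i,i,98b)."""
    return [(i, i, TEST_EPOCH) for i in EPOCHS]


def updated_seed_configs() -> List[Config]:
    """Experiment 2: contemporary seeds, varying unlabeled data, (98b,i,98b)."""
    return [(TEST_EPOCH, i, TEST_EPOCH) for i in EPOCHS]


def updated_unlabeled_configs() -> List[Config]:
    """Experiment 3: varying seeds, contemporary unlabeled data, (i,98b,98b)."""
    return [(i, TEST_EPOCH, TEST_EPOCH) for i in EPOCHS]
```

At i = 98b the three families collapse to the same configuration, (98b,98b,98b), which is why the curves meet at the right edge of the plot.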

35. Augment unlabeled data? Experiment 4: Enlarge unlabeled data with older data and use contemporary seeds. [Timeline from 91a to 98b: test set Tn; seeds Sn; unlabeled examples Ui, accumulated over successive epochs.]

42. Augment unlabeled data? Larger amounts of older unlabeled data do not always result in better performance. [Figure: F-measure (0.74 to 0.84) against time frame in semesters (91a to 98b); curves (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b).] Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards. Blue line: different seeds for each tagger; same unlabeled data for all taggers (98b).

44. Augment unlabeled data? Experiment 5: Enlarge the size of unlabeled data and vary seeds. [Timeline from 91a to 98b: test set Tn; seeds Si; unlabeled examples Ui, accumulated over successive epochs.]

51. Updating the unlabeled data is better than accumulating older unlabeled data. Training with larger amounts of unlabeled data is worse than training with contemporary unlabeled data. Larger amounts of unlabeled data do not outperform the tagger trained with contemporary seeds and unlabeled data. [Figure: F-measure (0.74 to 0.84) against time frame in semesters (91a to 98b); curves (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b).] Violet line: seeds in the same time frame as the unlabeled set being added; unlabeled data is enlarged backwards. Blue line: seeds are the same as in the violet line; same unlabeled data for all taggers (98b). Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards.
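The notation u[i,...,98a] can be read as the union of the unlabeled sets from epoch i up to 98a. A minimal sketch of the two accumulating configurations, with hypothetical function names; the slides do not define the window for i = 98b, so the sketch only accepts starting epochs up to 98a:

```python
from typing import List, Tuple

EPOCHS: List[str] = [f"{year}{half}" for year in range(91, 99) for half in "ab"]
TEST_EPOCH = "98b"
LAST_OLDER_EPOCH = "98a"

# (seed epoch, list of unlabeled epochs, test epoch)
AccumulatedConfig = Tuple[str, List[str], str]


def unlabeled_window(start: str) -> List[str]:
    """Epochs whose unlabeled sets are pooled: start, ..., 98a (enlarging backwards)."""
    lo = EPOCHS.index(start)
    hi = EPOCHS.index(LAST_OLDER_EPOCH)
    if lo > hi:
        raise ValueError(f"start epoch {start} is later than {LAST_OLDER_EPOCH}")
    return EPOCHS[lo:hi + 1]


def experiment4(start: str) -> AccumulatedConfig:
    """Contemporary seeds with accumulated older unlabeled data: (98b,u[i,...,98a],98b)."""
    return (TEST_EPOCH, unlabeled_window(start), TEST_EPOCH)


def experiment5(start: str) -> AccumulatedConfig:
    """Seeds from the oldest added epoch with accumulated unlabeled data: (i,u[i,...,98a],98b)."""
    return (start, unlabeled_window(start), TEST_EPOCH)
```

For example, `experiment5("97a")` pools the unlabeled sets of 97a, 97b and 98a and uses the 97a seeds.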

54. Final remarks. Contemporary unlabeled data are better data. But... why doesn’t the labeled data impact the performance more? Are other semi-supervised approaches also sensitive to the time gap?

55. Acknowledgments. This research work was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000).

57. Example of (mis)classification. Test set 98b includes two instances of “Tizi Ouzou”: “Tizi Ouzou tem” (en: Tizi Ouzou has) and “manifestações em Tizi Ouzou” (en: demonstrations in Tizi Ouzou). The name does not occur in u 91a, so its classification depends on contexts: ("n v" "tem") ORGANIZATION 0.52; ("type" "nprop v") PERSON 0.43; ("len" 2) PERSON 0.62. But it does occur in u 98b: “noite em Tizi” (en: night in Tizi); “ruas de Tizi Ouzou” (en: streets of Tizi Ouzou); “ir a Tizi-Ouzou” (en: go to Tizi Ouzou).

58. NE tagger: Identification. [Diagram: raw text → lexical analysis (Portuguese dictionary, priority dictionaries, morphological grammars) → chunking (chunking grammars) → NE + context identification (NE + context grammars) → text with unclassified NE → pairs (NE, context).] Identification designed with NooJ (Silberztein, 2004). Example: (1) Elisa Ferreira começou por criticar Cavaco Silva; (2) [Elisa Ferreira]SEQM [começou por criticar]V+Complexo+Pred=criticar [Cavaco Silva]SEQM; (3) [Elisa Ferreira]nprop v+criticar começou por criticar [Cavaco Silva]v nprop+criticar; (4) [Elisa Ferreira]nprop v+criticar [Cavaco Silva]v nprop+criticar.

59. NE tagger: Classification. [Diagram: list of examples + seeds → label with name rules → labeled examples → infer context rules → context rules → label with context rules → labeled examples → infer name rules → name rules → label with name + context rules → labeled examples → infer name + context rules → name + context rules.] Example: spelling features ← SEEDS: (Elisa Ferreira, PESSOA, 0.9999); (1) LABEL: Elisa Ferreira, criticar ← PESSOA; (2) INFER: (criticar, PESSOA, 0.98); (3) LABEL: Cavaco Silva, criticar ← PESSOA; (4) INFER: (Silva, PESSOA, 0.97); (5) REPEAT.
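The LABEL/INFER alternation on this slide can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the spelling features are assumed to be the full name plus its tokens, and a plain precision threshold stands in for the confidence scores shown on the slide.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (name, context), as produced by the identification step


def spelling_features(name: str) -> List[str]:
    """Assumed feature set: the full name string plus each of its tokens."""
    return [name] + name.split()


def label_with(rules: Dict[str, str], features: List[str]) -> Optional[str]:
    """Return the label of the first feature that has a rule, or None."""
    for feature in features:
        if feature in rules:
            return rules[feature]
    return None


def infer_rules(labeled: List[Tuple[List[str], str]], min_precision: float) -> Dict[str, str]:
    """Keep a feature -> label rule when the majority label reaches the precision threshold."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for features, label in labeled:
        for feature in features:
            counts[feature][label] += 1
    rules: Dict[str, str] = {}
    for feature, label_counts in counts.items():
        label, hits = label_counts.most_common(1)[0]
        if hits / sum(label_counts.values()) >= min_precision:
            rules[feature] = label
    return rules


def cotrain(pairs: List[Pair], seeds: Dict[str, str], rounds: int = 2,
            min_precision: float = 0.95) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Alternate between name (spelling) rules and context rules, starting from the seeds."""
    name_rules = dict(seeds)
    context_rules: Dict[str, str] = {}
    for _ in range(rounds):
        # LABEL with name rules, then INFER context rules.
        labeled = [([ctx], label) for name, ctx in pairs
                   if (label := label_with(name_rules, spelling_features(name))) is not None]
        context_rules.update(infer_rules(labeled, min_precision))
        # LABEL with context rules, then INFER name rules.
        labeled = [(spelling_features(name), label) for name, ctx in pairs
                   if (label := label_with(context_rules, [ctx])) is not None]
        for feature, label in infer_rules(labeled, min_precision).items():
            name_rules.setdefault(feature, label)  # seed rules are never overwritten
    return name_rules, context_rules
```

Run on the slide's example, the seed (Elisa Ferreira, PESSOA) yields the context rule criticar → PESSOA, which in turn labels Cavaco Silva and yields the name rule Silva → PESSOA.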

60. NE tagger performance decreases over time (Mota & Grishman, 2008). Detailed analysis using six-month periods (instead of periods of 1 year).
Linear fit y = a + b·x for (Si, Ui, Tj), with x the time gap (1 = 6 months):
P: a = 0.827, b = -0.0024, R² = 0.24824
R: a = 0.773, b = -0.0022, R² = 0.19393
F: a = 0.799, b = -0.0023, R² = 0.23765
[Figure: F-measure (0.74 to 0.82) against time gap (0 to 15), with fit y = −0.00232x + 0.79906, R² = 0.2376.]
The performance decreases at an estimated rate of 0.00232 in F-measure every 6 months (0.0348 after 8 years). The low R-squared values show that not all variation is attributable to the increasing time gap.

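61. Updating the unlabeled data is better than updating the seeds (complete training-test configurations).
Linear fit y = a + b·x by update strategy, with x the time gap (1 = 6 months):
No update: a = 0.799, b = -0.0023, R² = 0.238
Seeds: a = 0.800, b = -0.0019, R² = 0.192
Unlabeled: a = 0.807, b = -0.0005, R² = 0.019
[Figure: F-measure (0.74 to 0.82) against time gap (0 to 15); fit shown for no update: y = −0.00232x + 0.79906, R² = 0.2376.]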

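62. Updating the unlabeled data is better than updating the seeds (complete training-test configurations).
Linear fit y = a + b·x by update strategy, with x the time gap (1 = 6 months):
No update: a = 0.799, b = -0.0023, R² = 0.238
Seeds: a = 0.800, b = -0.0019, R² = 0.192
Unlabeled: a = 0.807, b = -0.0005, R² = 0.019
[Figure: F-measure (0.76 to 0.82) against time gap (0 to 15); fit shown for updated seeds: y = −0.00189x + 0.80025, R² = 0.1917.]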

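63. Updating the unlabeled data is better than updating the seeds (complete training-test configurations).
Linear fit y = a + b·x by update strategy, with x the time gap (1 = 6 months):
No update: a = 0.799, b = -0.0023, R² = 0.238
Seeds: a = 0.800, b = -0.0019, R² = 0.192
Unlabeled: a = 0.807, b = -0.0005, R² = 0.019
[Figure: F-measure (0.77 to 0.83) against time gap (0 to 15); fit shown for updated unlabeled data: y = −0.00051x + 0.80769, R² = 0.0189.]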
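The three fitted lines can be compared directly by projecting each one over the largest time gap on the axis (15 six-month steps). A small worked sketch using the coefficients shown on these slides; the dictionary keys and function names are illustrative only:

```python
from typing import Dict, Tuple

# (intercept, slope) of y = intercept + slope * gap, copied from the fitted equations.
FITS: Dict[str, Tuple[float, float]] = {
    "no update": (0.79906, -0.00232),
    "seeds": (0.80025, -0.00189),
    "unlabeled": (0.80769, -0.00051),
}
MAX_GAP = 15  # largest gap on the plots' x-axis (1 = 6 months)


def predicted_f(intercept: float, slope: float, gap: int) -> float:
    """F-measure predicted by the linear fit at a given time gap."""
    return intercept + slope * gap


def total_drop(slope: float, gap: int = MAX_GAP) -> float:
    """Estimated loss in F-measure accumulated over `gap` six-month steps."""
    return -slope * gap


if __name__ == "__main__":
    for strategy, (intercept, slope) in FITS.items():
        print(strategy,
              round(total_drop(slope), 5),
              round(predicted_f(intercept, slope, MAX_GAP), 5))
```

For the configuration without updates, 15 × 0.00232 = 0.0348, which matches the decay reported for the baseline; with updated unlabeled data the same projection gives 15 × 0.00051 = 0.00765.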

64. Confusion matrices (three 3×3 matrices per training epoch; rows within a matrix are separated by semicolons)
91a: [335 12 22; 52 453 79; 23 21 330] [330 16 20; 52 456 69; 28 14 342] [393 12 22; 12 463 38; 5 11 371]
92b: [368 19 42; 19 435 55; 23 32 334] [368 16 40; 23 445 39; 19 25 352] [391 11 22; 14 463 29; 5 12 380]
95b: [375 14 34; 22 465 78; 13 7 319] [387 14 30; 13 461 73; 10 11 328] [394 12 26; 12 463 43; 4 11 362]
98a: [390 16 31; 11 458 58; 9 12 342] [386 16 28; 13 460 48; 11 10 355] [395 11 28; 11 464 39; 4 11 364]
98b: [394 9 20; 8 467 29; 8 10 382] [394 9 20; 8 467 29; 8 10 382] [394 9 20; 8 467 29; 8 10 382]
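One way to summarize these matrices is the share of counts on the diagonal. The sketch below rests on two assumptions: that the numbers for each epoch are three 3×3 matrices laid side by side, and that diagonal cells count cases where the tagger and the reference agree. The class labels are not given on the slide, so none are used, and the constant and function names are illustrative.

```python
from typing import List

Matrix = List[List[int]]

# The three 91a matrices and the (identical) 98b matrix, read off the slide.
EPOCH_91A: List[Matrix] = [
    [[335, 12, 22], [52, 453, 79], [23, 21, 330]],
    [[330, 16, 20], [52, 456, 69], [28, 14, 342]],
    [[393, 12, 22], [12, 463, 38], [5, 11, 371]],
]
EPOCH_98B: Matrix = [[394, 9, 20], [8, 467, 29], [8, 10, 382]]


def total(matrix: Matrix) -> int:
    """Total number of counted instances in a confusion matrix."""
    return sum(sum(row) for row in matrix)


def diagonal_share(matrix: Matrix) -> float:
    """Fraction of instances on the diagonal (assumed to be correct classifications)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total(matrix)


if __name__ == "__main__":
    for matrix in EPOCH_91A + [EPOCH_98B]:
        print(total(matrix), round(diagonal_share(matrix), 4))
```

Every matrix read this way sums to the same total, which supports the side-by-side reading of the layout.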
