Updating a Name Tagger Using Contemporary Unlabeled Data
ACL-IJCNLP 2009
Singapore, August 3rd - 5th

Cristina Mota (1,2) and Ralph Grishman (2)

(1) IST & L2F INESC-ID (Portugal)
(2) New York University (USA)

(Advisors: Ralph Grishman & Nuno Mamede)

This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
Motivation

   [Figure: F-measure (0.79 to 0.85) against time gap (0 to 7 years), with fitted line y = -0.00391x + 0.82479, R^2 = 0.3647]

   The performance of a co-trained named entity tagger decreases as
   the time gap increases between training and test sets (Mota &
   Grishman, 2008)

   Do we need to update the seeds or the unlabeled data?
   Does more older data help?
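
To make the size of the trend concrete, the fitted line from the figure can be evaluated directly. This is only a sketch: it assumes x is the time gap in the units shown on the axis (years), and the function names are illustrative. With R^2 = 0.3647 the line explains only part of the variation, so the values are rough indications rather than predictions.

```python
# Coefficients of the fitted line reported on the slide.
# Assumption: x is the time gap in the axis units (years).
SLOPE = -0.00391
INTERCEPT = 0.82479


def predicted_f_measure(time_gap: float) -> float:
    """F-measure given by the fitted line for a time gap."""
    return SLOPE * time_gap + INTERCEPT


def predicted_drop(time_gap: float) -> float:
    """Loss in F-measure relative to a zero time gap, according to the line."""
    return predicted_f_measure(0) - predicted_f_measure(time_gap)


# Print the fitted value and the loss at both ends of the plotted range.
for gap in (0, 7):
    print(gap, round(predicted_f_measure(gap), 5), round(predicted_drop(gap), 5))
```

Under these assumptions the line loses a little under 0.03 F-measure across the plotted range.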
Related Work




      “More data are better data” (Church & Mercer, 1993)
      Enlarge labeled data as a way of improving performance
      Contemporary (labeled) data reduces out-of-vocabulary rates
           Time-adaptive language model (Auzanne et al., 2000)
           Generation of offline name lists (Palmer & Ostendorf, 2005)
           Daily adaptation of the language model of a broadcast news
           transcription system (Martins et al., 2006)
Data Sets



   Data sets were drawn from the Politics section of the CETEMPúblico
   corpus (Santos & Rocha, 2001)
             Language: Portuguese
             Time span: 8 years (1991-1998)
             Time gap unit: 1 = 6 months
             For each six-month period:
                     Seeds (S): names collected from the first 192 extracts*
                     Test data (T): next 208 extracts
                     Unlabeled data (U): next 7856 extracts
   * 1 extract = approx. 2 paragraphs
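
The per-period split can be sketched as a simple slicing step. This is an illustration, not the authors' preprocessing: it assumes the extracts of one six-month period are already available as an ordered list, and `split_period` and the dictionary keys are hypothetical names. Note that the seeds themselves are names collected from the first slice, so that slice is called `seed_source` here.

```python
from typing import Dict, List, Sequence

# Sizes stated on the slide, in extracts per six-month period.
SEED_EXTRACTS = 192
TEST_EXTRACTS = 208
UNLABELED_EXTRACTS = 7856


def split_period(extracts: Sequence[str]) -> Dict[str, List[str]]:
    """Slice one period's extracts into seed source, test and unlabeled parts."""
    needed = SEED_EXTRACTS + TEST_EXTRACTS + UNLABELED_EXTRACTS
    if len(extracts) < needed:
        raise ValueError(f"need at least {needed} extracts, got {len(extracts)}")
    seed_end = SEED_EXTRACTS
    test_end = seed_end + TEST_EXTRACTS
    unlabeled_end = test_end + UNLABELED_EXTRACTS
    return {
        "seed_source": list(extracts[:seed_end]),        # names are collected from these
        "test": list(extracts[seed_end:test_end]),       # next 208 extracts
        "unlabeled": list(extracts[test_end:unlabeled_end]),  # next 7856 extracts
    }
```

With these sizes, each period uses 192 + 208 + 7856 = 8256 extracts.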
Named Entity Tagger

   Based on a co-training classifier (Collins & Singer, 1999)
   Includes propagation step
   Needs few seeds and performance is high (above 80%)
   Performance is parametrized by combination of seeds,
   unlabeled set and test set: (S,U,T)
   Tagger is evaluated after propagation with HAREM scoring programs

   [Diagram]
   Training: Unlabeled text -> Identification -> Pairs (spelling features,
   contextual features) -> Co-training (with Seeds) -> Spelling + contextual rules
   Testing: Test text -> Identification -> Classification -> Labeled Pairs ->
   Propagation -> Text with classified NE
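
The co-training idea can be illustrated with a much simplified loop over (spelling feature, contextual feature) pairs. This is a sketch in the spirit of decision-list co-training, not the tagger used in this work: the precision and count thresholds, the function names, the toy features and the PERSON/LOCATION labels are all assumptions, and the propagation step is left out.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]    # (spelling feature, contextual feature) of one name candidate
Rules = Dict[str, str]    # feature -> entity label
SPELLING, CONTEXT = 0, 1  # position of each view inside a Pair


def label_pairs(pairs: Sequence[Pair], rules: Rules, view: int) -> List[Optional[str]]:
    """Label every pair whose feature in `view` matches a rule; None otherwise."""
    return [rules.get(pair[view]) for pair in pairs]


def induce_rules(pairs: Sequence[Pair], labels: Sequence[Optional[str]], view: int,
                 min_precision: float = 0.95, min_count: int = 2) -> Rules:
    """Learn rules for one view from labels assigned through the other view."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for pair, label in zip(pairs, labels):
        if label is not None:
            counts[pair[view]][label] += 1
    rules: Rules = {}
    for feature, by_label in counts.items():
        label, hits = by_label.most_common(1)[0]
        total = sum(by_label.values())
        # Keep only features that are frequent and almost always one label.
        if total >= min_count and hits / total >= min_precision:
            rules[feature] = label
    return rules


def co_train(pairs: Sequence[Pair], seed_rules: Rules, rounds: int = 5) -> Tuple[Rules, Rules]:
    """Alternate between the two views, starting from seed spelling rules."""
    spelling_rules: Rules = dict(seed_rules)
    context_rules: Rules = {}
    for _ in range(rounds):
        labels = label_pairs(pairs, spelling_rules, SPELLING)
        for feature, label in induce_rules(pairs, labels, CONTEXT).items():
            context_rules.setdefault(feature, label)   # existing rules are kept
        labels = label_pairs(pairs, context_rules, CONTEXT)
        for feature, label in induce_rules(pairs, labels, SPELLING).items():
            spelling_rules.setdefault(feature, label)  # seeds are never overwritten
    return spelling_rules, context_rules


# Toy unlabeled pairs and seeds (hypothetical features and labels).
toy_pairs = [
    ("Lisboa", "em"), ("Lisboa", "em"), ("Porto", "em"), ("Porto", "em"),
    ("Cavaco", "disse"), ("Cavaco", "disse"), ("Soares", "disse"), ("Soares", "disse"),
]
toy_seeds = {"Lisboa": "LOCATION", "Cavaco": "PERSON"}
spelling, context = co_train(toy_pairs, toy_seeds)
```

In the toy run, the two seed names are enough to learn the two contextual rules, which in turn label the unseen names; this is the sense in which few seeds are needed and the unlabeled text carries most of the information.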
Update seeds or unlabeled data?

   [Diagram: timeline from 91a to 98b. The test set Tn is fixed at 98b; the seeds Si and the unlabeled examples Ui move together from 98b back to 91a]

        Experiment 1: Baseline (vary seeds and unlabeled data
        synchronously as in Mota & Grishman (2008))
Update seeds or unlabeled data?

   [Figure: F-measure (0.74 to 0.84) against training epoch (91a to 98b); curves (i,i,98b), (98b,i,98b) and (i,98b,98b)]

   Performance decays as the time gap increases (Mota & Grishman, 2008)
Update seeds or unlabeled data?

   [Diagram: timeline from 91a to 98b. The test set Tn and the seeds Sn are fixed at 98b; the unlabeled examples Ui move from 98b back to 91a]

        Experiment 2: Update seeds (vary unlabeled data but use
        contemporary seeds)
Update seeds or unlabeled data?

   [Figure: F-measure (0.74 to 0.84) against training epoch (91a to 98b); curves (i,i,98b), (98b,i,98b) and (i,98b,98b)]

   Contemporary seeds slightly attenuate the decrease
Update seeds or unlabeled data?

   [Diagram: timeline from 91a to 98b. The test set Tn and the unlabeled examples Un are fixed at 98b; the seeds Si move from 98b back to 91a]

        Experiment 3: Update unlabeled data (vary seeds but use
        contemporary unlabeled data)
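
The three experiments so far differ only in which element of the (S,U,T) triple is varied. A minimal sketch that enumerates the configurations follows; the function names are hypothetical, and the epoch labels follow the 91a to 98b convention used in the diagrams and plot legends.

```python
from typing import List, Tuple

Config = Tuple[str, str, str]  # (seeds epoch, unlabeled epoch, test epoch)


def epochs(first_year: int = 1991, last_year: int = 1998) -> List[str]:
    """Six-month epoch labels: 91a, 91b, ..., 98b."""
    return [f"{year % 100}{half}"
            for year in range(first_year, last_year + 1)
            for half in ("a", "b")]


def baseline(test: str = "98b") -> List[Config]:
    """Experiment 1: seeds and unlabeled data vary together, (i,i,98b)."""
    return [(i, i, test) for i in epochs()]


def update_seeds(test: str = "98b") -> List[Config]:
    """Experiment 2: contemporary seeds, varying unlabeled data, (98b,i,98b)."""
    return [(test, i, test) for i in epochs()]


def update_unlabeled(test: str = "98b") -> List[Config]:
    """Experiment 3: varying seeds, contemporary unlabeled data, (i,98b,98b)."""
    return [(i, test, test) for i in epochs()]
```

Each function returns 16 configurations, one per epoch, and all three coincide at (98b,98b,98b).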
Updating the unlabeled data is better than
updating the seeds

   [Figure: F-measure (0.74 to 0.84) against training epoch (91a to 98b); curves (i,i,98b), (98b,i,98b) and (i,98b,98b)]

   Contemporary unlabeled data maintain the performance
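Augment unlabeled data?

   [Diagram: timeline from 91a to 98b. The test set Tn and the seeds Sn are fixed at 98b; the unlabeled sets Ui accumulate backwards in time, one older set added at each step]

        Experiment 4: Enlarge unlabeled data with older data and
        use contemporary seeds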
Augment unlabeled data?

   [Figure: F-measure (0.74 to 0.84) against time frame (semester, 91a to 98b); curves (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b)]

   Larger amounts of older unlabeled data do not always
   result in better performance

                    Green line: Same seeds for all taggers (98b);
                    unlabeled data is enlarged backwards
                    Blue line: Different seeds for each tagger; same
                    unlabeled data for all taggers (98b)
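Augment unlabeled data?

   [Diagram: timeline from 91a to 98b. The test set Tn is fixed at 98b; the unlabeled sets Ui accumulate backwards in time, and the seeds Si come from the same time frame as the oldest set added]

        Experiment 5: Enlarge the size of unlabeled data and vary
        seeds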
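
Experiments 4 and 5 replace the single unlabeled epoch with a growing window of older epochs. The sketch below enumerates both sets of configurations; the function names are hypothetical, and the window is assumed to end at 98a, following the u[i,...,98a] notation of the plot legends.

```python
from typing import List, Tuple

AccumConfig = Tuple[str, List[str], str]  # (seeds epoch, unlabeled epochs, test epoch)


def epochs(first_year: int = 1991, last_year: int = 1998) -> List[str]:
    """Six-month epoch labels: 91a, 91b, ..., 98b."""
    return [f"{year % 100}{half}"
            for year in range(first_year, last_year + 1)
            for half in ("a", "b")]


def accumulated(oldest: str, newest: str = "98a") -> List[str]:
    """Epochs from `oldest` up to `newest`, i.e. u[i,...,98a]."""
    labels = epochs()
    start, end = labels.index(oldest), labels.index(newest)
    if start > end:
        raise ValueError(f"{oldest} is later than {newest}")
    return labels[start:end + 1]


def experiment4(test: str = "98b", newest: str = "98a") -> List[AccumConfig]:
    """Contemporary seeds with a growing window: (98b, u[i,...,98a], 98b)."""
    return [(test, accumulated(i, newest), test) for i in accumulated("91a", newest)]


def experiment5(test: str = "98b", newest: str = "98a") -> List[AccumConfig]:
    """Seeds from the oldest epoch in the window: (i, u[i,...,98a], 98b)."""
    return [(i, accumulated(i, newest), test) for i in accumulated("91a", newest)]
```

Under this reading there are 15 configurations per experiment, with the window growing from one epoch (98a only) to fifteen (91a to 98a).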
Updating the unlabeled data is better than
accumulating older unlabeled data

   [Figure: F-measure (0.74 to 0.84) against time frame (semester, 91a to 98b); curves (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b)]

   Training with larger amounts of unlabeled data is worse than
   training with contemporary unlabeled data
   Larger amounts of unlabeled data do not outperform the tagger
   trained with contemporary seeds and unlabeled data

                    Violet line: Seeds in the same time frame as the
                    unlabeled set being added; unlabeled data is
                    enlarged backwards
                    Blue line: Seeds are the same as in the violet
                    line; same unlabeled data for all taggers (98b)
                    Green line: Same seeds for all taggers (98b);
                    unlabeled data is enlarged backwards
Final remarks




      Contemporary unlabeled data are better data
      But...
          Why doesn’t the labeled data impact the performance more?
          Are other semi-supervised approaches also sensitive?
Acknowledgments




    This research work was funded by Fundação para a Ciência e a
       Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
Example of (mis)classification



      Test set 98b includes two instances of “Tizi Ouzou”:
      Tizi Ouzou tem (en: Tizi Ouzou has)
      manifestações em Tizi Ouzou (en: demonstrations in Tizi Ouzou)
      The name does not occur in u 91a, so its classification depends on contexts:
      ("n v" "tem") ORGANIZATION 0.52
      ("type" "nprop v") PERSON 0.43
      ("len" 2) PERSON 0.62
      But it occurs in u 98b:
      noite em Tizi (en: night in Tizi)
      ruas de Tizi Ouzou (en: streets of Tizi Ouzou)
      ir a Tizi-Ouzou (en: go to Tizi-Ouzou)
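
A minimal sketch of how the three rules listed above could lead to a wrong label. It assumes decision-list behaviour, where the single most confident applicable rule decides; the slide does not state how the tagger combines competing rules, so `classify` is a hypothetical helper and not the authors' implementation.

```python
# Rules that apply to "Tizi Ouzou tem" when the name is unseen in the
# unlabeled data, copied from the slide: (feature, label, confidence).
rules = [
    (("n v", "tem"), "ORGANIZATION", 0.52),
    (("type", "nprop v"), "PERSON", 0.43),
    (("len", 2), "PERSON", 0.62),
]

def classify(applicable_rules):
    """Return (label, confidence) of the strongest applicable rule.

    Assumption: the most confident rule wins (decision-list style).
    """
    _, label, confidence = max(applicable_rules, key=lambda r: r[2])
    return label, confidence

label, confidence = classify(rules)
print(label, confidence)
```

Under that assumption the length rule wins and the city of Tizi Ouzou is tagged PERSON, which is the kind of error that contexts such as "em Tizi" and "ruas de Tizi Ouzou" in u 98b could help avoid.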
NE tagger: Identification

   Processing steps (resources used in parentheses):
      Raw text
      Lexical analysis (Portuguese dictionary, priority dictionaries, morphological grammars)
      Chunking (chunking grammars)
      NE + context identification (NE + context grammars)
      Text with unclassified NE
      Pairs (NE,context)

   Example:
      1   Elisa Ferreira começou por criticar Cavaco Silva
          (en: Elisa Ferreira began by criticizing Cavaco Silva)
      2   [Elisa Ferreira]SEQM [começou por criticar]V+Complexo+Pred=criticar [Cavaco Silva]SEQM
      3   [Elisa Ferreira]nprop v+criticar começou por criticar [Cavaco Silva]v nprop+criticar
      4   [Elisa Ferreira]nprop v+criticar [Cavaco Silva]v nprop+criticar

   Identification designed with NooJ (Silberztein, 2004)
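
The move from step 3 to step 4 of the example amounts to keeping only the (NE, context) pairs. The sketch below does this with a regular expression over the bracket notation shown on the slide. That notation is only the slide's display format; the actual NooJ output format is not shown in the deck, so the pattern and the `extract_pairs` helper are illustrative assumptions.

```python
import re

# Step 3 of the slide's example, in the slide's bracket notation.
annotated = ("[Elisa Ferreira]nprop v+criticar começou por criticar "
             "[Cavaco Silva]v nprop+criticar")

# A bracketed name followed by its context tag: "nprop v" or "v nprop"
# (assumed to encode whether the name precedes or follows the verb),
# then "+" and the verb lemma.
PAIR = re.compile(r"\[([^\]]+)\]((?:nprop v|v nprop)\+\w+)")

def extract_pairs(text):
    """Return a list of (name, context) tuples found in the text."""
    return PAIR.findall(text)

pairs = extract_pairs(annotated)
for name, context in pairs:
    print(name, "|", context)
```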
NE tagger: Classification

   Inputs: List of examples, Seeds

   Steps:
      Label with name rules -> Labeled examples
      Infer context rules -> Context rules
      Label with context rules -> Labeled examples
      Infer name rules -> Name rules
      Label with name + context rules -> Labeled examples
      Infer name + context rules -> Name + context rules

   Example:
      Spelling features ← SEEDS: (Elisa Ferreira,PESSOA,0.9999)
      1   LABEL: Elisa Ferreira,criticar ← PESSOA
      2   INFER: (criticar,PESSOA,0.98)
      3   LABEL: Cavaco Silva,criticar ← PESSOA
      4   INFER: (Silva,PESSOA,0.97)
      5   REPEAT
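
The LABEL / INFER / REPEAT example can be sketched as a toy co-training loop. Everything beyond the slide is an assumption: the spelling features (full string plus each token), the smoothed confidence estimate (written in the style of Collins & Singer (1999), with illustrative constants), and the 0.8 threshold. The confidences it produces will not match the 0.98 and 0.97 shown on the slide.

```python
from collections import Counter, defaultdict

ALPHA, K = 0.1, 3  # smoothing constant and number of classes (assumed)

def name_features(name):
    # Hypothetical spelling features: the full string plus each token.
    return [name] + name.split()

def features(pair, view):
    name, context = pair
    return name_features(name) if view == "name" else [context]

def label(pairs, rules, view):
    """Label each pair with its most confident applicable rule, if any."""
    labeled = {}
    for pair in pairs:
        hits = [(rules[f][1], rules[f][0]) for f in features(pair, view) if f in rules]
        if hits:
            labeled[pair] = max(hits)[1]
    return labeled

def infer(labeled, view, threshold):
    """Infer rules {feature: (label, confidence)} from labeled pairs."""
    counts = defaultdict(Counter)
    for pair, lab in labeled.items():
        for f in features(pair, view):
            counts[f][lab] += 1
    rules = {}
    for f, c in counts.items():
        lab, n = c.most_common(1)[0]
        conf = (n + ALPHA) / (sum(c.values()) + K * ALPHA)
        if conf >= threshold:
            rules[f] = (lab, conf)
    return rules

def co_train(pairs, seeds, rounds=2, threshold=0.8):
    name_rules, context_rules = dict(seeds), {}
    for _ in range(rounds):
        context_rules = infer(label(pairs, name_rules, "name"), "context", threshold)
        new_rules = infer(label(pairs, context_rules, "context"), "name", threshold)
        name_rules = {**new_rules, **seeds}  # seed rules are never overwritten
    return name_rules, context_rules

pairs = [("Elisa Ferreira", "criticar"), ("Cavaco Silva", "criticar")]
seeds = {"Elisa Ferreira": ("PESSOA", 0.9999)}
name_rules, context_rules = co_train(pairs, seeds)
print(context_rules["criticar"][0], name_rules["Silva"][0])
```

As on the slide, the seed labels the first pair, the context "criticar" becomes a PESSOA rule, that rule labels "Cavaco Silva", and "Silva" becomes a new name rule.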
NE tagger performance decreases over time                                         (Mota & Grishman, 2008)
Detailed analysis using six-month periods (instead of periods of 1 year)

      Linear fits for (Si , Ui , Tj ):

               a         b        R2
      P    0.827   -0.0024   0.24824
      R    0.773   -0.0022   0.19393
      F    0.799   -0.0023   0.23765

      [Figure: F-measure (0.74 to 0.82) by time gap (1=6 months, 0 to 15), with fitted line y=−0.00232x+0.79906, R2=0.2376]

     The performance decreases at an estimated rate of
     0.00232 in F-measure every 6 months (0.0348 across the 8 years of data, i.e. at the maximum time gap of 15)
     The low R-squared values show that not all variation is attributable
     to increasing the time gap
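
The 0.0348 figure follows directly from the fitted line. A small check, using only the slope and intercept printed on the slide; the maximum gap of 15 is derived from the 16 semesters 91a to 98b, and the names are illustrative.

```python
SLOPE = -0.00232      # change in F-measure per six-month step (from the slide)
INTERCEPT = 0.79906   # fitted F-measure at time gap 0 (from the slide)
MAX_GAP = 15          # 16 semesters (91a to 98b) give at most 15 steps

def predicted_f(gap):
    """F-measure predicted by the fitted line for a given time gap."""
    return INTERCEPT + SLOPE * gap

drop = predicted_f(0) - predicted_f(MAX_GAP)
print(f"{drop:.4f}")
```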
Updating the unlabeled data is better than
updating the seeds (Complete training-test configurations)

      Update?         a         b      R2     Fitted line
      No          0.799   -0.0023   0.238     y=−0.00232x+0.79906 (R2=0.2376)
      Seeds       0.800   -0.0019   0.192     y=−0.00189x+0.80025 (R2=0.1917)
      Unlabeled   0.807   -0.0005   0.019     y=−0.00051x+0.80769 (R2=0.0189)

      [Figure: F-measure by time gap (1=6 months, 0 to 15), one plot per row of the table with its fitted line]
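
To compare the three update strategies, each fitted slope can be expressed as a fraction of the no-update slope. The slopes are copied from the slides; `relative_decay` is a hypothetical helper written for this comparison.

```python
# Fitted slopes (F-measure per six-month step), copied from the slides.
slopes = {"No": -0.00232, "Seeds": -0.00189, "Unlabeled": -0.00051}

def relative_decay(slopes, baseline="No"):
    """Express each slope as a fraction of the baseline slope."""
    return {name: s / slopes[baseline] for name, s in slopes.items()}

rel = relative_decay(slopes)
for name, fraction in rel.items():
    print(f"{name:<10} {fraction:.2f}")
```

Updating the seeds keeps roughly four fifths of the decay, while updating the unlabeled data keeps roughly one fifth. With R2 near 0.02 for that last fit, the remaining slope explains very little of the variation.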
Confusion matrices

    91a   335    12    22   330    16    20   393    12    22
           52   453    79    52   456    69    12   463    38
           23    21   330    28    14   342     5    11   371
    92b   368    19    42   368    16    40   391    11    22
           19   435    55    23   445    39    14   463    29
           23    32   334    19    25   352     5    12   380
    95b   375    14    34   387    14    30   394    12    26
           22   465    78    13   461    73    12   463    43
           13     7   319    10    11   328     4    11   362
    98a   390    16    31   386    16    28   395    11    28
           11   458    58    13   460    48    11   464    39
            9    12   342    11    10   355     4    11   364
    98b   394     9    20   394     9    20   394     9    20
            8   467    29     8   467    29     8   467    29
            8    10   382     8    10   382     8    10   382
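
A sketch of how one of these matrices can be read, using the 98b row, where the three matrices are identical. The slide gives neither the class order nor whether rows hold gold or predicted labels, so the per-class figures below assume rows are gold labels; the diagonal share is the same either way. These numbers cover only the entities in the matrix and are not the F-measures reported on the other slides.

```python
# Matrix for 98b as printed on the slide. Row/column meaning and class
# order are not given there.
matrix_98b = [
    [394, 9, 20],
    [8, 467, 29],
    [8, 10, 382],
]

def accuracy(m):
    """Share of entities on the diagonal; identical for m and its transpose."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total

def per_class(m):
    """Per-class (precision, recall), ASSUMING rows are gold labels."""
    result = []
    for i in range(len(m)):
        row_sum = sum(m[i])                  # entities whose gold class is i
        col_sum = sum(row[i] for row in m)   # entities predicted as class i
        result.append((m[i][i] / col_sum, m[i][i] / row_sum))
    return result

print(f"{accuracy(matrix_98b):.4f}")
for precision, recall in per_class(matrix_98b):
    print(f"{precision:.3f} {recall:.3f}")
```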

Updating a Name Tagger Using Contemporary Unlabeled Data

  • 1. Updating a Name Tagger Using Contemporary Unlabeled Data. ACL-IJCNLP 2009, Singapore, August 3rd - 5th. Cristina Mota (1,2) and Ralph Grishman (2). 1: IST & L2F INESC-ID (Portugal); 2: New York University (USA). (Advisors: Ralph Grishman & Nuno Mamede.) This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000).
  • 2. Motivation: The performance of a co-trained named entity tagger decreases as the time gap increases between training and test sets (Mota & Grishman, 2008). [Figure: F-measure (0.79 to 0.85) by time gap (year, 0 to 7), with fitted line y=−0.00391x+0.82479, R2=0.3647] Do we need to update the seeds or the unlabeled data? Does more older data help?
  • 4. Related Work: “More data are better data” (Church & Mercer, 1993): enlarge labeled data as a way of improving performance. Contemporary (labeled) data reduces out-of-vocabulary rates: time-adaptive language model (Auzanne et al., 2000); generation of offline name lists (Palmer & Ostendorf, 2005); daily adaptation of the language model of a broadcast news transcription system (Martins et al., 2006).
  • 5. Data Sets: Data sets were drawn from the Politics section of the CETEMPúblico corpus (Santos & Rocha, 2001). Language: Portuguese. Time span: 8 years (1991-1998). Time gap: 1=6 months. For each six-month period: Seeds (S): names collected from the first 192 extracts∗; Test data (T): next 208 extracts; Unlabeled data (U): next 7856 extracts. ∗ 1 extract = approx. 2 paragraphs
  • 7. Named Entity Tagger: Based on a co-training classifier (Collins & Singer, 1999). Includes propagation step. Needs few seeds and performance is high (above 80%). Performance is parametrized by combination of seeds, unlabeled set and test set: (S,U,T). Tagger is evaluated after propagation with HAREM scoring programs. [Diagram: Training: Unlabeled text, Identification, Pairs (spelling features, contextual features), Co-training with Seeds, Spelling + contextual rules. Testing: Test text, Identification, Pairs, Classification, Labeled Pairs, Propagation, Text with classified NE]
  • 8. Update seeds or unlabeled data? Experiment 1: Baseline (vary seeds and unlabeled data synchronously, as in Mota & Grishman (2008)). [Diagram: timeline from 91a to 98b showing test set Tn, seeds Si and unlabeled examples Ui]
  • 16. Update seeds or unlabeled data? Performance decays as the time gap increases (Mota & Grishman, 2008). [Figure: F-measure (0.74 to 0.84) by training epoch (91a to 98b) for the configurations (i,i,98b), (98b,i,98b) and (i,98b,98b)]
  • 17. Update seeds or unlabeled data? Experiment 2: Update seeds (vary unlabeled data but use contemporary seeds). [Diagram: timeline from 91a to 98b showing test set Tn, seeds Sn and unlabeled examples Ui]
  • 25. Update seeds or unlabeled data? Contemporary seeds slightly attenuate the decrease. [Figure: F-measure (0.74 to 0.84) by training epoch (91a to 98b) for the configurations (i,i,98b), (98b,i,98b) and (i,98b,98b)]
  • 26. Update seeds or unlabeled data? Experiment 3: Update unlabeled data (vary seeds but use contemporary unlabeled data). [Diagram: timeline from 91a to 98b showing test set Tn, seeds Si and unlabeled examples Un]
  • 34. Updating the unlabeled data is better than updating the seeds. Contemporary unlabeled data maintain the performance. [Plot: F-measure (0.74 to 0.84) by training epoch (91a to 98b) for configurations (i,i,98b), (98b,i,98b) and (i,98b,98b).]
  • 35. Augment unlabeled data? Experiment 4: Enlarge unlabeled data with older data and use contemporary seeds. [Diagram: timeline 91a to 98b; test set Tn, seeds Sn, unlabeled examples Ui.]
  • 42. Augment unlabeled data? Larger amounts of older unlabeled data do not always result in better performance. [Plot: F-measure (0.74 to 0.84) by time frame (semester, 91a to 98b) for configurations (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b).] Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards. Blue line: different seeds for each tagger; same unlabeled data for all taggers (98b).
  • 44. Augment unlabeled data? Experiment 5: Enlarge the size of unlabeled data and vary seeds. [Diagram: timeline 91a to 98b; test set Tn, seeds Si, unlabeled examples Ui.]
  • 51. Updating the unlabeled data is better than accumulating older unlabeled data. Larger amounts of unlabeled data are worse than training with contemporary unlabeled data. Larger amounts of unlabeled data do not outperform the tagger trained with contemporary seeds and unlabeled data. [Plot: F-measure (0.74 to 0.84) by time frame (semester, 91a to 98b) for configurations (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b).] Violet line: seeds in the same time frame as the unlabeled set being added; unlabeled data is enlarged backwards. Blue line: seeds are the same as in the violet line; same unlabeled data for all taggers (98b). Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards.
  • 54. Final remarks. Contemporary unlabeled data are better data. But... Why doesn’t the labeled data impact the performance more? Are other semi-supervised approaches also sensitive?
  • 55. Acknowledgments. This research work was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000).
  • 56. Updating a Name Tagger Using Contemporary Unlabeled Data. ACL-IJCNLP 2009, Singapore, August 3rd - 5th. Cristina Mota (1,2) and Ralph Grishman (2). (1) IST & L2F INESC-ID (Portugal); (2) New York University (USA). (Advisors: Ralph Grishman & Nuno Mamede). This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000).
  • 57. Example of (mis)classification. Test set 98b includes two instances of “Tizi Ouzou”: “Tizi Ouzou tem” (en: Tizi Ouzou has) and “manifestações em Tizi Ouzou” (en: demonstrations in Tizi Ouzou). The name does not occur in u 91a, so its classification depends on contexts: ("n v" "tem") ORGANIZATION 0.52; ("type" "nprop v") PERSON 0.43; ("len" 2) PERSON 0.62. But it occurs in u 98b: “noite em Tizi” (en: night in Tizi); “ruas de Tizi Ouzou” (en: streets of Tizi Ouzou); “ir a Tizi-Ouzou” (en: go to Tizi Ouzou).
  • 58. NE tagger: Identification. Identification designed with NooJ (Silberztein, 2004). Pipeline: raw text → lexical analysis → chunking → NE + context identification → text with unclassified NE → pairs (NE, context). Resources: Portuguese dictionary, priority dictionaries, morphological grammars, chunking grammars, NE + context grammars. Example: (1) raw text: Elisa Ferreira começou por criticar Cavaco Silva; (2) after lexical analysis and chunking: [Elisa Ferreira]SEQM [começou por criticar]V+Complexo+Pred=criticar [Cavaco Silva]SEQM; (3) text with unclassified NE: [Elisa Ferreira]nprop v+criticar começou por criticar [Cavaco Silva]v nprop+criticar; (4) pairs (NE, context): [Elisa Ferreira]nprop v+criticar, [Cavaco Silva]v nprop+criticar.
  • 59. NE tagger: Classification. Diagram, in order: list of examples; seeds; label with name rules; labeled examples; infer context rules; spelling features; context rules; label with context rules; labeled examples; infer name rules; name rules; label with name + context rules; labeled examples; infer name + context rules; name + context rules. Example: SEEDS: (Elisa Ferreira, PESSOA, 0.9999); (1) LABEL: Elisa Ferreira, criticar ← PESSOA; (2) INFER: (criticar, PESSOA, 0.98); (3) LABEL: Cavaco Silva, criticar ← PESSOA; (4) INFER: (Silva, PESSOA, 0.97); (5) REPEAT.
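The label/infer loop on the classification slide can be sketched as a small co-training routine. This is a minimal illustration, not the authors' implementation. The function names, the smoothed confidence estimate, the `min_conf` threshold and the use of the whole name as the name feature are all assumptions made for the example (the slide itself infers a rule for the token "Silva" and also has a combined name + context step, which is omitted here).

```python
from collections import Counter, defaultdict

def infer_rules(features, labels, min_conf=0.6, alpha=0.1, n_classes=3):
    """Infer feature -> (label, confidence) rules from the labeled examples.

    Confidence is a smoothed relative frequency (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for feat, label in zip(features, labels):
        if label is not None:
            counts[feat][label] += 1
    rules = {}
    for feat, by_label in counts.items():
        label, hits = by_label.most_common(1)[0]
        conf = (hits + alpha) / (sum(by_label.values()) + n_classes * alpha)
        if conf >= min_conf:
            rules[feat] = (label, conf)
    return rules

def apply_rules(features, rules):
    """Label each example whose feature matches a rule; leave the rest unlabeled."""
    return [rules[f][0] if f in rules else None for f in features]

def cotrain(pairs, seeds, rounds=5):
    """Alternate between the name view and the context view of (name, context) pairs."""
    names = [name for name, _ in pairs]
    contexts = [context for _, context in pairs]
    seed_rules = {name: (label, 0.9999) for name, label in seeds.items()}
    name_rules, context_rules = dict(seed_rules), {}
    for _ in range(rounds):
        # LABEL with name rules, then INFER context rules.
        context_rules = infer_rules(contexts, apply_rules(names, name_rules))
        # LABEL with context rules, then INFER name rules (seeds are always kept).
        name_rules = {**infer_rules(names, apply_rules(contexts, context_rules)),
                      **seed_rules}
    return name_rules, context_rules

pairs = [("Elisa Ferreira", "criticar"), ("Cavaco Silva", "criticar")]
name_rules, context_rules = cotrain(pairs, {"Elisa Ferreira": "PESSOA"})
print(name_rules["Cavaco Silva"][0], context_rules["criticar"][0])
```

With the slide's two pairs and the single seed, the sketch first learns a context rule for "criticar" from the seed name and then uses it to label "Cavaco Silva", mirroring steps 1 to 4 of the example.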
  • 60. NE tagger performance decreases over time (Mota & Grishman, 2008). Detailed analysis using six-month periods (instead of periods of 1 year). [Plot: F-measure (0.74 to 0.82) vs. time gap (0 to 15; 1 = 6 months); fitted line y = -0.00232x + 0.79906, R2 = 0.2376.] Linear fits for (Si, Ui, Tj), columns a, b, R2: P: 0.827, -0.0024, 0.24824; R: 0.773, -0.0022, 0.19393; F: 0.799, -0.0023, 0.23765. The performance decreases at an estimated rate of 0.00232 in F-measure every 6 months (0.0348 after 8 years). The low R-squared values show that not all variation is attributable to the increasing time gap.
  • 61. Updating the unlabeled data is better than updating the seeds (complete training-test configurations). [Plot: F-measure vs. time gap (0 to 15; 1 = 6 months); fitted line for no update: y = -0.00232x + 0.79906, R2 = 0.2376.] Update?, columns a, b, R2: No: 0.799, -0.0023, 0.238; Seeds: 0.800, -0.0019, 0.192; Unlabeled: 0.807, -0.0005, 0.019.
  • 62. Updating the unlabeled data is better than updating the seeds (complete training-test configurations). [Plot: F-measure vs. time gap (0 to 15; 1 = 6 months); fitted line for updated seeds: y = -0.00189x + 0.80025, R2 = 0.1917.] Update?, columns a, b, R2: No: 0.799, -0.0023, 0.238; Seeds: 0.800, -0.0019, 0.192; Unlabeled: 0.807, -0.0005, 0.019.
  • 63. Updating the unlabeled data is better than updating the seeds (complete training-test configurations). [Plot: F-measure vs. time gap (0 to 15; 1 = 6 months); fitted line for updated unlabeled data: y = -0.00051x + 0.80769, R2 = 0.0189.] Update?, columns a, b, R2: No: 0.799, -0.0023, 0.238; Seeds: 0.800, -0.0019, 0.192; Unlabeled: 0.807, -0.0005, 0.019.
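The three fitted lines can be compared directly by evaluating them at a given time gap. The sketch below is only a worked arithmetic aid: the coefficients are the ones printed on the slides, while the dictionary keys and function names are made up for the example. Note that the 0.0348 drop quoted for the no-update case corresponds to a gap of 15 six-month periods, the largest value on the x-axis.

```python
import math

# (intercept, slope) of each fitted line, as printed on the slides.
FITS = {
    "no_update": (0.79906, -0.00232),
    "seeds": (0.80025, -0.00189),
    "unlabeled": (0.80769, -0.00051),
}

def predicted_f(update, gap):
    """F-measure predicted by the fitted line at a time gap in six-month periods."""
    intercept, slope = FITS[update]
    return intercept + slope * gap

def estimated_drop(update, gap):
    """Estimated F-measure lost after `gap` six-month periods."""
    return -FITS[update][1] * gap

for update in FITS:
    print(update, round(estimated_drop(update, 15), 5), round(predicted_f(update, 15), 5))
```

Under these fits, updating the unlabeled data loses well under 0.01 in F-measure over the full range, against roughly 0.028 when only the seeds are updated and 0.035 with no update. Given the low R2 values, these figures are rough trends rather than precise predictions.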
  • 64. Confusion matrices (three 3x3 matrices per training epoch, shown left, middle, right; matrix rows separated by "/"). 91a: [335 12 22 / 52 453 79 / 23 21 330], [330 16 20 / 52 456 69 / 28 14 342], [393 12 22 / 12 463 38 / 5 11 371]. 92b: [368 19 42 / 19 435 55 / 23 32 334], [368 16 40 / 23 445 39 / 19 25 352], [391 11 22 / 14 463 29 / 5 12 380]. 95b: [375 14 34 / 22 465 78 / 13 7 319], [387 14 30 / 13 461 73 / 10 11 328], [394 12 26 / 12 463 43 / 4 11 362]. 98a: [390 16 31 / 11 458 58 / 9 12 342], [386 16 28 / 13 460 48 / 11 10 355], [395 11 28 / 11 464 39 / 4 11 364]. 98b: [394 9 20 / 8 467 29 / 8 10 382], [394 9 20 / 8 467 29 / 8 10 382], [394 9 20 / 8 467 29 / 8 10 382].
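A short sketch of how such matrices can be summarized, using the three 91a matrices from the slide. The slide does not label the classes or say which axis holds the gold labels. Reading the columns as gold labels is an assumption of this sketch, suggested by the fact that the column sums are the same for every matrix, as expected for a fixed test set. The variable and function names are made up for the example.

```python
# The three 91a matrices from the slide (left, middle, right).
M_91A = [
    [[335, 12, 22], [52, 453, 79], [23, 21, 330]],
    [[330, 16, 20], [52, 456, 69], [28, 14, 342]],
    [[393, 12, 22], [12, 463, 38], [5, 11, 371]],
]

def column_sums(m):
    """Sum of each column; constant across matrices if columns index gold labels."""
    return [sum(row[j] for row in m) for j in range(len(m[0]))]

def accuracy(m):
    """Share of entries on the diagonal."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

def per_class_recall(m):
    """Diagonal over column sums (assumes columns are gold labels)."""
    cols = column_sums(m)
    return [m[j][j] / cols[j] for j in range(len(m))]

def per_class_precision(m):
    """Diagonal over row sums (assumes rows are predicted labels)."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]

for m in M_91A:
    print(column_sums(m), round(accuracy(m), 4))
```

These values are computed over the entries of the matrices only, so they are not directly comparable with the F-measure curves on the earlier slides.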