SlideShare a Scribd company logo
1 of 23
Download to read offline
Outline
                                      Introduction
                                  Corpus Analysis
                         NER Performance Analysis
                                      Experiments
                                    Final Remarks




                        Is this NE tagger getting old?
                   Language Resources and Evaluation Conference
                    Marrakech, Morocco - May 28th - 30th 2008


                         Cristina Mota and Ralph Grishman

                        IST & L2F INESC-ID (Portugal) & NYU (USA)
                                          and
                                New York University (USA)

                                (Advisors: Ralph Grishman & Nuno Mamede)




This research was funded by Funda¸ao para a Ciˆncia e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
                                 c˜           e

                  Cristina Mota and Ralph Grishman       Is this NE tagger getting old?
Outline
                                  Introduction
                              Corpus Analysis
                     NER Performance Analysis
                                  Experiments
                                Final Remarks


Outline


  1   Introduction

  2   Corpus Analysis

  3   NER Performance Analysis

  4   Experiments

  5   Final Remarks



              Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                Introduction
                            Corpus Analysis    Motivation
                   NER Performance Analysis    Approach
                                Experiments
                              Final Remarks




1   Introduction
       Motivation
       Approach

2   Corpus Analysis

3   NER Performance Analysis

4   Experiments

5   Final Remarks


            Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                              Introduction
                          Corpus Analysis    Motivation
                 NER Performance Analysis    Approach
                              Experiments
                            Final Remarks


What is NER?

      Mary is studying in Rabat at Mohammed V University
                           NE Tagger
      MaryPER is studying in RabatLOC at Mohammed V
                          UniversityORG




          Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                               Introduction
                           Corpus Analysis    Motivation
                  NER Performance Analysis    Approach
                               Experiments
                             Final Remarks


The Problem

                                                                                           x

                                                                                                                                    o   UE




                                                                                25
                                                                                                                                    x   CEE
                                                                                                                                    O   União Europeia
                                                                                      x                                             x   Comunidade Europeia




                                                Name occurrences / 100K words

                                                                                20
                                                                                                                                            O
                                                                                                                      O
                                                                                                x                              O                      O
                                                                                                                                        O




                                                                                15
                                                                                                                           O
                                                                                                                                                  O             O     O
                                                                                                                                            o               O
                                                                                                                                                      o         o
                                                                                                     x




                                                                                10
                                                                                                          x                    o        o         o         o
                                                                                                               x
                                                                                                                           o                                          o
                                                                                                                      o
                                                                                           x




                                                                                5
                                                                                      x        O
                                                                                               x
                                                                                                     O                x
                                                                                                     x   O     O           x
                                                                                           O             x                      x       x
                                                                                     O                         x      x    x    x       x    x
                                                                                                                                             x    x    x         x    x
                                                                                     o     o   o     o   o     o                                  x    x    x
                                                                                                                                                            x    x    x




                                                                                0
                                                                                     91a       92a       93a         94a       95a          96a       97a       98a

                                                                                                                   Time frame (semester)




     Do texts vary over time in a way that affects NE recognition?
     Should NE taggers be also conceived time-aware?

           Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                    Introduction
                                Corpus Analysis    Motivation
                       NER Performance Analysis    Approach
                                    Experiments
                                  Final Remarks


Approach



 Corpus Analysis
                                                        NER Performance Analysis
 Measure corpus similarity based on
                                                        Assess performance by training
      Words                                             and testing with different
 Compute name list overlaps                             configurations (train,test)

      By type                                                   Increase time gap between
                                                                training and test data
      By token




                Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                Introduction
                            Corpus Analysis    Corpus Similarity Algorithm (Kilgarriff, 2001)
                   NER Performance Analysis    Name List Overlaps
                                Experiments
                              Final Remarks




1   Introduction

2   Corpus Analysis
      Corpus Similarity Algorithm (Kilgarriff, 2001)
      Name List Overlaps

3   NER Performance Analysis

4   Experiments

5   Final Remarks


            Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                  Introduction
                              Corpus Analysis    Corpus Similarity Algorithm (Kilgarriff, 2001)
                     NER Performance Analysis    Name List Overlaps
                                  Experiments
                                Final Remarks


Corpus Similarity Algorithm (Kilgarriff, 2001)

  Similarity(A,B):
      Split corpus A and B into k slices each
      Repeat m times:
           Randomly allocate k slices to Ai and k to Bi
                               2                  2
           Construct word frequency lists for Ai and Bi
           Compute CBDF between A and B for the n most frequent
           words of the joint corpus (Ai +Bi )
           [CBDF = χ2 by degrees of freedom]
      Output mean and standard deviation of CBDF of all
      experiments
  Repeat using corpus A only: Similarity(A,A) → Homogeneity(A)
  Repeat using corpus B only: Similarity(B,B) → Homogeneity(B)

             Cristina Mota and Ralph Grishman    Is this NE tagger getting old?
Outline
                                          Introduction
                                      Corpus Analysis            Corpus Similarity Algorithm (Kilgarriff, 2001)
                             NER Performance Analysis            Name List Overlaps
                                          Experiments
                                        Final Remarks


Corpus Similarity Algorithm (Kilgarriff, 2001)
                             1                            1
  Corpus A                   2   Corpus A +               2   Corpus B                            Corpus B
             DAA′ 1                                      DAB ′ 1                                              DBB ′ 1



             DAA′ 2                                    DAB ′ 2                                               DBB ′ 2

        .                                          .                                                     .
        .                                          .                                                     .
        .                                          .                                                     .


             DAA′ n                                    DAB ′ n                                               DBB ′ n



      ¯
      DAA′                                       ¯                                                    ¯
                                                                                                      DBB ′
                                                 DAB
  Homogeneity(A)                            Similarity(A, B)                                      Homogeneity(B)



                               ¯
               Lower values of D ⇒ higher homogeneity/similarity

                      Cristina Mota and Ralph Grishman           Is this NE tagger getting old?
Outline
                                Introduction
                            Corpus Analysis       Corpus Similarity Algorithm (Kilgarriff, 2001)
                   NER Performance Analysis       Name List Overlaps
                                Experiments
                              Final Remarks


Name List Overlaps



                                               |TA ∩ TB |
              type overlap =                                                                      (1)
                                       |TA | + |TB | − |TA ∩ TB |
                                                N
                                                i =1 min(fA (i ), fb (i ))
               token overlap =                 N
                                                                                                  (2)
                                               i =1 max(fA (i ), fB (i ))
  TA = list of different names (name types) of text A
  fA (i ) = frequency of name i in text A




            Cristina Mota and Ralph Grishman      Is this NE tagger getting old?
Outline
                                Introduction
                            Corpus Analysis    Corpus Similarity Algorithm (Kilgarriff, 2001)
                   NER Performance Analysis    Name List Overlaps
                                Experiments
                              Final Remarks


Name List Overlaps

  A name list: Mary (3), Rabat (5), Mohammed V University (4)
  B name list: John (1), Rabat (2), Mohammed V Universirty (6)
  Type Overlap
             |{Rabat, MohammedVUniversity }|
                                                     = 2/4
        |{Mary , Rabat, MohammedVUniversity , John}|

  Token Overlap
        min(3, 0) + min(5, 2) + min(4, 6) + min(0, 1)
                                                      = 6/15
       max(3, 0) + max(5, 2) + max(4, 6) + max(0, 1)



            Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                Introduction
                            Corpus Analysis
                                               NE Tagger Description (Collins & Singer, 1999)
                   NER Performance Analysis
                                Experiments
                              Final Remarks




1   Introduction

2   Corpus Analysis

3   NER Performance Analysis
      NE Tagger Description (Collins & Singer, 1999)

4   Experiments

5   Final Remarks



            Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                           Introduction
                                       Corpus Analysis
                                                          NE Tagger Description (Collins & Singer, 1999)
                              NER Performance Analysis
                                           Experiments
                                         Final Remarks


NE Tagger Description (Collins & Singer, 1999)
  Raw TEXT
                                                              Classification in detail:
      ?
  POS Tagging + Parsing                                        Name Rules :- Name seeds
      ?                                                            ?                       
  Shallow Parsed TEXT                                          Label with Name Rules                       6
      ?                                                            ?
  NE Identification       -   TEXT with unclassified NE          Infer Contextual Rules
      ?                                                            ?
  List of Examples (NE,context)                                Label with Contextual Rules
      ?                                                            ?
  NE Classification               Name seeds                   Infer Name Rules                       -
      ?                                                            ?
  List of Labeled Examples (NE, context, label)                Label with Name + Contextual Rules
      ?                                                            ?
  Text Update + NE Propagation                                 List of Labeled Examples (NE, Context, Label)
      ?                                            ?
  TEXT with classified NE




                     Cristina Mota and Ralph Grishman     Is this NE tagger getting old?
Outline
                                               Experimental Setting
                                Introduction
                                               F-Measure over Time
                            Corpus Analysis
                                               Politics Dissimilarity over Time
                   NER Performance Analysis
                                               Politics Name List Overlap over Time
                                Experiments
                                               F-Measure compared to Dissimilarity
                              Final Remarks



1   Introduction

2   Corpus Analysis

3   NER Performance Analysis

4   Experiments
      Experimental Setting
      F-Measure over Time
      Politics Dissimilarity over Time
      Politics Name List Overlap over Time
      F-Measure compared to Dissimilarity

5   Final Remarks
            Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                                                                                        Experimental Setting
                                                                                         Introduction
                                                                                                        F-Measure over Time
                                                                                     Corpus Analysis
                                                                                                        Politics Dissimilarity over Time
                                                                            NER Performance Analysis
                                                                                                        Politics Name List Overlap over Time
                                                                                         Experiments
                                                                                                        F-Measure compared to Dissimilarity
                                                                                       Final Remarks


Experimental Setting

                                                                                         CETEMPublico (Santos  Rocha, 2001) is a
                                                                                         Portuguese public journalistic corpus

                                                               Culture
                                                                                                 Size: 180 million words
                   1e+07




                                                               Sports
                                                               Economy


                                                                                                 Time span: 8 years
                                                               Politics
                                                               Society
                   8e+06
 Number of words
                   6e+06




                                                                                                 Organization: randomly shuffled extracts
                   4e+06




                                                                                                 [1 extract ≅ 2 paragraphs]
                   2e+06




                                                                                                 Classification: 10 topics and 16 time
                   0e+00




                                                                                                 frames (year + semester)
                           91a   92a   93a   94a   95a   96a   97a        98a

                                       Time frame (semester)




                                                                                                 Mark up: paragraphs, sentences,
                                                                                                 enumeration lists and authors


                                                         Cristina Mota and Ralph Grishman               Is this NE tagger getting old?
Outline
                                               Experimental Setting
                                Introduction
                                               F-Measure over Time
                            Corpus Analysis
                                               Politics Dissimilarity over Time
                   NER Performance Analysis
                                               Politics Name List Overlap over Time
                                Experiments
                                               F-Measure compared to Dissimilarity
                              Final Remarks


Experimental Setting

      Topic: politics
      Time unit: year
      Text unit: sentence
      Size: 10 slices x 60000 words per time frame
      N most frequent words: 2000 words
      Names compared: 82400 per time frame
      Seeds (S): different names in the first 2500 name instances [first
      198 extracts per semester]
      Test (T): next 208 extracts per semester grouped by year
      Unlabeled examples (U): first 82456 names with context per year
      [following 7856 extracts]


            Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                                                             Experimental Setting
                                                         Introduction
                                                                             F-Measure over Time
                                                     Corpus Analysis
                                                                             Politics Dissimilarity over Time
                                            NER Performance Analysis
                                                                             Politics Name List Overlap over Time
                                                         Experiments
                                                                             F-Measure compared to Dissimilarity
                                                       Final Remarks


NER Performance: F-Measure over Time


                                                                              When the texts are from the same
                 0.85




                                                                              year (time gap = 0), the
                 0.84




                                                                              F-measure ranges approximately
                                                                              from 82% to 85%
                 0.83
 F−measure (%)




                                                                              When the texts are 5 years apart
                 0.82




                                                                              the F-measure ranges from about
                 0.81




                                                                              79% to 82%
                 0.80




                                                                              As the time gap between (Sk , Uk )
                                                                              and Tj increases, the F-measure
                 0.79




                        0   1   2      3      4       5   6   7

                                    Time gap (year)
                                                                              shows a tendency to decay

             Training-test configuration: (Si ,Ui ,Tj ), i =91..98, j=91..98 [64 tests]




                                    Cristina Mota and Ralph Grishman         Is this NE tagger getting old?
Outline
                                                                                      Experimental Setting
                                                                    Introduction
                                                                                      F-Measure over Time
                                                                Corpus Analysis
                                                                                      Politics Dissimilarity over Time
                                                       NER Performance Analysis
                                                                                      Politics Name List Overlap over Time
                                                                    Experiments
                                                                                      F-Measure compared to Dissimilarity
                                                                  Final Remarks


Politics Corpus Dissimilarity over time

                                                                                       The homogeneity for all the texts
                                                                                       is very close to 1
                               6




                                                                                       Increasing the time gap to one
 Dissimilarity (= mean CBDF)




                                                                                       year, the dissimilarity ranges from
                               5




                                                                                       2.5 to 4.5
                               4




                                                                                       At a distance of five years
                                                                                       dissimilarity ranges from 4.7 to
                               3




                                                                                       almost 6.5
                               2




                                                                                       The dissimilarity shows a tendency
                               1




                                   0   1   2      3      4       5   6   7
                                                                                       to increase as the time gap
                                               Time gap (year)
                                                                                       increases

                          Corpus comparisons: (Ui ,Uj ), i =91..98, j=91..98 [64 comparisons; Higher values = Lower similarity]


                                               Cristina Mota and Ralph Grishman       Is this NE tagger getting old?
Outline
                                                                                          Experimental Setting
                                                                          Introduction
                                                                                          F-Measure over Time
                                                                      Corpus Analysis
                                                                                          Politics Dissimilarity over Time
                                                             NER Performance Analysis
                                                                                          Politics Name List Overlap over Time
                                                                          Experiments
                                                                                          F-Measure compared to Dissimilarity
                                                                        Final Remarks


Politics Name List Overlap over Time
                          6.0




                                                                                                                     2.2
                          5.5




                                                                                            Name token overlap (%)
  Name type overlap (%)




                                                                                                                     2.1
                          5.0




                                                                                                                     2.0
                                                                                                                     1.9
                          4.5




                                                                                                                     1.8
                          4.0




                                                                                                                     1.7
                                0     1    2      3      4       5   6   7                                                 0   1   2      3      4       5   6   7

                                               Time gap (year)                                                                         Time gap (year)




                                    Within the same time frame, the type overlap varies between 5% and 6%
                                    At a distance of 5 years it varies between 3.5% and 4.5%
                                    Within the same year, the name token overlap varied between 4.2% and 4.4%
                                    At distance of 5 years varied between 3.2% and 3.7%
                                    Overlap between name lists also decreases over time


  Corpus comparisons: (Ui ,Tj ), i =91..98, j=91..98 [64 comparisons]
                    Cristina Mota and Ralph Grishman          Is this NE tagger getting old?
Outline
                                                                             Experimental Setting
                                                              Introduction
                                                                             F-Measure over Time
                                                          Corpus Analysis
                                                                             Politics Dissimilarity over Time
                                                 NER Performance Analysis
                                                                             Politics Name List Overlap over Time
                                                              Experiments
                                                                             F-Measure compared to Dissimilarity
                                                            Final Remarks


F-Measure compared to Dissimilarity
                 0.85
                 0.84




                                                                             There is an inverse association
                 0.83
 F−measure (%)




                                                                             between dissimilarity and
                                                                             F-measure: for higher levels of
                 0.82




                                                                             dissimilarity (i.e, higher distance
                 0.81




                                                                             values) we obtain lower
                                                                             performance values
                 0.80
                 0.79




                        1   2        3       4        5       6

                                Dissimilarity (= mean CBDF)

             OBS: Higher values = Lower similarity




                                     Cristina Mota and Ralph Grishman        Is this NE tagger getting old?
Outline
                                Introduction
                            Corpus Analysis    Main Results
                   NER Performance Analysis    Work in Progress
                                Experiments
                              Final Remarks




1   Introduction

2   Corpus Analysis

3   NER Performance Analysis

4   Experiments

5   Final Remarks
      Main Results
      Work in Progress


            Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                 Introduction
                             Corpus Analysis    Main Results
                    NER Performance Analysis    Work in Progress
                                 Experiments
                               Final Remarks


Main Results


  Within a period of 8 years we observed that:
      Corpus similarity and name overlaps tend to decrease as the
      two corpora become more temporally distant
      The performance of a co-training based NE tagger trained and
      tested on those texts shows a decay as we increase the time
      gap between the training and the test data
      There is an association between the results of the corpus
      analysis and the tagger performance




             Cristina Mota and Ralph Grishman   Is this NE tagger getting old?
Outline
                                 Introduction
                             Corpus Analysis    Main Results
                    NER Performance Analysis    Work in Progress
                                 Experiments
                               Final Remarks


Work in Progress


  Other related issues we are currently investigating aiming at better
  named entity recognition
      Analyze the NE surrounding contexts to verify if they also
      tend to overlap less over time
      Investigate how we can avoid the performance decay
           Do we need more data?
           Do we need more labeled data within the same time frame?
           Do we need more unlabeled data within the same time frame?




             Cristina Mota and Ralph Grishman   Is this NE tagger getting old?

More Related Content

Featured

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 

Featured (20)

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 

Is this NE tagger getting old?

  • 1. Outline Introduction Corpus Analysis NER Performance Analysis Experiments Final Remarks Is this NE tagger getting old? Language Resources and Evaluation Conference Marrakech, Morocco - May 28th - 30th 2008 Cristina Mota and Ralph Grishman IST & L2F INESC-ID (Portugal) & NYU (USA) and New York University (USA) (Advisors: Ralph Grishman & Nuno Mamede) This research was funded by Funda¸ao para a Ciˆncia e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000) c˜ e Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 2. Outline Introduction Corpus Analysis NER Performance Analysis Experiments Final Remarks Outline 1 Introduction 2 Corpus Analysis 3 NER Performance Analysis 4 Experiments 5 Final Remarks Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 3. Outline Introduction Corpus Analysis Motivation NER Performance Analysis Approach Experiments Final Remarks 1 Introduction Motivation Approach 2 Corpus Analysis 3 NER Performance Analysis 4 Experiments 5 Final Remarks Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 4. Outline Introduction Corpus Analysis Motivation NER Performance Analysis Approach Experiments Final Remarks What is NER? Mary is studying in Rabat at Mohammed V University NE Tagger MaryPER is studying in RabatLOC at Mohammed V UniversityORG Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 5. Outline Introduction Corpus Analysis Motivation NER Performance Analysis Approach Experiments Final Remarks The Problem x o UE 25 x CEE O União Europeia x x Comunidade Europeia Name occurrences / 100K words 20 O O x O O O 15 O O O O o O o o x 10 x o o o o x o o o x 5 x O x O x x O O x O x x x O x x x x x x x x x x x o o o o o o x x x x x x 0 91a 92a 93a 94a 95a 96a 97a 98a Time frame (semester) Do texts vary over time in a way that affects NE recognition? Should NE taggers be also conceived time-aware? Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 6. Outline Introduction Corpus Analysis Motivation NER Performance Analysis Approach Experiments Final Remarks Approach Corpus Analysis NER Performance Analysis Measure corpus similarity based on Assess performance by training Words and testing with different Compute name list overlaps configurations (train,test) By type Increase time gap between training and test data By token Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 7. Outline Introduction Corpus Analysis Corpus Similarity Algorithm (Kilgarriff, 2001) NER Performance Analysis Name List Overlaps Experiments Final Remarks 1 Introduction 2 Corpus Analysis Corpus Similarity Algorithm (Kilgarriff, 2001) Name List Overlaps 3 NER Performance Analysis 4 Experiments 5 Final Remarks Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 8. Outline Introduction Corpus Analysis Corpus Similarity Algorithm (Kilgarriff, 2001) NER Performance Analysis Name List Overlaps Experiments Final Remarks Corpus Similarity Algorithm (Kilgarriff, 2001) Similarity(A,B): Split corpus A and B into k slices each Repeat m times: Randomly allocate k slices to Ai and k to Bi 2 2 Construct word frequency lists for Ai and Bi Compute CBDF between A and B for the n most frequent words of the joint corpus (Ai +Bi ) [CBDF = χ2 by degrees of freedom] Output mean and standard deviation of CBDF of all experiments Repeat using corpus A only: Similarity(A,A) → Homogeneity(A) Repeat using corpus B only: Similarity(B,B) → Homogeneity(B) Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 9. Outline Introduction Corpus Analysis Corpus Similarity Algorithm (Kilgarriff, 2001) NER Performance Analysis Name List Overlaps Experiments Final Remarks Corpus Similarity Algorithm (Kilgarriff, 2001) 1 1 Corpus A 2 Corpus A + 2 Corpus B Corpus B DAA′ 1 DAB ′ 1 DBB ′ 1 DAA′ 2 DAB ′ 2 DBB ′ 2 . . . . . . . . . DAA′ n DAB ′ n DBB ′ n ¯ DAA′ ¯ ¯ DBB ′ DAB Homogeneity(A) Similarity(A, B) Homogeneity(B) ¯ Lower values of D ⇒ higher homogeneity/similarity Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 10. Outline Introduction Corpus Analysis Corpus Similarity Algorithm (Kilgarriff, 2001) NER Performance Analysis Name List Overlaps Experiments Final Remarks Name List Overlaps |TA ∩ TB | type overlap = (1) |TA | + |TB | − |TA ∩ TB | N i =1 min(fA (i ), fb (i )) token overlap = N (2) i =1 max(fA (i ), fB (i )) TA = list of different names (name types) of text A fA (i ) = frequency of name i in text A Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 11. Outline Introduction Corpus Analysis Corpus Similarity Algorithm (Kilgarriff, 2001) NER Performance Analysis Name List Overlaps Experiments Final Remarks Name List Overlaps A name list: Mary (3), Rabat (5), Mohammed V University (4) B name list: John (1), Rabat (2), Mohammed V Universirty (6) Type Overlap |{Rabat, MohammedVUniversity }| = 2/4 |{Mary , Rabat, MohammedVUniversity , John}| Token Overlap min(3, 0) + min(5, 2) + min(4, 6) + min(0, 1) = 6/15 max(3, 0) + max(5, 2) + max(4, 6) + max(0, 1) Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 12. Outline Introduction Corpus Analysis NE Tagger Description (Collins & Singer, 1999) NER Performance Analysis Experiments Final Remarks 1 Introduction 2 Corpus Analysis 3 NER Performance Analysis NE Tagger Description (Collins & Singer, 1999) 4 Experiments 5 Final Remarks Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 13. Outline Introduction Corpus Analysis NE Tagger Description (Collins & Singer, 1999) NER Performance Analysis Experiments Final Remarks NE Tagger Description (Collins & Singer, 1999) Raw TEXT Classification in detail: ? POS Tagging + Parsing Name Rules :- Name seeds ? ? Shallow Parsed TEXT Label with Name Rules 6 ? ? NE Identification - TEXT with unclassified NE Infer Contextual Rules ? ? List of Examples (NE,context) Label with Contextual Rules ? ? NE Classification Name seeds Infer Name Rules - ? ? List of Labeled Examples (NE, context, label) Label with Name + Contextual Rules ? ? Text Update + NE Propagation List of Labeled Examples (NE, Context, Label) ? ? TEXT with classified NE Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 14. Outline Experimental Setting Introduction F-Measure over Time Corpus Analysis Politics Dissimilarity over Time NER Performance Analysis Politics Name List Overlap over Time Experiments F-Measure compared to Dissimilarity Final Remarks 1 Introduction 2 Corpus Analysis 3 NER Performance Analysis 4 Experiments Experimental Setting F-Measure over Time Politics Dissimilarity over Time Politics Name List Overlap over Time F-Measure compared to Dissimilarity 5 Final Remarks Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 15. Outline Experimental Setting Introduction F-Measure over Time Corpus Analysis Politics Dissimilarity over Time NER Performance Analysis Politics Name List Overlap over Time Experiments F-Measure compared to Dissimilarity Final Remarks Experimental Setting CETEMPublico (Santos Rocha, 2001) is a Portuguese public journalistic corpus Culture Size: 180 million words 1e+07 Sports Economy Time span: 8 years Politics Society 8e+06 Number of words 6e+06 Organization: randomly shuffled extracts 4e+06 [1 extract ≅ 2 paragraphs] 2e+06 Classification: 10 topics and 16 time 0e+00 frames (year + semester) 91a 92a 93a 94a 95a 96a 97a 98a Time frame (semester) Mark up: paragraphs, sentences, enumeration lists and authors Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 16. Outline Experimental Setting Introduction F-Measure over Time Corpus Analysis Politics Dissimilarity over Time NER Performance Analysis Politics Name List Overlap over Time Experiments F-Measure compared to Dissimilarity Final Remarks Experimental Setting Topic: politics Time unit: year Text unit: sentence Size: 10 slices x 60000 words per time frame N most frequent words: 2000 words Names compared: 82400 per time frame Seeds (S): different names in the first 2500 name instances [first 198 extracts per semester] Test (T): next 208 extracts per semester grouped by year Unlabeled examples (U): first 82456 names with context per year [following 7856 extracts] Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 17. Outline Experimental Setting Introduction F-Measure over Time Corpus Analysis Politics Dissimilarity over Time NER Performance Analysis Politics Name List Overlap over Time Experiments F-Measure compared to Dissimilarity Final Remarks NER Performance: F-Measure over Time When the texts are from the same 0.85 year (time gap = 0), the 0.84 F-measure ranges approximately from 82% to 85% 0.83 F−measure (%) When the texts are 5 years apart 0.82 the F-measure ranges from about 0.81 79% to 82% 0.80 As the time gap between (Sk , Uk ) and Tj increases, the F-measure 0.79 0 1 2 3 4 5 6 7 Time gap (year) shows a tendency to decay Training-test configuration: (Si ,Ui ,Tj ), i =91..98, j=91..98 [64 tests] Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 18. Outline Experimental Setting Introduction F-Measure over Time Corpus Analysis Politics Dissimilarity over Time NER Performance Analysis Politics Name List Overlap over Time Experiments F-Measure compared to Dissimilarity Final Remarks Politics Corpus Dissimilarity over time The homogeneity for all the texts is very close to 1 6 Increasing the time gap to one Dissimilarity (= mean CBDF) year, the dissimilarity ranges from 5 2.5 to 4.5 4 At a distance of five years dissimilarity ranges from 4.7 to 3 almost 6.5 2 The dissimilarity shows a tendency 1 0 1 2 3 4 5 6 7 to increase as the time gap Time gap (year) increases Corpus comparisons: (Ui ,Uj ), i =91..98, j=91..98 [64 comparisons; Higher values = Lower similarity] Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 19. Outline Experimental Setting Introduction F-Measure over Time Corpus Analysis Politics Dissimilarity over Time NER Performance Analysis Politics Name List Overlap over Time Experiments F-Measure compared to Dissimilarity Final Remarks Politics Name List Overlap over Time 6.0 2.2 5.5 Name token overlap (%) Name type overlap (%) 2.1 5.0 2.0 1.9 4.5 1.8 4.0 1.7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 Time gap (year) Time gap (year) Within the same time frame, the type overlap varies between 5% and 6% At a distance of 5 years it varies between 3.5% and 4.5% Within the same year, the name token overlap varied between 4.2% and 4.4% At distance of 5 years varied between 3.2% and 3.7% Overlap between name lists also decreases over time Corpus comparisons: (Ui ,Tj ), i =91..98, j=91..98 [64 comparisons] Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 20. Outline Experimental Setting Introduction F-Measure over Time Corpus Analysis Politics Dissimilarity over Time NER Performance Analysis Politics Name List Overlap over Time Experiments F-Measure compared to Dissimilarity Final Remarks F-Measure compared to Dissimilarity 0.85 0.84 There is an inverse association 0.83 F−measure (%) between dissimilarity and F-measure: for higher levels of 0.82 dissimilarity (i.e, higher distance 0.81 values) we obtain lower performance values 0.80 0.79 1 2 3 4 5 6 Dissimilarity (= mean CBDF) OBS: Higher values = Lower similarity Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 21. Outline Introduction Corpus Analysis Main Results NER Performance Analysis Work in Progress Experiments Final Remarks 1 Introduction 2 Corpus Analysis 3 NER Performance Analysis 4 Experiments 5 Final Remarks Main Results Work in Progress Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 22. Outline Introduction Corpus Analysis Main Results NER Performance Analysis Work in Progress Experiments Final Remarks Main Results Within a period of 8 years we observed that: Corpus similarity and name overlaps tend to decrease as the two corpora become more temporally distant The performance of a co-training based NE tagger trained and tested on those texts shows a decay as we increase the time gap between the training and the test data There is an association between the results of the corpus analysis and the tagger performance Cristina Mota and Ralph Grishman Is this NE tagger getting old?
  • 23. Outline Introduction Corpus Analysis Main Results NER Performance Analysis Work in Progress Experiments Final Remarks Work in Progress Other related issues we are currently investigating aiming at better named entity recognition Analyze the NE surrounding contexts to verify if they also tend to overlap less over time Investigate how we can avoid the performance decay Do we need more data? Do we need more labeled data within the same time frame? Do we need more unlabeled data within the same time frame? Cristina Mota and Ralph Grishman Is this NE tagger getting old?