Arcomem training neer_advanced



This presentation on Named Entity Evolution Recognition is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.

Published in: Spiritual, Technology
  • We use the characteristics of NEE to design our algorithm. Just to recap, the characteristics of NEE are 'instant' changes, small or no concept shifts, and announcements to the public. Let's go through a couple of definitions needed for the algorithm.
  • We start by extracting, for each change period, the documents in which at least one of the strings of the name is included. For Pope Benedict XVI, we require only Benedict or Pope. From this dataset, we then extract nouns (max length 3) and Named Entities and sum up the frequencies. Stanford NER recognized Barack Obama but not President Barack Obama, and therefore Barack Obama is counted twice by NEER. All the terms are stored in a dictionary. Then we build a co-occurrence graph using the extracted terms and the dictionary. We consider 9 terms on either side of a term. Next we start merging direct co-references according to three rules. Each time two terms are merged, we choose one representative, which is the one with the highest frequency. Here it matters that Barack Obama was found by both the Lingua Tagger and the Stanford Tagger. We have the prolongation rule because we limit the max length of the terms to reduce noise. The co-occurring terms are considered to be candidates for indirect co-references.
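The co-occurrence step described in the speaker notes above can be sketched roughly as follows. This is a minimal illustration, not the NEER implementation: the function name `cooccurrence_graph`, the dict-of-counters representation, and the toy term list are assumptions. Only the window of 9 terms on either side comes from the notes.

```python
from collections import Counter, defaultdict


def cooccurrence_graph(terms, window=9):
    """Count how often two distinct terms appear within `window` positions.

    `terms` is the sequence of extracted terms in document order
    (hypothetical input format). The graph is undirected, so every
    pair is counted once and stored in both directions.
    """
    graph = defaultdict(Counter)
    for i, term in enumerate(terms):
        # Looking only forward covers "either side" once per pair.
        for other in terms[i + 1 : i + 1 + window]:
            if other != term:
                graph[term][other] += 1
                graph[other][term] += 1
    return graph


# Toy input, invented for illustration.
terms = ["Pope", "Benedict", "visited", "Rome"]
graph = cooccurrence_graph(terms, window=1)
print(graph["Benedict"])
```

With `window=1` only direct neighbours are linked; with the default of 9, all four toy terms would be connected to each other.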

    1. NEER: An Unsupervised Method for Named Entity Evolution Recognition (Advanced Level). Prerequisite: NEER: An Unsupervised Method for Named Entity Evolution Recognition (Beginner Level). Nina N. Tahmasebi, Thomas Risse, L3S Research Center, Hannover, Germany
    2. Named Entity Evolution. Named Entities (NE): people, places, companies... Characteristics of Named Entity Evolution (NEE):
       • Same thing but different terms over time
       • Change occurs over short periods of time
       • Small or no concept shift
       • Announced to the public repeatedly
       Goal: Find a method for named entity evolution recognition independent from external knowledge sources.
       [Figure: names around a change period: Joseph Ratzinger, Pope Benedict, Pope Benedict XVI, Benedict XVI, Joseph Aloisius Ratzinger, Cardinal Ratzinger, Cardinal Joseph Ratzinger]
    3. Definitions
       – Context Cwi: all terms related to word w at time i
       – Temporal co-references: names used for the same entity at the same or different points in time
       – Direct temporal co-reference: co-references with lexical overlap
       – Indirect temporal co-reference: co-references without lexical overlap
       – Change period (CP): period of time where change occurs
       Example: Cardinal Joseph Ratzinger → Pope Benedict XVI, CP = 2005
    4. Our Method
       1. Identify change periods
       2. Create one context per CP
       3. Capture at least two co-references
       → No need to compare vastly different contexts!
       [Figure: timeline t1, t2, t3 with one context per change period: C(walkman-discman), C(discman-minidisc), C(minidisc-mp3 player), C(mp3 player-ipod)]
    5. Finding Change Periods
       • Kleinberg's burst detection
       • Out-of-the-box Java implementation from CIShell
       • Compare to manually found change periods (known CPs)
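To give a feel for what burst detection does on this slide, here is a simplified Python sketch of a two-state, batched variant of Kleinberg's burst model. It is not the CIShell Java implementation the slides refer to. The function name `burst_states`, the default values for `s` and `gamma`, and the document counts are assumptions made for illustration.

```python
import math


def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Label each time step 0 (baseline) or 1 (burst).

    relevant[t]: documents mentioning the term in step t
    totals[t]:   all documents in step t
    s:           how much higher the burst rate is than the baseline rate
    gamma:       scales the cost of entering the burst state
    """
    n = len(relevant)
    if n == 0 or sum(relevant) == 0:
        return [0] * n
    p0 = min(sum(relevant) / sum(totals), 0.9998)  # baseline rate
    p1 = min(s * p0, 0.9999)                       # burst rate

    def cost(p, r, d):
        # Negative log-likelihood of r hits in d documents; the binomial
        # coefficient is the same for both states and is left out.
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    up = gamma * math.log(n)  # entering a burst costs; leaving is free
    best = [0.0, math.inf]    # the automaton starts in the baseline state
    back = []
    for r, d in zip(relevant, totals):
        new, ptr = [], []
        for j, p in enumerate((p0, p1)):
            cands = [best[0] + (up if j == 1 else 0.0), best[1]]
            i_min = 0 if cands[0] <= cands[1] else 1
            new.append(cands[i_min] + cost(p, r, d))
            ptr.append(i_min)
        best = new
        back.append(ptr)

    # Trace the cheapest state sequence backwards (Viterbi).
    state = 0 if best[0] <= best[1] else 1
    states = []
    for ptr in reversed(back):
        states.append(state)
        state = ptr[state]
    return states[::-1]


# Invented yearly counts: the term spikes in steps 3 and 4.
print(burst_states([1, 1, 1, 10, 12, 1, 1], [100] * 7))
```

Consecutive steps labelled 1 would then be candidate change periods, which the slide suggests comparing against manually identified ones.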
    6. Finding Direct Co-references
       1. Extract text for each change period
       2. Term & NE extraction
       3. Build co-occurrence graph
       4. Rules to merge terms from dictionary and graph
       Sub-Term Rule: Cardinal Joseph Ratzinger ↔ Joseph Ratzinger
       Prefix/Suffix Rule: Cardinal Joseph Ratzinger ↔ Cardinal Ratzinger
       Prolongation Rule: Pope John Paul + John Paul II = Pope John Paul II
       Result: Cardinal Joseph Ratzinger, Cardinal Ratzinger, Joseph Ratzinger
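The three merging rules can be sketched as small predicates on whitespace-separated tokens. The exact rule definitions are not spelled out on the slide, so the logic below is an interpretation inferred from the three examples, and all function names are hypothetical.

```python
def is_subterm(a, b):
    """Sub-term rule (assumed): the shorter term occurs as a contiguous
    run of tokens inside the longer one."""
    short, long_ = sorted((a.split(), b.split()), key=len)
    n = len(short)
    return any(long_[i : i + n] == short for i in range(len(long_) - n + 1))


def shares_prefix_and_suffix(a, b):
    """Prefix/suffix rule (assumed): two different, non-empty terms start
    with the same token and end with the same token."""
    ta, tb = a.split(), b.split()
    return ta != tb and ta[0] == tb[0] and ta[-1] == tb[-1]


def prolong(a, b):
    """Prolongation rule (assumed): if the end of `a` overlaps the start
    of `b`, join them into one longer term; otherwise return None."""
    ta, tb = a.split(), b.split()
    # Try the longest proper overlap first.
    for k in range(min(len(ta), len(tb)) - 1, 0, -1):
        if ta[-k:] == tb[:k]:
            return " ".join(ta + tb[k:])
    return None


print(is_subterm("Cardinal Joseph Ratzinger", "Joseph Ratzinger"))
print(shares_prefix_and_suffix("Cardinal Joseph Ratzinger", "Cardinal Ratzinger"))
print(prolong("Pope John Paul", "John Paul II"))
```

The prolongation sketch rebuilds a term longer than either input, which matches the speaker notes' remark that the rule exists because extracted terms are capped in length.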
    7. Detailed Merging
       • Merge one-token terms (co-ref classes):
         – Pope Benedict and Benedict = corefBenedict {Pope Benedict, Benedict}
         – Benedict XVI and Benedict = corefBenedict {Benedict XVI, Benedict}
         Choose Benedict as representative (highest frequency)
       • Merge co-reference classes → corefBenedict {Pope Benedict, Benedict, Benedict XVI}
       • Apply remaining rules:
         – Merge corefBenedict with Pope Benedict XVI (sub-term rule) → corefBenedict {Pope Benedict, Benedict, Benedict XVI, Pope Benedict XVI}
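The merging walk-through on this slide can be approximated with the sketch below. It is a simplification under stated assumptions: only a sub-term check is used to link terms, classes are merged transitively, and the frequencies are invented. The names `is_subterm` and `merge_coreferences` are hypothetical.

```python
def is_subterm(a, b):
    """True if the shorter term is a contiguous token run of the longer."""
    short, long_ = sorted((a.split(), b.split()), key=len)
    n = len(short)
    return any(long_[i : i + n] == short for i in range(len(long_) - n + 1))


def merge_coreferences(freq):
    """Group terms into co-reference classes keyed by a representative.

    `freq` maps each term to its frequency. Shorter terms are processed
    first, so one-token terms seed the classes, as on the slide. The
    representative of a class is its most frequent member.
    """
    classes = []  # list of disjoint sets of terms
    for term in sorted(freq, key=lambda t: len(t.split())):
        linked = [c for c in classes if any(is_subterm(term, m) for m in c)]
        merged = {term}.union(*linked)
        classes = [c for c in classes if c not in linked] + [merged]
    return {max(c, key=freq.get): c for c in classes}


# Invented frequencies for illustration.
freq = {
    "Benedict": 50,
    "Pope Benedict": 30,
    "Benedict XVI": 20,
    "Pope Benedict XVI": 10,
    "Vatican": 40,
}
for representative, members in merge_coreferences(freq).items():
    print(representative, sorted(members))
```

A real pipeline would need to guard against very generic one-token terms (for example a bare "Pope") pulling unrelated names into one class. This sketch does not attempt that.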
    8. Example co-references
    9. Thank you. Contact details: Dr. Thomas Risse, +49 511 762 17764