Character identification is a task of entity linking that finds the global entity of each personal mention in multiparty dialogue. For this task, the first two seasons of the popular TV show Friends are annotated, comprising a total of 448 dialogues, 15,709 mentions, and 401 entities. The personal mentions are detected from nominals referring to certain characters in the show, and the entities are collected from the list of all characters in those two seasons of the show. This task is challenging because it requires the identification of characters that are mentioned but may not be active during the conversation. Among 90+ participants, four of them submitted their system outputs and showed strengths in different aspects about the task. Thorough analyses of the distributed datasets, system out- puts, and comparative studies are also provided. To facilitate the momentum, we create an open- source project for this task and publicly release a larger and cleaner dataset, hoping to support researchers for more enhanced modeling.
SemEval 2018 Task 4: Character Identification on Multiparty Dialogues
1. SemEval 2018 Task 4:
Character Identification
on Multiparty Dialogues
Jinho D. Choi @ Emory University
HenryY. Chen @ Snap Inc.
https://github.com/emorynlp/semeval-2018-task4
http://nlp.mathcs.emory.edu
2. Character Mining
2
Extract or infer various information about
individual characters in multiparty dialogues
text chat
audio chat
video chat
5. 5
Ross I told mom and dad last night, they seemed to take it pretty well.
Monica
Oh really, so that hysterical phone call I got from a woman at sobbing 3:00 A.M., "I'll
never have grandchildren, I'll never have grandchildren." was what? A wrong number?
Ross Sorry.
Joey
Alright Ross, look. You're feeling a lot of pain right now. You're angry. You're hurting.
Can I tell you what the answer is?
Jack Monica
Ross Joey
Judy
Character Identification
6. 6
Ross I told mom and dad last night, they seemed to take it pretty well.
Monica
Oh really, so that hysterical phone call I got from a woman at sobbing 3:00 A.M., "I'll
never have grandchildren, I'll never have grandchildren." was what? A wrong number?
Ross Sorry.
Joey
Alright Ross, look. You're feeling a lot of pain right now. You're angry. You're hurting.
Can I tell you what the answer is?
Entity Linking or Coreference Resolution?
? ?
Cross-document resolution is required
Judy & Jack appear in some other dialogue
Character Identification
7. Corpus
7
Episodes Scenes Speakers Utterances Sentences Tokens
Season 1 24 326 107 5,968 10,790 81,453
Season 2 24 293 107 5,747 9,337 81,910
Total 48 619 171 11,715 20,127 163,363
Two seasons of Friends
Pseudo-annotated then
manually corrected
Manually double-annotated
by crowd workers
Nominals referring to humans Characters in the show
Mentions Entities
8. Mention Annotation
8
Named Entity Pronoun Gazeteer Total
Season 1 946 7,549 811 9,306
Season 2 1,037 7,459 806 9,302
Total 1,983 15,008 1,617 18,608
A noun phrase is a mention if it is either
1. A PERSON named entity, or
2. A pronoun or possessive pronoun excluding“it” and “they”, or
3. One of the personal noun gazetteers that are 603 common and singular
personal nouns selected from Freebase and DBPedia (e.g., mom, girlfriend).
Gives about the F1 score of 95.93%
9. Entity Annotation
9
Primary Secondary Generic Collective General Other Total
Season 1 5,160 2,526 178 1,150 111 187 9,312
Season 2 5,385 2,340 120 1,238 44 169 9,296
Total 10,545 4,866 298 2,388 155 356 18,608
1. Primary: six main characters of the show.
2. Secondary: all the non-primary characters.
3. Generic: real entities whose identities are not defined
(e.g.,That waitress is really cute, I am going to ask her out)
4. Collective: the plural use of the pronoun you, which cannot be deterministically
distinguished from the singular use
5. General: mentions used in reference to a general case rather than an specific entity
(e.g.,The ideal guy you look for doesn’t exist)
6. Other: indicates all the other kinds of mentions
Used for this year’s
shared task
10. Evaluation Metrics
10
Labeling Accuracy
# of all mentions
# of correctly identified mentions
Average Macro F1
F1 score of the i’th character
i = 1
N
N
1
6 Main Characters + Others All Characters
11. Overall Scores
11
Over 90 teams registered at CodaLab
Only 4 teams made submissions :-(
Main + Others All Characters
Acc F1 Acc F1
AMORE-UPF 77.23 79.36 74.72 41.05
KNU-CI 85.10 86.00 69.49 16.98
Kampfpudding 73.36 73.51 59.45 37.37
Zuma-AR 46.85 44.68 33.06 16.09
14. Next Step
14
Character Identification v2.0
Four seasons of Friends are completely annotated
Plural mentions are annotated
Ihttps://github.com/emorynlp/character-identification
They Exist! Introducing Plural Mentions to
Coreference Resolution and Entity Linking
Ethan Zhou and Jinho D. Choi
COLING 2018