BenG Update on automatic labelling
1. MM P05 automatic labeling
term extraction
Victor de Boer
Josefien Schuurman
Roeland Ordelman
2. Term extraction from TT888
• Input:
– TT888 subtitles
• Output:
– GTAA terms
• Onderwerpen (subjects)
• Persoonsnamen (person names)
• Namen (names)
• Geografische namen (geographic names)
– For the entire video (corresponds to documentalist tasks)
3. Planning
• version 0.1
– 'naive baseline'
– Test input and output
• version 0.2
– Multiple GTAA axes
– Improve statistics
– Discussion with metadata management
• version 0.3
– More improvements
– Evaluation
• version 1.0
– To be reimplemented
http://www.recensiekoning.nl/2011/09/48928/ondertiteling
4. Implementation details
• Java to make integration easier
• XML and CSV outputs
– URI of GTAA term
– pref-label
– Confidence value
– Axis
• Input comes from the Immix OAI API, where segmentation should already have taken place
– Algorithm expects one OAI identifier (Expressie or Selectie)
• Matching with GTAA using ElasticSearch instance
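As a rough illustration of the per-term XML/CSV output fields listed above, a minimal Java sketch; class and field names (and the example URI and confidence value) are assumptions for illustration, not the actual implementation:

```java
// Minimal sketch of the per-term output record (names are illustrative,
// not the actual implementation).
public class ExtractedTerm {
    final String gtaaUri;      // URI of the GTAA term
    final String prefLabel;    // preferred label, e.g. "theater"
    final double confidence;   // matching confidence (e.g. a normalized ElasticSearch score)
    final String axis;         // GTAA axis, e.g. "Onderwerpen"

    ExtractedTerm(String gtaaUri, String prefLabel, double confidence, String axis) {
        this.gtaaUri = gtaaUri;
        this.prefLabel = prefLabel;
        this.confidence = confidence;
        this.axis = axis;
    }

    // One line of the CSV output described above.
    String toCsv() {
        return String.join(",", gtaaUri, prefLabel, String.valueOf(confidence), axis);
    }

    public static void main(String[] args) {
        ExtractedTerm t = new ExtractedTerm(
                "http://data.beeldengeluid.nl/gtaa/002151", "theater", 0.83, "Onderwerpen");
        System.out.println("uri,prefLabel,confidence,axis");
        System.out.println(t.toCsv());
    }
}
```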
5. version 0.1
For every item
1. Get TT888 words in a frequency list
2. Discard stop words (‘de’, ‘het’, ‘op’, ‘naar’..)
3. Take all words with freq > n
4. Match with GTAA “Onderwerpen” with ElasticSearch score > m
– Preflabel + altlabel
[Diagram: OAI input (TT888) → stop-word filtering → algorithm → GTAA match, e.g. gtaa:002151 "theater"]
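A minimal Java sketch of steps 1-3 of this baseline; the stop-word list, example subtitle text, and threshold value are illustrative, and the ElasticSearch field names mentioned in the comment (prefLabel/altLabel) are assumptions:

```java
import java.util.*;
import java.util.stream.*;

// Minimal sketch of version 0.1, steps 1-3: frequency list, stop-word
// filtering and a frequency threshold.
public class NaiveBaseline {
    static final Set<String> STOP_WORDS = Set.of("de", "het", "op", "naar", "een", "en");

    public static void main(String[] args) {
        String tt888 = "vanavond in het theater een avond over theater en cabaret";
        int n = 1; // frequency threshold from the slide ("freq > n"), illustrative value

        Map<String, Long> freq = Arrays.stream(tt888.toLowerCase().split("\\s+"))
                .filter(w -> !STOP_WORDS.contains(w))                            // step 2: discard stop words
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));  // step 1: frequency list

        List<String> candidates = freq.entrySet().stream()
                .filter(e -> e.getValue() > n)                                   // step 3: freq > n
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());

        // Step 4 would send each candidate to the ElasticSearch GTAA index,
        // matching on preferred and alternative labels of "Onderwerpen" and
        // keeping hits with score > m, e.g. with a query body like:
        // {"query": {"multi_match": {"query": "theater", "fields": ["prefLabel", "altLabel"]}}}
        System.out.println(candidates); // [theater]
    }
}
```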
6. version 0.1
Informal evaluation:
Compare to existing (historical) labels ("Onderwerpen")
Works somewhat (< 20% correct). Input for version 0.2
[Same pipeline diagram as above]
7. version 0.2
• Intermediate version; uses a Named Entity Recognizer. Results discussed with Lisette and Vincent -> version 0.3
[Diagram: OAI input (TT888) → stop words, Named Entity Recognition, Dutch word frequencies → algorithm → GTAA matches "theater", "Jos Brink", "Amsterdam"]
8. Named Entity Recognition
• Webservice CLTL @ VU
• Input:
– “Hallo, mijn naam is Victor de Boer en ik woon in de mooie stad Haarlem. Ik werk nu bij het
Nederlands Instituut voor Beeld en Geluid in Hilversum. Hiervoor was ik werkzaam bij de
Vrije Universiteit. “
• Output:
[ Victor de Boer | PERSON ],
[ Haarlem | LOCATION ],
[ Nederlands | MISC ],
[ Instituut voor Beeld en Geluid | ORGANIZATION ],
[ Hilversum | LOCATION ],
[ Vrije Universiteit | ORGANIZATION ]
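A minimal sketch of consuming the recognizer output in the bracketed [ text | TYPE ] notation shown above; the actual CLTL webservice response format may differ, so this parser is only an assumption for illustration:

```java
import java.util.regex.*;

// Minimal sketch: parse NER results in the "[ text | TYPE ]" notation shown
// on the slide into (surface form, type) pairs.
public class NerOutputParser {
    private static final Pattern ENTITY = Pattern.compile("\\[\\s*([^|\\]]+?)\\s*\\|\\s*([A-Z]+)\\s*\\]");

    public static void main(String[] args) {
        String nerOutput = "[ Victor de Boer | PERSON ], [ Haarlem | LOCATION ], "
                + "[ Nederlands | MISC ], [ Instituut voor Beeld en Geluid | ORGANIZATION ], "
                + "[ Hilversum | LOCATION ], [ Vrije Universiteit | ORGANIZATION ]";

        Matcher m = ENTITY.matcher(nerOutput);
        while (m.find()) {
            String surfaceForm = m.group(1);
            String type = m.group(2);
            // PERSON entities would then be matched against "Persoonsnamen",
            // LOCATION against "Geografische namen", ORGANIZATION against "Namen".
            System.out.println(type + "\t" + surfaceForm);
        }
    }
}
```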
9. version 0.3
For every item
1. Track 1
1. Get TT888 words in a frequency list
2. Discard stop words (‘de’, ‘het’, ‘op’, ‘naar’..)
3. Take all N-GRAMS with normalized frequency > n
4. Match with GTAA “Onderwerpen” with score > m
2. Track 2
1. Present TT888 to Named Entity Recognizer (VU-webservice)
2. Match the result (with freq > L) with GTAA "Persoonsnamen", "Geografische namen", "Onderwerpen", "Namen"
[Pipeline diagram as in version 0.2: OAI input (TT888) → stop words, Named Entity Recognition, Dutch word frequencies → algorithm → GTAA matches "theater", "Jos Brink", "Amsterdam"]
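A minimal sketch of track 1's n-gram candidates, reading "normalized frequency" simply as count divided by total tokens; the "Word freq NL" box in the diagram suggests the real version may instead weigh against a background Dutch word-frequency list, so this reading and all values are assumptions:

```java
import java.util.*;

// Minimal sketch of version 0.3, track 1: unigram/bigram candidates with a
// normalized frequency (count / total tokens).
public class NgramCandidates {
    public static void main(String[] args) {
        List<String> tokens = List.of("vrije", "universiteit", "theater", "vrije", "universiteit", "cabaret");
        double n = 0.2; // normalized-frequency threshold from the slide ("> n"), illustrative value

        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            counts.merge(tokens.get(i), 1, Integer::sum);                                  // unigrams
            if (i + 1 < tokens.size()) {
                counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);    // bigrams
            }
        }

        int total = tokens.size();
        counts.forEach((ngram, count) -> {
            double normFreq = count / (double) total;
            if (normFreq > n) {
                // These candidates would then be matched against GTAA "Onderwerpen"
                // in ElasticSearch, keeping hits with score > m.
                System.out.println(ngram + "\t" + normFreq);
            }
        });
    }
}
```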
11. Evaluation
• Setup
– 4 evaluators (Vincent, Lisette, Alma, Tim)
• 3 in one 50 min session
• 1 in another session
– ~8 minutes per item
– Video + extracted terms
• Videos opened in the IE browser
• GTAA URIs + prefLabels
• Any other info allowed
– Five-point Likert scale
• Only precision, no recall
The evaluation scale used. 0 means truly wrong (e.g. a wrong homonym) or truly not relevant (the wrong person). Since these interact, the scale cannot be split out much further.
0: Term is not relevant at all
1: Term is not relevant
2: Term is somewhat relevant
3: Term is relevant
4: Term is highly relevant
13. Results
• Total of 70 terms for 13 videos (5.4 terms per video)
– Some videos did not start -> discarded
– 38 terms with three evaluations
– 32 with one
15. Example of disagreement
• Term “Milwaukee”
– Top2000 a gogo
Eval 1 -> score = 3
"The term in itself is not very relevant, but in combination with Romme, Gianni it is still valuable. Again: NER gains strength if the user also gets a time code and can play the fragment back to check whether it is relevant for their search/reuse."
Eval 3 -> score = 1
"mentioned twice, not relevant"
Eval 2 -> score = 1
“…”
16. Inter-annotator agreement
Pearson   eval1   eval2   eval3
eval1     1
eval2     0.52    1
eval3     0.67    0.58    1
eval4     0.78    x       0.92

Agreement between evaluators 3 and 4 is large, between 1 and 4 substantial;
between 1 and 2, 1 and 3, and 2 and 3 it is lower but acceptable.
The task is fairly objective, but somewhat subjective.
We mainly look at averages in what follows.
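For reference, a minimal sketch of the Pearson correlation used for the agreement figures above; the score arrays are illustrative only, not the actual evaluation data:

```java
// Minimal sketch of Pearson correlation between two evaluators' Likert scores.
public class PearsonAgreement {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt(n * sumX2 - sumX * sumX) * Math.sqrt(n * sumY2 - sumY * sumY);
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Likert scores (0-4) that two evaluators gave to the same terms (illustrative data).
        double[] evalA = {3, 4, 1, 0, 2, 4};
        double[] evalB = {2, 4, 1, 1, 3, 4};
        System.out.printf("Pearson r = %.2f%n", pearson(evalA, evalB));
    }
}
```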
17. Results: average scores
• Total average of 2.15 (“beetje relevant”+)
At threshold of 2: Precision = 0.61
At threshold of 3: Precision = 0.36
[Plot of average evaluator score (0-4) per term]
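A minimal sketch of how precision at a threshold can be read here, assuming a term counts as correct when its average evaluator score is at least the threshold; the average scores below are illustrative, not the real data:

```java
import java.util.List;

// Minimal sketch of "precision at a threshold" over averaged Likert scores.
public class PrecisionAtThreshold {
    static double precisionAt(List<Double> averageScores, double threshold) {
        long correct = averageScores.stream().filter(s -> s >= threshold).count();
        return (double) correct / averageScores.size();
    }

    public static void main(String[] args) {
        List<Double> averages = List.of(3.7, 2.3, 0.3, 4.0, 1.7, 2.6, 3.0, 0.0);
        System.out.println("P@2 = " + precisionAt(averages, 2.0)); // 0.625
        System.out.println("P@3 = " + precisionAt(averages, 3.0)); // 0.375
    }
}
```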
18. Results per video
item      average
item 1    0.33
item 3    2.61
item 5    2.44
item 6    1.75
item 8    1.40
item 9    3.67
item 10   2.45
item 13   2.38
item 14   0.00 (!)
item 15   1.33
item 17   2.08
item 19   4.00 (!)
item 20   1.67
• For some videos we shouldn't do this
– Nederland in Beweging
– Metadata at Reeks (series) level
"Advice: exclude Level 1 programmes from keyword extraction, probably also from NER"
20. Evaluator remarks
• For some videos this shouldn’t be done
– Game shows, drama..
– Annotate at Reeks level
• Some axes seem to work better than others
– Persoonsnamen, Namen, Geografische namen
• More abstraction or combination would be helpful
– Semantic Clustering?
• Subtitles with * are song lyrics
• Still a need for time-coded terms
21. Conclusion and current steps
• Limited evaluation
• But it works (precision 0.61)
– With some tweaks, to 0.7-0.8
• Lower threshold for NEs, higher for Subjects
• Better Elasticsearch matching
– With semantic clustering to 0.8-0.9?
• Currently re-implemented by Arjen as a proper service
• Re-use for annotating program guides
22. A huge thanks to the annotators for their valuable effort!!
Questions?