Automated Verb Sense Labelling Based on Linked Lexical Resources. Presentation by Judith Eckle-Kohler at EACL 2014 in Gothenburg. Joint work with Kostadin Cholakov and Iryna Gurevych.
Presentation of paper:
Automated Verb Sense Labelling Based on Linked Lexical Resources by Kostadin Cholakov, Judith Eckle-Kohler and Iryna Gurevych. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), pp. 68-77, Association for Computational Linguistics, April 2014.
1. 1
Kostadin Cholakov, Judith Eckle-Kohler and Iryna Gurevych
Automated Verb Sense Labelling
Based on Linked Lexical Resources
2. 2
Outline
Automated Verb Sense Labelling in a Nutshell
Evaluation
Take Home Messages
April 28, 2014 | Computer Science Department | UKP Lab Prof. Iryna Gurevych | Dr. Judith Eckle-Kohler
3. 3
Motivation
Motivation
Sense-annotated corpora are important resources in NLP
They are usually created manually, which is time-consuming and expensive
Verbs have more senses, so annotating verb senses is more difficult
Solution
Use a large-scale linked lexical resource (UBY) to create data annotated with verb senses automatically
4. 4
Linking Lexical Resources at the Word Sense Level (example: UBY)
[Diagram of the resources linked in UBY; recoverable labels: Web 2.0, IMSLex-Subcat]
5. 5
Open Source Java API: http://code.google.com/p/uby/
6. 6
Automated Verb Sense Labelling: Approach
UBY: verb sense patterns derived from lexical information
Corpus: verb sense patterns derived from verb instances
A similarity metric compares the two kinds of patterns
7. 7
Step 1: Creation of sense patterns from enriched senses
WN ask%2:32:01 (make a request or demand for something to somebody) is linked to FN Id 639 (request to do or give something):
As twenty are required it might pay to ask your supplier for a "bulk discount".
UBY: [ask%2:32:01] be PP VV to ask person for a JJ act
8. 8
Step 1: Creation of sense patterns from enriched senses
9. 9
Step 1: Creation of sense patterns from enriched senses
[Figure; recoverable labels: sense enrichment, predicate-argument structure information]
10. 10
Step 2: Automated Labelling based on Pattern Similarity
WN ask%2:32:01 is linked to FN Id 639:
As twenty are required it might pay to ask your supplier for a "bulk discount".
UBY: [ask%2:32:01] be PP VV to ask person for a JJ act
Corpus instance: he would n't be pleased if a rumdum like me were to ask his daughter for a date
Corpus: if PP be to ask person for a time
Similarity score: 0.217 > threshold, so the corpus instance is labelled with ask%2:32:01
11. 11
Step 2: Automated Labelling based on Pattern Similarity
A similarity metric compares patterns derived from UBY with patterns derived from verb instances found in corpora
The metric considers the bi-, tri-, and four-grams that two patterns have in common:
[Formula shown as an image on the slide]
Because it is based on n-grams, it takes word order into account
w >= 1 is the window around the verb
G_n(p_i) is the set of n-grams occurring in pattern p_i
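The n-gram comparison described above can be sketched in Python. The slide's exact formula is not preserved in this transcript, so the Dice-style normalisation below is a stand-in, not the authors' metric; the function names and the default window size `w=7` are likewise assumptions made for illustration.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return G_n(p): the set of contiguous n-grams of a tokenised pattern."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def window(tokens: List[str], verb: str, w: int) -> List[str]:
    """Keep at most w tokens on each side of the first occurrence of the verb."""
    if verb not in tokens:
        return tokens
    i = tokens.index(verb)
    return tokens[max(0, i - w): i + w + 1]


def pattern_similarity(p1: List[str], p2: List[str], verb: str, w: int = 7) -> float:
    """Share of common bi-, tri- and four-grams of two patterns.

    ASSUMPTION: Dice-style normalisation (2 * shared / total); the
    normalisation used on the slide is not available here.
    """
    a, b = window(p1, verb, w), window(p2, verb, w)
    shared = 0
    total = 0
    for n in (2, 3, 4):
        g1, g2 = ngrams(a, n), ngrams(b, n)
        shared += len(g1 & g2)
        total += len(g1) + len(g2)
    return 2 * shared / total if total else 0.0


# Patterns taken from the "ask" example on the slides.
uby_pattern = "be PP VV to ask person for a JJ act".split()
corpus_pattern = "if PP be to ask person for a time".split()
score = pattern_similarity(uby_pattern, corpus_pattern, verb="ask")
print(score)
```

Since the normalisation is only a stand-in, the score printed here is not expected to reproduce the 0.217 reported for this pair on the slide. What the sketch does share with the description is the sensitivity to word order: reversing one pattern removes all common n-grams.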
12. 12
Outline
Automated Verb Sense Labelling in a Nutshell
Evaluation
Take Home Messages
13. 13
Intrinsic Evaluation
Evaluation on occurrences of the Senseval-3 verbs in SemCor (152 verbs)
Approx. 33,000 sense patterns generated from WN-FN-WKT for these verbs
Various similarity thresholds t
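A sweep over similarity thresholds can be sketched as follows. The slide does not state which measures were reported, so precision and coverage are assumed here; the tuple layout, the function name and the toy sense labels are hypothetical, and each instance is assumed to already carry the score of its best-matching sense pattern.

```python
from typing import List, Tuple

# (similarity of the best-matching sense pattern, predicted sense, gold sense)
Scored = Tuple[float, str, str]


def evaluate_at_threshold(instances: List[Scored], t: float) -> Tuple[float, float]:
    """Return (precision, coverage) when only instances with score > t are labelled."""
    labelled = [(pred, gold) for score, pred, gold in instances if score > t]
    coverage = len(labelled) / len(instances) if instances else 0.0
    correct = sum(pred == gold for pred, gold in labelled)
    precision = correct / len(labelled) if labelled else 0.0
    return precision, coverage


# Toy data, invented for illustration only.
toy = [
    (0.30, "s1", "s1"),
    (0.15, "s1", "s2"),
    (0.05, "s2", "s2"),
    (0.22, "s2", "s2"),
]
for t in (0.1, 0.2):
    print(t, evaluate_at_threshold(toy, t))
```

A higher threshold labels fewer instances, so coverage can only fall as t rises, while precision on the labelled instances may rise.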
14. 14
Extrinsic Evaluation – Experimental Setup
Comparison of two supervised classifiers for verb sense disambiguation:
1. Trained on an automatically labelled corpus (ALC): senses of the test verbs from MASC and Senseval-3 are labelled in a very large web corpus with similarity threshold t = 0.1
2. Trained on SemCor 3.0
Test data:
1. MASC corpus: 16 verbs annotated with WordNet 3.0 senses, 11,997 test instances
2. Senseval-3 dataset for all-words WSD: 152 verbs annotated with WordNet 3.0 senses, 442 test instances
15. 15
Training Sets
[Bar chart: amount of training data, SemCor vs. ALC; axis from 0 to 400,000]
SemCor 3.0: approx. 22,000 training instances of the 16 MASC and 152 Senseval-3 verbs
Automatically labelled corpus (ALC): approx. 350,000 training instances of the 16 MASC and 152 Senseval-3 verbs
16. 16
Classification
Preprocessing: POS tagging, dependency parsing and named entity recognition, using the TreeTagger and the Stanford Parser and Named Entity Recognizer from the DKPro Core component collection, http://dkpro-core-asl.googlecode.com
Features: lexical, syntactic and semantic features
Classification: a separate logistic regression classifier is trained for each of the test verbs, using WEKA, http://www.cs.waikato.ac.nz/ml/weka/
17. 17
Performance of classifiers (accuracy) evaluated on MASC / Senseval-3
Trained on SemCor 3.0:
MASC: 50.23
Senseval-3: 48.64 (45.20 with back-off)
Trained on the automatically labelled corpus (ALC):
MASC: 49.00
Senseval-3: 47.51 (43.24 with back-off)
MFS baseline for the two test sets:
1. MASC: 41.72
2. Senseval-3: 25.34
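For reference, a most-frequent-sense (MFS) baseline can be computed along the following lines. This is a generic sketch: the slides do not say from which data the most frequent sense was estimated, so estimating it from a list of labelled training instances is an assumption, and the function name and toy sense labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Instance = Tuple[str, str]  # (verb lemma, sense label)


def mfs_baseline(train: List[Instance], test: List[Instance]) -> float:
    """Accuracy of always predicting each verb's most frequent training sense."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for verb, sense in train:
        counts[verb][sense] += 1
    # Most frequent sense per verb; verbs unseen in training get no prediction.
    mfs = {verb: c.most_common(1)[0][0] for verb, c in counts.items()}
    correct = sum(mfs.get(verb) == sense for verb, sense in test)
    return correct / len(test) if test else 0.0


# Toy data, invented for illustration only.
train = [("ask", "a1"), ("ask", "a1"), ("ask", "a2"), ("run", "r2")]
test = [("ask", "a1"), ("ask", "a2"), ("run", "r2"), ("go", "g1")]
print(mfs_baseline(train, test))
```

Test instances of verbs that never occur in the training list count as errors in this sketch.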
18. 18
Extrinsic Evaluation – effect of sense enrichment
Best results with the combination WordNet-FrameNet-Wiktionary
WordNet-FrameNet achieves similar accuracy but the coverage is lower
WordNet-FrameNet-Wiktionary-VerbNet achieves lower accuracy
Using WordNet only achieves the lowest coverage and accuracy
19. 19
Outline
Automated Verb Sense Labelling in a Nutshell
Evaluation
Take Home Messages
20. 20
Take Home Messages
Linked lexical resources such as UBY are knowledge bases that can be used to perform automated verb sense labelling
The automatically labelled data can successfully be used to train supervised machine learning systems (distant / weak supervision)
This is due to the enriched sense representation of word senses that are interlinked
Particularly useful for languages such as German, where lexical resources are available but no sense-labelled data exist
21. 21
Thank You!
Questions?
22. 22
Training Data Coverage
Coverage of WN senses annotated in MASC in the training data:
There are 22 WN senses with instances in MASC which are not found in SemCor
There are 34 WN senses with instances in MASC which are not found in the ALC
The VSD system cannot correctly classify instances of those senses
The coverage of the WN senses annotated in the test sets by the training data constitutes the upper bound for our classifiers:
ALC: 0.8805 (increasing the size of the ALC does not help)
SemCor: 0.948
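The coverage-based upper bound can be illustrated with a short sketch. It assumes coverage is measured over test instances (the share whose gold sense occurs at least once in the training data); the slides do not say whether the reported figures are per instance or per sense type, and the function name and toy sense labels are hypothetical.

```python
from typing import Iterable, List


def coverage_upper_bound(train_senses: Iterable[str], test_senses: List[str]) -> float:
    """Share of test instances whose gold sense was seen in training.

    A classifier can only predict senses it has seen, so this share
    bounds its achievable accuracy from above.
    """
    seen = set(train_senses)
    covered = sum(sense in seen for sense in test_senses)
    return covered / len(test_senses) if test_senses else 0.0


# Toy data, invented for illustration only.
print(coverage_upper_bound(["a1", "a2"], ["a1", "a1", "a3", "a2"]))
```

Adding more training instances of already covered senses leaves this bound unchanged; only instances of previously unseen senses can raise it.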
23. 23
Comparison with other systems for verb sense disambiguation
State-of-the-art supervised system (Chen and Palmer, 2009) on Senseval-2 data:
0.648 accuracy, MFS baseline: 0.407
Not comparable due to different versions of WordNet used
Best performing Lesk-based system (Miller et al., 2012):
33.86% accuracy for the MASC verbs
30.16% accuracy for the Senseval-3 verbs