Automated Verb Sense Labelling Based on Linked Lexical Resources. Presentation by Judith Eckle-Kohler at EACL 2014 in Gothenburg. Joint work with Kostadin Cholakov and Iryna Gurevych.

Kostadin Cholakov, Judith Eckle-Kohler and Iryna Gurevych
Automated Verb Sense Labelling
Based on Linked Lexical Resources

Outline
Automated Verb Sense Labelling in a Nutshell
Evaluation
Take Home Messages
April 28, 2014 | Computer Science Department | UKP Lab | Prof. Iryna Gurevych | Dr. Judith Eckle-Kohler

Motivation
 Sense-annotated corpora are important resources in NLP
 usually created manually, which is time-consuming and expensive
 verbs have more senses, and thus annotating verb senses is more difficult
Solution
 Using a large-scale linked lexical resource (UBY) for creating data annotated with verb senses automatically

Linking Lexical Resources at the Word Sense Level – example: UBY
[Figure: lexical resources linked in UBY; recoverable labels: Web 2.0, IMSLex-Subcat, UBY]

Open Source Java API: http://code.google.com/p/uby/

Automated Verb Sense Labelling: Approach
 UBY: verb sense patterns derived from lexical information
 Corpus: verb sense patterns derived from verb instances
 A similarity metric compares the two kinds of patterns

Step 1: Creation of sense patterns from enriched senses
WN ask%2:32:01 (make a request or demand for something to somebody)
is linked to FN Id 639 (request to do or give something):
As twenty are required it might pay to ask your supplier for a "bulk discount".
UBY: [ask%2:32:01] be PP VV to ask person for a JJ act
 Sense enrichment: predicate argument structure information

Step 2: Automated Labelling based on Pattern Similarity
WN ask%2:32:01 is linked to FN Id 639:
As twenty are required it might pay to ask your supplier for a "bulk discount".
UBY: [ask%2:32:01] be PP VV to ask person for a JJ act
Corpus instance:
he would n't be pleased if a rumdum like me were to ask his daughter for a date
Corpus: if PP be to ask person for a time
Similarity score: 0.217 > threshold
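
The labelling decision on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `assign_sense`, the second sense entry and its score are hypothetical, picking only the single best-scoring sense is an assumption, and the threshold 0.1 is borrowed from the value given for the extrinsic evaluation.

```python
from typing import Dict, Optional


def assign_sense(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-scoring sense if its score exceeds the threshold.

    `scores` maps a sense key to the highest similarity between the corpus
    pattern and any sense pattern of that sense. Returns None when no sense
    scores above the threshold, i.e. the instance stays unlabelled
    (assumption: only the single best sense is considered).
    """
    if not scores:
        return None
    best_sense = max(scores, key=scores.get)
    return best_sense if scores[best_sense] > threshold else None


# 0.217 is the score from the slide; the second entry is made up for illustration.
scores = {"ask%2:32:01": 0.217, "hypothetical_other_sense": 0.05}
label = assign_sense(scores, threshold=0.1)
```

Raising the threshold trades coverage for precision: fewer verb instances receive a label, but the labelled ones match a sense pattern more closely.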

Step 2: Automated Labelling based on Pattern Similarity
 Using a similarity metric to compare patterns derived from UBY and patterns derived from verb instances found in corpora
 Considers the common bi-, tri-, and four-grams of two patterns, where:
 w >= 1 is the window around the verb
 G_n(p_i) is the set of n-grams occurring in pattern p_i
 Takes word order into account!
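
The slide text does not preserve the metric's formula, so the following Python sketch is only an assumption about how such a metric could look: a Dice-style overlap of the ordered bi-, tri- and four-grams found within a window of w tokens around the verb. The function names, the default w = 7 and the normalisation are hypothetical, and the sketch is not expected to reproduce the 0.217 score shown for the "ask" example.

```python
from typing import List, Set, Tuple


def window(tokens: List[str], verb: str, w: int) -> List[str]:
    """Keep the verb plus at most w tokens on each side of its first occurrence.

    Raises ValueError if the verb does not occur in the pattern.
    """
    if w < 1:
        raise ValueError("w must be >= 1")
    i = tokens.index(verb)
    return tokens[max(0, i - w): i + w + 1]


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """G_n(p): the set of contiguous, ordered n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(p1: str, p2: str, verb: str, w: int = 7) -> float:
    """Dice-style overlap of bi-, tri- and four-grams (illustrative only)."""
    t1 = window(p1.split(), verb, w)
    t2 = window(p2.split(), verb, w)
    common = 0
    total = 0
    for n in (2, 3, 4):
        g1, g2 = ngrams(t1, n), ngrams(t2, n)
        common += len(g1 & g2)       # n-grams shared by both patterns
        total += len(g1) + len(g2)   # normalisation term (assumption)
    return 2 * common / total if total else 0.0


uby_pattern = "be PP VV to ask person for a JJ act"
corpus_pattern = "if PP be to ask person for a time"
score = similarity(uby_pattern, corpus_pattern, verb="ask")
```

Because n-grams are ordered tuples, two patterns containing the same tokens in a different order share no n-grams, which is one simple way to take word order into account.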

Intrinsic Evaluation
 Evaluation for occurrences of Senseval-3 verbs in SemCor (152 verbs)
 Ca. 33,000 sense patterns generated from WN-FN-WKT for these verbs
 Evaluated with various similarity thresholds t

Extrinsic Evaluation – Experimental Setup
Comparison of two supervised classifiers for verb sense disambiguation:
1. Trained on an automatically labelled corpus (ALC):
 Verb senses for the test verbs in MASC and Senseval-3 are labelled in a large web corpus with similarity threshold t=0.1
2. Trained on SemCor 3.0
Test data:
1. MASC corpus: 16 verbs annotated with WordNet 3.0 senses, 11,997 test instances
2. Senseval-3 dataset for all-words WSD: 152 verbs annotated with WordNet 3.0 senses, 442 test instances

Training Sets
[Bar chart: size of the training data for ALC and SemCor; axis from 0 to 400,000 instances]
 SemCor 3.0
 Ca. 22,000 training instances of the 16 MASC and 152 Senseval-3 verbs
 Automatically labelled corpus (ALC)
 Ca. 350,000 training instances of the 16 MASC and 152 Senseval-3 verbs

Classification
Preprocessing: POS tagging, dependency parsing and Named Entity recognition
 using the TreeTagger, the Stanford Parser and the Stanford Named Entity Recognizer from the DKPro Core component collection, http://dkpro-core-asl.googlecode.com
Features: lexical, syntactic and semantic features
Classification: a separate logistic regression classifier is trained for each of the test verbs, using WEKA, http://www.cs.waikato.ac.nz/ml/weka/

Performance of classifiers (accuracy) evaluated on MASC / Senseval-3
SemCor 3.0
 Evaluation on MASC: 50.23
 Evaluation on Senseval-3: 48.64 (45.20 with back-off)
Automatically labelled corpus (ALC)
 Evaluation on MASC: 49.00
 Evaluation on Senseval-3: 47.51 (43.24 with back-off)
MFS Baseline for the two test sets
1. MASC: 41.72
2. Senseval-3: 25.34
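
For readers unfamiliar with the most-frequent-sense (MFS) baseline, the sketch below shows one common way to compute it in Python. It assumes that the most frequent sense of each verb is estimated from a labelled sample; the slides do not say where the sense frequencies behind the reported baselines come from, and the function name and the toy sense labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Instance = Tuple[str, str]  # (verb lemma, sense label)


def mfs_baseline(train: List[Instance], test: List[Instance]) -> float:
    """Accuracy when every test instance gets its verb's most frequent sense.

    The most frequent sense is estimated from `train` (assumption); verbs
    unseen in `train` are counted as errors.
    """
    counts: Dict[str, Counter] = {}
    for verb, sense in train:
        counts.setdefault(verb, Counter())[sense] += 1
    mfs = {verb: c.most_common(1)[0][0] for verb, c in counts.items()}
    correct = sum(1 for verb, gold in test if mfs.get(verb) == gold)
    return correct / len(test) if test else 0.0


# Toy data with made-up sense labels, for illustration only.
train = [("ask", "s1"), ("ask", "s1"), ("ask", "s2"), ("run", "r1")]
test = [("ask", "s1"), ("ask", "s2"), ("run", "r1"), ("run", "r2")]
accuracy = mfs_baseline(train, test)
```

A classifier is only informative if it beats this baseline, which is why the accuracies above are reported alongside it.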

Extrinsic Evaluation – effect of sense enrichment
 Best results with the combination WordNet-FrameNet-Wiktionary
 WordNet-FrameNet achieves similar accuracy but the coverage is lower
 WordNet-FrameNet-Wiktionary-VerbNet achieves lower accuracy
 Using WordNet only achieves the lowest coverage and accuracy

Take Home Messages
Linked Lexical Resources such as UBY are knowledge bases …
 … that can be used to perform automated verb sense labelling
 the automatically labelled data can successfully be used to train supervised Machine Learning systems: Distant / Weak Supervision
 This is due to the enriched sense representation for word senses that are interlinked
 Particularly useful for languages such as German where lexical resources are available but no sense-labelled data exist.

Thank You!
Questions?

Training Data Coverage
Coverage of WN senses annotated in MASC in the training data:
 There are 22 WN senses with instances in MASC which are not found in SemCor
 There are 34 WN senses with instances in MASC which are not found in the ALC
 The VSD system cannot correctly classify instances of those senses
 The coverage of the WN senses annotated in the test sets by the training data constitutes the upper bound of our classifiers:
 ALC: 0.8805 (increasing the size of the ALC does not help)
 SemCor: 0.948
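
The idea of coverage as an upper bound can be illustrated with a short Python sketch. It assumes that coverage is measured over test instances, i.e. as the share of test instances whose gold sense occurs at least once in the training data; the slides do not spell out the exact definition, and the function name and toy labels are hypothetical.

```python
from typing import Iterable, Sequence


def sense_coverage(train_senses: Iterable[str], test_gold: Sequence[str]) -> float:
    """Share of test instances whose gold sense was seen in training.

    A classifier can only predict senses it has seen, so this value is an
    upper bound on its accuracy (instance-level definition is an assumption).
    """
    seen = set(train_senses)
    if not test_gold:
        return 0.0
    return sum(1 for sense in test_gold if sense in seen) / len(test_gold)


# Toy sense labels, for illustration only.
coverage = sense_coverage(["a", "b", "b"], ["a", "a", "b", "c"])
```

Under this reading, adding more training instances raises the bound only if they introduce senses that were previously unseen.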

Comparison with other systems for verb sense disambiguation
 State-of-the-art supervised system (Chen and Palmer, 2009) on Senseval-2 data:
 0.648 accuracy, MFS baseline: 0.407
 Not comparable due to different versions of WordNet used
 Best performing Lesk-based system (Miller et al., 2012):
 33.86% accuracy for the MASC verbs
 30.16% accuracy for the Senseval-3 verbs
