Exploring Higher Order Dependency Parsers
Pranava Swaroop Madhyastha
Supervised by: Prof. Michael Rosner & RNDr. Daniel Zeman
September 6, 2011
Introduction
◮ Dependency Grammar
◮ Binary asymmetric relations between a head and a modifier; highly lexical relationships
◮ A quick example
◮ Projective constraint
◮ Graph-based dependency parsing
◮ Arc-factored parsing (a scoring sketch follows below)
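A minimal sketch of the arc-factored idea in Python: the score of a candidate tree decomposes into a sum of independent per-arc scores. The head-array representation and the function names are illustrative, not taken from the thesis implementation.

    # A tree is a head array: heads[m] is the head index of token m;
    # index 0 is the artificial root.
    def score_arc(sentence, h, m):
        """Per-arc score; in the real model this is w . phi(x, h, m)."""
        return 0.0  # placeholder for a learned scoring function (assumption)

    def score_tree(sentence, heads):
        """Arc-factored model: the tree's score is the sum over its arcs."""
        return sum(score_arc(sentence, heads[m], m)
                   for m in range(1, len(sentence)))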
Problem Description
◮ Augmentation of features
  ◮ Semantic features
  ◮ Morpho-syntactic features
◮ Higher order parsing
  ◮ Context availability: horizontal and vertical
◮ Motivation
  ◮ Semi-supervised dependency parsing and its improvement
  ◮ Using well-defined linguistic components
What is Higher Order Dependency Parsing?
◮ First-order model: decomposition of the tree into head-modifier dependencies.
◮ Second-order models: include the sibling relation between modifier tokens along with the head and modifier, or include the head, the modifier, and the children of the modifier (grandchildren).
◮ Third-order models: extend the context one level further, e.g. grand-siblings.
◮ An illustration (sketched in code below)
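To make the part decompositions concrete, here is a hedged Python sketch that enumerates first-order (head, modifier) arcs, second-order horizontal (head, sibling, modifier) parts, and second-order vertical (grandparent, head, modifier) parts from a head array; third-order grand-sibling parts combine the two. All names are illustrative.

    def first_order_parts(heads):
        # (head, modifier) arcs; index 0 is the artificial root
        return [(heads[m], m) for m in range(1, len(heads))]

    def sibling_parts(heads):
        # Second order, horizontal: (head, previous sibling, modifier)
        # for adjacent same-side siblings; None marks the innermost modifier.
        parts = []
        for h in range(len(heads)):
            left = [m for m in range(h - 1, 0, -1) if heads[m] == h]
            right = [m for m in range(h + 1, len(heads)) if heads[m] == h]
            for mods in (left, right):
                prev = None
                for m in mods:
                    parts.append((h, prev, m))
                    prev = m
        return parts

    def grandchild_parts(heads):
        # Second order, vertical: (grandparent, head, modifier) chains
        return [(heads[heads[m]], heads[m], m)
                for m in range(1, len(heads)) if heads[m] != 0]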
Features
◮ For a feature vector φ and a weight vector w, each part p of a sentence x is scored as

    Part(x, p) = w · φ(x, p)    (1)

◮ Each contributing feature vector is scored by computing individual features of this form:
  ◮ dir.pos(h).pos(m)
  ◮ dir.form(h).pos(m)
  ◮ and so on ...
◮ The most basic feature patterns consider the surface form, part-of-speech, lemma, and other morphosyntactic attributes of the head or the modifier of a dependency (see the sketch below).
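A minimal sketch of how such templates can be instantiated and scored against a sparse weight dictionary w. Only the template patterns (dir.pos(h).pos(m), dir.form(h).pos(m)) come from the slide; the Token type and helper names are hypothetical.

    from collections import namedtuple

    # Hypothetical token type; the real system reads forms/POS from the treebank.
    Token = namedtuple('Token', ['form', 'pos'])

    def phi(sentence, h, m):
        """Instantiate a few first-order templates for the arc (h, m)."""
        head, mod = sentence[h], sentence[m]
        d = 'R' if m > h else 'L'                       # dir: attachment direction
        return [f'{d}.pos:{head.pos}.pos:{mod.pos}',    # dir.pos(h).pos(m)
                f'{d}.form:{head.form}.pos:{mod.pos}',  # dir.form(h).pos(m)
                f'{d}.form:{head.form}.form:{mod.form}']

    def score_part(w, sentence, h, m):
        """Part(x, p) = w . phi(x, p): sum the weights of the active features."""
        return sum(w.get(f, 0.0) for f in phi(sentence, h, m))

    # Usage: a two-word sentence with an artificial root token.
    s = [Token('<root>', 'ROOT'), Token('John', 'NNP'), Token('sleeps', 'VBZ')]
    assert score_part({'L.pos:VBZ.pos:NNP': 1.5}, s, 2, 1) == 1.5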
Experimentation done with:
◮ English: Penn Treebank
  ◮ Training set: Sections 2 to 10, a set of 15,000 sentences.
  ◮ Development set: random sentences from Sections 15, 17, 19, and 25, a set of 1,000 sentences.
  ◮ Test set: chosen from Sections 0, 1, 21, and 23, a set of 2,000 sentences.
◮ Czech: Prague Dependency Treebank
  ◮ Sentences chosen from the pdt2-full-automorph dataset.
  ◮ Training set: the train1 to train5 splits, a set of 15,000 sentences.
  ◮ Development set: the train6 and train7 splits, a set of 1,000 sentences.
  ◮ Test set: the dtest and etest parts, a set of 2,000 sentences.
Experimentation
◮ Fine- and coarse-grained wordsenses
◮ Approximation
◮ For English (a sketch follows below):
  ◮ Both fine- and coarse-grained wordsense extraction make use of the WordNet::SenseRelate package.
  ◮ A fine-grained wordsense restricts a word to a particular sense: word → noun, first sense (extracted from WordNet).
  ◮ A coarse-grained wordsense is a more generic description: word → the semantic file to which the word belongs.
◮ For Czech:
  ◮ Only (approximately) fine-grained wordsense extraction.
  ◮ Extracted using the sempos attribute already annotated in the Prague Dependency Treebank.
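For illustration, a hedged approximation of the two granularities using NLTK's WordNet interface rather than the WordNet::SenseRelate Perl package actually used in the experiments: the first listed sense stands in for the fine-grained sense, and the lexicographer ("semantic") file for the coarse-grained one.

    from nltk.corpus import wordnet as wn

    def fine_sense(word, pos=wn.NOUN):
        """Fine-grained: restrict the word to its first WordNet sense."""
        synsets = wn.synsets(word, pos=pos)
        return synsets[0].name() if synsets else None    # e.g. 'dog.n.01'

    def coarse_sense(word, pos=wn.NOUN):
        """Coarse-grained: the semantic (lexicographer) file of that sense."""
        synsets = wn.synsets(word, pos=pos)
        return synsets[0].lexname() if synsets else None  # e.g. 'noun.animal'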
Results for the wordsense augmentation experiment
◮ Sibling-based parsers show a statistically significant improvement.
◮ English, fine-grained wordsenses: the third-order grand-sibling parser improves by +0.81 percent (unlabeled attachment score). A closer statistical examination showed that sibling interactions close to each other have better precision.
◮ English, coarse-grained wordsenses: the second-order sibling parser improves by approximately +1.09 percent.
◮ Czech, fine-grained wordsenses: the third-order sibling parser improves by approximately +1.20 percent.
Results for the morphosyntactic augmentation experiment
◮ Morphosyntactic tags were used directly, extracted from the corpus.
◮ For Czech, instead of the full 15-position positional tagset, we tried a subset (Person, Number, PossGender, Tense, Voice, and Case); see the sketch below.
◮ For English, we integrated the fine-grained part-of-speech tags.
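A sketch of the Czech tag reduction, assuming the standard PDT 15-position positional tag layout; the position indices below are an assumption to verify against the PDT documentation, and the function name is illustrative.

    # Assumed PDT positions (1-based): 4 Number, 5 Case, 6 PossGender,
    # 8 Person, 9 Tense, 12 Voice.
    SUBSET = {'Number': 4, 'Case': 5, 'PossGender': 6,
              'Person': 8, 'Tense': 9, 'Voice': 12}

    def reduce_tag(tag):
        """Keep only the selected positions of a 15-character PDT tag."""
        return ''.join(tag[i - 1] for i in sorted(SUBSET.values()))

    # e.g. reduce_tag('VB-S---3P-AA---') -> 'S--3PA'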
Results
◮ For both English and Czech, there is a significant improvement in parsing accuracy with the grandchild-based algorithms.
◮ For Czech, the third-order grand-sibling algorithm shows an improvement of +1.72 percent.
◮ For English, the third-order grand-sibling algorithm shows an improvement of +1.21 percent.
Conclusion
◮ Semantic features work better with sibling-based parsers (larger horizontal contexts).
◮ Morpho-syntactic features work better with grandchild-based parsers (larger vertical contexts).
◮ These features can be instrumental in several related tasks, including accurate semantic role labeling.
◮ Linguistic information can be better handled by a higher order parsing algorithm.
Future Work
◮ Higher order parsers with labels (labeled accuracy scores have not yet been tested).
◮ Joint extraction of wordsenses and semantic roles.
◮ Experimentation with lexical clusters.
◮ Thorough experimentation with several features.
◮ Maximum and minimum order requirements.