2. Outline
1. MaltParser
2. Transition Based Parsing
a. Example
b. Oracle
3. Integrating Graph and Transition Based
4. Non βProjective Dependency Parsing
3. MaltParser
β Different Languages ( No tuning for an Specific Lang)
β Language independent: accurate parsing for a wide
variety of languages
β Accuracy between 80% and 90%
β Deterministic
Treebank MaltParser Dependency Parser
( Transition Based )
input output
15. Oracle
β Greedy Algorithm, choose a local optimal hoping it will lead
to the global optimal
β It makes Transition Based Algorithm Deterministic.
β Originally there might be more than one possible transition from
one configuration to another
β Construct the Optimal Transition sequence for the
Input Sentence
β How to Build the Oracle? Build a Classifier
16. Classifier
The Classifier
Classes:
β Shift
β Left-arc
β Right-arc
β Reduction
Feature Vector (Features)
β POS of words in the Buffer
and Stack
β Words themselves
β The First Word in the Stack
β The L World in the Buffer
β The current arcs in the
Graph
17. Results of the MaltParser
β Evaluation Metrics:
β ASU (Unlabeled Attachment Score): Proportion of Tokens assigned
the correct head
β ASL(Labeled Attachment Score): Proportion of tokens assigned with
the correct head and the correct dependency type
18. Results of the MaltParser
More flexible Word order
Rich Morphology
More Inflexible Word order,
βpoorβ Morphology
English
Chinese
Czech
Turkish
Danish
Dutch
Italian
Swedish
German
Goal ->
Evaluate if Maltparser can do reasonably accurate parsing for a
wide variety of languages
20. Results of the MaltParser
β Results:
β Above 80% unlabeled dependency Accuracy (ASU) for all languages
β morphological richness and word order are the cause of variation
across languages
In General lower accuracy for languages like Czech and Turkish.
β There are more non-projective structures in those languages
β It is difficult to do Cross-Language Comparison:
β Big difference in the amount of annotated data
β existence of accurate POS Taggers..
State of the art for Italian, Swedish, Danish, Turkish
21. Graph Based vs Transition Based
Graph Based
β Search for Optimal Graph
(Highest Scoring Graph)
β Globally Trained(Global
Optimal)
β Limited History of Parsing
Desitions
β Less rich feature
representation
Transition Based
β Search for Optimal Graph by finding
the best transition between two
states. (Local Optimal Desitions)
β Locally Trained (configurations)
β Rich History of Parsing Desitions
β More rich feature but Error
Propagation (Greedy Alg.)
22. Graph Based vs Transition Based
Graph Based (MST)
β Better for Long
Dependencies
β More accurate for
dependents that are :
β Verbs
β Adjectives
β Adverbs
Transition Based(Malt)
β Better for Short dependencies
β More accurate for dependents
that are:
β Nouns
β Pronouns
Integrate Both Approaches
23. Integrating Graph and Transition Based
Treebank T Malt Parser Transition Based
Parser
Parsed T
β Integrate both approaches at learning time.
MST Parser
β Base MSTParser guided by Malt
Treebank T MST Parser Transition Based
Parser
Malt Parser
β Base MALTParser guided by MLT
Parsed T
24. Features used in the Integration
β MSTParser guided by
Malt
β Is arc (π, π,β) in πΊ ππππ‘
β Is arc (π, π, π) in πΊ ππππ‘
β Is arc π, π,β πππ‘ in πΊ ππππ‘
β Identity of πβ such that
π, π, πβ² is in πΊ ππππ‘
β ..
MaltParser guided by MST
β Is arc (π0
, π΅0
,β) in πΊ ππ π‘
β Is arc (π΅0, π0,β) in πΊ ππ π‘
β Head direction of π΅0 in πΊ ππ π‘
(left,right,root..)
β Identity of πβ such that β, π΅0, πβ²
is in πΊ ππ π‘
π0
=fist element of the Stack, π΅0
=First element of the Buffer
28. Results of Integration
β Graph-based models predict better long arcs
β Each model learn streghts from the others
β The integration actually improves accuracy
β Trying to do more chaining of systems do not
gain better accuracy
29. Non-Projectivity
β Some Sentences have long distance dependencies which
cannot be parsed with this algorithm
β Cause it only consider relations between neighbors words
β 25% or more of the sentences in some languages are non-
projective
β Useful for some languages with less constraints on word
order
β Harder Problem, There could be relations over unbounded
distances.
30. Non-Projectivity
A dependency Tree π is Projective:
if for every π΄ππ (ππ, ππ, πππ) there is a path from ππ to ππ , if ππ
is between ππ and ππ
From βScheduledβ π2 there is an arc to π5 however there is no
way to get to π4, π3 from π2
31. Non-Projectivity
β Why the previous transition algorithm would not be able to
generate this tree?
Stack Buffer
is
hearing
On
β¦
β¦
βisβ can never be reduced
βhearingβ and βonβ will never
get an arc
32. Handling Non-Projectivity
β Add a new Transition β ββSwapββ
Stack Buffer
ππ
ππ
ππ+1
Stack Buffer
ππ
ππππ+1
swap
β Re-Order the initial Input Sentance
34. Non-Projective Dependency Parsing
β Useful for some languages with less constraints on word
order
Theoretically
β Best case π(π), , that is: no swaps
β Worst Case π(π2
),
35. Results
Non-Projective Dependency Parsing
Running Time
β Test on 5 languages( Danish, Arabic, Czech, Slovene, Turkish)
β In practice the running time is π π .
Parsing Accuracy
β Criteria
β Attachment Score: Percentage of tokens with correct head and
dependency label
β Exact match: completely correct labeled dependency tree
36. Results
Non-Projective Dependency Parsing
β Systems Compared
β πΊ π= allowing Non Projective
β πΊ π =Just Projective
β πΊ ππ=Handling non-Projectivity as a pos-processing
β AS: Percentage of tokens with correct head and dependency label
β EM: completely correct labeled dependency tree
37. Results
Non-Projective Dependency Parsing
β AS
β Performance of π π’ is better for for:
β Czech and Slovene ο more non-porjective arcs in this languages.
β In AS π π’ is lower than π π, however the drop is not really significant
β For Arabic the results are not meaningful since there are only 11 non-
projective arcs in the whole set
β ME
β π π’ outperforms all other parsers.
β The positive effect of π π’ is dependent on the non-projectivity arcs in the
language
38.
39. References
β Joakim Nivre, Jens Nilsson, Johan Hall, Atanas Chanev, GΓΌlsen Eryigit, Sandra
KΓΌbler, Svetoslav Marinov, and Erwin Marsi. Maltparser: a language-
independent system for data-driven dependency parsing. Natural Language
Engineering, 13(1):1β41, 2007.
β Joakim Nivre and Ryan McDonald. Integrating graph-based and transition-
based dependency parsers. In Proceedings of ACL-08: HLT, pages 950β958,
Columbus, Ohio, June 2008.
β Joakim Nivre. Non-projective dependency parsing in expected linear time.
In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL
and the 4th International Joint Conference on Natural Language Processing of
the AFNLP, pages 351β359, Suntec, Singapore, 2009.
β Sandra KΓΌbler, Ryan McDonald, Joakim Nivre. Dependency Parsing, Morgan &
Claypool Publishers, 2009