Dependency Parsing
Jinho D. Choi
University of Colorado
Preliminary Exam
March 4, 2009
Contents
• Dependency Structure
  - What is dependency structure?
  - Phrase structure vs. Dependency structure
  - Dependency Graph
• Dependency Parsers
  - MaltParser: Nivre’s algorithm
  - MSTParser: Edmonds’s algorithm
  - MaltParser vs. MSTParser
  - Choi’s algorithm
• Applications
Dependency Structure
• What is dependency?
  - Syntactic or semantic relation between lexical items
  - Syntactic: NMOD, AMOD; Semantic: LOC, MNR
• Phrase Structure (PS) vs. Dependency Structure (DS)
  - Constituents vs. Dependencies
  - There are no phrasal nodes in DS.
     ‣ Each node in DS represents a word token.
  - In DS, every node except the root is dependent on exactly one
    other node.
Phrase vs. Dependency

  "She bought a car"

  Phrase Structure:                  Dependency Structure:
  (S (NP (Pro she))                  bought -SBJ-> she
     (VP (V bought)                  bought -OBJ-> car
         (NP (Det a) (N car))))      car -DET-> a

• Not flexible with word orders
• Language dependent
• No semantic information
Dependency Graph
• For a sentence x = w1..wn, a dependency graph Gx = (Vx, Ex)
  - Vx = {w0 = root, w1, ..., wn}
  - Ex = {(wi, r, wj) : wi → wj, wi ∈ Vx, wj ∈ Vx − {w0}, r ∈ Rx}
     ‣ Rx = the set of all possible dependency relations in x
• Well-formed Dependency Graph
  - Unique root
  - Single head
  - Connected
  - Acyclic

  [Example: root → bought; bought -SBJ-> She, bought -OBJ-> car;
   car -NMOD-> a]
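The four well-formedness conditions can be verified mechanically. A minimal sketch, assuming arcs are (head, dependent) pairs and `root` stands for the artificial root token w0:

```python
# Sketch of the well-formedness checks: unique root, single head,
# connected, acyclic. Not the parsers' actual data structures.

def is_well_formed(arcs, words, root='root'):
    heads = {}
    for h, d in arcs:
        if d in heads:                  # single head: one incoming arc per word
            return False
        heads[d] = h
    if sum(1 for h in heads.values() if h == root) != 1:
        return False                    # unique root: exactly one child of w0
    for w in words:                     # connected + acyclic: every word must
        seen = set()                    # reach the root by following head links
        while w != root:
            if w in seen or w not in heads:
                return False
            seen.add(w)
            w = heads[w]
    return True

# The "She bought a car" graph from the slide:
arcs = {('root', 'bought'), ('bought', 'She'), ('bought', 'car'), ('car', 'a')}
ok = is_well_formed(arcs, ['She', 'bought', 'a', 'car'])   # -> True
```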
Projectivity vs. Non-projectivity
• Projectivity means no crossing edges.

  Projective:     root She bought a car
  Non-projective: root She bought a car yesterday that was blue
                  (the arc from 'car' to 'that was blue' crosses the
                   arc from 'bought' to 'yesterday')

• Why projectivity?
  - The original sentence can be regenerated with the same word order
  - Projective parsing is less expressive but cheaper (O(n) vs. O(n²))
  - There are not many non-projective relations in practice
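The "no crossing edges" condition can be checked directly. A small sketch; the word indices and arc pairs below are illustrative, not any parser's actual representation:

```python
# Sketch of the projectivity test. Arcs are (head, dep) pairs of word
# positions (0 = root). Two arcs cross iff exactly one endpoint of one
# arc lies strictly inside the span of the other.

def is_projective(arcs):
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    for i, (a, b) in enumerate(spans):
        for c, d in spans[i + 1:]:
            if a < c < b < d or c < a < d < b:
                return False
    return True

# root(0) She(1) bought(2) a(3) car(4): projective
projective = {(0, 2), (2, 1), (2, 4), (4, 3)}

# "... car(4) yesterday(5) that(6) was(7) blue(8)": the arc car -> was
# crosses bought -> yesterday, so the graph is non-projective
crossing = {(2, 5), (4, 7)}
```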
Dependency Parsers
• Two state-of-the-art dependency parsers
  - MaltParser: performed the best in the CoNLL 2007 shared task
  - MSTParser: performed the best in the CoNLL 2006 shared task
• MaltParser
  - Developed by Johan Hall, Jens Nilsson, and Joakim Nivre
  - Nivre's algorithm (projective, O(n)), Covington's algorithm
    (non-projective, O(n²))
• MSTParser
  - Developed by Ryan McDonald
  - Eisner's algorithm (projective, O(k log k)), Edmonds's algorithm
    (non-projective, O(kn²))
Nivre's Algorithm
• Based on the shift-reduce algorithm
• S = a stack
• I = a list of remaining input tokens
• A = the set of dependency arcs built so far

  Example sentence: she bought a car
Nivre's Algorithm: parsing "she bought a car"

  Action                       S (stack)       I (input)               A (arcs)
  Initialize                   []              [she, bought, a, car]   {}
  Shift 'she'                  [she]           [bought, a, car]        {}
  Left-Arc (she ← bought)      []              [bought, a, car]        {she ← bought}
  Shift 'bought'               [bought]        [a, car]                {she ← bought}
  Shift 'a'                    [bought, a]     [car]                   {she ← bought}
  Left-Arc (a ← car)           [bought]        [car]                   + a ← car
  Right-Arc (bought → car)     [bought, car]   []                      + bought → car
  Terminate (no need to reduce 'car' or 'bought')
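The trace above can be written as a tiny oracle-driven shift-reduce parser. This is a simplified sketch: the hard-coded action list plays the role of the oracle, whereas MaltParser predicts each action with a trained classifier:

```python
# Sketch of Nivre-style shift-reduce parsing. Arcs are (head, dependent)
# pairs; the action sequence is the oracle trace from the slides.

def parse(tokens, actions):
    S, I, A = [], list(tokens), set()   # stack, input list, arc set
    for act in actions:
        if act == 'shift':              # push next input token onto the stack
            S.append(I.pop(0))
        elif act == 'left-arc':         # stack top becomes dependent of next input
            dep = S.pop()
            A.add((I[0], dep))
        elif act == 'right-arc':        # next input becomes dependent of stack
            dep = I.pop(0)              # top, and is itself pushed on the stack
            A.add((S[-1], dep))
            S.append(dep)
    return A

arcs = parse(['she', 'bought', 'a', 'car'],
             ['shift', 'left-arc', 'shift', 'shift', 'left-arc', 'right-arc'])
# arcs == {('bought', 'she'), ('car', 'a'), ('bought', 'car')}
```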
Edmonds's Algorithm
• Based on the Maximum Spanning Tree algorithm
• Algorithm
  1. Build a complete graph
  2. For each vertex, keep only the incoming edge with the maximum score
  3. If there is no cycle, go to #5
  4. If there is a cycle, contract it into a single vertex, update the
     scores of all edges entering the cycle, and go to #2
  5. Expand the contracted cycles, removing from each cycle the edge
     that would give a vertex multiple heads
Edmonds's Algorithm: example ("John saw Mary")

  1. Complete graph (scores, head → dependent):
       root → saw: 10,  root → John: 9,  root → Mary: 9
       John → saw: 20,  Mary → saw: 0
       saw → John: 30,  Mary → John: 3
       saw → Mary: 30,  John → Mary: 11
  2. Keep the maximum incoming edge per vertex:
       John → saw (20), saw → John (30), saw → Mary (30)
       ⇒ cycle between 'John' and 'saw'
  3. Contract {John, saw} into a single vertex; update scores of edges
     entering it: root → cycle: 40 (via saw), 29 (via John);
     Mary → cycle: 30; cycle → Mary keeps 30; root → Mary keeps 9
  4. Keep the maximum incoming edges again: root → cycle (40),
     cycle → Mary (30); no cycle remains
  5. Expand the cycle: the final tree is root → saw (10),
     saw → John (30), saw → Mary (30)
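The five steps can be sketched as a recursive Chu-Liu/Edmonds implementation. This is a simplified illustration, not MSTParser's actual code; the score-update rule used here (entering score + cycle weight − score of the cycle edge it replaces) reproduces the 40/29/30 values from the example:

```python
# Sketch of Chu-Liu/Edmonds maximum spanning arborescence.
# scores: {(head, dep): weight}; returns {dep: head}.

def find_cycle(heads):
    """Return a set of nodes forming a cycle in {dep: head}, or None."""
    for start in heads:
        seen, node = set(), start
        while node in heads and node not in seen:
            seen.add(node)
            node = heads[node]
        if node in seen:                      # revisited node lies on a cycle
            cycle, cur = set(), node
            while cur not in cycle:
                cycle.add(cur)
                cur = heads[cur]
            return cycle
    return None

def chu_liu_edmonds(scores, root):
    deps = {d for (_, d) in scores if d != root}
    # Step 2: keep only the highest-scoring incoming edge per vertex
    best = {d: max((h for (h, dd) in scores if dd == d),
                   key=lambda h: scores[(h, d)]) for d in deps}
    cycle = find_cycle(best)
    if cycle is None:                         # Step 3
        return best
    # Step 4: contract the cycle into a fresh vertex, rescore entering edges
    c = ('cycle', frozenset(cycle))
    cycle_weight = sum(scores[(best[d], d)] for d in cycle)
    new_scores, origin = {}, {}
    for (h, d), w in scores.items():
        if h in cycle and d in cycle:
            continue
        if d in cycle:                        # edge entering the cycle
            nw, key = w + cycle_weight - scores[(best[d], d)], (h, c)
        elif h in cycle:                      # edge leaving the cycle
            nw, key = w, (c, d)
        else:
            nw, key = w, (h, d)
        if key not in new_scores or nw > new_scores[key]:
            new_scores[key], origin[key] = nw, (h, d)
    tree = chu_liu_edmonds(new_scores, root)
    # Step 5: expand the cycle, dropping the internal edge that would
    # give the externally attached vertex a second head
    result = {}
    for d, h in tree.items():
        oh, od = origin[(h, d)]
        result[od] = oh
    entry = origin[(tree[c], c)][1]
    for d in cycle:
        if d != entry:
            result[d] = best[d]
    return result

scores = {('root', 'saw'): 10, ('root', 'John'): 9, ('root', 'Mary'): 9,
          ('John', 'saw'): 20, ('Mary', 'saw'): 0,
          ('saw', 'John'): 30, ('Mary', 'John'): 3,
          ('saw', 'Mary'): 30, ('John', 'Mary'): 11}
tree = chu_liu_edmonds(scores, 'root')
# tree == {'saw': 'root', 'John': 'saw', 'Mary': 'saw'}
```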
MaltParser vs. MSTParser
• Advantages
  - MaltParser: low complexity, more accurate on short-distance dependencies
  - MSTParser: high accuracy, more accurate on long-distance dependencies
• Merge MaltParser and MSTParser in the learning stage
Choi's Algorithm
• Projective dependency parsing algorithm
  - Motivation: search more exhaustively than MaltParser while keeping
    the complexity lower than MSTParser's

  - Intuition: in a projective dependency graph, every word can find
    its head in an adjacent phrase

    She bought a car yesterday that was blue

  - Searching: start from the edge node and jump to its head
  - Complexity: O(k·n), where k is the number of words in each phrase
Choi's Algorithm: worked example

  [Figure: a five-word sentence A B C D E parsed step by step. At each
  step, candidate head attachments are scored (values such as 0.9, 0.7,
  0.6, 0.5, 0.8 on the slides); lower-scoring competing edges are
  discarded (marked X), and the search jumps from the edge node to
  its head.]
Applications
• Semantic Role Labeling
  - CoNLL 2008 and 2009 shared tasks
• Sentence Compression
  - Relation extraction
• Sentence Alignment
  - Paraphrase detection, machine translation
• Sentiment Analysis
