8. Definition of Parsing
A parser is the component of a compiler or interpreter that breaks its input
into smaller elements and organizes them so they can be translated into
another language.
A parser takes input in the form of a sequence of tokens or program
instructions and usually builds a data structure such as a parse tree or an
abstract syntax tree (AST).
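To make the definition concrete, here is a minimal sketch. It assumes a toy arithmetic language (integers, identifiers, + - * / and parentheses) that the slides do not define. `tokenize` splits text into tokens, and a hand-written recursive-descent `Parser` turns those tokens into an abstract syntax tree. All names here are illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical toy language: integers, identifiers, + - * / and parentheses.
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|\S")
NAME_RE = re.compile(r"[A-Za-z_]\w*")


def tokenize(text: str) -> list[str]:
    """Split source text into tokens; whitespace is skipped."""
    return TOKEN_RE.findall(text)


# AST node types
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Num | Var | BinOp"
    right: "Num | Var | BinOp"


class Parser:
    """Recursive-descent parser for:
    expr   -> term (('+' | '-') term)*
    term   -> factor (('*' | '/') factor)*
    factor -> NUMBER | NAME | '(' expr ')'
    """

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r} at position {self.pos}")
        return node

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = BinOp(op, node, self.term())  # loop makes the operators left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdecimal():
            return Num(int(tok))
        if NAME_RE.fullmatch(tok):
            return Var(tok)
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError(f"expected ')' at position {self.pos - 1}")
            return node
        raise SyntaxError(f"unexpected token {tok!r} at position {self.pos - 1}")
```

For example, `Parser(tokenize("x + 2 * (y - 1)")).parse()` returns a `BinOp` for `+` whose right child is the `*` node, because `term` handles `*` one grammar level below `+`.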
9. Role of Parsers
• performs context-free syntax analysis
• guides context-sensitive analysis
• constructs an intermediate representation
• produces meaningful error messages
• attempts error correction
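A small sketch can illustrate the error-handling roles above. It assumes a hypothetical statement language `NAME = NUMBER ;`. The parser reports each error with the token position and what it expected. It then recovers in panic mode: it discards tokens up to the next `;` and continues. Panic mode is one common recovery strategy, chosen here for brevity. The slide does not prescribe a strategy, and genuine error correction (repairing the input) is more involved.

```python
import re

# Hypothetical toy language: statements of the form  NAME = NUMBER ;
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|\S")
NAME_RE = re.compile(r"[A-Za-z_]\w*")


class ParseError(Exception):
    """A syntax error carrying a human-readable message."""


def parse_statement(tokens, pos):
    """Parse NAME '=' NUMBER ';' at tokens[pos]; return (name, value, next_pos)."""
    def tok(i):
        return tokens[i] if i < len(tokens) else "<end of input>"

    if not NAME_RE.fullmatch(tok(pos)):
        raise ParseError(f"token {pos}: expected a variable name, got {tok(pos)!r}")
    if tok(pos + 1) != "=":
        raise ParseError(f"token {pos + 1}: expected '=', got {tok(pos + 1)!r}")
    if not tok(pos + 2).isdecimal():
        raise ParseError(f"token {pos + 2}: expected a number, got {tok(pos + 2)!r}")
    if tok(pos + 3) != ";":
        raise ParseError(f"token {pos + 3}: expected ';', got {tok(pos + 3)!r}")
    return tok(pos), int(tok(pos + 2)), pos + 4


def parse_program(text):
    """Parse all statements, collecting every error instead of stopping at the first."""
    tokens = TOKEN_RE.findall(text)
    pos, assignments, errors = 0, {}, []
    while pos < len(tokens):
        try:
            name, value, pos = parse_statement(tokens, pos)
            assignments[name] = value
        except ParseError as err:
            errors.append(str(err))
            # Panic mode: skip to the synchronizing token ';' and resume after it.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1
    return assignments, errors


print(parse_program("x = 1; y = ; z = 3;"))
# → ({'x': 1, 'z': 3}, ["token 6: expected a number, got ';'"])
```

Because the parser resynchronizes at `;`, the broken statement `y = ;` is reported and skipped, and `z = 3;` is still parsed.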
10. Parsing
• POS tags give information about individual words and their internal form
(e.g., singular vs. plural, verb tense)
• An additional level of information concerns how the words relate to each
other:
• the overall structure of each sentence
• the relationships between the words
• This can be achieved by parsing the corpus
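A hand-written example can show the difference between word-level and sentence-level information. It is not the output of any real tagger or parser. The POS tags (Penn Treebank style) describe each word on its own. The constituency tree, written as nested tuples, records which words group into phrases. `constituents` is a hypothetical helper that lists each phrase with its word span.

```python
# Word-level information: (word, POS tag) pairs with Penn Treebank-style tags.
tagged = [("the", "DT"), ("dogs", "NNS"), ("chased", "VBD"), ("a", "DT"), ("cat", "NN")]

# Sentence-level information: a hand-written constituency tree.
# Each node is (label, child, ...); a preterminal is (tag, word).
tree = ("S",
        ("NP", ("DT", "the"), ("NNS", "dogs")),
        ("VP", ("VBD", "chased"),
               ("NP", ("DT", "a"), ("NN", "cat"))))


def constituents(node, start=0):
    """Return ([(label, start, end), ...] for every phrase under node, end index)."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [], start + 1  # preterminal: covers exactly one word
    spans, pos = [], start
    for child in children:
        child_spans, pos = constituents(child, pos)
        spans.extend(child_spans)
    return [(label, start, pos)] + spans, pos


words = [word for word, _ in tagged]
spans, _ = constituents(tree)
for label, start, end in spans:
    print(label, " ".join(words[start:end]))
```

The loop prints `S the dogs chased a cat`, `NP the dogs`, `VP chased a cat` and `NP a cat`. The tags alone do not carry this grouping.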
11. Parsing Techniques
• Parsing adds information about sentence structure and constituents
• Allows us to see which constructions words enter into
• e.g., transitivity, passivization, and the argument structure of verbs
• Allows us to see how words function relative to each other
• e.g., which words can modify, or be modified by, other words
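A dependency analysis shows directly how words function relative to each other. The sketch below uses a hand-written analysis, not parser output, with Universal Dependencies-style relation labels. `dependents` is a hypothetical helper that groups each word's modifiers and arguments under its head. The result exposes both modification and verb argument structure: a verb with an `obj` dependent is being used transitively.

```python
from collections import defaultdict

# Hand-written dependency analysis of "The old dog chased a small cat":
# (id, word, head id, relation); head id 0 marks the root.
parse = [
    (1, "The", 3, "det"),
    (2, "old", 3, "amod"),
    (3, "dog", 4, "nsubj"),
    (4, "chased", 0, "root"),
    (5, "a", 7, "det"),
    (6, "small", 7, "amod"),
    (7, "cat", 4, "obj"),
]


def dependents(parse):
    """Map each head word to its (relation, dependent) pairs.

    Keys are word forms, so this toy version assumes no word repeats in the sentence.
    """
    words = {i: word for i, word, _, _ in parse}
    deps = defaultdict(list)
    for _, word, head, rel in parse:
        if head != 0:
            deps[words[head]].append((rel, word))
    return dict(deps)


deps = dependents(parse)
print(deps["chased"])  # arguments of the verb → [('nsubj', 'dog'), ('obj', 'cat')]
print(deps["cat"])     # modifiers of "cat"   → [('det', 'a'), ('amod', 'small')]
is_transitive = any(rel == "obj" for rel, _ in deps["chased"])
```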
12. Parsing Issues
• Besides lexical ambiguities (usually resolved by the tagger), language can
be structurally ambiguous
• global ambiguities due to ambiguous words and/or alternative possible
combinations
• local ambiguities, especially due to attachment ambiguities, and other
combinatorial possibilities
• sheer weight of alternatives available in the absence of (much) knowledge
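Counting parses makes attachment ambiguity concrete. This is a sketch of the CKY chart-parsing algorithm, modified to count trees instead of building them, over a tiny hypothetical grammar in Chomsky normal form. In "I saw the man with the telescope", the prepositional phrase can attach to the verb phrase or to "the man", so the count is 2. The count grows quickly as more prepositional phrases are added.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, start="S"):
    """Return the number of distinct trees for words rooted in `start` (CKY)."""
    n = len(words)
    # chart[i][j][A] = number of distinct trees with root A spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i][i + 1][tag] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # every split point
                for parent, left, right in BINARY:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


print(count_parses("I saw the man".split()))                     # → 1
print(count_parses("I saw the man with the telescope".split()))  # → 2
```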
13. Parsing strategies
• Start with a basic grammar, possibly written by hand, with all rules equally
probable
• Parse a small amount of text, then correct it manually
• this may involve correcting the trees and/or changing the grammar
• Learn new probabilities from this small treebank
• Parse another (similar) amount of text, then correct it manually
• Adjust the probabilities based on the old and new trees combined
• Repeat until the grammar stabilizes
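One standard way to learn rule probabilities from a small treebank is the relative-frequency (maximum-likelihood) estimate P(A → β) = count(A → β) / count(A). The slide does not name an estimator, so treat this choice as an assumption. The sketch reads productions off manually corrected trees written as nested tuples; the two-tree treebank is invented for illustration. Combining old and new trees, as in the second-to-last step, amounts to re-running the estimate on the concatenated treebank.

```python
from collections import Counter


def productions(tree):
    """Yield (lhs, rhs) rules from a tree written as (label, child, ...); leaves are strings."""
    label, *children = tree
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(rule for tree in treebank for rule in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# A tiny, invented treebank of manually corrected parses.
old_trees = [("S", ("NP", ("N", "dogs")), ("VP", ("V", "bark")))]
new_trees = [("S", ("NP", ("N", "cats")), ("VP", ("V", "chase"), ("NP", ("N", "mice"))))]

probs = estimate_pcfg(old_trees + new_trees)  # re-estimate on old and new trees combined
print(probs[("VP", ("V",))])       # → 0.5
print(probs[("VP", ("V", "NP"))])  # → 0.5
```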
16. Types of Parsing
Top-down parsers (LL(1), recursive descent)
• Start at the root of the parse tree and grow toward leaves
• Pick a production & try to match the input
• A bad pick may force the parser to backtrack
• Some grammars are backtrack-free (e.g., LL(1) grammars, where one token of
lookahead selects the production)
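The classic grammar S → c A d, A → a b | a can illustrate the top-down strategy, including backtracking after a bad pick. On input "cad", the parser first tries A → a b. It fails when it sees d instead of b, then backtracks to A → a. The generator-based design below is one illustrative way to get backtracking cheaply: each generator yields every way a symbol can match. It is not an LL(1) parser. Like any naive top-down parser, it would loop forever on a left-recursive rule such as E → E + T.

```python
# Toy grammar: nonterminals map to productions tried in order; anything else is a terminal.
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, tokens, pos):
    """Yield every position at which `symbol` can finish matching tokens from pos."""
    if symbol not in GRAMMAR:  # terminal: match exactly one token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:  # pick a production; later ones are tried on backtrack
        yield from match_sequence(production, tokens, pos)


def match_sequence(symbols, tokens, pos):
    """Yield every end position for matching the symbol sequence from pos."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], tokens, pos):
        yield from match_sequence(symbols[1:], tokens, mid)


def recognize(text, start="S"):
    """True if the whole input can be derived from the start symbol."""
    tokens = list(text)
    return any(end == len(tokens) for end in parse(start, tokens, 0))


print(recognize("cad"))   # → True (after backtracking from A -> a b to A -> a)
print(recognize("cabd"))  # → True
print(recognize("cd"))    # → False
```

Left-factoring the rule into A → a A', A' → b | ε gives an equivalent grammar in which one token of lookahead always selects the production, so no backtracking is needed.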
Bottom-up parsers (LR(1), operator precedence)
• Start at the leaves and grow toward root
• As input is consumed, encode possibilities in an internal state
• Start in a state valid for legal first tokens
• Bottom-up parsers handle a large class of grammars (every LL(1) grammar is
also LR(1), but not vice versa)
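Operator-precedence parsing, one of the bottom-up families named above, can illustrate a bottom-up parser. This minimal sketch assumes a flat infix expression of integers and + - * / with no parentheses. It keeps an operand stack and an operator stack. Tokens are shifted onto the stacks. Whenever the stacked operator binds at least as tightly as the incoming one, the top two operands are reduced into a subtree. Subtrees are therefore built from the leaves up, and the trace records each shift and reduce.

```python
# Operator precedence table (higher binds tighter); all operators are left-associative.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse_bottom_up(tokens):
    """Operator-precedence (shift-reduce) parse; returns (tree, trace)."""
    operands, operators, trace = [], [], []

    def reduce():
        # Replace the top operator and its two operands with a single subtree.
        if len(operands) < 2:
            raise ValueError("missing operand")
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))
        trace.append(f"reduce {op}")

    for tok in tokens:
        if tok in PREC:
            # Reduce while the stacked operator binds at least as tightly.
            while operators and PREC[operators[-1]] >= PREC[tok]:
                reduce()
            operators.append(tok)
        else:
            operands.append(int(tok))
        trace.append(f"shift {tok}")
    while operators:
        reduce()
    if len(operands) != 1:
        raise ValueError("malformed expression")
    return operands[0], trace


tree, trace = parse_bottom_up("1 + 2 * 3 - 4".split())
print(tree)   # → ('-', ('+', 1, ('*', 2, 3)), 4)
print(trace)
```

The `2 * 3` subtree is reduced before the `+` node that contains it, which is the leaves-to-root order described above.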