2. Grammar Development
What is a Grammar Development Platform good for?
• Information Retrieval/Extraction
• Machine Translation (MT)
Figure: English "Anna sees the man." → XLE Parser → English c-structure and f-structure → MT → German f-structure → XLE Generator → German "Anna sieht den Mann."
3. A Sample Development Platform
XLE (Xerox Linguistic Environment)
• Platforms: Unix (Solaris), Linux, Mac OS X
• Software (freely available): Emacs, Tcl/Tk
• Main Developer: John Maxwell (PARC)
4. A Sample Development Platform
• Performance: Worst-case exponential,
polynomial in practice (makes broad-coverage
grammars feasible)
• Parser: Bottom-Up, Left-to-Right
• Linguistic Theory: LFG (Lexical-Functional
Grammar), originally developed by Ronald M.
Kaplan (PARC) and Joan Bresnan (Stanford)
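The "bottom-up, left-to-right" parsing strategy listed above can be illustrated with a small chart recognizer. This is a minimal sketch under stated assumptions, not XLE's parser: the grammar and lexicon are a hypothetical toy fragment, and the code only answers "is this a sentence?" rather than building c- and f-structures. Words are read left to right; after each word, every constituent that ends at that word is built bottom-up from smaller, already completed constituents.

```python
# Hypothetical toy grammar and lexicon (not an actual ParGram grammar).
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("D", "N")),
    ("NP", ("D", "ADJ", "N")),
]
LEXICON = {
    "anna": {"NP"},
    "sees": {"V"},
    "the": {"D"},
    "old": {"ADJ"},
    "man": {"N"},
}


def recognize(words, grammar=GRAMMAR, lexicon=LEXICON, goal="S"):
    """Bottom-up, left-to-right chart recognition."""
    n = len(words)
    chart = {}  # (i, j) -> set of categories covering words[i:j]

    def covers(rhs, i, j):
        """True if the category sequence rhs can be laid end to end over words[i:j]."""
        if not rhs:
            return i == j
        first, rest = rhs[0], rhs[1:]
        for k in range(i + 1, j + 1):
            if first in chart.get((i, k), set()) and covers(rest, k, j):
                return True
        return False

    for j in range(1, n + 1):
        # Read the next word and record its lexical categories.
        chart[(j - 1, j)] = set(lexicon.get(words[j - 1].lower(), set()))
        # Build all constituents ending at position j until nothing new is added.
        changed = True
        while changed:
            changed = False
            for i in range(j - 1, -1, -1):
                cell = chart.setdefault((i, j), set())
                for lhs, rhs in grammar:
                    if lhs not in cell and covers(rhs, i, j):
                        cell.add(lhs)
                        changed = True
    return goal in chart.get((0, n), set())
```

For example, `recognize("Anna sees the man".split())` succeeds, while the scrambled `recognize("sees the man Anna".split())` fails because no NP starts the string.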
5. The ParGram Project
• Palo Alto Research Center (PARC): English Grammar
• IMS, University of Stuttgart: German Grammar
• Fuji Xerox: Japanese Grammar
• University of Bergen: Norwegian Grammar (Bokmål and Nynorsk)
• UMIST: Urdu Grammar
• XRCE Grenoble: French Grammar
6. ParGram
Possible Applications:
• Machine Translation (French, English)
• Tree Banking (English, German)
• Smart Text Annotation (German)
• Robust Parsing (English, German, French)
• Information Extraction (English)
• Teaching Tools (Urdu)
7. Grammar Components
Each Grammar Contains:
• Phrase Structure Rules (S → NP VP)
• Lexicon (verb stems and functional elements)
• Finite-State Morphological Analyzer
No Semantics
8. Phrase Structure Rules
The formulation used today goes back to Chomsky (1957).
Sample Set for English:
S → NP VP
VP → V NP
NP → D (ADJ) N
Why these kinds of rules?
• Natural Language is recursive and potentially infinite.
• Constituency, X-bar Theory
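The parenthesized (ADJ) in the NP rule is shorthand: one written rule stands for several plain context-free rules, one with the optional item and one without. A minimal sketch of that expansion follows; the string format (space-separated symbols, optional ones in parentheses) is an assumption for illustration, not XLE's rule syntax.

```python
from itertools import product


def expand_optional(lhs, rhs):
    """Expand a rule with optional (X) items into the plain rules it abbreviates."""
    options = []
    for sym in rhs.split():
        if sym.startswith("(") and sym.endswith(")"):
            options.append((sym[1:-1], None))  # optional: present or absent
        else:
            options.append((sym,))  # obligatory
    rules = []
    for choice in product(*options):
        # Drop the "absent" placeholders to get one concrete right-hand side.
        rules.append((lhs, tuple(s for s in choice if s is not None)))
    return rules
```

Here `expand_optional("NP", "D (ADJ) N")` yields the two rules NP → D ADJ N and NP → D N; each additional optional item doubles the number of plain rules.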
9. Phrase Structure Rules
The syntax of natural languages is largely context-free: a sentence can be syntactically well-formed regardless of what it means.
Colorless green ideas sleep furiously.
However, we must also deal with context-sensitive
information.
The monkey sleeps.
*The monkey sleep. *The monkeys sleeps.
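To see why agreement is a problem, compare a plain context-free fragment (S → NP VP, NP → D N, VP → V) with one that encodes number by splitting categories. This is a minimal sketch under stated assumptions: the lexicons are hypothetical and cover only the three example sentences, and because the inputs are three words long the check reduces to comparing category sequences. The plain grammar accepts all three sentences; the split grammar rejects the two starred ones, but only by duplicating every rule for each NUM value.

```python
# Hypothetical toy lexicons covering only the example sentences.
PLAIN_LEXICON = {"the": "D", "monkey": "N", "monkeys": "N", "sleeps": "V", "sleep": "V"}
SPLIT_LEXICON = {"the": "D", "monkey": "N_sg", "monkeys": "N_pl",
                 "sleeps": "V_sg", "sleep": "V_pl"}


def categories(sentence, lexicon):
    """Map each word of the sentence to its category (None if unknown)."""
    return [lexicon.get(w) for w in sentence.lower().rstrip(".").split()]


def plain_cfg_accepts(sentence):
    # S -> NP VP, NP -> D N, VP -> V: no agreement information at all.
    return categories(sentence, PLAIN_LEXICON) == ["D", "N", "V"]


def split_cfg_accepts(sentence):
    # S -> NP_sg VP_sg | NP_pl VP_pl, NP_x -> D N_x, VP_x -> V_x:
    # agreement is enforced by duplicating every rule per NUM value.
    return categories(sentence, SPLIT_LEXICON) in (
        ["D", "N_sg", "V_sg"],
        ["D", "N_pl", "V_pl"],
    )
```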
10. Features and Unification
Context sensitivity can be achieved in many ways.
XLE and LFG (like many other theories/platforms) use
phrase-structure rules annotated with attribute-value pairs (features).
S  →  NP              VP
      (↑SUBJ) = ↓     (↑SUBJ NUM) = (↓NUM)
Features are checked via unification.
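Unification merges two feature structures and fails if they assign different values to the same attribute. The sketch below is a simplified illustration, assuming feature structures are nested Python dicts with atomic values and no reentrancy (real LFG f-structures allow shared substructures). It applies the two annotations S → NP VP with (↑SUBJ) = ↓ on the NP and (↑SUBJ NUM) = (↓NUM) on the VP; the head annotation ↑ = ↓ that a full grammar would put on the VP is omitted, so the resulting f-structure contains only SUBJ.

```python
class UnificationFailure(Exception):
    """Raised when two feature structures assign clashing values."""


def unify(f, g):
    """Unify two feature structures (nested dicts with atomic values)."""
    if isinstance(f, dict) and isinstance(g, dict):
        result = dict(f)
        for attr, value in g.items():
            # Shared attributes must unify recursively; new ones are simply added.
            result[attr] = unify(result[attr], value) if attr in result else value
        return result
    if f == g:
        return f
    raise UnificationFailure(f"{f!r} clashes with {g!r}")


def build_s(np_f, vp_f):
    """S -> NP VP with (↑SUBJ)=↓ on NP and (↑SUBJ NUM)=(↓NUM) on VP."""
    s_f = {"SUBJ": np_f}  # (↑SUBJ) = ↓
    return unify(s_f, {"SUBJ": {"NUM": vp_f["NUM"]}})  # (↑SUBJ NUM) = (↓NUM)
```

With a singular subject such as `{"PRED": "monkey", "NUM": "sg"}`, a VP with `NUM` `"sg"` unifies, while a VP with `NUM` `"pl"` raises `UnificationFailure`, which is how *The monkey sleep is ruled out without splitting categories.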
11. The Ambiguity Problem
PP-Attachment
The girl saw the monkey with the telescope.
Categorial Ambiguity
Flying planes can be dangerous.
Time flies like an arrow.
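PP-attachment ambiguity can be made concrete by counting trees instead of merely recognizing strings. This is a minimal sketch, assuming a hypothetical binary toy grammar in which a PP may attach either to a VP (VP → VP PP) or to an NP (NP → NP PP). A CKY-style table stores, for each span and category, how many distinct trees exist; counts for larger spans are products of counts for their parts.

```python
from collections import defaultdict

# Hypothetical binary grammar: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attached to the verb phrase
    ("NP", "D", "N"),
    ("NP", "NP", "PP"),  # PP attached to the noun phrase
    ("PP", "P", "NP"),
]
LEX = {"the": "D", "girl": "N", "saw": "V", "monkey": "N", "with": "P", "telescope": "N"}


def count_parses(words, goal="S"):
    """Count distinct parse trees of category goal over the whole input."""
    n = len(words)
    # table[(i, j)][cat] = number of trees of category cat spanning words[i:j]
    table = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        table[(i, i + 1)][LEX[w]] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    table[(i, j)][parent] += table[(i, k)][left] * table[(k, j)][right]
    return table[(0, n)][goal]
```

`count_parses("the girl saw the monkey with the telescope".split())` finds two trees, one per attachment site; every further PP multiplies the possibilities, which is the explosion the next slides address.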
12. Lexicons
Lexicons typically contain:
• Category Information (Terminal Node in Tree)
• Context-Sensitive Featural Information
• Subcategorization Information
• Semantics (sometimes)
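The four kinds of lexical information listed above can be pictured as fields of one record. This is a hypothetical data structure for illustration, not XLE's lexicon format: features are keyed by LFG-style paths written as strings, subcategorization is a tuple of grammatical functions, and the check implements the LFG completeness and coherence conditions over a simplified set of governable functions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Simplified set of governable grammatical functions (an assumption for this sketch).
GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP"}


@dataclass
class LexEntry:
    form: str
    stem: str
    category: str  # terminal node in the c-structure tree
    features: dict = field(default_factory=dict)  # e.g. {"SUBJ NUM": "sg"}
    subcat: tuple = ()  # grammatical functions the PRED requires
    semantics: Optional[str] = None  # often left empty

    def pred(self) -> str:
        """Render an LFG-style PRED value such as 'see<SUBJ,OBJ>'."""
        if not self.subcat:
            return self.stem
        return f"{self.stem}<{','.join(self.subcat)}>"


def complete_and_coherent(entry, fstructure):
    """Every required function is present (completeness), and no other governable one is (coherence)."""
    present = {attr for attr in fstructure if attr in GOVERNABLE}
    return present == set(entry.subcat)


SEES = LexEntry(
    form="sees", stem="see", category="V",
    features={"TENSE": "pres", "SUBJ NUM": "sg", "SUBJ PERS": "3"},
    subcat=("SUBJ", "OBJ"),
)
```

`SEES.pred()` gives `see<SUBJ,OBJ>`; an f-structure for *Anna sees* (no OBJ) fails completeness, and one with an extra COMP fails coherence.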
13. Ambiguity in Large Grammars
Ambiguity: a serious problem even in simple sentences
• PP-attachment (English)
• Subject/Object Ambiguities (German)
Within XLE, various techniques have been developed to cut down
on the explosion of parses:
• Optimality Marking
• Packed Representations
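The idea behind optimality marking can be sketched as Optimality-Theory-style selection: each analysis carries marks, the marks are ranked, and only analyses with the best violation profile survive. This is a simplified sketch, assuming dispreference marks only; XLE's actual mark types and ranking syntax are not modeled, and the mark name `ObjFirst` and the German subject/object example are hypothetical. Packed representations are a separate technique and are not shown.

```python
def violation_profile(analysis, ranking):
    """Count violations of each mark, highest-ranked mark first."""
    return tuple(analysis["marks"].count(mark) for mark in ranking)


def select_optimal(analyses, ranking):
    """Keep the analyses whose violation profiles are lexicographically smallest."""
    best = min(violation_profile(a, ranking) for a in analyses)
    return [a for a in analyses if violation_profile(a, ranking) == best]


# Hypothetical analyses of a German clause whose two NPs are case-ambiguous.
ANALYSES = [
    {"reading": "SUBJ-OBJ", "marks": []},
    {"reading": "OBJ-SUBJ", "marks": ["ObjFirst"]},  # object-initial order dispreferred
]
```

`select_optimal(ANALYSES, ["ObjFirst"])` keeps only the subject-first reading; if neither analysis is marked, both survive, so marks filter ambiguity without making the grammar reject anything outright.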
15. Parallel Analyses
English: Yassin was seen.
German: Yassin wurde gesehen.
Urdu: yassin dekha gaya
Languages Differ on the Surface (c-structure)
ParGram Goal: The same underlying f-structures
for all languages (modulo lexical semantics).
16. The “Parallel” in ParGram
Analyses at the level of f-structure are held as parallel as
possible across languages (crosslinguistic invariance).
• Theoretical Advantage: This models the idea of Universal Grammar (UG).
• Practical Advantage: machine translation becomes
easier.
Analyses at the level of c-structure are allowed to differ
much more (variance across languages).
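Parallel f-structures make translation largely a matter of exchanging lexical predicates. The sketch below uses simplified, hypothetical f-structures for *Yassin was seen* / *Yassin wurde gesehen* / *yassin dekha gaya* (only PRED, SUBJ and a PASSIVE feature; real ParGram f-structures carry many more features). Replacing every PRED stem with a placeholder yields the same skeleton for all three languages, and a hypothetical stem-for-stem transfer maps the English f-structure onto the German one.

```python
def skeleton(f):
    """Replace each PRED stem with '_' so only the language-neutral structure remains."""
    out = {}
    for attr, val in f.items():
        if isinstance(val, dict):
            out[attr] = skeleton(val)
        elif attr == "PRED":
            stem, sep, frame = val.partition("<")
            out[attr] = "_" + sep + frame  # keep the argument frame, drop the stem
        else:
            out[attr] = val
    return out


def transfer(f, stems):
    """Swap PRED stems using a bilingual stem table; everything else is copied."""
    out = {}
    for attr, val in f.items():
        if isinstance(val, dict):
            out[attr] = transfer(val, stems)
        elif attr == "PRED":
            stem, sep, frame = val.partition("<")
            out[attr] = stems.get(stem, stem) + sep + frame
        else:
            out[attr] = val
    return out


EN = {"PRED": "see<NULL,SUBJ>", "SUBJ": {"PRED": "Yassin"}, "PASSIVE": "+"}
DE = {"PRED": "sehen<NULL,SUBJ>", "SUBJ": {"PRED": "Yassin"}, "PASSIVE": "+"}
UR = {"PRED": "dekh<NULL,SUBJ>", "SUBJ": {"PRED": "Yassin"}, "PASSIVE": "+"}
```

`transfer(EN, {"see": "sehen"})` reproduces `DE`; the differences in word order and auxiliaries (*was*, *wurde*, *gaya*) are left to each language's c-structure and generator.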
17. FST Morphological Analyzers
Kaplan and Butt (2002): this LFG morphology-syntax interface is
natural:
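Figure: the three relations linking a surface form to its f-structure, for calana ‘to drive’ (M.Sg):
surface form calana → Sequence Relation (Seq) → drive+Verb+Inf+M+Sg
drive+Verb+Inf+M+Sg → Lexical Relation (L) → m-structure [VFORM inf], [GEND masc], [NUM sg]
m-structure → Satisfaction Relation (Sat) → f-structure [PRED ‘drive<Subj,Obj>’, VFORM inf, GEND masc, NUM sg]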
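The three relations can be traced in code for calana, whose analysis is drive+Verb+Inf+M+Sg. This is a minimal sketch under stated assumptions: a dictionary lookup stands in for the finite-state analyzer (Seq), a hypothetical tag table plays the lexical relation (L), and satisfaction (Sat) is a simple check that the f-structure contains every feature the tags contribute. Category tags such as +Verb are skipped here; in a grammar they would select the lexical category.

```python
# Stand-in for the finite-state analyzer: surface form -> analysis strings.
ANALYSES = {"calana": ["drive+Verb+Inf+M+Sg"]}

# Hypothetical tag-to-feature table (the lexical relation).
TAG_FEATURES = {"Inf": ("VFORM", "inf"), "M": ("GEND", "masc"), "Sg": ("NUM", "sg")}


def analyze(surface):
    """Seq: map a surface form to its morphological analyses."""
    return ANALYSES.get(surface, [])


def features(analysis):
    """L: split an analysis into its stem and the features its tags contribute."""
    stem, *tags = analysis.split("+")
    feats = {}
    for tag in tags:
        if tag in TAG_FEATURES:  # category tags like "Verb" are not mapped here
            attr, val = TAG_FEATURES[tag]
            feats[attr] = val
    return stem, feats


def satisfies(fstructure, feats):
    """Sat: the f-structure must carry every feature the morphology contributes."""
    return all(fstructure.get(attr) == val for attr, val in feats.items())
```

Running `features(analyze("calana")[0])` yields the stem `drive` with VFORM inf, GEND masc and NUM sg, which the f-structure shown in the figure satisfies.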