2. Why Study Syntax?
Syntax provides
• systematic rules for forming new sentences in a language.
• a way to verify whether a sentence is legitimate in a language.
• a step closer to the “meaning” of a sentence.
– Who did what to whom semantics
Applications
• Improving precision in search applications
– Yankees beat Red Sox
– Red Sox beat Yankees
• Paraphrasing
– John loves Mary = Mary is loved by John
• Information Extraction
– Fill in a form by extracting information from a document.
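The search example above can be sketched in a few lines. A bag-of-words representation treats the two queries as identical, while even a crude "who did what to whom" reading keeps them apart. The function names and the split on a known verb are illustrative assumptions, not a real parser.

```python
from collections import Counter

def bag_of_words(sentence):
    """Order-free representation: word counts only."""
    return Counter(sentence.lower().split())

def naive_svo(sentence, verb="beat"):
    """Toy 'who did what to whom' reading: split on a known verb.

    Assumption for illustration: the verb occurs exactly once.
    """
    words = sentence.lower().split()
    i = words.index(verb)
    return (" ".join(words[:i]), verb, " ".join(words[i + 1:]))

a = "Yankees beat Red Sox"
b = "Red Sox beat Yankees"

same_bag = bag_of_words(a) == bag_of_words(b)   # word order is lost
same_svo = naive_svo(a) == naive_svo(b)         # roles are kept apart
```

Here `same_bag` is true and `same_svo` is false, which is the precision gain the slide refers to.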
3. Structure of Words
What are words?
• Orthographic tokens separated by white space.
In some languages the distinction between words and
sentences is less clear.
• Chinese, Japanese: no white space between words
– nowhitespace → no white space / no whites pace / now hit esp ace
• Turkish: words could represent a complete “sentence”
– E.g.: uygarlastiramadiklarimizdanmissinizcasina (roughly: “as if you were among those whom we could not civilize”)
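The segmentation ambiguity of unspaced text can be made concrete with a small sketch. It enumerates every split of a string into dictionary words; the lexicon below is a hypothetical toy list chosen only to reproduce the "nowhitespace" example.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words from `lexicon`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results

# Hypothetical toy lexicon (an assumption, not a real dictionary).
LEXICON = {"no", "now", "white", "whites", "space", "pace", "hit", "esp", "ace"}

splits = [" ".join(s) for s in segmentations("nowhitespace", LEXICON)]
# splits == ["no white space", "no whites pace", "now hit esp ace"]
```

A real segmenter would also need a way to rank the candidates, which this sketch does not attempt.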
Morphology: the structure of words
• Basic elements: morphemes
• Morphological Rules: how to combine morphemes.
Syntax: the structure of sentences
• Rules for ordering words in a sentence
• Elementary units: Phrases and Clauses
4. Morphology and Syntax
Interplay between syntax and morphology
• How much information does a language allow to be packed into a
word, and how easy is it to unpack?
• More information in the word → less rigid syntax → freer word order
• Hindi: “John likes Mary” – all six orders are possible, due to rich
morphological information.
– John-nom Mary-acc likes
English expresses relations between words through word order.
Morphologically rich languages have freer word order.
• However, some parts have rigid word order.
– Noun groups in Hindi: “one yellow book”
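A minimal sketch of why case marking frees up word order: if roles are read off the suffixes, all six orders of the Hindi-style gloss yield the same analysis. The `-nom` / `-acc` suffix convention follows the gloss above; the function name is an assumption for illustration.

```python
from itertools import permutations

def roles(tokens):
    """Read grammatical roles off case suffixes, ignoring word order.

    Assumption for illustration: '-nom' marks the subject, '-acc' the
    object, and the unmarked token is the verb.
    """
    out = {}
    for tok in tokens:
        if tok.endswith("-nom"):
            out["subject"] = tok[:-4]
        elif tok.endswith("-acc"):
            out["object"] = tok[:-4]
        else:
            out["verb"] = tok
    return out

orders = list(permutations(["John-nom", "Mary-acc", "likes"]))
readings = [roles(order) for order in orders]
# Six orders, one reading: John is the subject, Mary the object.
```

In English the same information has to come from position, so reordering the words changes the reading.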
5. Outline
Constituency
• How does this notion arise?
• Type of constituents
• Representation: Tree Structure
Formal device: Context Free Grammars
• Derived tree and derivation tree
• Grammar Equivalence
– Strong and weak generative capacity
– Chomsky Normal Form
• Other Formal Frameworks (Tree-Adjoining Grammar)
Other topics in syntax
• Dependency
• Spoken language syntax
• Structural Priming
6. Constituency
Words are grouped into part-of-speech groups
• Similar morphological inflections
• Allows us to create new word forms (“blog”, “xerox”)
• Nouns, Verbs, Determiners, Adjectives etc…
Certain sequences of words in a sentence are grouped as
constituents
• Distributionally similar behavior
• cohesive units (move around in a sentence as a unit)
– In the morning I take a walk
– I take a walk in the morning
• Substrings are typed “Clause”, “Noun Phrase”, “Verb Phrase”
“Preposition Phrase” etc.
7. Constituency – contd.
Examples of constituents:
• Noun phrase:
– the dog, two big light blue vans
• Preposition phrase:
– in the box, under the bridge
• Clause:
– the dog bit the man, John thought the dog bit the man
The type of a constituent is derived from the “head word” of
the constituent.
8. Constituent Structure
Decomposition of a sentence into its constituents.
Attaching constituents to each other to reflect relations among words:
Emergence of Tree Structure
• John saw the man with the telescope
• (S (NP John) saw (NP (NP the man) (PP with (NP the telescope))))
• (S (NP John) saw (NP the man) (PP with (NP the telescope)))
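The two bracketings differ only in where the PP attaches. The sketch below parses such bracketed strings into nested lists and reports which node directly dominates the PP; the helper names are hypothetical, and the parser assumes well-balanced input.

```python
def parse_brackets(s):
    """Parse a bracketed string such as '(S (NP John) saw)' into nested lists."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a plain word
                pos += 1
        pos += 1                               # consume ")"
        return [label] + children

    return node()

def parent_of(tree, label, parent=None):
    """Label of the node that directly dominates the first `label` node."""
    if tree[0] == label:
        return parent
    for child in tree[1:]:
        if isinstance(child, list):
            found = parent_of(child, label, tree[0])
            if found is not None:
                return found
    return None

noun_attach = parse_brackets(
    "(S (NP John) saw (NP (NP the man) (PP with (NP the telescope))))")
verb_attach = parse_brackets(
    "(S (NP John) saw (NP the man) (PP with (NP the telescope)))")
```

`parent_of(noun_attach, "PP")` gives `"NP"` (the man has the telescope), while `parent_of(verb_attach, "PP")` gives `"S"` (the seeing was done with the telescope).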
Select a sentence from a newspaper text and provide its constituent
structure.
Evidence of another constituent – verb phrase (“VP”)
• Substrings involving a verb move around and can be referred to as a unit.
– VP-fronting (and quickly clean the carpet he did!)
– VP-ellipsis (He cleaned the carpets quickly, and so did she)
– Adjuncts can appear before and after the VP, but not inside it (He often eats beans / * he
eats often beans)
9. Relations among Words
Types of relations between words
• Arguments: subject, object, indirect object, prepositional object
• Adjuncts: temporal, locative, causal, manner, …
• Function Words
Subcategorization: List of arguments of a word (verb)
• with features about realization (POS, perhaps case, verb form etc)
For English, the argument order: Subject-Object-IndirectObj
Example:
• like: NP-NP (“John likes Mary”), NP-VP(to-inf) (“John likes to watch movies”)
• think: NP-S (“John thought Mary was going to the party”)
• put: NP-NP-PP
Adjuncts are optional (typically modifiers of an action)
• John put the book on the table at 3pm yesterday
There are words with “demands” and words that fill the “demands”.
• Demands are typed (NP, VP, PP, S)
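The idea of typed "demands" can be sketched as a lookup table of subcategorization frames. The frames are the ones listed above, with the subject as the first slot; the table layout and function name are assumptions for illustration, and adjuncts are deliberately left out because they are optional.

```python
# Frames taken from the examples above; the first slot is the subject.
SUBCAT = {
    "like": [("NP", "NP"), ("NP", "VP[to-inf]")],
    "think": [("NP", "S")],
    "put": [("NP", "NP", "PP")],
}

def satisfies(verb, argument_types):
    """True if the typed arguments match one frame of `verb` exactly."""
    return tuple(argument_types) in SUBCAT.get(verb, [])

ok = satisfies("put", ["NP", "NP", "PP"])      # John put the book on the table
missing_pp = satisfies("put", ["NP", "NP"])    # * John put the book
```

Here `ok` is true and `missing_pp` is false: "put" demands a PP that the second argument list does not supply.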
10. English Syntax: A Sample
Sentence types:
• Declarative (John closed the door)
• Imperative (close the door!!)
• Yes-No-Question (can you close the door?)
• Wh-question (who closed the door? What did John close?)
Clause types:
• Infinitival (to read a book)
• Gerundive (reading of a book)
• Relative Clause (that has a green cover)
11. English Syntax: A Sample – contd.
Noun Phrase:
• Before the head noun:
– Pre-determiner Determiner Post-determiner (Adjective|Noun) Noun
• After the head noun (Modifiers)
– Preposition phrases
– Relative Clauses (the book that has only one sentence)
– Gerundive (the flight arriving after 10pm)
Auxiliary Verbs
• Modal (could, might, will, should…) < perfect (have) < progressive (be) <
passive (be)
• “might have been destroyed”
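The ordering constraint on auxiliaries can be checked mechanically. The sketch below works on auxiliary kinds rather than words, since a form such as "been" can be either progressive or passive; the names are hypothetical.

```python
AUX_ORDER = ["modal", "perfect", "progressive", "passive"]

def valid_aux_sequence(kinds):
    """True if each auxiliary kind appears at most once and in the order
    modal < perfect < progressive < passive."""
    ranks = [AUX_ORDER.index(k) for k in kinds]
    return all(a < b for a, b in zip(ranks, ranks[1:]))

# "might have been destroyed": modal, perfect, passive
good = valid_aux_sequence(["modal", "perfect", "passive"])
# "* have might destroyed": perfect before modal
bad = valid_aux_sequence(["perfect", "modal"])
```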
Large wide-coverage grammars have been developed or are under
development
• XTAG (www.cis.upenn.edu/~xtag), HPSG, LFG
12. Two Representations of Syntactic Structure
Phrase structure: illustrates the constituents and their types.
Dependency structure: Relations between words without
intervening structure.
[Figure: phrase-structure tree (node labels S, NP, DetP, Adv) and dependency tree (arc labels arg0, arg1, adj, fw) over the words “the”, “boy”, “reads”, “a”, “book”, “slowly”.]
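As a rough sketch, the two representations could be stored as follows. The exact tree shapes are an assumption reconstructed from the node and arc labels of the figure, and the sentence order "the boy reads a book slowly" is likewise assumed.

```python
# Phrase structure: nested (label, children...) tuples; leaves are words.
phrase_structure = (
    "S",
    ("NP", ("DetP", "the"), "boy"),
    "reads",
    ("NP", ("DetP", "a"), "book"),
    ("Adv", "slowly"),
)

# Dependency structure: (head, relation, dependent) triples between words,
# with no intervening phrase nodes.
dependencies = [
    ("reads", "arg0", "boy"),
    ("reads", "arg1", "book"),
    ("reads", "adj", "slowly"),
    ("boy", "fw", "the"),
    ("book", "fw", "a"),
]

def leaves(tree):
    """Words of a phrase-structure tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:          # tree[0] is the node label
        words.extend(leaves(child))
    return words

def dependents(head):
    """Words directly governed by `head`."""
    return [d for h, _, d in dependencies if h == head]
```

The phrase-structure tree keeps word order and constituent types; the dependency triples keep only word-to-word relations.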
13. Context-Free Grammars
String Rewriting Systems
• Transform one string to another (until termination)
G = (V, T, P, S)
where V: vocabulary of non-terminals
T: vocabulary of terminals
S: start symbol
P: set of productions of the form
α → β, where α ∈ V and β ∈ (V ∪ T)*
Derivation: Rewrite non-terminals using the productions of the grammar until
no non-terminals remain in the string.
• Start with “S”
Sample Context-Free Grammar, derivation and derived structure.
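A minimal sketch of such a grammar and a leftmost derivation is given below. The toy grammar is hypothetical and not the one from the original slide; the derivation is driven by an explicit list of production choices so that it is deterministic.

```python
# Hypothetical toy grammar G = (V, T, P, S); an assumption for illustration.
V = {"S", "NP", "VP", "Det", "N", "V"}
T = {"the", "boy", "book", "reads"}
P = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["boy"], ["book"]],
    "V": [["reads"]],
}

def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost non-terminal at each step.

    `choices` gives, for each step, the index of the production to use.
    Returns every sentential form, ending when no non-terminal remains.
    """
    form = [start]
    steps = [form]
    picks = iter(choices)
    while any(sym in V for sym in form):
        i = next(k for k, sym in enumerate(form) if sym in V)
        rhs = P[form[i]][next(picks)]
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps

# Nine rewrites; only the last one picks the second N production ("book").
steps = leftmost_derivation([0] * 8 + [1])
derived_string = " ".join(steps[-1])   # "the boy reads the book"
```

The list `steps` is the derivation history; drawing each rewrite as a parent with its right-hand side as children gives the phrase-structure tree.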
14. Two Representations
String rewriting system: we derive a string (= derived structure).
But the derivation history is represented by a phrase-structure tree
(= derivation structure)!
Grammar Equivalence
• Can have different grammars that generate same set of strings (weak
equivalence)
• Can have different grammars that have same set of derivation trees (strong
equivalence)
• Strong equivalence implies weak equivalence
CFG Normal Forms:
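• Chomsky Normal Form (α → β γ or α → w, with β, γ non-terminals and w a terminal)
• Greibach Normal Form (α → w β, with w a terminal and β a string of non-terminals)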
• Convert a grammar into CNF and GNF
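One step of the CNF conversion, binarizing long right-hand sides, can be sketched as below. This is not a full conversion: empty productions, unit productions and terminals mixed into long rules are not handled, and the fresh names `X1`, `X2`, ... are assumed not to clash with existing non-terminals.

```python
def binarize(rules):
    """Replace each rule with more than two right-hand-side symbols by a
    chain of binary rules, introducing fresh non-terminals X1, X2, ..."""
    out = []
    fresh = 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new_symbol = f"X{fresh}"
            # Keep the first symbol, push the rest down to the new symbol.
            out.append((lhs, [rhs[0], new_symbol]))
            lhs, rhs = new_symbol, rhs[1:]
        out.append((lhs, list(rhs)))
    return out

rules = [("VP", ["V", "NP", "PP"]), ("S", ["NP", "VP"])]
binary_rules = binarize(rules)
# [("VP", ["V", "X1"]), ("X1", ["NP", "PP"]), ("S", ["NP", "VP"])]
```

The binarized grammar generates the same strings as the original (weak equivalence) but assigns different trees, so the two are not strongly equivalent.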
15. Penn Treebank (PTB)
Syntactically annotated corpus (phrase structure)
Contains 1 million words of Wall Street Journal sentences marked
up with syntactic structure.
• Can be converted into a dependency Treebank.
– need for head percolation tables
• Completely flat structure in NP
– brown bag lunch, pink-and-yellow child seat
• Represents a particular linguistic theory
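The role of head percolation tables in the conversion to dependencies can be sketched as follows. The table here is a hypothetical, heavily simplified stand-in for a real one, and the tree is a hand-built example rather than an actual PTB annotation.

```python
# Hypothetical head table: for each phrase label, the child labels to try,
# in order, when looking for the head child.
HEAD_TABLE = {"S": ["VP"], "VP": ["V"], "NP": ["N"], "PP": ["P"]}

def head_word(tree, deps):
    """Return the head word of `tree`, appending (head, dependent) pairs
    for every non-head child to `deps`.

    A tree is (label, child, ...) and a pre-terminal is (tag, word).
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]                     # pre-terminal: the word itself
    heads = [head_word(child, deps) for child in children]
    labels = [child[0] for child in children]
    # First matching label in the table wins; default to the last child.
    head_index = len(children) - 1
    for wanted in HEAD_TABLE.get(label, []):
        if wanted in labels:
            head_index = labels.index(wanted)
            break
    for i, h in enumerate(heads):
        if i != head_index:
            deps.append((heads[head_index], h))
    return heads[head_index]

tree = ("S",
        ("NP", ("Det", "the"), ("N", "boy")),
        ("VP", ("V", "reads"), ("NP", ("Det", "a"), ("N", "book"))))
deps = []
root = head_word(tree, deps)
```

The head word percolates up from each phrase to its parent, and every non-head child becomes a dependent of that head; here `root` is `"reads"`.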
PropBank
• PTB with some grammatical relations made explicit
16. Unification
Mechanism needed to pass and check constraints.
Constraints, syntactic and semantic:
• Subject-verb agreement
– S → NP VP
– the boy reads / the boys read / * the boys reads
• Subject/Auxiliary inversion: (Yes-no-question)
– S → AuxVerb NP VP
– Do you have flights / * does you have flights
• Selectional restrictions:
– An apple reads a book
Need a mechanism to encode these constraints
• Refine the non-terminal set to encode these constraints.
• S → 3sgAux 3sgNP VP ; 3sgAux → does | has …
• S → Non3sgAux Non3sgNP VP ; Non3sgAux → do | have | can
• We need to split the NP rule into the 3sgNP and Non3sgNP.
• Size of the grammar grows;
• can we factor these constraints out of the structure of the rules?
17. Unification – contd.
Attribute value matrix:
boy : [ Cat N, Number sg, Person 3 ]
boys : [ Cat N, Number pl, Person 3 ]
reads : [ Cat V, Subj agr [ Number sg ] ]
read : [ Cat V, Subj agr [ Number pl, Person 3 ] or [ Number sg, Person 1|2 ] ]
Percolate constraints:
VP → V
VP.number = V.subj.agr.number
VP.person = V.subj.agr.person
Check constraints:
S → NP VP
NP.number = VP.subj.agr.number
NP.person = VP.subj.agr.person
The boy reads / * the boys reads / the boys read
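The check behind these agreement constraints can be sketched as unification over nested dictionaries. This is a minimal version under stated assumptions: feature structures are plain dicts, values are atomic or nested dicts, and disjunctive values such as Person 1|2 are not supported.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if some feature has
    conflicting atomic values.
    """
    result = dict(a)
    for feature, value in b.items():
        if feature not in result:
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            merged = unify(result[feature], value)
            if merged is None:
                return None
            result[feature] = merged
        elif result[feature] != value:
            return None
    return result

boy = {"number": "sg", "person": 3}
boys = {"number": "pl", "person": 3}
reads_subj_agr = {"number": "sg"}     # what "reads" demands of its subject

the_boy_reads = unify(boy, reads_subj_agr)     # succeeds
the_boys_reads = unify(boys, reads_subj_agr)   # None: number clash
```

Because the constraint lives in the feature structures, the rule S → NP VP stays a single rule instead of being split into 3sg and non-3sg variants.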
18. Structural Priming
The structure of preceding sentences can speed up or slow down the reading of
subsequent sentences.
• Dative alternation
– The woman gave her car to the church
– The woman gave the church her car
• One of these forms is primed depending on what the prime was
– V NP NP: gave the church her car
– V NP PP: gave her car to the church
19. Spoken Language Syntax
Not as “clean” as written text: disfluencies are rampant.
• edits (restarts, repairs)
• Filled pauses
• Ungrammaticality
Sentence vs. utterance.
“Clean up” the utterance first before understanding it.
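A very rough sketch of such a clean-up pass is shown below. It only drops filled pauses and immediate word repetitions; the pause list is an illustrative assumption, and real repairs ("to Boston, I mean to Denver") need far more than this.

```python
FILLED_PAUSES = {"uh", "um", "er"}   # illustrative list, not exhaustive

def clean_up(utterance):
    """Drop filled pauses and immediate word repetitions."""
    cleaned = []
    for word in utterance.lower().split():
        if word in FILLED_PAUSES:
            continue
        if cleaned and cleaned[-1] == word:
            continue                  # simple restart such as "i i want"
        cleaned.append(word)
    return " ".join(cleaned)

cleaned = clean_up("I uh I want um a flight")   # "i want a flight"
```

Note that this also removes legitimate repetitions ("that that"), which is one reason the problem is harder than it looks.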