74.406 Natural Language Processing
- English Grammar -
(Mostly) English Grammar
▪ Morphology, Word Classes, POS Tagging
▪ Grammar Extensions on the Sentence and Phrase Level
• Sentence Level Constructs
• Noun Phrase - Modifications
• Verb Phrase - Subcategorization
(Jurafsky, Ch. 3, 6.1, 8 and 9; Allen Ch. 2)
Morphology
Basics of Morphology
Morpheme = "minimal meaning-bearing unit in a language"
e.g. cats consists of two morphemes: cat and -s
• Non-Concatenative Morphology
– templatic morphology: modify word templates
– Hebrew: lmd (study, learn) - limed ("he taught") - lumad
("he was taught")
• Concatenative Morphology
– word stem + prefix + suffix (+ infix + circumfix)
• Inflectional Morphology
– word stem + grammatical morpheme; same word class; cat+s
• Derivational Morphology
– word stem + grammatical morpheme; other word class; mob+b+ing
Inflectional Morphology
word stem + grammatical morpheme cat+s
only for nouns, verbs, and some adjectives
• Nouns
– plural:
regular: +s, +es
irregular: mouse - mice; ox - oxen
rules for exceptions: e.g. -y -> -ies, as in butterfly - butterflies
– possessive: +'s, +'
• Verbs
– main verbs (sleep, eat, walk)
– modal verbs (can, will, should)
– primary verbs (be, have, do)
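The noun plural rules above can be sketched in a few lines of Python. This is a minimal illustration, not a complete treatment of English: the irregular lexicon holds only the two examples from the slide, and the function name `pluralize` and the sibilant-ending test for +es are assumptions made for this sketch.

```python
# Hypothetical mini-lexicon: only the irregular examples from the slide.
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen"}

VOWELS = "aeiou"


def pluralize(noun: str) -> str:
    """Form the plural with lexicon lookup, the -y rule, then +es / +s."""
    if noun in IRREGULAR_PLURALS:
        # irregular: must be listed in the lexicon
        return IRREGULAR_PLURALS[noun]
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        # rule for exceptions: consonant + y -> ies (butterfly -> butterflies)
        return noun[:-1] + "ies"
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        # regular +es after a sibilant ending (assumed condition for +es)
        return noun + "es"
    # regular +s
    return noun + "s"
```

The irregular lookup comes first so that ox is not caught by the +es rule.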
Inflectional Morphology (verbs)
Verb Inflections only for:
main verbs (sleep, eat, walk); primary verbs (be, have, do)
Morphological Form       Regularly Inflected Forms
• stem                   walk      merge     try       map
• -s form                walks     merges    tries     maps
• -ing participle        walking   merging   trying    mapping
• past; -ed participle   walked    merged    tried     mapped
Morphological Form       Irregularly Inflected Forms
• stem                   eat       catch     cut
• -s form                eats      catches   cuts
• -ing participle        eating    catching  cutting
• past                   ate       caught    cut
• -ed participle         eaten     caught    cut
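The verb tables can be reproduced with a small generator. This is a sketch under stated assumptions: the names `IRREGULAR`, `DOUBLING` and `inflect` are hypothetical, consonant doubling is marked per stem instead of being derived from pronunciation, and the spelling rules are simplified (they cover the verbs in the tables, not all English verbs).

```python
# Irregular past forms are listed in the lexicon (examples from the table).
IRREGULAR = {
    "eat": {"past": "ate", "ed_part": "eaten"},
    "catch": {"past": "caught", "ed_part": "caught"},
    "cut": {"past": "cut", "ed_part": "cut"},
}
# Hypothetical flag list: stems that double the final consonant.
DOUBLING = {"map", "cut"}


def _ends_in_consonant_y(stem):
    return len(stem) > 1 and stem.endswith("y") and stem[-2] not in "aeiou"


def _join(stem, suffix):
    """Attach 's', 'ing' or 'ed' using simplified spelling rules."""
    if suffix == "s":
        if _ends_in_consonant_y(stem):
            return stem[:-1] + "ies"          # try -> tries
        if stem.endswith(("s", "x", "z", "ch", "sh")):
            return stem + "es"                # catch -> catches
        return stem + "s"
    if stem in DOUBLING:
        stem = stem + stem[-1]                # map -> mapp-
    elif stem.endswith("e"):
        stem = stem[:-1]                      # merge -> merg-
    elif suffix == "ed" and _ends_in_consonant_y(stem):
        stem = stem[:-1] + "i"                # try -> tri-
    return stem + suffix


def inflect(stem):
    """Return the five morphological forms of a main verb."""
    irregular = IRREGULAR.get(stem, {})
    return {
        "stem": stem,
        "s": _join(stem, "s"),
        "ing": _join(stem, "ing"),
        "past": irregular.get("past", _join(stem, "ed")),
        "ed_part": irregular.get("ed_part", _join(stem, "ed")),
    }
```

Regular verbs share one form for past and -ed participle; irregular verbs override one or both entries from the lexicon.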
Inflectional and Derivational Morphology
(adjectives)
Adjective Inflections and Derivations:
• prefix   un-     unhappy     adjective, negation
• suffix   -ly     happily     adverb, manner
           -er     happier     adjective, comparative
           -est    happiest    adjective, superlative
• suffix   -ness   happiness   noun
plus combinations, like unhappiest, unhappiness.
Distinguish different adjective classes, which can or cannot
take certain inflectional or derivational forms, e.g. no
un- negation for big.
Morphological Processing
• Knowledge
– lexical entry: stem plus possible prefixes, suffixes plus
word classes, e.g. endings for verb forms (see tables
above)
– rules: how to combine stem and affixes, e.g. add s to
form plural of noun as in dogs
– orthographic rules: spelling, e.g. double consonant as
in mapping
• Processing: Finite State Transducers
– take information above and analyze word token /
generate word form
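The analysis direction (word token to lexical form) can be sketched as follows. This is plain Python, not a finite state transducer: it only mimics the three knowledge sources listed above (lexicon, affix rules, spelling rules). The lexicon contents, the feature labels (+PL, +3SG, ...) and the function name `analyze` are assumptions for illustration.

```python
# Hypothetical toy lexicon: stem -> word class.
LEXICON = {"cat": "N", "dog": "N", "fox": "N", "butterfly": "N",
           "walk": "V", "map": "V", "try": "V"}

# Per word class: (surface ending, text restoring the stem, feature).
SUFFIX_RULES = {
    "N": [("ies", "y", "+PL"), ("es", "", "+PL"), ("s", "", "+PL")],
    "V": [("ies", "y", "+3SG"), ("s", "", "+3SG"),
          ("ing", "", "+PRES-PART"), ("ed", "", "+PAST")],
}


def analyze(word):
    """Return all analyses 'stem +Class +Feature' licensed by the lexicon."""
    analyses = []
    if word in LEXICON:
        analyses.append(f"{word} +{LEXICON[word]}")
    for pos, rules in SUFFIX_RULES.items():
        for ending, restore, feature in rules:
            if not word.endswith(ending):
                continue
            base = word[: -len(ending)] + restore
            candidates = [base]
            # orthographic rule: undo consonant doubling (mapp- -> map)
            if len(base) > 2 and base[-1] == base[-2]:
                candidates.append(base[:-1])
            for stem in candidates:
                if LEXICON.get(stem) == pos:
                    analyses.append(f"{stem} +{pos} {feature}")
    return analyses
```

For example, `analyze("mapping")` strips -ing, undoes the doubled consonant, and finds map in the lexicon as a verb. A word with several possible analyses returns several entries, which is where POS ambiguity starts.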
Fig. 3.3 FSA for verb inflection.
Fig. 3.5 More detailed FSA for adjective inflection.
Fig. 3.4 Simple FSA for adjective inflection.
Fig. 3.7 Compiled FSA for noun inflection.
Fig. 3.12 Lexical and intermediate tape of a FS Transducer
Fig. 3.13 Lexical, intermediate, and surface tape after spelling transformation.
Word Classes and
POS Tagging
Word Classes
Sort words into categories according to:
• morphological properties
Which types of morphological forms do they take?
e.g. form plural: noun+s; 3rd person: verb+s
• distributional properties
What other words or phrases can occur nearby?
e.g. possessive pronoun before noun
• semantic coherence
Classify according to similar semantic type.
e.g. nouns refer to object-like entities
Open vs. Closed Word Classes
Open Class Types
The set of words in these classes can
change over time, with the development of
the language, e.g. spaghetti and download
Open Class Types:
nouns, verbs, adjectives, adverbs
Open vs. Closed Word Classes
Closed Class Types
The set of words in these classes is largely
fixed and hardly ever changes within one
language.
Closed Class Types:
prepositions, determiners, pronouns,
conjunctions, auxiliary verbs, particles,
numerals
Open Class Words: Nouns
Nouns
denote objects, concepts, entities, events
Proper Nouns
Names for specific individual objects, entities
e.g. the Eiffel Tower, Dr. Kemke
Common Nouns
Names for categories, classes, abstracts, events
e.g. fruit, banana, table, freedom, sleep, race, ...
Count Nouns
enumerable entities, e.g. two bananas
Mass Nouns
not countable items, e.g. water, salt, freedom
Open Class Words: Verbs
Verbs
denote actions, processes, and states
e.g. smoke, dream, rest, run
several morphological forms, e.g.
non-3rd person                                 - eat
3rd person                                     - eats
progressive / present participle / gerundive   - eating
past participle                                - eaten
simple past                                    - ate
Open Class Words: Verbs (2)
Verbs - use of morphological forms, examples:
non-3rd person       eat      I eat. We eat. They eat.
3rd person           eats     He eats. She eats. It eats.
progressive          eating   He is eating.
                              He will be eating.
                              He has been eating.
present participle   eating   He is eating.
gerundive            eating   Eating scorpions [NP] is common in China.
use as adjective     eating   Eating children [NP] are common at McDonald's.
past participle      eaten    He has eaten the scorpion.
                              The scorpion was eaten.
simple past          ate      He ate the scorpion.
Open Class Words: Adjectives
Adjectives
denote qualities or properties of objects
e.g. heavy, blue, content
most languages have concepts for
colour - white, green, ...
age - young, old, ...
value - good, bad, ...
not all languages have adjectives as separate class
Open Class Words: Adverbs 1
Adverbs
denote modifications of actions (verbs) or qualities
(adjectives)
e.g. walk slowly or heavily drunk
Directional or Locational adverbs
specify direction or location
e.g. go home, stay here
Open Class Words: Adverbs 2
Degree Adverbs
specify extent of process, action, property
e.g. extremely slow, very modest
Manner Adverbs
specify manner of action or process
e.g. walk slowly, run fast
Temporal Adverbs
specify time of event or action
e.g. yesterday, Monday
Closed Word Classes
Closed Class Types:
Prepositions: on, under, over, at, from, to, with, ...
Determiners: a, an, the, ...
Pronouns: he, she, it, his, her, who, I, ...
Conjunctions: and, or, as, if, when, ...
Auxiliary verbs: can, may, should, are, …
Particles: up, down, on, off, in, out, …
Numerals: one, two, three, ..., first, second, ...
Closed Word Class: Prepositions
Prepositions
occur before noun phrases;
describe relations;
often spatial or temporal relations
e.g. on the table (spatial)
in two hours (temporal)
Closed Word Class: Pronouns
Pronouns
reference to entities, events, relations etc.
Personal Pronouns
refer to persons or entities,
e.g. you, he, it, ...
Possessive Pronouns
possession or relation between person and object,
e.g. his, her, my, its, ...
Wh-Pronouns
reference in question or back reference,
e.g. Who did this ..., Frieda, who is 80 years old ...
Closed Word Class: Conjunctions
Conjunctions
join phrases or sentences
semantics is varied and complex
Coordinating Conjunction
Join two phrases or sentences on the same level
through conjunctions like and, or, but, ...
e.g. He takes a cat and a dog.
He takes a dog and she takes a cat.
Subordinating Conjunction
Connect embedded phrases through e.g. that
e.g. He thinks that the cat is nicer than the dog.
Closed Word Class: Auxiliary Verbs
Auxiliary Verbs
Mark semantic features of main verb.
Often describe tense and modality aspects.
Semantics is difficult.
Tense
addition expressing present, past or future, ...
e.g. He will take the cat home.
Aspect
addition expressing whether the action is completed or ongoing
e.g. He is taking the cat home. (incomplete)
Mood
addition expressing necessity or possibility of action
e.g. He can take the cat home. (possible)
Closed Word Class: Copula, Modal Verbs
The copula (be), the verbs do and have, and Modal Verbs
(can, should, ...) are subclasses of Auxiliary Verbs.
Describe state, process, or tense / modality of action.
Semantics: difficult (e.g. modal logic)
State / Process: be and do
e.g. He is at home. He does nothing.
Tense: have
e.g. He has taken the cat home.
Modality: can, ought to, should, must
e.g. He can take the cat home. (possibility)
POS Tagging - Taggers
Methods for POS Tagging:
Rule-Based Tagging
use dictionary to assign POS; then use rules to
disambiguate words
Stochastic Tagging
chooses for each word the most probable tag, given the
observed word and the preceding tags. Typically modelled
as a Hidden Markov Model (a probabilistic finite state
machine).
Learn tagging rules.
Problem in POS Tagging: Ambiguity
Problem in POS Tagging: Which tag set to use?
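Stochastic tagging can be sketched as a bigram Hidden Markov Model decoded with the Viterbi algorithm. This is a toy illustration: the three-tag tagset and all probabilities below are hand-set assumptions, not estimates from any corpus, and the function name `viterbi` is hypothetical. It scores a tag sequence as the product of P(tag | previous tag) and P(word | tag).

```python
TAGS = ["DT", "NN", "VB"]

# Hypothetical transition probabilities P(tag | previous tag); "<s>" = start.
TRANS = {
    "<s>": {"DT": 0.5, "NN": 0.2, "VB": 0.3},
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.2, "NN": 0.3, "VB": 0.5},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
# Hypothetical emission probabilities P(word | tag).
EMIT = {
    "DT": {"the": 0.9, "this": 0.1},
    "NN": {"book": 0.3, "flight": 0.7},
    "VB": {"book": 0.6, "leave": 0.4},
}


def viterbi(words):
    """Return the most probable tag sequence for a non-empty word list."""
    # best[i][t]: probability of the best tag path for words[:i+1] ending in t
    best = [{t: TRANS["<s>"][t] * EMIT[t].get(words[0], 0.0) for t in TAGS}]
    back = [{}]
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] * TRANS[p][t])
            scores[t] = best[-1][prev] * TRANS[prev][t] * EMIT[t].get(word, 0.0)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # follow the back pointers from the best final tag
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back[1:]):
        tag = pointers[tag]
        path.append(tag)
    return list(reversed(path))
```

With these toy numbers the ambiguous word book is tagged VB in "book the flight" and NN in "the book": the preceding tag resolves the ambiguity. Real taggers work with log probabilities and smoothing, both omitted here.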
POS Tagging - Tagsets
Tagsets for English
▪ Penn Treebank, 45 tags
▪ Brown corpus, 87 tags
▪ C5 tagset, 61 tags
▪ C7 tagset, 146 tags
For references see Jurafsky, p.296
C5 and C7 tagsets are listed in Appendix C
Fig. 8.6 Penn Treebank, 45 tags
Fig. 8.5 English modal verbs and frequency counts from the CELEX
on-line dictionary.
Ambiguity in POS Tagging
Fig. 8.7 Word types and ambiguity in the Brown corpus.
Sentence Level Constructs
Sentence Level Constructs I
declarative
“This flight leaves at 9 am.”
S → NP VP
imperative
“Book this flight for me.”
S → VP
Sentence Level Constructs II
yes-no-question
“Does this flight leave at 9 am?”
S → Aux NP VP
wh-question
“When does this flight leave Winnipeg?”
S → Wh-NP Aux NP VP
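The four sentence-level rules can be tried out with a small backtracking recognizer. This is a sketch, not the Earley parser used in the course: the NP, VP and PP rules and the lexicon are assumptions chosen so that the four example sentences are covered, and the names `ends` and `sentence_rule` are hypothetical. The grammar contains no left-recursive rules, which this simple top-down strategy could not handle.

```python
GRAMMAR = {
    # the four S rules from the slides
    "S": [["NP", "VP"], ["VP"], ["Aux", "NP", "VP"],
          ["Wh-NP", "Aux", "NP", "VP"]],
    # assumed helper rules, just enough for the example sentences
    "NP": [["Det", "Noun"], ["Num", "Noun"], ["Noun"], ["Pron"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "PP"], ["Verb", "NP", "PP"]],
    "PP": [["Prep", "NP"]],
}
# Hypothetical lexicon: category -> words.
LEXICON = {
    "Det": {"this"}, "Noun": {"flight", "am", "winnipeg"}, "Num": {"9"},
    "Pron": {"me"}, "Verb": {"leaves", "leave", "book"}, "Aux": {"does"},
    "Prep": {"at", "for"}, "Wh-NP": {"when"},
}


def ends(symbol, tokens, i):
    """Yield every j such that tokens[i:j] can be derived from symbol."""
    if symbol in LEXICON:
        if i < len(tokens) and tokens[i] in LEXICON[symbol]:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:
        positions = [i]
        for child in rhs:
            positions = [j for p in positions for j in ends(child, tokens, p)]
        yield from positions


def sentence_rule(sentence):
    """Return the first S rule that covers the whole sentence, or None."""
    tokens = sentence.lower().rstrip(".?!").split()
    for rhs in GRAMMAR["S"]:
        positions = [0]
        for child in rhs:
            positions = [j for p in positions for j in ends(child, tokens, p)]
        if len(tokens) in positions:
            return "S -> " + " ".join(rhs)
    return None
```

For the declarative example, `sentence_rule("This flight leaves at 9 am.")` selects the rule S -> NP VP; the imperative has no subject NP, so only S -> VP spans the whole input.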
Noun Phrase Modification 1
Noun Phrase Modifiers
head = the central noun of the NP
modifiers = additions to head noun included in NP
• modifiers before the head noun (prenominal)
• modifiers after the head noun (post-nominal)
examples: determiners, adjectives, PPs
e.g. the young man
the girl with the red hat
Noun Phrase Modification - Prenominal
▪ determiner
the, a, this, some, ...
▪ predeterminer
all the flights
▪ cardinal numbers, ordinal numbers
one flight, the first flight, ...
▪ quantifiers
much, little
Noun Phrase Modification - Prenominal
▪ adjectives
a first-class flight, a long flight
▪ adjective phrase
the least expensive flight
Grammar Rule
NP → (Det) (Card) (Ord) (Quant) (AP) Nominal
PROJECT!
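Parentheses in the NP rule mark optional constituents. A parser that accepts only plain context-free rules needs one rule per combination, which the following sketch generates. The function name `expand_optionals` and the rule representation (left-hand side plus list of symbols) are assumptions for illustration, not the format of any particular parser.

```python
from itertools import product


def expand_optionals(lhs, rhs):
    """Expand '(X)' optional symbols into all plain CFG rule variants."""
    choices = []
    for symbol in rhs.split():
        if symbol.startswith("(") and symbol.endswith(")"):
            choices.append([[], [symbol[1:-1]]])   # leave out or include
        else:
            choices.append([[symbol]])             # obligatory symbol
    rules = []
    for combination in product(*choices):
        rules.append((lhs, [s for part in combination for s in part]))
    return rules


rules = expand_optionals("NP", "(Det) (Card) (Ord) (Quant) (AP) Nominal")
```

Five optional symbols give 2^5 = 32 plain rules, from NP → Nominal up to the full sequence. Introducing intermediate non-terminals instead keeps the grammar smaller.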
Noun Phrase Modification - Postnominal
▪ prepositional phrase PP
all flights from Chicago
Nominal → Nominal PP (PP) (PP)
▪ non-finite clause, gerundive postmodifiers
all flights arriving after 7 pm
Nominal → GerundVP
GerundVP → GerundV NP | GerundV PP | ...
▪ relative clause
a flight that serves breakfast
Nominal → Nominal RelClause
RelClause → (who | that) VP
Verb Subcategorization
• Different verbs accept or need different constituents
or complements.
VP = Verb + other constituents (complements)
e.g. He buys the books.
• Verbs can be classified according to the
complements they accept or need.
e.g. give needs two complements
He gave her the books.
sleep accepts no complement
He sleeps.
Verb Complements
▪ sentential complement
VP → Verb inf-sentence
I want to fly from Boston to Chicago.
▪ NP complement
VP → Verb NP
I want this flight.
▪ no complement
VP → Verb
I sleep.
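Subcategorization can be recorded in the lexicon as a set of complement frames per verb. This is a minimal sketch: the dictionary `SUBCAT`, the frame labels and the function `licenses` are hypothetical names, and the entries cover only the verbs used as examples above.

```python
# Hypothetical subcategorization lexicon: verb -> allowed complement frames.
SUBCAT = {
    "sleep": {()},                      # no complement: He sleeps.
    "buy": {("NP",)},                   # He buys the books.
    "give": {("NP", "NP")},             # He gave her the books.
    "want": {("NP",), ("inf-S",)},      # I want this flight. / I want to fly ...
}


def licenses(verb, complements):
    """True if the verb accepts exactly this sequence of complements."""
    return tuple(complements) in SUBCAT.get(verb, set())
```

A grammar can use such frames to split the Verb category into subclasses, so that a rule like VP → Verb NP NP applies only to verbs like give.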
Other Verb Complements
Prepositional Phrases + other Modifiers
can be added to specify location or time of
action, state or event described by verb
• VP  Verb PP PP
I fly from Boston to Chicago.
• VP  Verb PP
I sleep in the barn.
• VP  Verb PP ADV
I sleep in the barn tonight.
Assignment 1-B
Extend the grammar in the Earley Parser by
integrating:
1. complex VPs through sub-categorization and
complements
2. complex NPs through pre- and post-modifiers
3. some adverbs (e.g. temporal or manner) plus
rule extensions
You should define 3-5 new / modified rules in each
category.
Write down the new rules, and add sample parse
outputs generated with the parser program, to illustrate
the working of your rules (last chart state is sufficient).
