1. Formal Languages
FSAs in NLP
Hinrich Schütze
IMS, Uni Stuttgart, WS 2006/07
Most slides borrowed from K. Haenelt and E. Gurari
2. Formal Language Theory
• Two different goals in computational linguistics
• Theoretical interest
– What is the correct formalization of natural language?
– What does this formalization tell us about the properties of natural
language?
– What are the limits of NLP algorithms – in principle?
• E.g., if natural language were context-free, this would imply that syntactic
analysis of natural language is cubic.
• Practical interest
– Well-understood mathematical and computational framework for
solving NLP problems
– ... even if the formalization is not "cognitively" sound
• Today: finite state for practical applications
3. Language Technology Based on Finite-State Devices
(overview figure; application areas shown include:)
• tokenising
• part-of-speech tagging
• shallow parsing, head-modifier pairs
• fact extraction
• spelling correction
• dictionaries and rules
• phonology and morphology (analysis, synthesis, transfer)
• translation
• speech recognition, text:speech and speech:text conversion
4. Advantages of
Finite-State Devices
• efficiency
– time
• very fast
• if deterministic or low-degree non-determinism
– space:
• compressed representation of the data
• which simultaneously serves as the search structure (cf. a hash function)
• system development and maintenance
– modular design and
automatic compilation of system components
– high level specifications
• language modelling
– uniform framework for modelling dictionaries and rules
5. Modelling Goal
• specification of a formal language that corresponds as closely
as possible to a natural language
• ideally the formal system should
– never undergenerate
(i.e. accept or generate all the strings that characterise a natural
language)
– never overgenerate
(i.e. not accept or generate any string which is not acceptable in a
real language)
• realistically
– natural languages are moving targets (productivity, variations)
– approximations are achievable
– Finite-state is a crude, but useful approximation
(Beesley/Karttunen, 2003)
6. Layers of Linguistic Modelling
(figure: the analysis layers for one sentence)
• Text: The Commission promotes information on products in third countries
• Lex: The Commission promote information on product in third country
(normalised forms), with categories dete verb noun prpo noun prpo noun
• Syn: NP VP NP PP PP, grouped into S
• Sem: predicate-argument structure linking promote, Commission,
information, product (on), third country (in)
7. Main Types of Transducers for
Natural Language Processing
transducers with
- input labels: natural language
- output labels: interpretation
- weights: variability of data, ranking of hypotheses
- n additional output strings at final states: delayed output of
ambiguities, assignment of interpretations
(figure: small example transducers, e.g. mapping the input s-a-w to the
outputs s-ee and s-aw)
general case: non-deterministic transducers with ε-transitions
optimisation: determinisation and minimisation
8. Lexical Analysis of Natural
Language Texts
Lexical Analysis = Tokenisation + Morphological Analysis
• Tokenisation: identification of basic units (words)
• Morphological Analysis:
– mapping to normalised forms
– assignment of categories
– assignment of features
(figure: recognition of the input language and mapping to the output
language; the surface form "thinks" is mapped to the normalised form
"think" with category: verb and features tense: present, pers: 3,
num: singular)
10. Properties of
Natural Language Words
• very large set (how many?)
• word formation: concatenation with constraints
– simple words: word, *dorw
– compound words and derivations (productive):
Drosselklappenpotentiometer
organise → organisation → organisational,
re-organise → re-organisation → re-organisational
be-wald-en, *be-feld-en
• contiguous dependencies
– go-es, *walk-es
– un-expect-ed-ly, *un-elephant-ed-ly
• discontiguous (long-distance) dependencies
– expect-s, *un-expect-s
– mach-st, *ge-mach-st
12. Modelling of
Natural Language Words
• Concatenation rules expressed in terms of
– meaningful word components (including simple words) (morphs)
– and their concatenation (morphotactics)
• Modelling approach
– lexicalisation of sets of meaningful components (morph classes)
– representation of these dictionaries with finite-state transducers
– specification of concatenation
(figure: one network sharing the stems work- and book- and the suffix
arcs -ing, -ed, -s)
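The shared-prefix network in the figure can be sketched as a trie over a small morph lexicon; this hypothetical code shares prefixes only (a minimal network would additionally share suffixes), and the function names are illustrative, not from the slides:

```python
# Build a trie (deterministic acceptor) over a small morph lexicon, so
# that entries sharing a prefix (work/works/worked/working) share states.

def build_trie(words):
    trie = {}  # each state is a dict of outgoing arcs; '#' marks finality
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node['#'] = True
    return trie

def accepts(trie, w):
    node = trie
    for ch in w:
        if ch not in node:
            return False
        node = node[ch]
    return '#' in node

t = build_trie(["work", "works", "worked", "working", "book", "books"])
assert accepts(t, "working") and accepts(t, "books")
assert not accepts(t, "worki")
```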
13. Modelling of Natural Language
Words: System Overview
• input to the lexicon compiler:
– morpheme classes, e.g.
stem: (book:book) <affix>, (box:box) <affix>
affix: (ε:+noun +sg) <#>, (s:+noun +pl) <#>
– morphotactics: concatenation of morpheme classes
– phonological/orthographical alternation rules,
e.g. e-insertion: ε → e / {s, z, x} ^ __ s #
– special expressions (regular expressions, e.g. for numbers and dates)
• output of the lexicon compiler: the lexical transducer
15. Modelling of Words:
Standard Case
noun-stem:
book <noun-suffix>
work <noun-suffix>
noun-suffix:
(ε:+N +Sg) #
(s:+N +Pl) #
(figure: the compiled network shares the paths for book and work; the
suffix arcs ε and s lead to final states emitting +N+Sg and +N+Pl)
Lexical Transducer
- deterministic
- minimal
- additional output at final states
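The standard case can be sketched as a toy analysis function; the stem and suffix tables mirror the slide's morpheme classes, while the function name `analyse` and the flat-list representation are hypothetical (a real lexical transducer would be a compiled, minimised network):

```python
# Toy lexical analysis for the noun-stem / noun-suffix classes above:
# the suffix (ε:+N +Sg) pairs an empty surface string with the tag
# +N+Sg, and (s:+N +Pl) pairs surface "s" with +N+Pl.

STEMS = ["book", "work"]                     # noun-stem class
SUFFIXES = [("", "+N+Sg"), ("s", "+N+Pl")]   # noun-suffix class

def analyse(surface):
    """Map a surface form to its lexical form(s), e.g. books -> book+N+Pl."""
    analyses = []
    for stem in STEMS:
        for surf, tags in SUFFIXES:
            if surface == stem + surf:
                analyses.append(stem + tags)
    return analyses

assert analyse("books") == ["book+N+Pl"]
assert analyse("work") == ["work+N+Sg"]
assert analyse("dog") == []
```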
16. Modelling of Words: Mapping
Ambiguities between i:o-Language
(figure: the surface forms leave, leaves and left all map to the stem
leave; since the paths for le-ave and le-ft share a prefix, the output
cannot be emitted letter by letter)
Lexical Transducer
- deterministic
- delayed emission at final states
- minimal
17. Modelling of Words:
Overlapping of Matching Patterns
Wach-stube ('guard room') vs. Wachs-tube ('tube of wax')
(figure: after reading w-a-c-h-s the transducer must keep both boundary
hypotheses, wach^stube and wachs^tube, via an ε:- boundary transition)
Lexical Transducer
- non-deterministic with ε-transitions:
- determinisation not possible
- infinite delay of output due to cycles
Ambiguities
- cannot be resolved at the lexical level
- must be preserved for later analysis steps
- require non-deterministic traversal
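The Wachstube case can be sketched as non-deterministic segmentation over a small lexicon; the lexicon contents and the recursive function are hypothetical illustrations of why both readings must be preserved, not the slides' transducer:

```python
# Overlapping matches: "wachstube" segments as wach+stube (guard room)
# or wachs+tube (tube of wax); both analyses are returned.

LEXICON = {"wach", "wachs", "stube", "tube"}

def segmentations(word):
    """All ways of covering `word` with lexicon entries (non-deterministic)."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        if word[:i] in LEXICON:
            for rest in segmentations(word[i:]):
                results.append([word[:i]] + rest)
    return results

assert segmentations("wachstube") == [["wach", "stube"], ["wachs", "tube"]]
```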
18. Modelling of Words:
Non-Disjoint Morpheme Classes
• problem:
– multiplication of start sections
– high degree of non-determinism
• solutions
– pure finite-state devices
• transducer with high degree of non-determinism
– heavy backtracking / parallel search
– slow processing
• determinisation
– explosion of the network in size
– extended device: feature propagation along paths
• merging of morpheme classes
• reduction of the degree of non-determinism
• minimisation
• additional checking of bit-vectors
19. Modelling of Words:
Non-Disjoint Morpheme Classes:
high degree of non-determinism
noun-stem:
book <noun-stem> <noun-suffix>
work <noun-stem> <noun-suffix>
noun-suffix:
(ε:+N +Sg) #
(s:+N +Pl) #
verb-stem:
book <verb-suffix>
work <verb-suffix>
verb-suffix:
(ε:+V) #
(ed:+V +past) #
(s:+V +3rd) #
(figure: ε-transitions branch into separate N and V sub-networks; both
contain book and work, followed by the arcs ε:+N / s:+N and
ε:+V / ed:+V / s:+V respectively)
problem:
- each of the subdivisions must be searched separately
- determinisation is not feasible: it either leads to an explosion of
the network or is not possible
20. Modelling of Words:
Non-Disjoint Morpheme Classes:
feature propagation
(morpheme classes as on the previous slide)
(figure: merged dictionary with continuation classes as bit-features on
the arcs, e.g. book {N,V}, work {N,V}, ε {N,V}, ed {V}, s {N,V}, ε {N})
solution:
- interpreting the continuation class as a bit-feature
- added to the arcs
- checked during traversal (feature intersection)
- merging the dictionaries
- searching only one dictionary
(degree of non-determinism reduced considerably)
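The feature-intersection idea can be sketched as follows; the bit-flag encoding and the `analyse_merged` function are hypothetical simplifications of the merged-dictionary traversal described above:

```python
# Merged dictionary with continuation classes as bit-features on the
# arcs: traversal intersects the feature sets, and an empty
# intersection blocks the path.

N, V = 1, 2  # bit flags for the noun and verb continuation classes

STEMS = [("book", N | V), ("work", N | V)]   # merged stem dictionary
SUFFIXES = [
    ("", N, "+N+Sg"), ("s", N, "+N+Pl"),
    ("", V, "+V"), ("ed", V, "+V+past"), ("s", V, "+V+3rd"),
]

def analyse_merged(surface):
    results = []
    for stem, stem_bits in STEMS:
        if not surface.startswith(stem):
            continue
        rest = surface[len(stem):]
        for suf, suf_bits, tags in SUFFIXES:
            if rest == suf and stem_bits & suf_bits:  # feature intersection
                results.append(stem + tags)
    return results

assert analyse_merged("books") == ["book+N+Pl", "book+V+3rd"]
assert analyse_merged("booked") == ["book+V+past"]
```

Only one dictionary is searched; the noun/verb distinction survives as a cheap bit-vector check on each arc.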
21. Modelling of Words:
Long-Distance Dependencies
Constraints on the co-occurrence of morphs within words:
• contiguous constraints: the stem mach- combines locally with the
present-tense endings -e, -st, -t, -en (features: category verb,
tense present); likewise wachs-e, wach-e
• discontiguous constraints: the participle prefix ge- (category:
participle) requires the ending -t: ge-mach-t, ge-wachs-t are valid,
while *ge-mach-e, *ge-mach-st, *ge-wach-st are invalid sequences
22. Modelling of Words:
Long-Distance Dependencies
modelling alternatives
• pure finite-state transducer
– copies of the network
– can cause explosion in the size of the resulting transducer
• extended finite-state transducer
– "context-free" extension: a simple memory: flags
– special treatment by analysis and generation routine
– keeps transducers small
23. Modelling of Words:
Long-Distance Dependencies:
network copies
(figure: two copies of the verb-stem network, one reached via the
prefix ge- and one via ε, each continuing into the inflection suffixes
-en, -et, -n, -t)
problem: with > 15,000 entries per copy, this can cause an explosion in
the size of the resulting transducer
24. Modelling of Words:
Long-Distance Dependencies:
simple memory: flags
(figure: a single copy of the verb-stem network; the arc for the prefix
ge- carries @set(+ge), and the inflection suffixes carry @require(+ge)
or @require(-ge) as appropriate)
solution:
- state flags
- procedural interpretation (bit-vector operations)
Beesley/Karttunen, 2003
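A minimal sketch of such flag checking, with a hypothetical `traverse` function standing in for the transducer's analysis routine (the operation names mirror the slide's @set/@require notation):

```python
# Flag-diacritic style check: arcs may carry @set(+ge), @require(+ge)
# or @require(-ge) operations that are interpreted procedurally during
# traversal, so one network copy suffices.

def traverse(ops):
    """Run a sequence of flag operations; False means the path is blocked."""
    flags = set()
    for op, flag in ops:
        if op == "set":
            flags.add(flag)
        elif op == "require":        # flag must be set
            if flag not in flags:
                return False
        elif op == "require_not":    # flag must NOT be set
            if flag in flags:
                return False
    return True

# ge-mach-t: the prefix path sets +ge, the participle suffix requires it
assert traverse([("set", "ge"), ("require", "ge")])
# mach-st: the finite suffix requires the absence of ge-
assert traverse([("require_not", "ge")])
# *ge-mach-st: ge- is set but the finite suffix forbids it
assert not traverse([("set", "ge"), ("require_not", "ge")])
```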
25. Modelling of Words:
Phon./Orth. Alternation Rules
• phenomena
– pity → pitiless
– fly → flies
– swim → swimming
– delete → deleting
– fox → foxes
• dictionary
noun-stem:
dog <noun-suffix>
fox <noun-suffix>
noun-suffix:
(ε:+N +Sg) #
(s:+N +Pl) #
• rules: high-level specification of regular expressions, e.g. the
e-insertion rule ε → e / {s, z, x} ^ __ s #; replace rules of the form
a → b / _ c are compiled into regular expressions
(Beesley/Karttunen, 2003: 61)
26. Modelling of Words:
Phon./Orth. Alternation Rules
• compilation of lexical transducer:
– construction of dictionary transducer
– construction of rule transducer
– composition of lexical transducer and rule transducer
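With transducers idealised as functions from a string to a set of strings, the compilation steps above can be sketched as relation chaining; the toy dictionary, the y-replacement rule and all function names are hypothetical illustrations, not an actual transducer calculus:

```python
# Lexical transducer = dictionary transducer composed with the rule
# transducer, shown here in the generation direction.

def compose(f, g):
    """Compose two transductions, each mapping a string to a set of strings."""
    return lambda s: {y for x in f(s) for y in g(x)}

def dictionary(lexical):
    # dictionary transducer: lexical form -> intermediate form ('^' = boundary)
    table = {"fly+N+Pl": {"fly^s"}, "dog+N+Pl": {"dog^s"}}
    return table.get(lexical, set())

def rules(intermediate):
    # y-replacement rule: y -> ie before '^s' (fly^s -> flies);
    # otherwise just remove the boundary marker
    if intermediate.endswith("y^s"):
        return {intermediate[:-3] + "ies"}
    return {intermediate.replace("^", "")}

generate = compose(dictionary, rules)
assert generate("fly+N+Pl") == {"flies"}
assert generate("dog+N+Pl") == {"dogs"}
```

In a real system the composition is computed once, offline, yielding a single lexical transducer that maps lexical and surface forms directly onto each other in both directions.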
27. Modelling of Words:
Phon./Orth. Alternation Rules
(figure: the e-insertion rule ε → e / {s, z, x} ^ __ s # compiled into
a transducer with states r0-r5; after a stem-final s, z or x and the
morph boundary ^, the machine inserts e before a suffix s followed by
#, and passes all other symbols through; Jurafsky/Martin, 2000, p. 78)
e-insertion rule for English plural nouns
ending with x, s, z ("foxes")
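What the compiled rule transducer computes can be sketched with a regular-expression substitution; this is a hypothetical stand-in for the six-state machine, not its actual implementation:

```python
import re

# e-insertion:  ε -> e / {s,z,x} ^ __ s #
# applied to intermediate forms with explicit morph boundary '^'
# and end-of-word marker '#'.

def e_insertion(intermediate):
    # insert e between a stem-final s, z or x and the plural suffix s
    out = re.sub(r"([szx])\^(s#)", r"\1^e\2", intermediate)
    # remove the morph boundary and end-of-word markers
    return out.replace("^", "").replace("#", "")

assert e_insertion("fox^s#") == "foxes"
assert e_insertion("dog^s#") == "dogs"
```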
39. Layers of Linguistic Modelling
(the overview figure from slide 6, repeated: Text → Lex → Syn → Sem for
"The Commission promotes information on products in third countries";
the focus now shifts to the syntactic layer)
40. Syntactic Analysis of Natural
Language Texts
Syntactic Analysis maps an input language to an output language:
• identification of well-formed sequences of words
• assignment of syntactic categories
• assignment of syntactic structure
(example: "the good example" is recognised as well-formed and assigned
the structure [NP the good example])
41. Sequences of Words: Properties
and Well-Formedness Conditions
syntactic conditions: type-3 (regular-language concatenations)
• local word ordering principles
– the good example, *example good the
– could have been done, *been could done have
• global word ordering principles
– (we) (gave) (him) (the book), *(gave) (him) (the book) (we)
42. Sequences of Words: Properties
and Well-Formedness Conditions
syntactic conditions: beyond type-3
• concatenations beyond regular languages
– centre embedding (S → a S b)
– obligatorily paired correspondences
• either ... or, if ... then
• can be nested inside each other
(example of centre embedding: the regulation [which the commission
[which the Council had elected] had formulated] defines ...)
43. Sequences of Words: Properties
and Well-Formedness Conditions
syntactic conditions: beyond type-2
• concatenations beyond context-free languages
– cross-serial dependencies (Swiss German):
Jan säit das mer d'chind em Hans es huus lönd hälfe aastriiche
gloss: John said that we the children-acc (x1) Hans-dat (x2)
the house (x3) let (y1) help (y2) paint (y3)
each xi depends on the corresponding yi: 'John said that we let the
children help Hans paint the house'
44. Syntactic Grammars
• complete parsing
– goal: recover complete, exact parses of sentences
– closed-world assumption
• lexicon and grammar are complete
• place all types of conditions into one grammar
• seeking the globally best parse of the entire search space
– problems
• not robust
• too slow for mass data processing
• partial parsing
– goal: recover syntactic information
efficiently and reliably from unrestricted text
– sacrificing completeness and depth of analysis
– open-world assumption
• lexicon and grammar are incomplete
• local decisions
Abney, 1996
45. Syntactic Grammars:
Complete Sentence Structure ?
(figure: a complex nominal expression, roughly "bottle closed with
stopper/fastening made of cork material", annotated simultaneously
with syntactic links (NP, NP-c, AP, PP), semantic links ("made of",
"with") and text structure ("or"); a complete structure must
integrate all three)
46. Syntactic Grammars:
Complete Sentence Parsing
computational problem
• combinatorial explosion of readings
– List the sales of products in 1973: 3 readings
– List the sales of products produced in 1973: 10 readings
– List the sales of products in 1973 with the products in 1972:
28 readings
– List the sales of products produced in 1973 with the products
produced in 1972: 455 readings
Bod, 1998: 2
47. „All Grammars Leak“*
• Not possible to provide an exact and
complete characterization
– of all well-formed utterances
– that cleanly divides them from all other sequences
of words which are regarded as ill-formed
utterances
• Rules are not completely ill-founded
• Somehow we need to make things looser to
account for the creativity of language use
*Edward Sapir, 1921
49. All Grammars Leak
• Agreement in English
• “Why do some teachers, parents and
religious leaders feel that celebrating
their religious observances in home and
church are inadequate and deem it
necessary to bring those practices into
the public schools?”
50. Syntactic Structure: Partial
Parsing Approaches
• finite-state approximation of sentence structures (Abney 1995)
– finite-state cascades: sequences of levels of regular expressions
– recognition approximation: tail-recursion replaced by iteration
– interpretation approximation: embedding replaced by fixed levels
51. Syntactic Structure:
Finite State Cascades
• functionally equivalent to composition of transducers,
– but without intermediate structure output
– the individual transducers are considerably smaller than a
composed transducer
(figure: T1 maps "the good example" to the category sequence
"dete adje nomn"; T2 maps that sequence to [NP NP NP]; applying T1 and
then T2 is equivalent to applying the composed transducer T = T2 ∘ T1)
52. Syntactic Structure:
Finite-State Cascades (Abney)
Finite-State Cascade (figure):
L3: S S
L2: NP PP VP NP VP
L1: NP P NP VP NP VP
L0: D N P D N N V-tns Pron Aux V-ing
(the woman in the lab coat thought you were sleeping;
the transducers T1, T2, T3 map each level to the next)
Regular-Expression Grammar:
L1: { NP → D? N* N | Pron ; VP → V-tns | Aux V-ing }
L2: { PP → P NP }
L3: { S → PP* NP PP* VP PP* }
53. Syntactic Structure:
Finite-State Cascades (Abney)
• cascade consists of a sequence of levels
• phrases at one level are built on phrases at the
previous level
• no recursion: phrases never contain same level or
higher level phrases
• two levels of special importance
– chunks: non-recursive cores (NX, VX) of major phrases (NP,
VP)
– simplex clauses: embedded clauses as siblings
• patterns: reliable indicators of gist of syntactic
structure
54. Syntactic Structure:
Finite-State Cascades (Abney)
• each transduction is defined by a set of patterns
– category
– regular expression
• regular expression is translated into a finite-state automaton
• level transducer
– union of pattern automata
– deterministic recognizer
– each final state is associated with a unique pattern
• heuristics
– longest match (resolution of ambiguities)
• external control process
– if the recognizer blocks without reaching a final state,
a single input element is punted to the output and
recognition resumes at the following word
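The level-transducer mechanism described above (patterns per level, longest match, punting unmatched elements) can be sketched over category strings; the regular expressions encode the grammar from the cascade example on slide 52, but the string representation and helper names are hypothetical, not Abney's implementation:

```python
import re

# Finite-state cascade sketch (after Abney): each level is a set of
# (category, regular expression) patterns over the previous level's
# category sequence. The recognizer takes the longest match at each
# position; if no pattern matches, one element is punted to the output
# and recognition resumes at the next element.

LEVELS = [
    [("NP", r"(D )?(N )*N|Pron"), ("VP", r"V-tns|Aux V-ing")],  # L1
    [("PP", r"P NP")],                                          # L2
    [("S", r"(PP )*NP (PP )*VP( PP)*")],                        # L3
]

def run_level(patterns, cats):
    out, i = [], 0
    while i < len(cats):
        best = None  # (label, end) of the longest match at position i
        for label, rx in patterns:
            for j in range(len(cats), i, -1):  # try longest span first
                if re.fullmatch(rx, " ".join(cats[i:j])):
                    if best is None or j > best[1]:
                        best = (label, j)
                    break
        if best:
            out.append(best[0])
            i = best[1]
        else:
            out.append(cats[i])  # punt a single element
            i += 1
    return out

cats = "D N P D N N V-tns Pron Aux V-ing".split()
for patterns in LEVELS:
    cats = run_level(patterns, cats)
assert cats == ["S", "S"]
```

Running the three levels reproduces the cascade of slide 52: the tag sequence for "the woman in the lab coat thought you were sleeping" is reduced level by level to two simplex clauses.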
55. Syntactic Structure:
Finite-State Cascades (Abney)
• patterns: reliable indicators of bits of syntactic
structure
• parsing
– easy-first parsing (easy calls first)
– proceeds by growing islands of certainty into
larger and larger phrases
– no systematic parse tree from bottom to top
– recognition of recognizable structures
– containment of ambiguity
• prepositional phrases and the like are left unattached
• noun-noun modifications not resolved
56. Syntactic Structure:
Bounding of Centre Embedding
• Sproat, 2002
• observation: unbounded centre embedding
– does not occur in language use
– seems to be too complex for human mental capacities
• finite-state modelling of bounded centre embedding:
S → the (man|dog) S1 (bites|walks)
S1 → the (man|dog) S2 (bites|walks)
S1 → ε
S2 → the (man|dog) (bites|walks)
S2 → ε
(figure: the corresponding finite-state network with states 0-9; the
embedding depth is bounded at three)
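Because the depth is bounded, the grammar above denotes a regular language and can be expanded into a single regular expression; a hypothetical sketch:

```python
import re

# Bounded centre embedding: the S/S1/S2 rules above, expanded into one
# regular expression (possible precisely because the recursion bottoms
# out at S2).

NP = r"the (?:man|dog) "
V  = r"(?:bites|walks)"
S2 = rf"(?:{NP}{V} )?"        # S2 -> the (man|dog) (bites|walks) | ε
S1 = rf"(?:{NP}{S2}{V} )?"    # S1 -> the (man|dog) S2 (bites|walks) | ε
S  = rf"{NP}{S1}{V}"          # S  -> the (man|dog) S1 (bites|walks)

def grammatical(sentence):
    return re.fullmatch(S, sentence) is not None

assert grammatical("the man bites")                       # depth 1
assert grammatical("the man the dog bites walks")         # depth 2
assert not grammatical(                                   # depth 4: rejected
    "the man the dog the man the dog bites walks bites walks")
```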
58. Modelling of Natural Language
Word Sequences: Cases (1)
(well-formedness conditions: finite-state cascades | context-free and
mildly context-sensitive grammars)
syntactic, type-3:
• local sequences (chunks): lower level of cascade | rules with
terminal symbols
• global sequences (phrases): higher level of cascade | rules with
terminal and non-terminal symbols
syntactic, beyond type-3 (approximation | full modelling):
• embeddings of phrases: fixed number of levels | variable number of
levels
• centre embedding: bounded centre embedding | centre embedding
• obligatorily paired correspondences: individual patterns | feature
processing
syntactic, beyond type-2:
• cross-serial dependencies: - | special construction
59. Modelling of Natural Language
Word Sequences: Cases (2)
(well-formedness conditions: finite-state cascades | context-free and
mildly context-sensitive grammars)
semantic:
• ambiguity of noun sequences: longest-match heuristics |
overgeneration of reading hypotheses
• ambiguity of prepositional phrase attachment: containment of
ambiguity, underspecification | overgeneration of reading hypotheses
• semantic compatibility of words: (local grammars) | (static)
lexicalisation of cooccurrence restrictions
• case frames: procedural | feature structure
• meaning of words: - | -
textual:
• information organisation principles: selected patterns | -
60. Semantic Analysis:
An Example
• message understanding
– filling in relational database templates from newswire texts
– approach of FASTUS 1): a cascade of five transducers
• recognition of names
• fixed-form expressions
• basic noun and verb groups
• patterns of events
– <company> <form> <joint venture> with <company>
– "Bridgestone Sports Co. said Friday it has set up a joint venture in
Taiwan with a local concern and a Japanese trading house to
produce golf clubs to be shipped to Japan."
• identification of event structures that describe the same event
resulting template:
Relationship: TIE-UP
Entities: Bridgestone Sports Co.; a local concern; a Japanese trading house
JV Company: -
Capitalization: -
1) Hobbs/Appelt/Bear/Israel/Kehler/Martin/Meyers/Kameyama/Stickel/Tyson (1997)
61. Summary: Linguistic Adequacy
• word formation
– essentially regular language
• sentence formation
– reduced recognition capacity (approximations)
• corresponds to language use
rather than natural language system
– flat interpretation structures
• clearly separate syntactic constraints from other (semantic, textual)
constraints
– partial interpretation structures
• clearly identify the contribution of syntactic structure in the interplay
with other structuring principles
• content
– suitable for restricted fact extraction
– deep text understanding generally still poorly understood
62. Summary: Practical Usefulness
• not all natural language phenomena can be
described with finite-state devices
• many actually occurring phenomena can be
described with regular devices
• not all practical applications require a
complete and deep processing of natural
language
• partial solutions allow for the development of
many useful applications
63. Summary: Complexity of Finite-
State Transducers for NLP
• theoretically: computationally intractable
(Barton/Berwick/Ristad,1987)
• but the reduction to the SAT problem is unnatural:
– natural language problems are bounded in size
• input and output alphabets,
• word length of linguistic words,
• partiality of functions and relations
– combinatorial possibilities are locally restricted.
• practically, natural language finite-state systems
– do not involve complex search
– are remarkably fast
– can in many relevant cases be determinised and minimised
64. Summary: Large Scale
Processing
• Context-free devices
– run-time complexity: |G| · n³, where typically |G| >> n³
– too slow for mass data processing
• Finite-state devices
– run-time complexity: best case: linear
(with low degree of non-determinism)
– best suited for mass data processing
65. References
Abney, Steven (1996). Tagging and Partial Parsing. In: Ken Church, Steve Young, and Gerrit Bloothooft (eds.),
Corpus-Based Methods in Language and Speech. Kluwer Academic Publishers, Dordrecht.
http://www.vinartus.net/spa/95a.pdf
Abney, Steven (1996a). Cascaded Finite-State Parsing. Viewgraphs for a talk given at Xerox Research Centre,
Grenoble, France. http://www.vinartus.net/spa/96a.pdf
Abney, Steven (1995). Partial Parsing via Finite-State Cascades. In: Journal of Natural Language Engineering,
2(4): 337-344. http://www.vinartus.net/spa/97a.pdf
Barton Jr., G. Edward; Berwick, Robert C. and Eric Sven Ristad (1987). Computational Complexity and Natural
Language. MIT Press.
Beesley, Kenneth R. and Lauri Karttunen (2003). Finite-State Morphology. Distributed for the Center for the Study
of Language and Information. (CSLI Studies in Computational Linguistics)
Bod, Rens (1998). Beyond Grammar. An Experience-Based Theory of Language. CSLI Lecture Notes, 88,
Stanford, California: Center for the Study of Language and Information.
Grefenstette, Gregory (1999). Light Parsing as Finite State Filtering. In: Kornai 1999, pp. 86-94. Earlier version in:
Workshop on Extended Finite State Models of Language, Budapest, Hungary, Aug 11-12, 1996. ECAI'96.
http://citeseer.nj.nec.com/grefenstette96light.html
Hobbs, Jerry; Doug Appelt, John Bear, David Israel, Andy Kehler, David Martin, Karen Meyers, Megumi
Kameyama, Mark Stickel, Mabry Tyson (1997). Breaking the Text Barrier. FASTUS Presentation slides. SRI
International. http://www.ai.sri.com/~israel/Generic-FASTUS-talk.pdf
Jurafsky, Daniel and James H. Martin (2000). Speech and Language Processing. An Introduction to Natural
Language Processing, Computational Linguistics and Speech Recognition. New Jersey: Prentice Hall.
Kornai, András (ed.) (1999). Extended Finite State Models of Language. (Studies in Natural Language
Processing). Cambridge: Cambridge University Press.
Koskenniemi, Kimmo (1983). Two-Level Morphology: A General Computational Model for Word-Form Recognition
and Production. Publication 11, University of Helsinki. Helsinki: Department of General Linguistics.
66. References
Kunze, Jürgen (2001). Computerlinguistik. Voraussetzungen, Grundlagen, Werkzeuge. Lecture notes,
Humboldt-Universität zu Berlin. http://www2.rz.hu-
berlin.de/compling/Lehrstuhl/Skripte/Computerlinguistik_1/index.html
Manning, Christopher D. and Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing.
Cambridge, Mass., London: The MIT Press. http://www.sultry.arts.usyd.edu.au/fsnlp
Mohri, Mehryar (1997). Finite-State Transducers in Language and Speech Processing. In: Computational
Linguistics, 23(2): 269-311. http://citeseer.nj.nec.com/mohri97finitestate.html
Mohri, Mehryar (1996). On Some Applications of Finite-State Automata Theory to Natural Language Processing.
In: Journal of Natural Language Engineering, 2: 1-20.
Mohri, Mehryar and Michael Riley (2002). Weighted Finite-State Transducers in Speech Recognition (Tutorial).
Part 1: http://www.research.att.com/~mohri/postscript/icslp.ps, Part 2:
http://www.research.att.com/~mohri/postscript/icslp-tut2.ps
Partee, Barbara; ter Meulen, Alice and Robert E. Wall (1993). Mathematical Methods in Linguistics. Dordrecht:
Kluwer Academic Publishers.
Pereira, Fernando C. N. and Rebecca N. Wright (1997). Finite-State Approximation of Phrase-Structure
Grammars. In: Roche/Schabes 1997.
Roche, Emmanuel and Yves Schabes (eds.) (1997). Finite-State Language Processing. Cambridge (Mass.) and
London: MIT Press.
Sproat, Richard (2002). The Linguistic Significance of Finite-State Techniques. February 18, 2002.
http://www.research.att.com/~rws
Strzalkowski, Tomek; Lin, Fang; Ge, Jin Wang; Perez-Carballo, Jose (1999). Evaluating Natural Language
Processing Techniques in Information Retrieval. In: Strzalkowski, Tomek (ed.): Natural Language
Information Retrieval, Kluwer Academic Publishers, Holland: 113-145.
Woods, W. A. (1970). Transition Network Grammar for Natural Language Analysis. In: Communications of the
ACM 13: 591-602.