SlideShare a Scribd company logo
1 of 29
Finite-State Morphological
Parsing
K.A.S.H. Kulathilake
B.Sc.(Sp.Hons.)IT, MCS, Mphil, SEDA(UK)
Introduction
• Consider a simple example: parsing just the productive nominal
plural (-s) and the verbal progressive (-ing).
• Our goal will be to take input forms like those in the first column
below and produce output forms like those in the second column.
• The second column contains the stem of each word as well as
assorted morphological features.
• These features specify additional information about the stem.
– E.g. +SG  singular, +PL  plural
Introduction (Cont…)
• In order to build a morphological parser, we’ll need at
least the following:
– A lexicon:
• The list of stems and affixes, together with basic information about
them (whether a stem is a Noun stem or a Verb stem, etc).
– Morphotactics:
• The model of morpheme ordering that explains which classes of
morphemes can follow other classes of morphemes inside a word.
• For example, the rule that the English plural morpheme follows
the noun rather than preceding it.
– Orthographic rules:
• These spelling rules are used to model the changes that occur in a
word, usually when two morphemes combine (for example the y
ie spelling rule that changes city + -s to cities rather than citys).
The Lexicon and Morphotactics
• A lexicon is a repository for words.
• The simplest possible lexicon would consist of an explicit list of
every word of the language (every word, i.e. including abbreviations
(‘AAA’) and proper names (‘Jane’ or ‘Beijing’) as follows:
a
AAA
AA
Aachen
aardvark
aardwolf
aba
abaca
aback
. . .
The Lexicon and Morphotactics
(Cont…)
• Since it will often be inconvenient or impossible,
for the various reasons , to list every word in the
language, computational lexicons are usually
structured with a list of each of the stems and
affixes of the language together with a
representation of the morphotactics that tells us
how they can fit together.
• There are many ways to model morphotactics;
one of the most common is the finite-state
automaton.
The Lexicon and Morphotactics
(Cont…)
• The following FSA assumes that the lexicon includes regular
nouns (reg-noun) that take the regular -s plural (e.g. cat,
dog, fox, aardvark), also includes irregular noun forms that
don’t take -s, both singular irreg-sg-noun (goose, mouse)
and plural irreg-pl-noun (geese, mice).
The Lexicon and Morphotactics
(Cont…)
• A similar model for English verbal inflection is
given below.
This lexicon has three stem
classes (reg-verb-stem,
irreg-verb-stem, and irreg-
past-verb-form), plus 4
more affix classes (-ed past,
-ed participle, -ing
participle, and 3rd singular -
s)
The Lexicon and Morphotactics
(Cont…)
• Small part of the morphotactics of English adjectives;
• An initial hypothesis might be that adjectives can have
an optional prefix (un-), an obligatory root (big, cool,
etc) and an optional suffix (-er, -est, or -ly).
The Lexicon and Morphotactics
(Cont…)
• While previous FSA will recognize all the adjectives in the given
table, it will also recognize ungrammatical forms like unbig, redly,
and realest.
• We need to set up classes of roots and specify which can occur with
which suffixes.
• So adj-root1 would include adjectives that can occur with un- and -
ly (clear, happy, and real) while adj-root2 will include adjectives
that can’t (big, cool, and red).
The Lexicon and Morphotactics
(Cont…)
The Lexicon and Morphotactics
(Cont…)
• Noun Recognition FSA
Morphological Parsing with Finite-
State Transducers
• Given the input cats, we’d like to output cat +N +PL, telling us that
cat is a plural noun.
• We will do this via a version of two-level morphology, first
proposed by Koskenniemi (1983).
• Two level morphology represents a word as a correspondence
between a lexical level, which represents a simple concatenation of
morphemes making up a word, and the surface level, which
represents the actual spelling of the final word.
• Morphological parsing is implemented by building mapping rules
that map letter sequences like cats on the surface level into
morpheme and features sequences like cat +N +PL on the lexical
level.
Morphological Parsing with Finite-
State Transducers (Cont…)
• The automaton that we use for performing the mapping
between lexical and surface levels is the finite-state
transducer or FST.
• A transducer maps between FST one set of symbols and
another; a finite-state transducer does this via a finite
automaton.
• Thus we usually visualize an FST as a two-tape automaton
which recognizes or generates pairs of strings.
• The FST thus has a more general function than an FSA;
where an FSA defines a formal language by defining a set of
strings, an FST defines a relation between sets of strings.
• This relates to another view of an FST; as a machine that
reads one string and generates another.
Morphological Parsing with Finite-
State Transducers (Cont…)
• Here’s a summary of this four-fold way of
thinking about transducers:
– FST as recognizer: a transducer that takes a pair of
strings as input and outputs accept if the string-pair is
in the string-pair language, and a reject if it is not.
– FST as generator: a machine that outputs pairs of
strings of the language. Thus the output is a yes or no,
and a pair of output strings.
– FST as translator: a machine that reads a string and
outputs another string.
– FST as set relater: a machine that computes relations
between sets.
Morphological Parsing with Finite-
State Transducers (Cont…)
• An FST can be formally defined in a number of ways; we
will rely on the following definition, based on what is called
the Mealy machine extension to a simple FSA:
– Q: a finite set of N states q0,q1,….,qN
– Σ: a finite alphabet of complex symbols. Each complex symbol is
composed of an input-output pair i : o; one symbol i from an
input alphabet I, and one symbol o from an output alphabet O,
thus Σ is a subset of I×O. I and O may each also include the
epsilon symbol ε.
– q0: the start state
– F: the set of final states, F subset of Q
– δ(Q, i:0): the transition function or transition matrix between
states. Given a state q ϵQ and complex symbol i : o ϵ Σ, δ(q;i : o)
returns a new state q’ϵ Q. δ is thus a relation from Q ×Σ to Q;
Morphological Parsing with Finite-
State Transducers (Cont…)
• Where an FSA accepts a language stated over a
finite alphabet of single symbols, such as the
alphabet of our sheep language (baaa):
• Σ ={b, a, !}
• an FST accepts a language stated over pairs of
symbols, as in:
• Σ ={ a : a, b : b, ! : !, a : !, a : ε, ε : ! }
• In two-level morphology, the pairs of symbols in Σ
are also called feasible pairs.
Morphological Parsing with Finite-
State Transducers (Cont…)
• We mentioned that for two-level morphology it’s convenient to
view an FST as having two tapes. The upper or lexical tape, is
composed from characters from the left side of the a : b pairs; the
lower or surface tape, is composed of characters from the right side
of the a : b pairs.
• Thus each symbol a : b in the transducer alphabet Σ expresses how
the symbol a from one tape is mapped to the symbol b on the
another tape.
• For example a : ε means that an a on the upper tape will
correspond to nothing on the lower tape.
• Just as for an FSA, we can write regular expressions in the complex
alphabet Σ.
• Since it’s most common for symbols to map to themselves, in two-
level morphology we call pairs like a : a default pairs, and just refer
to them by the single letter a.
Morphological Parsing with Finite-
State Transducers (Cont…)
• Let’s build an FST morphological parser out of our earlier
morphotactic FSAs and lexica by adding an extra “lexical”
tape and the appropriate morphological features.
Note that these features
map to the empty string ε or
the word/morpheme
boundary symbol # since
there is no segment
corresponding to them on
the output tape.
Morphological Parsing with Finite-
State Transducers (Cont…)
• In order to do this we need to update the lexicon for this
transducer, so that irregular plurals like geese will parse
into the correct stem goose +N +PL.
• We do this by allowing the lexicon to also have two levels.
• Since surface geese maps to underlying goose, the new
lexical entry will be ‘g:g o:e o:e s:s e:e’.
• Regular forms are simpler; the two-level entry for fox will
now be ‘f:f o:o x:x’, but by relying on the orthographic
convention that f stands for f:f and so on, we can simply
refer to it as fox and the form for geese as ‘g o:e o:e s e’.
• Thus the lexicon will look only slightly more complex:
Morphological Parsing with Finite-
State Transducers (Cont…)
• Our proposed morphological parser needs to map from surface forms like geese to
lexical forms like goose +N +SG.
• We could do this by cascading the lexicon above with the singular/plural
automaton.
• Cascading two automata means running them in series with the output of the first
feeding the input to the second.
• We would first represent the lexicon of stems in the above table as the FST Tstems as
follows;
Morphological Parsing with Finite-
State Transducers (Cont…)
• Previous FST maps e.g. dog to reg-noun-stem.
• In order to allow possible suffixes, Tstems in FST
allows the forms to be followed by @ the
wildcard @ symbol; @:@ stands for ‘any feasible
pair’.
• A pair of the form @:x, for example will mean
‘any feasible pair which has x on the surface
level’, and correspondingly for the form x:@.
• The output of this FST would then feed the
number automaton Tnum.
Morphological Parsing with Finite-
State Transducers (Cont…)
• Composed Automation
– Following transducer will map plural nouns into the stem plus
the morphological marker +PL, and singular nouns into the stem
plus the morpheme +SG.
– Thus a surface cats will map to cat +N +PL as follows: c:c a:a t:t
+N:ε +PL:ˆs#
That is, c maps to itself, as do a
and t, while the morphological
feature +N (recall that this
means ‘noun’) maps to nothing
(ε), and the feature +PL
(meaning ‘plural’) maps to ˆs.
The symbol ˆ indicates a
morpheme boundary, while
the symbol # indicates a word
boundary,
Morphological Parsing with Finite-
State Transducers (Cont…)
• It creates intermediate tapes as follows;
• How to remove the boundary marker will be
discussed in the next section.
Orthographic Rules and Finite-State
Transducers
• The method described in the previous section will
successfully recognize words like aardvarks and mice.
• But just concatenating the morphemes won’t work for
cases where there is a spelling change; it would
incorrectly reject an input like foxes and accept an
input like foxs.
• We need to deal with the fact that English often
requires spelling changes at morpheme boundaries by
introducing spelling rules (or orthographic rules).
• This section introduces a number of notations for
writing such rules and shows how to implement the
rules as transducers.
Orthographic Rules and Finite-State
Transducers (Cont…)
• Some of these spelling rules:
Orthographic Rules and Finite-State
Transducers (Cont…)
• We can think of these spelling changes as taking as
input a simple concatenation of morphemes (the
‘intermediate output’ of the lexical transducer) and
producing as output a slightly-modified, (correctly
spelled) concatenation of morphemes.
• So for example we could write an E-insertion rule that
performs the mapping from the intermediate to
surface levels.
Orthographic Rules and Finite-State
Transducers (Cont…)
• Such a rule might say something like “insert an e on the
surface tape just when the lexical tape has a morpheme
ending in x (or z, etc) and the next morpheme is -s. Here’s a
formalization of the rule:
• This is the rule notation of Chomsky and Halle (1968); a rule
of the form a b/ c___d means ‘rewrite a as b when it
occurs between c and d.
Orthographic Rules and Finite-State
Transducers (Cont…)
• Following figure shows an automaton that
corresponds to this spelling rule.
Orthographic Rules and Finite-State
Transducers (Cont…)
• The idea in building a transducer for a
particular rule is to express only the
constraints necessary for that rule, allowing
any other string of symbols to pass through
unchanged.
• In parsing, word ambiguities are still there.
– Foxes  noun or Foxes  verb (meaning ‘to
baffle or confuse’)

More Related Content

What's hot

Natural language processing
Natural language processingNatural language processing
Natural language processingBasha Chand
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfDeptii Chaudhari
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingSaurabh Kaushik
 
Syntactic analysis in NLP
Syntactic analysis in NLPSyntactic analysis in NLP
Syntactic analysis in NLPkartikaVashisht
 
Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Iffat Anjum
 
Challenges in nlp
Challenges in nlpChallenges in nlp
Challenges in nlpZareen Syed
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationMarina Santini
 
First order predicate logic (fopl)
First order predicate logic (fopl)First order predicate logic (fopl)
First order predicate logic (fopl)chauhankapil
 
Syntax directed translation
Syntax directed translationSyntax directed translation
Syntax directed translationAkshaya Arunan
 
Natural language processing
Natural language processing Natural language processing
Natural language processing Md.Sumon Sarder
 
Language Model (N-Gram).pptx
Language Model (N-Gram).pptxLanguage Model (N-Gram).pptx
Language Model (N-Gram).pptxHeneWijaya
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Context free grammars
Context free grammarsContext free grammars
Context free grammarsRonak Thakkar
 
Semantics analysis
Semantics analysisSemantics analysis
Semantics analysisBilalzafar22
 

What's hot (20)

Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdf
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
NLP_KASHK:POS Tagging
NLP_KASHK:POS TaggingNLP_KASHK:POS Tagging
NLP_KASHK:POS Tagging
 
NLP_KASHK:N-Grams
NLP_KASHK:N-GramsNLP_KASHK:N-Grams
NLP_KASHK:N-Grams
 
Syntactic analysis in NLP
Syntactic analysis in NLPSyntactic analysis in NLP
Syntactic analysis in NLP
 
Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01Lecture 10 semantic analysis 01
Lecture 10 semantic analysis 01
 
Challenges in nlp
Challenges in nlpChallenges in nlp
Challenges in nlp
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 
First order predicate logic (fopl)
First order predicate logic (fopl)First order predicate logic (fopl)
First order predicate logic (fopl)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Syntax directed translation
Syntax directed translationSyntax directed translation
Syntax directed translation
 
Natural language processing
Natural language processing Natural language processing
Natural language processing
 
NLP_KASHK:Finite-State Automata
NLP_KASHK:Finite-State AutomataNLP_KASHK:Finite-State Automata
NLP_KASHK:Finite-State Automata
 
Language Model (N-Gram).pptx
Language Model (N-Gram).pptxLanguage Model (N-Gram).pptx
Language Model (N-Gram).pptx
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Context free grammars
Context free grammarsContext free grammars
Context free grammars
 
Machine Translation
Machine TranslationMachine Translation
Machine Translation
 
Semantics analysis
Semantics analysisSemantics analysis
Semantics analysis
 

Similar to NLP_KASHK:Finite-State Morphological Parsing

MorphologyAndFST.pdf
MorphologyAndFST.pdfMorphologyAndFST.pdf
MorphologyAndFST.pdfssuser97943d
 
INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptLamhotNaibaho3
 
02-Lexical-Analysis.ppt
02-Lexical-Analysis.ppt02-Lexical-Analysis.ppt
02-Lexical-Analysis.pptBabanDeep5
 
Ch3 4 regular expression and grammar
Ch3 4 regular expression and grammarCh3 4 regular expression and grammar
Ch3 4 regular expression and grammarmeresie tesfay
 
context free language
context free languagecontext free language
context free languagekhush_boo31
 
A Statistical Model for Morphology Inspired by the Amis Language
A Statistical Model for Morphology Inspired by the Amis LanguageA Statistical Model for Morphology Inspired by the Amis Language
A Statistical Model for Morphology Inspired by the Amis Languagedannyijwest
 
A statistical model for morphology inspired by the Amis language
A statistical model for morphology inspired by the Amis languageA statistical model for morphology inspired by the Amis language
A statistical model for morphology inspired by the Amis languageIJwest
 
The Theory of Finite Automata.pptx
The Theory of Finite Automata.pptxThe Theory of Finite Automata.pptx
The Theory of Finite Automata.pptxssuser039bf6
 
Natural Language Processing Topics for Engineering students
Natural Language Processing Topics for Engineering studentsNatural Language Processing Topics for Engineering students
Natural Language Processing Topics for Engineering studentsRosnaPHaroon
 
Free Ebooks Download ! Edhole
Free Ebooks Download ! EdholeFree Ebooks Download ! Edhole
Free Ebooks Download ! EdholeEdhole.com
 
Mba ebooks ! Edhole
Mba ebooks ! EdholeMba ebooks ! Edhole
Mba ebooks ! EdholeEdhole.com
 
CH 2.pptx
CH 2.pptxCH 2.pptx
CH 2.pptxObsa2
 
Context Free Grammar
Context Free GrammarContext Free Grammar
Context Free GrammarAkhil Kaushik
 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)bolovv
 

Similar to NLP_KASHK:Finite-State Morphological Parsing (20)

MorphologyAndFST.pdf
MorphologyAndFST.pdfMorphologyAndFST.pdf
MorphologyAndFST.pdf
 
INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.ppt
 
02-Lexical-Analysis.ppt
02-Lexical-Analysis.ppt02-Lexical-Analysis.ppt
02-Lexical-Analysis.ppt
 
Ch3 4 regular expression and grammar
Ch3 4 regular expression and grammarCh3 4 regular expression and grammar
Ch3 4 regular expression and grammar
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
context free language
context free languagecontext free language
context free language
 
A Statistical Model for Morphology Inspired by the Amis Language
A Statistical Model for Morphology Inspired by the Amis LanguageA Statistical Model for Morphology Inspired by the Amis Language
A Statistical Model for Morphology Inspired by the Amis Language
 
A statistical model for morphology inspired by the Amis language
A statistical model for morphology inspired by the Amis languageA statistical model for morphology inspired by the Amis language
A statistical model for morphology inspired by the Amis language
 
The Theory of Finite Automata.pptx
The Theory of Finite Automata.pptxThe Theory of Finite Automata.pptx
The Theory of Finite Automata.pptx
 
Natural Language Processing Topics for Engineering students
Natural Language Processing Topics for Engineering studentsNatural Language Processing Topics for Engineering students
Natural Language Processing Topics for Engineering students
 
Free Ebooks Download ! Edhole
Free Ebooks Download ! EdholeFree Ebooks Download ! Edhole
Free Ebooks Download ! Edhole
 
Mba ebooks ! Edhole
Mba ebooks ! EdholeMba ebooks ! Edhole
Mba ebooks ! Edhole
 
RegexCat
RegexCatRegexCat
RegexCat
 
CH 2.pptx
CH 2.pptxCH 2.pptx
CH 2.pptx
 
Unit2 Toc.pptx
Unit2 Toc.pptxUnit2 Toc.pptx
Unit2 Toc.pptx
 
Morphology.ppt
Morphology.pptMorphology.ppt
Morphology.ppt
 
Context Free Grammar
Context Free GrammarContext Free Grammar
Context Free Grammar
 
lec3 AI.pptx
lec3  AI.pptxlec3  AI.pptx
lec3 AI.pptx
 
Lexical1
Lexical1Lexical1
Lexical1
 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)
 

More from Hemantha Kulathilake

NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar Hemantha Kulathilake
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelHemantha Kulathilake
 
COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation Hemantha Kulathilake
 
COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops Hemantha Kulathilake
 
COM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & BranchingCOM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & BranchingHemantha Kulathilake
 
COM1407: Type Casting, Command Line Arguments and Defining Constants
COM1407: Type Casting, Command Line Arguments and Defining Constants COM1407: Type Casting, Command Line Arguments and Defining Constants
COM1407: Type Casting, Command Line Arguments and Defining Constants Hemantha Kulathilake
 
COM1407: Variables and Data Types
COM1407: Variables and Data Types COM1407: Variables and Data Types
COM1407: Variables and Data Types Hemantha Kulathilake
 
COM1407: Introduction to C Programming
COM1407: Introduction to C Programming COM1407: Introduction to C Programming
COM1407: Introduction to C Programming Hemantha Kulathilake
 

More from Hemantha Kulathilake (20)

NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar
 
NLP_KASHK:Markov Models
NLP_KASHK:Markov ModelsNLP_KASHK:Markov Models
NLP_KASHK:Markov Models
 
NLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram ModelsNLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram Models
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
 
NLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit DistanceNLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit Distance
 
NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
 
NLP_KASHK:Regular Expressions
NLP_KASHK:Regular Expressions NLP_KASHK:Regular Expressions
NLP_KASHK:Regular Expressions
 
NLP_KASHK: Introduction
NLP_KASHK: Introduction NLP_KASHK: Introduction
NLP_KASHK: Introduction
 
COM1407: File Processing
COM1407: File Processing COM1407: File Processing
COM1407: File Processing
 
COm1407: Character & Strings
COm1407: Character & StringsCOm1407: Character & Strings
COm1407: Character & Strings
 
COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation
 
COM1407: Input/ Output Functions
COM1407: Input/ Output FunctionsCOM1407: Input/ Output Functions
COM1407: Input/ Output Functions
 
COM1407: Working with Pointers
COM1407: Working with PointersCOM1407: Working with Pointers
COM1407: Working with Pointers
 
COM1407: Arrays
COM1407: ArraysCOM1407: Arrays
COM1407: Arrays
 
COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops
 
COM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & BranchingCOM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & Branching
 
COM1407: C Operators
COM1407: C OperatorsCOM1407: C Operators
COM1407: C Operators
 
COM1407: Type Casting, Command Line Arguments and Defining Constants
COM1407: Type Casting, Command Line Arguments and Defining Constants COM1407: Type Casting, Command Line Arguments and Defining Constants
COM1407: Type Casting, Command Line Arguments and Defining Constants
 
COM1407: Variables and Data Types
COM1407: Variables and Data Types COM1407: Variables and Data Types
COM1407: Variables and Data Types
 
COM1407: Introduction to C Programming
COM1407: Introduction to C Programming COM1407: Introduction to C Programming
COM1407: Introduction to C Programming
 

Recently uploaded

Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

NLP_KASHK:Finite-State Morphological Parsing

  • 2. Introduction • Consider a simple example: parsing just the productive nominal plural (-s) and the verbal progressive (-ing). • Our goal will be to take input forms like those in the first column below and produce output forms like those in the second column. • The second column contains the stem of each word as well as assorted morphological features. • These features specify additional information about the stem. – E.g. +SG  singular, +PL  plural
  • 3. Introduction (Cont…) • In order to build a morphological parser, we’ll need at least the following: – A lexicon: • The list of stems and affixes, together with basic information about them (whether a stem is a Noun stem or a Verb stem, etc). – Morphotactics: • The model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes inside a word. • For example, the rule that the English plural morpheme follows the noun rather than preceding it. – Orthographic rules: • These spelling rules are used to model the changes that occur in a word, usually when two morphemes combine (for example the y ie spelling rule that changes city + -s to cities rather than citys).
  • 4. The Lexicon and Morphotactics • A lexicon is a repository for words. • The simplest possible lexicon would consist of an explicit list of every word of the language (every word, i.e. including abbreviations (‘AAA’) and proper names (‘Jane’ or ‘Beijing’) as follows: a AAA AA Aachen aardvark aardwolf aba abaca aback . . .
  • 5. The Lexicon and Morphotactics (Cont…) • Since it will often be inconvenient or impossible, for the various reasons , to list every word in the language, computational lexicons are usually structured with a list of each of the stems and affixes of the language together with a representation of the morphotactics that tells us how they can fit together. • There are many ways to model morphotactics; one of the most common is the finite-state automaton.
  • 6. The Lexicon and Morphotactics (Cont…) • The following FSA assumes that the lexicon includes regular nouns (reg-noun) that take the regular -s plural (e.g. cat, dog, fox, aardvark), also includes irregular noun forms that don’t take -s, both singular irreg-sg-noun (goose, mouse) and plural irreg-pl-noun (geese, mice).
  • 7. The Lexicon and Morphotactics (Cont…) • A similar model for English verbal inflection is given below. This lexicon has three stem classes (reg-verb-stem, irreg-verb-stem, and irreg- past-verb-form), plus 4 more affix classes (-ed past, -ed participle, -ing participle, and 3rd singular - s)
  • 8. The Lexicon and Morphotactics (Cont…) • Small part of the morphotactics of English adjectives; • An initial hypothesis might be that adjectives can have an optional prefix (un-), an obligatory root (big, cool, etc) and an optional suffix (-er, -est, or -ly).
  • 9. The Lexicon and Morphotactics (Cont…) • While previous FSA will recognize all the adjectives in the given table, it will also recognize ungrammatical forms like unbig, redly, and realest. • We need to set up classes of roots and specify which can occur with which suffixes. • So adj-root1 would include adjectives that can occur with un- and - ly (clear, happy, and real) while adj-root2 will include adjectives that can’t (big, cool, and red).
  • 10. The Lexicon and Morphotactics (Cont…)
  • 11. The Lexicon and Morphotactics (Cont…) • Noun Recognition FSA
  • 12. Morphological Parsing with Finite- State Transducers • Given the input cats, we’d like to output cat +N +PL, telling us that cat is a plural noun. • We will do this via a version of two-level morphology, first proposed by Koskenniemi (1983). • Two level morphology represents a word as a correspondence between a lexical level, which represents a simple concatenation of morphemes making up a word, and the surface level, which represents the actual spelling of the final word. • Morphological parsing is implemented by building mapping rules that map letter sequences like cats on the surface level into morpheme and features sequences like cat +N +PL on the lexical level.
  • 13. Morphological Parsing with Finite- State Transducers (Cont…) • The automaton that we use for performing the mapping between lexical and surface levels is the finite-state transducer or FST. • A transducer maps between FST one set of symbols and another; a finite-state transducer does this via a finite automaton. • Thus we usually visualize an FST as a two-tape automaton which recognizes or generates pairs of strings. • The FST thus has a more general function than an FSA; where an FSA defines a formal language by defining a set of strings, an FST defines a relation between sets of strings. • This relates to another view of an FST; as a machine that reads one string and generates another.
  • 14. Morphological Parsing with Finite- State Transducers (Cont…) • Here’s a summary of this four-fold way of thinking about transducers: – FST as recognizer: a transducer that takes a pair of strings as input and outputs accept if the string-pair is in the string-pair language, and a reject if it is not. – FST as generator: a machine that outputs pairs of strings of the language. Thus the output is a yes or no, and a pair of output strings. – FST as translator: a machine that reads a string and outputs another string. – FST as set relater: a machine that computes relations between sets.
  • 15. Morphological Parsing with Finite- State Transducers (Cont…) • An FST can be formally defined in a number of ways; we will rely on the following definition, based on what is called the Mealy machine extension to a simple FSA: – Q: a finite set of N states q0,q1,….,qN – Σ: a finite alphabet of complex symbols. Each complex symbol is composed of an input-output pair i : o; one symbol i from an input alphabet I, and one symbol o from an output alphabet O, thus Σ is a subset of I×O. I and O may each also include the epsilon symbol ε. – q0: the start state – F: the set of final states, F subset of Q – δ(Q, i:0): the transition function or transition matrix between states. Given a state q ϵQ and complex symbol i : o ϵ Σ, δ(q;i : o) returns a new state q’ϵ Q. δ is thus a relation from Q ×Σ to Q;
  • 16. Morphological Parsing with Finite- State Transducers (Cont…) • Where an FSA accepts a language stated over a finite alphabet of single symbols, such as the alphabet of our sheep language (baaa): • Σ ={b, a, !} • an FST accepts a language stated over pairs of symbols, as in: • Σ ={ a : a, b : b, ! : !, a : !, a : ε, ε : ! } • In two-level morphology, the pairs of symbols in Σ are also called feasible pairs.
  • 17. Morphological Parsing with Finite- State Transducers (Cont…) • We mentioned that for two-level morphology it’s convenient to view an FST as having two tapes. The upper or lexical tape, is composed from characters from the left side of the a : b pairs; the lower or surface tape, is composed of characters from the right side of the a : b pairs. • Thus each symbol a : b in the transducer alphabet Σ expresses how the symbol a from one tape is mapped to the symbol b on the another tape. • For example a : ε means that an a on the upper tape will correspond to nothing on the lower tape. • Just as for an FSA, we can write regular expressions in the complex alphabet Σ. • Since it’s most common for symbols to map to themselves, in two- level morphology we call pairs like a : a default pairs, and just refer to them by the single letter a.
  • 18. Morphological Parsing with Finite- State Transducers (Cont…) • Let’s build an FST morphological parser out of our earlier morphotactic FSAs and lexica by adding an extra “lexical” tape and the appropriate morphological features. Note that these features map to the empty string ε or the word/morpheme boundary symbol # since there is no segment corresponding to them on the output tape.
  • 19. Morphological Parsing with Finite- State Transducers (Cont…) • In order to do this we need to update the lexicon for this transducer, so that irregular plurals like geese will parse into the correct stem goose +N +PL. • We do this by allowing the lexicon to also have two levels. • Since surface geese maps to underlying goose, the new lexical entry will be ‘g:g o:e o:e s:s e:e’. • Regular forms are simpler; the two-level entry for fox will now be ‘f:f o:o x:x’, but by relying on the orthographic convention that f stands for f:f and so on, we can simply refer to it as fox and the form for geese as ‘g o:e o:e s e’. • Thus the lexicon will look only slightly more complex:
  • 20. Morphological Parsing with Finite- State Transducers (Cont…) • Our proposed morphological parser needs to map from surface forms like geese to lexical forms like goose +N +SG. • We could do this by cascading the lexicon above with the singular/plural automaton. • Cascading two automata means running them in series with the output of the first feeding the input to the second. • We would first represent the lexicon of stems in the above table as the FST Tstems as follows;
  • 21. Morphological Parsing with Finite- State Transducers (Cont…) • Previous FST maps e.g. dog to reg-noun-stem. • In order to allow possible suffixes, Tstems in FST allows the forms to be followed by @ the wildcard @ symbol; @:@ stands for ‘any feasible pair’. • A pair of the form @:x, for example will mean ‘any feasible pair which has x on the surface level’, and correspondingly for the form x:@. • The output of this FST would then feed the number automaton Tnum.
  • 22. Morphological Parsing with Finite- State Transducers (Cont…) • Composed Automation – Following transducer will map plural nouns into the stem plus the morphological marker +PL, and singular nouns into the stem plus the morpheme +SG. – Thus a surface cats will map to cat +N +PL as follows: c:c a:a t:t +N:ε +PL:ˆs# That is, c maps to itself, as do a and t, while the morphological feature +N (recall that this means ‘noun’) maps to nothing (ε), and the feature +PL (meaning ‘plural’) maps to ˆs. The symbol ˆ indicates a morpheme boundary, while the symbol # indicates a word boundary,
  • 23. Morphological Parsing with Finite- State Transducers (Cont…) • It creates intermediate tapes as follows; • How to remove the boundary marker will be discussed in the next section.
  • 24. Orthographic Rules and Finite-State Transducers • The method described in the previous section will successfully recognize words like aardvarks and mice. • But just concatenating the morphemes won’t work for cases where there is a spelling change; it would incorrectly reject an input like foxes and accept an input like foxs. • We need to deal with the fact that English often requires spelling changes at morpheme boundaries by introducing spelling rules (or orthographic rules). • This section introduces a number of notations for writing such rules and shows how to implement the rules as transducers.
  • 25. Orthographic Rules and Finite-State Transducers (Cont…) • Some of these spelling rules:
  • 26. Orthographic Rules and Finite-State Transducers (Cont…) • We can think of these spelling changes as taking as input a simple concatenation of morphemes (the ‘intermediate output’ of the lexical transducer) and producing as output a slightly-modified, (correctly spelled) concatenation of morphemes. • So for example we could write an E-insertion rule that performs the mapping from the intermediate to surface levels.
  • 27. Orthographic Rules and Finite-State Transducers (Cont…) • Such a rule might say something like “insert an e on the surface tape just when the lexical tape has a morpheme ending in x (or z, etc) and the next morpheme is -s. Here’s a formalization of the rule: • This is the rule notation of Chomsky and Halle (1968); a rule of the form a b/ c___d means ‘rewrite a as b when it occurs between c and d.
  • 28. Orthographic Rules and Finite-State Transducers (Cont…) • Following figure shows an automaton that corresponds to this spelling rule.
  • 29. Orthographic Rules and Finite-State Transducers (Cont…) • The idea in building a transducer for a particular rule is to express only the constraints necessary for that rule, allowing any other string of symbols to pass through unchanged. • In parsing, word ambiguities are still there. – Foxes  noun or Foxes  verb (meaning ‘to baffle or confuse’)