Prof. Deptii Chaudhari, Department of Computer Engineering, I2IT
Lecture Notes - Are Natural Languages Regular?
This is an important question for two reasons: first, it places an upper bound on the running time of
algorithms that process natural language; second, it may tell us something about human language
processing and language acquisition.
To answer this question let us first understand…
• What is a language (natural language / formal language)?
• What is a regular language?
• What are regular grammars?
What is a natural language?
A natural language is a human communication system. A natural language can be thought of as a
mutually understandable communication system that is used between members of some population.
When communicating, speakers of a natural language are tacitly agreeing on what strings are
allowed (i.e., which strings are grammatical). Dialects and specialized languages (including e.g.,
the language used on social media) are all natural languages in their own right.
Named languages that you are familiar with, such as French, Chinese or English, are usually
historically, politically or geographically derived labels for populations of speakers.
Natural language is highly ambiguous.
Example: I made her duck
1. I cooked waterfowl* for her.
2. I cooked waterfowl* belonging to her.
3. I created the (plaster?) duck she owns.
4. I caused her to quickly lower her head.
5. I turned her into a duck.
Several types of ambiguity combine to cause many meanings:
• morphological (her can be a dative pronoun or possessive pronoun and duck can be a noun
or a verb)
• syntactic (make can behave both transitively and ditransitively; make can select a direct
object or a verb)
• semantic (make can mean create, cause, cook ...)
What is a formal language?
A formal language is a set of strings over an alphabet.
Alphabet: An alphabet is specified by a finite set, ∑ , whose elements are called symbols. Some
examples are shown below:
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} the 10-element set of decimal digits.
{a, b, c, …. x, y, z} the 26-element set of lower-case characters of written English.
{aardvark, ….. zebra} the 250,000-element set of words in the Oxford English Dictionary.
The set of natural numbers N = {0, 1, 2, 3, ….} cannot be an alphabet because it is infinite.
Strings: A string of length n over an alphabet ∑ is an ordered n-tuple of elements of ∑.
∑ * denotes the set of all strings over ∑ of finite length.
If ∑ = {a, b} then ∊, ba, bab, aab are examples of strings over ∑.
If ∑ = {a} then ∑ * = {∊, a, aa, aaa, ….}
If ∑ = {cats, dogs, eat} then
∑ * = {∊, cats, cats eat, cats eat dogs, …..}
Languages: Given an alphabet ∑, any subset of ∑* is a formal language over the alphabet ∑.
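These definitions can be written down almost verbatim in code. The Python sketch below is purely illustrative (the alphabet and the toy language are assumptions, not part of the notes): an alphabet is a finite set, a string over it is an ordered tuple of its symbols, and a language is any chosen subset of ∑*.

# A minimal sketch of the definitions above; the alphabet and the toy language are invented.
SIGMA = {"cats", "dogs", "eat"}                  # a finite alphabet of word symbols

def is_string_over(tokens, sigma):
    # A string over sigma is an ordered tuple all of whose elements lie in sigma.
    return all(t in sigma for t in tokens)

# A formal language over SIGMA is just some subset of SIGMA* (chosen arbitrarily here).
L = {(), ("cats",), ("cats", "eat"), ("cats", "eat", "dogs")}

print(is_string_over(("cats", "eat", "dogs"), SIGMA))    # True: a string over SIGMA
print(("cats", "eat", "dogs") in L)                      # True: also a member of this language
print(("dogs", "eat", "cats") in L)                      # False: over SIGMA, but not in L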
What is a regular language?
A language is regular if it is equal to the set of strings accepted by some deterministic finite-state
automaton (DFA).
Regular languages are accepted by DFAs.
Given a DFA M = (Q, ∑, ∆, s, F), the language L(M) of strings accepted by M can be generated by
the regular grammar Greg = (N, ∑, S, P) where:
N = Q the non-terminals are the states of M
∑ = ∑ the terminals are the transition symbols of M
S = s the start symbol is the start state of M
P contains qi → aqj whenever ∆(qi, a) = qj
and qi → ∊ whenever qi ∊ F (i.e. whenever qi is an accepting state)
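This construction is mechanical, so it can be sketched directly in code. The DFA below (over {a, b}, accepting strings that end in b) is an invented example used only to illustrate the conversion: each transition ∆(qi, a) = qj becomes a rule qi → a qj, and each accepting state contributes qi → ∊.

# Sketch: converting a DFA M = (Q, Sigma, delta, s, F) into regular-grammar rules.
# The example DFA (strings over {a, b} ending in 'b') is invented for illustration.
Q = {"q0", "q1"}
Sigma = {"a", "b"}
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
s = "q0"
F = {"q1"}

def dfa_to_regular_grammar(Q, Sigma, delta, s, F):
    rules = []
    for (qi, a), qj in delta.items():      # qi -> a qj for every transition delta(qi, a) = qj
        rules.append((qi, [a, qj]))
    for qf in F:                           # qf -> epsilon for every accepting state
        rules.append((qf, []))
    return {"N": Q, "Sigma": Sigma, "S": s, "P": rules}

G = dfa_to_regular_grammar(Q, Sigma, delta, s, F)
for lhs, rhs in G["P"]:
    print(lhs, "->", " ".join(rhs) if rhs else "ε")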
To derive a string from a grammar:
• start with the designated start symbol;
• then repeatedly expand non-terminal symbols using the rewrite rules until there is
nothing further left to expand.
The rewrite rules derive the members of a language from their internal structure (or phrase
structure).
Every regular language has both a left-linear and a right-linear grammar.
For every regular grammar the rewrite rules of the grammar can all be expressed in the right-linear form:
X → aY
X → a
or alternatively, they can all be expressed in the left-linear form:
X → Ya
X → a
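To make the derivation procedure concrete, here is a small illustrative sketch (the toy grammar S → aS | b is an assumption for the example). It repeatedly rewrites the leftmost non-terminal using right-linear rules of the X → aY / X → a form until only terminals remain.

import random

# Toy right-linear grammar, invented for illustration: S -> a S | b  (it generates a*b).
rules = {"S": [["a", "S"], ["b"]]}
nonterminals = set(rules)

def derive(start="S", max_steps=50):
    # Repeatedly rewrite the leftmost non-terminal until none remain (or we give up).
    form = [start]
    for _ in range(max_steps):
        idx = next((i for i, sym in enumerate(form) if sym in nonterminals), None)
        if idx is None:                          # nothing left to expand: a terminal string
            break
        form[idx:idx + 1] = random.choice(rules[form[idx]])
    return "".join(form)

print(derive())   # e.g. "aaab" -- one member of the language a*b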
A phrase structure grammar over an alphabet ∑ is defined by a tuple G = (N, ∑, S,P). The language
generated by grammar G is L(G):
Non-terminals N: Non-terminal symbols (often uppercase letters) may be rewritten using the rules
of the grammar.
Terminals ∑ : Terminal symbols (often lowercase letters) are elements of ∑ and cannot be rewritten.
Note N ∩ ∑ = ∅.
Start Symbol S: A distinguished non-terminal symbol S ∊ N. This non-terminal provides the starting
point for derivations.
Phrase Structure Rules P: Phrase structure rules are pairs of the form (w, v), usually written
w → v, where w ∊ (∑ ∪ N)* N (∑ ∪ N)* and v ∊ (∑ ∪ N)*
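As a small illustrative check (the sets N and ∑ below are placeholders), the definition can be encoded directly: a rule is well formed if its left-hand side contains at least one non-terminal and uses only symbols from N ∪ ∑, and a rule set is right-linear if every right-hand side is a terminal optionally followed by a single non-terminal.

# Sketch: checking rules against the definitions above. N and Sigma are placeholders.
N = {"S"}
Sigma = {"a", "b"}

def is_valid_rule(lhs, rhs):
    # lhs must contain at least one non-terminal; every symbol must come from N ∪ Sigma.
    return any(x in N for x in lhs) and set(lhs) | set(rhs) <= N | Sigma

def is_right_linear(rules):
    # Every rule must have the shape X -> a Y or X -> a (X, Y non-terminals, a a terminal).
    return all(len(lhs) == 1 and lhs[0] in N
               and len(rhs) in (1, 2) and rhs[0] in Sigma
               and (len(rhs) == 1 or rhs[1] in N)
               for lhs, rhs in rules)

rules = [(["S"], ["a", "S"]), (["S"], ["b"])]
print(all(is_valid_rule(l, r) for l, r in rules))   # True: well-formed phrase structure rules
print(is_right_linear(rules))                        # True: also a right-linear (regular) grammar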
Now let us try to answer the question: can regular grammars model natural language?
It turns out that regular grammars have limitations when modelling natural languages, for the following
reasons:
• Centre Embedding
• Redundancy
• Useful internal structures
Problems using regular grammars for natural language
1. Centre Embedding
In principle, the syntax of natural languages cannot be described by a regular language due to the
presence of centre-embedding, i.e. infinitely recursive structures described by the rule A → αAβ,
which generate language examples of the form aⁿbⁿ.
For instance, the sentences below have a centre-embedded structure.
1. The students the police arrested complained.
2. The luggage that the passengers checked arrived.
3. The luggage that the passengers that the storm delayed checked arrived.
Intuitively, the reason that a regular language cannot describe centre-embedding is that its
associated automaton has no memory of what has occurred previously in a string.
In order to ‘know’ that n verbs are required to match the n nominals already seen, an automaton would
need to ‘record’ that n nominals had been seen; but a DFA, having only a finite number of states, has no
mechanism to count an unbounded n.
Formally, we can use the pumping lemma to show that the language of strings of the form aⁿbⁿ
is not regular.
The pumping lemma for regular languages is used to prove that a language is not regular. It states
that for every regular language L there is a pumping length l ≥ 1 such that:
all w ∊ L with |w| ≥ l can be expressed as a concatenation of three strings, w = u1vu2, where u1, v
and u2 satisfy:
|v| ≥ 1 (i.e. v ≠ ∊)
|u1v| ≤ l
u1vⁿu2 ∊ L for all n ≥ 0 (i.e. u1u2 ∊ L, u1vu2 ∊ L, u1vvu2 ∊ L, u1vvvu2 ∊ L, etc.)
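The following sketch illustrates the argument for aⁿbⁿ, assuming some claimed pumping length l: with w = aˡbˡ, every decomposition with |u1v| ≤ l places v entirely inside the block of a's, so pumping v unbalances the string. The code is an illustration of the reasoning, not a proof.

# Sketch: for a claimed pumping length l, check every legal split of w = a^l b^l.
# Pumping v once (n = 2) already takes the string out of the language in every case.
def in_anbn(s):
    n = len(s) // 2
    return s == "a" * n + "b" * n

def pumping_counterexample(l):
    w = "a" * l + "b" * l                         # |w| >= l and w is in the language
    for i in range(l + 1):                        # u1 = w[:i]
        for j in range(i + 1, l + 1):             # v = w[i:j], so |v| >= 1 and |u1v| <= l
            u1, v, u2 = w[:i], w[i:j], w[j:]
            if in_anbn(u1 + v * 2 + u2):          # does this split survive pumping?
                return None                       # (never happens: v is all a's)
    return w                                      # every legal split fails, so w is a counterexample

print(pumping_counterexample(5))                  # aaaaabbbbb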
Regular languages are closed under intersection: if you intersect a regular language with another
regular language you get a third regular language.
Lreg1 ∩ Lreg2 = Lreg3
Regular languages are also closed under homomorphism (so we can map all nouns to a and all verbs
to b).
So if English were regular, then intersecting it with another regular language, e.g. the one denoted by
/the a (that the a)* b*/ (with a standing for any noun and b for any verb), would give another regular
language:
if Leng is regular then Leng ∩ L/the a (that the a)* b*/ = Lreg3 for some regular Lreg3.
However, the intersection of this pattern with English is exactly the set of centre-embedded skeletons
/the a (that the a)ⁿ⁻¹ bⁿ/, i.e. (under the homomorphism above) the language aⁿbⁿ, which is not regular
because it fails the pumping lemma property:
but Leng ∩ L/the a (that the a)* b*/ = Laⁿbⁿ (which is not regular).
The assumption that English is regular must therefore be incorrect.
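The regular half of this argument can be illustrated with an ordinary regular-expression engine (a sketch; the symbols a and b are the noun and verb placeholders from the argument above). The pattern happily accepts any number of relative clauses and any number of verbs; it is only the requirement that the two counts match, imposed by English, that no regular device can enforce.

import re

# /the a (that the a)* b*/ from the argument above, with 'a' a noun slot and 'b' a verb slot.
skeleton = re.compile(r"^the a( that the a)*( b)*$")

def centre_embedded(n):
    # The grammatical skeleton with n nouns and n matching verbs.
    return "the a" + " that the a" * (n - 1) + " b" * n

for n in (1, 2, 3):
    print(centre_embedded(n), "->", bool(skeleton.match(centre_embedded(n))))

# The pattern also accepts mismatched strings such as 'the a b b b'; enforcing that the
# number of verbs equals the number of nouns is exactly what a regular device cannot do.
print(bool(skeleton.match("the a b b b")))        # True for the regex, ungrammatical in English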
2. Redundancy
Grammars written using regular grammar rules alone are highly redundant: since the rules are very
simple we need a great many of them to describe the language. This makes regular grammars very
difficult to build and maintain.
3. Useful internal structures
There are instances where a regular grammar can recognize the strings of a language but in doing
so does not provide a structure that is linguistically useful to us. The left-linear or right-linear
internal structures derived by regular grammars are generally not very useful for higher-level NLP
applications.
We need informative internal structure so that we can, for example, build up good semantic
representations.
In practice, regular grammars can be useful for partial grammars (i.e. when we don’t need to know
the syntax tree for the whole sentence but rather just some part of it) and also when we don’t care
about derivational structure (i.e. when we just want a Boolean for whether a string is in a language).
For example, in information extraction, we need to recognize named entities.
The internal structure of named entities is normally unimportant to us; we just want to recognize
them when we encounter them.
For instance, using rules such as:
NP → nnsb NP
NP → np1 NP
NP → np1
where NP is a non-terminal and nnsb and np1 are terminals representing tags from a large part-of-speech
tagset, you could match a titled name such as Prof. Stephen William Hawking.
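A comparable flat, regular pattern can be written directly over words rather than tags. The sketch below is purely illustrative: the title list and the capitalised-word pattern are assumptions for the example, not part of any standard tagset or NER system.

import re

# Illustrative only: a flat, regular pattern for titled names, written over words instead
# of part-of-speech tags. The title list and the capitalised-word pattern are assumptions.
TITLED_NAME = re.compile(r"\b(?:Prof|Dr|Mr|Mrs|Ms)\.?(?: [A-Z][a-z]+)+\b")

text = "The lecture by Prof. Stephen William Hawking drew a large audience."
print(TITLED_NAME.findall(text))   # ['Prof. Stephen William Hawking']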
For every natural language that exists, can we find a context-free grammar to generate it?
There is some evidence that natural languages can contain cross-serial dependencies. A small
number of languages exhibit strings of the form shown below.
In the Zurich dialect of Swiss German, constructions like the following are found:
mer d’chind em Hans es huus haend wele laa hälfe aastriiche.
(word for word: we the children Hans the house have wanted to let help paint.)
(translation: we have wanted to let the children help Hans paint the house.)
Such expressions may not be derivable by a context-free grammar.
Where do natural languages fit in the Chomsky hierarchy?
If we are to use formal grammars to represent natural language, it is useful to know where natural
languages appear in the Chomsky hierarchy. It might turn out that the set of all attested natural
languages is as depicted in the accompanying figure.
The overlap with the context-sensitive languages accounts for those languages that have
cross-serial dependencies.
A natural language is an infinite set of sentences constructed from a finite alphabet of symbols,
and there is no defined upper limit on the number of words in a sentence.
When natural languages are analysed into their component parts, they are usually broken down
into four parts: syntax, semantics, morphology and phonology.
Natural languages are believed to be at least context-free. However, Dutch and Swiss German
contain grammatical constructions with cross-serial dependencies that cannot be captured by a
context-free grammar, which makes these languages (at least mildly) context-sensitive.
Extensions to the Chomsky hierarchy that are relevant to NLP
There are two extensions to the traditional Chomsky hierarchy that have proved useful in linguistics
and cognitive science:
Mildly context-sensitive languages – CFGs are not adequate (weakly or strongly) to characterize
some aspects of language structure. To gain extra power beyond CFG, the grammatical formalism
Tree Adjoining Grammar (TAG) was proposed as an approximate characterization of mildly
context-sensitive grammars; it extends CFG with an extra operation of tree composition, called
‘adjoining’.
Another formalism, Minimalist Grammars (MG), describes an even larger class of formal
languages.
Sub-regular languages
A sub-regular language is a set of strings that can be described without employing the full power of
finite state automata. Many aspects of human language are manifestly sub-regular, such as some
‘strictly local’ dependencies.
Example – identifying recurring substring patterns within words is one common application.
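For instance, a ‘strictly local’ constraint of width two (a bigram constraint) can be checked by sliding a window over the word, with no automaton state beyond the previous symbol; the banned bigram below is an invented example.

# Sketch of a strictly 2-local constraint: a word is well formed iff none of its adjacent
# symbol pairs (bigrams) is banned. The banned pair is an invented example.
BANNED_BIGRAMS = {("b", "b")}      # e.g. 'no two b's may be adjacent'

def strictly_2_local_ok(word):
    return all((x, y) not in BANNED_BIGRAMS for x, y in zip(word, word[1:]))

print(strictly_2_local_ok("ababab"))   # True
print(strictly_2_local_ok("abba"))     # False

Constraints of this kind can be evaluated in a single left-to-right pass over a fixed-width window, which is strictly less machinery than a general finite-state automaton requires.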