1. Introduction to Theory of Computation
• Automata theory (also known as Theory Of
Computation) is a theoretical branch of Computer
Science and Mathematics, which mainly deals with the
logic of computation with respect to simple machines,
referred to as automata.
• Automata theory enables scientists to understand how machines compute functions and solve problems. The main motivation behind developing automata theory was to develop methods to describe and analyse the dynamic behavior of discrete systems.
• The word “automata” originates from the word “automaton”, which is closely related to “automation”.
2. • Now, let’s understand the basic terminologies,
which are important and frequently used in
Theory of Computation.
3. • String: A string is a finite sequence of symbols from some alphabet.
• A string is generally denoted as w, and the length of a string is denoted as |w|.
5. • Language: A language is a set of strings chosen from some Σ*, or in other words, ‘a language is a subset of Σ*’.
• A language formed over Σ can be finite or infinite.
6. • Powers of Σ:
Say Σ = {a, b}, then
Σ^0 = set of all strings over Σ of length 0 = {ε}
Σ^1 = set of all strings over Σ of length 1 = {a, b}
Σ^2 = set of all strings over Σ of length 2 = {aa, ab, ba, bb}
i.e. |Σ^2| = 4 and, similarly, |Σ^3| = 8
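The powers of an alphabet can be enumerated directly; a small sketch (the function name is ours, not from the slides):

```python
from itertools import product

def sigma_power(sigma, k):
    """All strings over alphabet `sigma` of length exactly k (the set Σ^k)."""
    return {"".join(p) for p in product(sigma, repeat=k)}

sigma = {"a", "b"}
print(sigma_power(sigma, 0))       # {''}  -- the empty string ε
print(sigma_power(sigma, 2))       # {'aa', 'ab', 'ba', 'bb'}
print(len(sigma_power(sigma, 3)))  # 8
```

This matches the slide: |Σ^2| = 4 and |Σ^3| = 8.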
7. Finite Automata
• Finite automata are used to recognize patterns.
• A finite automaton takes a string of symbols as input and changes its state accordingly; each input symbol read triggers a transition.
• At the time of a transition, the automaton can either move to another state or stay in the same state.
• A finite automaton gives one of two verdicts on an input: accept or reject. If, after the whole input string has been processed, the automaton is in a final state, the string is accepted; otherwise it is rejected.
8. Formal Definition of FA
• A finite automaton is a 5-tuple (Q, ∑, δ, q0, F), where:
Q is a finite set of states, ∑ is a finite input alphabet, δ: Q × ∑ → Q is the transition function, q0 ∈ Q is the initial state, and F ⊆ Q is the set of final states.
9. Finite Automata Model:
• Finite automata can be represented by input
tape and finite control.
• Input tape: It is a linear tape having some number of cells. One input symbol is placed in each cell.
Finite control: The finite control decides the
next state on receiving particular input from
input tape.
The tape reader reads the cells one by one from
left to right, and at a time only one input symbol
is read.
11. Types of Automata
• There are two types of finite automata:
1. DFA(deterministic finite automata)
2. NFA(non-deterministic finite automata)
13. • 1. DFA
• DFA refers to deterministic finite automata.
Deterministic refers to the uniqueness of the
computation.
• In the DFA, the machine goes to one state only
for a particular input character.
• DFA does not accept the null move.
• 2. NFA
• NFA stands for non-deterministic finite automata. For a particular input symbol, an NFA can move to any number of states (including none).
• It can make null (ε) moves.
14. • Some important points about DFA and NFA:
• Every DFA is an NFA, but not every NFA is a DFA.
• There can be multiple final states in both NFA and DFA.
• DFA is used in lexical analysis in compilers.
• NFA is more of a theoretical concept.
15. Transition Diagram
• A transition diagram or state transition diagram is
a directed graph which can be constructed as
follows:
• There is a node for each state in Q, which is
represented by the circle.
• There is a directed edge from node q to node p
labeled a if δ(q, a) = p.
• The start state is marked by an arrow with no source.
• Accepting states or final states are indicated by a double circle.
17. • DFA with ∑ = {0, 1} accepts all strings starting
with 1.
18. • The finite automata can be represented using
a transition graph.
• In the above diagram, the machine is initially in the start state q0; on receiving input 1, it changes its state to q1.
• From q0, on receiving 0, the machine changes its state to q2, which is a dead state.
• From q1, on receiving input 0 or 1, the machine stays in q1, which is the final state.
• The strings accepted are 10, 11, 110, 101, 111, ..., i.e., all strings that start with 1.
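The DFA described above can be simulated directly; the state names q0/q1/q2 follow the description (a minimal sketch, not part of the slides):

```python
# DFA over {0, 1} accepting exactly the strings that start with 1:
# q0 = start, q1 = accepting loop, q2 = dead state.
DELTA = {
    ("q0", "1"): "q1",  # first symbol 1 -> go to the final state
    ("q0", "0"): "q2",  # first symbol 0 -> dead state
    ("q1", "0"): "q1", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
FINAL = {"q1"}

def accepts(w):
    state = "q0"
    for ch in w:
        state = DELTA[(state, ch)]
    return state in FINAL

for w in ["10", "11", "110", "101", "0", "010", ""]:
    print(w, accepts(w))
```

The loop over examples reproduces the acceptance behaviour stated in the slide: all listed strings beginning with 1 are accepted, everything else is rejected.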
19. • NFA with ∑ = {0, 1} accepting all strings starting with 1.
The NFA can be represented using a transition graph.
In the above diagram, the machine is initially in the start state q0; on receiving input 1, it changes its state to q1.
From q1, on receiving input 0 or 1, the machine stays in q1.
The strings accepted are 10, 11, 110, 101, 111, ..., i.e., all strings that start with 1.
20. DFA (Deterministic finite automata)
• DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. A finite automaton is called deterministic if, for each state and input symbol, the machine has exactly one move, reading the input string one symbol at a time.
• In DFA, there is only one path for specific input
from the current state to the next state.
• DFA does not accept the null move, i.e., the DFA
cannot change state without any input character.
• DFA can contain multiple final states. It is used in
Lexical Analysis in Compiler.
21. Formal Definition of DFA
• A DFA is a collection of 5-tuples same as we
described in the definition of FA.
23. Graphical Representation of DFA
• A DFA can be represented by digraphs called
state diagram. In which:
1. The state is represented by vertices.
2. Arcs labeled with input characters show the transitions.
3. The initial state is marked with an arrow.
4. The final state is denoted by a double circle.
24. • Design an FA with ∑ = {0, 1} that accepts the set of all strings containing three consecutive 0's.
• Strings in this language include 000, 0001, 1000, 10001, ..., each containing the substring 000.
• The transition graph is as follows:
25. • Design a FA with ∑ = {0, 1} accepts the strings
with an even number of 0's followed by
single 1.
26. • Arden’s Theorem and Challenging Applications
Having gained the knowledge of how to draw a basic finite state machine (DFA, NFA or ε-NFA), we now turn to deriving a regular expression from a given state machine.
27.
28. Arden’s Theorem states that, if P and Q are two regular expressions over Σ, and if P does not contain ε, then the equation in R given by R = Q + RP has a unique solution: R = QP*.
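A minimal worked instance of the theorem (our own illustration, simpler than the three-state example that follows):

```latex
% Arden's theorem: if P does not contain \varepsilon, then
%   R = Q + RP  has the unique solution  R = QP^{*}.
% Minimal instance: a start/final state with a self-loop on input a gives
A = \varepsilon + A\,a
\quad\Longrightarrow\quad
A = \varepsilon\,a^{*} = a^{*}
% (here Q = \varepsilon and P = a, so R = QP^{*} = a^{*})
```

This matches intuition: a single state with an a-loop accepts exactly a*.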
29. [figure: the automaton to be solved, with states A (start), B and C]
30. • Let’s solve the automaton above with the help of Arden’s Theorem.
• We see that, on state C, there is a transition coming from B when a is the input:
• C = Ba ……(1)
• On state B, there is a self loop on input b, a transition from A when the input is a, and a transition from state C when the input is b:
• B = Bb + Cb + Aa ……(2)
• On state A, there is an ε transition (A being the start state, ε must be included), a self loop on input a, and a transition from B when the input is b:
• A = ε + Aa + Bb ……(3)
36. Regular Grammar
• A grammar is regular if it has rules of the form
• A -> a or A -> aB or A -> ɛ, where ɛ is a special symbol denoting the empty (null) string.
Regular Languages:
• A language is regular if it can be expressed in terms of a regular expression.
37. 2018
What are the closure properties of regular languages? (2.5 marks)
47. Designing Finite Automata from
Regular Expression
• Even number of a’s: The regular expression for an even number of a’s is (b|ab*ab*)*. We can construct a finite automaton as shown in Figure 1.
48. • The above automaton will accept all strings which have an even number of a’s.
• For zero a’s, it stays in q0, which is a final state.
• For one ‘a’, it goes from q0 to q1, and the string is not accepted.
• For two a’s at any positions, it goes from q0 to q1 on the 1st ‘a’ and from q1 back to q0 on the second ‘a’.
• So, it accepts all strings with an even number of a’s.
49. • String with ‘ab’ as substring :
• The regular expression for strings with ‘ab’ as
substring is (a|b)*ab(a|b)*.
• We can construct finite automata as shown in
Figure 2.
50. • The above automaton will accept all strings which have ‘ab’ as a substring.
• The automaton will remain in the initial state q0 on b’s.
• It will move to q1 after reading ‘a’ and remain in the same state for all subsequent a’s.
• Then it will move to q2 if ‘b’ is read.
• That means, the string has read ‘ab’ as
substring if it reaches q2.
51. • String with count of ‘a’ divisible by 3:
• The language is {a^(3n) | n >= 0}, described by the regular expression (aaa)*.
• We can construct an automaton as shown in Figure 3.
52. • The above automaton will accept all strings of the form a^(3n).
• The automaton will remain in the initial state q0 for ε, which is accepted.
• For the string ‘aaa’, it will move from q0 to q1, then q1 to q2, and then q2 to q0. For every set of three a’s, it comes back to q0, hence accepted. Otherwise, it will be in q1 or q2, hence rejected.
• Note: If we want to design a finite automaton for the number of a’s equal to 3n + 1, the same automaton can be used with the final state q1 instead of q0.
If we want to design a finite automaton for the language {a^(kn) | n >= 0}, k states are required. We have used k = 3 in our example.
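The mod-k idea in the note above generalizes naturally: the state is just the count of the tracked symbol modulo k. A sketch (the factory function and its parameters are our own illustration):

```python
def make_count_mod_dfa(k, symbol="a", accept_residue=0):
    """DFA over {a, b} whose state is (number of `symbol` read so far) mod k.

    States are the residues 0..k-1, with 0 initial. Setting
    accept_residue=1 gives the 3n+1 variant mentioned in the note.
    """
    def accepts(w):
        state = 0
        for ch in w:
            if ch == symbol:
                state = (state + 1) % k  # advance the residue on each `symbol`
        return state == accept_residue
    return accepts

div3 = make_count_mod_dfa(3)
print(div3(""))       # True  (zero a's, and 0 is divisible by 3)
print(div3("aabab"))  # True  (three a's)
print(div3("aa"))     # False
```

Using k = 3 and accept_residue = 0 reproduces the automaton of this slide.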
53. • Binary numbers divisible by 3 :
• The regular expression for binary numbers
which are divisible by three is (0|1(01*0)*1)*.
• The examples of binary number divisible by 3
are 0, 011, 110, 1001, 1100, 1111, 10010 etc.
• The DFA corresponding to binary number
divisible by 3 can be shown in Figure 4.
54. • The above automaton will accept all binary numbers divisible by 3.
• For 1001, the automaton will go from q0 to q1, then q1 to q2, then q2 to q1 and finally q1 to q0, hence accepted.
• For 0111, the automaton will go from q0 to q0, then q0 to q1, then q1 to q0 and finally q0 to q1, hence rejected.
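The standard divisibility automaton tracks the value read so far modulo 3: reading bit b from residue r moves to (2r + b) mod 3. A sketch of that DFA (the arithmetic encoding of the transitions is an assumption consistent with the traces above):

```python
def divisible_by_3(bits):
    """DFA with states q0/q1/q2 = value mod 3. Reading bit b from
    residue r moves to (2*r + b) mod 3; accept in q0 (residue 0)."""
    state = 0  # q0
    for ch in bits:
        state = (2 * state + int(ch)) % 3
    return state == 0

for w in ["0", "011", "110", "1001", "0111"]:
    print(w, divisible_by_3(w))
```

For 1001 the state sequence is q0, q1, q2, q1, q0 (accepted), matching the corrected trace in the slide; 0111 ends in q1 and is rejected.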
55. • String with regular expression (111 + 11111)*: The strings accepted by this regular expression have 3, 5, 6 (111 twice), 8 (11111 once and 111 once), 9 (111 thrice), 10 (11111 twice) and all larger counts of 1’s.
• The DFA corresponding to the given regular expression is given in Figure 5.
58. Union process in DFA
• Designing a DFA for the set of strings over {a, b} such that the strings of the language start and end with different symbols.
• Two languages are needed:
L1 = set of strings that start with a and end with b
L2 = set of strings that start with b and end with a
60. • This DFA accepts all strings starting with a and ending with b.
• Here, state A is the initial state and state C is the final state.
62. • This DFA accepts all strings starting with b and ending with a.
• Here, state A is the initial state and state C is the final state.
• Now, taking the union of the languages L1 and L2 gives the language of strings that start and end with different symbols.
64. • Thus, as we see, L1 and L2 have been combined through the union process, and the final DFA accepts all strings starting and ending with different symbols.
Note: From the above example we can also infer that regular languages are closed under union (i.e. the union of two regular languages is also regular).
65. Concatenation process in DFA
• Designing a DFA for the set of strings over {a, b} such that the strings of the language start with “a” and end with “b”.
• Two languages are needed: L1 = strings that start with “a”, and L2 = strings that end with “b”.
67. • This DFA accepts all strings which start with “a”.
• Here, state C is the final state and B is the dead state; it is called so because after reading any symbol in this state, the machine can never reach a final state.
70. Minimization of DFA
• Minimization of DFA means reducing the number of states of a given FA.
• Thus, after minimizing, we get an FSM (finite state machine) with no redundant states.
• We have to follow various steps to minimize the DFA. These are as follows:
71. • Step 1: Remove all the states that are unreachable from the initial state by any sequence of transitions of the DFA.
• Step 2: Draw the transition table for all pairs of states.
• Step 3: Now split the transition table into two tables, T1 and T2. T1 contains all final states, and T2 contains the non-final states.
• Step 4: Find similar rows in T1, i.e., states p and q such that δ(p, a) = δ(q, a) and δ(p, b) = δ(q, b).
72. • That means, find two states which have the same transitions on a and on b, and remove one of them.
• Step 5: Repeat step 4 until no similar rows remain in the transition table T1.
• Step 6: Repeat steps 4 and 5 for table T2 as well.
• Step 7: Now combine the reduced T1 and T2 tables. The combined transition table is the transition table of the minimized DFA.
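The splitting of T1 and T2 in these steps is essentially partition refinement; a compact sketch (the function name and the toy DFA are our own illustration):

```python
def minimize(states, alphabet, delta, start, finals):
    """Merge indistinguishable DFA states by partition refinement:
    start from {final, non-final} (steps 2-3 above) and split any block
    whose members disagree on which block their transitions land in."""
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]
    changed = True
    while changed:
        changed = False
        def block_of(s):
            return next(i for i, b in enumerate(partition) if s in b)
        new_partition = []
        for block in partition:
            groups = {}
            for s in block:
                sig = tuple(block_of(delta[(s, a)]) for a in alphabet)
                groups.setdefault(sig, set()).add(s)  # same signature -> same block
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = new_partition
    return partition

# toy DFA in which q1 and q2 are equivalent (both accepting sinks)
delta = {("q0", "a"): "q1", ("q0", "b"): "q2",
         ("q1", "a"): "q1", ("q1", "b"): "q1",
         ("q2", "a"): "q2", ("q2", "b"): "q2"}
blocks = minimize({"q0", "q1", "q2"}, "ab", delta, "q0", {"q1", "q2"})
print(sorted(sorted(b) for b in blocks))  # [['q0'], ['q1', 'q2']]
```

Each block of the final partition becomes one state of the minimized DFA.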
89. Regular Expression
• The language accepted by finite automata can be
easily described by simple expressions called
Regular Expressions.
• It is the most effective way to represent any
language.
• The languages accepted by some regular
expression are referred to as Regular languages.
• A regular expression can also be described as a pattern that defines a set of strings.
• Regular expressions are used to match character combinations in strings.
• String-searching algorithms use these patterns to locate matches in a string.
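As a quick illustration of regular expressions in practice, Python's re module can test membership of a string in a regular language (the particular pattern, 1(0|1)*0 for binary strings that start with 1 and end with 0, is our example, not from the slides):

```python
import re

# ^ and $ anchor the pattern so the WHOLE string must match.
pattern = re.compile(r"^1(0|1)*0$")
for w in ["10", "1010", "110", "01", "1"]:
    print(w, bool(pattern.match(w)))
```

Practical regex engines are built on the same automata theory described in these slides.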
90. Operations on Regular Languages
• The various operations on regular languages are:
• Union: If L and M are two regular languages, then their union L ∪ M is also regular.
91. • Intersection: If L and M are two regular languages, then their intersection L ∩ M is also regular.
• Kleene closure: If L is a regular language, then its Kleene closure L* is also a regular language.
92. • Write the regular expression for the language accepting all strings containing any number of a's and b's.
• The regular expression will be: R = (a + b)*
• This gives the set L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, i.e., any combination of a and b.
• (a + b)* covers any combination of a's and b's, including the null string.
93. Examples of Regular Expressions
• Write the regular expression for the language accepting all strings which start with 1 and end with 0, over ∑ = {0, 1}.
• In the regular expression, the first symbol should be 1 and the last symbol should be 0.
• The r.e. is: R = 1 (0 + 1)* 0
94. • Write the regular expression for the language of strings starting and ending with a and having any combination of b's in between.
• The regular expression will be: R = a b* a
• Write the regular expression for the language of strings starting with a but not having consecutive b's.
95. • Write the regular expression for the language L over ∑ = {0, 1} such that no string contains the substring 01.
• The r.e. is: R = 1* 0* (once a 0 appears, no 1 may follow)
96. • Write the regular expression for the language over ∑ = {0} of strings of even length.
• The r.e. is: R = (00)*
97. Conversion of RE to FA
• To convert the RE to FA, we are going to use a
method called the subset method.
• This method is used to obtain FA from the
given regular expression.
• This method is given below:
• Step 1: Design a transition diagram for the given regular expression, using an NFA with ε moves.
• Step 2: Convert this NFA with ε moves to an NFA without ε moves.
• Step 3: Convert the obtained NFA to an equivalent DFA.
98. • Design a FA from given regular expression
10 + (0 + 11)0* 1.
• Solution: First we will construct the transition
diagram for a given regular expression.
100. Kleene’s Theorem
• A language is said to be regular if it can be represented by a finite automaton, or if a regular expression can be written for it.
• This definition leads us to the general statement that for every regular expression describing a language, a finite automaton can be generated.
• For simple expressions like (a+b), ab, (a+b)*, it is fairly easy to construct the finite automaton by intuition, as shown below.
• The problem arises when we are provided with a longer regular expression.
• This brings about the need for a systematic approach to FA generation, which was put forward by Kleene in Kleene’s Theorem I.
102. • To understand Kleene’s Theorem I, let us recall the basic definition of a regular expression, where we observe that ∅, ε and a single input symbol “a” are regular expressions, and that larger regular expressions are built from these by the operations of union, concatenation and closure:
103. • We can further use this definition in
association with Null Transitions to give rise to
a FA by the combination of two or more
smaller Finite Automata (each corresponding
to a Regular Expression).
• Let S accept L = {a} and T accept L = {b}, then R
can be represented as a combination of S and
T using the provided operations as:
105. • We observe that:
• In the case of the union operation, we can have a new start state, from which null transitions proceed to the start states of both finite state machines.
• The final states of both finite automata are converted to intermediate states.
• The final state is unified into one, which can be reached by null transitions.
107. In the case of the concatenation operation, we can have the same start state as that of S; the only change occurs in the final state of S, which is converted to an intermediate state followed by a null transition.
The null transition is followed by the start state of T, and the final state of T is used as the final state of R.
110. • Deterministic FA and Non-Deterministic FA:
• In a deterministic FA, there is exactly one move from every state on every input symbol, but in a non-deterministic FA there can be zero, one or more moves from a state on an input symbol.
• Note:
• The classes of languages accepted by NFAs and DFAs are the same.
• The power of NFAs and DFAs is the same.
• The number of states in an NFA is less than or equal to the number of states in the equivalent DFA.
• For an NFA with n states, in the worst case, the maximum number of states possible in the equivalent DFA is 2^n.
• Every NFA can be converted to a corresponding DFA.
111. Mealy and Moore Machines in TOC
• Moore Machines: Moore machines are finite state machines with output values, where the output depends only on the present state.
• A Moore machine can be defined as (Q, q0, ∑, O, δ, λ) where:
• Q is a finite set of states.
• q0 is the initial state.
• ∑ is the input alphabet.
• O is the output alphabet.
• δ is the transition function, which maps Q × ∑ → Q.
• λ is the output function, which maps Q → O.
112. [Figure 1: Moore machine]
113. • In the Moore machine shown in Figure 1, the output is written with each state, separated by /.
• The length of the output of a Moore machine is greater than the input length by 1, because the initial state's output is emitted before any input is read.
• Input: 11
• Transitions: δ(q0, 11) => δ(q2, 1) => q2
• Output: 000 (0 for q0, 0 for q2 and again 0 for q2)
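A Moore machine is easy to simulate; the concrete machine below is a hypothetical example (the figure's machine is not reproduced here): it outputs 1 exactly when the number of 1's read so far is even.

```python
# Moore machine: output depends ONLY on the state (λ: Q -> O),
# and the output string has length |w| + 1.
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd",  ("odd", "1"): "even"}
lam = {"even": "1", "odd": "0"}  # output function λ

def moore_run(w, start="even"):
    state = start
    out = [lam[state]]          # the initial state's output comes first
    for ch in w:
        state = delta[(state, ch)]
        out.append(lam[state])
    return "".join(out)

print(moore_run("11"))  # "101": 1 at start, 0 after the first 1, 1 after the second
```

Note that the output is one symbol longer than the input, as stated in the slide.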
115. Mealy Machines:
• Mealy machines are also finite state machines with output values, but the output depends on the present state and the current input symbol.
• A Mealy machine can be defined as (Q, q0, ∑, O, δ, λ’) where:
• Q is a finite set of states.
• q0 is the initial state.
• ∑ is the input alphabet.
• O is the output alphabet.
• δ is the transition function, which maps Q × ∑ → Q.
• λ’ is the output function, which maps Q × ∑ → O.
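For contrast with the Moore machine, here is a hypothetical Mealy machine (our example, not from the slides): it outputs 1 on a transition exactly when the symbol just read equals the previous symbol.

```python
# Mealy machine: output depends on state AND input (λ': Q × Σ -> O),
# so the output string has length |w| (no initial output).
delta = {("s", "0"): "z", ("s", "1"): "o",
         ("z", "0"): "z", ("z", "1"): "o",
         ("o", "0"): "z", ("o", "1"): "o"}
lam = {("s", "0"): "0", ("s", "1"): "0",   # no previous symbol yet
       ("z", "0"): "1", ("z", "1"): "0",   # previous symbol was 0
       ("o", "0"): "0", ("o", "1"): "1"}   # previous symbol was 1

def mealy_run(w, start="s"):
    state, out = start, []
    for ch in w:
        out.append(lam[(state, ch)])       # output emitted on the edge
        state = delta[(state, ch)]
    return "".join(out)

print(mealy_run("1101"))  # "0100"
```

Unlike the Moore machine, the output here has exactly the same length as the input.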
119. Conversion from Mealy machine to Moore Machine
• In a Moore machine, the output is associated with every state; in a Mealy machine, the output is given along the edge with the input symbol.
• To convert a Moore machine to a Mealy machine, state output symbols are distributed onto the input symbol paths.
• But while converting a Mealy machine to a Moore machine, we create a separate state for every new output symbol, and the incoming and outgoing edges are distributed accordingly.
120. • The following steps are used for converting a Mealy machine to a Moore machine:
Step 1: For each state Qi, count the number of different outputs associated with Qi in the transition table of the Mealy machine.
Step 2: Keep state Qi as it is if all outputs of Qi are the same. Break Qi into n states Qi1, Qi2, ..., Qin if it has n distinct outputs.
Step 3: If the output of the initial state is 0, insert a new initial state at the beginning which gives output 1.
121. Example 1:
• Convert the following Mealy machine into
equivalent Moore machine.
123. • For state q1, there is only one incident edge, with output 0. So we don't need to split this state in the Moore machine.
• For state q2, there are two incident edges, with outputs 0 and 1. So we will split this state into two states: q20 (state with output 0) and q21 (state with output 1).
• For state q3, there are two incident edges, with outputs 0 and 1. So we will split this state into two states: q30 (state with output 0) and q31 (state with output 1).
• For state q4, there is only one incident edge, with output 0. So we don't need to split this state in the Moore machine.
126. Conversion from Moore machine to Mealy Machine
• In a Moore machine, the output is associated with every state; in a Mealy machine, the output is given along the edge with the input symbol.
• The equivalence of a Moore machine and a Mealy machine means both machines generate the same output string for the same input string.
• Strictly, the outputs cannot be identical, because the output of the Moore machine is one symbol longer than that of the Mealy machine for the same input; equivalence is defined ignoring the Moore machine's initial output.
127. • To convert a Moore machine to a Mealy machine, the state output symbols are distributed onto the input symbol paths.
• We are going to use the following method to convert the Moore machine to a Mealy machine.
128. Method for conversion of Moore machine to Mealy machine
• Let M = (Q, ∑, δ, λ, q0) be a Moore machine. The equivalent Mealy machine can be represented by M' = (Q, ∑, δ, λ', q0).
• The output function λ' can be obtained as: λ'(q, a) = λ(δ(q, a)) for every state q and input symbol a.
135. Conversion from NFA to DFA
• An NFA can have zero, one or more than one
move from a given state on a given input
symbol.
• An NFA can also have NULL moves (moves
without input symbol).
• On the other hand, DFA has one and only one
move from a given state on a given input
symbol.
136. • Conversion from NFA to DFA
Suppose there is an NFA N = < Q, ∑, q0, δ, F > which recognizes a language L.
• Then the DFA D = < Q', ∑, q0', δ', F' > can be constructed for language L as follows:
Step 1: Initially Q' = ɸ.
Step 2: Add {q0} to Q'.
Step 3: For each state in Q', find the set of states reachable for each input symbol using the transition function of the NFA. If this set of states is not in Q', add it to Q'.
Step 4: The final states of the DFA are all the sets that contain a state of F (the final states of the NFA).
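The steps above are the subset construction; a sketch for NFAs without ε-moves (the example NFA, for strings over {0, 1} containing 01, is our own):

```python
def nfa_to_dfa(nfa_delta, alphabet, start, nfa_finals):
    """Subset construction following steps 1-4 above.
    nfa_delta maps (state, symbol) -> set of states."""
    start_set = frozenset([start])
    dfa_delta, worklist, seen = {}, [start_set], {start_set}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            # union of NFA moves from every state in S on symbol a
            T = frozenset(t for s in S for t in nfa_delta.get((s, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:          # step 3: add newly found state sets
                seen.add(T)
                worklist.append(T)
    finals = {S for S in seen if S & set(nfa_finals)}  # step 4
    return dfa_delta, start_set, finals, seen

# NFA accepting strings over {0,1} that contain the substring 01
nfa = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
       ("q1", "1"): {"q2"},
       ("q2", "0"): {"q2"}, ("q2", "1"): {"q2"}}
delta, s0, finals, states = nfa_to_dfa(nfa, "01", "q0", {"q2"})
print(len(states))  # number of reachable DFA states
```

Only the reachable subsets are generated, which is why the resulting DFA usually has far fewer than the worst-case 2^n states.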
145. DFA machines accepting odd number
of 0’s or/and even number of 1’s
• Problem – Construct a DFA machine over input
alphabet = {0, 1}, that accepts:
• Odd number of 0’s or even number of 1’s
• Odd number of 0’s and even number of 1’s
• Either odd number of 0’s or even number of 1’s
but not the both together
• Solution – Let us first design two separate machines for the two conditions:
• Accepting only odd number of 0’s
• Accepting only even number of 1’s
• Then, merge these two and find the required final
states.
146. DFA of a string with at least two 0’s
and at least two 1’s
• Problem – Draw a deterministic finite automaton (DFA) for strings with at least two 0’s and at least two 1’s.
• The first thing that comes to mind after reading this question is to count the number of 1’s and 0’s, and accept the string if both counts are at least 2.
• But we do not have any concept of memory in a DFA, so we cannot use this method directly; the states themselves must track the counts.
149. Complementation process in DFA
• Suppose we have a DFA defined by M = (Q, ∑, δ, q0, F) that accepts the language L1. Then the DFA which accepts the language L2, where L2 = L1' = ∑* − L1, is defined as below:
The complement of a DFA can be obtained by making the non-final states final and vice versa. The language accepted by the complemented DFA, L2, is the complement of the language L1.
155. • Note: Regular languages are closed under complement (i.e. the complement of a regular language is also regular).
156. 2011
What are the types of grammar according to the Chomsky classification? (2 marks)
157. Chomsky Hierarchy in Theory of Computation
• According to the Chomsky hierarchy, grammars are divided into 4 types: Type 0 (unrestricted), Type 1 (context-sensitive), Type 2 (context-free) and Type 3 (regular).
159. Type 0: Unrestricted Grammar
• Type-0 grammars include all formal grammars. Type-0 grammar languages are recognized by Turing machines.
• These languages are also known as the recursively enumerable languages.
• Grammar productions are of the form:
160. α → β, where α and β are strings over V ∪ T and α contains at least one variable.
161. For example,
• Sab –> ba
• A –> S
• Here, the variables are S, A and the terminals are a, b.
162. Type 1: Context Sensitive Grammar
• Type-1 grammars generate the context-sensitive languages.
• The languages generated by these grammars are recognized by linear bounded automata.
In Type 1:
I. First of all, a Type-1 grammar should be Type 0.
II. Grammar productions are of the form:
163. α A β → α γ β, where A is a variable and γ is a non-empty string over V ∪ T (equivalently, α → β with |α| ≤ |β|).
164. Type 2: Context Free Grammar
• Type-2 grammars generate the context-free languages.
• The languages generated by these grammars are recognized by pushdown automata.
In Type 2:
1. First of all, it should be Type 1.
2. The left-hand side of a production can have only one variable, with no surrounding context.
166. Type 3: Regular Grammar
• Type-3 grammars generate the regular languages. These are exactly the languages that can be accepted by a finite state automaton.
• Type 3 is the most restricted form of grammar. Type-3 productions must be in the given form only:
• V –> VT* / T* (left-linear)
(or)
V –> T*V / T* (right-linear)
168. Relationship between grammar and
language in Theory of Computation
• A grammar is a set of production rules which are used to generate the strings of a language.
• We will discuss how to find the language generated by a grammar, and vice versa.
169. Language generated by a grammar
• Given a grammar G, its corresponding
language L(G) represents the set of all strings
generated from G.
• Consider the following grammar
170. S → aSa | bSb | a | b
171. • Using S -> a and S -> b, the strings a and b can be generated.
• Similarly, using S => aSa => aba, the string aba can be generated.
• Other strings which can be generated from the grammar are: a, b, aba, bab, aaa, bbb, ababa, …
• L(G) is the set of all odd-length palindromes over {a, b}.
172. CFL Closure Property
• Context-free languages are closed under −
1. Union
2. Concatenation
3. Kleene Star operation
177. • Thus, if L is a CFL, there exists an integer n such that for every z ∈ L with |z| ≥ n, there exist u, v, w, x, y ∈ Σ*, such that z = uvwxy, and
(1) |vwx| ≤ n
(2) |vx| ≥ 1
(3) for all i ≥ 0: u v^i w x^i y ∈ L
178. For example, {0^n 1^n} is a CFL, as any sufficiently long string in it can be pumped at two places: one in the block of 0's and the other in the block of 1's.
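Conditions (1)-(3) can be demonstrated concretely for {0^n 1^n}; the particular decomposition below is our own choice (with n = 4, z = 00001111, u = 000, v = 0, w = ε, x = 1, y = 111):

```python
# Pumping the CFL {0^n 1^n}: with this decomposition, u v^i w x^i y
# stays in the language for every i >= 0.
u, v, w, x, y = "000", "0", "", "1", "111"

def in_lang(s):
    """Membership test for {0^n 1^n | n >= 0}."""
    k = s.count("0")
    return s == "0" * k + "1" * (len(s) - k) and s.count("1") == k

for i in range(5):
    pumped = u + v * i + w + x * i + y
    print(i, pumped, in_lang(pumped))
```

Because v sits inside the 0-block and x inside the 1-block, pumping adds the same number of 0's and 1's, so every pumped string stays in the language.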
179. • Let us prove that L012 = {0^n 1^n 2^n | n ≥ 0} is not context-free.
Assume L012 is context-free; then the Pumping Lemma applies. Now let z ∈ L012 with |z| ≥ n.
• By the Pumping Lemma, there exist u, v, w, x, y such that (1)–(3) hold.
We show that for all choices of u, v, w, x, y, conditions (1)–(3) cannot all hold.
180. • If (1) and (2) hold, then z = 0^n 1^n 2^n = uvwxy with |vwx| ≤ n and |vx| ≥ 1.
(1) tells us that vwx cannot contain both 0's and 2's, since the 0-block and the 2-block are more than n symbols apart.
• Thus, either vwx has no 0's, or vwx has no 2's, and we have two cases to consider.
Suppose vwx has no 0's.
• By (2), vx contains a 1 or a 2.
• Thus uwy has n 0's, and uwy either has fewer than n 1's or fewer than n 2's.
181. • But (3) tells us that uwy = u v^0 w x^0 y ∈ L012.
So uwy must have an equal number of 0's, 1's and 2's, which gives us a contradiction.
• The case where vwx has no 2's is similar and also gives a contradiction.
• Thus L012 is not context-free.
182. Context-Free Grammar (CFG)
• CFG stands for context-free grammar.
• It is a formal grammar which is used to generate all possible strings of a given formal language.
• A context-free grammar G can be defined by four tuples as: G = (V, T, P, S)
183. • Where,
• G is the grammar, which consists of a set of production rules used to generate the strings of a language.
• T is the finite set of terminal symbols, denoted by lowercase letters.
• V is the finite set of non-terminal symbols, denoted by capital letters.
• P is the set of production rules, used for replacing non-terminal symbols (on the left side of a production) in a string with other terminal or non-terminal symbols (on the right side of the production).
184. • S is the start symbol, which is used to derive the strings.
• We can derive a string by repeatedly replacing a non-terminal with the right-hand side of a production, until all non-terminals have been replaced by terminal symbols.
185. • Construct the CFG for the language having any number of a's over the set ∑ = {a}.
• The production rules are: P = {S → aS, S → ε}
186. • Now if we want to derive the string "aaaaaa", we can start with the start symbol:
S => aS => aaS => aaaS => aaaaS => aaaaaS => aaaaaaS => aaaaaa
The r.e. a* generates the set of strings {ε, a, aa, aaa, .....}.
We can derive the null string because S is the start symbol and rule 2 gives S → ε.
187. • Construct a CFG for the language L = {wcwR | where w ∈ (a, b)*}.
• Strings that can be generated for this language include {aacaa, bcb, abcba, bacab, abbcbba, ....}
• The grammar could be: S → aSa | bSb | c
188. • Now if we want to derive the string "abbcbba", we can start with the start symbol:
S => aSa => abSba => abbSbba => abbcbba
Thus any string of this kind can be derived from the given production rules.
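Membership in L = {w c w^R} is easy to check mechanically, which makes the grammar's behaviour concrete (the function below is our own recognizer, not part of the grammar):

```python
def in_wcwr(s):
    """Recognizer for L = { w c w^R : w in {a,b}* }."""
    if len(s) % 2 == 0:       # strings in L always have odd length
        return False
    mid = len(s) // 2
    w, c, rest = s[:mid], s[mid], s[mid + 1:]
    # middle must be 'c', w must use only a/b, and the tail must mirror w
    return c == "c" and set(w) <= {"a", "b"} and rest == w[::-1]

for s in ["abbcbba", "bcb", "c", "abcab", "abc"]:
    print(s, in_wcwr(s))
```

Note that "c" alone is in the language (the case w = ε), exactly as the production S → c allows.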
189. Derivation
• A derivation is a sequence of applications of production rules.
• It is used to obtain the input string through these production rules.
• During parsing, we have to take two decisions. These are as follows:
• We have to decide which non-terminal is to be replaced.
• We have to decide the production rule by which the non-terminal will be replaced.
• We have two options for deciding which non-terminal to replace with a production rule.
190. 1. Leftmost Derivation:
• In a leftmost derivation, at each step the leftmost non-terminal of the sentential form is replaced using a production rule.
• So in a leftmost derivation, the string is built from left to right.
193. 2. Rightmost Derivation
• In a rightmost derivation, at each step the rightmost non-terminal of the sentential form is replaced using a production rule.
• So in a rightmost derivation, the string is built from right to left.
195. The rightmost derivation is:
Whether we use the leftmost derivation or the rightmost derivation, we get the same string; the type of derivation does not affect the string obtained.
196. • Derive the string "00101" for leftmost
derivation and rightmost derivation using a
CFG given by,
199. Derivation Tree
• A derivation tree is a graphical representation of the derivation of a given string from the production rules of a given CFG.
• It is a simple way to show how a string can be derived from a given set of production rules.
• The derivation tree is also called a parse tree.
• A parse tree follows the precedence of operators: the deepest sub-tree is traversed first.
• So, the operator in a parent node has lower precedence than the operator in its sub-tree.
200. • A parse tree has the following properties:
1. The root node is always labeled with the start symbol.
2. The derivation is read from left to right.
3. The leaf nodes are always terminal nodes.
4. The interior nodes are always non-terminal nodes.
205. Ambiguity in Grammar
• A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for some input string.
• If the grammar is not ambiguous, then it is called unambiguous.
• If a grammar has ambiguity, it is not good for compiler construction.
• No method can automatically detect and remove ambiguity in general, but we can remove ambiguity by re-writing the whole grammar without ambiguity.
210. • For the string "aabb" the above grammar can
generate two parse trees
Since there are two parse trees for a single string "aabb", the
grammar G is ambiguous.
211. Unambiguous Grammar
• A grammar is unambiguous if it does not contain ambiguity, i.e., it does not have more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for any input string.
• To convert an ambiguous grammar to an unambiguous grammar, we apply the following rules:
• 1. If left-associative operators (+, -, *, /) are used in a production rule, then apply left recursion in the production rule.
• Left recursion means that the leftmost symbol on the right side is the same as the non-terminal on the left side. For example, E → E + T.
212. 2. If a right-associative operator (^) is used in a production rule, then apply right recursion in the production rule.
Right recursion means that the rightmost symbol on the right side is the same as the non-terminal on the left side. For example, E → T ^ E.
213. Pushdown Automata (PDA)
• A pushdown automaton is a way to implement a CFG
in the same way we design a DFA for a regular
grammar.
• A DFA can remember only a finite amount of
information, but a PDA can remember an unbounded
amount of information.
• A pushdown automaton is simply an NFA augmented
with an "external stack memory".
• The addition of the stack provides a last-in-
first-out (LIFO) memory management capability to
the pushdown automaton.
• A pushdown automaton can store an unbounded
amount of information on the stack.
214. • However, it can access only a limited amount of
information on the stack: the top.
• A PDA can push an element onto the top of
the stack and pop an element off the top
of the stack.
• To read an element deeper in the stack, the
elements above it must be popped off and are lost.
• A PDA is more powerful than an FA.
• Any language which can be accepted by an FA
can also be accepted by a PDA.
• A PDA also accepts a class of languages which
cannot be accepted by any FA.
• Thus PDAs are strictly more powerful than FAs.
215.
216. Formal definition of PDA:
• A PDA can be defined as a collection of 7
components:
• Q: the finite set of states
• ∑: the input alphabet
• Γ: the stack alphabet (symbols which can be pushed
onto and popped from the stack)
• q0: the initial state
• Z: the start stack symbol, which is in Γ
• F: the set of final states
• δ: the transition function, which determines the
moves from the current state to the next state
217. Instantaneous Description (ID)
• An ID is an informal notation of how a PDA
computes an input string and decides
whether the string is accepted or rejected.
• An instantaneous description is a triple (q, w,
α) where:
• q describes the current state.
• w describes the remaining input.
• α describes the stack contents, top at the left.
218. Turnstile Notation:
• The ⊢ sign describes the turnstile notation and
represents one move.
• The ⊢* sign describes a sequence of zero or more moves.
• For example,
• (p, bw, Tβ) ⊢ (q, w, αβ)
• In the above example, while taking a
transition from state p to q, the input symbol
'b' is consumed, and the top-of-stack symbol 'T' is
replaced by the new string α.
219. Example 1
• Design a PDA for accepting the language
{a^n b^2n | n >= 1}.
• Solution: In this language, n number of a's
should be followed by 2n number of b's.
Hence, we will apply a very simple logic:
whenever we read a single 'a', we push two
a's onto the stack.
• As soon as we read 'b', then for every single 'b'
exactly one 'a' is popped from the stack.
220. • The ID can be constructed as follows:
• Now when we read b, we will change the state
from q0 to q1 and start popping the
corresponding a's.
• Hence,
221. • Thus this process of popping one 'a' for every
'b' read is repeated until all the symbols are read.
• Note that the popping action occurs in state q1
only.
222. • After reading all the b's, all the corresponding a's
should have been popped.
• Hence, when we read ε as the input symbol,
there should be nothing left in the stack.
• Hence the move will be:
223. • Where
• PDA = ({q0, q1, q2}, {a, b}, {a, Z}, δ, q0, Z, {q2})
• We can summarize the ID as:
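The push-two/pop-one logic of this PDA can be sketched as a small simulator (a hypothetical Python sketch for illustration; the state names mirror the slides, everything else is an assumption):

```python
# Simulate the PDA for {a^n b^2n | n >= 1}: push two a's per 'a' read,
# pop one 'a' per 'b' read; accept with empty stack in state q1.
def accepts(s):
    stack = []
    state = 'q0'
    for ch in s:
        if state == 'q0' and ch == 'a':
            stack += ['a', 'a']        # push two a's for each 'a'
        elif state in ('q0', 'q1') and ch == 'b' and stack:
            state = 'q1'               # first 'b' switches to the popping state
            stack.pop()                # pop one 'a' for each 'b'
        else:
            return False               # any other move rejects
    return state == 'q1' and not stack
```

For example, `accepts("aabbbb")` is `True`, while `accepts("abbb")` is `False` because a fourth 'b' finds the stack empty.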
224. Design a PDA for accepting the language
{0^n 1^m 0^n | m, n >= 1}.
• In this PDA, n number of 0's are followed by
any number of 1's, followed by n number of 0's.
Hence the logic for the design of such a PDA is
as follows:
• Push all the leading 0's onto the stack.
• Then, while reading 1's, do nothing.
• Then read 0's again, and on each 0 read, pop one 0
from the stack.
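The three phases above can be sketched directly (an illustrative Python simulation, not part of the original slides; the state names are assumptions):

```python
# Simulate the PDA for {0^n 1^m 0^n | m, n >= 1}:
# q0 pushes the leading 0's, q1 skips the 1's, q2 pops one 0 per trailing 0.
def accepts_0n1m0n(s):
    stack, state = [], 'q0'
    for ch in s:
        if state == 'q0' and ch == '0':
            stack.append('0')          # phase 1: push leading 0's
        elif state in ('q0', 'q1') and ch == '1' and stack:
            state = 'q1'               # phase 2: read 1's, stack untouched
        elif state in ('q1', 'q2') and ch == '0' and stack:
            state = 'q2'
            stack.pop()                # phase 3: pop one 0 per trailing 0
        else:
            return False
    return state == 'q2' and not stack
```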
227. PDA Acceptance
• A language can be accepted by Pushdown
automata using two approaches:
1. Acceptance by Final State: The PDA is said to
accept its input by the final state if it enters
any final state in zero or more moves after
reading the entire input.
• Let P =(Q, ∑, Γ, δ, q0, Z, F) be a PDA.
• The language acceptable by the final state can
be defined as:
228. • 2. Acceptance by Empty Stack:
• On reading the input string from the initial
configuration for some PDA, the stack of PDA
gets empty.
• Let P =(Q, ∑, Γ, δ, q0, Z, F) be a PDA.
• The language acceptable by empty stack can
be defined as:
229. Non-deterministic Pushdown
Automata
• The non-deterministic pushdown automaton is
very much similar to the NFA.
• We will discuss some CFLs in relation to the NPDA.
• Every CFL which is accepted by a deterministic PDA
is accepted by a non-deterministic PDA as well.
• However, there are some CFLs which can be
accepted only by an NPDA and not by any DPDA.
Thus the NPDA is more powerful than the DPDA.
230. • Suppose the language consists of the strings L = {aba, aa,
bb, bab, bbabb, aabaa, ...}.
• The strings can be odd-length or even-length palindromes.
• The logic for constructing the PDA is that we push each
symbol onto the stack up to the middle of the string; after
that we read each symbol and perform a pop operation.
• We compare whether the popped symbol matches the
symbol which is being read.
• When we reach the end of the input, we expect the
stack to be empty.
• This PDA is a non-deterministic PDA because guessing
the middle of the given string, and matching the left
half against the reversed right half, requires
non-deterministic moves.
• Here is the ID.
231.
232.
233. • Here, we need to maintain the order of a's, b's
and c's.
• That is, all the a's come first, then
all the b's, and then the c's.
• Thus, we need a stack along with the state
diagram.
• The count of a's and c's is maintained by the
stack.
• The number of a's is exactly equal to the
number of c's.
• We will take 2 stack alphabets:
240. Construct a PDA for the language L =
{0^n 1^m 2^m 3^n | n >= 1, m >= 1}
• Approach used in this PDA:
First the 0's are pushed onto the stack.
• Then the 1's are pushed onto the stack.
Then, for every 2 in the input, a 1 is popped off
the stack.
• If some 2's are still left while the top of the stack is
a 0, the string is not accepted by the PDA. Thereafter,
once the 2's are finished and the top of the stack is a 0,
then for every 3 in the input one 0 is popped off the
stack.
• If the string is finished and the stack is empty, the string
is accepted by the PDA; otherwise it is not accepted.
241. • Step-1: On receiving 0, push it onto the stack. On
receiving 1, push it onto the stack and go to the next state.
• Step-2: On receiving 1, push it onto the stack. On
receiving 2, pop a 1 from the stack and go to the next state.
• Step-3: On receiving 2, pop a 1 from the stack. If all the
1's have been popped off the stack and we now
receive 3, then pop a 0 from the stack and go to the next
state.
• Step-4: On receiving 3, pop a 0 from the stack. If the input
is finished and the stack is empty, then go to the last state
and the string is accepted.
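The four steps can be condensed into a simulation (a hedged Python sketch; the numeric phase labels stand in for the states, which are not named in the slides):

```python
# Simulate the PDA for {0^n 1^m 2^m 3^n | n, m >= 1}.
# Phase 0 pushes 0's, phase 1 pushes 1's, phase 2 pops a 1 per 2,
# phase 3 pops a 0 per 3; accept with an empty stack after phase 3.
def accepts_0n1m2m3n(s):
    stack, phase = [], 0
    for ch in s:
        if phase == 0 and ch == '0':
            stack.append('0')
        elif phase in (0, 1) and ch == '1':
            phase = 1
            stack.append('1')
        elif phase in (1, 2) and ch == '2' and stack and stack[-1] == '1':
            phase = 2
            stack.pop()                # one 1 popped per 2
        elif phase in (2, 3) and ch == '3' and stack and stack[-1] == '0':
            phase = 3
            stack.pop()                # one 0 popped per 3
        else:
            return False
    return phase == 3 and not stack
```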
242.
243.
244.
245. Approach used in the construction of
PDA
• In designing an NPDA, every 'a' comes
before 'b'. When a 'b' comes, then
246. • So that the stack becomes empty.
• If stack is empty then we can say that the
string is accepted by the PDA.
247.
248. Computable and non-computable
problems
• Computable Problems –
You are familiar with many problems (or
functions) that are computable (or decidable),
meaning there exists some algorithm that
computes an answer (or output) to any
instance of the problem (or for any input to
the function) in a finite number of simple
steps.
249. A simple example is the integer increment operation:
It should be intuitive that given any integer x, we can
compute x + 1 in a finite number of steps.
Since x is finite, it may be represented by a finite string
of digits.
Using the addition method (or algorithm) we all learned
in school, we can clearly compute another string of
digits representing the integer equivalent to x + 1.
250. • Yet there are also problems and functions
that are non-computable (or undecidable or
uncomputable), meaning that there exists no
algorithm that can compute an answer or
output for all inputs in a finite number of
simple steps.
• (Undecidable simply means non-computable
in the context of a decision problem, whose
answer (or output) is either "true" or "false".)
251. • Non-Computable Problems –
A non-computable problem is a problem for which
there is no algorithm that can be used to solve
it.
• The most famous example of non-computability
(or undecidability) is the Halting Problem.
• Given a description of a Turing machine and
its initial input, determine whether the
program, when executed on this input, ever
halts (completes).
• The alternative is that it runs forever without
halting.
252. • The halting problem asks whether a
machine will ever come to a halt (finish
running) when a certain input is given to it,
or whether it will run forever.
• The input itself can be something that keeps
calling itself forever, which would cause
the program to run forever.
253. Turing Machine
• The Turing machine was invented in 1936 by Alan
Turing.
• It is an accepting device which accepts the
recursively enumerable languages generated
by type-0 grammars.
254. • There are various features of the Turing
machine:
1. It has an external memory which can remember
arbitrarily long sequences of input.
2. It has unlimited memory capability.
3. The model has a facility by which the input at the
left or right on the tape can be read easily.
4. The machine can produce a certain output
based on its input. Sometimes it may be
required that the same input has to be used to
generate the output. So in this machine, the
distinction between input and output has been
removed. Thus a common set of alphabets can
be used for the Turing machine.
255. Turing Machine in TOC
• The Turing machine was invented by Alan Turing in
1936 and it is used to accept recursively
enumerable languages (generated by type-0
grammars).
• A Turing machine consists of a tape of infinite
length on which read and write operations can
be performed.
• The tape consists of infinite cells, each of which
either contains an input symbol or
a special symbol called blank.
• It also has a head pointer which points to the
cell currently being read, and the head can move in both
directions.
257. • A TM is expressed as a 7-tuple (Q, T, B, ∑, δ, q0,
F) where:
• Q is a finite set of states
• T is the tape alphabet (symbols which can be
written on the tape)
• B is the blank symbol (initially every cell is filled with B
except the cells holding the input)
• ∑ is the input alphabet (symbols which are part of
the input)
• δ is the transition function, which maps Q × T → Q ×
T × {L, R}.
• Depending on its present state and the present tape
symbol (pointed to by the head pointer), the machine
moves to a new state, changes the tape symbol (or leaves
it unchanged), and moves the head pointer either left or
right.
258. • q0 is the initial state
• F is the set of final states.
• If any state of F is reached, input string is
accepted.
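The 7-tuple definition translates naturally into a generic step loop (a hypothetical sketch; the example machine and all names below are illustrative assumptions, not from the slides):

```python
# A generic single-tape TM interpreter.
# delta maps (state, symbol) -> (new_state, symbol_to_write, 'L' or 'R').
def run_tm(delta, tape, q0, accept, blank='B', max_steps=10_000):
    cells = dict(enumerate(tape))      # sparse tape; blank everywhere else
    q, head = q0, 0
    for _ in range(max_steps):
        if q in accept:
            return True                # a final state of F was reached
        key = (q, cells.get(head, blank))
        if key not in delta:
            return False               # no move defined: halt and reject
        q, cells[head], move = delta[key]
        head += 1 if move == 'R' else -1
    return False                       # step budget exhausted

# Example machine: accept strings of 0's of even length.
even_zeros = {
    ('qe', '0'): ('qo', '0', 'R'),
    ('qo', '0'): ('qe', '0', 'R'),
    ('qe', 'B'): ('acc', 'B', 'R'),    # even count and blank: accept
}
```

For example, `run_tm(even_zeros, "0000", 'qe', {'acc'})` accepts, while an odd-length input halts in `qo` on the blank with no move defined and is rejected.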
259.
260.
261.
262. Variations of the Turing Machine
1. Multiple-track Turing Machine
• A k-track Turing machine (for some k > 0) has k
tracks and one R/W head that reads and
writes on all of the tracks together.
• A k-track Turing machine can be simulated by
a single-track Turing machine.
263. 2. Two-way Infinite Tape Turing Machine:
• The tape of a two-way infinite tape Turing
machine is unbounded in both directions, left
and right.
• A two-way infinite tape Turing machine can be
simulated by a one-way infinite tape Turing
machine (the standard Turing machine).
264. 3. Multi-tape Turing Machine:
• It has multiple tapes, each with its own head,
controlled by a single finite control.
• The multi-tape Turing machine is different
from the k-track Turing machine, but its expressive
power is the same.
• A multi-tape Turing machine can be simulated
by a single-tape Turing machine.
265. 4. Multi-tape Multi-head Turing Machine:
• The multi-tape multi-head Turing machine has multiple
tapes and multiple heads.
• Each tape is controlled by a separate head.
• The multi-tape multi-head Turing machine can be
simulated by a standard Turing machine.
266. 5. Multi-dimensional Tape Turing Machine:
• It has a multi-dimensional tape where the head can
move in any direction, that is, left, right, up or down.
• A multi-dimensional tape Turing machine can be
simulated by a one-dimensional Turing machine.
6. Multi-head Turing Machine:
• A multi-head Turing machine contains two or
more heads that read the symbols on the same
tape.
• In one step all the heads sense the scanned
symbols and move or write independently.
• A multi-head Turing machine can be simulated by
a single-head Turing machine.
267. • 7. Non-deterministic Turing Machine:
• A non-deterministic Turing machine has a
single, one-way infinite tape.
• For a given state and input symbol it has at least
one choice of move (a finite number of choices
for the next move), so there may be several
paths it might follow for a given
input string.
• A non-deterministic Turing machine is
equivalent to a deterministic Turing machine.
268.
269.
270. • Construct a Turing machine for the language
L = {0^n 1^n 2^n | n ≥ 1}.
• The language L = {0^n 1^n 2^n | n ≥ 1} represents a
kind of language where we use only 3
characters, i.e., 0, 1 and 2.
• The language consists of some number
of 0's, followed by an equal number of 1's,
followed by an equal number of 2's.
• Any such string which falls in this category will
be accepted by this language.
• The beginning and end of the string are marked by the $
sign.
271.
272.
273. Approach used
• First replace a 0 at the front by X, then keep
moving right till you find a 1 and replace this 1 by
Y. Again, keep moving right till you find a 2,
replace it by Z and move left.
• Now keep moving left till you find an X.
• When you find it, move one step right, then follow the
same procedure as above.
• A condition arises when you find an X
immediately followed by a Y.
• At this point we keep moving right and keep
checking that all 1's and 2's have been converted
to Y and Z.
• If not, then the string is not accepted. If we reach $,
then the string is accepted.
274. • Step-1:
Replace 0 by X and move right, Go to state Q1.
• Step-2:
Replace 0 by 0 and move right, Remain on same
state
Replace Y by Y and move right, Remain on same
state
Replace 1 by Y and move right, go to state Q2.
• Step-3:
Replace 1 by 1 and move right, Remain on same
state
Replace Z by Z and move right, Remain on same
state
Replace 2 by Z and move right, go to state Q3.
275.
276. • Step-4:
Replace 1 by 1 and move left, Remain on same state
Replace 0 by 0 and move left, Remain on same state
Replace Z by Z and move left, Remain on same state
Replace Y by Y and move left, Remain on same state
Replace X by X and move right, go to state Q0.
• Step-5:
If symbol is Y replace it by Y and move right and Go to
state Q4
Else go to step 1
• Step-6:
Replace Z by Z and move right, Remain on same state
Replace Y by Y and move right, Remain on same state
If symbol is $ replace it by $ and move left, STRING IS
ACCEPTED, GO TO FINAL STATE Q5
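The marking procedure in the steps above can be simulated directly (a hedged Python sketch; the $ end markers from the slides are omitted, and list operations stand in for the head moves):

```python
# Simulate the X/Y/Z marking TM for {0^n 1^n 2^n | n >= 1}.
def accepts_0n1n2n(s):
    t = list(s)
    if not t:
        return False
    while '0' in t:
        i = t.index('0')
        t[i] = 'X'                        # mark the leftmost 0 as X
        j = i + 1
        while j < len(t) and t[j] in '0Y':
            j += 1                        # move right over 0's and Y's
        if j == len(t) or t[j] != '1':
            return False                  # no matching 1 in the right place
        t[j] = 'Y'                        # mark the matching 1 as Y
        k = j + 1
        while k < len(t) and t[k] in '1Z':
            k += 1                        # move right over 1's and Z's
        if k == len(t) or t[k] != '2':
            return False                  # no matching 2 in the right place
        t[k] = 'Z'                        # mark the matching 2 as Z
    return all(c in 'XYZ' for c in t)     # nothing unmatched may remain
```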
277. Turing Machine for addition
• A number is usually represented in binary format,
e.g., 5 is represented as (101), but for addition using a
Turing machine the unary format is followed.
• In unary format a number is represented by
either all ones or all zeroes.
• For example, 5 will be represented by a sequence
of five zeroes or five ones: 5 = 1 1 1 1 1 or 0 0 0 0
0. Let's use zeroes for the representation.
• For adding 2 numbers using a Turing machine,
both numbers are given as input to the
Turing machine separated by a "c".
278. • Examples – (2 + 3) will be given as 0 0 c 0 0 0:
279. Approach
• Convert a 0 in the first number into X and
then traverse the entire input and convert the first
blank encountered into 0.
• Then move towards the left, ignoring all 0's and
the "c".
• Come to the position just next to the X and then
repeat the same procedure till the time we get
a "c" instead of an X on returning.
• Convert the c into a blank and the addition is
completed.
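The copy-and-mark procedure can be simulated as follows (an illustrative Python sketch under the stated unary convention; "B" stands for the blank symbol, and the final filtering stands in for restoring the X's to 0's):

```python
# Simulate unary addition: "00c000" encodes 2 + 3; the result is the
# sum in unary (a run of 0's).
def tm_unary_add(tape):
    t = list(tape) + ["B"]                 # append the blank at the end
    while True:
        i = 0
        while t[i] == "X":                 # skip already-marked 0's
            i += 1
        if t[i] == "c":                    # first number fully copied:
            t[i] = "B"                     # erase the separator, done
            break
        t[i] = "X"                         # mark one 0 of the first number
        j = t.index("B")
        t[j] = "0"                         # copy it to the first blank
        t.append("B")
    return "".join(c for c in t if c == "0")   # the sum, in unary
```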
280.
281. • We then have to perform a left shift to get the
sum of the two numbers.
282. • Deterministic and Non-Deterministic Turing
Machines:
• In a deterministic Turing machine, there is only
one move from every state on every input
symbol, but in a non-deterministic Turing
machine there can be more than one move
from a state for an input symbol.
283. Construct a Turing machine for the
language L = {ww^r | w ∈ {0, 1}*}
• The language L = {ww^r | w ∈ {0, 1}*} represents a kind of
language where we use only 2 characters, i.e., 0 and 1.
• The first part of the string can be any string of 0's and 1's.
• The second part is the reverse of the first part.
Combining both these parts, our string is formed.
Any such string which falls in this category will be
accepted by this language.
• The beginning and end of the string are marked by the $ sign.
• For example, if the first part is w = 1 1 0 0 1 then the second
part is w^r = 1 0 0 1 1. It is clearly visible that w^r is the
reverse of w, so the string 1 1 0 0 1 1 0 0 1 1 is a part of the
given language.
286. • Assumption: We will replace 0 by Y and 1 by X.
• Approach used –
First check the first symbol: replace it
by Y if it's 0 and by X if it's 1.
• Then go to the end of the string.
• The last symbol must be the same as the first.
• We replace it too by X or Y accordingly.
Now come back to the position next to the
symbol replaced at the start and repeat the
same process as described above.
• One important thing to note is that since w^r is the
reverse of w, both of them have an equal
number of symbols.
• Every time we replace the nth symbol from the beginning
of the string, we replace the corresponding nth symbol
from the end.
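The matching strategy (pair the nth symbol from the front with the nth from the end) amounts to a two-pointer check (an abstraction, not the TM itself; the $ markers and X/Y tape symbols are elided):

```python
# Check membership in {ww^r | w in {0,1}*}: an even-length palindrome.
def accepts_wwr(s):
    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:      # outermost symbols must match
            return False
        i += 1
        j -= 1
    return i > j              # pointers must cross: even length only
```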
289. Halting Problem in Theory of
Computation
• Decidable Problems
A problem is decidable if we can construct a
Turing machine which will halt in a finite
amount of time for every input and give the
answer as 'yes' or 'no'.
• A decidable problem has an algorithm to
determine the answer for a given input.
290. Examples
• Equivalence of two regular languages: Given
two regular languages, there is an algorithm
and a Turing machine to decide whether the two
regular languages are equal or not.
• Finiteness of a regular language: Given a
regular language, there is an algorithm and
a Turing machine to decide whether the regular
language is finite or not.
• Emptiness of a context-free language: Given a
context-free language, there is an algorithm to decide
whether the CFL is empty or not.
291. Undecidable Problems
• A problem is undecidable if there is no Turing
machine which will always halt in a finite
amount of time and give the answer as 'yes' or 'no'.
• An undecidable problem has no algorithm to
determine the answer for a given input.
292. Examples
• Ambiguity of context-free languages: Given a context-
free language, there is no Turing machine which will
always halt in a finite amount of time and give an answer
whether the language is ambiguous or not.
• Equivalence of two context-free languages: Given two
context-free languages, there is no Turing machine
which will always halt in a finite amount of time and give
an answer whether the two context-free languages are
equal or not.
• Everything or completeness of a CFG: Given a CFG and an
input alphabet, whether the CFG will generate all possible
strings over the input alphabet (∑*) is undecidable.
• Regularity of CFL, CSL, REC and RE: Given a CFL, CSL,
REC or RE language, determining whether this language is
regular is undecidable.
293. • A semi-decidable problem is a subset of the
undecidable problems for which a Turing
machine always halts in a finite amount of
time when the answer is 'yes' and may or may not
halt when the answer is 'no'.
• The relationship between semi-decidable and
decidable problems is shown in Figure 1
as:
294.
295. Rice's Theorem
• Every non-trivial property (one that holds for
some recursively enumerable languages but not
for all) of recursively enumerable languages
is undecidable. e.g.,
• Whether the complement of a recursively
enumerable language is also recursively
enumerable is undecidable.
296. Reducibility and Undecidability
• Language A is reducible to language B
(represented as A ≤ B) if there exists a computable
function f which converts strings in A to strings in B
such that:
• w ∈ A <=> f(w) ∈ B
• Theorem 1: If A ≤ B and B is decidable, then A is
also decidable.
Theorem 2: If A ≤ B and A is undecidable, then B
is also undecidable.
297.
298. • Turing machine –
A Turing machine is a mathematical model of
computation.
• A Turing machine is a general example of a
CPU that controls all data manipulation done
by a computer.
• A Turing machine can be halting as well as non-
halting, depending on the algorithm and the
input associated with the algorithm.
299. 2011
When is a problem classified as undecidable?
Is the halting problem a decidable problem
or undecidable? Justify your answer. (5 marks)
300. • The Halting problem – Given a
program/algorithm, will it ever halt or not?
Halting means that the program on a certain
input will accept it and halt, or reject it and
halt, and will never go into an infinite
loop.
• Basically, halting means terminating.
• So can we have an algorithm that will tell whether
a given program will halt or not?
• In terms of Turing machines: will it terminate
when run on some machine with some
particular given input string?
301. • The answer is no: we cannot design a
generalized algorithm which can
correctly say whether a given program will
ever halt or not.
The only way is to run the program and check
whether it halts or not.
We can reframe the halting problem question
in this way also:
• Given a program written in some
programming language (C/C++/Java), will it ever
get into an infinite loop (a loop that never stops) or
will it always terminate (halt)?
302. • This is an undecidable problem because we
cannot have an algorithm which will tell us
whether a given program will halt or not in a
generalized way, i.e., one that works for every
program/algorithm.
• In general we can't always know; that's why we
can't have a general algorithm.
• The best possible way is to run the program
and see whether it halts or not.
• But in this way, for many programs we may watch
them loop without ever being sure that they will
not eventually halt.
303. • Proof by Contradiction –
Problem statement: Can we design a machine
which, given a program, can find out whether that
program will halt or not on a
particular input?
• Solution: Let us assume that we can design
such a machine, called HM(P, I), where
HM is the machine/program, P is the program
and I is the input.
• On taking both arguments, the
machine HM tells whether the program P
halts on input I or not.
304. If we can design such a program, it allows us to
write another program, which we call CM(X),
where X is any program (taken as an argument);
the definition of the program CM(X) is
shown in the figure.
305. • In the program CM(X) we call the function
HM(), which we have already defined, and to
HM() we pass the arguments (X, X); according
to the definition of HM() it takes two
arguments, i.e., one program and one
input.
• So in the second program we pass X as the
program and X as the input to the function HM().
• We know that the program HM() gives one of two
outputs: either "Halts" or "Does not halt".
• But in the second program, when HM(X, X)
says "Halts", the loop body tells CM to go into a
loop, and when it says "Does not halt", CM is
asked to return.
306. • Now we take one more situation, where the
program CM is passed to the CM() function as an
argument.
• Then an impossibility arises, i.e., a
condition which is not possible.
307.
308. • It is impossible for the outer function to halt if its
code (inner body) is in a loop, and it is also
impossible for the outer function to not halt
when its inner code halts.
• So both conditions are contradictory for the CM
machine/program, whatever we assumed at
the beginning.
• This is the contradiction, so we can say
that our assumption was wrong and this
problem, i.e., the halting problem, is undecidable.
• This is how we prove that the halting problem is
undecidable.
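The diagonal step of the proof can be captured in a few lines (a hedged sketch: `HM` cannot actually be written, so the code only models CM's behaviour as a function of HM's claimed answer):

```python
# Suppose HM(P, I) could decide halting. CM(X) is defined to loop forever
# exactly when HM(X, X) answers "halts", and to return otherwise.
# So on input CM itself, CM's actual behaviour is the negation of
# whatever HM claims about it:
def cm_halts(hm_answer):
    # hm_answer: HM's claim about whether CM halts on input CM.
    return not hm_answer

# Whatever HM answers about (CM, CM), the answer is wrong,
# so no such HM can exist.
for answer in (True, False):
    assert cm_halts(answer) != answer
```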
309. Non-Deterministic Turing Machine
• In a non-deterministic Turing machine, for every
state and symbol, there is a group of actions the
TM can take.
• So, here the transitions are not deterministic.
• The computation of a non-deterministic Turing
machine is a tree of configurations that can be
reached from the start configuration.
• An input is accepted if there is at least one node
of the tree which is an accepting configuration;
otherwise it is not accepted.
• If all branches of the computational tree halt on
all inputs, the non-deterministic Turing machine
is called a decider, and if for some input all
branches are rejected, that input is rejected.
310.
311. NP-Completeness
• Can all computational problems be solved by a
computer?
• There are computational problems that cannot be
solved by algorithms even with unlimited time.
• For example, the Turing halting problem (given a program
and an input, will the program eventually halt
when run with that input, or run forever?).
• Alan Turing proved that a general algorithm to solve the
halting problem for all possible program-input pairs
cannot exist.
• A key part of the proof is that the Turing machine was used
as the mathematical definition of a computer and program
(source: Halting Problem).
312. • The status of NP-complete problems is another
failure story: NP-complete problems are
problems whose status is unknown.
• No polynomial-time algorithm has yet been
discovered for any NP-complete problem, nor
has anybody yet been able to prove that no
polynomial-time algorithm exists for any of
them.
• The interesting part is that if any one of the NP-
complete problems can be solved in
polynomial time, then all of them can be
solved.
313. • What are P, NP, NP-complete and NP-
hard problems?
• P is the set of problems that can be solved by a
deterministic Turing machine in polynomial time.
• NP is the set of decision problems that can be solved
by a non-deterministic Turing machine
in polynomial time.
• P is a subset of NP (any problem that can be solved
by a deterministic machine in polynomial time can
also be solved by a non-deterministic machine in
polynomial time).
Informally, NP is the set of decision problems which
can be solved in polynomial time via a "Lucky
Algorithm", a magical algorithm that always
makes the right guess among the given set of
choices (source: Ref 1).
314. • NP-complete problems are the hardest
problems in NP set.
• A decision problem L is NP-complete if:
1) L is in NP (Any given solution for NP-
complete problems can be verified quickly, but
there is no efficient known solution).
2) Every problem in NP is reducible to L in
polynomial time (Reduction is defined below).
• A problem is NP-Hard if it satisfies property 2
mentioned above; it need not satisfy
property 1. Therefore, the NP-Complete set is
a subset of the NP-Hard set.
315.
316. Decision vs Optimization Problems
• NP-completeness applies to the realm of decision
problems.
• It was set up this way because it’s easier to compare the
difficulty of decision problems than that of optimization
problems.
• In reality, though, being able to solve a decision problem in
polynomial time will often permit us to solve the
corresponding optimization problem in polynomial time
(using a polynomial number of calls to the decision
problem). So, discussing the difficulty of decision problems
is often really equivalent to discussing the difficulty of
optimization problems. (Source Ref 2).
For example, consider the vertex cover problem (Given a
graph, find out the minimum sized vertex set that covers all
edges). It is an optimization problem. Corresponding
decision problem is, given undirected graph G and k, is
there a vertex cover of size k?
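Part of what places this decision problem in NP is that a claimed solution can be verified in polynomial time. A minimal verification sketch (the names and edge representation are illustrative assumptions):

```python
# Verify a certificate for the vertex cover decision problem:
# is `cover` a vertex cover of size at most k?
def is_vertex_cover(edges, cover, k):
    # every edge must have at least one endpoint inside the cover
    return len(cover) <= k and all(u in cover or v in cover for u, v in edges)

# Example graph: a path 1 - 2 - 3 - 4.
edges = [(1, 2), (2, 3), (3, 4)]
```

Here `is_vertex_cover(edges, {2, 3}, 2)` succeeds, while `{1, 4}` fails because the edge (2, 3) is uncovered. Finding such a cover is the hard part; checking one is easy.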
317. What is Reduction?
• Let L1 and L2 be two decision problems.
Suppose algorithm A2 solves L2.
• That is, if y is an input for L2, then algorithm
A2 will answer Yes or No depending upon
whether y belongs to L2 or not.
• The idea is to find a transformation from L1 to
L2 so that algorithm A2 can be used as part of an
algorithm A1 that solves L1.
318. What was the first problem proved
NP-complete?
• There must be some first NP-complete problem,
proved directly from the definition of NP-complete
problems.
• SAT (the Boolean satisfiability problem) is the first
NP-complete problem, proved by Cook.
• It is always useful to know about NP-
completeness, even for engineers.
• Suppose you are asked to write an efficient
algorithm to solve an extremely important
problem for your company.
• After a lot of thinking, you can only come up with an
exponential-time approach, which is impractical.
319. • If you don't know about NP-completeness,
you can only say that you could not come up with an
efficient algorithm.
• If you know about NP-completeness and
prove that the problem is NP-complete, you
can confidently say that a polynomial-time
solution is unlikely to exist.
• If a polynomial-time solution were possible,
it would solve a big problem of
computer science that many scientists have been
attacking for years.
320. Parsing | Set 1 (Introduction,
Ambiguity and Parsers)
• Role of the parser:
In the syntax analysis phase, a compiler verifies
whether or not the tokens generated by the
lexical analyzer are grouped according to the
syntactic rules of the language.
This is done by a parser.
The parser obtains a string of tokens from the
lexical analyzer and verifies that the string can be
generated by the grammar of the source language.
It detects and reports any syntax errors and
produces a parse tree from which intermediate
code can be generated.
321.
322. • Before going into the types of parsers, we will
discuss some important ideas
required for understanding
parsing.
323. Ambiguity in Context-free Grammars
and Context-free Languages
• Suppose we have a context-free grammar G
with the production rules:
• S -> aSb | bSa | SS | ε
324. • Leftmost derivation (LMD) and derivation
tree:
• A leftmost derivation of a string from the starting
symbol S is done by replacing the leftmost non-
terminal symbol by the RHS of the corresponding
production rule.
• For example: the leftmost derivation of the string
abab from grammar G above is done as:
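One possible leftmost derivation (a reconstruction, since the original figure is not reproduced here):

```
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abab
```

At every step the leftmost non-terminal is replaced, using S → SS, then S → aSb, S → ε, S → aSb, and finally S → ε.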
325. Derivation tree
• It tells how the string is derived using the production
rules from S, and it is shown in Figure 1.
326. • Rightmost derivation (RMD):
• A rightmost derivation of a string from the starting
symbol S is done by replacing the rightmost non-
terminal symbol by the RHS of the corresponding
production rule.
• For example: the rightmost derivation of the
string abab from the grammar G
• S -> aSb | bSa | SS | ε above is done as:
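One possible rightmost derivation (again a reconstruction of the missing figure):

```
S ⇒ SS ⇒ SaSb ⇒ Sab ⇒ aSbab ⇒ abab
```

At every step the rightmost non-terminal is replaced, using S → SS, then S → aSb, S → ε, S → aSb, and finally S → ε.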
329. • Ambiguous context-free grammar:
• A context-free grammar is called ambiguous if
there exists more than one LMD or more than
one RMD for some string generated by the
grammar.
• There will also be more than one derivation
tree for such a string in an ambiguous grammar.
• The grammar described above is ambiguous
because there are two derivation trees (Figure
1 and Figure 2).
• There is more than one RMD for the string
abab; they are:
330. Ambiguous context-free languages:
A context-free language is called ambiguous if there
is no unambiguous grammar that defines that
language; such a language is also called an inherently
ambiguous context-free language.
331. • Note:
• If a context-free grammar G is ambiguous, the
language generated by the grammar, L(G), may or may
not be ambiguous.
• It is not always possible to convert an ambiguous
CFG to an unambiguous CFG.
• Only some ambiguous CFGs can be converted to
unambiguous CFGs.
• There is no algorithm to convert an ambiguous CFG
to an unambiguous CFG.
• There always exists an unambiguous CFG
corresponding to an unambiguous CFL.
• Deterministic CFLs are always unambiguous.
332.
333.
334.
335. Ambiguity
• A grammar that produces more than one
parse tree for some sentence is said to be
ambiguous.
E.g., consider the grammar
S -> aS | Sa | a
For the string aaa we have 4 parse trees,
hence the grammar is ambiguous.
336.
337. Removing Left Recursion
• A grammar is left recursive if it has a non-
terminal (variable) S such that there is a
derivation
S -> Sα | β
where α ∈ (V ∪ T)* and β ∈ (V ∪ T)* (a sequence of
terminals and non-terminals that does not start
with S).
Due to the presence of left recursion some
top-down parsers enter an infinite loop, so
we have to eliminate left recursion.
338. • Let the productions be of the form A -> Aα1 |
Aα2 | Aα3 | ….. | Aαm | β1 | β2 | …. | βn
where no βi begins with an A.
• Then we replace the A-productions by
A -> β1 A' | β2 A' | ….. | βn A'
A' -> α1 A' | α2 A' | α3 A' | ….. | αm A' | ε
The non-terminal A generates the same strings
as before but is no longer left recursive.
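The transformation above can be sketched in code (an illustrative Python helper; the representation of productions as lists of symbols is an assumption):

```python
# Eliminate immediate left recursion for one non-terminal `a`.
# prods: list of right-hand sides, each a list of symbols.
# e.g. E -> E + T | T  is  [['E', '+', 'T'], ['T']].
def eliminate_left_recursion(a, prods):
    alphas = [p[1:] for p in prods if p and p[0] == a]   # A -> A alpha_i
    betas  = [p for p in prods if not p or p[0] != a]    # A -> beta_j
    if not alphas:
        return {a: prods}                                # nothing to do
    a2 = a + "'"                                         # fresh non-terminal A'
    return {
        a:  [b + [a2] for b in betas],                   # A  -> beta_j A'
        a2: [al + [a2] for al in alphas] + [['ε']],      # A' -> alpha_i A' | ε
    }
```

Applied to E -> E + T | T, this yields E -> T E' and E' -> + T E' | ε, the classic result.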
339.
340. Removing Left Factoring
• A grammar is said to be left factored when it is of the
form –
A -> αβ1 | αβ2 | αβ3 | …… | αβn | γ i.e the productions
start with the same terminal (or set of terminals).
• On seeing the input α we cannot immediately tell
which production to choose to expand A.
Left factoring is a grammar transformation that is
useful for producing a grammar suitable for predictive
or top down parsing.
• When the choice between two alternative A-
productions is not clear, we may be able to rewrite the
productions to defer the decision until enough of the
input has been seen to make the right choice.
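One step of this transformation, pulling the longest common prefix out of each group of alternatives, can be sketched as follows (an illustrative sketch; names are invented here):

```python
from collections import defaultdict

def left_factor(nt, prods):
    """One round of left factoring: group alternatives of nt by their
    first symbol and factor out the longest common prefix of each group."""
    groups = defaultdict(list)
    for p in prods:
        groups[p[0] if p else None].append(p)
    out = {nt: []}
    counter = 0
    for key, alts in groups.items():
        if key is None or len(alts) == 1:
            out[nt].extend(alts)               # nothing shared, keep as is
            continue
        prefix = []                            # longest common prefix
        for syms in zip(*alts):
            if len(set(syms)) != 1:
                break
            prefix.append(syms[0])
        counter += 1
        new = nt + "'" * counter               # fresh nonterminal A'
        out[nt].append(prefix + [new])         # A  -> α A'
        out[new] = [p[len(prefix):] or ["ε"] for p in alts]  # A' -> β1 | β2 | ...
    return out

print(left_factor("A", [["a", "b"], ["a", "c"], ["g"]]))
# A -> a A' | g   and   A' -> b | c
```

For A -> ab | ac | g this yields A -> aA' | g with A' -> b | c, deferring the choice until after α has been read.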
342. • The process of deriving the string from the
given grammar is known as derivation
(parsing).
Depending upon how derivation is done we
have two kinds of parsers :-
• Top Down Parser
• Bottom Up Parser
343. Top Down Parser
Top down parsing attempts to build the parse
tree from the root to the leaves.
A top down parser starts from the start symbol
and proceeds towards the string.
It follows leftmost derivation.
In leftmost derivation, the leftmost non-terminal
in each sentential form is always chosen.
344. Classification of Context Free
Grammars
• Context Free Grammars (CFG) can be classified
on the basis of following two properties:
1) Based on the number of strings it generates.
• If a CFG generates a finite number of strings,
then the CFG is Non-Recursive (or the grammar is
said to be a Non-recursive grammar)
• If a CFG can generate an infinite number of
strings, then the grammar is said to
be a Recursive grammar
345. • During Compilation, the parser uses the
grammar of the language to make a parse
tree(or derivation tree) out of the source
code.
• The grammar used must be unambiguous.
• An ambiguous grammar must not be used for
parsing.
348. 2) Based on number of derivation trees.
• If there is only 1 derivation tree then the CFG
is unambiguous.
• If there are more than 1 derivation tree, then
the CFG is ambiguous.
353. Chomsky's Normal Form (CNF)
• CNF stands for Chomsky normal form. A
CFG(context free grammar) is in CNF(Chomsky
normal form) if all production rules satisfy one
of the following conditions:
• Start symbol generating ε. For example, A → ε.
• A non-terminal generating two non-terminals.
For example, S → AB.
• A non-terminal generating a terminal. For
example, S → a.
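These three conditions can be checked mechanically. A small sketch (names invented here; ε-productions are written as empty lists):

```python
def is_cnf(grammar, start):
    """Check whether every production has one of the CNF forms:
    start -> ε, A -> BC (two non-terminals), or A -> a (one terminal)."""
    nonterms = set(grammar)
    for lhs, prods in grammar.items():
        for p in prods:
            eps  = (p == [] and lhs == start)
            pair = (len(p) == 2 and all(s in nonterms for s in p))
            term = (len(p) == 1 and p[0] not in nonterms)
            if not (eps or pair or term):
                return False
    return True

g1 = {"S": [["A", "B"]], "A": [["a"]], "B": [["b"]]}
g2 = {"S": [["a", "Z"]], "Z": [["a"]]}   # S -> aZ: terminal followed by non-terminal
print(is_cnf(g1, "S"), is_cnf(g2, "S"))  # True False
```

The second grammar fails exactly because S -> aZ mixes a terminal with a non-terminal, the situation described for G2 below.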
354. For example
The production rules of Grammar G1 satisfy the
rules specified for CNF, so the grammar G1 is in CNF.
However, the production rule of Grammar G2 does
not satisfy the rules specified for CNF, as S → aZ
contains a terminal followed by a non-terminal.
So the grammar G2 is not in CNF.
355. Construction of LL(1) Parsing Table
• A top-down parser builds the parse tree from
the top down, starting with the start non-
terminal. There are two types of Top Down
Parsers:
1. Top Down Parser with Backtracking
2. Top Down Parsers without Backtracking
• Top Down Parsers without Backtracking can
further be divided into two parts:
357. LL(1) Parsing
• Here the first L represents that the scanning of
the input will be done in a Left to Right
manner, the second L shows that in this
parsing technique we are going to use the
Leftmost Derivation, and finally
the 1 represents the number of lookaheads,
meaning how many symbols you are going to
look at when you want to make a decision.
358. Construction of LL(1) Parsing Table:
• To construct the parsing table, we have two
functions:
1: First(): for a variable, the set of terminal
symbols that can begin the strings derived
from that variable.
2: Follow(): the set of terminal symbols that can
follow a variable in the process of derivation.
359. • Now, after computing the First and Follow set
for each Non-Terminal symbol we have to
construct the Parsing table.
• In the table Rows will contain the Non-
Terminals and the column will contain the
Terminal Symbols.
• For a production A -> α, place it under the
terminals in First(α); if α can derive ε (a null
production), place it under the terminals in
Follow(A).
361. • FOLLOW Set FOLLOW(E) = { $ , ) } // Note ')' is
there because of 5th rule
• FOLLOW(E’) = FOLLOW(E) = { $, ) } // See 1st
production rule
• FOLLOW(T) = { FIRST(E’) – ε } U FOLLOW(E’) U
FOLLOW(E) = { + , $ , ) }
• FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
• FOLLOW(F) = { FIRST(T’) – ε } U FOLLOW(T’) U
FOLLOW(T) = { *, +, $, ) }
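These FOLLOW sets can be reproduced mechanically by fixpoint iteration. A sketch, assuming the grammar on the slide (not reproduced here) is the usual LL(1) expression grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id:

```python
# Assumed grammar; [] denotes an ε-production.
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
nonterms = set(grammar)

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols; 'ε' marks that the whole sequence
    can derive the empty string."""
    out = set()
    for sym in seq:
        if sym not in nonterms:          # a terminal begins the sequence
            out.add(sym)
            return out
        out |= first[sym] - {"ε"}
        if "ε" not in first[sym]:
            return out
    out.add("ε")                         # every symbol was nullable
    return out

first = {nt: set() for nt in nonterms}
changed = True
while changed:                           # FIRST sets by fixpoint iteration
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            f = first_of_seq(prod, first)
            if not f <= first[nt]:
                first[nt] |= f
                changed = True

follow = {nt: set() for nt in nonterms}
follow["E"].add("$")                     # $ marks the end of the input
changed = True
while changed:                           # FOLLOW sets by fixpoint iteration
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            for i, sym in enumerate(prod):
                if sym not in nonterms:
                    continue
                rest = first_of_seq(prod[i + 1:], first)
                add = (rest - {"ε"}) | (follow[nt] if "ε" in rest else set())
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True

print(follow["F"])  # the set { *, +, $, ) }, in some order
```

The computed sets agree with the FOLLOW sets listed above.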
363. • ε as a FOLLOW doesn’t mean anything (ε is the
empty string), so it is never placed in a FOLLOW set.
• $ is called end-marker, which represents the
end of the input string, hence used while
parsing to indicate that the input string has
been completely processed.
• The grammar used above is Context-Free
Grammar (CFG). The syntax of a programming
language can be specified using CFG.
• CFG is of the form A -> B , where A is a single
Non-Terminal, and B can be a set of grammar
symbols ( i.e. Terminals as well as Non-
Terminals)
366. SLR, CLR and LALR Parsers
• SLR Parser
The SLR parser is similar to the LR(0) parser
except for the placement of the reduce entries.
• The reduce entries are written only in the
columns of the terminals in the FOLLOW set of
the variable whose production is reduced.
370. CLR PARSER
• In the SLR method we were working with LR(0)
items.
• In CLR parsing we will be using LR(1) items. An
LR(k) item is defined to be an item using
lookaheads of length k.
• So, the LR(1) item is comprised of two parts: the
LR(0) item and the lookahead associated with
the item.
• LR(1) parsers are more powerful parsers.
For LR(1) items we modify the Closure and GOTO
functions.
374. • Note – if a state has two reductions with the
same lookahead, then there will be multiple
entries in the parsing table, i.e. a conflict.
• If a state has a reduction and there is a shift
from that state on a terminal that is the same
as the lookahead of the reduction, then this
will also lead to multiple entries in the parsing
table, i.e. a conflict.
375. LALR PARSER
• LALR parsers are the same as CLR parsers with
one difference.
• In a CLR parser, if two states differ only in their
lookaheads, then we combine those states in the
LALR parser.
• After this minimisation, if the parsing table has no
conflict, then the grammar is LALR as well.
Eg:
380. Pumping Lemma
• There are two Pumping Lemmas, which are
defined for
1. Regular Languages, and
381. Pumping Lemma for Regular
Languages
• For any regular language L, there exists an
integer n, such that for all x ∈ L with |x| ≥ n,
there exist u, v, w ∈ Σ∗, such that x = uvw,
and
(1) |uv| ≤ n
(2) |v| ≥ 1
(3) for all i ≥ 0: uvⁱw ∈ L
• In simple terms, this means that if the substring
v is ‘pumped’, i.e., repeated any number of
times, the resultant string still remains in L.
382. • Pumping Lemma is used as a proof for
irregularity of a language.
• Thus, if a language is regular, it always satisfies
pumping lemma.
• If there exists at least one string made from
pumping which is not in L, then L is surely not
regular.
• The opposite of this may not always be true.
That is, if Pumping Lemma holds, it does not
mean that the language is regular.
383. • For example, let us prove L01 = {0ⁿ1ⁿ | n ≥ 0}
is irregular.
• Let us assume that L01 is regular; then by the
Pumping Lemma the rules given above follow.
Now, let x = 0ⁿ1ⁿ, so x ∈ L01 and |x| ≥ n.
• So, by the Pumping Lemma, there exist u, v, w
such that (1) – (3) hold.
• We show that for all u, v, w, (1) – (3) cannot all
hold.
If (1) and (2) hold then x = 0ⁿ1ⁿ = uvw with |uv| ≤
n and |v| ≥ 1.
So, u = 0ᵃ, v = 0ᵇ, w = 0ᶜ1ⁿ where a + b ≤ n, b ≥ 1,
c ≥ 0, a + b + c = n.
But then (3) fails for i = 0:
uv⁰w = uw = 0ᵃ0ᶜ1ⁿ = 0ᵃ⁺ᶜ1ⁿ ∉ L01, since a + c ≠ n.
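The case analysis above can be checked exhaustively for a fixed n: every legal decomposition of 0ⁿ1ⁿ must fail to pump. A small sketch (function names invented here):

```python
def in_L(s):
    """Membership in L = { 0^n 1^n : n >= 0 }."""
    k = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * k + "1" * k

def pumping_fails(n):
    """For x = 0^n 1^n, check that every split x = u v w with
    |uv| <= n and |v| >= 1 pumps out of L already at i = 0."""
    x = "0" * n + "1" * n
    for end_uv in range(1, n + 1):           # |uv| <= n
        for start_v in range(end_uv):        # |v| >= 1
            u, v, w = x[:start_v], x[start_v:end_uv], x[end_uv:]
            if in_L(u + w):                  # u v^0 w stays in L: lemma holds
                return False
    return True

print(pumping_fails(7))  # True: no decomposition survives, so L is not regular
```

Since v consists only of 0s, deleting it always unbalances the string, exactly as the proof argues.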
385. Post Correspondence Problem
• In this section, we will discuss an undecidable
problem about strings rather than about Turing
machines.
• This undecidability is shown with the
help of Post's Correspondence Problem (PCP).
• Let us define the PCP.
• "An instance of Post's correspondence problem
consists of two lists of strings of equal length over
some alphabet: A = w1, w2, w3, .... , wn and B = x1,
x2, x3, .... , xn. The instance has a solution if there
exists a non-empty sequence of indices i1, i2, i3,
.... , ik such that
wi1 wi2 wi3 .... wik = xi1 xi2 xi3 .... xik"
• To solve the Post correspondence problem we try
combinations of i1, i2, i3, .... , ik; if we find one for
which the two concatenated strings are equal, we
say that the PCP instance has a solution.
387. • Example 1:
• Consider the correspondence system as given
below
• A = (b, bab³, ba) and B = (b³, ba, a). The input
alphabet is ∑ = {a, b}. Find the solution.
• Solution:
• A solution is 2, 1, 1, 3. That means
w2w1w1w3 = x2x1x1x3
• The string constructed from both lists is
bab³b³a.
• Note: the strings built from the top and bottom
lists must be the same.
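The search over index sequences can be automated for small instances (PCP is undecidable in general, so the bound on the sequence length is essential). A brute-force sketch, with b³ written out as bbb:

```python
from itertools import product

def solve_pcp(A, B, max_len=5):
    """Brute-force search for a PCP solution of bounded length.
    Returns 1-based indices to match the lists in the text, or None."""
    for k in range(1, max_len + 1):
        for seq in product(range(len(A)), repeat=k):
            if "".join(A[i] for i in seq) == "".join(B[i] for i in seq):
                return [i + 1 for i in seq]
    return None

A = ["b", "babbb", "ba"]   # (b, bab³, ba)
B = ["bbb", "ba", "a"]     # (b³, ba, a)
print(solve_pcp(A, B))     # [2, 1, 1, 3]
```

The solver finds the sequence 2, 1, 1, 3; both concatenations give bab³b³a.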
389. • Example 2:
• Does PCP with two lists x = (b, a, aba, bb) and
y = (ba, ba, ab, b) have a solution?
• Solution: Now we have to find out such a
sequence that strings formed by x and y are
identical.
• Such a sequence is 1, 2, 1, 3, 3, 4.
• Hence from x and y list
391. Recursive and Recursively Enumerable
Languages in TOC
Recursively Enumerable (RE) or Type-0 Language
RE languages or type-0 languages are generated by
type-0 grammars.
An RE language can be accepted or recognized by a
Turing machine, which means the machine will enter
a final state for the strings of the language and may
or may not enter a rejecting state for the strings
which are not part of the language.
It means the TM can loop forever on the strings
which are not a part of the language.
RE languages are also called Turing-recognizable
languages.
392. Recursive Language (REC)
• A recursive language (a subset of RE) can be
decided by a Turing machine, which means it will
enter a final state for the strings of the language
and a rejecting state for the strings which are not
part of the language. e.g. L = {aⁿbⁿcⁿ | n ≥ 1} is
recursive because we can construct a Turing
machine which will move to a final state if the
string is of the form aⁿbⁿcⁿ and else move to a
non-final state.
• So the TM will always halt in this case. REC
languages are also called Turing-decidable
languages.
• The relationship between RE and REC languages
can be shown in Figure 1
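The decider described above can be sketched directly: it always halts, accepting exactly the strings of the language (an illustrative sketch; the function name is invented here):

```python
def decides_anbncn(s):
    """A total decider for L = { a^n b^n c^n : n >= 1 }: it halts on
    every input, accepting iff s has the required form."""
    n = len(s) // 3
    return len(s) % 3 == 0 and n >= 1 and s == "a" * n + "b" * n + "c" * n

print(decides_anbncn("aabbcc"))  # True
print(decides_anbncn("aabbc"))   # False
```

Because an answer is produced for every input, the language is recursive, not merely recursively enumerable.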
394. Closure Properties of Recursive
Languages
394. • Union: If L1 and L2 are two recursive
languages, their union L1 ∪ L2 will also be
recursive, because a TM can run the deciders
for L1 and L2 in turn and accept if either accepts.
• Concatenation: If L1 and L2 are two
recursive languages, their concatenation L1.L2
will also be recursive. For Example:
395. L1 says n no. of a’s followed by n no. of b’s followed by n no.
of c’s, i.e. L1 = {aⁿbⁿcⁿ}.
L2 says m no. of d’s followed by m no. of e’s followed by m
no. of f’s, i.e. L2 = {dᵐeᵐfᵐ}.
Their concatenation first matches the no. of a’s, b’s and c’s and
then matches the no. of d’s, e’s and f’s.
So it can be decided by a TM.
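Decidability of the concatenation follows because a TM can try every split point and run both deciders. A sketch (function names invented here):

```python
def in_L1(s):
    """Decider for L1 = { a^n b^n c^n : n >= 1 }."""
    n = len(s) // 3
    return len(s) % 3 == 0 and n >= 1 and s == "a" * n + "b" * n + "c" * n

def in_L2(s):
    """Decider for L2 = { d^m e^m f^m : m >= 1 }."""
    m = len(s) // 3
    return len(s) % 3 == 0 and m >= 1 and s == "d" * m + "e" * m + "f" * m

def in_concat(s):
    """Decider for L1.L2: try every split point; both halves must pass."""
    return any(in_L1(s[:i]) and in_L2(s[i:]) for i in range(len(s) + 1))

print(in_concat("aabbccdef"))  # True: "aabbcc" in L1 and "def" in L2
```

There are only finitely many split points, so the combined procedure always halts, which is what makes the concatenation recursive.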
396. • Kleene Closure:
• If L1 is recursive, its Kleene closure L1* will
also be recursive. For Example:
397. • Intersection and complement:
• If L1 and L2 are two recursive languages,
their intersection L1 ∩ L2 will also be
recursive.
• For Example:
398. • L1 says n no. of a’s followed by n no. of b’s
followed by n no. of c’s and then any no. of
d’s, i.e. L1 = {aⁿbⁿcⁿdᵐ}.
• L2 says any no. of a’s followed by n no. of b’s
followed by n no. of c’s followed by n no. of
d’s, i.e. L2 = {aᵏbⁿcⁿdⁿ}.
• Their intersection says n no. of a’s followed by
n no. of b’s followed by n no. of c’s followed by
n no. of d’s, i.e. L1 ∩ L2 = {aⁿbⁿcⁿdⁿ}.
• So it can be decided by a Turing machine, and
hence it is recursive.
Similarly, the complement of a recursive language
L1, which is ∑* – L1, will also be recursive.
399. Greibach Normal Form (GNF)
• GNF stands for Greibach normal form.
• A CFG(context free grammar) is in GNF(Greibach
normal form) if all the production rules satisfy
one of the following conditions:
• A start symbol generating ε. For example, S → ε.
• A non-terminal generating a terminal. For
example, A → a.
• A non-terminal generating a terminal which is
followed by any number of non-terminals.
• For example, S → aASB.
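As with CNF, these conditions can be checked mechanically. A small sketch (names invented here; ε-productions are written as empty lists):

```python
def is_gnf(grammar, start):
    """Check the GNF conditions: start -> ε, A -> a, or
    A -> a B1 B2 ... Bk (a terminal followed only by non-terminals)."""
    nonterms = set(grammar)
    for lhs, prods in grammar.items():
        for p in prods:
            if p == []:                        # ε-production
                if lhs != start:
                    return False
            elif p[0] in nonterms or any(s not in nonterms for s in p[1:]):
                return False
    return True

g1 = {"S": [["a", "A", "S", "B"], []], "A": [["a"]], "B": [["b"]]}
g2 = {"S": [["a", "A"]], "A": [["a"], []]}  # A -> ε, but A is not the start symbol
print(is_gnf(g1, "S"), is_gnf(g2, "S"))     # True False
```

The second grammar fails for the same reason as G2 below: a non-start symbol generates ε.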
400. For example
The production rules of Grammar G1 satisfy the
rules specified for GNF, so the grammar G1 is in
GNF.
However, the production rules of Grammar G2 do
not satisfy the rules specified for GNF, as A → ε and
B → ε contain ε (only the start symbol can generate
ε). So the grammar G2 is not in GNF.
401. Simplifying Context Free Grammars
• The definition of context free grammars
(CFGs) allows us to develop a wide variety of
grammars.
• Most of the time, some of the productions of
CFGs are not useful and are redundant.
• This happens because the definition of CFGs
does not restrict us from making these
redundant productions.
402. • By simplifying CFGs we remove all these
redundant productions from a grammar ,
while keeping the transformed grammar
equivalent to the original grammar.
• Two grammars are called equivalent if they
produce the same language.
• Simplifying CFGs is necessary to later convert
them into Normal forms.
403. • Types of redundant productions and the
procedure of removing them are mentioned
below.
1. Useless productions –
• The productions that can never take part in
derivation of any string , are called useless
productions.
• Similarly , a variable that can never take part
in derivation of any string is called a useless
variable.
404. In the example above, production ‘C -> dc’ is
useless because the variable ‘C’ will never occur in
the derivation of any string.
The other productions are written in such a way that
variable ‘C’ can never be reached from the starting
variable ‘S’.
405. • Production ‘B -> aB’ is also useless because
there is no way it will ever terminate.
• If it never terminates, then it can never
produce a string.
• Hence the production can never take part in
any derivation.
• To remove useless productions , we first find
all the variables which will never lead to a
terminal string such as variable ‘B’.
• We then remove all the productions in which
variable ‘B’ occurs.
406. We then try to identify all the variables that can
never be reached from the starting variable such as
variable ‘C’.
We then remove all the productions in which
variable ‘C’ occurs.
408. 2. λ productions
• The productions of type ‘A -> λ’ are called λ
productions ( also called lambda productions
and null productions) .
• These productions can only be removed from
those grammars that do not generate λ (an
empty string).
• It is possible for a grammar to contain null
productions and yet not produce an empty
string.
409. • To remove null productions , we first have to
find all the nullable variables.
• A variable ‘A’ is called nullable if λ can be
derived from ‘A’.
• For all the productions of type ‘A -> λ’ , ‘A’ is a
nullable variable.
• For all the productions of type ‘B -> A1A2…An
‘ , where all ’Ai’s are nullable variables , ‘B’ is
also a nullable variable.
410. • After finding all the nullable variables, we can
now start to construct the null production free
grammar.
• For all the productions in the original
grammar , we add the original production as
well as all the combinations of the production
that can be formed by replacing the nullable
variables in the production by λ.
• If all the variables on the RHS of the
production are nullable , then we do not add
‘A -> λ’ to the new grammar
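The nullable-variable computation described above is a fixpoint iteration. A sketch with an invented example grammar (the grammar on the following slide is an image and may differ; λ-productions are written as empty lists):

```python
def nullable_vars(grammar):
    """Fixpoint computation of the nullable set: a variable A is nullable
    if some production A -> X1...Xk has every Xi nullable (k = 0 is the
    λ-production itself)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            if nt in nullable:
                continue
            for prod in prods:
                if all(sym in nullable for sym in prod):
                    nullable.add(nt)
                    changed = True
                    break
    return nullable

# Hypothetical grammar in the spirit of the example on the next slides:
g = {
    "S": [["A", "B", "C", "d"]],
    "A": [["B", "C"]],
    "B": [["b"], []],        # [] is the λ-production
    "C": [["c"], []],
}
print(nullable_vars(g))  # {'A', 'B', 'C'}
```

B and C are nullable directly, and A becomes nullable in a second pass because both symbols on its RHS are nullable, mirroring the rules above.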
411. • An example will make the point clear.
• Consider the grammar
412. • Lets first find all the nullable variables.
Variables ‘B’ and ‘C’ are clearly nullable
because they contain ‘λ’ on the RHS of their
production.
• Variable ‘A’ is also nullable because in (2) ,
both variables on the RHS are also nullable.
• Similarly , variable ‘S’ is also nullable.
• So variables ‘S’ , ‘A’ , ‘B’ and ‘C’ are nullable
variables
413. • Lets create the new grammar.
• We start with the first production.
• Add the first production as it is.
• Then we create all the possible combinations
that can be formed by replacing the
nullable variables with λ.
• Therefore line (1) now becomes ‘S -> ABCd |
ABd | ACd | BCd | Ad | Bd | Cd | d ’.
• We apply the same rule to line (2) but we do
not add ‘A -> λ’ even though it is a possible
combination.
414. • We remove all the productions of type ‘V -> λ’.
The new grammar now becomes
415. 3. Unit productions
• The productions of type ‘A -> B’ are called unit
productions.
To create a unit production free grammar
‘Guf’ from the original grammar ‘G’ , we
follow the procedure mentioned below.
416. • First add all the non-unit productions of ‘G’ in
‘Guf’.
• Then for each variable ‘A’ in grammar ‘G’ , find
all the variables ‘B’ such that ‘A *=> B’.
• Now , for all variables like ‘A ’ and ‘B’, add ‘A ->
x1 | x2 | …xn’ to ‘Guf’ where ‘B -> x1 | x2 |
…xn ‘ is in ‘Guf’ .
• None of the x1 , x2 … xn are single variables
because we only added non-unit productions
in ‘Guf’.
• Hence the resultant grammar is unit
production free.
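The procedure above can be sketched as a whole. Since the grammar ‘G’ of the example is on an image slide, an invented grammar with a unit cycle A *=> B, B *=> A is used here:

```python
def remove_unit_productions(grammar):
    """Build Guf: keep the non-unit productions, compute all unit pairs
    A *=> B by transitive closure, then copy B's non-unit productions
    up to A."""
    nonterms = set(grammar)
    guf = {nt: [p for p in prods if not (len(p) == 1 and p[0] in nonterms)]
           for nt, prods in grammar.items()}
    pairs = {(a, a) for a in nonterms}       # trivial pairs A *=> A
    changed = True
    while changed:                           # transitive closure of A -> B
        changed = False
        for a, b in list(pairs):
            for p in grammar[b]:
                if len(p) == 1 and p[0] in nonterms and (a, p[0]) not in pairs:
                    pairs.add((a, p[0]))
                    changed = True
    for a, b in pairs:                       # copy non-unit productions up
        if a != b:
            for p in guf[b]:
                if p not in guf[a]:
                    guf[a].append(p)
    return guf

g = {"S": [["A"], ["b", "b"]], "A": [["B"], ["b"]], "B": [["A"], ["a"]]}
print(remove_unit_productions(g))
```

For this grammar S inherits ‘b’ and ‘a’ via S *=> A and S *=> B, and the cycle between A and B makes them share their non-unit productions, exactly as in the ‘A *=> B’, ‘B *=> A’ steps below.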
417. For example,
let's add all the non-unit productions of ‘G’ to ‘Guf’.
‘Guf’ now becomes
418. • Now we find all the variables that satisfy ‘X
*=> Z’.
• These are ‘S *=> A’ , ‘S*=>B’, ‘A *=> B’ and ‘B
*=> A’.
• For ‘A *=> B’ , we add ‘A -> a’ because ‘B ->a’
exists in ‘Guf’.
• ‘Guf’ now becomes
419. • For ‘B *=> A’ , we add ‘B -> b’ because ‘A -> b’
exists in ‘Guf’.
• The new grammar now becomes
We follow the same step for ‘S *=> A’ and ‘S*=>B’ and finally
get the following grammar
420. To remove all kinds of productions mentioned
above, first remove the null productions, then
the unit productions and finally , remove the
useless productions.
Following this order is very important to get the
correct result.
421. How to convert CFG to CNF?
• Step 1. Eliminate start symbol from RHS.
If start symbol S is at the RHS of any
production in the grammar, create a new
production as:
S0->S
where S0 is the new start symbol.
• Step 2. Eliminate null, unit and useless
productions.
If CFG contains null, unit or useless production
rules, eliminate them.
422. • Step 3. Eliminate terminals from the RHS if
they exist with other terminals or non-terminals.
e.g. production rule X->xY can be
decomposed as:
X->ZY
Z->x
• Step 4. Eliminate RHS with more than two
non-terminals.
e.g. production rule X->XYZ can be
decomposed as:
X->PZ
P->XY
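Steps 3 and 4 can be sketched together; the fresh variable names N1, N2, … are invented here for illustration:

```python
def cnf_steps_3_4(grammar):
    """Steps 3 and 4 of the conversion above: lift terminals out of
    right-hand sides of length >= 2 (Z -> x), then binarize right-hand
    sides longer than two symbols (P -> XY)."""
    nonterms = set(grammar)
    result = {nt: [] for nt in grammar}
    term_map = {}                       # terminal -> its fresh variable
    counter = [0]

    def fresh():
        counter[0] += 1
        return f"N{counter[0]}"

    def lift(sym):
        # Step 3: replace a terminal inside a long RHS by Z, add Z -> sym
        if sym in nonterms:
            return sym
        if sym not in term_map:
            z = fresh()
            term_map[sym] = z
            result[z] = [[sym]]
        return term_map[sym]

    for nt, prods in grammar.items():
        for p in prods:
            if len(p) >= 2:
                p = [lift(s) for s in p]
            while len(p) > 2:           # Step 4: A -> X Y rest => A -> P rest, P -> X Y
                pair = fresh()
                result[pair] = [p[:2]]
                p = [pair] + p[2:]
            result[nt].append(p)
    return result

g = {"X": [["x", "Y"], ["X", "Y", "Z"]], "Y": [["y"]], "Z": [["z"]]}
print(cnf_steps_3_4(g))
```

Here X -> xY becomes X -> N1 Y with N1 -> x (step 3), and X -> XYZ becomes X -> N2 Z with N2 -> XY (step 4), matching the decompositions shown above.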
424. • Step 1: We will create a new production S0 →
S, as the start symbol S appears on the RHS.
The grammar will be:
425. • Step 2: As grammar G1 contains A → ε null
production, its removal from the grammar
yields:
426. • Now, as grammar G1 contains the unit
production S → B, its removal yields:
427. • Also remove the unit production S0 → S; its
removal from the grammar yields:
428. • Step 3:
• In the production rules S0 → aA | Aa, S → aA |
Aa, A → aBB and B → Aa, the terminal a exists
on the RHS with non-terminals.
• So we will replace terminal a with X:
429. • Step 4:
• In the production rule A → XBB, the RHS has
more than two symbols; removing it from the
grammar yields:
Hence, for the given grammar, this is the required CNF.