This document discusses regular languages and finite automata. It begins with an overview of regular expressions and the introduction to the theory of finite automata. It then explains that finite automata are used in text processing, compilers, and hardware design. The document goes on to define automata theory and discusses deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs). It notes that while NFAs allow epsilon transitions and multiple state transitions, DFAs and NFAs have equivalent computational power.
3. Where finite automata are used
Each model in automata theory plays an important
role in several applied areas. Finite
automata are used in text processing, compilers,
and hardware design. Context-free grammars
(CFGs) are used in programming languages and
artificial intelligence. Originally, CFGs
were used in the study of human languages.
4. Automata theory
Automata theory is the study of abstract
machines and automata, as well as the computational
problems that can be solved using them. It is a theory
in theoretical computer science and discrete mathematics. The
word automata (the plural of automaton) comes from the Greek
word αὐτόματα, which means "self-acting".
5. Automata theory
The figure at right illustrates a finite-state machine,
which belongs to a well-known type of automaton. This
automaton consists of states (represented in the figure
by circles) and transitions (represented by arrows). When the
automaton reads an input symbol, it makes a transition
(or jump) to another state, according to its transition
function, which takes the current state and the most recent
symbol as its inputs.
6. 1. Regular Expressions
A regular expression, regex, or regexp
(sometimes called a rational expression) is a
sequence of characters that defines a search
pattern. Usually such patterns are used by string
searching algorithms for "find" or "find and
replace" operations on strings, or for input
validation.
7. What Regular Expressions Are Exactly - Terminology
Basically, a regular expression is a pattern describing a
certain amount of text. Their name comes from the
mathematical theory on which they are based. You will
usually find the name abbreviated to "regex" or "regexp".
This lecture uses "regex", because it is easy to pronounce
the plural "regexes". In these notes, regular expressions
are highlighted in red as regex.
This first example is actually a perfectly valid regex. It is
the most basic pattern, simply matching the literal
text regex. A “match” is the piece of text, or sequence of
bytes or characters, that the pattern was found to correspond
to by the regex processing software. Matches are
highlighted in blue here.
8. Regular Expressions
• The second example is a more complex pattern. It describes
a series of letters, digits, dots,
underscores, percentage signs and
hyphens, followed by an at sign,
followed by another series of letters,
digits and hyphens, finally followed by a
single dot and two or more letters. In
other words: this pattern describes
an email address. This also shows the
syntax highlighting applied to regular
expressions on this site. Word
boundaries and quantifiers are blue,
character classes are orange, and
escaped literals are gray.
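The prose description above can be turned into a pattern for Python's `re` module. This is only a sketch written from that description: the pattern from the original slide is not reproduced in the text, so the exact character classes, and the names `EMAIL_PATTERN` and `find_emails`, are assumptions made for illustration.

```python
import re

# Assumed reconstruction from the prose description, not the slide's own pattern:
# a local part of letters, digits, dots, underscores, percent signs and hyphens,
# an at sign, a series of letters, digits and hyphens, a single dot,
# and finally two or more letters. \b marks a word boundary on each side.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%-]+@[A-Za-z0-9-]+\.[A-Za-z]{2,}\b")


def find_emails(text):
    """Return every substring of text that matches EMAIL_PATTERN."""
    return EMAIL_PATTERN.findall(text)


print(find_emails("Write to bob_smith@example.com or alice%x@my-host.org."))
```

A pattern this simple accepts many strings that are not deliverable addresses and rejects some valid ones, so it is better suited to searching text than to strict validation.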
12. Examples of Σ ∗ for different Σ:
(i) If Σ = {a}, then Σ ∗ contains
ε, a, aa, aaa, aaaa, . . .
(ii) If Σ = {a, b}, then Σ ∗ contains
ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa,
bab, bba, bbb, . . .
(iii) If Σ = ∅ (the empty set — the unique set
with no elements), then Σ ∗ = {ε}, the set just
containing the null string.
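The three cases above can be checked by enumerating Σ∗ in order of increasing length. The following is a minimal Python sketch; the function name `sigma_star` is hypothetical, and ε is represented by the empty string `""`.

```python
from itertools import count, islice, product


def sigma_star(alphabet):
    """Yield the strings of Σ* in order of increasing length (ε is "")."""
    symbols = sorted(alphabet)
    for n in count(0):
        # All strings of length exactly n over the alphabet.
        for letters in product(symbols, repeat=n):
            yield "".join(letters)
        if not symbols:
            # With an empty alphabet only the null string exists, so stop here.
            return


print(list(islice(sigma_star({"a"}), 5)))
print(list(islice(sigma_star({"a", "b"}), 7)))
print(list(sigma_star(set())))
```

Because Σ∗ is infinite whenever Σ is non-empty, the generator is consumed with `islice` rather than converted to a list directly.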
13. Pattern matching
Slide 14 defines the patterns, or regular
expressions, over an alphabet Σ that we will use.
Each such regular expression, r, represents a whole
set (possibly an infinite set) of strings in Σ ∗ that
match r. The precise definition of this matching
relation is given on Slide 16. It might seem odd to
include a regular expression ∅ that is matched by
no strings at all—but it is technically convenient to
do so. Note that the regular expression ε is in fact
equivalent to ∅ ∗ , in the sense that a string u
matches ∅ ∗ iff it matches ε (iff u = ε).
15. Remark: Binding precedence in regular
expressions
In the definition on Slide 11 we assume implicitly that the
alphabet Σ does not contain the six symbols
ε ∅ ( ) | ∗
Then, concretely speaking, the regular expressions over Σ
form a certain set of strings over the alphabet obtained by
adding these six symbols to Σ. However it makes things more
readable if we adopt a slightly more abstract syntax, dropping
as many brackets as possible and using the convention that
−∗ binds more tightly than − −, which binds more tightly than −|−.
So, for example, r|st∗ means (r|s(t)∗), not (r|s)(t)∗, or
((r|st))∗, etc.
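Python's `re` module follows the same binding convention for `*`, concatenation and `|`, so the example can be checked directly. This is an illustrative sketch; the helper name `full` is hypothetical, and `r`, `s`, `t` are treated as literal characters.

```python
import re


def full(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# r|st* is read as r|(s(t*)): either "r", or "s" followed by any number of "t".
print(full(r"r|st*", "r"))       # the left alternative
print(full(r"r|st*", "sttt"))    # the right alternative
print(full(r"r|st*", "rt"))      # would need the reading (r|s)t*
print(full(r"(r|s)t*", "rt"))    # explicit brackets change the meaning
print(full(r"(r|st)*", "rst"))   # as does starring the whole alternation
```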
17. Pattern matching
The definition of ‘u matches r∗’ on Slide 13 is
equivalent to saying
for some n ≥ 0, u can be expressed as a
concatenation of n strings, u = u₁u₂ … uₙ, where
each uᵢ matches r.
The case n = 0 just means that u = ε (so ε always
matches r∗); and the case n = 1 just means that u
matches r (so any string matching r also matches r∗).
For example, if Σ = {a, b, c} and r = ab, then the
strings matching r∗ are
ε, ab, abab, ababab, etc.
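The matching relation can be sketched as a small recursive function. This is a naive illustration, not the algorithm from the slides: the tuple encoding of regular expressions (`"empty"`, `"eps"`, `"sym"`, `"alt"`, `"cat"`, `"star"`) and the name `matches` are assumptions made here. The star case follows the characterisation above, splitting u into a non-empty first piece that matches r and a remainder that matches r∗.

```python
def matches(r, u):
    """Decide whether string u matches regular expression r (naive search)."""
    kind = r[0]
    if kind == "empty":            # ∅ is matched by no string
        return False
    if kind == "eps":              # ε is matched only by the null string
        return u == ""
    if kind == "sym":              # a single symbol of the alphabet
        return u == r[1]
    if kind == "alt":              # r1|r2
        return matches(r[1], u) or matches(r[2], u)
    if kind == "cat":              # r1 r2: try every split point of u
        return any(matches(r[1], u[:i]) and matches(r[2], u[i:])
                   for i in range(len(u) + 1))
    if kind == "star":             # r1*: n = 0 pieces, or a first piece plus the rest
        if u == "":
            return True
        return any(matches(r[1], u[:i]) and matches(r, u[i:])
                   for i in range(1, len(u) + 1))
    raise ValueError(f"unknown regular expression: {r!r}")


AB = ("cat", ("sym", "a"), ("sym", "b"))
for u in ["", "ab", "abab", "aba"]:
    print(repr(u), matches(("star", AB), u))
```

Requiring the first piece to be non-empty loses no matches, since empty pieces can be dropped from any decomposition, and it guarantees that the recursion terminates. The search is exponential in the worst case, which is one reason practical matchers use finite automata instead.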
19. Some questions about languages
Slide 20 defines the notion of a formal language
over an alphabet. We take a very extensional
view of language: a formal language is
completely determined by the ‘words in the
dictionary’, rather than by any grammatical
rules. Slide 21 gives some important questions
about languages, regular expressions, and the
matching relation between strings and regular
expressions.
22. The answer to question (a) on Slide 21 is ‘yes’.
Algorithms for deciding such pattern-matching
questions make use of finite automata. We will
see this later.
23. 2. Introduction of Finite Automata
A finite automaton (FA) is the simplest machine for
recognizing patterns. A finite automaton consists of
the following:
• Q : Finite set of states.
• Σ : Set of input symbols.
• q : Initial state.
• F : Set of final states.
• δ : Transition function.
The formal specification of the machine is the 5-tuple
(Q, Σ, q, F, δ).
25. Finite automata are classified into two types:
1) Deterministic Finite Automata (DFA)
A DFA is a 5-tuple (Q, Σ, q, F, δ).
• Q : Set of all states.
• Σ : Set of input symbols (the symbols the machine takes as input).
• q : Initial state (the starting state of the machine).
• F : Set of final states.
• δ : Transition function, defined as δ : Q × Σ → Q.
26. Deterministic Finite Automata (DFA)
In a DFA, for a particular input character, the machine
goes to one state only. The transition function is defined
on every state for every input symbol. Also, a null
(or ε) move is not allowed in a DFA, i.e., a DFA cannot change
state without reading an input character.
For example, the DFA below, with Σ = {0, 1}, accepts all
strings ending with 0.
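A DFA for this language can be simulated in a few lines of Python. The diagram is not reproduced in the text, so the two-state machine here is an assumption: the state names `q0` and `q1`, the table `DELTA` and the function `run_dfa` are all hypothetical. The state `q1` records that the last symbol read was 0.

```python
def run_dfa(delta, start, finals, text):
    """Run a DFA over text and report whether it ends in a final state."""
    state = start
    for symbol in text:
        # Exactly one next state exists for every (state, symbol) pair.
        state = delta[(state, symbol)]
    return state in finals


# Assumed DFA over Σ = {0, 1} accepting the strings that end with 0.
DELTA = {
    ("q0", "0"): "q1",
    ("q0", "1"): "q0",
    ("q1", "0"): "q1",
    ("q1", "1"): "q0",
}
START = "q0"
FINALS = {"q1"}

for word in ["0", "10", "01", "1100", ""]:
    print(repr(word), run_dfa(DELTA, START, FINALS, word))
```

Storing δ as a dictionary keyed by (state, symbol) mirrors the definition δ : Q × Σ → Q; a missing key would raise `KeyError`, which reflects the requirement that δ be defined for every state and every input symbol.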
27. Deterministic Finite Automata (DFA)
One important thing to note is that there can be
many possible DFAs for a pattern. A DFA with a
minimum number of states is generally
preferred.
28. Finite automata are classified into two types:
2) Nondeterministic Finite Automata (NFA)
An NFA is similar to a DFA except for the following
additional features:
1. A null (or ε) move is allowed, i.e., the machine can move
to another state without reading a symbol.
2. The machine can transition to any number of states for
a particular input.
However, these features do not add any
power to the NFA. In terms of the languages they can
recognize, DFAs and NFAs are equivalent.
29. Nondeterministic Finite Automata (NFA)
Because of these additional features, an NFA has a
different transition function; the rest is the same as for a DFA.
• δ : Transition function (a different transition function)
• δ : Q × (Σ ∪ {ε}) → 2^Q.
30. Nondeterministic Finite Automata (NFA)
As the transition function shows, for any
input, including null (or ε), an NFA can go to any
number of states.
For example, below is an NFA for the above problem.
31. One important thing to note is that, in an NFA, if any
path for an input string leads to a final state,
then the input string is accepted. For example, in the
above NFA, there are multiple paths for the input
string “00”. Since one of the paths leads to a
final state, “00” is accepted by the above NFA.
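The acceptance rule can be sketched in Python by tracking the set of all states the NFA could be in after each symbol, so that every path is followed at once. The diagram is not reproduced in the text, so the machine here is an assumption: a two-state NFA for strings ending with 0, with hypothetical names `q0`, `q1`, `NFA_DELTA` and `run_nfa`. An ε move is encoded with the empty string `""` as its symbol.

```python
def epsilon_closure(delta, states):
    """Return all states reachable from states using only ε moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, ""), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def run_nfa(delta, start, finals, text):
    """Accept when at least one path through the NFA ends in a final state."""
    current = epsilon_closure(delta, {start})
    for symbol in text:
        moved = set()
        for state in current:
            # A missing entry means that this path dies on this symbol.
            moved |= delta.get((state, symbol), set())
        current = epsilon_closure(delta, moved)
    return bool(current & finals)


# Assumed NFA over Σ = {0, 1}: on reading 0 in q0, stay in q0 or guess that
# this 0 is the last symbol and move to the final state q1.
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
}

for word in ["00", "0", "01", ""]:
    print(repr(word), run_nfa(NFA_DELTA, "q0", {"q1"}, word))
```

For the input “00” the set of possible states after both symbols is {q0, q1}; it contains the final state q1, so the string is accepted even though some paths end in q0 or die early.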
32. Some Important Points:
1. Every DFA is an NFA, but not vice versa.
2. NFAs and DFAs have the same power, and each
NFA can be translated into a DFA.
3. There can be multiple final states in both DFAs
and NFAs.
4. The NFA is more of a theoretical concept.
5. DFAs are used in lexical analysis in compilers.
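Point 2 is usually shown with the subset construction, in which each DFA state is a set of NFA states. The slides do not name a method, so the following Python sketch is one standard choice and not necessarily the one intended. The names `nfa_to_dfa` and `dfa_accepts` and the example NFA for strings ending with 0 are assumptions, and ε moves are encoded with the empty string `""`.

```python
from collections import deque


def epsilon_closure(delta, states):
    """Return all states reachable from states using only ε moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, ""), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def nfa_to_dfa(delta, start, finals, alphabet):
    """Subset construction: build a DFA whose states are sets of NFA states."""
    start_set = frozenset(epsilon_closure(delta, {start}))
    dfa_delta, dfa_finals = {}, set()
    seen, queue = {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        if current & finals:
            # A subset is final when it contains some final NFA state.
            dfa_finals.add(current)
        for symbol in alphabet:
            moved = set()
            for state in current:
                moved |= delta.get((state, symbol), set())
            target = frozenset(epsilon_closure(delta, moved))
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_delta, start_set, dfa_finals


def dfa_accepts(dfa_delta, start_set, dfa_finals, text):
    """Run the constructed DFA over text."""
    state = start_set
    for symbol in text:
        state = dfa_delta[(state, symbol)]
    return state in dfa_finals


# Assumed NFA over Σ = {0, 1} for strings ending with 0.
NFA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}}
DFA = nfa_to_dfa(NFA, "q0", {"q1"}, ["0", "1"])
print(dfa_accepts(*DFA, "100"), dfa_accepts(*DFA, "01"))
```

Only the subsets reachable from the start state are built. For this NFA that gives two DFA states, {q0} and {q0, q1}, although in general the construction can produce up to 2^|Q| states.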