Lecture: Regular Expressions and Regular Languages

regular expressions, automata, regular languages, pumping lemma

Published in: Education

  1. Regular Expressions & Regular Languages
     slideshare: http://www.slideshare.net/marinasantini1/regular-expressions-and-regular-languages
     Mathematics for Language Technology
     http://stp.lingfil.uu.se/~matsd/uv/uv15/mfst/
     Last Updated 6 March 2015
     Marina Santini, santinim@stp.lingfil.uu.se
     Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
     Spring 2015
  2. Acknowledgements
     • Several slides borrowed from Jurafsky and Martin (2009).
     • Practical activities by Mats Dahllöf and Jurafsky and Martin (2009).
  3. Reading
     • Required Reading:
       • E&G (2013): Ch. 9 (pp. 252-256)
       • Compendium (3): 7.2, 7.3, 8.2.3
       • Mats Dahllöf: Reguljära uttryck (Regular Expressions)
         http://stp.lingfil.uu.se/~matsd/uv/uv14/mfst/dok/oh6.pdf
     • Further Reading:
       • Chapter 2 in Jurafsky D. & Martin J. (2009) Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition. Online draft version: http://stp.lingfil.uu.se/~santinim/ml/2014/JurafskyMartinSpeechAndLanguageProcessing2ed_draft%202007.pdf
  4. Outline
     • Regular Expressions
     • Regular Languages
     • Practical Activities
     • (Pumping Lemma)
  5. Regular Expressions
     • Definitions
     • Equivalence to Finite Automata
  6. Regular Expressions and Text Searching
     • Everybody does it
       • Emacs, vi, perl, grep, etc.
     • Regular expressions are a compact textual representation of a set of strings representing a language.
  7. Example
     • Find all the instances of the word “the” in a text.
       • /the/
       • /[tT]he/
       • /\b[tT]he\b/
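The three patterns on this slide can be tried out with Python's `re` module. This is only a sketch: the sample sentence and the helper name `words_hit` are made up for illustration and do not come from the slides.

```python
import re

# Sample sentence invented for this sketch (not from the slides).
text = "The other day the cat sat there, then left the room."

def words_hit(pattern, text):
    """Return the whitespace-delimited words that contain a match of pattern."""
    return [w for w in text.split() if re.search(pattern, w)]

plain = words_hit(r"the", text)             # misses "The", hits "other", "there,", "then"
either_case = words_hit(r"[tT]he", text)    # now finds "The", still hits longer words
bounded = words_hit(r"\b[tT]he\b", text)    # word boundaries keep only the word itself

print(plain)
print(either_case)
print(bounded)
```

Each refinement removes one kind of mistake: the character class recovers the capitalised form, and `\b` rejects matches inside longer words.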
  8. Errors
     • The process we just went through was based on fixing two kinds of errors:
       • Matching strings that we should not have matched (there, then, other)
         • False positives (Type I)
       • Not matching things that we should have matched (The)
         • False negatives (Type II)
  9. Errors
     • Reducing the error rate for an application often involves two antagonistic efforts:
       • Increasing accuracy, or precision (minimizing false positives)
       • Increasing coverage, or recall (minimizing false negatives)
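Precision and recall can be computed for a pattern once a gold standard is fixed. The sketch below assumes a made-up sentence and a made-up gold standard (a word counts as correct only if it is exactly "the", in either case); neither comes from the slides.

```python
import re

# Sentence and gold standard are assumptions made for this sketch.
text = "The other day the cat sat there, then left the room."
words = text.split()
gold = [w.strip(".,").lower() == "the" for w in words]

def precision_recall(pattern):
    """Score a pattern word by word against the gold standard."""
    predicted = [re.search(pattern, w) is not None for w in words]
    tp = sum(p and g for p, g in zip(predicted, gold))        # true positives
    fp = sum(p and not g for p, g in zip(predicted, gold))    # false positives
    fn = sum(g and not p for p, g in zip(predicted, gold))    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for pattern in (r"the", r"[tT]he", r"\b[tT]he\b"):
    print(pattern, precision_recall(pattern))
```

On this toy input, adding the character class raises recall, and adding the word boundaries raises precision.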
  10. REs: What are they?
      • Regular expressions describe languages by an algebra.
  11. Link: https://www.youtube.com/watch?v=eOfMcdeyrMU
  12. DFA
  13. Converting the regular expression (a|b)* to a DFA
  14. Converting the regular expression (a*|b*)* to a DFA
  15. Converting the regular expression ab(a|b)* to a DFA
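The DFA diagrams for these slides are not part of the transcript. As a rough illustration, here is one hand-built DFA for ab(a|b)*, written as a transition table and compared against Python's `re` on all short strings. The state names and the trap state `dead` are choices made for this sketch, not the slides' construction.

```python
import re
from itertools import product

# Hand-built DFA for ab(a|b)*; state names are hypothetical.
# "dead" is a trap state for strings that can no longer be accepted.
TRANSITIONS = {
    ("q0", "a"): "q1",     ("q0", "b"): "dead",
    ("q1", "a"): "dead",   ("q1", "b"): "q2",
    ("q2", "a"): "q2",     ("q2", "b"): "q2",
    ("dead", "a"): "dead", ("dead", "b"): "dead",
}
START = "q0"
ACCEPTING = {"q2"}

def accepts(string):
    """Run the DFA over a string of a's and b's."""
    state = START
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING

def agrees_with_regex(max_len=8):
    """Check the DFA against the regular expression on every string up to max_len."""
    pattern = re.compile(r"ab(a|b)*")
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if accepts(s) != (pattern.fullmatch(s) is not None):
                return False
    return True

print(accepts("abba"), accepts("ba"), agrees_with_regex())
```

The brute-force comparison is not a proof of equivalence, but it catches most mistakes in a hand-drawn automaton.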
  16. Remember the Jeff Ullman video?
  17. Operations on Languages
      • REs use three operations:
        • union
        • concatenation
        • Kleene star (*) [pronounced “cleany star”]
  18. Union ∪ (aka: disjunction, OR, |, +)
      • The union of languages is the usual thing, since languages are sets.
      • Example: {01, 111, 10} ∪ {00, 01} = {01, 111, 10, 00}.
      01 happens to be in both sets, so it appears only once in the union.
  19. Concatenation: represented by juxtaposition (no punctuation) or middle dot ( · )
      • The concatenation of languages L and M is denoted LM.
      • It contains every string wx such that w is in L and x is in M.
      • Example: {01, 111, 10}{00, 01} = {0100, 0101, 11100, 11101, 1000, 1001}.
      In the example, we take 01 from the first language and concatenate it with 00 from the second language. That gives us 0100. We then take 01 from the first language again and concatenate it with 01 from the second language, which gives us 0101. Then we take 111 from the first language and concatenate it with 00 from the second language, which gives us 11100, and so on.
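Because languages are sets of strings, the union and concatenation examples can be reproduced directly with Python sets. The variable names below are chosen for this sketch.

```python
# The two example languages from the slides.
L = {"01", "111", "10"}
M = {"00", "01"}

# Union: ordinary set union, so the shared string "01" appears once.
union = L | M

# Concatenation: every string w + x with w from L and x from M.
concatenation = {w + x for w in L for x in M}

print(sorted(union))
print(sorted(concatenation))
```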
  20. Kleene Star: represented by an asterisk, aka star (*)
      • If L is a language, then L*, the Kleene star or just “star,” is the set of strings formed by concatenating zero or more strings from L, in any order.
      • L* = {ε} ∪ L ∪ LL ∪ LLL ∪ …
      • Example: {0, 10}* = {ε, 0, 10, 00, 010, 100, 1010, …}
      If you take no strings from L, that gives you the empty string.
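L* is infinite whenever L contains a non-empty string, so a program can only list it up to some length bound. The sketch below does that; the function name `star_up_to` and the bound are assumptions made for illustration.

```python
def star_up_to(language, max_len):
    """Return the strings of language* whose length is at most max_len."""
    result = {""}      # taking zero strings from the language gives the empty string
    frontier = {""}
    while frontier:
        # Extend every newly found string by one more string from the language.
        frontier = {
            w + x
            for w in frontier
            for x in language
            if len(w + x) <= max_len
        } - result
        result |= frontier
    return result

print(sorted(star_up_to({"0", "10"}, 3), key=lambda s: (len(s), s)))
```

The empty Python string `""` plays the role of ε here.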
  21. IMPORTANT!
      • FROM NOW ON, LET’S STICK TO THE FOLLOWING CONVENTIONS (OTHERWISE WE WILL BE CONFUSED):
        • Union ∪ (aka: disjunction, OR): represented by | or +
        • Concatenation: represented by juxtaposition (= no punctuation) or middle dot ( · )
        • Kleene Star: represented by *
  22. Precedence of Operators
      • Parentheses may be used wherever needed to influence the grouping of operators.
      • Order of precedence is * (highest), then concatenation, then + (lowest).
      Remember: + = union/disjunction
  23. Examples: REs
      1. L(01) = {01}.
      2. L(01+0) = {01, 0}.
      3. L(0(1+0)) = {01, 00}.
         • Note order of precedence of operators.
      4. L(0*) = {ε, 0, 00, 000, …}.
      5. L((0+10)*(ε+1)) = all strings of 0s and 1s without two consecutive 1s.
      1) The regular expression 01 represents the concatenation of the language consisting of the one string 0 and the language consisting of the one string 1. The result is the language containing the one string 01.
      2) The language of 01+0 is the union of the language containing only the string 01 and the language containing only the string 0.
      3) The language of 0 concatenated with 1+0 is the two strings 01 and 00. Notice that we need parentheses to force the + to group first. Without them, since concatenation takes precedence over +, we get the interpretation in the second example.
      4) The language of 0* is the star of the language containing only the string 0. This is all strings of 0s, including the empty string.
      5) This example denotes the language of all strings of 0s and 1s without two consecutive 1s. To see why this works, note that in every such string, each 1 is either followed immediately by a 0 or comes at the end of the string. (0+10)* denotes all strings in which every 1 is followed by a 0. These strings are surely in the language we want. But we also want these strings followed by a final 1. Thus, we concatenate the language of (0+10)* with ε+1. This concatenation gives us all the strings where 1s are followed by 0s, plus all those strings with an additional 1 at the end.
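Example 5 can be checked by brute force. The sketch below rewrites (0+10)*(ε+1) in Python's regex syntax, where the slides' + becomes | and (ε+1) becomes an optional 1, and compares it with the plain-language description on every binary string up to a length bound. The bound and the function name are assumptions made for this sketch.

```python
import re
from itertools import product

# (0+10)*(ε+1) from the slides, rewritten in Python regex syntax.
PATTERN = re.compile(r"(0|10)*1?")

def mismatches(max_len):
    """List strings where the regex and the 'no two consecutive 1s' rule disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            in_language = PATTERN.fullmatch(s) is not None
            no_double_one = "11" not in s
            if in_language != no_double_one:
                bad.append(s)
    return bad

print(mismatches(10))
```

An empty result supports the claim for all strings up to the bound; it does not replace the argument given in note 5.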
  24. Equivalence of REs and Finite Automata
      • For every RE, there is a finite automaton that accepts the same language.
      • And we need to show that for every finite automaton, there is an RE defining its language.
  25. Summary
      Automata and regular expressions define exactly the same set of languages: the regular languages.
  26. REGULAR LANGUAGES
  27. The Chomsky Hierarchy
      • Hierarchy of classes of formal languages, from least to most powerful:
        • Regular (DFA)
        • Context-free (PDA)
        • Context-sensitive (LBA)
        • Recursively enumerable (TM)
      One grammar is of greater generative power or complexity than another if it can define a language that the other cannot define. Context-free grammars are more powerful than regular grammars.
  28. Regular Languages
      • A language L is regular if it is the language accepted by some DFA.
        • Note: the DFA must accept only the strings in L, no others.
      • Some languages are not regular.
  29. Only languages that meet the following criteria are regular languages:
  30. • Regular languages derive their name from the fact that the strings they recognize are (in a formal computer science sense) “regular.”
      • This implies that there are certain kinds of strings that it will be very hard, if not impossible, to recognize with regular expressions, especially nested syntactic structures in natural language.
  31. Formal languages vs regular languages
      • A formal language is a set of strings, each string composed of symbols from a finite set called an alphabet.
        • Ex: {a, b, !}
      • Formal languages are not the same as regular languages…
  32. But Many Languages are Regular
      • They appear in many contexts and have many useful properties.
  33. How to tell if a language is not regular
      • The most common way to prove that a language is regular is to build a regular expression for the language.
  34. Pumping Lemma
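The transcript gives only the title of this slide. For orientation, the standard textbook statement of the pumping lemma for regular languages is sketched below, with one classic application; the symbols p, w, x, y, z follow common usage and are not taken from the slides.

```latex
\textbf{Pumping lemma.} If $L$ is a regular language, then there is an integer
$p \ge 1$ such that every string $w \in L$ with $|w| \ge p$ can be written as
$w = xyz$ where
\begin{align*}
  & |y| \ge 1, \\
  & |xy| \le p, \\
  & x y^{i} z \in L \quad \text{for all } i \ge 0.
\end{align*}

\textbf{Example.} Let $L = \{a^{n} b^{n} : n \ge 0\}$ and suppose $L$ is regular
with pumping length $p$. Take $w = a^{p} b^{p}$. Since $|xy| \le p$, we have
$y = a^{k}$ for some $k \ge 1$, so $x y^{2} z = a^{p+k} b^{p} \notin L$,
a contradiction. Hence $L$ is not regular.
```

The lemma is used in this direction only: it can show that a language is not regular, but satisfying it does not prove that a language is regular.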
  35. Practical Activity 1
      • The language L contains all strings over the alphabet {a,b} that begin with a and end with b, i.e.: [content not preserved]
      • Write a regular expression that defines the language L.
  36. Practical Activity 1: Possible Solution
  37. Your Solutions
      In between the concatenation of a and b there must be 0 or more unions (disjunctions) of a and b.
      Reference: slides 17-22
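The solution slide itself is not in the transcript. Reading the hint above as the expression a(a|b)*b, which is an interpretation and not a quote from the slides, a candidate answer can be checked against the definition of L on all short strings:

```python
import re
from itertools import product

# Candidate answer, inferred from the hint: a, then any mix of a and b, then b.
CANDIDATE = re.compile(r"a(a|b)*b")

def counterexamples(max_len=8):
    """List strings over {a, b} on which the candidate disagrees with the definition of L."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            wanted = s.startswith("a") and s.endswith("b")
            if (CANDIDATE.fullmatch(s) is not None) != wanted:
                bad.append(s)
    return bad

print(counterexamples())
```

The same harness can be pointed at any other candidate expression by changing `CANDIDATE`.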
  38. Practical Activity 2
      • Draw a deterministic finite-state automaton that accepts the following regular expression:
        ( (ab) | c)*
        Alternative notation style: [content not preserved]
        i.e.: 0 or more occurrences of the disjunction ab | c
      Test the automaton with these legal strings in the language: 0, abc, a, ab, cccabc, cbacccabababccc, …
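One way to try the test strings is to write a DFA for ((ab)|c)* as a transition table and run it. The automaton below is a construction made for this sketch, not the slides' solution diagram, and it assumes that the "0" in the test list stands for the empty string.

```python
# Hand-built DFA for ((ab)|c)*; state names are hypothetical.
# q0 is both the start state and the only accepting state.
TRANSITIONS = {
    ("q0", "a"): "q1",     ("q0", "b"): "dead",   ("q0", "c"): "q0",
    ("q1", "a"): "dead",   ("q1", "b"): "q0",     ("q1", "c"): "dead",
    ("dead", "a"): "dead", ("dead", "b"): "dead", ("dead", "c"): "dead",
}
START = "q0"
ACCEPTING = {"q0"}

def accepts(string):
    """Run the DFA over a string of a's, b's and c's."""
    state = START
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING

# Test strings from the slide; "" stands in for the slide's "0" (an assumption).
tests = ["", "abc", "a", "ab", "cccabc", "cbacccabababccc"]
results = {s: accepts(s) for s in tests}
print(results)
```

Under this construction "a" and "cbacccabababccc" are rejected, which suggests the slide's list mixes strings inside and outside the language.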
  39. Practical Activity 2: Possible Correct Solution
      Having the initial state as a final state gives us the empty string as an element in the language.
  40. Your solutions (1): when we interpret “+” as disjunction, these solutions are wrong because “c” happens only after “a” and “b”…
      Test these automata with the string on slide 35
  41. Your solutions (2): same as previous slide. In addition, here no final states are shown…
      Test these automata with the string on slide 35
  42. Practical Activity 3
      • Construct a grep regular expression that matches patterns containing at least one “ab” followed by any number of bs.
      • Construct a grep regular expression that matches any number between 1000 and 9999.
  43. Practical Activity 3: Possible Solutions
      • grep -E '(ab)+b*'
      • grep -E '[1-9][0-9]{3}'
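The two patterns can also be tried in Python, whose `re` syntax handles them the same way as extended grep syntax. The variant with word boundaries is an addition made for this sketch, not part of the slides; it shows one way to avoid matching four digits inside a longer number.

```python
import re

AB_THEN_BS = re.compile(r"(ab)+b*")          # at least one "ab", then any number of b's
FOUR_DIGITS = re.compile(r"[1-9][0-9]{3}")   # 1000 to 9999, possibly inside a longer number
# Variant added for this sketch: require word boundaries around the number.
WHOLE_NUMBER = re.compile(r"\b[1-9][0-9]{3}\b")

print(bool(AB_THEN_BS.search("xxabbbyy")))    # contains "abbb"
print(bool(FOUR_DIGITS.search("12345")))      # matches the substring "1234"
print(bool(WHOLE_NUMBER.search("12345")))     # five digits, so no whole-number match
print(bool(WHOLE_NUMBER.search("year 2015.")))
```

Like grep, `search` looks for the pattern anywhere in the input, so whether the unanchored version is acceptable depends on how "matches any number between 1000 and 9999" is read.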
  44. Exercises: E&G (2013)
      • Övning (Exercise) 9.40
      • Optional: as many as you can
      • After having completed the exercises, check out the solutions at the end of the book.
  45. The End
