Cs419 lec3 lexical analysis using re

576 views
439 views

Published on

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
576
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
49
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Cs419 lec3 lexical analysis using re

  1. 1. WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Computer Science Department dr.sharaf@from-masr.com
  2. 2. Dr. Hussien M. Sharaf PART ONE 2
  3. 3. SCANNING A scanner reads a stream of characters and puts them together into some meaningful (with respect to the source language) units called tokens.  It produces a stream of tokens for the next phase of compiler.  Stream of characters Dr. Hussien M. Sharaf scanner Stream of tokens 3
  4. 4.  Lexical Analysis (Scanning): is the task of reading a stream of input characters and dividing them into tokens(words) that belong to a certain language. Stream  a[index]=4+2; Scanner a [ index ] = 4 + 2 Identifier left bracket Identifier left bracket assignment Number operator Number Responsibility: accepts and splits a stream into tokens according to rules defined by the source code language. Dr. Hussien M. Sharaf 4
  5. 5. REGULAR EXPRESSIONS   RE is a language for describing simple languages and patterns. Algorithm for using RE 1. Define a pattern: S1StudentnameS2 2. Loop 3. Find next pattern 4. Store StudentName into DB or encrypt StudentName 5. Until no match   REs are used in applications that involve file parsing and text matching. Many Implementations have been made for RE. Dr. Hussien M. Sharaf 5
  6. 6. HOW CAN RE SERVE IN LEXICAL ANALYSIS?  Algorithm for using RE in LA 1. Define a pattern for each valid statement: 2. Loop 3. Find next pattern and call it Token 4. Parse(Token) 5. Until no match 6. If Not EOF stream then return Error else return Success Dr. Hussien M. Sharaf 6
  7. 7. RE IN C++ BOOST LIBRARY    //Pattern matching in a String // Created by Flavio Castelli <flavio.castelli@gmail.com> // distrubuted under GPL v2 license #include <boost/regex.hpp> #include <string> Please check [http://flavio.castelli.name/2006/02/16/regexpwith-boost/] int main() { boost::regex pattern (“Ali”,boost::regex_constants::icase|boost::regex_constants::perl); std::string stringa ("Searching for Ali, Is there another Ali? Yes there is another Ali"); int count =0; while(boost::regex_search (stringa, pattern, boost::regex_constants::format_perl)) {count++;} Output is : “3 found” printf (“%d foundn“, count); return 0; } Dr. Hussien M. Sharaf 7
  8. 8. RE IN C++ STL NEEDS EXTRA FEATURE PACK #include <regex> #include <iostream> #include <string> bool is_email_valid(const std::string& email) { // define a regular expression const std::tr1::regex pattern("(w+)(.|_)?(w*)@(w+)(.(w+))+"); // try to match the string with the regular expression return std::tr1::regex_match(email, pattern); } Please check http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339 Dr. Hussien M. Sharaf 8
  9. 9. RE IN C++ STL (CONT.) std::string str("abc + (inside brackets)dfsd"); std::smatch m; std::regex_search(str, m, std::regex(“b")); if (m[0].matched) std::cout << "Found " << m.str(0) << 'n'; else std::cout << "Not foundn“; Please check http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339 Dr. Hussien M. Sharaf 9
  10. 10. RE IN PYTHON Example for Python RE import re line = "Cats are smarter than dogs" matchObj = re.search( r'(.*) are(.*)', line, re.M|re.I) if matchObj: print "matchObj.group() : ", matchObj.group() print "matchObj.group(1) : ", matchObj.group(1) print "matchObj.group(2) : ", matchObj.group(2) else: print "No match!!" Some notifications of Python RE d Matches any decimal digit; this is equivalent to the class [0-9]. D Matches any non-digit character; this is equivalent to the class [^0-9]. s Matches any whitespace character; this is equivalent to the class [ tnrfv]. S Matches any non-whitespace character; this is equivalent to the class [^ tnrfv]. w Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_]. W Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_]. Please check http://www.tutorialspoint.com/python/python_reg_expressions.htm Dr. Hussien M. Sharaf 10
  11. 11. OTHER CODE SAMPLES   1. 2. 3.  1. 2. 3. 4. Other samples can be check at: Boost Library http://stackoverflow.com/questions/5804453/c-regularexpressions-with-boost-regex http://www.codeproject.com/KB/string/regex__.aspx http://www.boost.org/doc/libs/1_43_0/more/getting_started/w indows.html#build-from-the-visual-studio-ide STL Library http://www.codeproject.com/KB/recipes/rexsearch.aspx http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c1 5339 [recommended] http://www.microsoft.com/download/en/confirmation.aspx?id= 6922 [feature pack VS2008 Pack] http://channel9.msdn.com/Shows/Going+Deep/C9-LecturesStephan-T-Lavavej-Standard-Template-Library-STL-8-of-n [Video] Dr. Hussien M. Sharaf 11
  12. 12. Dr. Hussien M. Sharaf PART TWO 12
  13. 13. DEFINITION OF A LANAGUAGE “L” ∑ is the set of alphabets * 1. Let ∑ ={x}, Then L = ∑ * In RE, we write x * 2. Let ∑ ={x, y}, Then L = ∑ * In RE, we write (x+y) 3. Kleene‟s star “*” means any combination of letters of length zero or more. Note: Please study the appendix at the end of the lecture Dr. Hussien M. Sharaf 13
  14. 14. RE RULES Given an alphabet ∑, the set of regular expressions is defined by the following rules. 1. 2. For every letter in ∑, the letter written in bold is a regular expression. Λ- lamda or ε-epsilon is a regular expression. Λ or ε means an empty letter. If r1 and r2 are regular expressions, then so are: 1. 2. 3. 4. (r1) r1 r2 r1+r2 r1* and also r2* NOTE 1: r1+ is not a RE NOTE 2: spaces are ignored as long as they are not included in ∑ 3. Nothing else is a regular expression. Dr. Hussien M. Sharaf 14
  15. 15. RE-1 Example 1 ∑ ={a, b} - Formally describe all words with a followed by any number of b‟s L = a b* = ab* - Give examples for words in L {a ab abb abbb …..}  Dr. Hussien M. Sharaf 15
  16. 16. RE-2 Example 2 ∑ ={a, b} - Formally describe the language that contains nothing and contains words where any a must be followed by one or more b‟s L = b*(abb*)* OR (b+ab)* - Give examples for words in L {Λ ab abb abababb ….. b bbb…}  Dr. Hussien M. Sharaf 16
  17. 17. RE-3 Example ∑ ={a, b} - Formally describe all words with a followed by one or more b‟s L = a b+ = abb* - Give examples for words in L {ab abb abbb …..}  Dr. Hussien M. Sharaf 17
  18. 18. RE-4 Example ∑ ={a, b, c} - Formally describe all words that start with an a followed by any number of b‟s and then end with c. L = a b*c - Give examples for words in L {ac abc abbc abbbc …..}  Dr. Hussien M. Sharaf 18
  19. 19. RE-5 Example ∑ ={a, b} - Formally describe all words where a‟s if any come before b‟s if any. L = a* b* - Give examples for words in L {Λ a b aa ab bb aaa abb abbb bbb…..} NOTE: a* b* ≠ (ab)*  because first language does not contain abab but second language has. Once single b is detected then no a„s can be added Dr. Hussien M. Sharaf 19
  20. 20. RE-6 Example ∑ ={a} - Formally describe all words where count of a is odd. L = a(aa)* OR (aa)*a - Give examples for words in L {a aaa aaaaa …..}  Dr. Hussien M. Sharaf 20
  21. 21. RE-7.1 Example ∑ ={a, b, c} - Formally describe all words where single a or c comes in the start then odd number of b‟s. L = (a+c)b(bb)* - Give examples for words in L {ab cb abbb cbbb …..}  Dr. Hussien M. Sharaf 21
  22. 22. RE-7.2 Example ∑ ={a, b, c} - Formally describe all words where single a or c comes in the start then odd number of b‟s in case of a and zero or even number of b‟s in case of c . L = ab(bb)* +c(bb)* - Give examples for words in L {ab c abbb cbb abbbbb …..}  Dr. Hussien M. Sharaf 22
  23. 23. RE-8 Example ∑ ={a, b, c} - Formally describe all words where one or more a or one or more c comes in the start then one or more b‟s. L = (a+c) + b+= (aa*+cc*) bb* - Give examples for words in L {ab cb aabb cbbb …..}  Dr. Hussien M. Sharaf 23
  24. 24. RE-9 Example ∑ ={a, b} - Formally describe all words with length three. L = (a+b) 3 =(a+b) (a+b) (a+b) - List all words in L {aaa aab aba baa abb bab bba bbb} - What is the count of words of length 4? 16 = 24 - What is the count of words of length 44? 244  Dr. Hussien M. Sharaf 24
  25. 25. ASSIGNMENTS 1. 2. 3. [One week] Solve Compilers-Sheet_0General.pdf [One week] Solve Compilers_Sheet_1_RE.pdf [Two weeks] Implement an email validator using Python. a. b. c. Build a GUI using python Tkinter. The GUI must include at least two labels, a TextBox and a Button. The user is given a message on a label stating whether the email is a valid email or not. Dr. Hussien M. Sharaf 25
  26. 26. DEADLINE Thursday 28th of Feb is the deadline.  No excuses.  Don‟t wait until last day.  TAs can help you to the highest limit within the next 3 days.  Dr. Hussien M. Sharaf 26
  27. 27. APPENDIX Dr. Hussien M. Sharaf 27
  28. 28. Dr. Hussien M. Sharaf APPENDIX MATHEMATICAL NOTATIONS AND TERMINOLOGY
  29. 29. Dr. Hussien M. Sharaf SETS-[1] A set is a group of elements represented as a unit.  For example S ={a, b, c} a set of 3 elements  Elements included in curly brackets { } a f S S   a belongs to S f does NOT belong to S
  30. 30. Dr. Hussien M. Sharaf SETS-[2] If S ={a, b, c} and T = {a, b} Then T S T is a subset of S  T S ={a, b} T intersects S ={a, b} T S ={a, b, c} T Union S ={a, b, c} Venn diagram for S and T
  31. 31. Dr. Hussien M. Sharaf SEQUENCES AND TUPLES A sequence is a list of elements in some order. (2, 4, 6, 8, ….) parentheses  A finite sequence of K-elements is called k-tuple. (2, 4) 2-tuple or pair (2, 4, 6) 3-tuple  A Cartesian product of S and P (S x P) is a set of 2-tuples/pairs (i, j) where i S and j P 
  32. 32. Dr. Hussien M. Sharaf EXAMPLE FOR CARTESIAN PRODUCT Example (1) If N ={1,2,…} set of integers; O ={+, -} N x O ={(1,+), (1,-), (2,+), (2,-), …..} Meaningless?  Example (2) N x O x N={(1,+,1), (1,-,1), (2,+,1), (2,-,2), …..}  Does this make sense? 
  33. 33. Dr. Hussien M. Sharaf CONTINUE CARTESIAN PRODUCT Example (3) If A ={a, b,…, z} set of English alphabets; A x A ={(a, a), (a, b), ..,(d, g), (d, h), …(z, z)} These are all pairs of set A.  Example (4) If U={0,1,2,3…,9} set of digits then U x U x U ={(1,1,1), (1,1,2),...,(7,4,1), ….., (9,9,9)} 
  34. 34. Dr. Hussien M. Sharaf Continue Cartesian Product Example (5) If U={0, 1, 2, 3…, 9} set of digits then U x… x U ={(1,..,1), (1,..,2),...,(7,..,1), ..., (9,..,9)}  n  Can be written as Un
  35. 35. Dr. Hussien M. Sharaf RELATIONS AND FUNCTIONS        A relation is more general than a function. Both maps a set of elements called domain to another set called co-domain. In case of functions the co-domain can be called range. R:D C A relation has no restrictions. f:D R A function can not map one element to two differnet elements in the range.
  36. 36. Dr. Hussien M. Sharaf Bijective function SURJECTIVE FUNCTION T P1 P2 P3 P4 t1 S s1 t2 s2 t3 s3 Pn Many planes fly at the same time T t1 t2 t3 Only one plane lands on one runway at a time
  37. 37. Dr. Hussien M. Sharaf WHAT IS THE USE OF FUNCTIONS IN CS?   Helps to describe the transition function that transfer the computing device from one state to another. Any computing device must have clear states. D s1 s2 s3 R s2 s3 shalt
  38. 38. Dr. Hussien M. Sharaf GRAPHS Is a visual representation of a set and a relation of this set over itself.  G = (V, E) V ={1, 2, 3, 4, 5} E = {(i, j) and (j, i)| i, j belongs to V} E is a set of pairs ={(1, 3), (3, 1) …(5, 4), (4, 5)}  1 2 4 5 3
  39. 39. Dr. Hussien M. Sharaf GRAPHS CONSTRUCT Is there a formal language to describe a graph?  G =(V, E) Where :  V is a set of n vertices ={i| i < n-1}  E is a set of edges. Each edge is a pair of elements in V ={(i, j), (j, i)|i, j V} or={(i, j) |i, j V}  1 2 4 5 3
  40. 40. Dr. Hussien M. Sharaf DEFINITIONS Alphabet) : a finite set of letters, denoted  Letter: an element of an alphabet  Word: a finite sequence of letters from the alphabet  (empty string): a word without letters.  Language * Kleene „s Star : the set of all words on 
  41. 41. Dr. Hussien M. Sharaf STRINGS AND LANGUAGES A string w1 over an alphabet Σ is a finite sequence of symbols from that alphabet. 1. Σ: is a set of symbols i.e. {a, b, c, …, z} English letters; {0,1, 2,…,9,.} digits of Arabic numbers {AM, PM}different clocking system {1, 2, …, 12}hours of a clock; 
  42. 42. Dr. Hussien M. Sharaf STRINGS -2 2.1 String: is a sequence of Σ (sigma) symbols Σ Example Description Σ to the power? 5 {a, b, c, …, z} apple English string Σ {0,1, 2,…,9,.} 35 the oldest age for girls Σ {AM, PM} {1, 2, …, 12} PM 12 clocking system a specific hour in the day 2 1 Σ 1 Σ or Σ 2
  43. 43. Dr. Hussien M. Sharaf STRINGS - 3 2.2 Empty String is Λ (Lamda) is of length zero 0 Σ =Λ 2.3 Reverse(xyz) =zyx 2.4 Palindrome is a string whose reverse is identical to itself. If Σ = {a, b} then PALINDROME ={Λ and all strings x such that reverse(x) = x } radar, level, reviver, racecar, madam, pop and noon.
  44. 44. Dr. Hussien M. Sharaf STRINGS - 4 2.5 Kleene star * or Kleene closure is similar to cross product of a set/string over itself. * If Σ = {x}, then Σ = {Λ x xx xxx ….} + If Σ = {x}, then Σ = {x xx xxx ….} If S = {w1 , w2 , w3 } then * S ={Λ, w1 , w2 , w3 , w1w1 , w1w2 , w1w3 , ….} + S ={w1 , w2 , w3 , w1w1 , w1w2 , w1w3 , ….} + Note1: if w3 =Λ, then Λ S * ** Note2: S = S
  45. 45. Dr. Hussien M. Sharaf PART THREE EXERCISES
  46. 46. Dr. Hussien M. Sharaf EXERCISE 1 Assume Σ={0, 1} 2 1. How many elements are there in Σ ? Length(Σ) X Length(Σ) = 2 X 2 =4 3 2. How many combinations of in Σ ? 2 X 2 X 2 = 23 3. How many elements are there in Σn? (Length(Σ))n= 2n 
  47. 47. Dr. Hussien M. Sharaf EXERCISE 2 Assume A1={AM, PM}, A2 = {1, 2, …59}, A3 = {1, 2, …12} 1. How many elements are there in A1 A3? Length(A1) X Length(A3) = 2 X 12 =24  How many elements are there in A1 A2 A3)? Length(A1) X Length(A3) = 2 X 59 X 12 = 1416 2.
  48. 48. Dr. Hussien M. Sharaf EXERCISE 3 Assume L1={Add, Subtract}, DecimalDigits = {0,1, 2, …9} 1. Construct integer numbers out of L2. DecimalDigits + -A number of any length of digits. 2. Construct a language for assembly commands from L1 and DecimalDigits . L1 DecimalDigits + {,} DecimalDigits + - Commands in form of Add 1000, 555 
  49. 49. Dr. Hussien M. Sharaf EXERCISE 4 Assume HexaDecimal = {0,1, …9,A,B,C,D,E,F} 1. How many HexaDecimals of length 4? 164 2. How many HexaDecimals of length n? 16n 3. How many elements are there in {0, 1}8 ? 28 = 256 
  50. 50. Dr. Hussien M. Sharaf THANK YOU 50

×