Regular expression made by To Minh Hoang - Portal team

This is a presentation from eXo Platform SEA.

Transcript of "Regular expression made by To Minh Hoang - Portal team"

  1. Regular Expressions (Minh Hoang TO, Portal Team)
  2. Agenda
      - Finite State Machine
      - Pattern Parser
      - Java Regex
      - Parsers in GateIn
      - Advanced Theory
  3. Finite State Machine
  4. State Diagram
  5. JIRA Issue Lifecycle
  6. Java Thread Lifecycle
  7. Java Compilation Flow
  8. Finite State Machine - FSM
      - A behavioral model describing the workflow of a system
  9. Finite State Machine - FSM
      - A directed graph with labeled edges
  10. Pattern Parser
  11. Classic Problem
      - A: a finite character set, e.g. A = {a, b, c, d, ..., z} or A = {a, b, c, ..., z, public, class, extends, implements, while, if, ...}
      - A pattern P and an input sequence INPUT, made of elements of A
      - e.g. P = "a.*b" or P = "class.*extends.*"; INPUT = "aaabbbcc" or INPUT = a Java source file
      - -> The parser reads INPUT character by character and recognizes every subsequence matching pattern P
  12. Classic Problem - Samples
      - Split a sequence of characters into an array of subsequences:
        String path = "/portal/en/classic/home";
        String[] segments = path.split("/");
      - Handle a comment block encountered in a file
      - Override readLine() in BufferedReader
      - Extract data from a REST response
      - Write an XML parser from scratch
  13. Finite State Machine & Classic Problem
      - Acceptor FSM?
      - How can the Classic Problem be turned into a graph-traversal problem with a well-known generic solution?
        Finding pattern occurrences ↔ traversing a directed graph with labeled edges
  14. FSM – Word Accepting
      - Consider a word W, a sequence of characters from character set A: W = "abcd...xyz"
      - An FSM whose graph edges are labeled with characters from A accepts W if there exists a path connecting the START node to one of the END nodes:
        START = S1 -> S2 -> ... -> Sn = END
        1. Intermediate nodes may be visited more than once.
        2. The transition S_i -> S_(i+1) is determined (labeled) by the i-th character of W.
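The definition above can be sketched in Java. This is a minimal illustration, not code from the slides: the class PathAcceptor, the Edge type and the use of '.' as an "any character" edge label are assumptions made here. Because the definition only asks whether some path exists, the sketch tracks the set of nodes reachable after each character rather than trying paths one at a time.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class illustrating "there exists a path from START to an END node".
public class PathAcceptor {
    /** A labeled edge; the label '.' means "any character" (an assumption of this sketch). */
    static final class Edge {
        final int from;
        final char label;
        final int to;

        Edge(int from, char label, int to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    /** True if some path spelled by word leads from start to a node in ends. */
    static boolean accepts(List<Edge> edges, int start, Set<Integer> ends, String word) {
        // All nodes reachable after the characters read so far (initially just START).
        Set<Integer> current = new HashSet<>(Set.of(start));
        for (char c : word.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (Edge e : edges) {
                // Follow every edge leaving a current node whose label fits c.
                if (current.contains(e.from) && (e.label == c || e.label == '.')) {
                    next.add(e.to);
                }
            }
            current = next;
        }
        // Accept if at least one of the paths ended on an END node.
        return current.stream().anyMatch(ends::contains);
    }

    public static void main(String[] args) {
        // Graph for a.*b: 0 --a--> 1, 1 --any--> 1, 1 --b--> 2 (END).
        List<Edge> edges = List.of(new Edge(0, 'a', 1), new Edge(1, '.', 1), new Edge(1, 'b', 2));
        System.out.println(accepts(edges, 0, Set.of(2), "aabab")); // true
        System.out.println(accepts(edges, 0, Set.of(2), "aba"));   // false
    }
}
```

Node 1 has two outgoing edges on 'b' (the '.' loop and the edge to END), which is exactly the situation where the existence of at least one path matters.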
  15. Acceptor FSM
      - Given a pattern P, an FSM is called an Acceptor FSM if it accepts exactly the words matching pattern P.
        e.g. the Acceptor FSM of "a[0-9]b" accepts exactly the words in the set {"a0b", "a1b", "a2b", "a3b", "a4b", "a5b", "a6b", "a7b", "a8b", "a9b"}
  16. How Does a Pattern Parser Work?
      Traverse the directed graph associated with the Acceptor FSM:
        1. Start from the root node.
        2. Read the next character from INPUT, then move according to the transition rules.
        3. Repeat step 2 until a leaf node is reached or INPUT is exhausted.
        4. Return OK if the leaf node represents a successful match.
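Following the four steps above, a deterministic acceptor for the slide-15 pattern a[0-9]b might look like the sketch below. The class name DigitAcceptor and its enum states are hypothetical; FAILURE plays the role of a leaf node that reports no match.

```java
// Hypothetical deterministic acceptor for a[0-9]b, written to mirror the four parser steps.
public class DigitAcceptor {
    enum State { START, A, DIGIT, END, FAILURE }

    static boolean accepts(String input) {
        State state = State.START;                        // 1. start from the root node
        for (int i = 0; i < input.length(); i++) {        // 2. read the next character
            char c = input.charAt(i);
            switch (state) {
                case START: state = (c == 'a') ? State.A : State.FAILURE; break;
                case A:     state = (c >= '0' && c <= '9') ? State.DIGIT : State.FAILURE; break;
                case DIGIT: state = (c == 'b') ? State.END : State.FAILURE; break;
                default:    state = State.FAILURE;        // input left after END: not a whole-word match
            }
            if (state == State.FAILURE) {
                return false;                             // 3. stop early at the FAILURE leaf
            }
        }
        return state == State.END;                        // 4. OK only if we stopped at the END leaf
    }

    public static void main(String[] args) {
        for (char d = '0'; d <= '9'; d++) {
            System.out.println("a" + d + "b -> " + accepts("a" + d + "b")); // all true
        }
        System.out.println("axb -> " + accepts("axb"));    // false
    }
}
```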
  17. Example One
      - Recognize the pattern eXo.*er
      - in: AAA eXo123er BBB eXoer CCC eXoeXoer DDD
  18. Example One
      - Acceptor FSM with 8 states:
        START: start reading the input sequence
        e: encountered e
        eX: encountered eX
        eXo: encountered eXo
        eXo.*: encountered eXo.*
        eXo.*e: encountered eXo.*e
        END: a subsequence matching eXo.*er has been found
        FAILURE
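One possible Java rendering of the eight-state machine above, under simplifications of our own: eXo and eXo.* are merged into a single BODY state (since .* may match the empty string), and instead of an explicit FAILURE state the scanner falls back to START, or to e when the failing character is itself an 'e'. The names EXoErScanner and scan are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-built scanner for eXo.*er; BODY stands for the merged eXo / eXo.* states.
public class EXoErScanner {
    enum State { START, E, EX, BODY, BODY_E }

    /** Returns every match, closing each one at the first "er" after its "eXo". */
    static List<String> scan(String input) {
        List<String> matches = new ArrayList<>();
        State state = State.START;
        int begin = -1; // index of the 'e' that opened the current candidate
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case START:
                    if (c == 'e') { state = State.E; begin = i; }
                    break;
                case E:
                    if (c == 'X') state = State.EX;
                    else if (c == 'e') begin = i;             // a fresh 'e' restarts the candidate
                    else state = State.START;
                    break;
                case EX:
                    if (c == 'o') state = State.BODY;         // "eXo" seen, now inside .*
                    else if (c == 'e') { state = State.E; begin = i; }
                    else state = State.START;
                    break;
                case BODY:
                    if (c == 'e') state = State.BODY_E;
                    break;
                case BODY_E:
                    if (c == 'r') {                            // END: "eXo...er" complete
                        matches.add(input.substring(begin, i + 1));
                        state = State.START;
                    } else if (c != 'e') {
                        state = State.BODY;                    // that 'e' was just part of .*
                    }
                    break;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(scan("AAA eXo123er BBB eXoer CCC eXoeXoer DDD"));
        // [eXo123er, eXoer, eXoeXoer]
    }
}
```

Because it stops at the first "er" after each "eXo", this sketch reports the same three matches as the reluctant regex eXo.*?er; the greedy eXo.*er in java.util.regex would instead return one long match running from the first eXo to the last er.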
  20. Example Two
      - Recognize the comment block /* */ in: /* Don't ask * / final int innerClassVariable;
  21. Example Two
      - Acceptor FSM with 5 states:
        START: start reading the input sequence
        OUT: outside any comment block
        ENTERING: at the beginning of a comment block
        IN: inside a comment block
        LEAVING: at the end of a comment block
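A sketch of the five-state comment machine, written as a comment stripper. Assumptions made here: START is folded into OUT, the class name CommentStripper is hypothetical, and string literals that happen to contain "/*" are not handled. Running it on the slide's input shows why "* /" (with a space) does not close the block: the rest of the input is swallowed.

```java
// Hypothetical comment stripper driven by the OUT / ENTERING / IN / LEAVING states.
public class CommentStripper {
    enum State { OUT, ENTERING, IN, LEAVING }

    /** Removes every block comment; an unterminated comment swallows the rest of the input. */
    static String strip(String input) {
        StringBuilder out = new StringBuilder();
        State state = State.OUT;
        for (char c : input.toCharArray()) {
            switch (state) {
                case OUT:
                    if (c == '/') state = State.ENTERING;   // maybe the start of "/*"
                    else out.append(c);
                    break;
                case ENTERING:
                    if (c == '*') {
                        state = State.IN;                    // "/*" confirmed
                    } else {
                        out.append('/');                     // the pending '/' was ordinary text
                        if (c != '/') {                      // a second '/' keeps us in ENTERING
                            out.append(c);
                            state = State.OUT;
                        }
                    }
                    break;
                case IN:
                    if (c == '*') state = State.LEAVING;     // maybe the start of "*/"
                    break;
                case LEAVING:
                    if (c == '/') state = State.OUT;         // "*/" closes the block
                    else if (c != '*') state = State.IN;     // e.g. "* /" does not close it
                    break;
            }
        }
        if (state == State.ENTERING) out.append('/');         // flush a trailing lone '/'
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + strip("int a; /* note */ int b;") + "]");                        // [int a;  int b;]
        System.out.println("[" + strip("/* Don't ask * / final int innerClassVariable;") + "]");  // []
    }
}
```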
  23. Finite State Machine With Stack
      - Example Two is slightly harder than Example One because the transition decision depends on past information -> something must be kept in memory
      - FSM with Stack = ordinary FSM + a stack structure storing past information
        A contextual transition is determined by (next input character, stack state)
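The slides give no code for an FSM with Stack, so the following uses a different, standard example as an illustration: checking that brackets are properly nested. Each move depends on the next input character and on the top of the stack, and since nesting depth is unbounded, no fixed number of states could remember it. BracketChecker is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical FSM-with-stack example: a move depends on (next character, top of stack).
public class BracketChecker {
    static boolean balanced(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : input.toCharArray()) {
            switch (c) {
                case '(': case '[': case '{':
                    stack.push(c);                          // remember the opening bracket
                    break;
                case ')': case ']': case '}':
                    if (stack.isEmpty()) return false;      // nothing left to close
                    char open = stack.pop();
                    if ((c == ')' && open != '(') || (c == ']' && open != '[') || (c == '}' && open != '{')) {
                        return false;                       // closing bracket does not match the top
                    }
                    break;
                default:
                    break;                                  // other characters leave the stack unchanged
            }
        }
        return stack.isEmpty();                             // accept only if every bracket was closed
    }

    public static void main(String[] args) {
        System.out.println(balanced("f(a[i], {b})")); // true
        System.out.println(balanced("f(a[i)]"));      // false
    }
}
```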
  24. Java Regex
  25. Model
      - Pattern: Acceptor Finite State Machine
      - Matcher: Parser
  26. java.util.regex.Pattern
      - Construct the FSM accepting the pattern:
        Pattern p = Pattern.compile("a.*b");
        FSM states are instances of java.util.regex.Pattern$Node
      - Generate a parser working on an input sequence:
        Matcher matcher = p.matcher("aaabbbb");
  27. java.util.regex.Matcher
      - Find the next subsequence matching the pattern: find()
      - Get capturing groups from the latest match: group()
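A small runnable version of the Pattern/Matcher model above, using only standard java.util.regex calls (compile, matcher, find, group). The reluctant quantifier .*? is added here for contrast and is not on the slides; FindDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: Pattern is the compiled FSM, Matcher is the parser running it.
public class FindDemo {
    /** Collects every subsequence that Matcher.find() reports, in order. */
    static List<String> findAll(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);   // a fresh parser for this input
        List<String> found = new ArrayList<>();
        while (matcher.find()) {                     // advance to the next match
            found.add(matcher.group());              // group() == group(0): the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern greedy = Pattern.compile("a.*b");    // .* takes as much as possible
        Pattern lazy = Pattern.compile("a.*?b");     // .*? takes as little as possible
        System.out.println(findAll(greedy, "aaabbbb")); // [aaabbbb]
        System.out.println(findAll(lazy, "aaabbbb"));   // [aaab]
    }
}
```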
  28. Capturing Group
      - Two Pattern objects:
        Pattern p = Pattern.compile("abcd.*efgh");
        Pattern q = Pattern.compile("abcd(.*)efgh");
        String text = "abcd12345efgh";
        Matcher pM = p.matcher(text);
        Matcher qM = q.matcher(text);
      - Both pM.find() and qM.find() return true: the two patterns accept the same input
      - Only q captures: qM.group(1) returns "12345", while p has no group 1, so pM.group(1) throws IndexOutOfBoundsException
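A runnable version of the two-pattern comparison, with the group counts made explicit (note that the Pattern method is matcher(), not match()). GroupDemo is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing a pattern without and with a capturing group.
public class GroupDemo {
    static final Pattern P = Pattern.compile("abcd.*efgh");    // no capturing group
    static final Pattern Q = Pattern.compile("abcd(.*)efgh");  // group 1 wraps .*

    public static void main(String[] args) {
        String text = "abcd12345efgh";
        Matcher pM = P.matcher(text);
        Matcher qM = Q.matcher(text);

        // Both patterns accept the same input...
        System.out.println(pM.find() + " " + qM.find());              // true true
        // ...but only Q remembers what .* consumed.
        System.out.println(pM.groupCount() + " " + qM.groupCount());  // 0 1
        System.out.println(qM.group(1));                              // 12345
        // pM.group(1) would throw IndexOutOfBoundsException: P has no group 1.
    }
}
```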
  29. Capturing Group
      - Holds additional information about each match:
        while (matcher.find()) {
            matcher.group(index);
        }
      - Pattern P = (A)(B(C))
        matcher.group(0) = the whole sequence ABC
        matcher.group(1) = A
        matcher.group(2) = BC
        matcher.group(3) = C
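To check the numbering rule (capturing groups are numbered by counting their opening parentheses from left to right), the sketch below compiles the lowercase pattern (a)(b(c)) and collects every group of the first match. NestedGroups and groups are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing how nested capturing groups are numbered.
public class NestedGroups {
    /** Returns group(0)..group(groupCount()) of the first match, or an empty array if none. */
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) return new String[0];
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("(a)(b(c))", "xabcx");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + i + ") = " + g[i]);
        }
        // group(0) = abc, group(1) = a, group(2) = bc, group(3) = c
    }
}
```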
  30. Capturing Group
      - Pattern.compile("abc(defgh"); Pattern.compile("abcdef)gh"); -> PatternSyntaxException
      - Pattern.compile("abc\\(defgh"); Pattern.compile("abcdef\\)gh"); -> Success thanks to the escape character '\' (written "\\" inside a Java string literal)
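The sketch below checks which of these patterns compile. In Java source the regex escape character must itself be escaped, so the regex \( is written "\\(" in a string literal; Pattern.quote is a standard alternative that escapes a whole literal string at once. EscapeDemo and compiles are hypothetical names.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class for unescaped vs escaped parentheses.
public class EscapeDemo {
    /** Returns true if regex compiles, false if Pattern.compile rejects it. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("abc(defgh"));     // false: unclosed group
        System.out.println(compiles("abcdef)gh"));     // false: unmatched ')'
        // "\\(" in Java source is the regex \( , which matches a literal '('
        System.out.println(Pattern.matches("abc\\(defgh", "abc(defgh"));                   // true
        // Pattern.quote escapes the whole literal string
        System.out.println(Pattern.matches(Pattern.quote("abcdef)gh"), "abcdef)gh"));     // true
    }
}
```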
  31. Operators
      - Union: [a-zA-Z0-9] (the ranges a-z, A-Z and 0-9 combined in one class)
      - Negation: [^abc] [^X]
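A few character-class checks using String.matches, which requires the whole string to match. Besides listing several ranges inside one class, java.util.regex also accepts an explicit union of nested classes such as [a-d[m-p]]. The class and method names below are hypothetical.

```java
// Hypothetical demo class for character-class union and negation.
public class CharClassDemo {
    // Union of three ranges in one class: letters or digits, one or more times.
    static boolean alphanumeric(String s) { return s.matches("[a-zA-Z0-9]+"); }

    // Java's nested-class union syntax: a through d, or m through p.
    static boolean inAtoDorMtoP(String s) { return s.matches("[a-d[m-p]]"); }

    // Negation: exactly one character that is not a, b or c.
    static boolean notAbc(String s) { return s.matches("[^abc]"); }

    public static void main(String[] args) {
        System.out.println(alphanumeric("eXo2011"));  // true
        System.out.println(inAtoDorMtoP("n"));        // true
        System.out.println(inAtoDorMtoP("h"));        // false
        System.out.println(notAbc("d"));              // true
        System.out.println(notAbc("a"));              // false
    }
}
```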
  32. Contextual Match
      - X(?=Y)
        Once X is matched, look ahead to find Y
      - X(?!Y)
        Once X is matched, look ahead and expect not to find Y
      - (?<=Y)X
        Match X only if it is preceded by Y (look behind to find Y)
      - (?<!Y)X
        Match X only if it is not preceded by Y (look behind and expect not to find Y)
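The four lookaround forms can be tried with the sketch below. It follows the common convention of writing a lookbehind before X, as in (?<=Y)X, so that Y is checked against the text just before X. Lookaround assertions consume no characters, which is why Y never appears in the reported match. The sample strings and the names LookaroundDemo and findAll are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for lookahead and lookbehind.
public class LookaroundDemo {
    /** Lists each match as "text@startIndex". */
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + "@" + m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        String words = "foobar foobaz";
        System.out.println(findAll("foo(?=bar)", words));        // [foo@0]  foo followed by bar
        System.out.println(findAll("foo(?!bar)", words));        // [foo@7]  foo not followed by bar
        String prices = "cost $42 and 17";
        System.out.println(findAll("(?<=\\$)\\d+", prices));     // [42@6]   digits preceded by $
        System.out.println(findAll("(?<![$\\d])\\d+", prices));  // [17@13]  digits preceded by neither $ nor a digit
    }
}
```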
  33. Tips
      - Pattern is immutable and safe to share between threads (Matcher is not) -> maximize reuse. We often see:
        static final Pattern p = Pattern.compile("a*b");
      - Be careful with String.split: its argument is interpreted as a regular expression. Compare String.split with a plain Java loop + String.charAt
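The slide does not spell out what to watch for with String.split, so the following compares two approaches under our own assumptions: a precompiled, shared Pattern (safe to reuse as one static final instance because Pattern objects are immutable and thread-safe) and a hand-written String.charAt loop. One visible difference is that a regex split keeps the leading empty string produced by a leading '/'. The path is the one from slide 12; SplitDemo and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: regex-based split vs a manual charAt loop.
public class SplitDemo {
    // Compiled once and shared across calls and threads.
    private static final Pattern SLASH = Pattern.compile("/");

    /** Regex-based split; keeps the leading empty string produced by a leading '/'. */
    static List<String> splitWithPattern(String path) {
        return Arrays.asList(SLASH.split(path));
    }

    /** Manual scan with String.charAt; skips empty segments and uses no regex. */
    static List<String> splitWithLoop(String path) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= path.length(); i++) {
            if (i == path.length() || path.charAt(i) == '/') {
                if (i > start) segments.add(path.substring(start, i));
                start = i + 1;
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        String path = "/portal/en/classic/home";
        System.out.println(splitWithPattern(path)); // [, portal, en, classic, home]
        System.out.println(splitWithLoop(path));    // [portal, en, classic, home]
    }
}
```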
  34. Parsers in GateIn
  35. Parsers in GateIn
      - JavaScript Compressor
      - CSS Compressor
      - Groovy Template Optimizer
      - Navigation Controller: extracting URL parameters = regex matching + backtracking algorithm
      - StaxNavigator (an XML parser built on StAX)
  36. Advanced Theory
  37. Grammar & Language
      - Any word matching the pattern eXo.*er is produced by a sequence of transformations starting from S:
        S -> eXoQer
        Q -> RQT
        Q -> R
        Q -> ''
        R -> {a, b, c, d, ...}
        T -> {a, b, c, d, ...}
      - Language of a grammar = the set of words generated by finite sequences of transformations starting from S
        e.g. any valid Java source file is generated by a finite number of the transformations given in the Java grammar (JLS)
  38. Finite State Machine & Language
      - Any language accepted by an FSM with Stack can be generated by a context-free grammar; there is an explicit, step-by-step construction of such a grammar from the machine
      - Conversely, every language generated by a context-free grammar is accepted by some FSM with Stack; there is likewise an explicit construction of such a machine from the grammar
      - Kleene's theorem is the analogous result for ordinary FSMs (without a stack): they accept exactly the languages described by regular expressions