Regex in +60’
By: Ghulam Imaduddin
ghulam@ideweb.co.id
Before We Start
• Tools:
 ◦ Notepad++ (Windows) - https://notepad-plus-plus.org/
 ◦ Sublime Text (Mac) - https://www.sublimetext.com/
 ◦ Online tools - https://regex101.com/
• Sample dataset
Common Types
• Character class: [abc]
• Character: . \s \w \d
• Quantifiers: ? + * {1}
• Anchors: ^ $
• Group/Capture: (…) (a|b)
Character class
• [abc]: a single character: a, b, or c
• [^abc]: any single character except a, b, or c
• [a-z]: a single character in the range a-z
• [a-zA-Z0-9]: a single character in the range a-z, A-Z, or 0-9
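A quick way to check these classes outside the editor is Python's `re` module. Its syntax for character classes is the same as in Notepad++ and Sublime Text, although the engines differ in other details. The sample string is made up for illustration. Each class consumes exactly one character per match:

```python
import re

sample = "abX-9z"  # hypothetical test string

# Each character class matches exactly ONE character per match.
for pattern in (r"[abc]", r"[^abc]", r"[a-z]", r"[a-zA-Z0-9]"):
    print(pattern, re.findall(pattern, sample))

# [abc] ['a', 'b']
# [^abc] ['X', '-', '9', 'z']
# [a-z] ['a', 'b', 'z']
# [a-zA-Z0-9] ['a', 'b', 'X', '9', 'z']
```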
Character
• . : any single character (except a line break, by default)
• \s: any whitespace (space, tab, line break); \S: any non-whitespace
• \d: any digit (equal to [0-9])
• \w: any word character (in ASCII text, equal to [a-zA-Z0-9_])
• \t: tab character
• \r: carriage return
• \n: new line
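The shorthand classes can be tried the same way in Python's `re`, which uses the same escapes. The sample string is hypothetical. Note that `\w` includes the underscore and that `.` skips the newline:

```python
import re

sample = "id_7\tqty 42\r\n."  # hypothetical text with a tab, CR and LF

digits = re.findall(r"\d", sample)       # ['7', '4', '2']
words = re.findall(r"\w+", sample)       # ['id_7', 'qty', '42'] (note the "_")
spaces = re.findall(r"\s", sample)       # ['\t', ' ', '\r', '\n']
non_space = re.findall(r"\S+", sample)   # ['id_7', 'qty', '42', '.']
any_char = re.findall(r".", sample)      # every character except '\n'

print(digits, words, spaces, non_space, len(any_char))
```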
Quantifiers
• a?: zero or one of a
• a*: zero or more of a
• a+: one or more of a
• a{3}: exactly 3 of a
• a{3,}: 3 or more of a
• a{3,6}: between 3 and 6 of a
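This sketch tests each quantifier against words made of 0, 1, 3, 6 and 7 "a"s followed by "b". It uses Python's `re.fullmatch`, so the whole word must match. The helper `accepted` is a hypothetical name for illustration:

```python
import re

# Hypothetical test words: n leading "a"s followed by "b".
words = {n: "a" * n + "b" for n in (0, 1, 3, 6, 7)}

def accepted(pattern):
    """Return the a-counts of the words that the pattern matches in full."""
    return [n for n, word in words.items() if re.fullmatch(pattern, word)]

for pattern in (r"a?b", r"a*b", r"a+b", r"a{3}b", r"a{3,}b", r"a{3,6}b"):
    print(f"{pattern:8} accepts a-counts {accepted(pattern)}")

# a?b      accepts a-counts [0, 1]
# a*b      accepts a-counts [0, 1, 3, 6, 7]
# a+b      accepts a-counts [1, 3, 6, 7]
# a{3}b    accepts a-counts [3]
# a{3,}b   accepts a-counts [3, 6, 7]
# a{3,6}b  accepts a-counts [3, 6]
```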
Anchors
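• ^: start of string (in text editors such as Notepad++, typically the start of each line)
• $: end of string (in text editors such as Notepad++, typically the end of each line)
• \b: word boundary
• \B: non-word boundary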
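In Python's `re`, `^` and `$` refer to the whole string unless the `re.MULTILINE` flag is set. With the flag they match at every line, which is roughly how the editors above behave when searching a file. The log text and word list are hypothetical:

```python
import re

log = "error: disk full\nwarning: error rate high\nerror: retry"  # hypothetical

# By default ^ and $ refer to the whole string...
start_once = re.findall(r"^error", log)                 # ['error']
end_once = re.findall(r"\w+$", log)                     # ['retry']
# ...with re.MULTILINE they match at every line start/end.
start_each = re.findall(r"^error", log, re.MULTILINE)   # ['error', 'error']
end_each = re.findall(r"\w+$", log, re.MULTILINE)       # ['full', 'high', 'retry']

text = "cat concat cats cat."
whole_word = re.findall(r"\bcat\b", text)   # ['cat', 'cat']: only standalone "cat"
inside = re.findall(r"\Bcat", text)         # ['cat']: the "cat" inside "concat"

print(start_once, end_once, start_each, end_each, whole_word, inside)
```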
Group/Capture
• (…): capture everything enclosed as a numbered group, reusable in the replacement as \1, \2, …
• (a|b): match either a or b
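Here is a small sketch of groups and alternation in Python's `re` with made-up strings. The last line reuses captured groups in a replacement with `\2` and `\1`, the same idea the hands-on steps rely on:

```python
import re

# Two capture groups: group(1) is the year, group(2) the month.
m = re.search(r"(\d{4})-(\d{2})", "Released 2019-07")
print(m.group(0), m.group(1), m.group(2))   # 2019-07 2019 07

# Alternation inside a group: match "gray" or "grey" but not "groy".
names = [hit.group(0) for hit in re.finditer(r"gr(a|e)y", "gray grey groy")]
print(names)                                 # ['gray', 'grey']

# Captured groups can be reused in a replacement as \1, \2, ...
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "admin@example")
print(swapped)                               # example at admin
```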
Hands-on
One line to multiline
Input:
Expected Output:
Steps:
1. Replace “, ” with a line break (\n)
2. Parse name and email
3. Copy to Excel
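The input and expected output of this exercise are not in the text, so this sketch assumes a hypothetical line of "Name <email>" entries separated by ", ". It follows the three steps in Python's `re`. Step 2 joins name and email with a tab, on the assumption that tab-separated text pastes into Excel as two columns:

```python
import re

# Hypothetical input: one line of "Name <email>" entries separated by ", ".
line = "Budi Santoso <budi@example.com>, Siti Aminah <siti@example.com>"

# Step 1: replace every ", " with a line break, giving one entry per line.
multiline = re.sub(r", ", "\n", line)

# Step 2: capture the name and the email on each line and join them with a
# tab, so that pasting into Excel (step 3) fills two columns.
rows = re.sub(r"^(.+?)\s*<([^>]+)>$", r"\1\t\2", multiline, flags=re.MULTILINE)
print(rows)
```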
Cleansing & Reformat
Input:
1. Step 1:
• Find: ^(\d{1,2})-(\d{1,2})-(\d{4})$
• Replace with: \3-\2-\1
2. Step 2:
• Find: -(\d{1})$
• Replace with: -0\1
3. Step 3: remove non-date lines ;)
Output: How:
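The original input is not in the text, so the lines below are a hypothetical D-M-YYYY sample containing one non-date line. The sketch applies the same find/replace pairs line by line with Python's `re.sub`. Step 2 as written pads only a single-digit day at the end of the line; a single-digit month would need a similar extra replacement:

```python
import re

# Hypothetical sample: D-M-YYYY dates mixed with a non-date line.
lines = ["5-12-2019", "25-12-2018", "n/a", "1-10-2020"]

cleaned = []
for line in lines:
    # Step 1: D-M-YYYY -> YYYY-M-D by reordering the three captured groups.
    line = re.sub(r"^(\d{1,2})-(\d{1,2})-(\d{4})$", r"\3-\2-\1", line)
    # Step 2: pad a single-digit day at the end of the line with a leading zero.
    line = re.sub(r"-(\d{1})$", r"-0\1", line)
    cleaned.append(line)

# Step 3: keep only lines that are now YYYY-MM-DD dates.
dates = [line for line in cleaned if re.fullmatch(r"\d{4}-\d{2}-\d{2}", line)]
print(dates)  # ['2019-12-05', '2018-12-25', '2020-10-01']
```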
Q & A
