Regular Expressions
Ben Simpson - <3 HUB
Introductions
●
●
●
●

Working with web technologies for 10 years
Former HUB supervisor
Tour de jobs: http://tinyurl.com/kmsns38
Graduated from CSU with a BAS in
Technology Management 2013
● Husband and proud father
● Presenter on regular expressions!
What Is a Regular
Expression?
Pattern matching
What Could I Do With a RegExp?
●
●
●
●
●
●

Searching
Syntax highlighting
Data validation
Sanitation
Data queries / extraction
Many tasks that require matching a pattern
RegExps Won’t Let You Time Travel
Brain Teaser
Which of the following is a valid telephone
number?
1. 678 466 4000
2. (678) 466-4000
3. 1234
4. domainuser
5. 1 (800) 1234 567
How did you know?
Depends on who you ask...
We Pattern Match Every Day
● Telephone numbers follow a pattern that we
recognize
● This pattern has rules (3 digit zip, 7 digit
number, numeric only)
● There are often many variations to a pattern
(optional intl code)
Literal Characters
String: The cat in the hat
RegExp: /at/
The cat in the hat
Regular Expressions in Javascript
var haystack = "The cat in the hat";
var needle = new RegExp(/cat/);
haystack.match(needle); // truthy
needle = new RegExp(/dog/);
haystack.match(needle); // falsey
Well that wasn’t so bad
The best is yet to come!
Special Characters (Metacharacters)
●  - escape character
● ^ - beginning of line (not
inside brackets)

● $ - ending of line
● . - wildcard
● | - or junction

●
●
●
●
●
●

? - zero or one
* - zero or more
+ - one or more
() - grouping
[] - character set
{} - repetition
Demonstration of Special Characters
String: ...To login to your email use the
username: “ben.simpson@mail.com” with a
password “password123”...
RegExp: /username "(.*)" .* password "(.*)"/
Results: 1. ben.simpson@mail.com
2. password123
Shorthand Character Classes
● d - digit [0-9]
● w - word
● s - whitespace

● D - digit [^d]
● W - word [^w]
● S - whitespace [^s]
Wait a Second!
You said this was easy
Thinking about a Telephone Pattern
●
●
●
●
●
●
●
●
●

Optional international code
3 digit area code
7 digit number
Optional extension
What about alpha phrases? (e.g. 678 466-HELP)
What is the length of intl codes? (e.g. 358 for Finland)
Are parenthesis optional?
Is spacing optional?
Country specific formats (e.g. France 06 87 71 23 45)
Regular Expression - Telephone #
String: 678 466 4357
RegExp: d{3} d{3} d{4}
String: (678) 466-4357
RegExp: (d{3}) d{3}-d{4}
Telephone # - Two Variations
String: 678 466 4357
(678) 466-4357
RegExp: (?d{3})? d{3}[s-]d{4}
Telephone # - Three Variations
String: 678 466 4357
(678) 466-4357
1 (678) 466-4357
RegExp: d*s?(?d{3})? d{3}[s-]?d{4}
That Escalated Quickly
Surprisingly Difficult
● Seemingly simple patterns can become very
complex.
● Its best to work against data that is
consistent, or regular in its implementation of
patterns
● If the data is too dirty, a regular expression
won’t be much help
When RegExps Go Bad
● Websites that don’t accept special
characters in email addresses, URLs,
telephone numbers, etc
● May be RegExps that are too restrictive
● Doesn’t take into account all variations of a
pattern
● Longer expressions are difficult to grok
In a Nutshell
“Some people, when confronted with a
problem, think ‘I know, I'll use regular
expressions.’ Now they have two problems.”
-Jamie Zawinski
Brain Teaser
Which of the following a valid email address?
1. thehoagie@gmail.com
2. ben.simpson+work@analoganalytics.com
3. ben+email
4. http://www.clayton.edu
5. abc."defghi".xyz@example.com
Thinking about Email Address
● Has a local part (e.g. thehub@clayton.edu)
● Has a domain part (e.g. thehub@clayton.
edu)
● Has an @ symbol in the middle
● Do we need to support special characters?
● Can we verify based on minimum /
maximum length?
Best to Keep It Simple!
String: thehoagie@gmail.com
RegExp: .*@.*
Yeah, but isn’t here an official email Regex that
takes all the patterns into account? Yes...
RFC 5322 - The Email RegExp
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
| "(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]
| [x01-x09x0bx0cx0e-x7f])*")
@ (?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
| [(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:
(?:[x01-x08x0bx0cx0e-x1fx21-x5ax53-x7f]
| [x01-x09x0bx0cx0e-x7f])+)
])
Maybe this instead?

(╯°□°)╯︵ ┻━┻)
┬─┬ ノ( ゜-゜ノ)

(Let me put that back for you)
Brain Teaser
Which is a valid zipcode?
1. 30022
2. 30022-7155
3. 300131
4. -7155
5. AB123XY
Thinking About a Zipcode
●
●
●
●
●

Digits only
5 digits mandatory plus optional 4 digit code
4 digit code suffixed with hyphen
Do other countries use zip codes?
Pattern is easier because there is less
variation (Thank USPS!)
Brain Teaser
Which is a valid URL?
1. http://www.clayton.edu
2. www.clayton.edu
3. clayton.edu
4. thehub.clayton.edu
5. ben:pass@clayton.edu:80/foo?bar=baz#qux
Thinking about a URL
Ben Simpson
thehoagie@gmail.com
@mrfrosti
Extra Credit
●
●
●
●
●

IP address
HTML Tag contents
Validating a password against requirements
Dates
Times

Regular expression presentation for the HUB

  • 1.
  • 2.
    Introductions ● ● ● ● Working with webtechnologies for 10 years Former HUB supervisor Tour de jobs: http://tinyurl.com/kmsns38 Graduated from CSU with a BAS in Technology Management 2013 ● Husband and proud father ● Presenter on regular expressions!
  • 3.
    What Is aRegular Expression? Pattern matching
  • 4.
    What Could IDo With a RegExp? ● ● ● ● ● ● Searching Syntax highlighting Data validation Sanitation Data queries / extraction Many tasks that require matching a pattern
  • 5.
    RegExps Won’t LetYou Time Travel
  • 6.
    Brain Teaser Which ofthe following is a valid telephone number? 1. 678 466 4000 2. (678) 466-4000 3. 1234 4. domainuser 5. 1 (800) 1234 567
  • 7.
    How did youknow? Depends on who you ask...
  • 8.
    We Pattern MatchEvery Day ● Telephone numbers follow a pattern that we recognize ● This pattern has rules (3 digit zip, 7 digit number, numeric only) ● There are often many variations to a pattern (optional intl code)
  • 9.
    Literal Characters String: Thecat in the hat RegExp: /at/ The cat in the hat
  • 10.
    Regular Expressions inJavascript var haystack = "The cat in the hat"; var needle = new RegExp(/cat/); haystack.match(needle); // truthy needle = new RegExp(/dog/); haystack.match(needle); // falsey
  • 11.
    Well that wasn’tso bad The best is yet to come!
  • 12.
    Special Characters (Metacharacters) ● - escape character ● ^ - beginning of line (not inside brackets) ● $ - ending of line ● . - wildcard ● | - or junction ● ● ● ● ● ● ? - zero or one * - zero or more + - one or more () - grouping [] - character set {} - repetition
  • 14.
    Demonstration of SpecialCharacters String: ...To login to your email use the username: “ben.simpson@mail.com” with a password “password123”... RegExp: /username "(.*)" .* password "(.*)"/ Results: 1. ben.simpson@mail.com 2. password123
  • 15.
    Shorthand Character Classes ●d - digit [0-9] ● w - word ● s - whitespace ● D - digit [^d] ● W - word [^w] ● S - whitespace [^s]
  • 16.
    Wait a Second! Yousaid this was easy
  • 17.
    Thinking about aTelephone Pattern ● ● ● ● ● ● ● ● ● Optional international code 3 digit area code 7 digit number Optional extension What about alpha phrases? (e.g. 678 466-HELP) What is the length of intl codes? (e.g. 358 for Finland) Are parenthesis optional? Is spacing optional? Country specific formats (e.g. France 06 87 71 23 45)
  • 18.
    Regular Expression -Telephone # String: 678 466 4357 RegExp: d{3} d{3} d{4} String: (678) 466-4357 RegExp: (d{3}) d{3}-d{4}
  • 19.
    Telephone # -Two Variations String: 678 466 4357 (678) 466-4357 RegExp: (?d{3})? d{3}[s-]d{4}
  • 20.
    Telephone # -Three Variations String: 678 466 4357 (678) 466-4357 1 (678) 466-4357 RegExp: d*s?(?d{3})? d{3}[s-]?d{4}
  • 21.
  • 22.
    Surprisingly Difficult ● Seeminglysimple patterns can become very complex. ● Its best to work against data that is consistent, or regular in its implementation of patterns ● If the data is too dirty, a regular expression won’t be much help
  • 23.
    When RegExps GoBad ● Websites that don’t accept special characters in email addresses, URLs, telephone numbers, etc ● May be RegExps that are too restrictive ● Doesn’t take into account all variations of a pattern ● Longer expressions are difficult to grok
  • 25.
    In a Nutshell “Somepeople, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.” -Jamie Zawinski
  • 26.
    Brain Teaser Which ofthe following a valid email address? 1. thehoagie@gmail.com 2. ben.simpson+work@analoganalytics.com 3. ben+email 4. http://www.clayton.edu 5. abc."defghi".xyz@example.com
  • 27.
    Thinking about EmailAddress ● Has a local part (e.g. thehub@clayton.edu) ● Has a domain part (e.g. thehub@clayton. edu) ● Has an @ symbol in the middle ● Do we need to support special characters? ● Can we verify based on minimum / maximum length?
  • 28.
    Best to KeepIt Simple! String: thehoagie@gmail.com RegExp: .*@.* Yeah, but isn’t here an official email Regex that takes all the patterns into account? Yes...
  • 29.
    RFC 5322 -The Email RegExp (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)* | "(?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f] | [x01-x09x0bx0cx0e-x7f])*") @ (?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])? | [(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]: (?:[x01-x08x0bx0cx0e-x1fx21-x5ax53-x7f] | [x01-x09x0bx0cx0e-x7f])+) ])
  • 30.
  • 31.
    ┬─┬ ノ( ゜-゜ノ) (Letme put that back for you)
  • 32.
    Brain Teaser Which isa valid zipcode? 1. 30022 2. 30022-7155 3. 300131 4. -7155 5. AB123XY
  • 33.
    Thinking About aZipcode ● ● ● ● ● Digits only 5 digits mandatory plus optional 4 digit code 4 digit code suffixed with hyphen Do other countries use zip codes? Pattern is easier because there is less variation (Thank USPS!)
  • 34.
    Brain Teaser Which isa valid URL? 1. http://www.clayton.edu 2. www.clayton.edu 3. clayton.edu 4. thehub.clayton.edu 5. ben:pass@clayton.edu:80/foo?bar=baz#qux
  • 35.
  • 36.
  • 37.
    Extra Credit ● ● ● ● ● IP address HTMLTag contents Validating a password against requirements Dates Times