Regular expression presentation for the HUB
Presentation I did for the helpdesk of my alma mater

Published in: Technology
Transcript of "Regular expression presentation for the HUB"

  1. Regular Expressions
     Ben Simpson - <3 HUB
  2. Introductions
     ● Working with web technologies for 10 years
     ● Former HUB supervisor
     ● Tour de jobs: http://tinyurl.com/kmsns38
     ● Graduated from CSU with a BAS in Technology Management 2013
     ● Husband and proud father
     ● Presenter on regular expressions!
  3. What Is a Regular Expression?
     Pattern matching
  4. What Could I Do With a RegExp?
     ● Searching
     ● Syntax highlighting
     ● Data validation
     ● Sanitation
     ● Data queries / extraction
     ● Many tasks that require matching a pattern
  5. RegExps Won’t Let You Time Travel
  6. Brain Teaser
     Which of the following is a valid telephone number?
     1. 678 466 4000
     2. (678) 466-4000
     3. 1234
     4. domain\user
     5. 1 (800) 1234 567
  7. How did you know?
     Depends on who you ask...
  8. We Pattern Match Every Day
     ● Telephone numbers follow a pattern that we recognize
     ● This pattern has rules (3 digit area code, 7 digit number, numeric only)
     ● There are often many variations to a pattern (optional intl code)
  9. Literal Characters
     String: The cat in the hat
     RegExp: /at/
     The cat in the hat (the "at" in "cat" and in "hat" matches)
  10. Regular Expressions in JavaScript
      var haystack = "The cat in the hat";
      var needle = /cat/;          // a regex literal; new RegExp(/cat/) is equivalent
      haystack.match(needle);      // truthy: an array describing the match
      needle = /dog/;
      haystack.match(needle);      // falsy: null, because "dog" never appears
  11. Well that wasn’t so bad
      The best is yet to come!
  12. Special Characters (Metacharacters)
      ● \ - escape character
      ● ^ - beginning of line (not inside brackets)
      ● $ - ending of line
      ● . - wildcard
      ● | - or junction
      ● ? - zero or one
      ● * - zero or more
      ● + - one or more
      ● () - grouping
      ● [] - character set
      ● {} - repetition
  13. Demonstration of Special Characters
      String: ...To login to your email use the username: "ben.simpson@mail.com" with a password "password123"...
      RegExp: /username: "(.*)" .* password "(.*)"/
      Results:
      1. ben.simpson@mail.com
      2. password123
  14. Shorthand Character Classes
      ● \d - digit [0-9]
      ● \w - word character
      ● \s - whitespace
      ● \D - non-digit [^\d]
      ● \W - non-word character [^\w]
      ● \S - non-whitespace [^\s]
  15. Wait a Second!
      You said this was easy
  16. Thinking about a Telephone Pattern
      ● Optional international code
      ● 3 digit area code
      ● 7 digit number
      ● Optional extension
      ● What about alpha phrases? (e.g. 678 466-HELP)
      ● What is the length of intl codes? (e.g. 358 for Finland)
      ● Are parentheses optional?
      ● Is spacing optional?
      ● Country specific formats (e.g. France 06 87 71 23 45)
  17. Regular Expression - Telephone #
      String: 678 466 4357
      RegExp: \d{3} \d{3} \d{4}
      String: (678) 466-4357
      RegExp: \(\d{3}\) \d{3}-\d{4}
  18. Telephone # - Two Variations
      String: 678 466 4357
              (678) 466-4357
      RegExp: \(?\d{3}\)? \d{3}[\s-]\d{4}
  19. Telephone # - Three Variations
      String: 678 466 4357
              (678) 466-4357
              1 (678) 466-4357
      RegExp: \d*\s?\(?\d{3}\)? \d{3}[\s-]?\d{4}
  20. That Escalated Quickly
  21. Surprisingly Difficult
      ● Seemingly simple patterns can become very complex.
      ● It's best to work against data that is consistent, or regular, in its implementation of patterns
      ● If the data is too dirty, a regular expression won’t be much help
  22. When RegExps Go Bad
      ● Websites that don’t accept special characters in email addresses, URLs, telephone numbers, etc.
      ● May be caused by RegExps that are too restrictive
      ● Restrictive RegExps don’t take into account all variations of a pattern
      ● Longer expressions are difficult to grok
  23. In a Nutshell
      “Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.”
      - Jamie Zawinski
  24. Brain Teaser
      Which of the following is a valid email address?
      1. thehoagie@gmail.com
      2. ben.simpson+work@analoganalytics.com
      3. ben+email
      4. http://www.clayton.edu
      5. abc."defghi".xyz@example.com
  25. Thinking about an Email Address
      ● Has a local part (e.g. the "thehub" in thehub@clayton.edu)
      ● Has a domain part (e.g. the "clayton.edu" in thehub@clayton.edu)
      ● Has an @ symbol in the middle
      ● Do we need to support special characters?
      ● Can we verify based on minimum / maximum length?
  26. Best to Keep It Simple!
      String: thehoagie@gmail.com
      RegExp: .*@.*
      Yeah, but isn’t there an official email Regex that takes all the patterns into account?
      Yes...
  27. RFC 5322 - The Email RegExp
      (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
      |"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]
      |\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")
      @(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
      |\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:
      (?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]
      |\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
  28. Maybe this instead?
      (╯°□°)╯︵ ┻━┻)
  29. ┬─┬ ノ( ゜-゜ノ)
      (Let me put that back for you)
  30. Brain Teaser
      Which is a valid zipcode?
      1. 30022
      2. 30022-7155
      3. 300131
      4. -7155
      5. AB123XY
  31. Thinking About a Zipcode
      ● Digits only
      ● 5 digits mandatory plus optional 4 digit code
      ● 4 digit code suffixed with hyphen
      ● Do other countries use zip codes?
      ● Pattern is easier because there is less variation (Thank USPS!)
  32. Brain Teaser
      Which is a valid URL?
      1. http://www.clayton.edu
      2. www.clayton.edu
      3. clayton.edu
      4. thehub.clayton.edu
      5. ben:pass@clayton.edu:80/foo?bar=baz#qux
  33. Thinking about a URL
  34. Ben Simpson
      thehoagie@gmail.com
      @mrfrosti
  35. Extra Credit
      ● IP address
      ● HTML Tag contents
      ● Validating a password against requirements
      ● Dates
      ● Times
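
The literal-character match from slides 9 and 10 can be tried directly in a JavaScript console. This is only a sketch: the variable names `allMatches`, `hasCat`, `hasDog` and `noMatch` are made up for illustration, and the `g` flag and `test()` are additions that the slides do not use.

```javascript
// Same haystack as slide 10.
const haystack = "The cat in the hat";

// The g flag asks match() for every occurrence, not only the first one.
const allMatches = haystack.match(/at/g);
console.log(allMatches); // [ 'at', 'at' ]

// test() returns a plain boolean, handy when the match itself is not needed.
const hasCat = /cat/.test(haystack);
const hasDog = /dog/.test(haystack);
console.log(hasCat, hasDog); // true false

// Without a match, match() returns null, which is why slide 10 calls it falsy.
const noMatch = haystack.match(/dog/);
console.log(noMatch); // null
```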
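
Slide 13's grouping demonstration might look like this in JavaScript. It is a sketch under stated assumptions: the sample text uses straight double quotes and a colon after "username", and the groups use `[^"]*` instead of the slide's `.*` so that each capture stops at its closing quote. The names `text`, `credentials`, `found`, `username` and `password` are illustrative only.

```javascript
// Sample text modeled on slide 13 (straight quotes assumed).
const text =
  '...To login to your email use the username: "ben.simpson@mail.com" with a password "password123"...';

// () captures a group; [^"]* matches any run of characters that are not a quote.
const credentials = /username: "([^"]*)" .* password "([^"]*)"/;

// match() returns an array: index 0 is the whole match, 1 and 2 are the groups.
const found = text.match(credentials);
const username = found ? found[1] : null;
const password = found ? found[2] : null;

console.log(username); // ben.simpson@mail.com
console.log(password); // password123
```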
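
The shorthand classes on slide 14 are easiest to see side by side. The sample string and variable names below are invented for this sketch; the `+` quantifier comes from slide 12.

```javascript
const sample = "Room 42, Floor 7";

// \d matches one digit; + repeats it one or more times.
const numbers = sample.match(/\d+/g);
console.log(numbers); // [ '42', '7' ]

// \w matches letters, digits and the underscore.
const words = sample.match(/\w+/g);
console.log(words); // [ 'Room', '42', 'Floor', '7' ]

// \s matches whitespace, so splitting on \s+ breaks the string at each gap.
const pieces = sample.split(/\s+/);
console.log(pieces); // [ 'Room', '42,', 'Floor', '7' ]

// \D is the negation of \d: replacing every non-digit leaves only digits.
const digitsOnly = sample.replace(/\D/g, "");
console.log(digitsOnly); // 427
```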
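
The three telephone patterns from slides 17 to 19 can be compared against the same sample strings. In this sketch the `^` and `$` anchors are an addition, assumed here so that the whole string has to match; the names `spacesOnly`, `twoVariations`, `threeVariations`, `samples` and `results` are hypothetical.

```javascript
// Patterns from slides 17, 18 and 19, anchored so the entire string must match.
const spacesOnly = /^\d{3} \d{3} \d{4}$/;
const twoVariations = /^\(?\d{3}\)? \d{3}[\s-]\d{4}$/;
const threeVariations = /^\d*\s?\(?\d{3}\)? \d{3}[\s-]?\d{4}$/;

const samples = ["678 466 4357", "(678) 466-4357", "1 (678) 466-4357", "1234"];

// Record which pattern accepts which sample.
const results = samples.map((s) => ({
  sample: s,
  spacesOnly: spacesOnly.test(s),
  twoVariations: twoVariations.test(s),
  threeVariations: threeVariations.test(s),
}));
console.table(results);

// The opening and closing parentheses are optional independently of each other,
// so a lopsided number is accepted too.
const lopsided = threeVariations.test("(678 466-4357");
console.log(lopsided); // true
```

The lopsided case is a small instance of slide 21's point: each variation the pattern tolerates also lets through inputs nobody intended.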
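
Slide 26's keep-it-simple email pattern can be checked against a few of the brain teaser candidates from slide 24. The `stricter` pattern is not from the slides; it is one possible small tightening, assumed here for comparison, that requires at least one non-space, non-@ character on each side of a single @.

```javascript
// Slide 26: anything, an @, anything.
const loose = /.*@.*/;

// Assumption for this sketch: one @ with something on both sides and no spaces.
const stricter = /^[^\s@]+@[^\s@]+$/;

const candidates = [
  "thehoagie@gmail.com",
  "ben.simpson+work@analoganalytics.com",
  "ben+email",
  "@",
];

const looseHits = candidates.filter((c) => loose.test(c));
const stricterHits = candidates.filter((c) => stricter.test(c));

// The loose pattern accepts a lone "@"; the stricter one does not.
console.log(looseHits.length, stricterHits.length); // 3 2
```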
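
The zipcode rules on slide 31 (five mandatory digits, then an optional hyphen and four digits) translate into a short pattern. The slides do not show a zipcode RegExp, so the one below is a sketch of one way to write it; `zip`, `zipSamples` and `valid` are illustrative names.

```javascript
// ^ and $ anchor the whole string; (-\d{4})? makes the hyphenated suffix optional.
const zip = /^\d{5}(-\d{4})?$/;

// The five candidates from slide 30.
const zipSamples = ["30022", "30022-7155", "300131", "-7155", "AB123XY"];

const valid = zipSamples.filter((z) => zip.test(z));
console.log(valid); // [ '30022', '30022-7155' ]
```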