Regular expressions

Regular Expressions

How do they work?

Several important facts
1. Everything in computing was discovered in
one form or another in the 1970s and 1980s, and was
probably thought about during the 1960s.
2. The easiest way to become a great computer
engineer in the 1980s was to work for Bell Labs
and have a beard.

What are regular expressions?
From Wikipedia:
In computing, a regular expression provides a
concise and flexible means to "match" (specify
and recognize) strings of text, such as particular
characters, words, or patterns of characters.
Common abbreviations for "regular expression"
include regex and regexp.

Why do we need regular expressions
(in programming)?
There are many reasons, but at their base most of them
come down to finding strings in text,
preferably without reading it.

^(?("")(""[^""]+?""@)|(([0-9a-z]((\.(?!\.))|[-!#$%&'*+/=?^`{}|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9]{2,17}))$

^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])\S{8,}$
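The second pattern is a password rule: three lookaheads each check a condition against the whole string without consuming anything (at least one non-letter, one lowercase letter, one uppercase letter), and then \S{8,} demands eight or more non-whitespace characters from start to end. A minimal Python sketch exercising it; the `is_strong` helper and the sample passwords are hypothetical, chosen only for illustration.

```python
import re

# The password pattern from the slide (backslash restored in \S).
# Each (?=...) is a lookahead: it asserts a condition but consumes nothing,
# so all three conditions are tested from the same starting position.
PASSWORD = re.compile(r"^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])\S{8,}$")

def is_strong(candidate: str) -> bool:
    """Hypothetical helper: True if the candidate satisfies the pattern."""
    return PASSWORD.search(candidate) is not None

for pw in ["password", "Password", "Passw0rd", "Pa ss w0rd", "Ab1"]:
    # "Pa ss w0rd" passes all lookaheads but fails \S{8,} because of the spaces
    print(pw, is_strong(pw))
```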

Regular Expressions Syntax
Meta characters
• Grouping
  . – matches any single character (except a newline, by default)
  [ ] – character class: matches a single character that is inside the brackets
  [^ ] – negated character class: matches a single character that is not inside the brackets
  ( ) – sub-expression (capture group); in Perl the captured text can be recalled later from the special variables $1, $2, and so on
• Quantifiers
  {m,n} – the preceding character or sub-expression must match at least m times and no more than n times
  * – named after the Kleene star of formal language theory; matches zero or more occurrences of the preceding element
  ? – matches zero or one occurrence of the preceding element
  + – named after the Kleene plus (Kleene cross); matches one or more occurrences of the preceding element
• Location
  ^ – marks the start of a line
  $ – marks the end of a line
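Each metacharacter above can be tried directly. This is a small sketch using Python's `re` module (our choice of language; the deck itself talks about Perl). One difference worth noting: where Perl exposes captured groups as $1, $2, Python exposes them through `.group(n)` and `.groups()` on the match object. The sample strings are arbitrary.

```python
import re

# .  any single character (not a newline)
dot = re.findall(r"c.t", "cat cot c9t ct")                    # ['cat', 'cot', 'c9t']
# [ ]  one character from the set;  [^ ]  one character NOT in the set
vowels = re.findall(r"[aeiou]", "regex")                      # ['e', 'e']
others = re.findall(r"[^aeiou]", "regex")                     # ['r', 'g', 'x']
# ( )  capture a sub-expression; Python returns it via .groups()
parts = re.search(r"(\d+)-(\d+)", "call 555-1234").groups()  # ('555', '1234')
# {m,n}  between m and n repetitions (\b marks a word boundary)
counted = re.findall(r"\bx{2,3}\b", "x xx xxx xxxx")          # ['xx', 'xxx']
# *  zero or more,  +  one or more,  ?  zero or one
star = re.findall(r"ab*", "a ab abbb")                        # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")                        # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")             # ['color', 'colour']
# ^ and $  anchor to the start / end (of the string, without re.MULTILINE)
starts = re.search(r"^Hello", "Hello world") is not None      # True
not_start = re.search(r"^world", "Hello world") is not None   # False
ends = re.search(r"world$", "Hello world") is not None        # True

print(dot, vowels, others, parts, counted, star, plus, optional, starts, not_start, ends)
```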

Regular Expressions Syntax
Character groups
• [:alpha:] - Any alphabetical character - [A-Za-z]
• [:alnum:] - Any alphanumeric character - [A-Za-z0-9]
• [:ascii:] - Any character in the ASCII character set
• [:blank:] - A GNU extension, equal to a space or a horizontal tab ("\t")
• [:cntrl:] - Any control character
• [:digit:] - Any decimal digit - [0-9], equivalent to "\d"
• [:graph:] - Any printable character, excluding a space
• [:lower:] - Any lowercase character - [a-z]
• [:print:] - Any printable character, including a space
• [:punct:] - Any printable character that is neither alphanumeric nor a space
• [:space:] - Any whitespace character: "\s" plus the vertical tab ("\cK")
• [:upper:] - Any uppercase character - [A-Z]
• [:word:] - A Perl extension - [A-Za-z0-9_], equivalent to "\w"
• [:xdigit:] - Any hexadecimal digit - [0-9a-fA-F]
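Python's `re` module does not understand the POSIX [:name:] classes, so in Python one uses the escape or bracket equivalents listed above. The sketch below checks several of those equivalences over the 128 ASCII characters; the `members` helper is hypothetical, and `re.ASCII` is used so that \d, \w and \s keep their ASCII meaning rather than Python's default Unicode one.

```python
import re
import string

ASCII_CHARS = [chr(i) for i in range(128)]

def members(pattern: str) -> set:
    """Hypothetical helper: the ASCII characters a one-character pattern matches."""
    rx = re.compile(pattern, re.ASCII)
    return {c for c in ASCII_CHARS if rx.fullmatch(c)}

# [:digit:]  ~ [0-9] ~ \d
assert members(r"\d") == members(r"[0-9]") == set(string.digits)
# [:word:]   ~ [A-Za-z0-9_] ~ \w
assert members(r"\w") == members(r"[A-Za-z0-9_]")
# [:xdigit:] ~ [0-9a-fA-F]
assert members(r"[0-9a-fA-F]") == set(string.hexdigits)
# [:space:]  ~ \s here: space, \t, \n, \r, \f and the vertical tab \v (Ctrl-K)
assert members(r"\s") == set(" \t\n\r\f\v")
# [:blank:]  ~ [ \t]
print(sorted(members(r"[ \t]")))
```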

What is a regular expression engine
A regular expression engine is a program that takes
a set of constraints specified in a mini-
language, and then applies those constraints to a
target string, and determines whether or not the
string satisfies the constraints.

In less grandiose terms, the first part of the job is to
turn a pattern into something the computer can
efficiently use to find the matching point in the
string, and the second part is performing the search
itself.
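In Python's `re` module the two parts of the job are visible as separate calls: `re.compile` turns the pattern text into the engine's internal program, and the compiled object's `search` method runs that program against target strings. A minimal sketch; the date pattern and sample lines are arbitrary illustrations.

```python
import re

# Part 1: compile the pattern once into the engine's internal form.
date = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Part 2: run the compiled program against each target string.
for line in ["released 2011-03-14", "no date here", "due 2024-01-31, paid 2024-02-02"]:
    m = date.search(line)  # first match only, or None if there is none
    print(line, "->", m.groups() if m else None)
```

Compiling once and reusing the object avoids repeating the first step for every string searched.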

How the Perl Regex engine works
• Unlike the army, only two steps
– Compilation
• Parsing (a sizing pass, then a construction pass)
• Peephole optimization and analysis
– Execution
• Start position and no-match optimizations
• Program execution

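NFA (nondeterministic finite automaton)
Equal in expressive power to a DFA (deterministic finite automaton)
Smaller in size: converting an NFA with n states to a DFA can require up to 2^n states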
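To make the size claim concrete, here is a sketch of the standard subset construction, which is also why NFAs and DFAs are equal in power: every set of NFA states becomes one DFA state. The example language ("the k-th symbol from the end is 'a'", over the alphabet {a, b}) and the helper names are hypothetical illustrations, not from the slides. The NFA needs k+1 states; the DFA reachable from it has 2^k.

```python
def kth_from_end_nfa(k: int) -> dict:
    """NFA with states 0..k accepting strings whose k-th symbol from the end is 'a'.

    State 0 loops on both symbols and nondeterministically "guesses" the 'a';
    states 1..k-1 then advance on any symbol; state k is accepting.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for s in range(1, k):
        delta[(s, "a")] = {s + 1}
        delta[(s, "b")] = {s + 1}
    return delta

def count_dfa_states(delta: dict, start: int = 0, alphabet: str = "ab") -> int:
    """Subset construction: count the DFA states (sets of NFA states) reachable from {start}."""
    first = frozenset({start})
    seen = {first}
    todo = [first]
    while todo:
        current = todo.pop()
        for sym in alphabet:
            nxt = frozenset(t for s in current for t in delta.get((s, sym), ()))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)

for k in range(1, 6):
    print(k + 1, "NFA states ->", count_dfa_states(kth_from_end_nfa(k)), "DFA states")
```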

Thompson NFA method
• In 1968 Ken Thompson published an article on how to
convert a regular expression into an automaton,
what we now call an NFA
• The article included code to illustrate the method

Thompson NFA method
1. Scan the regex and insert an explicit concatenation operator (.)
a(b|c)*d  →  a.(b|c)*.d
2. Convert to reverse Polish (postfix) notation
abc|*.d.
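Steps 1 and 2 can be sketched in a few lines of Python. For step 2 this uses Dijkstra's shunting-yard algorithm, which is our choice of technique rather than something stated on the slides. Assumptions: literals are single letters or digits, '.' is used only as the concatenation operator (not as a wildcard), and the function names are hypothetical.

```python
PRECEDENCE = {"|": 1, ".": 2}  # binary operators: concatenation binds tighter than OR
POSTFIX_UNARY = "*+?"          # unary operators already follow their operand

def insert_concat(regex: str) -> str:
    """Step 1: insert an explicit '.' between adjacent operands."""
    out = []
    for i, ch in enumerate(regex):
        out.append(ch)
        if i + 1 < len(regex):
            nxt = regex[i + 1]
            # ch ends an operand and nxt starts one -> implicit concatenation
            if ch not in "(|" and nxt not in ")|" + POSTFIX_UNARY:
                out.append(".")
    return "".join(out)

def to_postfix(regex: str) -> str:
    """Step 2: shunting-yard conversion to reverse Polish (postfix) notation."""
    output, stack = [], []
    for ch in insert_concat(regex):
        if ch in POSTFIX_UNARY:
            output.append(ch)  # binds tightest: emit right after its operand
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        elif ch in PRECEDENCE:
            # pop operators of equal or higher precedence (left associativity)
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]:
                output.append(stack.pop())
            stack.append(ch)
        else:
            output.append(ch)  # a literal character
    while stack:
        output.append(stack.pop())
    return "".join(output)

print(insert_concat("a(b|c)*d"))  # a.(b|c)*.d
print(to_postfix("a(b|c)*d"))     # abc|*.d.
```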

Thompson NFA method cont.
[Figure: NFA fragments for a single character (char), for the OR of two sub-expressions (exp | exp), and for the Kleene star of a sub-expression (exp*)]

Thompson NFA method cont.
• 3. Build the NFA
[Figure: the NFA built for a(b|c)*d]
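Step 3 can be sketched in Python as well: read the postfix string abc|*.d. from step 2 left to right, keep a stack of NFA fragments, and wire them together with epsilon (empty-string) edges for each operator. The resulting NFA is then simulated by tracking the set of all states it could currently be in, rather than backtracking. This sketch assumes only the three operators used on the slides (., |, *); the `State` class and function names are hypothetical.

```python
class State:
    """An NFA state: at most one character edge plus any number of epsilon edges."""
    def __init__(self, char=None):
        self.char = char     # label of the outgoing character edge, or None
        self.out = None      # target of that character edge
        self.eps = []        # epsilon (empty-string) edges
        self.accept = False

def build_nfa(postfix: str) -> State:
    """Thompson-style construction from a postfix regex using '.', '|' and '*'."""
    stack = []  # stack of (start, end) fragments
    for tok in postfix:
        if tok == ".":
            s2, e2 = stack.pop()
            s1, e1 = stack.pop()
            e1.eps.append(s2)            # glue fragment 1 to fragment 2
            stack.append((s1, e2))
        elif tok == "|":
            s2, e2 = stack.pop()
            s1, e1 = stack.pop()
            start, end = State(), State()
            start.eps += [s1, s2]        # branch into either alternative
            e1.eps.append(end)
            e2.eps.append(end)
            stack.append((start, end))
        elif tok == "*":
            s1, e1 = stack.pop()
            start, end = State(), State()
            start.eps += [s1, end]       # enter the loop or skip it entirely
            e1.eps += [s1, end]          # repeat the body or leave
            stack.append((start, end))
        else:
            start, end = State(tok), State()
            start.out = end              # single character edge
            stack.append((start, end))
    start, end = stack.pop()
    end.accept = True
    return start

def epsilon_closure(states):
    """All states reachable from `states` through epsilon edges alone."""
    seen = set(states)
    stack = list(states)
    while stack:
        for nxt in stack.pop().eps:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def full_match(start: State, text: str) -> bool:
    """Simulate the NFA on the whole text, tracking every state it could be in."""
    current = epsilon_closure({start})
    for ch in text:
        current = epsilon_closure({s.out for s in current if s.char == ch})
    return any(s.accept for s in current)

nfa = build_nfa("abc|*.d.")  # postfix form of a(b|c)*d
for word in ["ad", "abd", "abcbcd", "abx", "a"]:
    print(word, full_match(nfa, word))
```

Because the simulation advances a set of states one character at a time, its running time grows linearly with the input length, with no backtracking.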

Problems for regex
• NLP

• Unicode vs. ASCII
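One concrete form of the Unicode vs. ASCII problem, shown with Python's `re` (other engines, including Perl's, have their own rules): in Python 3, \w in a str pattern matches Unicode letters by default, while the re.ASCII flag or an explicit [A-Za-z] class silently splits words that contain accented letters. The sample text is arbitrary.

```python
import re

text = "naïve café, Straße"

unicode_words = re.findall(r"\w+", text)                 # Unicode-aware \w (Python 3 default)
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)   # \w restricted to [A-Za-z0-9_]
bracket_words = re.findall(r"[A-Za-z]+", text)           # explicit ASCII-only class

print(unicode_words)   # ['naïve', 'café', 'Straße']
print(ascii_words)     # ['na', 've', 'caf', 'Stra', 'e']
print(bracket_words)   # ['na', 've', 'caf', 'Stra', 'e']
```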

Some examples of Regex
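• ([^\s]+(\.(?i)(jpg|png|gif|bmp))$)
– Match a file name with specific extensions
• ^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
– Match a URL
• /^#?([a-f0-9]{6}|[a-f0-9]{3})$/
– Match a hex color value (#rgb or #rrggbb)
• [ -~]
– An interesting one: the range from space to tilde covers every printable ASCII character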
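The examples can be tried in Python with one adaptation: the file-extension pattern's global (?i) flag is rewritten as a scoped (?i:...) group, because Python 3.11 and later reject global inline flags that are not at the start of the pattern. The constant names, the `found` helper, and the sample inputs are hypothetical.

```python
import re

# Backslashes restored; (?i)(...) changed to the scoped form (?i:...) for Python.
IMAGE_FILE = re.compile(r"([^\s]+(\.(?i:jpg|png|gif|bmp))$)")
URL = re.compile(r"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$")
HEX_VALUE = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")
PRINTABLE_ASCII = re.compile(r"[ -~]")  # one character from space (0x20) to tilde (0x7E)

def found(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None

print(found(IMAGE_FILE, "holiday.JPG"))                        # True (extension is case-insensitive)
print(found(IMAGE_FILE, "notes.txt"))                          # False
print(found(URL, "https://www.example.com/docs"))              # True
print(found(HEX_VALUE, "#a3c113"), found(HEX_VALUE, "fff"))    # True True
print(found(HEX_VALUE, "#4d82h4"))                             # False ('h' is not a hex digit)
print("".join(PRINTABLE_ASCII.findall("naïve café")))         # nave caf
```

The URL pattern nests one quantifier inside another, ([...]*)*, so on long inputs that almost match it can backtrack heavily; it is fine for quick checks like the ones above.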
