Introduction to regular expressions

A quick-start introduction to the world of regular expressions, through special characters, quantifiers and character classes.

Assumes no knowledge of regular expressions.



  • Plurals
  • There's a lot of shorthand when talking about Perl. e.g. Array of Arrays. I'll try to avoid this shorthand.
  • See handout
  • Reject any match where the cursor is not now at the end of the input
  • There are a load on your handout

Presentation Transcript

  • Quick Intro to Regexen, by Brian McCauley (nobull)
  • About this talk
    • For Perl Newbies
    • For Regex Newbies
    • Assumes programming experience
    • Only scratches surface
      • Full tutorial could last days
    • Takes some liberties
    • Somewhat revised compared to proceedings
    • Not suitable for world authorities!
  • What is a RE?
    • Compact description of a set of strings
    • Notation does not a regex make
    • We're talking Perl notation
  • Truly “Regular”?
    • “Regular expression” from formal language theory
    • True regular expressions only a tiny subset of what we commonly mean
    • Perl 5 (Java, Ruby, etc.) regexes are perhaps better called “patterns”
      • I'll tend to use the terms interchangeably
  • Notational aside
    • Perl patterns conventionally written between //
    • One writes “the pattern /foo/”
      • Looks just like pattern match operator
      • But it's not
    • I'm talking about the pattern
    • I'm not talking about the match operator
  • Simple regex syntax
    • Literal characters / tokens match a literal
      • Alphanumerics
      • Escaped non-alphanumerics
      • (Most) double-quotish escapes
    • Anything else may have special meaning
      • Without specials, a pattern describes one string
    • Concatenation is concatenation
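The literal-matching rules above can be sketched in a few lines. The talk uses Perl notation. This sketch uses Python's `re` module instead, on the assumption that its syntax agrees with Perl's for these basic constructs. The sample strings are made up for illustration.

```python
import re

# Alphanumerics match themselves; concatenating tokens matches the
# concatenation of what each token matches.
plain = re.search(r"foo", "a foobar")

# An escaped non-alphanumeric matches that literal character.
literal_dot = re.search(r"3\.14", "pi is 3.14")
no_literal_dot = re.search(r"3\.14", "pi is 3x14")

# Unescaped, '.' has a special meaning and matches the 'x' too.
special_dot = re.search(r"3.14", "pi is 3x14")

# A double-quote-style escape such as \t matches a tab character.
tab = re.search(r"a\tb", "a\tb")
```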
  • “Matches” v “Describes”
    • Initially said “RE describes a set of strings”
    • Why do I keep saying “matches”?
    • Can also think of a pattern as a bit of code
      • Passed an input string (and a cursor)
      • Locates string described by the RE (following the cursor)
      • May also record additional information
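The "pattern as a bit of code" view can be made concrete. This is only a sketch, using Python's `re` module as a stand-in for Perl's matching engine: `search` is handed an input string and a starting cursor, and the returned match object carries the extra recorded information. The input text is hypothetical.

```python
import re

pattern = re.compile(r"ab")
text = "xxabyyab"

# search() takes the input string and a cursor (pos) and locates the
# first described string at or after that cursor.
first = pattern.search(text, 0)

# Restart from where the previous match ended.
second = pattern.search(text, first.end())

# The match object records additional information: start and end.
spans = [first.span(), second.span()]
```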
  • “Matches” v “Matches”
    • People use “matches” loosely
    • Shorthand terminology
      • Usually clear from context
      • Confusion if shorthand taken literally
  • Alternation
    • Match “this or that”
    • Lower precedence than concatenation
    • Parentheses DWIM
    • Grouping with parentheses has a side-effect
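A short sketch of the precedence rule and the side effect of grouping, written with Python's `re` module on the assumption that it handles alternation the same way as Perl here. The patterns are made-up examples, not the ones from the slides.

```python
import re

# Alternation binds more loosely than concatenation, so this is
# "foo" or "bar", not "fo", then "o" or "b", then "ar".
loose = re.compile(r"foo|bar")

# Parentheses limit the alternation to the part you meant.
grouped = re.compile(r"fo(o|b)ar")

# The side effect: the parentheses also capture what they matched.
captured = grouped.fullmatch("fobar").group(1)
```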
  • Character classes
    • Alternation of a single token (character)
    • Negation
      • /[^ac]/ any single character other than 'a' or 'c'
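As a quick illustration of a class behaving like a single-character alternation, and of negation, here is a sketch using Python's `re` module (assumed equivalent to Perl for these constructs) on a made-up input word.

```python
import re

# [abc] behaves like the alternation a|b|c of single characters.
class_hits = re.findall(r"[abc]", "cabbage")
alt_hits = re.findall(r"a|b|c", "cabbage")

# A leading ^ negates the class: anything other than 'a' or 'c'.
negated_hits = re.findall(r"[^ac]", "cabbage")
```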
  • Shorthand character classes
    • The (almost) universal class
      • Sometimes any character at all (depends on switches)
    • “Well known” classes
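The transcript does not list the shorthand classes themselves, so the sketch below assumes the usual ones: `.`, `\d`, `\w` and `\s`. It uses Python's `re` module, where the switch that makes `.` match a newline is spelled `re.DOTALL`.

```python
import re

# '.' matches any character except a newline by default...
dot_default = re.findall(r".", "a\nb")
# ...and any character at all when the "s" switch is on.
dot_all = re.findall(r".", "a\nb", flags=re.DOTALL)

# Well-known classes: \d digit, \w word character, \s whitespace.
digits = re.findall(r"\d", "room 42b")
words = re.findall(r"\w+", "room 42b")
spaces = re.findall(r"\s", "room 42b")
```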
  • Character encoding
    • Beyond chr(127) “DWIM” gets complicated!
      • Locales, Unicode (the utf8 flag)
      • Exact version of Perl
      • Cited as one of the most annoying features in Perl
  • Quantifiers
    • Match a number of repeats of pattern
    • Pattern, not string, repeated
    • Range (can be open-ended)
    • Precedence
  • Quantifiers
    • Shorthand forms for well known ranges
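The quantifier points on these two slides can be checked with a small sketch. It uses Python's `re` module, assumed to agree with Perl on `{m,n}`, `?`, `*` and `+`. The patterns are illustrative only.

```python
import re

# {2,3}: between two and three repeats of the preceding pattern.
bounded = re.fullmatch(r"a{2,3}", "aaa")
too_many = re.fullmatch(r"a{2,3}", "aaaa")

# {2,}: an open-ended range, two or more.
open_ended = re.fullmatch(r"a{2,}", "aaaaaa")

# The pattern is repeated, not the string it matched: each repeat
# of [ab] may match a different character.
pattern_repeated = re.fullmatch(r"[ab]{3}", "aba")

# A quantifier binds tighter than concatenation: ab+ means a(b+).
tight = re.fullmatch(r"ab+", "abbb")
not_abab = re.fullmatch(r"ab+", "abab")
grouped = re.fullmatch(r"(ab)+", "abab")

# Shorthand forms: ? is {0,1}, * is {0,}, + is {1,}.
optional = re.fullmatch(r"colou?r", "color")
```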
  • Best match
    • Theoretical RE just defines a set of strings
    • Matching in Perl also says what it matched
      • But a lot of possible matches
      • 19 in all!
    • Choose the first match found
      • For some definition of “first”
  • First match
    • Must match complete pattern
    • First starting position in input
    • First choice in alternation
    • Most repeats in repeat
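The slide's own example (the one with 19 possible matches) is not in the transcript, so the sketch below uses made-up inputs to show each "first match" rule in turn. It relies on Python's `re` module, a backtracking engine assumed to pick matches in the same order as Perl for these cases.

```python
import re

# First starting position wins, even though a later start would
# give a longer run of b's.
leftmost = re.search(r"b+", "abbabbbb").group()

# At a given position, the first alternative that lets the whole
# pattern match wins, even if a later one would match more.
first_alt = re.search(r"foo|foobar", "foobar").group()

# Within a repeat, as many repeats as possible are tried first.
most_repeats = re.search(r"a*", "aaab").group()

# The complete pattern must still match, so a preferred choice is
# abandoned if it leads to failure further on.
backed_off = re.search(r"(foo|foobar)$", "foobar").group()
```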
  • Non-greedy
    • Usual rule “as many repeats as possible”
    • Can also go for the fewest
    • Only useful in the context of a larger expression
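A minimal sketch of greedy versus non-greedy repeats, using Python's `re` module (which shares the `*?` notation with Perl) and a hypothetical input string. The last line shows why a non-greedy repeat is only useful inside a larger expression.

```python
import re

text = "<b>bold</b>"

# Greedy: .* takes as many characters as possible.
greedy = re.search(r"<.*>", text).group()

# Non-greedy: .*? takes as few as possible.
lazy = re.search(r"<.*?>", text).group()

# On its own, a non-greedy repeat is satisfied by nothing at all.
alone = re.search(r".*?", text).group()
```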
  • Greedy but impatient
    • Remember (non-)greediness is local
    • This is sometimes called “eager” or “impatient”
      • I've got a complete match so take it
    • But “must match whole pattern” still applies
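To illustrate that (non-)greediness is a local decision and never overrides the other rules, here is a sketch with made-up patterns. It uses Python's `re` module, assumed to behave like Perl in these cases.

```python
import re

# Non-greedy does not mean "shortest overall match". The starting
# position is chosen first; the repeat then grows only as far as it
# must for the rest of the pattern to match.
eager = re.search(r"a+?b", "aaab").group()

# Greediness is decided per repeat, left to right: the first greedy
# repeat takes everything and leaves nothing for the second.
greedy_first, greedy_rest = re.search(r"(a+)(a*)", "aaa").groups()

# Making only the first repeat non-greedy changes the split, not
# the overall match.
lazy_first, lazy_rest = re.search(r"(a+?)(a*)", "aaa").groups()
```

The shorter candidate "ab" exists in the first input, but it starts later, so it is never considered.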
  • Anchors
    • Zero-width assertions - match the empty string
    • Only where something that I assert holds true
      • Gross simplification!
    • These assertions also called “anchors”
      • Using term “anchor” for the more complex zero-width assertions can result in false expectations
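The slide does not name specific anchors, so this sketch assumes the common ones: `^`, `$` and `\b`. It uses Python's `re` module to show that they assert a position without consuming any input.

```python
import re

# ^ and $ consume no characters; they only assert a position.
start = re.search(r"^foo", "foobar")
not_start = re.search(r"^bar", "foobar")
end = re.search(r"bar$", "foobar")

# \b asserts a word boundary, also without consuming anything.
word = re.search(r"\bcat\b", "a cat sat")
inside = re.search(r"\bcat\b", "concatenate")

# Zero width: a bare anchor matches the empty string at a position.
empty = re.search(r"$", "foobar")
```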
  • Capturing
    • Match can return more than overall position
    • Records last cursor position at each ( )
    • “captures” the bit between
      • $1='g'
      • $2='34'
      • $3='3'
    • There's an overhead so can group without capture
    • (Slide figure: the three capture groups, labelled 1, 2 and 3)
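The pattern and input from the slide are not in the transcript. The sketch below uses a hypothetical pattern and input chosen only because they reproduce the three values listed, with Python's `re` module standing in for Perl's `$1`, `$2` and `$3`.

```python
import re

# Hypothetical pattern: groups are numbered by their opening
# parenthesis, so the nested group is number 3.
m = re.search(r"([a-z])((\d)\d)", "g34")
captures = m.groups()

# Only the last cursor positions are recorded for a repeated group.
last_only = re.search(r"(\d)+", "34").group(1)

# (?:...) groups without capturing, avoiding the overhead.
uncaptured = re.search(r"(?:[a-z])(\d\d)", "g34").groups()
```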
  • Back references
    • Match whatever a previous capture matched
    • (Slide annotations: 2nd capture, any single character; as few characters as possible; the character we captured before)
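The slide's pattern did not survive in the transcript. A simplified, hypothetical pattern with the same three ingredients (capture any single character, skip as few characters as possible, then match the captured character again) might look like this in Python's `re` module, which writes back references as `\1`, `\2` and so on, as Perl does.

```python
import re

# (.)  capture any single character
# .*?  as few characters as possible
# \1   the character we captured before
m = re.search(r"(.).*?\1", "xabcay")

repeated_char = m.group(1)
whole_match = m.group()
```

Starting at "x" fails because no second "x" follows, so the match begins at the first "a" instead.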
  • Switches
    • Vagueness earlier
    • Controlled by switches
      • Usually referred to as /i /m /x and /s
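The four switches can be sketched as follows. Python's `re` module spells them as flags (`re.IGNORECASE`, `re.MULTILINE`, `re.VERBOSE`, `re.DOTALL`) rather than trailing letters. Treating them as equivalents of Perl's /i, /m, /x and /s is an assumption that holds for these simple cases.

```python
import re

# /i: case-insensitive matching.
i_flag = re.search(r"perl", "PERL", flags=re.IGNORECASE)

# /m: ^ and $ also match at the start and end of each line.
m_flag = re.findall(r"^\w+", "one\ntwo", flags=re.MULTILINE)
no_m = re.findall(r"^\w+", "one\ntwo")

# /s: '.' also matches a newline.
s_flag = re.search(r"a.b", "a\nb", flags=re.DOTALL)
no_s = re.search(r"a.b", "a\nb")

# /x: whitespace and comments inside the pattern are ignored.
x_flag = re.search(r"""
    \d+   # integer part
    \.    # literal dot
    \d+   # fraction
""", "v 3.14", flags=re.VERBOSE)
```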
  • The rest!
    • This is only a tiny subset
    • Lots more assertions
    • The Perl substitution operator s///
    • Naming your captures
    • Embedding Perl code in your regex
    • Creating complex grammars by defining named subpatterns and using them later
    • It would take an hour just to enumerate them!
  • Live floor show
    • Requests?
    • Questions?