Introduction to regular expressions

A quick-start introduction to the world of regular expressions, through special characters, quantifiers, character classes...

Assumes no knowledge of regular expressions.


Usage Rights

© All Rights Reserved

  • Plurals
  • There's a lot of shorthand when talking about Perl. e.g. Array of Arrays. I'll try to avoid this shorthand.
  • See handout
  •  – reject any match where the cursor is not now at the end of the input
  • There are loads on your handout

Introduction to regular expressions: Presentation Transcript

  • Quick Intro to Regexen, by Brian McCauley (nobull), Birmingham.pm
  • About this talk
    • For Perl Newbies
    • For Regex Newbies
    • Assumes programming experience
    • Only scratches surface
      • Full tutorial could last days
    • Takes some liberties
    • Somewhat revised compared to proceedings
    • Not suitable for world authorities!
  • What is a RE?
    • Compact description of a set of strings
    • Notation does not a regex make
    • We're talking Perl notation
  • Truly “Regular”?
    • “Regular expression” from formal language theory
    • True regular expressions only a tiny subset of what we commonly mean
    • Perl5 (Java, Ruby, etc.) regexes are perhaps better called “patterns”
      • I'll tend to use the terms interchangeably
  • Notational aside
    • Perl patterns conventionally written between //
    • One writes “the pattern /foo/”
      • Looks just like pattern match operator
      • But it's not
    • I'm talking about the pattern
    • I'm not talking about the match operator
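
The distinction between a pattern and the match operator can be sketched in Perl as below. This is only an illustration: the input string 'food' and the variable names are assumptions, not taken from the slides.

```perl
use strict;
use warnings;

# qr// builds a pattern object; nothing is matched at this point.
my $pattern = qr/foo/;

# The match operator (=~) applies a pattern to an input string.
my $input   = 'food';                          # illustrative input
my $matched = ($input =~ $pattern) ? 1 : 0;

# The same pattern can be embedded in a larger pattern.
my $longer  = ($input =~ /${pattern}d/) ? 1 : 0;

print "matched=$matched longer=$longer\n";
```

Writing “the pattern /foo/” refers to what `$pattern` holds, not to the act of applying it.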
  • Simple regex syntax
    • Literal characters / tokens match a literal
      • Alphanumerics
      • Escaped non-alphanumerics
      • (Most) double-quotish escapes
    • Anything else may have special meaning
      • Without specials, a pattern describes one string
    • Concatenation is concatenation
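
A small sketch of the literal-matching rules above. The input strings are made up for illustration, and the choice of '.' as the example special character is an assumption.

```perl
use strict;
use warnings;

my $text = "cost: 3.50\tUSD";   # illustrative input containing a tab

my $alnum   = ($text =~ /cost/)  ? 1 : 0;   # alphanumerics match themselves
my $escaped = ($text =~ /3\.50/) ? 1 : 0;   # escaped '.' matches a literal dot
my $tab     = ($text =~ /\tUSD/) ? 1 : 0;   # \t means a tab, as in a double-quoted string
my $concat  = ($text =~ /50\tU/) ? 1 : 0;   # tokens in sequence match characters in sequence

# An unescaped '.' is special: it also matches characters other than a dot.
my $special = ('3x50' =~ /3.50/)  ? 1 : 0;
my $literal = ('3x50' =~ /3\.50/) ? 1 : 0;

print "$alnum $escaped $tab $concat $special $literal\n";
```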
  • “Matches” v “Describes”
    • Initially said “RE describes a set of strings”
    • Why do I keep saying “matches”?
    • Can also think of a pattern as a bit of code
      • Passed an input string (and a cursor)
      • Locates string described by the RE (following the cursor)
      • May also record additional information
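
One way to see the “input string and a cursor” view in Perl is the /g switch in scalar context, which resumes from a position stored with the string. This is a minimal sketch; the input 'aXbXc' is an assumption chosen for illustration.

```perl
use strict;
use warnings;

my $input = 'aXbXc';   # illustrative input
my @found;

# Each iteration resumes matching from the cursor left by the previous one.
while ($input =~ /X/g) {
    # $-[0] is where this match started; pos() is the cursor after it.
    push @found, $-[0] . '-' . pos($input);
}

print "@found\n";
```

The start offset recorded in `@-` is an example of the additional information a match can record.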
  • “Matches” v “Matches”
    • People use “matches” loosely
    • Shorthand terminology
      • Usually clear from context
      • Confusion if shorthand taken literally
  • Alternation
    • Match “this or that”
    • Lower precedence than concatenation
    • Parentheses DWIM
    • Grouping with parentheses has a side-effect
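
A short sketch of alternation precedence and of the side-effect of grouping. The words and patterns are illustrative assumptions, not the slide's own examples.

```perl
use strict;
use warnings;

# Alternation binds more loosely than concatenation:
# /gra|ey/ means "gra" or "ey", not "gr" followed by "a" or "e", then "y".
my $loose = ('grab' =~ /gra|ey/)   ? 1 : 0;

# Parentheses limit the alternation to the part you mean.
my $tight = ('grab' =~ /gr(a|e)y/) ? 1 : 0;

# Side-effect of grouping: the parentheses also capture what they matched.
my $captured = ('grey' =~ /gr(a|e)y/) ? $1 : '';

print "loose=$loose tight=$tight captured=$captured\n";
```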
  • Character classes
    • Alternation of a single token (character)
    • Negation
      • /[^ac]/ any single character other than 'a' or 'c'
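
To illustrate, a character class behaves like an alternation of single characters, and `^` at the start of the class negates it. The word list below is a made-up example.

```perl
use strict;
use warnings;

my @words = qw(bat bbt bct bdt);   # illustrative inputs

my @class   = grep { /b[ac]t/  } @words;   # 'a' or 'c' in the middle
my @alt     = grep { /b(a|c)t/ } @words;   # the same set, written as alternation
my @negated = grep { /b[^ac]t/ } @words;   # any single character except 'a' or 'c'

print "class=@class alt=@alt negated=@negated\n";
```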
  • Shorthand character classes
    • The (almost) universal class
      • Sometimes any character at all (depends on switches)
    • “Well known” classes
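
A minimal sketch, assuming the (almost) universal class is `.` and that the “well known” classes include `\d`, `\w` and `\s`; the transcript does not list them, so that selection is an assumption.

```perl
use strict;
use warnings;

my $two_lines = "a\nb";   # illustrative input with an embedded newline

# By default '.' matches any character except a newline.
my $dot_default = ($two_lines =~ /a.b/)  ? 1 : 0;
# With the /s switch '.' matches any character at all.
my $dot_s       = ($two_lines =~ /a.b/s) ? 1 : 0;

my $digit = ('x7'  =~ /x\d/)  ? 1 : 0;   # \d: a digit
my $word  = ('a_1' =~ /a\w1/) ? 1 : 0;   # \w: letter, digit or underscore
my $space = ('a b' =~ /a\sb/) ? 1 : 0;   # \s: whitespace

print "$dot_default $dot_s $digit $word $space\n";
```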
  • Character encoding
    • Beyond chr(127) “DWIM” gets complicated!
      • Locales, Unicode (the utf8 flag)
      • Exact version of Perl
      • Cited as one of the most annoying features in Perl
  • Quantifiers
    • Match a number of repeats of pattern
    • Pattern, not string, repeated
    • Range (can be open-ended)
    • Precedence
  • Quantifiers
    • Shorthand forms for well known ranges
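
The quantifier rules can be sketched as follows. The inputs are illustrative, and `^` and `$` are used here only to force the whole string to be considered.

```perl
use strict;
use warnings;

# The pattern is repeated, not the string it first matched: three different digits are fine.
my $pattern_repeated = ('123' =~ /^\d{3}$/) ? 1 : 0;

# Ranges, closed and open-ended.
my $closed = ('aaaa' =~ /^a{2,3}$/) ? 1 : 0;   # four is too many
my $open   = ('aaaa' =~ /^a{2,}$/)  ? 1 : 0;   # two or more

# Precedence: a quantifier applies to the token just before it.
my $single  = ('abb'  =~ /^ab{2}$/)   ? 1 : 0;   # 'a' then two 'b'
my $grouped = ('abab' =~ /^(ab){2}$/) ? 1 : 0;   # 'ab' twice

# Shorthand forms: ? is {0,1}, * is {0,}, + is {1,}.
my $optional = ('color' =~ /^colou?r$/) ? 1 : 0;
my $star     = (''      =~ /^a*$/)      ? 1 : 0;
my $plus     = (''      =~ /^a+$/)      ? 1 : 0;

print "$pattern_repeated $closed $open $single $grouped $optional $star $plus\n";
```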
  • Best match
    • Theoretical RE just defines a set of strings
    • Matching in Perl also says what it matched
      • But a lot of possible matches
      • 19 in all!
    • Choose the first match found
      • For some definition of “first”
  • First match
    • Must match complete pattern
    • First starting position in input
    • First choice in alternation
    • Most repeats in repeat
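
The “first match” rules can be observed one at a time. The inputs below are assumptions chosen to isolate each rule; they are not the example that produced the 19 candidate matches on the slide.

```perl
use strict;
use warnings;

# First starting position wins, even over the order of the alternatives.
my ($leftmost) = 'cat dog' =~ /(dog|cat)/;

# At the same starting position, the first alternative that works wins.
my ($first_alt) = 'foobar' =~ /(foo|foobar)/;

# A repeat takes as many repetitions as it can.
my ($most) = 'aaa' =~ /(a*)/;

# But the complete pattern must match, so the repeat gives one back here.
my ($gave_back) = 'aaab' =~ /(a*)ab/;

print "$leftmost $first_alt $most $gave_back\n";
```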
  • Non-greedy
    • Usual rule “as many repeats as possible”
    • Can also go for the fewest
    • Only useful in the context of a larger expression
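
A minimal sketch of greedy versus non-greedy repeats, using a made-up input string. Adding `?` after a quantifier asks for the fewest repeats.

```perl
use strict;
use warnings;

my $input = '<b>bold</b>';   # illustrative input

# Greedy: .+ runs as far as it can while still allowing the final '>' to match.
my ($greedy) = $input =~ /<(.+)>/;

# Non-greedy: .+? stops at the first point where the final '>' can match.
my ($lazy) = $input =~ /<(.+?)>/;

# On its own, a non-greedy repeat is of little use: the fewest repeats is zero.
my ($alone) = 'aaa' =~ /(a*?)/;

print "greedy=$greedy lazy=$lazy alone='$alone'\n";
```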
  • Greedy but impatient
    • Remember (non-)greediness is local
    • This is sometimes called “eager” or “impatient”
      • I've got a complete match so take it
    • But “must match whole pattern” still applies
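
To illustrate that (non-)greediness is local and that the engine takes the first complete match, here is a short sketch with assumed inputs.

```perl
use strict;
use warnings;

# Non-greedy is not "shortest overall": the match still starts at the first 'a',
# so a+? has to stretch to 'aa' before 'b' can match.
my ($local) = 'aab' =~ /(a+?b)/;

# Eager: the first alternative gives a complete match, so it is taken,
# even though a longer match exists.
my ($eager) = 'ab' =~ /(a|ab)/;

# The whole pattern must still match: with anchors, 'a' alone is not enough.
my ($forced) = 'ab' =~ /^(a|ab)$/;

print "$local $eager $forced\n";
```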
  • Anchors
    • Zero-width assertions - match the empty string
    • Only where something that I assert holds true
      • Gross simplification!
    • These assertions also called “anchors”
      • Using term “anchor” for the more complex zero-width assertions can result in false expectations
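
A sketch of some common zero-width assertions. The transcript does not name the ones on the slide, so the selection of `^`, `$`, `\z` and `\b` is an assumption.

```perl
use strict;
use warnings;

my $at_start = ('foobar' =~ /^bar/) ? 1 : 0;   # 'bar' is not at the start
my $at_end   = ('foobar' =~ /bar$/) ? 1 : 0;   # 'bar' is at the end

# \b asserts a word boundary.
my $whole_word = ('foo bar' =~ /\bbar\b/) ? 1 : 0;
my $inside     = ('foobar'  =~ /\bbar/)   ? 1 : 0;

# Zero-width: the assertion itself consumes no characters.
my $width = ('foobar' =~ /^/) ? $+[0] - $-[0] : -1;

# Why "end of input" is a simplification: $ also matches before a trailing newline.
my $dollar = ("bar\n" =~ /bar$/)  ? 1 : 0;
my $strict = ("bar\n" =~ /bar\z/) ? 1 : 0;   # \z is the very end only

print "$at_start $at_end $whole_word $inside $width $dollar $strict\n";
```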
  • Capturing
    • Match can return more than overall position
    • Records last cursor position at each ( )
    • “captures” the bit between
      • $1='g'
      • $2='34'
      • $3='3'
    • There's an overhead so can group without capture
    (Figure: the capture groups labelled 1, 2 and 3)
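
The slide's own pattern is not in the transcript. The sketch below uses a hypothetical input and pattern that happen to produce the same three captures, to show how groups are numbered and how to group without capturing.

```perl
use strict;
use warnings;

my $input = 'g34';   # hypothetical input, chosen to reproduce the captures listed above

# Groups are numbered by the position of their opening parenthesis.
my ($c1, $c2, $c3) = $input =~ /(\w)((\d)\d)/;

# (?: ... ) groups without capturing, so only one capture is returned here.
my @captures = $input =~ /(?:\w)((?:\d)\d)/;

print "1=$c1 2=$c2 3=$c3 non-capturing version: @captures\n";
```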
  • Back references
    • Match whatever a previous capture matched
    (Figure labels: 2nd capture, any single character; as few characters as possible; the character we captured before)
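
A minimal sketch of a back reference. The pattern and input are assumptions; they only follow the general shape of a captured single character, a minimal run of characters, then the captured character again.

```perl
use strict;
use warnings;

my $input = 'abcbd';   # hypothetical input

# Capture 1 is the whole match. Capture 2 is any single character.
# .*? takes as few characters as possible; \2 matches whatever capture 2 matched.
my ($whole, $char) = $input =~ /((.).*?\2)/;

print "whole=$whole char=$char\n";
```

No match can start at 'a', since 'a' never recurs, so the match starts at the first 'b'.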
  • Switches
    • Vagueness earlier
    • Controlled by switches
      • Usually referred to as /i /m /x and /s
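
The four switches named above can be sketched as follows, with illustrative inputs.

```perl
use strict;
use warnings;

my $text = "one\ntwo";   # illustrative two-line input

# /i: ignore case.
my $i = ('HELLO' =~ /hello/i) ? 1 : 0;

# /m: ^ and $ also match at line boundaries inside the string.
my $without_m = ($text =~ /^two/)  ? 1 : 0;
my $with_m    = ($text =~ /^two/m) ? 1 : 0;

# /s: '.' also matches a newline.
my $s = ($text =~ /one.two/s) ? 1 : 0;

# /x: whitespace and comments inside the pattern are ignored.
my $x = ('g34' =~ / \w      # one word character
                    \d+     # then one or more digits
                  /x) ? 1 : 0;

print "$i $without_m $with_m $s $x\n";
```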
  • The rest!
    • This is only a tiny subset
    • Lots more assertions
    • The Perl substitution operator s///
    • Naming your captures
    • Embedding Perl code in your regex
    • Creating complex grammars by defining named subpatterns and using them later
    • It would take an hour just to enumerate them!
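
Two of the items above, sketched briefly: the substitution operator and named captures. The inputs and capture names are made up, and named captures assume Perl 5.10 or later.

```perl
use 5.010;      # named captures need Perl 5.10 or later
use strict;
use warnings;

# s/// replaces what the pattern matched; /g replaces every occurrence.
my $text = 'a-b-c';                    # illustrative input
(my $swapped = $text) =~ s/-/+/g;      # copy first, then modify the copy

# Named captures: (?<name>...) stores the match in the %+ hash.
my $date = '2009-11-25';               # hypothetical input
my ($year, $month) = ('', '');
if ($date =~ /(?<year>\d{4})-(?<month>\d\d)/) {
    ($year, $month) = ($+{year}, $+{month});
}

print "swapped=$swapped year=$year month=$month\n";
```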
  • Live floor show
    • Requests?
    • Questions?