Perl Development (Sample Courseware)
 

    Presentation Transcript

    • Perl Development Sample Courseware Created October 2008 © Garth Gilmour 2008
    • Overview
      • This is a three day course
        • Core hours and breaks are flexible…
      • The course has three goals
        • Familiarize you with the Perl language
        • Learn the different styles of Perl development
          • Utility scripts, longer programs and OO applications
        • Explore commonly used Perl modules (libraries)
      • Please control the course
        • Ask as many questions as possible
        • Speed up or slow down the pace
        • Request extra examples and exercises
        • Don’t sit in misery!!
    • Introduction to Perl: History and Basic Concepts
    • Introduction to Perl
      • The Perl language was created by Larry Wall to simplify his text manipulation problems
        • It contains a superset of all the functionality provided by shell scripting, sed and awk, plus many extra features
      • The language is supported by a huge set of libraries
        • Which can be downloaded free of charge from the Comprehensive Perl Archive Network website (CPAN)
      • There are two expansions of the name:
        • Practical Extraction and Report Language
        • Pathologically Eclectic Rubbish Lister
    • Versions of Perl and Competitors
      • Version 5 is the current edition of Perl
        • It represents how far the language and interpreter could progress whilst maintaining backward compatibility
      • Perl 6 is a complete rewrite
        • Of both the language and the interpreter
          • The virtual machine being built to run Perl 6 is known as Parrot
        • It has been under development for a long time
      • Today Perl has serious competition
        • Python and Ruby are closely related languages
        • They both aim to have cleaner syntax and better OO
    • Comparing Perl, Python and Ruby
      • [Table: diamond ratings for Perl, Python and Ruby under Documentation, Library Support, OO Support, Power, Approachability, Ease of Mastery, C / C++ Interop, Java / .NET Interop and Web Frameworks]
    • Common Applications of Perl
      • There are three possible levels of Perl coding:
        • Short utility scripts
          • Written in a procedural manner
          • Without high level organization
        • Medium size programs
          • Divided into multiple subroutines grouped within modules
        • Large applications
          • Built using the concepts of object-oriented design
      • Perl is happiest at the first level
        • Historically that’s where its home is
        • Problems exist at the other levels
    • Scripts, Programs and Applications
      • Scripts
        • Core language features
      • Programs
        • Strict module
        • Subroutines
        • References
      • Applications
        • Best practices
        • Named parameters
        • Modules and classes
    • Learning Programming in Perl
      • You can start coding in Perl very quickly
        • Just place some code within a ‘.pl’ file
        • No need to create a special ‘main’ method
      • But Perl isn't a beginner's language
        • The interpreter assumes you know what you are doing
          • Many things that you expect not to compile actually do
        • E.g. ‘$fred{'barney'} = ['wilma', 'betty']’ creates:
          • A table (aka hash) called ‘fred’ with one row
          • In the single row the key is ‘barney’
          • The value is the address of a new array
          • The array has two boxes holding ‘wilma’ and ‘betty’
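The one-line statement above can be unpacked as a minimal sketch. The variable `$ref` and the printed labels are illustrative additions, and `my %fred` is only needed because the sketch runs under `use strict`; without strict the assignment alone would create the hash.

```perl
use strict;
use warnings;

# Under 'use strict' the hash is declared first; the assignment then
# creates the row and the anonymous array in one step.
my %fred;
$fred{'barney'} = ['wilma', 'betty'];

# The value stored under 'barney' is a reference (address) to an array.
my $ref = $fred{'barney'};
print "Type of value: ", ref($ref), "\n";

# Dereference to reach the two boxes.
print "First box:  $ref->[0]\n";
print "Second box: $fred{'barney'}[1]\n";
print "Row count:  ", scalar(keys %fred), "\n";
```

The `ref` function reports `ARRAY` here, which is how you can tell that the hash holds an address rather than the list itself.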
    • Common Applications of Perl
      • Perl's ‘sweet spot’ is text manipulation
        • Reading text from a file, loading it into data structures, manipulating it and generating formatted output
      • This utility is exploited in three ways
        • By system administrators managing large networks
        • By developers and QA staff writing test harnesses
        • By authors and publishers managing documents
      • Perl used to be the main language for Web Apps
        • But has been replaced by languages with better networking support and component models (mainly Java and C#)
    • Declaring Variables in Perl
      • Perl prefixes variables with special symbols
        • These are commonly known as sigils
        • In advanced coding they are used in combination
      • Sigils make Perl code appear baffling at first
        • Once you learn to read them they are a big help
      • Sigil: Description
        • ‘$’: Scalar variable - holds a single number, string or reference
        • ‘@’: Array variable - a sequence of zero or more scalars
        • ‘%’: Hash variable - a map of keys to values (both scalars)
        • ‘\’: Reference - used to specify that a scalar holds a memory address
        • ‘&’: Function - used to specify that a symbol is a function name
        • ‘*’: Typeglob - used for manipulating symbol tables (advanced)
    • Variables and Barewords
      • Forgetting to add a sigil is a common mistake
        • It can lead to unpredictable results in your code
      • An identifier without a sigil is a ‘bareword’
        • The interpreter searches for a subroutine, package name, label or file-handle with that name
        • Otherwise the identifier is assumed to be an unquoted string
      • You should never deliberately use barewords
        • The meaning of your code will change if someone introduces a function or filename with the same value
        • Consider ‘print abc "def", "ghi"’
          • The bareword ‘abc’ is understood as a filehandle
    • Variables and Symbol Tables
      • Sigils in Perl enable an unusual language feature
        • You can have more than one variable with the same name
        • As long as the variables are of different types
      • Perl stores details of variables in symbol tables
        • Each symbol name (e.g. ‘fred’) is associated with a typeglob
        • A typeglob stores the memory associated with the scalar called ‘fred’, the array and hash called ‘fred’ etc…
      • Advanced coding techniques utilize typeglobs
        • They are data types in their own right with their own sigil
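    • Symbol Table and Typeglob for ‘fred’
      • Declarations sharing the name ‘fred’
        • ‘$fred = 12;’
        • ‘@fred = (12,13,14);’
        • ‘%fred = (k1 => 12, k2 => 14);’
        • ‘sub fred { return 101; }’
      • Symbol Table (Name, Link)
        • ‘fred’ links to its typeglob
        • Others…
      • Typeglob (Type, Link)
        • One link each for ‘$’, ‘@’, ‘%’ and ‘&’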
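As a hedged sketch of the symbol table idea, the script below declares a scalar, an array, a hash and a subroutine that all share the name `fred`, then reaches two of them through the typeglob. The variables are declared with `our` because typeglobs hold package variables; the names `$scalar_slot` and `$array_slot` are illustrative.

```perl
use strict;
use warnings;

# Four different things that all share the name 'fred'.
our $fred = 12;
our @fred = (12, 13, 14);
our %fred = (k1 => 12, k2 => 14);
sub fred { return 101; }

print "Scalar: $fred\n";
print "Array:  @fred\n";
print "Hash:   $fred{k1} and $fred{k2}\n";
print "Sub:    ", fred(), "\n";

# The typeglob *fred gives access to each slot by type.
my $scalar_slot = *fred{SCALAR};   # reference to $fred
my $array_slot  = *fred{ARRAY};    # reference to @fred
print "Via typeglob: $$scalar_slot and $$array_slot[1]\n";
```

The sigil alone tells the interpreter which slot of the typeglob is meant, which is why the four declarations do not clash.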
    • Perl Comments
      • Standard Perl comments start with ‘#’
        • This comments out everything to the end of the line
      • A multi-line comment can be written as a here-document or ‘heredoc’
        • This is signified by ‘<<‘ followed by an identifier
        • There must be no space between the two
          • Unless the identifier is quoted
        • The identifier is taken as terminating the comment
          • When it appears by itself without quotes or whitespace
      • Note that comments are allowed in regexes
        • If you use the extended regular expression syntax
    • Using Heredoc Comments
        $myvar = << "THE_END";
        More than prince of cats, I can tell you. O, he is
        the courageous captain of compliments. He fights as
        you sing prick-song, keeps time, distance, and
        proportion; rests me his minim rest, one, two, and
        the third in your bosom: the very butcher of a silk
        button, a duellist, a duellist; a gentleman of the
        very first house, of the first and second cause:
        ah, the immortal passado! the punto reverso! the
        hay!
        THE_END
        print "--- START DATA ---\n", $myvar, "--- END DATA ---\n";
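The point about comments inside regular expressions can be illustrated with a small sketch using the `/x` modifier. The log line and the variable names are hypothetical and chosen only for the example.

```perl
use strict;
use warnings;

my $line = "2008-10-17 backup completed";   # hypothetical log line

# The /x modifier lets whitespace and '#' comments appear inside the pattern.
my ($year, $month, $day);
if ($line =~ m/
        ^(\d{4})    # four-digit year
        -(\d{2})    # two-digit month
        -(\d{2})    # two-digit day
    /x) {
    ($year, $month, $day) = ($1, $2, $3);
    print "Year $year, month $month, day $day\n";
}
```

Note that a comment inside the pattern must not contain the delimiter character (here the forward slash), because it would end the pattern early.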
    • The Perl Language and Modules
      • Perl libraries are organized as ‘modules’
        • One of Perl’s strengths is their number and variety
        • All of them can be downloaded from CPAN
      • The distinction between Perl and its libraries is fuzzy
        • Pragmatic Modules supplement the core language
        • By interacting with the interpreter through symbol tables
      • An important library for large projects is ‘strict’
        • This causes a range of extra checks to be run on your code
        • They can all be run or enabled on an individual basis
    • The Strict Module
      • Strict is a module that changes how you write code
        • It performs extra checks that change what counts as valid Perl
        • Including it is a best practice for large scripts
      • The checks can be enabled selectively
        • E.g. ‘use strict "vars"’ turns on explicit declarations
      • Example Declaration: Description
        • ‘use strict 'vars'’: User defined variables must be declared by:
          • Using the ‘my’ or ‘our’ declarations
          • Prefixing the variable with a package name
        • ‘use strict 'refs'’: Symbolic references (an obsolete feature) are not allowed
        • ‘use strict 'subs'’: Barewords are treated as syntax errors, rather than being interpreted as subroutine names or unquoted strings
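A rough sketch of two of these checks in action follows. It uses `eval` to trap the errors so that the script keeps running; the variable names (`$undeclared`, `$name`, `$sym` and the `*_ok` / `*_error` holders) are illustrative, and the exact error wording depends on the Perl version.

```perl
use strict;
use warnings;

my $total = 10;     # declared with 'my', so 'strict vars' is satisfied

# An undeclared variable is rejected when the snippet is compiled.
my $vars_ok    = eval q{ $undeclared = 5; 1 };
my $vars_error = $@;

# A symbolic reference (a string used as a variable name) is rejected at run time.
our $name = "value";
my $sym = "name";
my $refs_ok    = eval { my $copy = ${$sym}; 1 };
my $refs_error = $@;

print "strict vars: $vars_error";
print "strict refs: $refs_error";
```

In a normal script these errors would stop the program, which is the intended effect: mistakes surface immediately instead of silently creating new variables.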
    • Basic Programming: The Core Perl Syntax
    • Introducing Scalar Variables
      • A scalar variable is a single box
        • It is prefixed by the dollar sigil
      • Perl is a weakly typed language
        • A box may hold a number, a string or a memory address
        • If the scalar holds an address it is known as a ‘reference’
      • Type conversions occur automatically
        • So in the expression ‘$var1 = $var2 + $var3’ both operands are converted to numbers before being added together
      • As with all variables, scalars are created on demand
        • You don’t need to declare them separately
    • Scalar Variables and Operators
      • In strongly typed languages operators are overloaded
        • So ‘var1 + var2’ would add the variables if they were numbers but concatenate them if they were strings
        • If the types weren't matched there would be a compiler error
      • Weak typing means Perl cannot pick the operation from the operand types
        • Instead there must be an operator for each operation
        • In the case of addition:
          • The ‘+’ operator means add as numbers
          • The ‘.’ operator means concatenate
          • Conversions are made as required
    • Operators Commonly Used in Perl
      • Description: Number Version, String Version
        • Addition: ‘$var1 + $var2’ (number), ‘$var1 . $var2’ (string)
        • Equality: ‘$var1 == $var2’ (number), ‘$var1 eq $var2’ (string)
        • Ordered Comparison: ‘<’, ‘>’, ‘<=’, ‘>=’ (number), ‘lt’, ‘gt’, ‘le’, ‘ge’ (string)
        • Power Of: ‘$var1 ** 3’ (number), ‘$var1 x 3’ (string repetition)
        • Bitwise Comparison: ‘&’, ‘|’, ‘^’ (NB work differently for numbers and strings)
        • Logical: ‘&&’, ‘||’, ‘!’ or ‘and’, ‘or’, ‘not’ (lower precedence)
        • Conditional: ‘$var1 = $var2 ? 12 : 14;’
        • Range: ‘1..4’ (number), ‘'D' .. 'Z'’ (string)
      • Example:
        $num1 = 42;
        $num2 = "42";
        $result = $num1 + $num2;
        print "adding numbers gives $result", "\n"x2;
        $result = $num1 . $num2;
        print "adding strings gives $result", "\n"x2;
        $result = $num2 ** 3;
        print "42 to the power of 3 is $result", "\n"x2;
        $result = $num1 x 3;
        print "42 concatenated with itself three times is $result", "\n"x2;
        if($num1 == $num2) {
            print '$num1 and $num2 are equal as numbers', "\n"x2;
        }
        if($num1 eq $num2) {
            print '$num1 and $num2 are equal as strings', "\n"x2;
        }
    • String Values in Detail
      • Strings may be placed in single or double quotes
        • They have different meanings and are not interchangeable
      • Single-quoted strings are not treated specially
        • The interpreter sees them as a plain sequence of characters
      • Double quotes cause variable interpolation
        • Perl searches for ‘$’ and ‘@’ sigils in the string and replaces each variable with its value (creating it if required)
      • Note that you can also use ‘backtick’ quotes
        • These surround a string to be run as an OS command
        • E.g. ‘$var1 = `ls -al`’ runs the UNIX list command and stores the results in the variable ‘$var1’
      • Example:
        $var1 = "abc";
        $var2 = 123;
        $var3 = ["ab","cd","ef"];
        print 'Values are $var1, $var2 and $var3', "\n";
        print "Values are $var1, $var2 and $var3\n";
        $path = `set path`;
        print "Value of path environment variable is: $path";
      • Output:
        Values are $var1, $var2 and $var3
        Values are abc, 123 and ARRAY(0x225e28)
        Value of path environment variable is:
        PATH=c:\jdk1.5.0_05\bin;C:\Perl\site\bin;C:\Perl\bin;c:\ruby\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;
    • What is Truth in Perl?
      • Truth is a source of confusion in Perl
        • Many Perl functions return boolean values
        • But Perl does not have a boolean type
      • Three things in Perl count as false
        • The empty string
        • The string or number “0”
        • An undefined variable
      • You can obtain the undefined value by:
        • Using a variable that has not been initialized
        • Passing a variable as an argument to ‘undef’
        • Using the return value from ‘undef’
      • Example:
        $var1 = "0";      # The string "0" counts as false
        $var2 = "";       # The empty string also counts as false
        $var3 = "AB";     # Other string values are true
        $var4 = -12;      # Other numerical values are true
        $var5;            # Undefined values are false
        $var6 = undef();  # Values set to undef are false
        printTruth('$var1',$var1);
        printTruth('$var2',$var2);
        printTruth('$var3',$var3);
        printTruth('$var4',$var4);
        printTruth('$var5',$var5);
        printTruth('$var6',$var6);
        undef($var4);     # Release storage space for var4 so it becomes undefined
        printTruth('$var4',$var4);
        sub printTruth {
            my ($varName,$varValue) = @_;
            if($varValue) {
                print "$varName is true\n";
            } else {
                print "$varName is false\n";
            }
        }
    • Special Scalar Variables
      • The Perl interpreter automatically creates variables
        • These variables are represented by symbols rather than names
      • This is a further source of confusion when learning Perl
        • The ‘English’ module renames the variables more clearly
      • Variable Name: Description
        • ‘$]’: The version of Perl supported by this interpreter
        • ‘$0’: The name of the file containing the current script
        • ‘$^O’: The name of the operating system
        • ‘$_’: The current item (used in input, output and loops)
        • ‘$/’: The line separator used when reading text (default is newline)
    • Special Scalar Variables
        print "This is version $] of Perl\n";
        print "Running on the $^O operating system\n";
        print "The current script is $0\n";
        @myarray = ("ab","cd","ef","gh");
        print "Elements are: ";
        foreach(@myarray) {
            print " $_ ";
        }
      • Output:
        This is version 5.008008 of Perl
        Running on the MSWin32 operating system
        The current script is C:\perl\specialScalars.pl
        Elements are: ab cd ef gh
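The renaming offered by the ‘English’ module can be sketched as below. The long names are the ones the core module exports; the `-no_match_vars` import option and the sample list are choices made for this example rather than anything taken from the slides.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

print "Perl version:     $OLD_PERL_VERSION\n";   # same as $]
print "Operating system: $OSNAME\n";             # same as $^O
print "Current script:   $PROGRAM_NAME\n";       # same as $0

foreach (qw(ab cd ef gh)) {
    print " $ARG";                               # same as $_
}
print "\n";

# $INPUT_RECORD_SEPARATOR is the long name for $/
my $separator_is_newline = ($INPUT_RECORD_SEPARATOR eq "\n") ? "yes" : "no";
print "Line separator is newline: $separator_is_newline\n";
```

The long names are aliases, so code may mix both spellings, although sticking to one style is easier to read.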
    • Reading Text Into a Scalar Variable
      • Reading text into a scalar is simple
        • The expression ‘$line = <INPUT>’ reads a line of text from INPUT and stores it in the scalar ‘line’
      • The symbol within the angle brackets is a handle
        • Handles are links to resources outside your program
        • The ‘STDIN’ and ‘STDOUT’ handles are created automatically
        • We will see how to open and close handles later…
      • To write data use the ‘print’ function
        • This takes a file handle as an optional first parameter
        • The default handle is STDOUT
      • Example:
        open(INPUT,"MuchAdoAboutNothing.txt");
        $count = 1;
        foreach(<INPUT>) {
            print("$count: $_");
            $count++;
        }
      • Output:
        1: Much Ado About Nothing
        2: A comedy by William Shakespear
        3:
        4: Act 1, Scene 1
        5: Before LEONATO'S house.
        6: Enter LEONATO, HERO, and BEATRICE, with a Messenger
        7:
        8: LEONATO
        9: I learn in this letter that Don Peter of Arragon
        10: comes this night to Messina.
    • Conditionals and Iteration in Perl
      • Perl supports the standard ‘if’ conditional
        • Note that the ‘elsif’ keyword is used instead of ‘else if’
      • The ‘unless’ is an ‘if’ in reverse
        • So ‘if(!done()) { … }’ becomes ‘unless(done()) { … }’
        • This is convenient once you get used to it
      • It is possible to place the test after the action
        • E.g. ‘$a = 12 if $b < $c’ or ‘$a = 12 unless $b >= $c’
      • The C/C++ ‘switch’ keyword is not supported
        • Although there are ways of simulating it if required
      • The rarely used ternary conditional operator is available
        • E.g. ‘$var1 = ($var2 == $var3) ? 17 : 19’
      • Example:
        print "Enter a number\n";
        $number = <STDIN>;
        chomp($number);
        if($number < 10) {
            print "$number is less than 10\n";
        } elsif($number < 20) {
            print "$number is less than 20\n";
        } elsif($number < 30) {
            print "$number is less than 30\n";
        } else {
            print "$number is 30 or greater\n";
        }
        unless($number % 2 == 0) {
            print "$number is odd\n";
        }
      • Output:
        Enter a number
        17
        17 is less than 20
        17 is odd
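The remaining conditional forms (a test placed after the action, the ternary operator and one way of simulating ‘switch’) might look like the sketch below. The hash-of-subroutines lookup is just one common substitute for ‘switch’, and all variable names and values are illustrative.

```perl
use strict;
use warnings;

my ($low, $high) = (3, 7);
my $value = 0;

# Test placed after the action.
$value = 12 if $low < $high;
$value = 99 unless $low < $high;

# Ternary conditional operator.
my $label = ($value == 12) ? "twelve" : "other";

# One way of simulating 'switch': a hash used as a lookup table.
my %action_for = (
    start => sub { return "starting" },
    stop  => sub { return "stopping" },
);
my $command = "stop";
my $outcome = exists $action_for{$command}
            ? $action_for{$command}->()
            : "unknown command";

print "$value $label $outcome\n";
```

The lookup table keeps each case in one place and makes the default case explicit through the `exists` test.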
    • Conditionals and Iteration in Perl
      • The standard loops are supported
        • Perl provides ‘while’, ‘do … while’ and ‘for’ loops
        • With the same syntax and semantics as C/C++
      • Variations of the ‘while’ loops are available
        • The ‘until’ and ‘do … until’ loops avoid the need to negate the conditional, but are not always intuitive
      • The ‘for’ loop is extended in two ways:
        • It can be used with ranges rather than counters
          • E.g. ‘for(1..4) { … }’ or ‘for(1..$max) { … }’
        • The ‘foreach’ loop iterates over arrays
          • We will meet it later when introducing data structures
      • Example:
        print "Enter a positive number\n";
        $max = <STDIN>;
        chomp($max);
        if($max <= 0) {
            die("Number must be positive!");
        }
        print "Demo of while loop\n";
        print " Numbers from 0 to $max are: ";
        $count = 0;
        while($count <= $max) {
            print " $count ";
            $count++;
        }
        print "\nDemo of do..while loop\n";
        print " Numbers from 0 to $max are: ";
        $count = 0;
        do {
            print " $count ";
            $count++;
        } while($count <= $max);
        print "\nDemo of until loop\n";
        print " Numbers from 0 to $max are: ";
        $count = 0;
        until($count > $max) {
            print " $count ";
            $count++;
        }
        print "\nDemo of do..until loop\n";
        print " Numbers from 0 to $max are: ";
        $count = 0;
        do {
            print " $count ";
            $count++;
        } until($count > $max);
        print "\nDemo of for loop v1\n";
        print " Numbers from 0 to $max are: ";
        for($count=0; $count<= $max; $count++) {
            print " $count ";
        }
        print "\nDemo of for loop v2\n";
        print " Numbers from 0 to $max are: ";
        for(0..$max) {
            print " $_ ";
        }
        print "\n";
    • Conditionals and Iteration in Perl
      • Loops can optionally have a continue block
        • E.g. ‘while($a < 12) { … } continue { … }’
      • This is used with the loop control operators
        • A call to ‘last’ immediately exits the loop
          • Without executing the continue block
        • A call to ‘next’ skips the remaining statements in this iteration
          • But the continue block is executed before the loop condition is re-evaluated and (if true) the next iteration begins
        • A call to ‘redo’ restarts the current iteration
          • The continue block is not executed
          • The loop condition is not checked
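The interaction between ‘next’, ‘last’ and the continue block can be traced with a short sketch. The counters `@visited` and `$continues` exist only to record what happened and are not part of the original material.

```perl
use strict;
use warnings;

my @visited;        # values that ran the loop body to the end
my $continues = 0;  # how many times the continue block ran
my $n = 0;

while ($n < 10) {
    next if $n % 2;     # odd: skip the rest of the body, continue block still runs
    last if $n == 6;    # exit at once, continue block is skipped
    push @visited, $n;
}
continue {
    $n++;
    $continues++;
}

print "Visited: @visited\n";
print "Continue block ran $continues times, loop stopped at $n\n";
```

Because the increment lives in the continue block, ‘next’ cannot accidentally skip it, which avoids a classic infinite-loop mistake in ‘while’ loops.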
    • Basic Perl I/O: Using the Console and Files
    • Basic Perl I/O
      • Perl I/O is based around handles
        • Links provided by the OS to data sources and sinks
      • Handles for the console are built-in
        • ‘STDIN’ and ‘STDOUT’ represent the command prompt
      • Input is read via the ‘< >’ operator
        • So ‘$data = <STDIN>;’ reads a line from the console
        • Use the ‘chomp’ function to remove the newline
      • Output is written via the ‘print’ function
        • If the first argument is not a handle then STDOUT is used
        • A comma should not be placed after the handle
    • Testing File Paths
      • Perl provides built in operators for testing file paths
        • E.g. ‘if(-e $file && -T $file) { print "$file exists and is a text file"; }’
        • You should always check a file before opening it
      • File Test Operator: Description
        • ‘-e’: File exists
        • ‘-r’: File is readable
        • ‘-w’: File is writable
        • ‘-z’: File has zero size
        • ‘-s’: Returns file size
        • ‘-T’: File is a text file
        • ‘-B’: File is a binary file
        • ‘-S’: File is a socket
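A few of the file test operators can be tried out with the sketch below. It creates its own scratch file through the core `File::Temp` module so that it does not depend on any existing path; the variable names and the `.missing` suffix are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example does not depend on existing paths.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "some text\n";
close($fh) or die "Can't close $file: $!";

my $exists   = -e $file ? 1 : 0;
my $readable = -r $file ? 1 : 0;
my $empty    = -z $file ? 1 : 0;
my $size     = -s $file;          # size in bytes

print "exists=$exists readable=$readable empty=$empty size=$size\n";

# A path that was never created fails the existence test.
my $missing_exists = -e "$file.missing" ? 1 : 0;
print "missing file exists=$missing_exists\n";
```

Each operator takes a filename (or handle) and returns a true or false value, apart from ‘-s’, which returns the size when the file is not empty.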
    • Opening and Reading From Files
      • The ‘open’ function is used to create a file handle
        • The first argument is the symbol we want to represent the handle
      • Files are opened in a particular mode
        • As indicated by the character(s) before the filename
        • The default is to open for reading
      • Function: Description
        • ‘open(HANDLE, "myfile.txt")’ or ‘open(HANDLE, "<myfile.txt")’: Open file for reading
        • ‘open(HANDLE, ">myfile.txt")’: Open file for writing (truncating if necessary)
        • ‘open(HANDLE, ">>myfile.txt")’: Open file for appending
        • ‘open(HANDLE, "+<myfile.txt")’: Open file for reading and updating
    • Opening and Reading From Files
      • The standard form of ‘open’ could cause problems
        • If the handle name was already in use (e.g. as a subroutine)
        • If we were trying to open a file called ‘>myfile.txt’
      • There are two ways around this
        • There is a three argument form of ‘open’
          • The mode(s) are passed as separate arguments
        • The handle can be stored in a scalar variable
          • This is known as an indirect filehandle
      • Once you have a file opened you can:
        • Read lines from the file using the ‘< >’ operator
        • Write to the file using the ‘print’ function
      • Example using the two argument form of ‘open’:
        open(INPUT,"input.txt");
        open(OUTPUT,">output.txt");
        $count = 0;
        while($line = <INPUT>) {
            print OUTPUT ++$count, " ", $line;
        }
        print "Processed $count lines\n";
      • Contents of ‘input.txt’:
        This short interval was sufficient to determine d'Artagnan on the
        part he was to take. It was one of those events which decide the
        life of a man; it was a choice between the king and the
        cardinal--the choice made, it must be persisted in. To fight,
        that was to disobey the law, that was to risk his head, that was
        to make at one blow an enemy of a minister more powerful than the
        king himself. All this young man perceived, and yet, to his
        praise we speak it, he did not hesitate a second. Turning
        towards Athos and his friends, "Gentlemen," said he, "allow me to
        correct your words, if you please. You said you were but three,
        but it appears to me we are four."
      • Contents of ‘output.txt’:
        1 This short interval was sufficient to determine d'Artagnan on the
        2 part he was to take. It was one of those events which decide the
        3 life of a man; it was a choice between the king and the
        4 cardinal--the choice made, it must be persisted in. To fight,
        5 that was to disobey the law, that was to risk his head, that was
        6 to make at one blow an enemy of a minister more powerful than the
        7 king himself. All this young man perceived, and yet, to his
        8 praise we speak it, he did not hesitate a second. Turning
        9 towards Athos and his friends, "Gentlemen," said he, "allow me to
        10 correct your words, if you please. You said you were but three,
        11 but it appears to me we are four."
      • Example using the three argument form of ‘open’:
        open(INPUT, '<', "input.txt");
        open(OUTPUT, '>', "output.txt");
        $count = 0;
        while($line = <INPUT>) {
            print OUTPUT ++$count, " ", $line;
        }
        print "Processed $count lines\n";
        • The contents of ‘input.txt’ and ‘output.txt’ are the same as for the two argument form
      • Example using indirect filehandles:
        open($input, '<', "input.txt");
        open($output, '>', "output.txt");
        $count = 0;
        while($line = <$input>) {
            print $output ++$count, " ", $line;
        }
        print "Processed $count lines\n";
        • The contents of ‘input.txt’ and ‘output.txt’ are the same as for the two argument form
    • Opening and Reading From Files
      • Lines from a file should be read using a ‘while’ loop
        • A ‘for’ loop causes the interpreter to create a list of all the lines from the file, which is then iterated over
      • File handles should be closed via ‘close’
        • Handles stored as scalars are automatically closed when the variable goes out of scope, but you may want to do this earlier
      • You should verify calls to ‘open’, ‘print’ and ‘close’
        • All three return a boolean value to indicate success or failure
        • You can throw an error using the ‘die’ function or ‘croak’ from the Carp module
          • We will cover these in depth later in the course
      • Example:
        open($input, '<', "input.txt") or die "Can't open input";
        open($output, '>', "output.txt") or die "Can't open output";
        $count = 0;
        while($line = <$input>) {
            print($output ++$count, " ", $line) or die "Can't write to file";
        }
        print "Processed $count lines\n";
        close($input) or die "Can't close input";
        close($output) or die "Can't close output";
    • File Handles and Globbing
      • You can place a pattern inside the ‘< >’ operator
        • In which case the pattern is ‘globbed’
        • E.g. ‘@dirs = <../*>;’
      • To avoid confusion use the ‘glob’ function
        • E.g. ‘@dirs = glob('../*');’
      • Globbing is often used to change file properties
        • ‘chmod’ and ‘chown’ change a file's access rights and ownership
        • E.g. ‘while(glob("*.pl")) { chmod(0777, $_); }’
        • E.g. ‘while(glob("*.pl")) { chown($user_id, $group_id, $_); }’
      • Example:
        @exampleDirectories = glob('..\*');
        print "Perl example files are:\n";
        foreach $dir (@exampleDirectories) {
            @perlFiles = glob($dir . '\*.pl');
            foreach(@perlFiles) {
                # characters preceding a slash preceding
                # letters ending in '.pl'
                m/.*\\(\w+\.pl)/;
                print " $1 in $dir\n";
            }
        }
      • Output:
        Perl example files are:
         arrayFunctions.pl in ..\arrays
         arraysAndLists.pl in ..\arrays
         days.pl in ..\arrays
         forEach.pl in ..\arrays
         fork.pl in ..\concurrency
         threads.pl in ..\concurrency
         customerDB.pl in ..\databases
         arraysOfArrays.pl in ..\dataStructures
         checkingErrors.pl in ..\files
         indirectHandles.pl in ..\files
         basicHashes.pl in ..\hashes
         extraSyntax.pl in ..\hashes
    • Arrays in Perl: Creating and Using Lists
    • Introducing Arrays in Perl
      • In many other languages arrays cannot change size
        • Hence they must be supplemented with data structures
        • E.g. the C++ STL or the Collections libraries in Java and C#
      • In Perl arrays can grow and shrink as required
        • So there is no need for a separate ‘vector’ or ‘LinkedList’ type
        • If the array is of size 10 and you try to store something in box 100 then the size is automatically changed
      • Often arrays are created implicitly
        • E.g. ‘$myarray[9] = "abc"’ would create an array called ‘myarray’ with ten boxes, all of which were undefined apart from the last
      • Example:
        $myarray[8] = "string in 9th box";
        $myarray[10] = "string in 11th box";
        $count = 0;
        foreach(@myarray) {
            print $count++,": $_\n";
        }
      • Output:
        0:
        1:
        2:
        3:
        4:
        5:
        6:
        7:
        8: string in 9th box
        9:
        10: string in 11th box
    • Support for Arrays in Perl
      • Arrays are declared with the ‘@’ sigil
        • E.g. ‘@myarray = ("abc", 123, "def", 456, "ghi", 789)’
      • Normally we create an array based on a list of values
        • The ‘qw’ operator can be used to avoid quoting
        • E.g. ‘@myarray = qw(abc 123 def 456 ghi 789)’
      • Lists can also be initialized based on arrays
        • E.g. ‘($val1,$val2,$val3) = @myarray’ copies the values in the first three boxes into the scalar variables
        • ‘($val1) = @myarray’ is an idiom for grabbing the first value
          • It is equivalent to ‘$val1 = $myarray[0]’
    • Support for Arrays in Perl
      • Note that lists are only used for grouping
        • Unlike arrays their existence is only ever temporary
      • The ‘@’ sigil is not used when indexing
        • Instead of ‘@myarray[1]’ we write ‘$myarray[1]’
        • This is because the value we are accessing is a scalar
      • Arrays can be created based on slices of other arrays
        • E.g. ‘@array2 = @array1[1..3]’ creates a new array called ‘array2’ holding copies of boxes 2,3 and 4 in ‘array1’
    • Example: assigning an array to a list of scalars
        @tstArray = qw(abc def ghi jkl mno pqr);
        ($first) = @tstArray;
        ($a,$b,$c) = @tstArray;
        print "First element is: $first\n";
        print "First three elements are: $a $b $c\n";
      Output:
        First element is: abc
        First three elements are: abc def ghi
    • Special Features of Arrays
      • Perl makes assumptions about your use of arrays
        • Almost anything you might write is meaningful
        • Even if the meaning is not what you intended
      • If you put an array in a scalar context the size is used
        • ‘$var = @myarray’ stores the size of ‘myarray’ in ‘var’
        • ‘$var = @myarray - 1’ would store the index of the last box
      • You can easily copy values into another array
        • E.g. ‘@myarray3 = (@myarray1,@myarray2)’ would mean ‘myarray3’ contained copies of the values in the other two arrays
        • It does not create a multidimensional array
    • Special Features of Arrays
      • There is an easy way to find the last index
        • For ‘@myarray’ it is stored in the variable ‘$#myarray’
      • Arrays can be shrunk if required
        • The special variable ‘$#myarray’ can be assigned to as well as read
        • So ‘$#myarray -= 2’ removes the last two boxes of ‘myarray’
      • There are two ways to empty out an array
        • By setting its last index to -1
          • E.g. ‘$#myarray = -1’
        • By assigning it to an empty list
          • E.g. ‘@myarray = ()’
    • Example: combining, indexing and slicing arrays
        @workDays = ("Monday","Tuesday","Wednesday","Thursday","Friday");
        @weekendDays = qw(Saturday Sunday);
        @days = (@workDays,@weekendDays);
        $numDays = @days;
        $firstDay = $days[0];
        $lastDay = $days[$#days];
        print "There are $numDays days in a week\n";
        print "$firstDay is the first day and $lastDay is the last day\n";
        print "\nThe other days are:\n";
        foreach $day (@days) {
            unless($day eq $firstDay or $day eq $lastDay) {
                print $day, "\n";
            }
        }
        print "\nThe last four days are:\n";
        @lastDays = @days[3..6];
        foreach $day (@lastDays) {
            print $day, "\n";
        }
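    • Sketch: the last index, shrinking and emptying an array. The ‘@letters’ array and the variable names are hypothetical and not part of the course material.

```perl
use strict;
use warnings;

# Hypothetical data, chosen only for illustration
my @letters = qw(a b c d e f);

my $size      = @letters;      # scalar context gives the size
my $lastIndex = $#letters;     # index of the last box

$#letters -= 2;                # shrink: drop the last two boxes
my @afterShrink = @letters;    # copy of what is left

$#letters = -1;                # empty the array via its last index
my $sizeAfterEmpty = @letters;

print "Size $size, last index $lastIndex\n";
print "After shrinking: @afterShrink\n";
print "Size after emptying: $sizeAfterEmpty\n";
```

      Assigning ‘@letters = ()’ would empty the array in the same way.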
    • Iterating Over Arrays
      • The easiest way to loop over an array is ‘foreach’
        • This iterates over a list of values and assigns each to a scalar variable
        • You can specify the scalar or use the built-in ‘$_’
      • The ‘foreach’ keyword is just an alias for ‘for’
        • Which one you use is a matter of style
      • Example: four equivalent loops
          @myarray = qw(abc def ghi jkl);
          print "Loop one\n";
          foreach $item (@myarray) {
              print "\t",$item,"\n";
          }
          print "Loop two\n";
          foreach (@myarray) {
              print "\t",$_,"\n";
          }
          print "Loop three\n";
          for $item (@myarray) {
              print "\t",$item,"\n";
          }
          print "Loop four\n";
          for (@myarray) {
              print "\t",$_,"\n";
          }
    • Functions for Working With Arrays
      • Perl provides powerful functions for manipulating arrays
        • The ‘push’ and ‘pop’ functions add and remove from the end
        • The ‘unshift’ and ‘shift’ functions do the same from the start
      • Function summary
        • push: add a new box to the end of the array
        • pop: remove a box from the end of the array
        • unshift: add a new box to the start of the array
        • shift: remove the first box in the array
        • join: join all the values in the array into a string, separated by a delimiter
        • split: create an array by splitting a string into a sequence of sub-strings, using a regular expression to specify the delimiter token(s)
    • Example: pop, push, shift and unshift
        @myarray1 = qw(abc def ghi);
        $val1 = pop(@myarray1);
        print "\nJust popped $val1, contents now:\n";
        foreach(@myarray1) { print "\t$_\n"; }
        push(@myarray1,"zzz");
        print "\nJust pushed zzz, contents now:\n";
        foreach(@myarray1) { print "\t$_\n"; }
        $val1 = shift(@myarray1);
        print "\nJust shifted $val1, contents now:\n";
        foreach(@myarray1) { print "\t$_\n"; }
        unshift(@myarray1,"AAA");
        print "\nJust unshifted AAA, contents now:\n";
        foreach(@myarray1) { print "\t$_\n"; }
      Output:
        Just popped ghi, contents now:
            abc
            def
        Just pushed zzz, contents now:
            abc
            def
            zzz
        Just shifted abc, contents now:
            def
            zzz
        Just unshifted AAA, contents now:
            AAA
            def
            zzz
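    • Sketch: ‘split’ and ‘join’, the two functions from the summary that the example above does not exercise. The input string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input with untidy spacing around the commas
my $line = "alpha, beta,gamma ,  delta";

# Split on a comma with optional whitespace on either side
my @fields = split(/\s*,\s*/, $line);

# Join the pieces back together with a different delimiter
my $joined = join("|", @fields);

print scalar(@fields), " fields: $joined\n";
```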
    • Hashes in Perl: Creating and Using Tables
    • Introducing Hashes in Perl
      • Hashes are the second built in data type
        • A hash is a data structure that works like a map or table
        • The name comes from the use of a hashing algorithm
      • The ‘%’ sigil is used when declaring hashes
        • As with arrays this is not used when referring to values
        • E.g. ‘$myhash{“k1”}’ returns the value for the key ‘k1’ in the hash ‘%myhash’ and ‘$myhash{“k1”} = 12’ sets it
      • Hashes can be declared and expanded implicitly
        • ‘$myhash{“k1”} = 12’ creates a hash called ‘myhash’ if required
        • Otherwise if the row does not exist it is added to the hash
    • Example: creating a hash implicitly
        $myhash{'k1'} = "abc";
        $myhash{'k2'} = 123;
        $myhash{'k3'} = "def";
        $myhash{'k4'} = 456;
        foreach $key (keys %myhash) {
            print $key, " indexes ", $myhash{$key}, "\n";
        }
      Output:
        k2 indexes 123
        k1 indexes abc
        k3 indexes def
        k4 indexes 456
    • Initializing a Hash
      • Like arrays hashes can be initialized via lists
        • E.g. ‘%myhash = (“k1”, 123, “k2”, “XYZ”)’ creates a hash with two rows, where the keys are ‘k1’ and ‘k2’
          • Again ‘qw’ can be used to avoid quoting literals
        • The items in odd positions (1st, 3rd, …) become keys and those in even positions become values
          • The ordering of the rows cannot be predicted
      • The ‘fat comma’ notation makes things clearer
        • E.g. ‘%myhash = (k1 => 123, k2 => “XYZ”)’
        • The ‘=>’ operator is the same as the comma, except that a bare word on the left hand side is automatically quoted
        • Each pair should be placed on its own line for clarity
    • Example: initializing a hash with the ‘fat comma’
        %myhash = (
            k1 => 123,
            k2 => "abc",
            k3 => 456,
            k4 => "def",
            k5 => 789
        );
        foreach $key (keys %myhash) {
            print $key, " indexes ", $myhash{$key}, "\n";
        }
      Output:
        k5 indexes 789
        k2 indexes abc
        k1 indexes 123
        k3 indexes 456
        k4 indexes def
    • Functions for Working With Hashes
      • As with arrays there are built-in functions for hashes
        • ‘keys’ and ‘values’ return the entries in different columns
        • The ‘each’ function is slightly complex
          • Every time it is called it returns a list holding a key/value pair
          • When the end of the hash is reached it returns an empty list
      • Function summary
        • each: returns a list of two values representing a row in the hash
        • exists: returns true if a specified entry exists in the hash
        • keys: returns a list of all the keys in the hash
        • values: returns a list of all the values in the hash
        • delete: removes a row from the hash
    • Example: keys, values and each
        %myhash = (
            k1 => 123,
            k2 => "abc",
            k3 => 456,
            k4 => "def",
            k5 => 789
        );
        print "Keys are:\n";
        foreach $key (keys %myhash) { print "\t", $key, "\n"; }
        print "Values are:\n";
        foreach $value (values %myhash) { print "\t", $value, "\n"; }
        print "Entries are:\n";
        while (($key, $value) = each(%myhash)) { print "\t$key indexes $value\n"; }
      Output:
        Keys are:
            k5
            k2
            k1
            k3
            k4
        Values are:
            789
            abc
            123
            456
            def
        Entries are:
            k5 indexes 789
            k2 indexes abc
            k1 indexes 123
            k3 indexes 456
            k4 indexes def
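    • Sketch: ‘exists’ and ‘delete’, which the example above does not cover. The ‘%stock’ hash and its contents are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data, chosen only for illustration
my %stock = (apples => 12, pears => 0, plums => 7);

# 'exists' tests for the key itself, not for a true value
my $hasPears  = exists $stock{pears}  ? 1 : 0;   # key present, value is 0
my $hasGrapes = exists $stock{grapes} ? 1 : 0;   # key absent

# 'delete' removes the row and returns the value it held
my $removed = delete $stock{plums};

# Sorting the keys gives a predictable order for printing
my @remaining = sort keys %stock;

print "pears present: $hasPears, grapes present: $hasGrapes\n";
print "removed $removed, remaining keys: @remaining\n";
```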
    • Special Syntax for Hashes
      • A hash can be assigned to an array
        • E.g. ‘@myarray = %myhash’ causes all the keys and values from ‘myhash’ to be inserted into ‘myarray’
          • The items are added in the hash's internal order
          • Rather than the order in which they were added to the hash
      • Slices of hashes can be obtained
        • E.g. ‘($v1,$v2) = @myhash { “k1”, “k2” }’ stores the values associated with ‘k1’ and ‘k2’ into the two scalar variables
      • Slicing can also be used to add values
        • E.g. ‘@myhash { “k1”, “k2”, “k3” } = (“abc”, 123, “def”)’ adds three key/value pairs into the hash
    • Example: hash slices and assigning a hash to an array
        %tstHash = ("k1","v1","k2","v2");
        print "Original hash contents:\n";
        foreach(keys(%tstHash)) {
            print("$_ indexes ",$tstHash{$_},"\n");
        }
        @tstHash {"k3","k4","k5","k6"} = (111,222,333,444);
        print "\nHash contents after insertions:\n";
        foreach(keys(%tstHash)) {
            print("$_ indexes ",$tstHash{$_},"\n");
        }
        ($var1,$var2,$var3) = @tstHash {"k1","k2","k3"};
        print "\nValues in scalars are $var1 $var2 and $var3\n";
        @elements = %tstHash;
        print "\nArray contents:\n";
        foreach(@elements) {
            print "$_ ";
        }
      Output:
        Original hash contents:
        k2 indexes v2
        k1 indexes v1
        Hash contents after insertions:
        k5 indexes 333
        k2 indexes v2
        k1 indexes v1
        k6 indexes 444
        k3 indexes 111
        k4 indexes 222
        Values in scalars are v1 v2 and 111
        Array contents:
        k5 333 k2 v2 k1 v1 k6 444 k3 111 k4 222
    • Hashes of Anonymous Arrays
      • This is the first advanced data structure we will meet
        • We introduce it now because it is so useful
      • The syntax ‘[12, “AB”]’ creates an anonymous array
        • No name is associated with the array
        • Instead the ‘[ ]’ operator returns its address (in Perl terms, a reference to the array)
      • We can store the addresses in a hash
        • Indexed by an appropriate key
      • This can be used to store all kinds of data
        • E.g. exam results for students
    • Example: a hash of anonymous arrays
        %results = (
            dave => [54, 62, 73, 48],
            jane => [59, 67, 82, 70],
            fred => [92, 64, 59, 71]
        );
      [Diagram: the ‘results’ hash, with the keys dave, jane and fred each pointing to an anonymous array of four marks]
    • Hashes of Anonymous Arrays
      • To work with the array as a whole use ‘@{ }’
        • E.g. ‘@{$myhash{“dave”}}’ means get the array whose address is indexed in ‘myhash’ by the key ‘dave’
      • To work with array elements use the arrow operator
        • E.g. ‘$myhash{“dave”}->[1]’ means get the value in ‘myhash’ indexed by ‘dave’ and then go to box 2 in the array it references
        • ‘${$myhash{“dave”}}[1]’ is also valid
          • But is very hard to decode
          • ‘$$myhash{“dave”}[1]’ would be interpreted as meaning that ‘myhash’ is a scalar variable holding the address of a hash
    • Example: iterating over a hash of anonymous arrays
        %actors = (
            "george clooney" => ["Oceans 11", "The Peacemaker", "O Brother Where Art Thou"],
            "harrison ford" => ["Star Wars","Sabrina","Indiana Jones"],
            "robin williams" => ["Good Morning Vietnam","Hook","The Birdcage"]
        );
        print "\nThe list of actors and their movies is:\n";
        foreach $actor (keys %actors) {
            print "\n$actor starred in:\n";
            foreach $film(@{$actors{$actor}}) {
                print "\t$film\n";
            }
        }
      Output:
        The list of actors and their movies is:
        robin williams starred in:
            Good Morning Vietnam
            Hook
            The Birdcage
        harrison ford starred in:
            Star Wars
            Sabrina
            Indiana Jones
        george clooney starred in:
            Oceans 11
            The Peacemaker
            O Brother Where Art Thou
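    • Sketch: reaching individual elements with the arrow operator, which the example above does not show. It reuses the exam results idea from the earlier slide; the extra mark of 91 is invented for illustration.

```perl
use strict;
use warnings;

my %results = (
    dave => [54, 62, 73, 48],
    jane => [59, 67, 82, 70],
);

my $second = $results{dave}->[1];        # box 2 of dave's array
my $same   = ${ $results{dave} }[1];     # equivalent, but harder to read
my $count  = @{ $results{jane} };        # whole array in scalar context

# The anonymous array can grow through its reference
push @{ $results{jane} }, 91;
my $last = $results{jane}->[4];

print "dave's second mark: $second ($same)\n";
print "jane had $count marks, the new one is $last\n";
```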
    • Regular Expressions Part 1: Core Concepts
    • Introducing Regular Expressions
      • A Regular Expression describes a pattern in text
        • The commonly used shorthand is ‘regex’
      • Regex’s are used to find matches in search strings
        • E.g. the regex for an email might be:
          • One or more lowercase or uppercase letters
          • Optionally a dot and one or more letters (any case)
          • The ‘@’ symbol followed by the company name
          • One of a range of supported suffixes (.com or .co.uk or .ie)
      • Regex’s can save a huge amount of time and effort
        • Especially when compared to writing your own parser
    • The Syntax of Regular Expressions
      • Regular expressions are a ‘little language’
        • Like SQL and XPath they have their own syntax
      • Unfortunately the special characters in the regex language can also occur as part of your search string
        • E.g. the dot is short for any character, so if you mean the actual character dot it must be escaped
      • The syntax of regex’s has developed over time
        • Initially in UNIX commands and then in Perl scripting
        • Perl V5 set the standard for regex support
          • Most languages now support the Perl 5 regex syntax
      • Your regex’s are run by an ‘Expression Engine’
        • The details of how this works can be quite complex
    • Key Regex Concept No 1
      • The search for the next match starts from just after the end of the last successful match
        • So if the pattern is ‘three uppercase letters’ and the search string is ‘ABCDEF’ then the matches are ‘ABC’ and ‘DEF’
        • Not ‘ABC’ followed by ‘BCD’ followed by ‘CDE’ and so on
      [Diagram: the string ‘ABCDEFGHIJKLMNO’ divided into five consecutive three-letter matches, numbered Match No 1 to Match No 5]
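    • Sketch: the string from the diagram run through Perl. It relies on ‘m//’ with the ‘g’ modifier returning every match in list context; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $data = "ABCDEFGHIJKLMNO";

# Each search resumes just after the end of the previous match,
# so the matches never overlap
my @matches = $data =~ m/[A-Z]{3}/g;

print scalar(@matches), " matches: @matches\n";
```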
    • Key Regex Concept No 2
      • Regular expressions are greedy by default
        • So given the pattern ‘one or more uppercase letters’ and the search string ‘ABCDEFg’ the match is ‘ABCDEF’
        • A greedy pattern takes as many characters as it can while still allowing the overall match to succeed
          • It is possible to use non-greedy matching symbols instead
      [Diagram: in the string ‘ABCDEfghIJKLMNopqrSTUVWXyz’ the three matches are the uppercase runs ABCDE, IJKLMN and STUVWX]
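    • Sketch: the string from the diagram matched against ‘one or more uppercase letters’. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $data = "ABCDEfghIJKLMNopqrSTUVWXyz";

# '+' is greedy, so each match swallows a whole run of capitals
my @runs = $data =~ m/[A-Z]+/g;

print "Greedy matches: @runs\n";
```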
    • Character Classes
      • The building block of all regex’s is the character class
        • This defines a set of symbols to find e.g.
          • ‘[aeiou]’ matches any lowercase vowel
          • ‘[a-z]’ matches lowercase letters
          • ‘[A-Z]’ matches uppercase letters
          • ‘[a-zA-Z]’ matches a letter in either case
        • Note that it is a set and not a sequence
          • ‘[abc]’ matches ‘a’ OR ‘b’ OR ‘c’ and NOT ‘abc’
      • The top hat symbol negates the character class
        • So ‘[^aeiou]’ matches any character that isn't a vowel
        • Note that outside a character class ‘^’ has another meaning
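    • Sketch: character classes and a negated class applied to a sample phrase. The phrase and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $phrase = "Regular Expressions";

my @vowels    = $phrase =~ m/[aeiou]/g;    # lowercase vowels only
my @upper     = $phrase =~ m/[A-Z]/g;      # uppercase letters
my @nonVowels = $phrase =~ m/[^aeiou]/g;   # everything else, including the space

print "Vowels: @vowels\n";
print "Uppercase: @upper\n";
print "Characters that are not lowercase vowels: ", scalar(@nonVowels), "\n";
```

      Note that the capital ‘E’ is not matched by ‘[aeiou]’ because the class only lists lowercase letters.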
    • Shortcuts for Character Classes
      • There are two shortcut notations for character classes
        • One provided by the Perl version of regular expressions
        • The other by the POSIX standards (this is very rarely used)
      • Perl shortcuts and the character classes they stand for
        • ‘\d’: digit, ‘[0-9]’
        • ‘\D’: non-digit, ‘[^0-9]’
        • ‘\s’: whitespace character, ‘[ \t\n\r\f]’
        • ‘\S’: non-whitespace character, ‘[^ \t\n\r\f]’
        • ‘\w’: word character, ‘[a-zA-Z0-9_]’
        • ‘\W’: non-word character, ‘[^a-zA-Z0-9_]’
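    • Sketch: the ‘\d’, ‘\w’ and ‘\s’ shortcuts applied to a made-up line of text. The ‘+’ (one or more) is used so that whole numbers and words are returned rather than single characters.

```perl
use strict;
use warnings;

# Hypothetical input containing digits, words, spaces and a tab
my $line = "Order 66: 12 items\t(total 340)";

my @numbers = $line =~ m/\d+/g;   # runs of digits
my @words   = $line =~ m/\w+/g;   # runs of word characters
my @gaps    = $line =~ m/\s+/g;   # runs of whitespace

print "Numbers: @numbers\n";
print "Words: @words\n";
print "Whitespace runs: ", scalar(@gaps), "\n";
```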
    • Specifying Multiplicities
      • By default a character class matches one instance
        • You can specify a different number in braces
          • So ‘[a-z]’ is the same as ‘[a-z]{1}’
        • Separating numbers by commas specifies a range
          • So ‘[a-z]{2,4}’ means between two and four lowercase letters
        • The question mark signifies optionality
          • So ‘[a-z]{2}[A-Z]?’ specifies two lowercase letters optionally followed by a single uppercase letter
      • There are two meta-characters used for many
        • The plus means one or more and the star zero or more
        • So ‘[a-z]+[A-Z]*’ means one or more lowercase letters followed by zero or more uppercase letters (note that greediness applies)
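    • Sketch: the multiplicity notations from this slide applied to one made-up string, so the effect of each can be compared.

```perl
use strict;
use warnings;

my $data = "abcDEFghIjk";   # hypothetical input

# Between two and four lowercase letters
my @range    = $data =~ m/[a-z]{2,4}/g;
# Two lowercase letters, optionally followed by one uppercase letter
my @optional = $data =~ m/[a-z]{2}[A-Z]?/g;
# One or more lowercase letters, then zero or more uppercase letters
my @many     = $data =~ m/[a-z]+[A-Z]*/g;

print "{2,4}: @range\n";
print "{2} then ?: @optional\n";
print "+ then *: @many\n";
```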
    • Specifying Points Within the Input
      • Two characters signify the start and end points
        • The ‘top hat’ specifies the start
        • The dollar specifies the end
      • These are very useful
        • ‘^$’ matches blank lines
        • ‘^[a-zA-Z]{10}’ matches the first 10 characters if they are all letters
        • ‘[a-zA-Z]{10}$’ matches the last 10 characters if they are all letters
        • ‘^[^0-9]{5}’ matches the first 5 characters if none of them are digits
      • What you mean by start and end points can vary
        • You can select whether they match the start and end of the entire string or each line embedded within it
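    • Sketch: ‘^’ and ‘$’ against a made-up three-line string. It assumes Perl's ‘m’ modifier as the way to select per-line matching, which is one concrete form of the choice described above.

```perl
use strict;
use warnings;

# Hypothetical input: two lines of text separated by a blank line
my $text = "first line\n\nthird line";

# By default '^' only matches at the start of the whole string
my @wholeStart = $text =~ m/^[a-z]+/g;
# With the 'm' modifier '^' also matches at the start of each line
my @lineStarts = $text =~ m/^[a-z]+/mg;
# '^$' with 'm' finds the blank line
my @blank      = $text =~ m/^$/mg;
# '$' without 'm' anchors to the end of the whole string
my ($lastWord) = $text =~ m/([a-z]+)$/;

print "Start of string: @wholeStart\n";
print "Start of each line: @lineStarts\n";
print "Blank lines found: ", scalar(@blank), "\n";
print "Last word: $lastWord\n";
```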
    • Using Submatches Within a Regex
      • Matches can contain submatches
        • By placing part of the regex in parentheses it can be accessed separately from the main match
        • E.g. applying ‘([a-z]+)([A-Z]+)’ to ‘ABCdefGHIjkl’ matches ‘defGHI’ with submatches of ‘def’ and ‘GHI’
      • Parentheses are used for both grouping and submatches
        • If you want to use parentheses for grouping only use ‘(?: … )’
      • Submatches can be very helpful
        • E.g. you want to match email addresses but capture the name and domain suffix separately
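    • Sketch: the submatch example from this slide in Perl, followed by the email case. It assumes Perl's ‘$&’, ‘$1’ and ‘$2’ variables for reading the match and submatches; the address ‘jane@example.ie’ is invented.

```perl
use strict;
use warnings;

my $data = "ABCdefGHIjkl";
my ($whole, $first, $second) = ("", "", "");
if ($data =~ m/([a-z]+)([A-Z]+)/) {
    # $& is the whole match, $1 and $2 are the submatches
    ($whole, $first, $second) = ($&, $1, $2);
}
print "Matched $whole with submatches $first and $second\n";

# Hypothetical address; single quotes stop Perl treating '@example' as an array
my $email = 'jane@example.ie';
my ($name, $suffix) = ("", "");
if ($email =~ m/^([a-z]+)\@[a-z]+(\.[a-z]+)$/) {
    ($name, $suffix) = ($1, $2);
}
print "Name $name, suffix $suffix\n";
```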
    • Examples: matching against ‘ABCdefGHIjklMNOpqrSTU’
      • ‘[A-Z]{2}’ matches: AB GH MN ST
      • ‘[A-Z]{3}’ matches: ABC GHI MNO STU
      • ‘[A-Z]{3}[a-z]’ matches: ABCd GHIj MNOp
    • Examples: matching against ‘ABCdefGHIjklMNOpqrSTU’
      • ‘[A-Z]+[a-z]+’ matches: ABCdef GHIjkl MNOpqr
      • ‘([A-Z]+)([a-z]+)’ matches:
        • ABCdef (Group 1: ABC, Group 2: def)
        • GHIjkl (Group 1: GHI, Group 2: jkl)
        • MNOpqr (Group 1: MNO, Group 2: pqr)
      • ‘[A-Za-z]+’ matches: ABCdefGHIjklMNOpqrSTU
    • Examples: matching against ‘ABCdefGHIjklMNOpqrSTU’
      • ‘[A-Za-z]{5}’ matches: ABCde fGHIj klMNO pqrST
      • ‘^[A-Za-z]{5}’ matches: ABCde
      • ‘[A-Za-z]{5}$’ matches: qrSTU
      • ‘[A-Za-z]{5,8}’ matches: ABCdefGH IjklMNOp qrSTU
    • Other Meta-Characters
      • The bar is used as a logical OR
        • So ‘(com|ie|net)’ matches one of three suffixes
        • Note that placing spaces around the bar changes the pattern
      • The dot matches any character
        • You can choose whether or not this includes newlines
      • The backslash is used to escape meta-characters
        • So ‘(.com|.ie|.net)’ matches any character plus the suffix whereas ‘(\.com|\.ie|\.net)’ matches the suffix with the dot
      • In Sed ‘\<’ and ‘\>’ match the start and end of a word
        • Perl does not support this and instead uses ‘\b’ for both
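    • Sketch: the difference an escaped dot makes, plus ‘\b’ as a word boundary. The host name and the sentence are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical host name that contains no dot at all
my $host = "examplexcom";

# Unescaped: '.' matches any character, so 'xcom' is accepted
my $loose = ($host =~ m/(.com|.ie|.net)$/)    ? "match" : "no match";
# Escaped: a literal dot is required, so nothing matches
my $exact = ($host =~ m/(\.com|\.ie|\.net)$/) ? "match" : "no match";

# '\b' marks a word boundary at both the start and the end of a word
my @cats = "cat concatenate bobcat cat" =~ m/\bcat\b/g;

print "Unescaped dot: $loose, escaped dot: $exact\n";
print "Whole-word occurrences of cat: ", scalar(@cats), "\n";
```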
    • Regular Expressions Part 2: Perl Syntax
    • Regular Expressions in Perl
      • Perl 5 is the established standard for regex’s
        • The ‘=~’ operator is used to apply an expression
          • E.g. ‘$match = $data =~ m/[A-Z]+/’
      • By default the operator returns a true/false value
        • The ‘!~’ operator is the reverse of ‘=~’
      • If successful the matched text is stored in ‘$&’
        • ‘$`’ holds the text before the match
        • ‘$'’ holds the text after the match
      • In list context the ‘g’ modifier causes a list of all the matches to be returned
        • E.g. ‘@results = $data =~ m/[A-Z]+/g’
        • A ‘foreach’ loop can be used to iterate over the results
    • Example: single matches and ‘$&’
        $data = "ABCdefGHIjklMNOpqrSTUvwxYZA";
        $m1 = $data =~ m/[A-Z]/;
        if($m1) { print "Match one is $&\n"; }
        $m2 = $data =~ m/[A-Z]{2}/;
        if($m2) { print "Match two is $&\n"; }
        $m3 = $data =~ m/[A-Z]+/;
        if($m3) { print "Match three is $&\n"; }
        $m4 = $data =~ m/[^e]+/;
        if($m4) { print "Match four is $&\n"; }
        $m5 = $data =~ m/[A-Z]{3}[a-z]{2}/;
        if($m5) { print "Match five is $&\n"; }
      Output:
        Match one is A
        Match two is AB
        Match three is ABC
        Match four is ABCd
        Match five is ABCde
    • Example (continued)
        $m6 = $data =~ m/[A-Z]+[a-z]+/;
        if($m6) { print "Match six is $&\n"; }
        $m7 = $data =~ m/[A-Za-z]{9}/;
        if($m7) { print "Match seven is $&\n"; }
        $m8 = $data =~ m/.+/;
        if($m8) { print "Match eight is $&\n"; }
        $m9 = $data =~ m/^.{8}/;
        if($m9) { print "Match nine is $&\n"; }
        $m10 = $data =~ m/.{8}$/;
        if($m10) { print "Match ten is $&\n"; }
      Output:
        Match six is ABCdef
        Match seven is ABCdefGHI
        Match eight is ABCdefGHIjklMNOpqrSTUvwxYZA
        Match nine is ABCdefGH
        Match ten is TUvwxYZA
    • Example (continued): the ‘g’ modifier
        @groupOne = $data =~ m/[A-Z]{3}/g;
        print "First group of matches are:\n";
        foreach(@groupOne) { print "\t$_\n"; }
        @groupTwo = $data =~ m/[a-z]{3}/g;
        print "Second group of matches are:\n";
        foreach(@groupTwo) { print "\t$_\n"; }
        @groupThree = $data =~ m/.{4}/g;
        print "Third group of matches are:\n";
        foreach(@groupThree) { print "\t$_\n"; }
      Output:
        First group of matches are:
            ABC
            GHI
            MNO
            STU
            YZA
        Second group of matches are:
            def
            jkl
            pqr
            vwx
        Third group of matches are:
            ABCd
            efGH
            Ijkl
            MNOp
            qrST
            Uvwx
    • Regular Expressions in Perl
      • Several other modifiers can be used with ‘m//’
        • E.g. by default ‘.’ does not match new-line characters, so ‘.+’ will not cross embedded new-lines unless you use ‘s’
      • Submatches are stored in numbered scalar variables
        • So ‘$1’ holds the first submatch and so on
      • Modifier summary
        • ‘g’: finds all the matches in the string
        • ‘i’: makes the pattern case-insensitive
        • ‘s’: means ‘.’ matches new-line characters
        • ‘m’: means ‘^’ and ‘$’ match at the start and end of each line within the string
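    • Sketch: the ‘i’ and ‘s’ modifiers and the numbered submatch variables, using a made-up three-line string.

```perl
use strict;
use warnings;

my $text = "Perl\nperl\nPERL";   # hypothetical input with embedded new-lines

# 'i' ignores case, 'g' collects every match
my @anyCase = $text =~ m/perl/gi;

# By default '.' will not match the new-line between the first two lines
my $dotDefault = ($text =~ m/Perl.perl/)  ? 1 : 0;
# With 's' it will
my $dotWithS   = ($text =~ m/Perl.perl/s) ? 1 : 0;

# Submatches land in $1, $2 and so on
my ($initial, $rest) = ("", "");
if ($text =~ m/(p)(erl)/i) {
    ($initial, $rest) = ($1, $2);
}

print "Case-insensitive matches: @anyCase\n";
print "Dot crosses new-line by default: $dotDefault, with s: $dotWithS\n";
print "Submatches: $initial and $rest\n";
```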
    • Using Regular Expressions in Perl
      • Best practice is to use extended expressions
        • This is enabled via the ‘x’ modifier
      • The ‘x’ modifier has two effects:
        • Whitespace within the regex is disregarded
          • The regex can be spread over multiple lines
        • Standard Perl comments can be added
          • These let you explain your intent
      • Multi line regexes should be bracketed with ‘{ }’
        • The interpreter will accept most punctuation characters as delimiters
          • If an opening bracket is used then the matching closing bracket is expected
          • E.g. ‘m![A-Z]+!’, ‘m#[A-Z]+#’, or ‘m{[A-Z]+}’
    • Using Regular Expressions in Perl
        print "Enter the email address: ";
        chomp($email = <STDIN>);
        # easier to read version of ([a-z]+(\.[a-z]+)?)\@megacorp(\.com|\.ie|\.co\.uk)
        $m = $email =~ m{^           # start of string
            (                        # start of inner match 1
              [a-z]+                 # one or more lowercase letters
              (\.[a-z]+)?            # optionally a dot and letters (inner match 2)
            )                        # end of inner match 1
            \@megacorp               # company name NB need to escape array
            (\.com|\.ie|\.co\.uk)    # possible domain names
            $                        # end of string
        }x;
        if($m) {
            print "Recognized email for $1 in domain $3\n";
        } else {
            print "Invalid address!\n";
        }
    • Pattern Matching and Substitutions
      • The ‘s///’ operator is used to make substitutions
        • Its syntax is ‘s/ regex / replacement text / modifiers ’
        • The replacement text can contain ‘$1’, ‘$2’ etc…
      • By default a substitution is only made for the first match
        • By using the ‘g’ modifier all matches are replaced
        • The return value is the number of substitutions made
      • The ‘e’ modifier is especially useful
        • The replacement text is executed as a Perl expression
        • The standard use of this is to replace a match with the result of running one or more functions against the match
    • Example: substitutions
        $test = "aabbccddyzeeffgghhiijjkkllmmnnoo";
        # Replace all occurrences of ff with FF
        $test =~ s/ff/FF/g;
        print $test , "\n";
        # Replace any occurrence of four characters at the start of the string with XX
        $test =~ s/^.{4}/XX/g;
        print $test , "\n";
        # Replace any lower case characters with their upper case equivalents
        $test =~ s/([a-z])/uc($1)/eg;
        print $test , "\n";
      Output:
        aabbccddyzeeFFgghhiijjkkllmmnnoo
        XXccddyzeeFFgghhiijjkkllmmnnoo
        XXCCDDYZEEFFGGHHIIJJKKLLMMNNOO
    • Regular Expressions Part 3: Advanced Concepts
    • Regular Expressions - Advanced
      • So far we have covered the 80% of regular expressions that developers use 99% of the time
        • But there is still a lot of extra functionality available
      • We mention some of the advanced features here
        • You are encouraged to play with them after the course
      • These features may be useful when you are:
        • Trying to parse non-ASCII based text
        • Trying to shorten an overly long expression
        • Extracting information from poorly organised data
        • Searching for the most efficient regex possible
    • Non Greedy Matching
      • We have seen that matching is greedy
        • E.g. ‘A.+B’ grabs as many characters as possible between an ‘A’ and a ‘B’
        • So against ‘nnnAnnnBnnnBnnn’ the first match is ‘AnnnBnnnB’
          • Rather than ‘AnnnB’ which may well be what you wanted
        • Greedy matching makes it hard to define ‘end tokens’
      • It is possible to have a non-greedy match
        • ‘.+?’ still means grab one or more characters
          • But it takes as few as possible to create a successful match
        • ‘.*?’ does the same thing for zero or more characters
        • ‘??’ is the non-greedy version of ‘?’
          • Given the choice between grabbing zero or one characters it prefers to grab zero, as long as the match will still succeed
    • Examples: greedy and non-greedy matching against ‘abcDEfghIJklm’
      • ‘[a-zA-Z]+[A-Z]{2}’ (greedy) matches: abcDEfghIJ
      • ‘[a-zA-Z]+?[A-Z]{2}’ (non-greedy) matches: abcDE fghIJ
      • ‘[a-zA-Z]*[A-Z]{2}’ (greedy) matches: abcDEfghIJ
      • ‘[a-zA-Z]*?[A-Z]{2}’ (non-greedy) matches: abcDE fghIJ
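    • Sketch: the ‘A.+B’ case from the previous slide, run greedy and non-greedy against the same search string. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $data = "nnnAnnnBnnnBnnn";

# Greedy: '.+' runs on to the last B it can reach
my ($greedy) = $data =~ m/(A.+B)/;
# Non-greedy: '.+?' stops at the first B that completes the match
my ($lazy)   = $data =~ m/(A.+?B)/;

print "Greedy: $greedy\n";
print "Non-greedy: $lazy\n";
```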
    • Parentheses Which Do Not Capture
      • We have seen that parentheses serve two functions:
        • As in coding they group separate constructs
        • They are used to define matches-within-matches
      • Sometimes you only want the first function
        • You need to separate out part of the expression but don’t want to capture the result within a submatch
        • E.g. ‘(\.com|\.ie)’ when matching against URL’s
      • The ‘(?: … )’ syntax provides non-capturing brackets
        • E.g. ‘(?:\.com|\.ie)’ lets you specify alternatives without capturing
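    • Sketch: the same pattern with capturing and non-capturing brackets. The host name ‘www.example.ie’ is invented; in list context ‘m//’ returns the submatches, which makes the difference easy to see.

```perl
use strict;
use warnings;

my $url = "www.example.ie";   # hypothetical host name

# Both pairs of parentheses capture
my @capturing = $url =~ m/^www\.([a-z]+)(\.com|\.ie)$/;
# The second pair only groups the alternatives
my @grouping  = $url =~ m/^www\.([a-z]+)(?:\.com|\.ie)$/;

print "Capturing brackets give: @capturing\n";
print "Non-capturing brackets give: @grouping\n";
```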
    • Modifying Just Part of the Pattern
      • We know modifiers can be placed after ‘m//’ or ‘s///’
        • E.g. the ‘i’ modifier makes the match case insensitive
      • But the modifier applies to the whole pattern
        • What if only part of the regex should be case insensitive?
        • What if ‘.’ should match newlines only in one place?
      • The ‘(?imsx: … )’ syntax lets you do this
        • E.g. if you say ‘[A-Z]{3}(?i:[A-Z]{2})[A-Z]{3}’ this matches ‘ABCdeFGH’ or ‘ABCDEFGH’ but not ‘abcdefgh’
      • You can use it to turn off modifiers as well
        • E.g. ‘m/… (?-i: …) …/i’ makes part of the pattern case sensitive
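    • Sketch: the pattern from this slide tried against its three sample strings, then a ‘(?-i: … )’ case. The ‘PERL Rocks’ strings are invented, and the ‘^’ and ‘$’ anchors are added so that the whole string must match.

```perl
use strict;
use warnings;

# Only the middle two letters are matched case-insensitively
my %outcome;
foreach my $candidate (qw(ABCdeFGH ABCDEFGH abcdefgh)) {
    $outcome{$candidate} =
        ($candidate =~ m/^[A-Z]{3}(?i:[A-Z]{2})[A-Z]{3}$/) ? "match" : "no match";
}

# The whole pattern ignores case except the part inside (?-i: ... )
my $mixed = ("PERL Rocks" =~ m/^perl (?-i:Rocks)$/i) ? "match" : "no match";
my $lower = ("PERL rocks" =~ m/^perl (?-i:Rocks)$/i) ? "match" : "no match";

foreach my $candidate (sort keys %outcome) {
    print "$candidate: $outcome{$candidate}\n";
}
print "PERL Rocks: $mixed, PERL rocks: $lower\n";
```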
    • Matching Without Capturing
      • It is possible to check what is ahead or behind you
        • Without consuming these characters as part of the match
        • The proper name for these is ‘lookaround’ assertions
        • This is useful when what comes after you is both a condition of your match and part of the next match
      • The four lookaround assertions
        • ‘(?= …)’: looks ahead to see if a pattern occurs, without consuming it
        • ‘(?! …)’: looks ahead to see if a pattern does not occur, without consuming anything
        • ‘(?<= …)’: looks behind to see if a pattern occurs, without consuming it
        • ‘(?<! …)’: looks behind to see if a pattern does not occur, without consuming anything
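    • Sketch: lookahead and lookbehind on small made-up strings. The first pair shows the case described above, where the following letter is both a condition of this match and the start of the next one.

```perl
use strict;
use warnings;

my $data = "ABCD";

# Without lookahead the second letter is consumed, so matches cannot share it
my @consuming = $data =~ m/[A-Z][A-Z]/g;
# With lookahead the following letter is checked but left for the next match
my @lookahead = $data =~ m/[A-Z](?=[A-Z])/g;

my $codes = "X99 Y42 X7";   # hypothetical input
# Digits that come directly after an X
my @afterX    = $codes =~ m/(?<=X)\d+/g;
# Digits that come after neither an X nor another digit
my @notAfterX = $codes =~ m/(?<![X\d])\d+/g;

print "Consuming: @consuming\n";
print "Lookahead: @lookahead\n";
print "After X: @afterX, not after X: @notAfterX\n";
```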
    • Building Regex Parts Dynamically
      • Consider the following problem
        • You need a regex to find company email addresses
        • But the company name will be specified at runtime
      • You could build the whole expression at runtime
        • Through normal string concatenation
        • But this code will be tedious and error prone
      • Perl lets you embed dynamic content in expressions
        • When you use ‘(??{ … })’ within your regex the code placed at ‘…’ is run by the interpreter and its result forms part of the pattern (the related ‘(?{ … })’ form runs code but does not add to the pattern)
        • Variables in the block are in scope until the current search for a match either succeeds or is rolled back
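    • Sketch: a simpler alternative to embedded code blocks for the company name problem. This is plain variable interpolation rather than the ‘( … { … })’ syntax described above: ‘\Q … \E’ escapes any meta-characters in the name, and ‘qr//’ compiles the pattern once. The company name and the addresses are invented.

```perl
use strict;
use warnings;

# Hypothetical value; in a real script it would arrive at runtime
my $company = "mega-corp";

# The name is interpolated into the pattern; \Q...\E quotes it literally
my $pattern = qr/^[a-z]+(?:\.[a-z]+)?\@\Q$company\E(?:\.com|\.ie|\.co\.uk)$/;

# Single quotes stop Perl treating the '@...' parts as arrays
my @candidates = ('jane.doe@mega-corp.ie', 'jane@megaXcorp.com');

my %valid;
foreach my $email (@candidates) {
    $valid{$email} = ($email =~ $pattern) ? 1 : 0;
    print "$email: ", ($valid{$email} ? "recognized" : "invalid"), "\n";
}
```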
    • Working With Unicode
      • So far in our discussion we have assumed ASCII
        • E.g. ‘[A-Z]+’ matches one or more uppercase letters
        • As long as your input text is written in ASCII in a culture that agrees with the common definition of English capital letters
      • However this does not apply with internationalization
        • Where we will be working with unfamiliar character sets and conventions (capitals, accents, reading direction etc…)
      • Perl supports internationalization via Unicode
        • The character set that aims to embrace and supplant the character sets already in use across the world
        • Unicode itself is unavoidably complex and contradictory
          • It aims to unify character sets with as little modification as possible
    • Working With Unicode
      • Unicode defines a character in terms of:
        • A unique number and textual name
        • A representative glyph (which does not preclude others)
        • Annotations - which add extra information informally
        • Properties - which formally group characters together based on a shared criterion (e.g. mathematical symbols)
      • Don’t confuse character sets with character encodings
        • ‘UTF-8’ favours western characters but Unicode itself does not
      • There are 88 properties a character may have
        • As of Unicode version 4.1.0
        • These can be tested for in regex’s
    • Sample Unicode Properties
      • AHex: true for ASCII characters used in hexadecimal numbers
      • Alpha: true if a character is alphabetic
      • ea: the East Asian width of a character (full, half or narrow)
      • IDC: indicates if a character can continue an identifier after its first character
      • Math: true if the character is used in describing mathematical expressions
      • Lower: indicates if a character is a lowercase letter
      • STerm: indicates if a character is used to terminate a sentence
      • Term: indicates if a character is punctuation that terminates a unit
      • WSpace: indicates if a character should be treated as whitespace during parsing
    • Unicode Properties in Expressions
      • Unicode properties can be used as character classes
        • ‘\p{Math}’ matches math symbols and ‘\P{Math}’ is the reverse
        • If the name is a single character then the braces can be omitted
      • Shortcuts for character classes support Unicode
        • So in modern interpreters the ‘\d’ shortcut is the same as ‘\p{IsDigit}’ and ‘\D’ is the same as ‘\P{IsDigit}’
      • Perl has aliases for properties and defines its own
        • ‘\p{IsDigit}’ is the more verbose Perl terminology for ‘\p{Nd}’
        • ‘\p{IsXDigit}’ is a Perl property name equivalent to ‘[0-9a-fA-F]’
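    • Sketch: Unicode properties used as character classes. The sample string is invented and written with ‘\x{…}’ escapes so the script does not depend on the encoding of the source file; ‘Lu’, ‘Ll’ and ‘Nd’ are the standard short names for uppercase letter, lowercase letter and decimal digit. Only counts are printed, to avoid output encoding concerns.

```perl
use strict;
use warnings;

# A, GREEK CAPITAL LETTER OMEGA, b, GREEK SMALL LETTER OMEGA, 7
my $text = "A\x{3A9}b\x{3C9}7";

my @asciiUpper = $text =~ m/[A-Z]/g;     # ASCII capitals only
my @upper      = $text =~ m/\p{Lu}/g;    # any uppercase letter
my @lower      = $text =~ m/\p{Ll}/g;    # any lowercase letter
my @digits     = $text =~ m/\p{Nd}/g;    # any decimal digit
my @math       = "1+2=3" =~ m/\p{Math}/g;   # mathematical symbols

print "[A-Z] finds ", scalar(@asciiUpper), " capital, \\p{Lu} finds ", scalar(@upper), "\n";
print "Lowercase letters: ", scalar(@lower), ", digits: ", scalar(@digits), "\n";
print "Math symbols: @math\n";
```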
    • Perl Subroutines: Creating and Calling Functions
    • Introducing Subroutines
      • Subroutines do not have a C/C++ style declaration
        • When you create a subroutine you do not declare its return type or the parameters it will take
        • Instead the syntax for a subroutine is ‘sub NAME { … }’
      • Any subroutine can take any number of parameters
        • These are automatically placed in the special array ‘@_’
          • Not to be confused with ‘$_’ which holds the current item in a loop
        • The first parameter is ‘$_[0]’ and the last is ‘$_[$#_]’
      • The return value can be specified in two ways
        • By default it is the result of the last expression evaluated
        • You can explicitly provide a value via the ‘return’ keyword
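    • Sketch: parameters in ‘@_’ and the two ways of returning a value
      • The subroutine names here are hypothetical, chosen only to illustrate the bullets above

```perl
use strict;
use warnings;

# No parameter list or return type is declared; arguments arrive in @_.
sub describe_args {
    my $count = @_;    # an array in scalar context gives its length
    return "no arguments" unless $count;
    # $_[0] is the first parameter and $_[$#_] is the last one.
    return "first=$_[0] last=$_[$#_] count=$count";
}

# Without 'return', the value of the last expression evaluated is returned.
sub add_all {
    my $total = 0;
    $total += $_ for @_;
    $total;
}

print describe_args("abc", 123, "def"), "\n";
print add_all(1, 2, 3, 4), "\n";
```

      • An explicit ‘return’ is usually easier to read in anything longer than a one-liner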
    • Example:
        sub func {
            my($param1, $param2, $param3, $param4, $param5) = @_;
            print "func called with:\n";
            print " $param1\n";
            print " $param2\n";
            print " $param3\n";
            print " $param4\n";
            print " $param5\n";
        }
        func("abc",123,"def",456,"ghi");
      Output:
        func called with:
         abc
         123
         def
         456
         ghi
    • Variable Declarations and Scoping
      • As you write functions you will make a startling discovery
        • Variables created inside subroutines remain alive and usable for the rest of the life of the script (unlike in C/C++/Java etc…)
        • This is because all variables are global by default
          • They are represented by an entry in the symbol table
      • Perl lets you control the scope and lifetime of variables
        • Via the ‘my’, ‘our’ and ‘local’ declarations
        • Each of these has a subtly different effect
        • Use these in all but the shortest scripts…
    • Variable Declarations and Scoping
      • The ‘my’ function creates a private variable
        • Its scope and lifetime are the block it is declared in
      • The ‘local’ function is unusual
        • It defines a new value for a variable, which is held for the duration of the current scope
        • Once control leaves the current block the value is reset
        • Any subroutines called from within the block see the new value
      • The ‘our’ function simply (re)declares a global variable
        • It is the safe way of referencing a global variable within a subroutine or declaring a global variable under ‘use strict’
    • Example:
        $var1 = "ABC";
        $var2 = "DEF";
        $var3 = "GHI";
        print "At start $var1, $var2 and $var3\n";
        func1();
        print "At end $var1, $var2 and $var3\n";

        sub func1 {
            my $var1 = "JKL";       # private to this block
            local $var2 = "MNO";    # temporary value, visible to called subroutines
            our $var3;              # refers to the existing global
            print "In func1 $var1, $var2 and $var3\n";
            func2();
        }
        sub func2 {
            print "In func2 $var1, $var2 and $var3\n";
        }
      Output:
        At start ABC, DEF and GHI
        In func1 JKL, MNO and GHI
        In func2 ABC, MNO and GHI
        At end ABC, DEF and GHI
    • Using Named Parameters
      • Parameters in Perl are loosely organised
        • In large applications you want to be more specific about what value goes with what parameter
      • There is a simple idiom for naming parameters:
        • Pass parameters into the subroutine in pairs
          • E.g. ‘connect("ip" => "12.24.5.6", "port" => 80, "timeout" => 30)’
        • Inside the subroutine load the parameters into a hash
          • The first parameter becomes the key for the second and so on
        • Use parameters by taking values from the hash
          • So rather than saying ‘$_[1]’ we use ‘$params{"ip"}’
          • This allows parameters to be passed in any order
            • As long as the name/value convention is preserved
    • Example:
        sub func {
            my %params = @_;
            print " Parameter fred has value $params{'fred'}\n";
            print " Parameter wilma has value $params{'wilma'}\n";
            print " Parameter barney has value $params{'barney'}\n";
        }
        print "---------- First Call ----------\n";
        @args1 = ("fred",20,"wilma",30,"barney",40);
        func(@args1);
        print "---------- Second Call ----------\n";
        @args2 = (fred => 50, wilma => 60, barney => 70);
        func(@args2);
        print "---------- Third Call ----------\n";
        func(fred => 80, wilma => 90, barney => 100);
      Output:
        ---------- First Call ----------
         Parameter fred has value 20
         Parameter wilma has value 30
         Parameter barney has value 40
        ---------- Second Call ----------
         Parameter fred has value 50
         Parameter wilma has value 60
         Parameter barney has value 70
        ---------- Third Call ----------
         Parameter fred has value 80
         Parameter wilma has value 90
         Parameter barney has value 100
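    • Sketch: named parameters with default values
      • One common extension of the idiom, shown as an assumption rather than as part of the course material: list the defaults first so that pairs supplied by the caller override them. The ‘connect_to’ name and the default values are hypothetical

```perl
use strict;
use warnings;

sub connect_to {
    # Later pairs win, so the caller's values replace the defaults.
    my %params = (port => 80, timeout => 30, @_);
    die "ip is required\n" unless defined $params{ip};
    return "$params{ip}:$params{port} (timeout $params{timeout}s)";
}

# Parameters can be passed in any order and optional ones can be left out.
print connect_to(timeout => 5, ip => "12.24.5.6"), "\n";
```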
    • Subroutines and Recursion
      • The ‘&’ sigil is normally left out when calling subroutines
        • Without parentheses the name is treated as a bareword unless the interpreter has already seen the declaration
      • Parentheses around the parameters are also optional if the interpreter has processed the declaration
        • Whether or not you use them is a matter of style
      • If you use the sigil and omit the parameter list then the parameter array of the current function is passed
        • This creates a compact syntax for recursive functions
    • Subroutines and Recursion
        sub recursion1 {
            until($_[0] == 0) {
                print "$_[0] ";
                $_[0]--;
                &recursion1;    # passes the current @_ along
            }
        }
        sub recursion2 {
            if(@_) {
                print(shift, " ");
                &recursion2;
            }
        }
        $val1 = 10;
        recursion1($val1);
        print "\n";
        @val2 = qw(abc def ghi jkl);
        recursion2(@val2);
      Output:
        10 9 8 7 6 5 4 3 2 1
        abc def ghi jkl
    • Anonymous Subroutines
      • Subroutines can be declared without names
        • Using the syntax ‘sub { … }’
      • What is returned is a reference to the subroutine
        • This can be captured in a scalar (see later)
      • This enables some ‘meta-programming’ techniques
        • You can write functions which build and return functions
        • You can write functions which take blocks of code as parameters
      • These techniques have their own terminology
        • A block of code passed as a parameter is a callback
        • A closure is an anonymous subroutine that keeps access to the ‘my’ variables that were in scope where it was created
        • One function building a simplified version of another, by fixing some of its parameters, is loosely called currying
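    • Sketch: a function that builds functions
      • A minimal illustration of closures and of currying in the loose sense used above (strictly, partial application). The names ‘make_counter’ and ‘curry’ are hypothetical

```perl
use strict;
use warnings;

# A closure: the anonymous sub keeps access to $count after make_counter returns.
sub make_counter {
    my $count = shift;
    return sub { return $count++; };
}

# Fix the first argument of an existing function and return a simpler one.
sub curry {
    my ($func, $first) = @_;
    return sub { return $func->($first, @_); };
}

# Each call to make_counter creates an independent $count.
my $from_five = make_counter(5);
my $from_ten  = make_counter(10);
print $from_five->(), " ", $from_five->(), " ", $from_ten->(), "\n";

my $multiply = sub { return $_[0] * $_[1]; };
my $double   = curry($multiply, 2);
print $double->(21), "\n";
```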
    • Example:
        sub doTimes {
            for(1..$_[0]) {
                &{$_[1]}();
            }
        }
        sub withMatches {
            for($_[0] =~ m/$_[1]/g) {
                &{$_[2]}($_);
            }
        }
        $ref = sub { print "Closures are cool!\n"; };
        doTimes(5, $ref);
        doTimes(3, sub { print "Closures are very cool!\n"; });
        withMatches("ab cd ef gh", "[a-z]{2}", sub { print "$_[0] "; });
      Output:
        Closures are cool!
        Closures are cool!
        Closures are cool!
        Closures are cool!
        Closures are cool!
        Closures are very cool!
        Closures are very cool!
        Closures are very cool!
        ab cd ef gh
    • Error Handling: Managing Error Conditions
    • Error Handling in Perl
      • Traditional functions report errors via a return code
        • Modern functions raise exceptions
      • There are several functions for raising exceptions
        • The ‘die’ function causes a value to be thrown as an exception
          • The value is typically an error message string
          • But could be a reference to anything you want
        • More versatile functions are provided by the ‘Carp’ module
          • They report errors from the perspective of the calling code
      • To test for exceptions use ‘eval’ blocks
        • Errors generated from code inside ‘eval { … }’ are trapped
        • You can test and obtain them via the ‘$@’ variable
    • Example 1, throwing an error message string:
        eval {
            op1();
        };
        if($@) {
            print "Code threw error: ", $@;
        }
        sub op1 { op2(); }
        sub op2 { op3(); }
        sub op3 { die "BOOM!"; }
    • Example 2, throwing a reference to a hash:
        eval {
            op1();
        };
        if($@) {
            print "Error of severity ", $@->{'severity'};
            print " thrown with message ", $@->{'msg'};
        }
        sub op1 { op2(); }
        sub op2 { op3(); }
        sub op3 {
            my %error = (msg => 'BOOM!', severity => 'fatal');
            die \%error;    # throw a reference so the handler can use ‘$@->{...}’
        }
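    • Sketch: raising an error with the ‘Carp’ module
      • The ‘croak’ function from ‘Carp’ works like ‘die’ but reports the location of the calling code. The ‘Bank’ package and its ‘withdraw’ subroutine are hypothetical, and the package is declared inline only to keep the sketch in one file

```perl
use strict;
use warnings;

package Bank;
use Carp qw(croak);

# Rejects bad input with croak, so the reported file and line
# belong to the code that called withdraw, not to withdraw itself.
sub withdraw {
    my ($balance, $amount) = @_;
    croak "amount must be positive" if $amount <= 0;
    return $balance - $amount;
}

package main;

my $result = eval { Bank::withdraw(100, -5) };
if ($@) {
    print "Caught: $@";
}
```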
    • References in Perl: Using Memory Addresses
    • References in Perl
      • References are an advanced feature of Perl
        • They are similar to pointers in ‘C’ and ‘C++’
      • A scalar can hold the address of another variable
        • This could be another scalar, an array, a hash or a function
        • The address of a variable is taken via the ‘\’ (backslash) operator
      • Sigils are combined when working with references
        • So if ‘$ref’ is a reference to an array then the first element could be accessed with ‘$$ref[0]’, ‘${$ref}[0]’ or ‘$ref->[0]’
        • Each syntax is appropriate in different circumstances
    • Example:
        $var1 = 124;
        $ref1 = \$var1;
        print $ref1, "\n";
        print "Reference ref1 refers to value $$ref1\n";
        print "Reference ref1 refers to value ${$ref1}\n";
      Output:
        SCALAR(0x226d98)
        Reference ref1 refers to value 124
        Reference ref1 refers to value 124
    • Example:
        @var2 = ("abc","def","ghi","jkl");
        $ref2 = \@var2;
        print $ref2, "\n";
        print "Reference ref2 refers to array with contents: ";
        foreach $val (@$ref2) {
            print "$val ";
        }
        print "\nReference ref2 refers to array with contents: ";
        foreach $val (@{$ref2}) {
            print "$val ";
        }
        print "\nFirst three elements in array pointed to by ref2 are: ";
        @slice = @{$ref2}[0..2];
        foreach $val (@slice) {
            print "$val ";
        }
        print "\nFirst item is $$ref2[0]";
        print "\nFirst item is ${$ref2}[0]";
        print "\nFirst item is $ref2->[0]";
      Output:
        ARRAY(0x226da4)
        Reference ref2 refers to array with contents: abc def ghi jkl
        Reference ref2 refers to array with contents: abc def ghi jkl
        First three elements in array pointed to by ref2 are: abc def ghi
        First item is abc
        First item is abc
        First item is abc
    • Example:
        %var3 = ("k1","xxx","k2","yyy","k3","zzz");
        $ref3 = \%var3;
        print "Reference ref3 refers to hash with contents:\n";
        foreach $key (keys %$ref3) {
            print " $key indexes $$ref3{$key}\n";
        }
        print "Reference ref3 refers to hash with contents:\n";
        foreach $key (keys %{$ref3}) {
            print " $key indexes $ref3->{$key}\n";
        }
        print "Key k1 indexes $$ref3{'k1'}\n";
        print "Key k1 indexes ${$ref3}{'k1'}\n";
        print "Key k1 indexes $ref3->{'k1'}\n";
      Output:
        Reference ref3 refers to hash with contents:
         k2 indexes yyy
         k1 indexes xxx
         k3 indexes zzz
        Reference ref3 refers to hash with contents:
         k2 indexes yyy
         k1 indexes xxx
         k3 indexes zzz
        Key k1 indexes xxx
        Key k1 indexes xxx
        Key k1 indexes xxx
    • References and Anonymous Data
      • All complete programming languages need to have the ability to allocate memory on demand
        • As opposed to pre-declaring it through standard variables
      • Consider processing records from a file
        • You need to allocate memory on the fly for each record you find
      • Memory is allocated via anonymous data structures
        • There is no special keyword but rather a different syntax for creating anonymous arrays, hashes and subroutines
        • It isn't (directly) possible to create anonymous scalars
    • Example:
        $ref1 = ["ab","cd","ef","gh"];
        $ref2 = { k1 => 123, k2 => 456, k3 => 789 };
        $ref3 = sub { return $_[0] + $_[1]; };
        print $ref1, "\n";
        print $ref2, "\n";
        print $ref3, "\n";
        print $ref1->[0], "\n";
        print $ref2->{'k1'}, "\n";
        print $ref3->(12,5), "\n";
      Output:
        ARRAY(0x225f88)
        HASH(0x226d80)
        CODE(0x18303f0)
        ab
        123
        17
    • References and Data Structures
      • Complex data structures are built using arrays and hashes in combination
        • The syntax ‘$ref = ["abc", "def", "ghi"]’ creates an anonymous array and stores its address in ‘$ref’
        • The syntax ‘$ref = { "k1" => "v1", "k2" => "v2" }’ creates an anonymous hash and stores its address in ‘$ref’
      • So an exam marking script could be made up of:
        • An array of references to anonymous hashes
        • Where each hash held a candidate's details
        • Including a reference to an anonymous array of answers
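    • Sketch: the exam marking structure
      • A minimal version of the structure described above: an array of hash references, each holding a reference to an array of answers. The candidates, answers and the ‘score’ subroutine are invented for illustration

```perl
use strict;
use warnings;

# Hypothetical candidates: one anonymous hash per candidate,
# each with a reference to an anonymous array of answers.
my @candidates = (
    { name => "dave", answers => [ "a", "c", "b", "d" ] },
    { name => "jane", answers => [ "a", "b", "b", "d" ] },
);
my @correct = ("a", "b", "b", "c");

# Counts the answers that match the key at the same position.
sub score {
    my ($answers) = @_;
    my $marks = 0;
    for my $i (0 .. $#correct) {
        $marks++ if defined $answers->[$i] && $answers->[$i] eq $correct[$i];
    }
    return $marks;
}

for my $candidate (@candidates) {
    printf "%s scored %d out of %d\n",
        $candidate->{name}, score($candidate->{answers}), scalar @correct;
}
```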
    • Example:
        @numerals = ( [100,"C"], [90,"XC"], [50,"L"], [40,"XL"],
                      [10,"X"], [9,"IX"], [5,"V"], [4,"IV"], [1,"I"] );
        print "Enter the number to convert to a roman numeral... ";
        $number = <STDIN>;
        chomp($number);
        foreach (@numerals) {
            my $decimal = $_->[0];
            my $string = $_->[1];
            my $times = int($number / $decimal);
            if($times > 0) {
                for(1..$times) {
                    print $string;
                }
                $number = $number % $decimal;
            }
        }
    • Example:
        $ref1 = [ ["abc","def","ghi"],
                  ["jkl","mno","pqr"],
                  ["stu","vwx","yza"] ];
        print "Contents of 2d array are:\n";
        foreach(@{$ref1}) {
            print " ";
            foreach(@{$_}) {
                print "$_ ";
            }
            print "\n";
        }
      Output:
        Contents of 2d array are:
         abc def ghi
         jkl mno pqr
         stu vwx yza
    • Example:
        $ref = { k1 => { k4 => "ab", k5 => "cd" },
                 k2 => { k6 => "ef", k7 => "gh" },
                 k3 => { k8 => "ij", k9 => "kl" } };
        print $ref->{'k1'}->{'k4'}, "\n";
        print $ref->{'k1'}->{'k5'}, "\n";
        print $ref->{'k2'}->{'k6'}, "\n";
        print $ref->{'k2'}->{'k7'}, "\n";
        print $ref->{'k3'}->{'k8'}, "\n";
        print $ref->{'k3'}->{'k9'}, "\n";
      Output:
        ab
        cd
        ef
        gh
        ij
        kl
    • Modules and Packages: Creating Reusable Code
    • Modules and Code Reuse in Perl
      • Modules are Perl libraries
        • You can create your own or download them from CPAN
        • They are normally found in the ‘lib’ folder of your distribution
      • There are several types of module:
        • Traditional and object-oriented modules are for code reuse
          • They let you avoid re-inventing the wheel in and across projects
        • Pragmatic Modules extend the language
          • When loaded they alter symbol tables and interact with the interpreter, thereby adding to Perl’s functionality
    • Creating Perl Modules
      • Modules are placed in a separate file
        • By convention this is given a ‘.pm’ extension
        • Pragmatic modules have lowercase names
        • Other module names should contain capitals
      • The module begins with a package declaration
        • This creates a new namespace for symbols
        • Within the interpreter this is represented by a new symbol table
      • There are two ways of loading a module
        • The ‘use’ declaration loads a module at compile time
        • The ‘require’ declaration loads a module at runtime
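    • Sketch: packages, ‘use’ and ‘require’
      • A single-file illustration of the bullets above, using the core modules ‘List::Util’ and ‘POSIX’. The ‘Greeter’ package is hypothetical and would normally live in its own ‘Greeter.pm’ file

```perl
use strict;
use warnings;

# 'use' runs at compile time and imports the requested symbols.
use List::Util qw(sum);

# A package declaration opens a new namespace with its own symbol table.
package Greeter;
our $greeting = "Hello";
sub greet { return "$greeting, $_[0]!"; }

package main;

# 'require' runs at runtime and imports nothing,
# so the function is called by its fully qualified name.
require POSIX;

print Greeter::greet("world"), "\n";
print sum(1, 2, 3), "\n";
print POSIX::floor(7.9), "\n";
```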
    • Creating Perl Modules
      • Perl does not strictly enforce barriers between modules
        • Symbols from a module can always be used
          • By prefixing the symbol name with the package name
        • You should only use the symbols a module wants you to see
      • There is a standard mechanism for exporting symbols
        • The module needs to require the ‘Exporter’ module
          • And place the name ‘Exporter’ into an array called ‘@ISA’
        • This allows the module to place entries in other symbol tables
          • Symbols placed in an array called ‘@EXPORT’ are automatically added to the table of the script with the ‘use’ declaration
          • Symbols placed in an array called ‘@EXPORT_OK’ will be added to the importing symbol table if they are listed after ‘use’
    • Example, file ‘Maths.pm’:
        package Maths;
        require Exporter;
        our @ISA = ("Exporter");
        our @EXPORT = qw(add multiply subtract);
        sub add { return $_[0] + $_[1]; }
        sub divide { return $_[0] / $_[1]; }
        sub multiply { return $_[0] * $_[1]; }
        sub subtract { return $_[0] - $_[1]; }
        1;    # a module must finish with a true value
      The script that uses it:
        use Maths;
        print "Calculations using our maths module:";
        print "\n40 + 30 is: ", add(40,30);
        print "\n60 - 20 is: ", subtract(60,20);
        print "\n50 * 10 is: ", multiply(50,10);
        # divide is not exported so it only works
        # if we qualify the namespace
        print "\n40 / 10 is: ", Maths::divide(40,10);
    • Example, file ‘Maths.pm’ using ‘@EXPORT_OK’:
        package Maths;
        require Exporter;
        our @ISA = ("Exporter");
        our @EXPORT_OK = qw(add multiply subtract);
        sub add { return $_[0] + $_[1]; }
        sub divide { return $_[0] / $_[1]; }
        sub multiply { return $_[0] * $_[1]; }
        sub subtract { return $_[0] - $_[1]; }
        1;    # a module must finish with a true value
      The script that uses it:
        use Maths qw(add multiply subtract);
        print "Calculations using our maths module:";
        print "\n40 + 30 is: ", add(40,30);
        print "\n60 - 20 is: ", subtract(60,20);
        print "\n50 * 10 is: ", multiply(50,10);
        # divide is not exported so it only works
        # if we qualify the namespace
        print "\n40 / 10 is: ", Maths::divide(40,10);
    • Extra Syntax for Modules
      • Modules can prevent symbols from being exported
        • If you place a symbol name in ‘@EXPORT_FAIL’ then Perl will call an ‘export_fail’ subroutine, which can throw an error
      • Simple versioning is supported
        • A module can declare a ‘$VERSION’ variable
        • Which can then be mentioned in the ‘use’ declaration
          • E.g. ‘use Fred 2.7;’ means version 2.7 or later is required
      • Code placed within the module will be executed
        • For the module to load the last statement run must be true
          • Most modules end with ‘1;’ to ensure this is the case
        • A ‘BEGIN { … }’ block is run when the module is loaded
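    • Sketch: how the version check behaves
      • A version number in a ‘use’ declaration is checked by calling the ‘VERSION’ method that every package inherits. The sketch calls that method directly so that it fits in one file; the ‘Fred’ module and its version number are hypothetical

```perl
use strict;
use warnings;

# Hypothetical module, declared inline for illustration.
package Fred;
our $VERSION = '2.9';
sub hello { return "hello from Fred $VERSION"; }

package main;

# VERSION dies if the module's $VERSION is lower than the one requested.
my $new_enough = eval { Fred->VERSION(2.7); 1 } ? 1 : 0;
my $too_old    = eval { Fred->VERSION(3.0); 1 } ? 0 : 1;

print "2.7 accepted\n" if $new_enough;
print "3.0 rejected: $@" if $too_old;
```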
    • Objects in Perl: Support for Object Oriented Programming Concepts
    • Object Oriented Perl
      • OO support in Perl is minimal at best
        • It is not a good language for learning Object Oriented coding
        • Perl provides only rudimentary support for classes and objects
        • Leaving you to do most of the hard work yourself
      • The ‘bless’ function is the key to OO in Perl
        • Once you properly understand what it does the rest of OO in Perl becomes relatively straightforward
      • It helps to approach Perl OO indirectly
        • We will consider class declarations in Python and Ruby first
        • This is a good warm-up for understanding the Perl syntax
    • Core Principles of OO Languages
      • All popular OO languages use the same concepts:
        • Class declarations are the templates for objects
        • A class declaration is made up of members
          • Members holding data are known as fields
          • Members holding code are known as methods
        • Special methods are provided with their own syntax
          • The most important of these is the constructor method to be called automatically when an object is created
        • Every object has a built in reference to itself
          • Similar to the ‘127.0.0.1’ IP address in networking
      • It helps to think of members as slots on the object
        • Only slots on the outside can be used by clients
    • [Diagram: client code holding ‘account1’ and ‘account2’, which refer to two Account objects, each with ‘withdraw { … }’ and ‘display { … }’ slots]
    • Example (Python):
        # Class declaration
        class Account:
            # Constructor
            def __init__(self, id, balance):
                self.id = id
                self.balance = balance
            def withdraw(self, amount):
                self.balance -= amount
            def display(self):
                print("Account with id", self.id, "and balance", self.balance)

        # Creating objects
        account1 = Account("AB12", 30000)
        account2 = Account("CD34", 45000)
        print("----- Before Withdrawal -----")
        account1.display()
        account2.display()
        account1.withdraw(250)
        account2.withdraw(500)
        print("----- After Withdrawal -----")
        account1.display()
        account2.display()
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44500
    • Example (Ruby):
        # Class declaration
        class Account
          # Constructor
          def initialize(id, balance)
            @id = id
            @balance = balance
          end
          def withdraw(amount)
            @balance -= amount
          end
          def display()
            puts "Account with id #{@id} and balance #{@balance}"
          end
        end

        # Creating objects
        account1 = Account.new("AB12", 30000)
        account2 = Account.new("CD34", 45000)
        puts "----- Before Withdrawal -----"
        account1.display()
        account2.display()
        account1.withdraw(250)
        account2.withdraw(500)
        puts "----- After Withdrawal -----"
        account1.display()
        account2.display()
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44500
    • Perl Syntax for Object Orientation
      • Perl does not have separate class declarations
        • Instead a class is just a special type of module
      • Fields are saved in anonymous hashes
        • Or in whatever anonymous data structure you want
      • An arbitrary method is used as a constructor
        • It creates the hash and blesses it into the package
      • Blessing associates the hash with the package
        • Methods of the package can be called via the hash reference
        • In the method declaration the hash is the first parameter
    • Example (Perl):
        # Packages double as classes
        package Account;

        # Any method can be a constructor
        sub new {
            my $packageName = shift;
            my $data = {};                  # anonymous hashes hold fields
            $data->{'id'} = shift;
            $data->{'balance'} = shift;
            bless $data, $packageName;
        }
        sub withdraw {
            my $data = shift;
            $data->{'balance'} -= $_[0];
        }
        sub display {
            my $data = shift;
            print "Account with id $data->{'id'} and balance $data->{'balance'}\n";
        }
    • Example (Perl), using the class:
        $account1 = Account->new("AB12",30000);
        $account2 = Account->new("CD34",45000);
        print "----- Before Withdrawal -----\n";
        $account1->display();
        $account2->display();
        $account1->withdraw(250);
        $account2->withdraw(500);
        print "----- After Withdrawal -----\n";
        $account1->display();
        $account2->display();
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44500
      • Creating objects: ‘Account->new("AB",1000)’ equals ‘Account::new('Account', "AB", 1000)’
      • Because of blessing: ‘$account->withdraw(250)’ equals ‘Account::withdraw($account, 250)’
    • Inheritance and Overriding in Perl
      • These are the other two key principles of OO
        • Inheritance enables one class to be built on top of another
        • Overriding enables derived class methods to replace those inherited from the base class
      • It helps to visualize objects as layered
        • For each class in the hierarchy there is a layer in the object
        • When a method is overridden the slot in a base layer is rewired into an implementation in the derived layer
      • Again it helps to review the syntax of other languages
        • Once again the Perl syntax is minimal
    • [Diagram: client code holding ‘account1’ (an Account object with ‘withdraw’ and ‘display’ slots) and ‘account2’ (a SavingsAccount object layered on top of an Account layer, with its own ‘withdraw’ replacing the base one)]
    • Example (Python):
        class Account:
            def __init__(self, id, balance):
                self.id = id
                self.balance = balance
            def withdraw(self, amount):
                self.balance -= amount
            def display(self):
                print("Account with id", self.id, "and balance", self.balance)

        class SavingsAccount(Account):
            def __init__(self, id, balance, fee):
                Account.__init__(self, id, balance)
                self.fee = fee
            def withdraw(self, amount):
                self.balance -= (amount + self.fee)

        account1 = Account("AB12", 30000)
        account2 = SavingsAccount("CD34", 45000, 15)
        print("----- Before Withdrawal -----")
        account1.display()
        account2.display()
        account1.withdraw(250)
        account2.withdraw(500)
        print("----- After Withdrawal -----")
        account1.display()
        account2.display()
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44485
    • Example (Ruby):
        class Account
          def initialize(id, balance)
            @id = id
            @balance = balance
          end
          def withdraw(amount)
            @balance -= amount
          end
          def display()
            puts "Account with id #{@id} and balance #{@balance}"
          end
        end

        class SavingsAccount < Account
          def initialize(id, balance, fee)
            super(id, balance)
            @fee = fee
          end
          def withdraw(amount)
            @balance -= (amount + @fee)
          end
        end

        account1 = Account.new("AB12", 30000)
        account2 = SavingsAccount.new("CD34", 45000, 15)
        puts "----- Before Withdrawal -----"
        account1.display()
        account2.display()
        account1.withdraw(250)
        account2.withdraw(500)
        puts "----- After Withdrawal -----"
        account1.display()
        account2.display()
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44485
    • Inheritance and Overriding in Perl
      • Inheritance is supported via ‘@ISA’
        • One module requires another and places its name in ‘@ISA’
        • This means that if a symbol is not found in the derived module the interpreter will search the base one
          • Thereby creating the layered effect associated with inheritance
      • The pragmatic module ‘base’ simplifies this
        • E.g. ‘use base 'Employee';’ in ‘Manager.pm’
      • Overriding works in the same way
        • The search order means the derived version is found first
        • The ‘SUPER’ symbol lets you access the base version
          • This is particularly useful when creating derived constructors
    • Example (Perl), the client script:
        use Account;
        use SavingsAccount;
        $account1 = Account->new("AB12",30000);
        $account2 = SavingsAccount->new("CD34",45000,15);
        print "----- Before Withdrawal -----\n";
        $account1->display();
        $account2->display();
        $account1->withdraw(250);
        $account2->withdraw(500);
        print "----- After Withdrawal -----\n";
        $account1->display();
        $account2->display();
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44485
    • Example (Perl), file ‘Account.pm’:
        package Account;
        require Exporter;
        our @ISA = "Exporter";
        our @EXPORT = qw(new withdraw display);
        sub new {
            my $packageName = shift;
            my $data = {};
            $data->{'id'} = shift;
            $data->{'balance'} = shift;
            bless $data, $packageName;
        }
        sub withdraw {
            my $data = shift;
            $data->{'balance'} -= $_[0];
        }
        sub display {
            my $data = shift;
            print "Account with id $data->{'id'} and balance $data->{'balance'}\n";
        }
        1;
      File ‘SavingsAccount.pm’:
        package SavingsAccount;
        use base 'Account';
        sub new {
            # after the shift, @_ holds (id, balance, fee)
            my $data = shift->SUPER::new(@_);
            $data->{'fee'} = $_[2];
            return $data;
        }
        sub withdraw {
            my $data = shift;
            $data->{'balance'} -= ($_[0] + $data->{'fee'});
        }
        1;
    • Parsing XML: An Example of an OO Module
    • An Example of Parsing XML Files
      • Text files are increasingly formatted as XML
        • This adds an extra layer of structure that makes it easier to preserve the semantics of the data
      • There are many APIs for parsing and creating XML
        • Perl has modules that support all of these standards
      • Note XPath plays a role for XML similar to that of regular expressions for text
        • In XPath V2 regular expressions can be used within an XPath
      • The most basic style is the low-level, event driven one popularised by the SAX standard
        • The ‘XML::Parser’ module offers an API of this kind
          • It is built on the Expat parser, so its callbacks are SAX-like rather than SAX itself
    • An Example of Parsing XML Files
      • To parse XML create an instance of ‘XML::Parser’
        • As normal the constructor method is called ‘new’
      • The constructor method takes named parameters
        • The ‘Handlers’ parameter should be an anonymous hash
      • The hash defines callback methods
        • E.g. the ‘Start’ and ‘End’ keys index methods to be called whenever the parser encounters opening or closing tags
      • Parsing is triggered via a call to ‘parse’ or ‘parsefile’
        • The parser reads in the XML from the file or string
        • As different parts of the file are met callbacks are triggered
          • All your implementation is placed in the callback methods
    • Example, the input file ‘config.xml’:
        <?xml version="1.0" encoding="UTF-8"?>
        <myapp>
          <resources>
            <threads>7</threads>
            <server-ip>120.153.72.208</server-ip>
            <cache size="10"/>
          </resources>
          <accounts>
            <user role="administrator">
              <id>dave</id>
              <password>abab12</password>
            </user>
            <user role="power-user">
              <id>jane</id>
              <password>cdcd34</password>
            </user>
            <user role="end-user">
              <id>fred</id>
              <password>efef56</password>
            </user>
            <user role="end-user">
              <id>sharon</id>
              <password>ghghij</password>
            </user>
          </accounts>
        </myapp>
      Output:
        We found the following users
         administrator with ID dave and password abab12
         power-user with ID jane and password cdcd34
         end-user with ID fred and password efef56
         end-user with ID sharon and password ghghij
    • Example (Perl), the parsing script:
        use XML::Parser;

        #Variables to temporarily store user details
        our $userID;
        our $userRole;
        our $userPassword;
        #List of users found so far
        our @users;
        #Flags to let us know when we are in a particular element
        our $isInID = 0;
        our $isInPassword = 0;

        our $parser = XML::Parser->new(Handlers => { Start => \&startElement,
                                                     End   => \&endElement,
                                                     Char  => \&characters });
        $parser->parsefile("config.xml");
        print "We found the following users\n";
        foreach(@users) {
            my $user = $_;
            print " $$user[1] with ID $$user[0] and password $$user[2]\n";
        }

        #Handlers receive the parser, the element name and then attribute name/value pairs
        sub startElement {
            if($_[1] eq "user") {
                $userRole = $_[3];
            } elsif($_[1] eq "id") {
                $isInID = 1;
            } elsif($_[1] eq "password") {
                $isInPassword = 1;
            }
        }
        sub endElement {
            if($_[1] eq "user") {
                my $newUser = [$userID, $userRole, $userPassword];
                push(@users, $newUser);
            } elsif($_[1] eq "id") {
                $isInID = 0;
            } elsif($_[1] eq "password") {
                $isInPassword = 0;
            }
        }
        sub characters {
            if($isInID) {
                $userID = $_[1];
            }
            if($isInPassword) {
                $userPassword = $_[1];
            }
        }
    • Database Access: Another Example of OO Perl
    • Database Access in Perl
      • ‘DBI’ is the standard module for database access
        • It enables you to access any relational database for which a driver is available
      • Like most modern APIs the ‘DBI’ module is a shell
        • As with JDBC and ADO.NET the purpose of the ‘DBI’ module is to expose a common interface that conceals the type of the DB
        • The actual functionality is provided by driver modules which implement the standard functionality in a vendor specific way
        • The ODBC driver lets you talk to any database that can be reached through ODBC
          • But is probably not as efficient as a native driver
    • The Architecture of the DBI Library
      • [Diagram: the DBI API sitting on top of the driver modules DBD-mysql, DBD-Oracle, DBD-Sybase and DBD-ODBC]
    • Using the DBI Module
      • A ‘database handle’ is a link to the underlying driver
        • It is a reference to an object that represents a connection
      • A database handle is created by a call to ‘connect’
        • The first parameter is a string that identifies the database
        • This is written as ‘dbi:DRIVER_NAME:DRIVER_INFO’
      • Forming the right connection string is half the battle
        • Make sure you are using the right documentation for the driver
      • The ‘disconnect’ method terminates the connection
        • It is good practice to do this explicitly even in short scripts
    • Using the DBI Module
      • Database handles are factories for statement handles
        • These are obtained by calling the ‘prepare’ method
        • The SQL string is specified as the parameter
      • The statement is triggered via the ‘execute’ method
        • If the query is a SELECT then a result set is obtained
        • The results are stored inside the statement
          • This is done in a vendor specific way
      • The results can be iterated one row at a time
        • There are a variety of methods for retrieving the values in a row
        • The ‘dump_results’ method is a quick way of printing the data
    • Example:
        use DBI;

        # $insertStatement, $deleteStatement, $selectStatement and $val1 to $val4
        # are assumed to be defined elsewhere in the script
        my $connectionString = "dbi:ODBC:SomeDB";
        my $dbh = DBI->connect($connectionString) or die "can't connect to DB!";
        my $statement = $dbh->prepare($insertStatement);
        $statement->execute($val1, $val2, $val3, $val4);
        listAllRows($dbh);
        $statement = $dbh->prepare($deleteStatement);
        $statement->execute("100");
        listAllRows($dbh);
        $dbh->disconnect();

        sub listAllRows {
            my($dbh) = $_[0];
            my $statement = $dbh->prepare($selectStatement);
            $statement->execute;
            print "Table contents are\n";
            while(my($column1,$column2,$column3) = $statement->fetchrow_array()) {
                print "$column1 $column2 $column3\n";
            }
        }
    • Course Project: An Exam Marking System
    • A Course Project - Marking Exams
      • We will be writing a script to process exam results
        • Stage 1: A hash of arrays (marks per candidate)
        • Stage 2: A hash of hashes (marks per candidate per subject)
        • Stage 3: A hash of hashes of arrays
          • Holding individual marks per subject per candidate
      • Stage 1 sample data (marks per candidate):
          Name   Marks
          Dave   80 70 64 81 56 90
          Jane   93 76 87 64 59 55
          Fred   62 68 68 71 55 79
    • Stage 2 sample data (each name links to a table of subjects and marks):
        dave   History 80, Maths 70, French 65
        jane   English 66, Maths 70, Politics 82
        fred   Physics 61, History 73, Spanish 52
    • Stage 3 sample data (each name links to subjects, each subject to its individual marks):
        dave   History    10 9 10 7 5 6
               Maths      9 9 8 7 6 7
               French     4 5 8 6 0 9
        jane   English    5 8 7 7 6 8
               Maths      7 8 4 6 5 9
               Politics   6 6 7 9 8 8
        fred   Physics    10 10 8 7 6 8
               History    9 6 8 5 0 9
               Spanish    10 8 7 4 8 9
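    • Sketch: a hash of hashes of arrays
      • One possible shape for the final stage of the project, not the course solution. The marks below are invented for illustration

```perl
use strict;
use warnings;

# Hypothetical data: candidate => subject => reference to an array of marks.
my %results = (
    dave => { History => [ 8, 9, 7 ], Maths => [ 10, 6, 8 ] },
    jane => { English => [ 7, 7, 10 ] },
);

# Adding marks: the intermediate hash and array are created on first use.
push @{ $results{fred}{Physics} }, 9, 5;

for my $name (sort keys %results) {
    for my $subject (sort keys %{ $results{$name} }) {
        my @marks = @{ $results{$name}{$subject} };
        my $total = 0;
        $total += $_ for @marks;
        printf "%-5s %-8s total %2d average %.1f\n",
            $name, $subject, $total, $total / @marks;
    }
}
```

      • Sorting the keys keeps the report order stable, since hash order is not guaranteed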