You Can Do It! Start Using Perl to Handle Your Voyager Needs

Getting started with Perl with a view towards using it with the Voyager LMS
Presented at ELUNA 2008

Published in: Technology
  • Run demo 22 and 23
  • You Can Do It! Start Using Perl to Handle Your Voyager Needs

    1. 1. You Can Do It! Start Using Perl to Handle Your Voyager Needs.
    2. 2. Some Perl nomenclature
        • PERL – Practical Extraction and Report Language
        (camel by O’Reilly)
    3. 3. Some Perl nomenclature
        • PERL – Practical Extraction and Report Language
        • PERL – Pathologically Eclectic Rubbish Lister (not really)
        (camel by O’Reilly)
    4. 4. Some Perl nomenclature
        • PERL – Practical Extraction and Report Language
        • PERL – Pathologically Eclectic Rubbish Lister (not really)
        • TMTOWTDI – There’s More Than One Way To Do It
        (camel by O’Reilly)
    5. 5. Some Perl attributes
        • It’s a scripted language, not compiled:
            • faster, easier development
            • runs plenty fast for most things
    6. 6. Some Perl attributes
        • It’s a scripted language, not compiled:
            • faster, easier development
            • runs plenty fast for most things
        • Loose variable typing:
            • both good and bad,
            • but mostly good
    7. 7. Your first program
        #!/usr/local/bin/perl
        print "Hello, World\n";
    8. 8. “Protecting” your program (Unix)
        By default, your program is not executable.
        chmod 744 your_program
        You can execute it as owner of the file; anyone else can only read it.
    9. 9. Variables
        $name
        Can be text or a number: a character, a whole page of text, or any kind of number.
        Context determines the type, and conversion can go “both” ways.
    10. 10. Variables, array of
        @employee
        An array of $employee variables: $employee[0], $employee[1], etc.
    11. 11. Variables, hash of
        $lib{'thisone'} = "2 days";
        $lib{'thatone'} = "5 days";
        Thus you can use $grace_period = $lib{$libname};
        When $libname is thatone, $grace_period is 5 days.
    12. 12. Variables, list of
        ($var1, $var2, $var3) = function_that_does_something();
        This function returns a list of elements. A list is always inside parentheses ().
    13. 13. Variables, assigning a value to
        $var = value or expression;
        $array[n] = something;
        @array = ();   # empty array
        %hash = ();    # empty hash
        Can be done almost anywhere, anytime.
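The variable kinds on the slides above can be tried together in one short script. This is only a minimal sketch: the employee names are made-up sample data, and min_max() is a hypothetical helper standing in for function_that_does_something.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar: holds text or a number; context decides how it is used.
my $name = "42";
my $sum  = $name + 1;          # numeric context
my $text = $name . " days";    # string context

# Array: an ordered list of scalars, indexed from 0.
my @employee = ('Alice', 'Bob', 'Carol');   # sample data
my $first    = $employee[0];

# Hash: values looked up by key.
my %lib = ('thisone' => '2 days', 'thatone' => '5 days');
my $libname      = 'thatone';
my $grace_period = $lib{$libname};

# List assignment from a function that returns a list.
# min_max() is a hypothetical helper, not something from the slides.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return ($sorted[0], $sorted[-1]);
}
my ($min, $max) = min_max(7, 3, 9);

print "$first works at $libname, grace period $grace_period\n";
print "sum=$sum text=$text min=$min max=$max\n";
```

The same scalar "42" is used as a number and as a string without any conversion call, which is the loose typing the earlier slide describes.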
    14. 14. Variable scope, and good practices
        use strict;
        Requires that you declare all variables like this:
        my $var;
        my $var = something;
        my @array = ();
        Also makes Perl check your code. Best Practices!
    15. 15. Variable scope, and good practices
        use strict;
        my $var;
        my $var = something;
        my @array = ();
        A variable declared like this is visible throughout your program. Best Practices!
    16. 16. Variable scope, and good practices
        Scope: where in a program a variable exists.
        use strict;
        my $var;
        my $var = something;
        my @array = ();
        A “my” declaration within code grouped within { and } is visible only in that section of code; it does not exist elsewhere. Best Practices!
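A small script can make the scope rule concrete. This is an illustrative sketch; the variable names $count, $n, $double, and @seen are invented for the example.

```perl
use strict;
use warnings;

my $count = 1;              # declared at file level: visible below, including inside blocks

{
    my $count = 100;        # a new, separate variable that exists only inside this block
    $count++;               # changes the inner variable only
}

my @seen;
foreach my $n (1 .. 3) {    # $n exists only inside the loop body
    my $double = $n * 2;    # so does $double
    push @seen, $double;
}

# Uncommenting the next line makes the script fail to compile under
# "use strict", because $double does not exist outside the loop:
# print $double;

print "count is still $count; doubles were @seen\n";
```

The outer $count is untouched by the block, which is why declaring variables in the smallest block that needs them tends to prevent accidental reuse.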
    17. 17. Some special variables
        $_    default parameter for many functions
        $.    current record/line number in current file
        $/    input record separator (usually the newline character)
        $,    print() function output separator (normally an empty string)
        $0    name of the Perl script being executed
        $^T   time, in seconds, when the script begins running
        $^X   full pathname of the Perl interpreter running the current script
    18. 18. Some special variables
        @ARGV   array which contains the list of the command line arguments
        @INC    array which contains the list of directories where Perl can look for scripts to execute (for use DBI and other modules)
        %ENV    hash variable which contains entries for your current environment variables
    19. 19. Some special variables
        STDIN    read from the standard input file handle (normally the keyboard)
        STDOUT   send output to the standard output file handle (normally the display)
        STDERR   send error output to the standard error file handle (normally the display)
        DATA     file handle referring to any data following __END__
        and dozens more…
    20. 20. String manipulation & other stuff
        Given $stuff = "this is me";
        These are not equivalent:
        "print $stuff"
        'print $stuff'
        `print $stuff`
    21. 21. String manipulation & other stuff
        Given $stuff = "this is me";
        These are not equivalent:
        "print $stuff" is "print this is me"
        'print $stuff'
        `print $stuff`
    22. 22. String manipulation & other stuff
        Given $stuff = "this is me";
        These are not equivalent:
        "print $stuff" is "print this is me"
        'print $stuff' is 'print $stuff'
        `print $stuff`
    23. 23. String manipulation & other stuff
        Given $stuff = "this is me";
        `print $stuff` would have the operating system try to execute the command <print this is me>
    24. 24. String manipulation & other stuff
        This form should be used as $something = `O.S. command`;
        Example: $listing = `ls *.pl`;
        The output of this ls command is placed, possibly as one large string, into the variable $listing. This syntax allows powerful processing capabilities within a program.
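The three kinds of quotes can be compared side by side. This sketch assumes an echo command is available to the operating system shell, which is true on most Unix and Windows systems but is still an assumption.

```perl
use strict;
use warnings;

my $stuff = "this is me";

my $double = "print $stuff";    # double quotes: $stuff is interpolated
my $single = 'print $stuff';    # single quotes: taken literally

# Backticks hand the (interpolated) text to the operating system
# and capture whatever the command writes to standard output.
my $captured = `echo $stuff`;
$captured = '' unless defined $captured;   # undef if the command could not run
chomp $captured;                           # drop the trailing newline

print "$double\n";
print "$single\n";
print "$captured\n";
```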
    25. 25. printf, sprintf
        printf("%s lines here", $counter)
        If $counter is 42, we get
        42 lines here
        for the output.
    26. 26. printf, sprintf
        printf("%c lines here", $counter)
        If $counter is 42, we get
        * lines here
        for the output, since 42 is the ASCII value for “*”, and we’re printing a character.
    27. 27. printf, sprintf
        Some additional string formatting…
        %s – output length is length($var)
        %10s – output length is at least 10 (right justified)
        %10.20s – output length is min 10, max 20
        %-10.10s – output length is absolutely 10 (left justified)
        Any padding is with space characters.
    28. 28. printf, sprintf
        Some additional number formatting…
        %d – output length is length($var)
        %10d – output length is at least 10 (leading space padded)
        %-10d – left justified, at least 10 (trailing space padded)
        %-10.10d – at least 10 digits (leading zero padded)
    29. 29. printf, sprintf
        Still more number formatting…
        %f – six positions to the right of the decimal by default
        %10.10f – guarantees 10 positions to the right of the decimal (zero padded)
    30. 30. printf, sprintf
        printf whatever outputs to the screen
    31. 31. printf, sprintf
        printf whatever outputs to the screen
        printf file whatever outputs to that file
        Ex: printf file ("this is %s fun\n", $much);
        (print works just like the above, as to output destination.)
    32. 32. printf, sprintf
        printf whatever outputs to the screen
        printf file whatever outputs to that file
        Ex: printf file ("this is %s fun\n", $much);
        (print works just like the above, as to output destination.)
        sprintf is just like any printf, except that its output always goes to a string variable.
        Ex: $var = sprintf("this is %s fun\n", $much);
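A few of the formats from the printf slides, collected with sprintf so the results can be inspected. This is a sketch with invented sample values; the in-memory file handle is used only so the example does not leave a file behind.

```perl
use strict;
use warnings;

my $counter = 42;

my $as_string = sprintf("%s lines here", $counter);   # number printed as text
my $as_char   = sprintf("%c lines here", $counter);   # character with code 42
my $right     = sprintf("[%10s]", "Perl");             # padded on the left to width 10
my $left      = sprintf("[%-10.10s]", "Perl");         # padded on the right to width 10
my $cut       = sprintf("[%-10.10s]", "Practical Extraction");   # truncated to 10
my $zeros     = sprintf("[%05d]", $counter);           # zero padded to width 5
my $rounded   = sprintf("%.2f", 3.14159);              # two decimal places

# printf to a file handle: here the "file" is an in-memory string.
my $buffer = '';
open(my $out, '>', \$buffer) or die "cannot open in-memory file: $!";
printf {$out} "this is %s fun\n", 'so much';
close($out);

print "$_\n" for ($as_string, $as_char, $right, $left, $cut, $zeros, $rounded);
print $buffer;
```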
    33. 33. Some other common functions: $var = ƒ(x);
        substr – get a portion of a string
        index – get the location of a string in a string
        length – get the length of a string
        ord, chr – convert a character to its ASCII value and vice versa
    34. 34. Some other common functions: $var = ƒ(x);
        uc, lc – convert a string entirely to upper or lower case
        ucfirst, lcfirst – convert the first character of a string to upper or lower case
    35. 35. Some other common functions: $var = ƒ(x);
        split – convert a string into pieces based on a supplied character
        join – convert a list of strings into one string, joined by a supplied character
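The functions listed on these slides can be exercised on one sample string. The string below is made-up data chosen only for illustration.

```perl
use strict;
use warnings;

my $line = "smith,jane,main library";    # sample data

my @parts  = split(/,/, $line);          # break the string apart at each comma
my $joined = join(' | ', @parts);        # put the pieces back together
my $length = length($line);              # number of characters
my $where  = index($line, 'jane');       # zero-based position of 'jane'
my $piece  = substr($line, 0, 5);        # five characters starting at position 0
my $upper  = uc($piece);                 # all upper case
my $proper = ucfirst($parts[1]);         # first character upper case
my $code   = ord('*');                   # character to its numeric code
my $char   = chr(42);                    # numeric code back to a character

print "$joined\n";
print "length=$length where=$where piece=$piece upper=$upper proper=$proper\n";
print "code=$code char=$char\n";
```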
    36. 36. Loop stuff: foreach, with an array
        @person contains a large number of people
        foreach $individual (@person)
        {
            print "this is person $individual\n";
        }
        No subscript required! Cleaner code.
    37. 37. Loop stuff: while, with an array
        @person contains a large number of people
        $idnum = 0;
        while ($idnum < @person)
        {
            print "this is person $person[$idnum]\n";
            $idnum++;
        }
        Not as clean as using foreach, but sometimes this makes more sense.
    38. 38. Loop stuff: for, with an array (backwards traversal)
        @person contains a large number of people
        for ($idnum = $#person; $idnum >= 0; $idnum--)
        {
            print "this is person $person[$idnum]\n";
        }
        Conventional for loop.
    39. 39. Loop stuff, more control
        @person contains a large number of people
        for ($idnum = $#person; $idnum >= 0; $idnum--)
        {
            next if ($person[$idnum] eq "Harry");
            print "this is person $person[$idnum]\n";
        }
        Skip anybody named Harry.
    40. 40. Loop stuff, more control
        @person contains a large number of people
        for ($idnum = $#person; $idnum >= 0; $idnum--)
        {
            last if ($person[$idnum] eq "Penelope");
            print "this is person $person[$idnum]\n";
        }
        next_program_line;
        Once we get to Penelope, leave the loop, and resume execution at next_program_line.
    41. 41. One last bit of array stuff…
        @person = ();
        …
        while ("reading a file")    # this line is not real code!
        {
            $name = substr($file_line, 0, 30);
            push @person, $name;
        }
        Populate an array simply, with no hassle over an index variable.
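The loop forms from the slides above, run against a four-name sample array. The names and the collecting arrays (@forward, @backward, @before_penelope) are assumptions made for the example.

```perl
use strict;
use warnings;

my @person = ('Tom', 'Harry', 'Penelope', 'Dick');   # sample names

# foreach: no index variable needed
my @forward;
foreach my $individual (@person) {
    push @forward, $individual;
}

# while with an explicit index; an array in numeric context gives its size
my $idnum   = 0;
my $visited = 0;
while ($idnum < @person) {
    $visited++;
    $idnum++;
}

# for loop, backwards, skipping Harry with next
my @backward;
for (my $i = $#person; $i >= 0; $i--) {
    next if $person[$i] eq 'Harry';
    push @backward, $person[$i];
}

# leave the loop as soon as Penelope is reached
my @before_penelope;
foreach my $individual (@person) {
    last if $individual eq 'Penelope';
    push @before_penelope, $individual;
}

print "forward: @forward\n";
print "visited: $visited\n";
print "backward without Harry: @backward\n";
print "before Penelope: @before_penelope\n";
```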
    42. 42. File input and output (I/O)
    43. 43. “Slurping” a file
    44. 44. File test operators
        Here are a few:
        -d   tests if the file is a directory
        -e   tests if the file exists
        -s   returns the size of the file in bytes
        -x   tests if the file can be executed
        Example: $filesize = -s $file;
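The file I/O slides themselves did not survive in this transcript, so the following is only a sketch of the general idea: write a file, apply a few file tests, then "slurp" it into an array. It uses the core File::Temp module to create a scratch file; the file contents are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file to work with (removed automatically at exit).
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "line one\nline two\nline three\n";
close($fh) or die "cannot close $file: $!";

# File test operators
my $exists   = -e $file ? 1 : 0;
my $is_dir   = -d $file ? 1 : 0;
my $filesize = -s $file;

# "Slurping": in list context, <$in> returns every line at once.
open(my $in, '<', $file) or die "cannot open $file: $!";
my @lines = <$in>;
close($in);
chomp @lines;               # strip the newline from every element

print "exists=$exists is_dir=$is_dir size=$filesize\n";
print "read ", scalar(@lines), " lines; the last is '$lines[-1]'\n";
```

Checking the result of open with "or die" is one way to "carefully" open a file: the script stops with the system's error message instead of silently reading nothing.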
    45. 45. Date and Time in Perl, basic
        ### "create" today's date
        my ($sec, $min, $hour, $day, $month, $year, $wday, $yday, $isdst) = localtime;
        This gets the date and time information from the system.
    46. 46. Date and Time in Perl, basic
        ### "create" today's date
        my ($sec, $min, $hour, $day, $month, $year, $wday, $yday, $isdst) = localtime;
        my $today = sprintf ("%4.4d.%2.2d.%2.2d", $year+1900, $month+1, $day);
        This puts today’s date in “Voyager” format, 2006.04.26
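The date formatting above can be wrapped in a small function so it works for any time value, not only "now". voyager_date() is a hypothetical name chosen for this sketch; timelocal comes from the core Time::Local module.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Format an epoch time (seconds) as YYYY.MM.DD.
# voyager_date is a name invented for this example.
sub voyager_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $day, $month, $year) = localtime($epoch);
    # localtime returns years since 1900 and months numbered from 0
    return sprintf("%4.4d.%2.2d.%2.2d", $year + 1900, $month + 1, $day);
}

my $today = voyager_date(time);

# Noon on 26 April 2006: arguments are sec, min, hour, day, month (0-based), year
my $sample = voyager_date(timelocal(0, 0, 12, 26, 3, 2006));

print "today is $today\n";
print "sample is $sample\n";
```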
    47. 47. Date and Time in Perl The program, datemath.pl, is part of your handout. The screenshot below shows its output.
    48. 48. Regular expressions, matching m/PATTERN/gi If the m for matching is not there, it is assumed. The g modifier means to find globally, all occurrences. The i modifier means matching case insensitive. Modifiers are optional; others are available.
    49. 49. Regular expressions, substituting s/PATTERN/REPLACEWITH/gi The s says that substitution is the intent. The g modifier means to substitute globally, all occurrences. The i modifier means matching case insensitive. Modifiers are optional; others are available.
    50. 50. Regular expressions, transliterating tr/SEARCHFOR/REPLACEWITH/cd The tr says that transliteration is the intent. The c modifier means transliterate whatever is not in SEARCHFOR. The d modifier means to delete found but unreplaced characters. Modifiers are optional; others are available.
    51. 51. Regular expressions
        # if the pattern matches
        if ($var =~ /regular expression/)
        {
            make_something_happen;
        }
    52. 52. Regular expressions
        # if the pattern does NOT match
        if ($var !~ /regular expression/)
        {
            make_something_happen;
        }
    53. 53. Regular expressions
        # contents of $var will be changed
        # 1st occurrence of this changes to that
        $var =~ s/this/that/;
        # all occurrences of this are changed to that
        $var =~ s/this/that/g;
    54. 54. Regular expressions
        # contents of $var will be changed
        # converts all lower case letters to
        # upper case letters
        $var =~ tr/a-z/A-Z/;
    55. 55. Regular expressions
        Some simple stuff to get started…
        m/thisx*/   * find zero or more ‘x’ right after ‘this’
        m/thisx+/   + find one or more ‘x’ right after ‘this’
        m/thisx?/   ? find zero or one ‘x’ right after ‘this’
    56. 56. Regular expressions
        Some simple stuff to get started…
        m/thisx*/   * find zero or more ‘x’ right after ‘this’
        m/thisx+/   + find one or more ‘x’ right after ‘this’
        m/thisx?/   ? find zero or one ‘x’ right after ‘this’
        m/[0-9]{5}/     find exactly five consecutive digits
        m/[0-9]{5,}/    find at least five consecutive digits
        m/[0-9]{5,7}/   find from five to seven consecutive digits
    57. 57. Regular expressions
        Some more simple stuff…
        m/^this/   find ‘this’ only at the beginning of the string
        m/this$/   find ‘this’ only at the end of the string
    58. 58. Regular expressions
        Some more simple stuff…
        m/^this/   find ‘this’ only at the beginning of the string
        m/this$/   find ‘this’ only at the end of the string
        Some specific characters:
        \n   newline (line feed)
        \r   carriage return
        \t   tab
        \f   form feed
        \0   null
    59. 59. Regular expressions
        Some more simple stuff…
        m/^this/   find ‘this’ only at the beginning of the string
        m/this$/   find ‘this’ only at the end of the string
        Some specific characters:
        \n   newline (line feed)
        \r   carriage return
        \t   tab
        \f   form feed
        \0   null
        Some generic characters:
        \d   any digit
        \D   any non-digit character
        \s   any whitespace character
        \S   any non-whitespace character
    60. 60. Regular expressions Look in the Perl book (see Resources) for an explanation on how to use regular expressions. You can look around elsewhere, at Perl sites, and in other books, for more information and examples. Looking at explained examples can be very helpful in learning how to use regular expressions. (I’ve enclosed some I’ve found useful; see Resources.)
    61. 61. Regular expressions Very powerful mechanism. Often hard to understand at first glance. Can be rather obtuse and frustrating! If one way doesn’t work, keep at it. Most likely there is a way that works!
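Matching, substituting, and transliterating from the regular expression slides can be tried on one sample string. The text in $var is invented for the example, and copies are modified so the original stays available for comparison.

```perl
use strict;
use warnings;

my $var = "this and this, zip 49008-5353";    # sample text

my $starts_this = ($var =~ /^this/) ? 1 : 0;  # 'this' at the beginning
my $ends_digit  = ($var =~ /\d$/)   ? 1 : 0;  # a digit at the end
my $no_tab      = ($var !~ /\t/)    ? 1 : 0;  # true when no tab is present

# Parentheses capture what matched; here the first five consecutive digits.
my ($zip) = $var =~ /([0-9]{5})/;

# Count all matches by forcing the /g match into list context.
my $count = () = $var =~ /this/g;

(my $once = $var) =~ s/this/that/;            # first occurrence only
(my $all  = $var) =~ s/this/that/g;           # every occurrence
(my $caps = $var) =~ tr/a-z/A-Z/;             # lower case to upper case

print "zip=$zip count=$count\n";
print "$once\n$all\n$caps\n";
```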
    62. 62. DBI stuff What is it and why might I want it? DBI is the DataBase Interface module for Perl. You will also need the specific DBD (DataBase Driver) module for Oracle. This enables Perl to perform queries against your Voyager database. Both of these should already be on your Voyager box.
    63. 63. DBI stuff, how to
        You need four things to connect to Voyager:
        machine name   your.machine.here.edu
        username       your_username
        password       your_password
        SID            VGER (or LIBR)
    64. 64. DBI stuff, how to
        $dbh is the handle for the database.
        $sth is the handle for the query.
        Create a query…then execute it.
        NOTE: SQL from Access will most likely NOT work here!
    65. 65. DBI stuff, how to Get the data coming from your query.
    66. 66. DBI stuff, how to Get the data coming from your query. You’ll need a Perl variable for each column returned in the query. Commonly a list of variables is used; you could also use an array.
    67. 67. DBI stuff, how to Get the data coming from your query. You’ll need a Perl variable for each column returned in the query. Commonly a list of variables is used; you could also use an array. Typically, you get your data in a while loop, but you could have $var = $sth->fetchrow_array; when you know you’re getting a single value.
    68. 68. DBI stuff, how to When you’re done with a query, you should finish it. This becomes important when you have multiple queries in succession. You can have multiple queries open at the same time. In that case, make the statement handles unique…$sth2, or $sth_patron. Finally, you can close your database connection.
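The code shown on the DBI slides is not preserved in this transcript, so what follows is a hedged sketch of the connect, prepare, execute, fetch, finish, disconnect sequence they describe, not the presenter's program. build_dsn() and fetch_rows() are hypothetical helper names, and the host and SID are the placeholders from the slide. DBI is loaded only inside fetch_rows(), so the script itself runs on a machine without DBI or DBD::Oracle.

```perl
use strict;
use warnings;

# Build the data source name that DBD::Oracle expects.
# build_dsn is a helper invented for this example.
sub build_dsn {
    my ($host, $sid) = @_;
    return "dbi:Oracle:host=$host;sid=$sid";
}

# Run one query and return its rows as a list of array references.
# Nothing calls this at load time; it needs DBI, DBD::Oracle, and a database.
sub fetch_rows {
    my ($host, $sid, $username, $password, $sql) = @_;
    require DBI;
    my $dbh = DBI->connect(build_dsn($host, $sid), $username, $password,
                           { RaiseError => 1 });   # die on any database error
    my $sth = $dbh->prepare($sql);                 # create the query
    $sth->execute();                               # then execute it
    my @rows;
    while (my @row = $sth->fetchrow_array) {       # one Perl value per column
        push @rows, [@row];
    }
    $sth->finish;                                  # done with this query
    $dbh->disconnect;                              # done with the database
    return @rows;
}

# Placeholder values from the slide; replace them with your own.
my $dsn = build_dsn('your.machine.here.edu', 'VGER');
print "would connect to $dsn\n";
```

A call such as fetch_rows('your.machine.here.edu', 'VGER', 'your_username', 'your_password', 'select sysdate from dual') would return a single row holding the database server's date, assuming the connection details are valid.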
    69. 69. CPAN Comprehensive Perl Archive Network http://cpan.org You name it and somebody has probably written a Perl module for it, and you’ll find it here. There are also good Perl links here; look for the Perl Bookmarks link.
    70. 70. CPAN Installing modules You need to be root for systemwide installation on Unix systems. On Windows machines, you’ll probably need to be administrator. You can install them “just for yourself” with a bit of tweaking, and without needing root access. If you’re not a techie, you’ll probably want to find someone who is, to install modules. Installing modules from CPAN is beyond the scope of this presentation.
    71. 71. Perl on your PC You can get Perl for your PC from ActiveState. They typically have two versions available; I recommend the newer one. Get the MSI version. Installation is easy and painless, but it may take some time to complete. A lot of modules are included with this distribution; many additional modules are available. Module installation is made easy via the Perl Package Manager (PPM).
    72. 72. Perl on your PC
        To use ppm in ActiveState Perl, open a command prompt window and enter ppm. Help is available by simply typing help.
        Some useful commands in ppm are:
        query *       show what’s already installed
        search pkg    look for package pkg at ActiveState’s repository
        install pkg   retrieve and install package pkg on your machine
    73. 73. Perl on your PC If you can’t find the module you’re looking for at ActiveState, you should be able to find it at CPAN, and will have to install it manually.
    74. 74. Voyager examples
        Based on my experience (your mileage may vary), there are two main types of applications for Voyager:
    75. 75. Voyager examples
        Based on my experience (your mileage may vary), there are two main types of applications for Voyager:
        • reports, or data retrievals, from the database
    76. 76. Voyager examples
        Based on my experience (your mileage may vary), there are two main types of applications for Voyager:
        • reports, or data retrievals, from the database
        • data manipulation, mainly of files to be imported
    77. 77. Voyager example, a simple report This report finds patrons with multiple email addresses
    78. 78. Voyager example, a simple report Tells the system where to find Perl
    79. 79. Voyager example, a simple report Will be querying the Voyager database
    80. 80. Voyager example, a simple report Set up output file name
    81. 81. Voyager example, a simple report Carefully open the output file for use
    82. 82. Voyager example, a simple report
        Keep password data in ONE file. Why?
        • one point of maintenance (less work when the password changes)
        • reduces opportunities for error
        • anyone can see the source code without seeing the password data
    83. 83. Voyager example, a simple report Get some information for each patron
    84. 84. Voyager example, a simple report Get the patron identifying data in a loop, and…
    85. 85. Voyager example, a simple report Get the patron identifying data in a loop, and set up the query to get the email address(es) for this patron
    86. 86. Voyager example, a simple report In an “inner” loop, get email address data for this patron
    87. 87. Voyager example, a simple report In an “inner” loop, get email address data for this patron. Preformat the fields for future output.
    88. 88. Voyager example, a simple report In an “inner” loop, get email address data for this patron. Preformat the fields for future output. Populate the address array with each address for this patron. (note that this array starts out empty for each patron, see previous slide)
    89. 89. Voyager example, a simple report If this patron has more than one email address, then we are interested
    90. 90. Voyager example, a simple report Remove trailing spaces from the name parts, then concatenate the parts together
    91. 91. Voyager example, a simple report Now output the multiple email addresses for this patron
    92. 92. Voyager example, a simple report
        A sample of the output
    93. 93. Voyager example, some data manipulation
        This program processes incoming authority records:
        • remove records whose 010 |a fields begin with "sj"
        • remaining records are stripped of the 9xx fields
    94. 94. Voyager example, some data manipulation Specify the file to be processed as a command line parameter. If no parameter is supplied, display a short paragraph that shows how to use this program, then exit.
    95. 95. Voyager example, some data manipulation Set up the |a subfield “delimiter”. This will be used later in the 010 field.
    96. 96. Voyager example, some data manipulation We could have used $ARGV[0] as the filename variable, but using $marcin makes the program more readable
    97. 97. Voyager example, some data manipulation An example of “slurping”, reading the file into an array without resorting to a loop
    98. 98. Voyager example, some data manipulation
        This is an example of early code sticking around too long. It should be rewritten:
        Insert this line before accessing the file:
        $/ = chr(0x1d);   # use the MARC end-of-record terminator
        Then get the data this way:
        @marcrecords = <marcin>;
        The above code can be eliminated by these simple changes.
    99. 99. Voyager example, some data manipulation
        This is an example of early code sticking around too long. It should be rewritten:
        Insert this line before accessing the file:
        $/ = chr(0x1d);   # use the MARC end-of-record terminator
        Then get the data this way:
        @marcrecords = <marcin>;
        The above code can be eliminated by these simple changes.
        The end result is that we have an array of the MARC records from the input file.
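The record-separator approach recommended on this slide can be sketched as follows. The scratch file and its three fake "records" are assumptions made so the example is self-contained; a real run would open the MARC file named on the command line instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write three fake "records", each ended by the MARC record terminator (hex 1D).
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} join('', map { "$_\x1d" } ('record one', 'record two', 'record three'));
close($tmp) or die "cannot close $filename: $!";

my @marcrecords;
{
    # "local" limits the change of $/ to this block, so line-by-line
    # reading elsewhere in the program still works as usual.
    local $/ = chr(0x1d);
    open(my $marcin, '<', $filename) or die "cannot open $filename: $!";
    binmode($marcin);
    @marcrecords = <$marcin>;     # one array element per MARC record
    close($marcin);
}

print "read ", scalar(@marcrecords), " records\n";
```

Each element keeps its trailing terminator, just as a line read with the default separator keeps its newline.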
    100. 100. Voyager example, some data manipulation
    101. 101. Voyager example, some data manipulation Determine the base address for data in this record, and get ready to read the directory
    102. 102. Voyager example, some data manipulation Get each field’s particulars, figure out where its data is, and read the data
    103. 103. Voyager example, some data manipulation We look for field 010, subfield a
    104. 104. Voyager example, some data manipulation If subfield a is found, does its data start with “sj”? If so, we do not want this record.
    105. 105. Voyager example, some data manipulation Looks like this record is a keeper. If this is a 9xx field, i.e., the tag id starts with ‘9’, keep track of these fields in an array until we’ve looked at all the fields.
    106. 106. Voyager example, some data manipulation When done reading the record that’s a keeper, we need to delete the 9xx fields, and output the record.
    107. 107. Voyager example, some data manipulation If the record is not a keeper, put it in the deleted file
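The program on these slides is not reproduced in the transcript, so this is a sketch of the steps they describe, not the presenter's noauthsj.pl. It assumes the standard MARC layout: a 24-character leader with the base address of data at positions 12 to 16, then 12-character directory entries (3-character tag, 4-digit length, 5-digit offset). build_record(), read_fields(), and process_record() are hypothetical names, the sample records are invented, and build_record() writes a simplified fresh leader where a real program would likely preserve the original one.

```perl
use strict;
use warnings;

my $FT = "\x1e";   # field terminator
my $RT = "\x1d";   # record terminator
my $SF = "\x1f";   # subfield delimiter

# Assemble a MARC record from [tag, data] pairs (simplified leader).
sub build_record {
    my @fields = @_;
    my ($directory, $body) = ('', '');
    foreach my $field (@fields) {
        my ($tag, $data) = @$field;
        $data .= $FT;
        # directory entry: tag, field length, offset of the field in the data
        $directory .= sprintf("%s%04d%05d", $tag, length($data), length($body));
        $body .= $data;
    }
    my $base   = 24 + length($directory) + 1;   # leader + directory + terminator
    my $total  = $base + length($body) + 1;     # plus the record terminator
    my $leader = sprintf("%05dnz  a22%05dn  4500", $total, $base);
    return $leader . $directory . $FT . $body . $RT;
}

# Walk the directory and return the record's fields as [tag, data] pairs.
sub read_fields {
    my ($record) = @_;
    my $base = substr($record, 12, 5);          # base address of data
    my @fields;
    for (my $pos = 24; $pos < $base - 1; $pos += 12) {
        my $tag    = substr($record, $pos, 3);
        my $length = substr($record, $pos + 3, 4);
        my $start  = substr($record, $pos + 7, 5);
        # read the field's data, leaving off its terminator
        push @fields, [$tag, substr($record, $base + $start, $length - 1)];
    }
    return @fields;
}

# Return the record without its 9xx fields, or nothing if 010 |a starts with "sj".
sub process_record {
    my ($record) = @_;
    my @keep;
    foreach my $field (read_fields($record)) {
        my ($tag, $data) = @$field;
        return if $tag eq '010' && $data =~ /${SF}asj/;   # not a keeper
        push @keep, $field unless $tag =~ /^9/;           # drop 9xx fields
    }
    return build_record(@keep);
}

# Invented sample records
my $wanted = build_record(['001', 'n1'], ['010', "  ${SF}an 00000001 "],
                          ['100', "1 ${SF}aSample, Name"], ['952', "  ${SF}alocal note"]);
my $unwanted = build_record(['001', 'x2'], ['010', "  ${SF}asj 00000002 "]);

my $clean     = process_record($wanted);
my $rejected  = process_record($unwanted);
my $kept_tags = join(',', map { $_->[0] } read_fields($clean));

print "kept fields: $kept_tags\n";
print "second record ", (defined $rejected ? "kept" : "goes to the deleted file"), "\n";
```

Rebuilding the record from the surviving fields, rather than cutting bytes out of the original, keeps the directory offsets and record length consistent automatically.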
    108. 108. Resources
        Learning Perl
        Perl in a Nutshell
        Programming Perl
        Perl Cookbook
        I use these two a lot.
        All books are from O’Reilly.
    109. 109. Resources
        Advanced Perl Programming
        Perl Best Practices
        Perl Hacks
        Intermediate Perl
        These will start to be useful once you have some Perl experience.
        All books are from O’Reilly.
    110. 110. Resources
        ActiveState Perl: http://activestate.com/Products/Download/Download.plex?id=ActivePerl
        CPAN: http://cpan.org
        A great link to links: http://www.thepeoplestoolbox.com/programmers/perl
    111. 111. Resources
        The files listed below are available at http://homepages.wmich.edu/~zimmer/files/eugm2007
        youcandoitPerl.ppt   this presentation
        findmanyemail.pl     find patrons with multiple email addresses (available by request)
        noauthsj.pl          delete record if 010 |a starts with “sj”, and strip 9XX fields from remaining records
        datemath.pl          some program code for math with dates
        snippet.grep         various regular expressions I’ve found useful
    112. 112. Thanks for listening. Questions? [email_address] 269.387.3885 Pictures © 2005 and © 2006 by Roy Zimmer
