Marc’s (bio)perl course

Goals
• Perl Positioning System: find your way in the
Perl World
• Write Once, Use Many Times
• Object Oriented Perl
– Consumer
– Developer
• Thou shalt not be afraid of the Bioperl Beast

Agenda Day 1
• Perl refresher
– Scalars
– Arrays and lists
– Hashes
– Subroutines and functions
• Perldoc
• Creating and running a Perl script
• References and advanced data structures
• Packages and modules
• Objects, (multiple) inheritance, polymorphism

Agenda Day 2
• What is Bioperl?
• Taming the Bioperl Beast
– Finding modules
– Finding methods
– Data::Dumper
• Sequence processing
• One image says more than 1000 words

Variables
• Data of any type may be stored within three basic
types of variables:
– Scalar (strings, numbers, references)
– Array (aka list but not quite the same)
– Hash (aka associative array)
• Variable names are always preceded by a
“dereferencing symbol” or prefix (sigil). If needed,
the name can be delimited with braces: ${name}
– $ - Scalar variables
– @ - Array (list) variables
– % - Associative array aka hash variables

Variables
• You do NOT have to
– Declare the variable before using it (unless use strict is in effect)
– Define the variable’s data type
– Allocate memory for new data values

Scalar variables
• A scalar variable stores one value: a string (a single
character is just a one-character string), a number,
a reference or undef
• $name, ${name}, ${'name'}
• More magic: $_ (the default variable)

Array variables
• Array variable stores a list of scalars
• @name, @{name}, @{'name'}
• Index
– Map: index => scalar value
– zero-indexed (distance from start)

Array variables
• List assignment:
@count = (1, 2, 3, 4, 5);
@count = ('apple', 'bat', 'cat');
@count2 = @count;

• Individual assignment: $count[2] = 42;
• Individual access: print $count[2];
• Special variable $#<array name>: index of the last element
• Array in scalar context: number of elements
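A small sketch, not part of the original slides, of the array operations listed above; the variable names follow the slide, the values are only illustrative:

```perl
use strict;
use warnings;

my @count  = ('apple', 'bat', 'cat');   # list assignment
my @count2 = @count;                    # copies the elements, not a link
$count[2] = 42;                         # individual assignment
print "$count[2]\n";                    # individual access: 42
print "$count2[2]\n";                   # the copy is unaffected: cat
print $#count, "\n";                    # index of the last element: 2
my $size = @count;                      # array in scalar context
print "$size\n";                        # number of elements: 3
```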

Array variables
• Access multiple values via array slice:
print @array[3,2,4,1,0,-1];

• Assign multiple values via array slice:
@array[3,2,4,1,0,-1] = @new_values;
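A minimal sketch of array slices, assuming a six-element array; a negative index counts from the end, so -1 is the last element:

```perl
use strict;
use warnings;

my @array = ('a' .. 'f');                 # ('a', 'b', 'c', 'd', 'e', 'f')

# Slice: a list of indices gives back a list of values
my @picked = @array[3, 2, 4, 1, 0, -1];
print "@picked\n";                        # d c e b a f

# Assigning to a slice replaces several elements at once
@array[0, -1] = ('first', 'last');
print "@array\n";                         # first b c d e last
```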

Lists
• List = temporary sequence of comma
separated values, usually in () or the result of the qw
operator:
my @array = qw/blood sweat tears/;

• Array = container for a list
• Use:
– Array initialization
– Extract values from array
my ($var1, $var2, $var3, @var4) = @args; # @var4 gets the rest
my ($var5, $var6) = @args;
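A short sketch of extracting values with a list assignment (illustrative data): each scalar on the left takes one value, an array on the left takes everything that is left, and surplus values are dropped:

```perl
use strict;
use warnings;

my @args = qw/blood sweat tears toil/;   # qw splits on whitespace

my ($var1, $var2, @var4) = @args;
print "$var1 | $var2 | @var4\n";         # blood | sweat | tears toil

my ($var5, $var6) = @args;               # 'tears' and 'toil' are ignored
print "$var5 | $var6\n";                 # blood | sweat
```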

Lists
• List flattening
my @vehicles = ('truck', @cars, ('tank', 'jeep'));

• Remember: each element of a list is a
scalar, never another list
=> NOT a hierarchical list of 3 elements,
but a FLATTENED list of scalar values
• Individual access and slicing as for arrays
my @vehicles = ('truck', @cars, ('tank', 'jeep'))[2,-1];
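A sketch of list flattening, assuming @cars holds two (made-up) entries: the result has five scalars, not three nested elements, and a list slice picks from that flat list:

```perl
use strict;
use warnings;

my @cars     = ('sedan', 'coupe');
my @vehicles = ('truck', @cars, ('tank', 'jeep'));

print scalar(@vehicles), "\n";            # 5, not 3
print "@vehicles\n";                      # truck sedan coupe tank jeep

# List slice: elements 2 and -1 of the flattened list
my @some = ('truck', @cars, ('tank', 'jeep'))[2, -1];
print "@some\n";                          # coupe jeep
```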

Hash variables
• Hash variables are denoted by the %
dereferencing symbol.
• A hash variable holds a list of key-value pairs
• Both keys and values must be scalars
my %fruit_color = ("apple", "red", "banana", "yellow");
# or, equivalently and more readable:
my %fruit_color = (
apple => "red",
banana => "yellow",
);

• Notice the ‘=>’ aka ‘fat comma’ or ‘quotifying comma’:
it quotes the bareword on its left

Hash variables
• Individual access: $hash{key}
• Access multiple values via slice:
@slice = @hash{'key2','key23','key4'};

• Assign multiple values via slice:
@hash{'key2','key23','key4'} = @new_values;
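A minimal sketch of hash access and hash slices with illustrative fruit data. Note the @ prefix on a slice: it returns a list of values, even though the underlying variable is a hash:

```perl
use strict;
use warnings;

my %fruit_color = (
    apple  => 'red',
    banana => 'yellow',
    kiwi   => 'green',
);

print "$fruit_color{apple}\n";                  # individual access: red
$fruit_color{cherry} = 'dark red';              # individual assignment

my @slice = @fruit_color{'banana', 'kiwi'};     # access via slice
print "@slice\n";                               # yellow green

@fruit_color{'plum', 'lime'} = ('purple', 'green');   # assign via slice
print scalar(keys %fruit_color), "\n";          # 6 keys now
```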

Non data types
• Filehandle
– There are several predefined filehandles, including
STDIN, STDOUT and STDERR (opened by default) and
DATA (available when the script has an __END__ or __DATA__ section).
– No prefix
• Code value aka subroutine
– Dereferencing symbol “&”

Subroutines
• We can reuse a segment of Perl code by
placing it within a subroutine.
• The subroutine is defined using the sub
keyword and a name (= variable name !!).
• The subroutine body is defined by placing
code statements within the {} code block
symbols:
sub MySubroutine{
#Perl code goes here.
my @args = @_;
}

Subroutines
• To call a subroutine, prepend the name with
the & symbol:

&MySubroutine; # no parentheses: passes on the caller's current @_
Or, more commonly:
MySubroutine(); # with or w/o arguments
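A sketch of the difference between the two call styles, using two hypothetical subroutines (count_args, outer): with explicit parentheses the callee gets exactly the arguments written, while &name; without parentheses hands over the caller's own @_:

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }      # how many arguments arrived in @_?

sub outer {
    my $with_parens = count_args();      # explicit empty argument list: 0
    my $with_amp    = &count_args;       # no parentheses: outer's @_ is passed on
    return ($with_parens, $with_amp);
}

my ($n1, $n2) = outer('x', 'y');
print "$n1 $n2\n";                       # 0 2
```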

Subroutines
• Arguments are passed in the special array @_
my @results = MySubroutine(@arg1, 'arg2', ('arg3',
'arg4'));

sub MySubroutine{
#Perl code goes here.
my ($thingy, @args) = @_;
}

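• List flattening !! All arguments end up in one flat @_, so
$thingy receives the first element of @arg1 (if @arg1 is not empty)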
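A sketch of argument flattening with an illustrative two-element @arg1: the callee cannot tell where @arg1 ended, because everything arrives as one flat @_ of five values:

```perl
use strict;
use warnings;

sub MySubroutine {
    my ($thingy, @args) = @_;            # first value, then all the rest
    return ($thingy, scalar @args);
}

my @arg1 = ('x', 'y');
my ($thingy, $n_rest) = MySubroutine(@arg1, 'arg2', ('arg3', 'arg4'));
print "$thingy $n_rest\n";               # x 4  ($thingy is NOT the whole @arg1)
```

To keep @arg1 together, pass a reference to it instead (see the references section).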

Subroutines
• Return value can be
– Nothing
– A scalar value
– A list value
• Return value is
– Explicit: with the return function
– Implicit: the value of the last statement evaluated
sub MySubroutine{
#Perl code goes here.
my ($thingy, @args) = @_;
do_something(@args); # implicit return value
}
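A sketch of explicit versus implicit return values, using two hypothetical helpers (gc_fraction, bases) that are not part of the original course material:

```perl
use strict;
use warnings;

# Explicit return, including an early exit
sub gc_fraction {
    my ($seq) = @_;
    return 0 unless length($seq);        # explicit (early) return
    my $gc = ($seq =~ tr/GCgc//);        # tr/// with empty replacement just counts
    return $gc / length($seq);
}

# Implicit return: the value of the last statement evaluated (a list here)
sub bases { my ($seq) = @_; split //, $seq }

my $frac  = gc_fraction('ATGC');         # scalar value: 0.5
my @bases = bases('ATG');                # list value: ('A', 'T', 'G')
print "$frac @bases\n";                  # 0.5 A T G
```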

Subroutines
• Calling contexts
– Void: getFiles($dir);
– Scalar: my $num = getFiles($dir);
– List: my @files = getFiles($dir);

• wantarray function
– Void => undef
– Scalar => false ('')
– List => true (1)
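A sketch that records the calling context with wantarray; the subroutine name context and the array @seen are hypothetical, chosen only for this illustration:

```perl
use strict;
use warnings;

my @seen;
sub context {
    my $w = wantarray;                   # undef, false or true
    push @seen, defined $w ? ($w ? 'list' : 'scalar') : 'void';
    return;
}

context();                               # void context
my $num   = context();                   # scalar context
my @files = context();                   # list context
print "@seen\n";                         # void scalar list
```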

Functions and operators
• Built-in routines
• Function
– Arguments at right hand side
– Sensible name (defined, open, print, ...)

Functions
• Perl provides a rich set of built-in functions to
help you perform common tasks.
• Several categories of useful built-in functions
include
– Arithmetic functions (sqrt, sin, … )
– List functions (push, chomp, … )
– String functions (length, substr, … )
– Existence functions (defined, undef)

Array functions
• Array as queue: push/shift (FIFO)
• Array as stack: push/pop (LIFO)

[Figure: array @row1 = (1, 2, 3); shift/unshift remove/add at the front, pop/push remove/add at the end]
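A sketch of the four array functions, and of using an array as a queue or a stack (illustrative values):

```perl
use strict;
use warnings;

my @row1 = (1, 2, 3);
push @row1, 4;                  # add at the end:        1 2 3 4
unshift @row1, 0;               # add at the front:      0 1 2 3 4
my $last  = pop @row1;          # remove from the end:   4
my $first = shift @row1;        # remove from the front: 0
print "@row1 | $first $last\n"; # 1 2 3 | 0 4

my @queue;                      # queue (FIFO): push in, shift out
push @queue, $_ for qw(a b c);
my $served = shift @queue;      # 'a': first in, first out

my @stack;                      # stack (LIFO): push in, pop out
push @stack, $_ for qw(a b c);
my $top = pop @stack;           # 'c': last in, first out
```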

List functions
• chomp: remove the trailing newline from every element
in the list
• map: kind of loop without escape: every
element ($_) is ‘processed’ and the results form a new list
• grep: kind of filter: keeps the elements for which the test is true
• sort
• join
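A sketch chaining the list functions on a few made-up sequence lines: chomp strips the newlines, map transforms each element ($_), grep keeps the matching ones, sort orders them (string order by default) and join glues them into one string:

```perl
use strict;
use warnings;

my @lines = ("ATG\n", "tgc\n", "GGA\n", "aat\n");
chomp(@lines);                             # remove trailing newlines in place
my @upper  = map  { uc $_ } @lines;        # ATG TGC GGA AAT
my @with_g = grep { /G/ } @upper;          # ATG TGC GGA
my @sorted = sort @with_g;                 # ATG GGA TGC
print join(', ', @sorted), "\n";           # ATG, GGA, TGC
```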

Hash functions
• keys: returns the hash keys in an apparently random order
• values: returns the values of the hash in that same
order, as long as the hash is not modified in between
• each: returns the next (key, value) pair
• delete: remove a particular key (and
associated value) from a hash
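A sketch of the hash functions on an illustrative codon table. Because the key order is unspecified, the loop sorts the keys when a stable order matters:

```perl
use strict;
use warnings;

my %codon = (ATG => 'Met', TGG => 'Trp', TAA => 'Stop');

delete $codon{TAA};                          # removes the key and its value

for my $c (sort keys %codon) {               # sorted for a predictable order
    print "$c => $codon{$c}\n";              # ATG => Met, then TGG => Trp
}

my @k = keys %codon;                         # keys and values walk the hash
my @v = values %codon;                       # in the same order
my $consistent = ($codon{$k[0]} eq $v[0]) && ($codon{$k[1]} eq $v[1]);

while (my ($key, $value) = each %codon) {    # next (key, value) pair
    print "$key is $value\n";
}
```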

Operators
• Operator
– Complex and subtle (=,<>, <=>, ?:, ->,=>,...)
– Symbolic name (+,<,>,&,!, ...)

Operators
• Calling context
E.g. assignment operator ‘=’
($item1, $item2) = @array; # list context: first two elements
$item1 = @array; # scalar context: number of elements

perldoc
• Access to Perl’s documentation system
– Command line
– Web: http://perldoc.perl.org/
• Documentation overview: perldoc perl
• Function info:
– perldoc perlfunc
– perldoc -f <function name>
• Operator info: perldoc perlop
• Searching the FAQs: perldoc -q <FAQ keyword>

perldoc
• Move around

Action                  Key stroke
Page down               space
Page up                 b
Scroll down/up          Down/up arrow
Jump to end             Shift+G
Jump to beginning       1 Shift+G
Jump to line <x>        <x> Shift+G

perldoc
• Searching

Action                  Key stroke
Find forward <query>    /<query>
Find backward <query>   ?<query>
Next match              n
Previous match          N (Shift+n)

Creating a script
• Text editor (vi, textpad, notepad++, ...)
• IDE (Komodo, Eclipse, Emacs, Geany, ...)
See: www.perlide.org
• Shebang (not for modules)
#!/usr/bin/perl
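A minimal script skeleton as a starting point (the file name hello.pl is just an example). Besides the shebang it enables the strict and warnings pragmas, which catch undeclared variables and suspicious constructs early:

```perl
#!/usr/bin/perl
use strict;     # variables must be declared (my), no symbolic references
use warnings;   # warn about suspicious constructs

my $greeting = 'Hello, Bioperl world';
print "$greeting\n";
```

Run it with perl hello.pl, or on *NIX make it executable with chmod +x hello.pl and call ./hello.pl.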

Executing a script
• Command line
– Windows
.pl extension
– *NIX
Shebang line
chmod +x script
./script

Executing a script
• Geany IDE

References (and referents)
• A reference is a special scalar value which
“refers to” or “points to” any value.
– A variable name is one kind of reference that you
are already familiar with. It’s a given name.
– Reference is a kind of private, internal, computer
generated name
• A referent is the value that the reference is
pointing to

Creating References
• Method 1: references to variables are created
by using the backslash (\) operator.
$name = 'bioperl';
$reference = \$name;
$array_reference = \@array_name;
$hash_reference = \%hash_name;
$subroutine_ref = \&sub_name;

Creating References
• Method 2:
– [ ITEMS ] makes a new, anonymous array and
returns a reference to that array.
– { ITEMS } makes a new, anonymous hash, and
returns a reference to that hash

my $array_ref = [ 1, 'foo', undef, 13 ];
my $hash_ref = {one => 1, two => 2};

Dereferencing a Reference
• Use the appropriate dereferencing symbol
Scalar: $
Array: @
Hash: %
Subroutine: &

• Remember $name, ${'name'}?
Means: give me the scalar value that the variable
‘name’ is pointing to.
• A reference $reference is a name, so
$$reference, ${$reference}
Means: give me the scalar value that the
reference $reference is pointing to

• The arrow operator: ->
– Arrays and hashes
my $array_ref = [ 1, 'foo', undef, 13 ];
my $hash_ref = {one => 1, two => 2};

${$array_ref}[1] = ${$hash_ref}{'two'};
# can be written as:
$array_ref->[1] = $hash_ref->{two};

– Subroutines
&{$sub_ref}($arg1,$arg2);
# can be written as:
$sub_ref->($arg1, $arg2);
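A self-contained sketch tying together both ways of creating references and both ways of dereferencing them; sub_name and the data values are illustrative only. Changing an element through a reference changes the referent itself:

```perl
use strict;
use warnings;

my $name       = 'bioperl';
my @array_name = (1, 'foo', undef, 13);
my %hash_name  = (one => 1, two => 2);
sub sub_name { return "called with @_" }

# Method 1: backslash operator
my $reference      = \$name;
my $array_ref      = \@array_name;
my $hash_ref       = \%hash_name;
my $subroutine_ref = \&sub_name;

print ${$reference}, "\n";                   # bioperl
${$array_ref}[1] = ${$hash_ref}{two};        # full form
print $array_ref->[1], "\n";                 # 2, arrow form, same element
print $array_name[1], "\n";                  # 2: the referent itself changed
print $subroutine_ref->('a', 'b'), "\n";     # called with a b

# Method 2: anonymous array and hash constructors
my $anon  = [ 1, 'foo', undef, 13 ];
my $pairs = { one => 1, two => 2 };
print scalar(@$anon), " $pairs->{one}\n";    # 4 1
```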

Identifying a referent
• ref function

Value in $scalar_ref               ref($scalar_ref)
Plain scalar (not a reference)     '' (empty string, false)
Reference to scalar                'SCALAR'
Reference to array                 'ARRAY'
Reference to hash                  'HASH'
Reference to subroutine            'CODE'
Reference to typeglob (\*STDOUT)   'GLOB'
Reference to filehandle (IO)       'IO' or 'IO::Handle'/'IO::File' (depends on Perl version)
Reference to other reference       'REF'
Reference to blessed referent      Package name aka type
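A sketch that runs ref over one value of each kind; the class name Cat is only a label for the blessed hash. For a plain, non-reference scalar, ref returns the empty string:

```perl
use strict;
use warnings;

my $x = 42;
my @checks = (
    [ 'plain scalar', $x                ],
    [ 'scalar ref',   \$x               ],
    [ 'array ref',    [1, 2]            ],
    [ 'hash ref',     { a => 1 }        ],
    [ 'code ref',     sub { 1 }         ],
    [ 'ref to ref',   \\$x              ],
    [ 'blessed hash', bless({}, 'Cat')  ],
);

# Prints '', SCALAR, ARRAY, HASH, CODE, REF and Cat in turn
for my $c (@checks) {
    printf "%-13s => '%s'\n", $c->[0], ref $c->[1];
}
```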

References
• Why do we need references ???
– Create complex data structures
!! Arrays and hashes can only store scalar values
– Pass arrays, hashes, subroutines, ... as arguments
to subroutines and functions
!! List flattening

Complex data structures
• Remember:
– Reference is a scalar value
– Arrays and hashes are sets of scalar values
my $array_ref = [ 1, 2, 3 ];
my %data = ( arrayref => $array_ref,
hash_ref => $hash_ref);

– In one go:
my %data = ( arrayref => [ 1, 2, 3 ],
hash_ref => {one => 1, two => 2}
);

• Individual access
my %data = ( arrayref => [ 1, 2, 3 ],
hash_ref => {one => 1,
two => ['a','b']});

How to access the value 'b'?

my $wanted_value = $data{hash_ref}->{two}->[1];
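A sketch of the nested %data structure from the slide: arrows between subscripts may be left out, and a whole inner array or hash can be dereferenced with @{ } or %{ }:

```perl
use strict;
use warnings;

my %data = (
    arrayref => [ 1, 2, 3 ],
    hash_ref => { one => 1, two => ['a', 'b'] },
);

my $wanted_value = $data{hash_ref}->{two}->[1];   # 'b'
my $same         = $data{hash_ref}{two}[1];       # 'b', arrows omitted
my @numbers      = @{ $data{arrayref} };          # (1, 2, 3)
my @keys         = sort keys %{ $data{hash_ref} };# ('one', 'two')
print "$wanted_value $same @numbers @keys\n";     # b b 1 2 3 one two
```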

my @row1 = (1..3);
my @row2 = (2,4,6);
my @row3 = (3,6,9);
my @rows = (\@row1,\@row2,\@row3);
my $table = \@rows;

[Figure: $table points to @rows, whose three elements point to @row1 (1 2 3), @row2 (2 4 6) and @row3 (3 6 9)]

my $table = [
[1, 2, 3],
[2, 4, 6],
[3, 6, 9]
];

[Figure: $table points to an anonymous array whose three elements point to the rows 1 2 3, 2 4 6 and 3 6 9]

• Individual access
my $wanted_value = $table->[1]->[2];
# shorter form:
$wanted_value = $table->[1][2];
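A sketch showing that the two table constructions above give the same shape, and how to walk all rows; each element of @$table is itself an array reference:

```perl
use strict;
use warnings;

my @row1  = (1 .. 3);
my @row2  = (2, 4, 6);
my @row3  = (3, 6, 9);
my @rows  = (\@row1, \@row2, \@row3);
my $table = \@rows;                                   # named rows

my $anon_table = [ [1, 2, 3], [2, 4, 6], [3, 6, 9] ]; # anonymous rows

print $table->[1][2], " ", $anon_table->[1]->[2], "\n";   # 6 6

for my $row (@$table) {
    print join(' ', @$row), "\n";                     # 1 2 3 / 2 4 6 / 3 6 9
}
```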

Packages and modules
• 2 types of variables:
– Global aka package variables
– Lexical variables

• Global / package variables
– Visible everywhere in the program
– You get them if you don’t say otherwise
– !! Created automatically on first use:
$var1 = 42;
print "$var1, ", ++$var2;
# results in:
42, 1

• Name has 2 parts: family name + given name
– Default family name is ‘main’. $John is actually
$main::John
– $Cleese::John has nothing to do with $Wayne::John
– Family name = package name

• Lexical / private variables
– Explicitly declared as: my $var1 = 42;
– Only visible within the boundaries of a code block
or file.
– They cease to exist as soon as the program leaves
the code block or the program ends
– They do not have a family name aka they do not
belong to a package
• ALWAYS USE LEXICAL VARIABLES
(except for subroutines ...)
#!/usr/bin/perl
use strict;
my $var1 = 42;
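A sketch contrasting package and lexical variables (names follow the $John example): fully qualified package variables are allowed under strict, and a lexical declared inside a block hides the outer one only within that block:

```perl
#!/usr/bin/perl
use strict;
use warnings;

$main::John   = 'main John';     # package (global) variable, family 'main'
$Cleese::John = 'Cleese John';   # different family: unrelated variable

my $var1 = 42;                   # lexical: visible until the end of the file
{
    my $var1 = 7;                # new lexical, only inside this block
    print "inner: $var1\n";      # inner: 7
}
print "outer: $var1\n";          # outer: 42
print "$main::John / $Cleese::John\n";

# Under 'use strict', using an undeclared $var2 here would be a
# compile-time error instead of a silently created global.
```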

Packages
• Wikipedia:
In general, a namespace is a container that
provides context for the identifiers (variable
names) it holds, and allows the disambiguation of
homonym identifiers residing in different
namespaces.
• Family where the (global!) variables (incl.
subroutines) live (remember $John)

Packages
• Family has a:
– name, defined via package declaration
package Bio::SeqIO::genbank;
# welcome to the Bio::SeqIO::genbank family
sub write_seq{}
package Bio::SeqIO::fasta;
# welcome to the Bio::SeqIO::fasta family
sub write_seq{}

– House, block or blocks of code that follow the
package declaration
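A sketch of two package declarations in one file. The package names mirror the slide, but these write_seq subroutines are toy stand-ins, not the real Bioperl code; package main switches back to the default family:

```perl
use strict;
use warnings;

package Bio::SeqIO::genbank;             # toy stand-in, not real Bioperl
sub write_seq { return 'genbank writer' }

package Bio::SeqIO::fasta;               # toy stand-in, not real Bioperl
sub write_seq { return 'fasta writer' }

package main;                            # back to the default family
# Same given name, different families: no collision
my $gb  = Bio::SeqIO::genbank::write_seq();
my $fa  = Bio::SeqIO::fasta::write_seq();
my $pkg = __PACKAGE__;                   # 'main'
print "$gb / $fa / $pkg\n";              # genbank writer / fasta writer / main
```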

Packages
• Why do we need packages ???
– To organize code
– To improve maintainability
– To avoid name space collisions

Modules
• What ?
A text file (with a .pm suffix) containing Perl source
code, that can contain any number of
namespaces. It must evaluate to a true value
(hence the customary 1; as its last statement).
• Loading
– At compile time: use <module>
use Data::Dumper;
– At run time: require <module> or require <expr>
require Data::Dumper;
require 'my_file.pl';
require $class;

– For a bareword <module>, the compiler translates each
double-colon '::' into a path separator and
appends '.pm'.
E.g. Data::Dumper yields Data/Dumper.pm
– A string <expr> is used as a file name as is, so $class
must hold e.g. 'Data/Dumper.pm', not 'Data::Dumper'
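A sketch of compile-time versus run-time loading. Text::Wrap is used only because it ships with Perl; any installed module would do. Since require with a string takes a file name, the class name is converted by hand first:

```perl
use strict;
use warnings;
use Data::Dumper;                        # compile time: bareword, '::' => '/', '.pm' added

my $class = 'Text::Wrap';                # class name known only at run time
(my $file = "$class.pm") =~ s{::}{/}g;   # 'Text/Wrap.pm'
require $file;                           # run time: loads the file

print "loaded $class from $INC{$file}\n";
print Dumper([1, 2]);                    # Dumper was imported by 'use'
```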

Modules
• A module can contain multiple packages, but
convention dictates that each module
contains a package of the same name.
– easy to quickly locate the code in any given
package (perldoc -m <module>)
– not obligatory !!
• A module name is unique
– 1 to 1 mapping to file system !!
– Should start with capital letter

Module files
• Module files are stored in a subdirectory
hierarchy that parallels the module name
hierarchy.
• All module files must have an extension
of .pm.
Module            Is stored in
Config            Config.pm
Math::Complex     Math/Complex.pm
String::Approx    String/Approx.pm

Modules
• Module path is relative. So, where is Perl
searching for that module ?
• Possible module roots
– @INC
[]$ perl -V
…
@INC:
/etc/perl
/usr/local/lib/perl/5.10.1
/usr/local/share/perl/5.10.1
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.10
/usr/share/perl/5.10
/usr/local/lib/site_perl
.
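A sketch for inspecting the search path from inside a script (File::Spec is just a convenient core module). Note that the listing above was produced by an older Perl: since Perl 5.26 the current directory '.' is no longer in @INC by default:

```perl
use strict;
use warnings;
use File::Spec;                          # any module will do; this one ships with Perl

# @INC: the directories Perl searches for modules, in order
print "$_\n" for @INC;

# %INC: for every file already loaded, where it was found
print "File/Spec.pm was loaded from $INC{'File/Spec.pm'}\n";
```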

Modules
• Alternative module roots (perldoc -q library)
– In script
use lib '/my/alternative/module/path';

– Command line
[]$ perl -I/my/alternative/module/path script.pl

– Environment
export PERL5LIB=$PERL5LIB:/my/alternative/module/path

Modules
• Test/Speak.pm • Test.pl
package My::Package::Says::Hello; #!/usr/bin/perl
use strict;
sub speak use Test::Speak;
{
print __PACKAGE__, " says: 'Hello'n"; My::Package::Says::Hello::speak();
} My::Package::Says::Blah::speak();

package My::Package::Says::Blah;

sub speak
{
print __PACKAGE__, " says: 'Blah'n";
}

1;

Modules
• Why do we need modules???
– To organize packages into files/folders
– Code reuse (ne copy & paste !)
• Module repository: CPAN
– http://search.cpan.org
– https://metacpan.org/
• Pragma
– Special module that influences the code (compilation)
– Lowercase
– Lexically scoped

Modules
• Module information
– In standard distribution: perldoc perlmodlib
– Manually installed: perldoc perllocal
– All modules: perldoc -q installed
– Documentation: perldoc <module name>
– Location: perldoc -l <module name>
– Source: perldoc -m <module name>

Packages and Modules - Summary
1. A package is a separate namespace within Perl code.
2. A module can have more than one package defined within it.
3. The default package is main.
4. We can get to variables (and subroutines) within packages by
using the fully qualified name
5. To write a package, just write package <package name> where
you want the package to start.
6. Package declarations last until the end of the enclosing block, file
or until the next package statement
7. The require and use keywords can be used to import the contents
of other files for use in a program.
8. Files which are included must end with a true value.
9. Perl looks for modules in a list of directories stored in @INC
10. Module names map to the file system

Exercises
• Bioperl Training Exercise 1: perldoc
• Bioperl Training Exercise 2: thou shalt not forget
• Bioperl Training Exercise 3: arrays
• Bioperl Training Exercise 4: hashes
• Bioperl Training Exercise 5: packages and
modules 1
• Bioperl Training Exercise 6: packages and
modules 2
• Bioperl Training Exercise 7: complex data
structures

Object Oriented Programming in Perl
• Why do we need objects and OOP ?
– It’s fun
– Code reuse
– Abstraction

Object Oriented Programming in Perl
• What is an object ?
– An object is a (complex) data structure
representing a new, user defined type with a
collection of behaviors (functions aka methods)
– Collection of attributes
• Developer’s perspective: 3 little rules
1. To create a class, build a package
2. To create a method, write a subroutine
3. To create an object, bless a referent

Rule 1: To create a class, build a
package
• Defining a class
– A class is simply a package with subroutines that
function as methods. Class name = type = label =
namespace
package Cat;
1;

Rule 2: To create a method, write a
subroutine
• First argument of methods is always the class
name or the object itself (or rather: a reference to it)
package Cat;
sub meow {
my $self = shift;
print __PACKAGE__, " says: meow!\n";
}
1;

• Subroutine call the OO way (method
invocation arrow operator):
Cat->meow;
$cat->meow;
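A sketch of what the method receives as its first argument: the class name for Cat->meow, the object reference for $cat->meow. The one-line bless only creates a quick object for the demo; constructors are the topic of rule 3:

```perl
use strict;
use warnings;

package Cat;
sub meow {
    my $self = shift;                    # class name (string) or object (reference)
    my $who  = ref($self) ? 'a ' . ref($self) . ' object' : "class $self";
    return "$who says: meow!";
}

package main;
my $from_class = Cat->meow;              # invocant: the string 'Cat'
my $cat        = bless {}, 'Cat';        # quick object for this demo
my $from_obj   = $cat->meow;             # invocant: the reference in $cat
my $explicit   = Cat::meow('Cat');       # same as Cat->meow here (no inheritance involved)
print "$from_class\n$from_obj\n$explicit\n";
```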

Rule 3: To create an object, bless a
referent
• ‘Special’ method: constructor
– Any name will do, in most cases new
– Object can be anything, in most cases a hash
– Reference to object is stored in a variable
– bless
• Arguments: reference (+ class). The reference itself does not change !!
• Underlying referent is blessed (= typed, labelled)
• Returns the reference
package Cat;
sub new {
my ($class, @args) = @_;
my $self = { _name => $args[0] };
bless $self, $class; # implicit return value: the blessed reference
}
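A runnable version of the Cat constructor with one extra accessor, name, which is a hypothetical addition for the demo; after bless, ref reports the class name:

```perl
use strict;
use warnings;

package Cat;
sub new {
    my ($class, @args) = @_;
    my $self = { _name => $args[0] };    # the object is a hash reference
    return bless $self, $class;          # label the referent, return the reference
}
sub name { my $self = shift; return $self->{_name} }   # hypothetical accessor

package main;
my $cat = Cat->new('Tom');
my $msg = $cat->name . ' is a ' . ref($cat);
print "$msg\n";                          # Tom is a Cat
```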

Objects
• Perl objects are data structures (a collection
of attributes).
• To create an object we have to take 3 rules
into account:
1. Classes are just packages
2. Methods are just subroutines
3. Blessing a referent creates an object

Objects
• Objects are passed around as references
• Calling an object method can be done using
the method invocation arrow: $object_ref->method()
• Constructor functions in Perl are
conventionally called new() and can be called
by writing: $object_ref = ClassName->new()

Inheritance
• Concept
– Way to extend functionality of a class by deriving a
(more specific) sub-class from it
• In Perl:
– Way of specifying where to look for methods
– Store the name of 1 or more classes in the
package variable @ISA:
package NorthAmericanCat;
use Cat;
our @ISA = qw(Cat);

– Multiple inheritance !!
package NorthAmericanCat;
use Cat;
use Animal;
our @ISA = qw(Cat Animal);
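A single-file sketch of inheritance through @ISA (Animal, Cat and NorthAmericanCat are toy classes). Because all packages live in one file, no use Cat; is needed; methods not found in a class are looked up in the classes listed in its @ISA:

```perl
use strict;
use warnings;

package Animal;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub breathe { return 'breathing' }

package Cat;
our @ISA = ('Animal');                   # Cat inherits from Animal
sub meow { return 'meow' }

package NorthAmericanCat;
our @ISA = ('Cat');                      # ... and this class from Cat

package main;
my $cat   = NorthAmericanCat->new(name => 'Tom');   # new() found in Animal
my @found = ($cat->meow, $cat->breathe);            # found in Cat, then Animal
print "@found\n";                                   # meow breathing
```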

Inheritance
• UNIVERSAL, parent of all classes
• Predefined methods
– isa('<class name>'): check if the object belongs to or
inherits from a particular class
– can('<method name>'): check if <method name>
is a callable method (returns a reference to it)
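A sketch of the two UNIVERSAL methods on toy classes: isa follows the inheritance chain, and can returns a code reference (true) or a false value:

```perl
use strict;
use warnings;

package Cat;
sub new  { return bless {}, shift }
sub meow { return 'meow' }

package NorthAmericanCat;
our @ISA = ('Cat');

package main;
my $cat = NorthAmericanCat->new;

my $is_cat   = $cat->isa('Cat')  ? 1 : 0;   # 1: inherits from Cat
my $is_dog   = $cat->isa('Dog')  ? 1 : 0;   # 0
my $code     = $cat->can('meow');           # code reference to Cat::meow
my $can_bark = $cat->can('bark') ? 1 : 0;   # 0
print "$is_cat $is_dog $can_bark ", $code->($cat), "\n";   # 1 0 0 meow
```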

Inheritance
• SUPER: superclass of the current package
$self->SUPER::do_something()

– start looking in @ISA for a class that can()
do_something
– explicitly call a method of a parent class
– often used by Bioperl to initialize object attributes
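A sketch of the initialization pattern: the subclass first lets its parent set the shared attributes through SUPER, then adds its own. The method name _initialize and the classes are illustrative; this is not copied from Bioperl:

```perl
use strict;
use warnings;

package Animal;
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->_initialize(%args);           # dispatches to the subclass version
    return $self;
}
sub _initialize {
    my ($self, %args) = @_;
    $self->{name} = $args{name};         # attribute shared by all animals
}

package Cat;
our @ISA = ('Animal');
sub _initialize {
    my ($self, %args) = @_;
    $self->SUPER::_initialize(%args);    # parent sets its attributes first
    $self->{lives} = $args{lives} // 9;  # then Cat-specific ones
}

package main;
my $cat = Cat->new(name => 'Tom');
print "$cat->{name} has $cat->{lives} lives\n";   # Tom has 9 lives
```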

Polymorphism
• Concept
– methods defined in a derived class (subclass) override
methods of the same name defined in its parent classes
– the same method call behaves differently depending on the object's class
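A sketch of polymorphism with toy classes: the same speak call gives a different result per class, and a class that does not override it (Fish) falls back to the parent's version:

```perl
use strict;
use warnings;

package Animal;
sub new   { return bless {}, shift }
sub speak { return '...' }

package Cat;
our @ISA = ('Animal');
sub speak { return 'meow' }              # overrides Animal::speak

package Dog;
our @ISA = ('Animal');
sub speak { return 'woof' }              # overrides Animal::speak

package Fish;
our @ISA = ('Animal');                   # no speak(): inherits Animal's

package main;
my @sounds = map { $_->new->speak } qw(Cat Dog Fish);
print "@sounds\n";                       # meow woof ...
```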

Exercises
• Bioperl Training Exercise 8: OOP
• Bioperl Training Exercise 9: inheritance,
polymorphism
• Bioperl Training Exercise 10: aggregation,
delegation
