Marc’s (bio)perl course

BITS/VIB training March 2011

Presentation Transcript

  • Marc Logghe
  • Goals• Perl Positioning System: find your way in the Perl World• Write Once, Use Many Times• Object Oriented Perl – Consumer – Developer• Thou shalt not be afraid of the Bioperl Beast
  • Agenda Day1• Perl refresher – Scalars – Arrays and lists – Hashes – Subroutines and functions• Perldoc• Creating and running a Perl script• References and advanced data structures• Packages and modules• Objects, (multiple) inheritance, polymorphism
  • Agenda Day 2• What is bioperl ?• Taming the Bioperl Beast – Finding modules – Finding methods – Data::Dumper• Sequence processing• One image says more than 1000 words
  • Variables• Data of any type may be stored within three basic types of variables: – Scalar (strings, numbers, references) – Array (aka list but not quite the same) – Hash (aka associative array)• Variable names are always preceded by a “dereferencing symbol” or prefix. If needed: {} – $ - Scalar variables – @ - List variables – % - Associative array aka hash variables
  • Variables• You do NOT have to – Declare the variable before using it – Define the variable’s data type – Allocate memory for new data values
  • Scalar variables• Scalar variable stores a string, a number, a character, a reference, undef• $name, ${name}, ${‘name’}• More magic: $_
  • Array variables• Array variable stores a list of scalars• @name, @{name}, @{‘name’}• Index – Map: index => scalar value – zero-indexed (distance from start)
  • Array variables
    • List assignment:
        @count = (1, 2, 3, 4, 5);
        @count = ('apple', 'bat', 'cat');
        @count2 = @count;
    • Individual assignment: $count[2] = 42
    • Individual access: print $count[2]
    • Special variable $#<array name>: index of the last element
    • Scalar context: an array evaluates to its number of elements
  • Array variables• Access multiple values via array slice: print @array[3,2,4,1,0,-1];• Assign multiple values via array slice: @array[3,2,4,1,0,-1] = @new_values;
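The array slides above can be tried out with a small sketch. The array @letters and the other variable names are made up for illustration; only the syntax ($#array, scalar context, slices) comes from the slides.

```perl
use strict;
use warnings;

my @letters = ('a', 'b', 'c', 'd', 'e');

my $last_index = $#letters;           # index of the last element
my $count      = scalar @letters;     # array in scalar context: number of elements
my @picked     = @letters[3, 0, -1];  # slice: elements at index 3, 0 and the last one

# Assigning through a slice overwrites several elements at once.
my @copy = @letters;
@copy[0, 1] = ('X', 'Y');

print "last index: $last_index, count: $count\n";
print "picked: @picked\n";
print "copy:   @copy\n";
```

Note the @ prefix on a slice: it returns a list, whereas $letters[3] returns a single scalar.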
  • Lists
    • List = temporary sequence of comma-separated values, usually in () or the result of the qw operator
        my @array = qw/blood sweat tears/;
    • Array = container for a list
    • Use:
      – Array initialization
      – Extract values from array
        ($var1, $var2[42], $var3, @var4) = @args;  # no "my" here: an array element cannot be declared with my
        my ($var5, $var6) = @args;
  • Lists
    • List flattening
        my @vehicles = ('truck', @cars, ('tank', 'jeep'));
    • Remember: each element of a list must be a scalar, not another list => NOT a hierarchical list of 3 elements, but a FLATTENED list of scalar values
    • Individual access and slicing cf. arrays
        my @vehicles = ('truck', @cars, ('tank', 'jeep'))[2,-1];
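A minimal sketch of list flattening and list slicing, assuming a hypothetical @cars array with two entries (the slide does not show its contents):

```perl
use strict;
use warnings;

my @cars = ('sedan', 'coupe');    # assumed contents, for illustration only

# The nested parentheses and @cars are flattened into one list of scalars.
my @vehicles = ('truck', @cars, ('tank', 'jeep'));
my $n = scalar @vehicles;

# A list can be sliced directly, just like an array.
my ($third, $last) = ('truck', @cars, ('tank', 'jeep'))[2, -1];

print "$n vehicles: @vehicles\n";
print "third: $third, last: $last\n";
```

The result holds five scalars, not three nested elements, which is why references are needed later on for hierarchical data.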
  • Hash variables
    • Hash variables are denoted by the % dereferencing symbol.
    • A hash variable is a list of key-value pairs
    • Both keys and values must be scalars
        my %fruit_color = ("apple", "red", "banana", "yellow");
        my %fruit_color = ( apple => "red", banana => "yellow", );
    • Notice the '=>' aka 'quotifying comma'
  • Hash variables
    • Individual access: $hash{key}
    • Access multiple values via slice:
        @slice = @hash{'key2','key23','key4'}
    • Assign multiple values via slice:
        @hash{'key2','key23','key4'} = @new_values;
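As a sketch of the hash slices described above, reusing the %fruit_color hash from the earlier slide; the extra keys (cherry, kiwi, lemon) are invented for the example:

```perl
use strict;
use warnings;

my %fruit_color = (apple => 'red', banana => 'yellow', cherry => 'dark red');

# Read several values at once: note the @ prefix, because a slice is a list.
my @colors = @fruit_color{'apple', 'banana'};

# Assign several key/value pairs at once.
@fruit_color{'kiwi', 'lemon'} = ('green', 'yellow');

my $n_keys = scalar keys %fruit_color;

print "colors: @colors\n";
print "number of keys: $n_keys\n";
```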
  • Non data types• Filehandle – There are several predefined filehandles, including STDIN, STDOUT, STDERR and DATA (default opened). – No prefix• Code value aka subroutine – Dereferencing symbol “&”
  • Subroutines• We can reuse a segment of Perl code by placing it within a subroutine.• The subroutine is defined using the sub keyword and a name (= variable name !!).• The subroutine body is defined by placing code statements within the {} code block symbols. sub MySubroutine{ #Perl code goes here. my @args = @_; }
  • Subroutines
    • To call a subroutine, prepend the name with the & symbol:
        &MySubroutine; # w/o argument list: the caller's current @_ is passed along
      Or:
        MySubroutine(); # with or w/o arguments
  • Subroutines
    • Arguments in underscore array variable (@_)
        my @results = MySubroutine(@arg1, 'arg2', ('arg3', 'arg4'));
        sub MySubroutine{
            #Perl code goes here.
            my ($thingy, @args) = @_;
        }
    • List flattening !!
  • Subroutines
    • Return value
      – Nothing
      – Scalar value
      – List value
    • Return value
      – Explicit with return function
      – Implicit: value of the last statement
        sub MySubroutine{
            #Perl code goes here.
            my ($thingy, @args) = @_;
            do_something(@args);
        }
  • Subroutines
    • Calling contexts
      – Void:   getFiles($dir);
      – Scalar: my $num = getFiles($dir);
      – List:   my @files = getFiles($dir);
    • wantarray function
      – Void => undef
      – Scalar => 0 (false)
      – List => 1 (true)
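The three calling contexts can be observed with wantarray. This is only a sketch: report_context and $void_seen are hypothetical names, not part of the course material.

```perl
use strict;
use warnings;

my $void_seen = 0;

# Reports the context it was called in, based on wantarray.
sub report_context {
    my $want = wantarray;
    if (!defined $want) {      # void context: wantarray returns undef
        $void_seen = 1;
        return;
    }
    return $want ? 'list' : 'scalar';
}

report_context();                    # void context
my $in_scalar = report_context();    # scalar context
my @in_list   = report_context();    # list context

print "void call seen: $void_seen\n";
print "scalar call: $in_scalar\n";
print "list call: @in_list\n";
```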
  • Functions and operators• Built-in routines• Function – Arguments at right hand side – Sensible name (defined, open, print, ...)
  • Functions• Perl provides a rich set of built-in functions to help you perform common tasks.• Several categories of useful built-in functions include – Arithmetic functions (sqrt, sin, … ) – List functions (push, chomp, … ) – String functions (length, substr, … ) – Existence functions (defined, undef)
  • Array functions
    • Array as queue: push/shift (FIFO)
    • Array as stack: push/pop (LIFO)
    • Diagram: @row1 = (1, 2, 3); shift and unshift work on the front of the array, push and pop on the end
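A short sketch of both usages; @queue and @stack are illustrative names:

```perl
use strict;
use warnings;

# Queue (FIFO): add at the end with push, remove from the front with shift.
my @queue = (1, 2, 3);
push @queue, 4;
my $first = shift @queue;

# Stack (LIFO): add at the end with push, remove from the end with pop.
my @stack = (1, 2, 3);
push @stack, 4;
my $top = pop @stack;

# unshift is the counterpart of shift: it adds at the front.
unshift @stack, 0;

print "first out of queue: $first, queue now: @queue\n";
print "top of stack: $top, stack now: @stack\n";
```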
  • List functions• chomp: remove newline from every element in the list• map: kind of loop without escape, every element ($_) is ‘processed’• grep: kind of filter• sort• join
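The list functions above are often combined. The following sketch uses made-up sequence strings as input:

```perl
use strict;
use warnings;

my @lines = ("gattaca\n", "ttag\n", "acgtacgt\n");

chomp @lines;                                 # strip the newline from every element
my @upper  = map  { uc $_ } @lines;           # process every element
my @long   = grep { length($_) > 4 } @upper;  # keep only elements passing the test
my @sorted = sort @upper;                     # default sort: string comparison
my $joined = join ',', @sorted;               # glue the elements into one string

print "long: @long\n";
print "joined: $joined\n";
```

Inside map and grep, the current element is available as $_.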
  • Hash functions• keys: returns the hash keys in random order• values: returns values of the hash in random order but same order as keys function call• each: returns (key, value) pairs• delete: remove a particular key (and associated value) from a hash
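A minimal sketch of the hash functions, again with an invented %fruit_color hash. It checks the statement that keys and values come back in matching order as long as the hash is not modified in between.

```perl
use strict;
use warnings;

my %fruit_color = (apple => 'red', banana => 'yellow', cherry => 'dark red');

my @keys   = keys %fruit_color;      # unpredictable order
my @values = values %fruit_color;    # same order as the keys call above

# Verify that position i of @values belongs to position i of @keys.
my $aligned = 1;
for my $i (0 .. $#keys) {
    $aligned = 0 if $fruit_color{ $keys[$i] } ne $values[$i];
}

# each hands out one (key, value) pair per call.
my %seen;
while (my ($fruit, $color) = each %fruit_color) {
    $seen{$fruit} = $color;
}

delete $fruit_color{banana};         # removes the key and its value
my @remaining = sort keys %fruit_color;

print "aligned: $aligned, remaining: @remaining\n";
```

Sorting the keys, as in the last step, is the usual way to get a reproducible order.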
  • Operators• Operator – Complex and subtle (=,<>, <=>, ?:, ->,=>,...) – Symbolic name (+,<,>,&,!, ...)
  • Operators• Calling context Eg. assignment operator ‘=‘ ($item1, $item2) = @array; $item1 = @array;
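The effect of context on the assignment operator can be sketched as follows; the array contents are arbitrary:

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c');

my ($item1, $item2) = @array;   # list assignment: takes the first two elements
my $count           = @array;   # scalar assignment: the number of elements
my ($first)         = @array;   # the parentheses force list context: first element

print "item1=$item1 item2=$item2 count=$count first=$first\n";
```

Forgetting the parentheses in the last line is a common source of bugs, because the variable then receives the element count instead of the first element.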
  • perldoc• Access to Perl’s documentation system – Command line – Web: http://perldoc.perl.org/• Documentation overview: perldoc perl• Function info: – perldoc perlfunc – perldoc -f <function name>• Operator info: perldoc perlop• Searching the FAQs: perldoc -q <FAQ keyword>
  • perldoc
    • Move around
        Action               Key stroke
        Page down            space
        Page up              b
        Scroll down/up       Down/up arrow
        Jump to end          Shift+G
        Jump to beginning    1 Shift+G
        Jump to line <x>     <x> Shift+G
  • perldoc
    • Searching
        Action                   Key stroke
        Find forward <query>     /<query>
        Find backward <query>    ?<query>
        Next match               n
        Previous match           Shift+N
  • Creating a script• Text editor (vi, textpad, notepad++, ...)• IDE (Komodo, Eclipse, EMACS, Geany, ...) See: www.perlide.org• Shebang (not for modules) #!/usr/bin/perl
  • Executing a script
    • Command line
      – Windows: .pl extension
      – *NIX: Shebang line
        chmod +x script
        ./script
  • Executing a script• Geany IDE
  • References (and referents)• A reference is a special scalar value which “refers to” or “points to” any value. – A variable name is one kind of reference that you are already familiar with. It’s a given name. – Reference is a kind of private, internal, computer generated name• A referent is the value that the reference is pointing to
  • Creating References
    • Method 1: references to variables are created by using the backslash (\) operator.
        $name = 'bioperl';
        $reference = \$name;
        $array_reference = \@array_name;
        $hash_reference = \%hash_name;
        $subroutine_ref = \&sub_name;
  • Creating References
    • Method 2:
      – [ ITEMS ] makes a new, anonymous array and returns a reference to that array.
      – { ITEMS } makes a new, anonymous hash, and returns a reference to that hash
        my $array_ref = [ 1, 'foo', undef, 13 ];
        my $hash_ref = {one => 1, two => 2};
  • Dereferencing a Reference• Use the appropriate dereferencing symbol Scalar: $ Array: @ Hash: % Subroutine: &
  • Dereferencing a Reference
    • Remember $name, ${'name'} ? Means: give me the scalar value where the variable 'name' is pointing to.
    • A reference $reference is a name, so $$reference, ${$reference} means: give me the scalar value where the reference $reference is pointing to
  • Dereferencing a Reference
    • The arrow operator: ->
      – Arrays and hashes
        my $array_ref = [ 1, 'foo', undef, 13 ];
        my $hash_ref = {one => 1, two => 2};
        ${$array_ref}[1] = ${$hash_ref}{'two'}
        # can be written as:
        $array_ref->[1] = $hash_ref->{two}
      – Subroutines
        &{$sub_ref}($arg1,$arg2)
        # can be written as:
        $sub_ref->($arg1, $arg2)
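Creating and dereferencing references can be tried with the sketch below. The variable names follow the slides where possible; $sub_ref and its anonymous subroutine are an assumed example.

```perl
use strict;
use warnings;

my @array     = (1, 'foo', undef, 13);
my $array_ref = \@array;                  # reference to a named array
my $hash_ref  = { one => 1, two => 2 };   # reference to an anonymous hash
my $sub_ref   = sub { my ($x, $y) = @_; return $x + $y };

# Block syntax and arrow syntax reach the same element.
my $second_block = ${$array_ref}[1];
my $second_arrow = $array_ref->[1];
my $two          = $hash_ref->{two};

# Both forms call the referenced subroutine.
my $sum_block = &{$sub_ref}(2, 3);
my $sum_arrow = $sub_ref->(2, 3);

# Writing through the reference changes the original array.
$array_ref->[0] = 42;

print "second: $second_arrow, two: $two, sum: $sum_arrow, array[0]: $array[0]\n";
```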
  • Identifying a referent
    • ref function
        $scalar_ref                      ref($scalar_ref)
        Scalar value (not a reference)   '' (empty string)
        Reference to scalar              'SCALAR'
        Reference to array               'ARRAY'
        Reference to hash                'HASH'
        Reference to subroutine          'CODE'
        Reference to filehandle          'IO' or 'IO::Handle'
        Reference to other reference     'REF'
        Reference to blessed referent    Package name aka type
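The behaviour of ref can be checked directly. In this sketch the hash %kinds and the class name Cat are only illustrative:

```perl
use strict;
use warnings;

my $scalar = 'bioperl';
my @array  = (1, 2, 3);
my %hash   = (one => 1);

# Collect what ref reports for different kinds of values.
my %kinds = (
    plain  => ref($scalar),              # not a reference: empty string
    scalar => ref(\$scalar),
    array  => ref(\@array),
    hash   => ref(\%hash),
    code   => ref(sub { 1 }),
    ref    => ref(\\$scalar),            # reference to a reference
    object => ref(bless({}, 'Cat')),     # blessed referent: the package name
);

for my $kind (sort keys %kinds) {
    print "$kind => '$kinds{$kind}'\n";
}
```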
  • References• Why do we need references ??? – Create complex data structures !! Arrays and hashes can only store scalar values – Pass arrays, hashes, subroutines, ... as arguments to subroutines and functions !! List flattening
  • Complex data structures
    • Reminder:
      – Reference is a scalar value
      – Arrays and hashes are sets of scalar values
        my $array_ref = [ 1, 2, 3 ];
        my $hash_ref = {one => 1, two => 2};
        my %data = ( arrayref => $array_ref, hash_ref => $hash_ref);
      – In one go:
        my %data = ( arrayref => [ 1, 2, 3 ], hash_ref => {one => 1, two => 2} );
  • Complex data structures
    • Individual access
        my %data = ( arrayref => [ 1, 2, 3 ], hash_ref => {one => 1, two => ['a','b']});
      How to access this value ?
        my $wanted_value = $data{hash_ref}->{two}->[1];
  • Complex data structures
        my @row1 = (1..3);
        my @row2 = (2,4,6);
        my @row3 = (3,6,9);
        my @rows = (\@row1, \@row2, \@row3);
        my $table = \@rows;
    • Diagram: $table refers to @rows, whose elements refer to @row1 (1 2 3), @row2 (2 4 6) and @row3 (3 6 9)
  • Complex data structures
        my $table = [ [1, 2, 3], [2, 4, 6], [3, 6, 9] ];
    • Diagram: $table refers to an anonymous array of three anonymous rows (1 2 3), (2 4 6) and (3 6 9)
  • Complex data structures
    • Individual access
        my $wanted_value = $table->[1]->[2];
        # shorter form:
        $wanted_value = $table->[1][2];
    • Diagram: $table with rows (1 2 3), (2 4 6) and (3 6 9); row index 1, column index 2 holds the value 6
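A sketch of walking through the table structure; apart from $table and $wanted_value, the variable names are invented:

```perl
use strict;
use warnings;

# A table as a reference to an array of array references.
my $table = [ [1, 2, 3], [2, 4, 6], [3, 6, 9] ];

my $wanted_value = $table->[1][2];      # second row, third column
my @second_row   = @{ $table->[1] };    # dereference one row into a real array
my $n_rows       = scalar @{$table};    # number of rows

# Print the table row by row, tab separated.
for my $row (@{$table}) {
    print join("\t", @{$row}), "\n";
}
```

Between two subscripts the arrow is optional, which is why $table->[1][2] and $table->[1]->[2] are the same thing.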
  • Packages and modules• 2 types of variables: – Global aka package variables – Lexical variables
  • Packages and modules
    • Global / package variables
      – Visible everywhere in every program
      – You get them if you don't say otherwise
      – !! Autovivification
        $var1 = 42;
        print "$var1, ", ++$var2; # results in: 42, 1
    • Name has 2 parts: family name + given name
      – Default family name is 'main'. $John is actually $main::John
      – $Cleese::John has nothing to do with $Wayne::John
      – Family name = package name
  • Packages and modules
    • Lexical / private variables
      – Explicitly declared as my $var1 = 42;
      – Only visible within the boundaries of a code block or file.
      – They cease to exist as soon as the program leaves the code block or the program ends
      – They do not have a family name aka they do not belong to a package
    • ALWAYS USE LEXICAL VARIABLES (except for subroutines ...)
        #!/usr/bin/perl
        use strict;
        my $var1 = 42;
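The difference between package and lexical variables can be made visible with a sketch like this one. It reuses the $John example from the slides; the string values and the helper variables are invented.

```perl
use strict;
use warnings;

our $John = 'main John';          # package variable, full name $main::John
$Cleese::John = 'Cleese John';    # same given name, different family
$Wayne::John  = 'Wayne John';

my $inside;
{
    my $John = 'lexical John';    # private to this block, hides the package variable
    $inside = "$John / $main::John";
}

my $outside  = $John;             # the block is over: the package variable again
my $families = "$Cleese::John, $Wayne::John";

print "inside: $inside\n";
print "outside: $outside\n";
print "families: $families\n";
```

Under use strict, package variables must be declared with our or written with their fully qualified name.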
  • Packages• Wikipedia: In general, a namespace is a container that provides context for the identifiers (variable names) it holds, and allows the disambiguation of homonym identifiers residing in different namespaces.• Family where the (global!) variables (incl. subroutines) live (remember $John)
  • Packages
    • Family has a:
      – Name, defined via package declaration
        package Bio::SeqIO::genbank;
        # welcome to the Bio::SeqIO::genbank family
        sub write_seq{}
        package Bio::SeqIO::fasta;
        # welcome to the Bio::SeqIO::fasta family
        sub write_seq{}
      – House, block or blocks of code that follow the package declaration
  • Packages• Why do we need packages ??? – To organize code – To improve maintainability – To avoid name space collisions
  • Modules
    • What ? A text file (with a .pm suffix) containing Perl source code, that can contain any number of namespaces. It must evaluate to a true value.
    • Loading
      – At compile time: use <module>
        use Data::Dumper;
      – At run time: require <expr>
        require Data::Dumper;
        require 'my_file.pl';
        require $class;
      – <module> given as a bareword: compiler translates each double-colon :: into a path separator and appends .pm. E.g. Data::Dumper yields Data/Dumper.pm. A string <expr> is used as a file name as is.
  • Modules• A module can contain multiple packages, but convention dictates that each module contains a package of the same name. – easy to quickly locate the code in any given package (perldoc -m <module>) – not obligatory !!• A module name is unique – 1 to 1 mapping to file system !! – Should start with capital letter
  • Module files
    • Module files are stored in a subdirectory hierarchy that parallels the module name hierarchy.
    • All module files must have an extension of .pm.
        Module            Is stored in
        Config            Config.pm
        Math::Complex     Math/Complex.pm
        String::Approx    String/Approx.pm
  • Modules
    • Module path is relative. So, where is Perl searching for that module ?
    • Possible module roots
      – @INC
        []$ perl -V
        …
        @INC:
          /etc/perl
          /usr/local/lib/perl/5.10.1
          /usr/local/share/perl/5.10.1
          /usr/lib/perl5
          /usr/share/perl5
          /usr/lib/perl/5.10
          /usr/share/perl/5.10
          /usr/local/lib/site_perl
          .
  • Modules
    • Alternative module roots (perldoc -q library)
      – In script
        use lib '/my/alternative/module/path';
      – Command line
        []$ perl -I/my/alternative/module/path script.pl
      – Environment
        export PERL5LIB=$PERL5LIB:/my/alternative/module/path
  • Modules
    • Test/Speak.pm
        package My::Package::Says::Hello;
        sub speak
        {
            print __PACKAGE__, " says: Hello\n";
        }
        package My::Package::Says::Blah;
        sub speak
        {
            print __PACKAGE__, " says: Blah\n";
        }
        1;
    • Test.pl
        #!/usr/bin/perl
        use strict;
        use Test::Speak;
        My::Package::Says::Hello::speak();
        My::Package::Says::Blah::speak();
  • Modules• Why do we need modules??? – To organize packages into files/folders – Code reuse (ne copy & paste !)• Module repository: CPAN – http://search.cpan.org – https://metacpan.org/• Pragma – Special module that influences the code (compilation) – Lowercase – Lexically scoped
  • Modules• Module information – In standard distribution: perldoc perlmodlib – Manually installed: perldoc perllocal – All modules: perldoc -q installed – Documentation: perldoc <module name> – Location: perldoc -l <module name> – Source: perldoc -m <module name>
  • Packages and Modules - Summary
    1. A package is a separate namespace within Perl code.
    2. A module can have more than one package defined within it.
    3. The default package is main.
    4. We can get to variables (and subroutines) within packages by using the fully qualified name
    5. To write a package, just write package <package name> where you want the package to start.
    6. Package declarations last until the end of the enclosing block, file or until the next package statement
    7. The require and use keywords can be used to import the contents of other files for use in a program.
    8. Files which are included must end with a true value.
    9. Perl looks for modules in a list of directories stored in @INC
    10. Module names map to the file system
  • Exercises• Bioperl Training Exercise 1: perldoc• Bioperl Training Exercise 2: thou shalt not forget• Bioperl Training Exercise 3: arrays• Bioperl Training Exercise 4: hashes• Bioperl Training Exercise 5: packages and modules 1• Bioperl Training Exercise 6: packages and modules 2• Bioperl Training Exercise 7: complex data structures
  • Object Oriented Programming in Perl• Why do we need objects and OOP ? – It’s fun – Code reuse – Abstraction
  • Object Oriented Programming in Perl• What is an object ? – An object is a (complex) data structure representing a new, user defined type with a collection of behaviors (functions aka methods) – Collection of attributes• Developer’s perspective: 3 little make rules 1. To create a class, build a package 2. To create a method, write a subroutine 3. To create an object, bless a referent
  • Rule 1: To create a class, build a package• Defining a class – A class is simply a package with subroutines that function as methods. Class name = type = label = namespace package Cat; 1;
  • Rule 2: To create a method, write a subroutine
    • First argument of methods is always class name or object itself (or rather: reference)
        package Cat;
        sub meow {
            my $self = shift;
            print __PACKAGE__, " says: meow !\n";
        }
        1;
    • Subroutine call the OO way (method invocation arrow operator)
        Cat->meow;
        $cat->meow;
  • Rule 3: To create an object, bless a referent
    • 'Special' method: constructor
      – Any name will do, in most cases new
      – Object can be anything, in most cases hash
      – Reference to object is stored in variable
      – bless
        • Arguments: reference (+ class). Does not change !!
        • Underlying referent is blessed (= typed, labelled)
        • Returns reference
        package Cat;
        sub new {
            my ($class, @args) = @_;
            my $self = { _name => $args[0] };
            bless $self, $class;
        }
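Putting the three rules together gives a complete, if tiny, class. This is a sketch based on the Cat example from the slides; the name accessor and the cat's name are assumptions added for illustration.

```perl
use strict;
use warnings;

package Cat;                        # rule 1: a class is a package

sub new {                           # rule 3: the constructor blesses a referent
    my ($class, @args) = @_;
    my $self = { _name => $args[0] };
    return bless $self, $class;
}

sub name {                          # rule 2: a method is a subroutine
    my $self = shift;
    return $self->{_name};
}

sub meow {
    my $self = shift;
    return $self->name . ' says: meow!';
}

package main;

my $cat = Cat->new('Tom');          # class method call: $class is 'Cat'
print $cat->meow, "\n";             # object method call: $self is $cat
print 'type: ', ref($cat), "\n";
```

In a real project the Cat package would live in its own file Cat.pm and be loaded with use Cat;.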
  • Objects• Perl objects are data structures ( a collection of attributes).• To create an object we have to take 3 rules into account: 1. Classes are just packages 2. Methods are just subroutines 3. Blessing a referent creates an object
  • Objects• Objects are passed around as references• Calling an object method can be done using the method invocation arrow: $object_ref->method()• Constructor functions in Perl are conventionally called new() and can be called by writing: $object_ref = ClassName->new()
  • Inheritance• Concept – Way to extend functionality of a class by deriving a (more specific) sub-class from it• In Perl: – Way of specifying where to look for methods – store the name of 1 or more classes in the package variable @ISA package NorthAmericanCat; use Cat; @ISA = qw(Cat); – Multiple inheritance !! package NorthAmericanCat; use Cat; use Animal; @ISA = qw(Cat Animal);
  • Inheritance• UNIVERSAL, parent of all classes• Predefined methods – isa('<class name>'): check if the object inherits from a particular class – can('<method name>'): check if <method name> is a callable method
  • Inheritance• SUPER: superclass of the current package $self->SUPER::do_something() – start looking in @ISA for a class that can() do_something – explicitly call a method of a parental class – often used by Bioperl to initialize object attributes
  • Polymorphism• Concept – methods defined in the derived class will override methods defined in the parent classes – same method has different behaviours
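A minimal sketch of @ISA together with the UNIVERSAL methods isa and can. The Animal and Cat classes and their methods are invented; both packages sit in one file here, so no use statements are needed.

```perl
use strict;
use warnings;

package Animal;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

sub breathe { return 'breathing' }

package Cat;

our @ISA = ('Animal');              # Cat inherits from Animal

sub meow { return 'meow' }

package main;

my $cat = Cat->new(name => 'Tom');  # new is found in Animal via @ISA

my $is_animal = $cat->isa('Animal') ? 1 : 0;
my $can_meow  = $cat->can('meow')   ? 1 : 0;
my $can_bark  = $cat->can('bark')   ? 1 : 0;

print 'breathe: ', $cat->breathe, "\n";
print "isa Animal: $is_animal, can meow: $can_meow, can bark: $can_bark\n";
```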
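SUPER and method overriding can be sketched with the NorthAmericanCat class from the inheritance slide. The methods speak and describe and the _region attribute are assumptions made for this example.

```perl
use strict;
use warnings;

package Cat;

sub new {
    my ($class, %args) = @_;
    my $self = { _name => $args{name} };
    return bless $self, $class;
}

sub speak { return 'meow' }

sub describe {
    my $self = shift;
    # $self->speak is looked up at run time, starting in the object's own class.
    return $self->{_name} . ' says ' . $self->speak;
}

package NorthAmericanCat;

our @ISA = ('Cat');

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);    # let the parent class build the object
    $self->{_region} = 'North America';      # then add the subclass attribute
    return $self;
}

sub speak { return 'howdy meow' }            # overrides Cat::speak

package main;

my @cats = (Cat->new(name => 'Tom'), NorthAmericanCat->new(name => 'Felix'));
print $_->describe, "\n" for @cats;
```

The same describe call behaves differently depending on the class of the object, which is the polymorphism the slide refers to.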
  • Exercises• Bioperl Training Exercise 8: OOP• Bioperl Training Exercise 9: inheritance, polymorphism• Bioperl Training Exercise 10: aggregation, delegation