Marcs (bio)perl course


These are the lecture slides for the BITS training session "Introduction to programming in Bioperl".

Published in: Technology
  • Remember to wear it !
  • Not often needed. Why might you need the braces? String interpolation: $name = 'Johnny'; print "$name1"; # => nothing printed ($name1 is a different, undefined variable); print "${name}1"; # => 'Johnny1'
  • If there are more variables in the list than elements in the array, the extra variables are assigned the undefined value. If there are fewer variables than array elements, the extra elements are ignored. Distributivity: my () declares every variable in the list.
  • Comma is operator: flattens (‘concatenates’) lists/arrays
  • No parens needed: comma operators produce list
  • main should have been called 'our' ;-) There is no need to use the family name when you are with your family. If you call John for dinner, John will know it's him and you know who will come. But if your family has visitors from another family and they have a John in the family as well ... Family name + given name = fully qualified variable name

    1. 1. Marc Logghe<br />
    2. 2. Perl<br />“A script is what you give an actor, but a program is what you give an audience.”<br />
    3. 3. Goals<br />Perl Positioning System: find your way in the Perl World<br />Write Once, Use Many Times<br />Object Oriented Perl<br />Consumer<br />Developer<br />Thou shalt not be afraid of the Bioperl Beast<br />
    4. 4.
    5. 5. Agenda Day1<br />Perl refresher<br />Scalars<br />Arrays and lists<br />Hashes<br />Subroutines and functions<br />Perldoc<br />Creating and running a Perl script<br />References and advanced data structures<br />Packages and modules<br />Objects, (multiple) inheritance, polymorphism<br />
    6. 6. Agenda Day 2<br />What is bioperl ?<br />Taming the Bioperl Beast<br />Finding modules<br />Finding methods<br />Data::Dumper<br />Sequence processing<br />One image says more than 1000 words<br />
    7. 7. Variables<br />Data of any type may be stored within three basic types of variables:<br />Scalar (strings, numbers, references)<br />Array (aka list but not quite the same)<br />Hash (aka associative array)<br />Variable names are always preceded by a “dereferencing symbol” or prefix. If needed: {}<br />$ - Scalar variables<br />@ - Array variables<br />% - Associative array aka hash variables<br />
    8. 8. Variables<br />You do NOT have to <br />Declare the variable before using it<br />Define the variable’s data type<br />Allocate memory for new data values<br />
    9. 9. Scalar variables<br />Scalar variable stores a string, a number, a character, a reference, undef<br />$name, ${name}, ${'name'}<br />More magic: $_<br />
    10. 10. Array variables<br />Array variable stores a list of scalars<br />@name, @{name}, @{'name'}<br />Index<br />Map: index => scalar value<br />zero-indexed (distance from start)<br />
    11. 11. Array variables<br />List assignment:<br />@count = (1, 2, 3, 4, 5);<br />@count = ('apple', 'bat', 'cat');<br />@count2 = @count;<br />Individual assignment: $count[2] = 42<br />Individual access: print $count[2]<br />Special variable $#<array name>: index of the last element<br />Scalar context: the array evaluates to its number of elements<br />
    12. 12. Array variables <br />Access multiple values via array slice:<br />print @array[3,2,4,1,0,-1];<br />Assign multiple values via array slice:<br />@array[3,2,4,1,0,-1] = @new_values;<br />
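The array slides above can be tried out with a small script. This is a minimal sketch; the values and the names @picked, $last_index and $size are made up for illustration.

```perl
use strict;
use warnings;

# List assignment: the array is a container for the list on the right.
my @count = (1, 2, 3, 4, 5);

# A single element is a scalar, hence the $ prefix.
$count[2] = 42;

# $#count is the index of the last element; an array in scalar
# context yields its number of elements.
my $last_index = $#count;
my $size       = @count;

# A slice yields a list, hence the @ prefix; -1 counts from the end.
my @picked = @count[0, -1];

# Slices can also be assigned to.
@count[0, 1] = ('apple', 'bat');

print "last index: $last_index, size: $size\n";
print "picked: @picked\n";
print "count: @count\n";
```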
    13. 13. Lists<br />List = temporary sequence of comma separated values, usually in () or the result of the qw operator<br />Array = container for a list<br />Use:<br />Array initialization<br />my @array = qw/blood sweat tears/;<br />Extract values from array<br />($var1, $var2[42], $var3, @var4) = @args; # no my here: an array element cannot be declared with my<br />my ($var5, $var6) = @args;<br />
    14. 14. Lists<br />List flattening<br />Remember: each element of a list must be a scalar, not another list<br />my @vehicles = ('truck', @cars, ('tank','jeep'));<br />=> NOT a hierarchical list of 3 elements<br />=> FLATTENED list of scalar values<br />Individual access and slicing cf. arrays<br />my @selection = ('truck', @cars, ('tank','jeep'))[2,-1];<br />
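A short sketch of list flattening and list slicing. The contents of @cars are an assumption, the slide does not say what is in it.

```perl
use strict;
use warnings;

my @cars = ('sedan', 'coupe');    # assumed contents, for illustration

# The nested parentheses and @cars are flattened into one list of scalars.
my @vehicles = ('truck', @cars, ('tank', 'jeep'));
my $how_many = @vehicles;

# A list can be sliced directly, just like an array.
my @selection = ('truck', @cars, ('tank', 'jeep'))[2, -1];

print "$how_many vehicles: @vehicles\n";
print "selection: @selection\n";
```

With two cars the flattened list has five elements, so index 2 is the second car and -1 is the jeep.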
    15. 15. Hash variables<br />Hash variables are denoted by the % dereferencing symbol.<br />A hash variable is a list of key-value pairs<br />Both keys and values must be scalar<br />my %fruit_color = ("apple", "red", "banana", "yellow");<br />Notice the '=>' aka 'quotifying comma' in the equivalent, more readable form:<br />my %fruit_color = (<br /> apple => "red",<br /> banana => "yellow",<br /> );<br />
    16. 16. Hash variables<br />Individual access: $hash{key}<br />Access multiple values via slice:<br />@slice = @hash{'key2','key23','key4'};<br />Assign multiple values via slice:<br />@hash{'key2','key23','key4'} = @new_values;<br />
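The hash slides can be sketched as follows. The extra fruits and colours are invented for the example.

```perl
use strict;
use warnings;

# The fat comma (=>) quotes the bare word on its left.
my %fruit_color = (
    apple  => 'red',
    banana => 'yellow',
);

# Individual access uses the $ prefix: one value is a scalar.
$fruit_color{cherry} = 'dark red';

# Hash slices use the @ prefix because they yield a list of values.
my @colors = @fruit_color{'apple', 'banana'};
@fruit_color{'kiwi', 'lemon'} = ('green', 'yellow');

# keys in scalar context gives the number of pairs.
my $pairs = keys %fruit_color;

print "colors: @colors\n";
print "$pairs pairs: ", join(', ', sort keys %fruit_color), "\n";
```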
    17. 17. Non data types<br />Filehandle<br />There are several predefined filehandles, including STDIN, STDOUT, STDERR and DATA (default opened).<br />No prefix<br />Code value aka subroutine<br />Dereferencing symbol “&”<br />
    18. 18.
    19. 19. Subroutines<br />We can reuse a segment of Perl code by placing it within a subroutine.<br />The subroutine is defined using the sub keyword and a name (= variable name !!).<br />The subroutine body is defined by placing code statements within the {} code block symbols.<br />sub MySubroutine{<br /> #Perl code goes here.<br /> my @args = @_;<br />}<br />
    20. 20. Subroutines<br />To call a subroutine, prepend the name with the & symbol:<br />&MySubroutine; # w/o arguments of its own: the caller's current @_ is passed along<br />Or:<br />MySubroutine(); # with or w/o arguments<br />
    21. 21. Subroutines<br />Arguments in underscore array variable (@_)<br />List flattening !!<br />my @results = MySubroutine(@arg1, 'arg2', ('arg3', 'arg4'));<br />sub MySubroutine{<br /> #Perl code goes here.<br /> my ($thingy, @args) = @_;<br />}<br />
    22. 22. Subroutines<br />Return value<br />Nothing<br />Scalar value<br />List value<br />How it is returned<br />Explicit with return function<br />Implicit: value of the last evaluated statement<br />sub MySubroutine{<br /> #Perl code goes here.<br /> my ($thingy, @args) = @_;<br /> do_something(@args);<br />}<br />
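Argument flattening and the implicit return value can be seen together in one small sketch. The subroutine name describe and its arguments are hypothetical.

```perl
use strict;
use warnings;

sub describe {
    # All arguments arrive flattened in @_; the first goes to $thingy,
    # the array soaks up everything that is left.
    my ($thingy, @args) = @_;
    my $count = @args;

    # No explicit return: the value of the last evaluated
    # statement is returned.
    "$thingy plus $count more";
}

my @arg1 = ('a', 'b');
my $text = describe(@arg1, 'arg2', ('arg3', 'arg4'));
print "$text\n";
```

Because of flattening, the subroutine cannot tell that 'a' and 'b' came from an array; passing references is the way around that.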
    23. 23. Subroutines<br />Calling contexts<br />Void: getFiles($dir);<br />Scalar: my $num = getFiles($dir);<br />List: my @files = getFiles($dir);<br />wantarray function<br />Void => undef<br />Scalar => false (but defined)<br />List => true<br />
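A minimal sketch of a context-aware subroutine. get_files, its fixed file list and the @seen log are assumptions made for the example, not part of the course material.

```perl
use strict;
use warnings;

my @seen;    # records the context of every call

sub get_files {
    my ($dir) = @_;
    my @files = ('a.txt', 'b.txt');    # stand-in for a real directory listing

    my $want = wantarray;
    if    (!defined $want) { push @seen, 'void' }
    elsif ($want)          { push @seen, 'list' }
    else                   { push @seen, 'scalar' }

    # List context gets the files, scalar context gets their number.
    return $want ? @files : scalar @files;
}

get_files('.');                # void context
my $num   = get_files('.');    # scalar context
my @files = get_files('.');    # list context

print "contexts: @seen\n";
print "num: $num, files: @files\n";
```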
    24. 24. Functions and operators <br /> Built-in routines<br />Function<br />Arguments at right hand side<br />Sensible name (defined, open, print, ...)<br />
    25. 25. Functions<br />Perl provides a rich set of built-in functions to help you perform common tasks.<br />Several categories of useful built-in functions include<br />Arithmetic functions (sqrt, sin, … )<br />List functions (push, chomp, … )<br />String functions (length, substr, … )<br />Existence functions (defined, undef)<br />
    26. 26. Array functions<br />Array as queue: push/shift (FIFO)<br />Array as stack: push/pop (LIFO)<br />Diagram: @row1 = (1, 2, 3); shift and unshift work on the first element, push and pop on the last<br />
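Queue and stack behaviour in a few lines, as a sketch with made-up values.

```perl
use strict;
use warnings;

# Queue (FIFO): add at the end, take from the front.
my @queue = (1, 2, 3);
push @queue, 4;
my $first = shift @queue;

# Stack (LIFO): add at the end, take from the end.
my @stack = (1, 2, 3);
push @stack, 4;
my $top = pop @stack;

# unshift is the counterpart of shift: it adds at the front.
unshift @stack, 0;

print "first: $first, queue: @queue\n";
print "top: $top, stack: @stack\n";
```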
    27. 27. List functions<br />chomp: remove newline from every element in the list<br />map: kind of loop without escape, every element ($_) is ‘processed’<br />grep: kind of filter<br />sort <br />join<br />
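The list functions on this slide, applied to one small list. The input lines are invented for the example.

```perl
use strict;
use warnings;

my @lines = ("tears\n", "blood\n", "sweat\n");

# chomp removes the trailing newline from every element.
chomp @lines;

# map processes every element ($_) and returns the results.
my @upper = map { uc($_) } @lines;

# grep keeps the elements for which the block is true.
my @with_e = grep { /e/ } @lines;

# sort orders as strings by default; join glues a list into one string.
my @sorted = sort @lines;
my $joined = join ', ', @sorted;

print "upper: @upper\n";
print "with e: @with_e\n";
print "joined: $joined\n";
```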
    28. 28. Hash functions<br />keys: returns the hash keys in random order<br />values: returns values of the hash in random order but same order as keys function call<br />each: returns (key, value) pairs<br />delete: remove a particular key (and associated value) from a hash <br />
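A sketch of the hash functions. Since keys come back in no particular order, the example sorts them whenever the order matters; the data is made up.

```perl
use strict;
use warnings;

my %fruit_color = (apple => 'red', banana => 'yellow', cherry => 'red');

# keys: sorted here to get a predictable order.
my @fruits = sort keys %fruit_color;

# A slice with the sorted keys gives the values in matching order.
my @colors = @fruit_color{@fruits};

# each: walk over (key, value) pairs and count every colour.
my %times_seen;
while (my ($fruit, $color) = each %fruit_color) {
    $times_seen{$color}++;
}

# delete: remove a key and its value.
delete $fruit_color{banana};
my $left = keys %fruit_color;

print "fruits: @fruits\n";
print "colors: @colors\n";
print "red seen $times_seen{red} times, $left fruits left\n";
```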
    29. 29. Operators <br />Operator<br />Complex and subtle (=,<>, <=>, ?:, ->,=>,...)<br />Symbolic name (+,<,>,&,!, ...)<br />
    30. 30. Operators<br />Calling context<br />Eg. assignment operator ‘=‘<br />($item1, $item2) = @array;<br />$item1 = @array;<br />
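The effect of the calling context on the assignment operator, as a small sketch with an invented array.

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c');

# List on the left: list assignment, elements are copied one by one.
my ($item1, $item2) = @array;

# Scalar on the left: the array is evaluated in scalar context
# and yields its number of elements.
my $count = @array;

# Parentheses around a single variable still force list context.
my ($first) = @array;

print "items: $item1 $item2, count: $count, first: $first\n";
```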
    31. 31.
    32. 32. perldoc<br />Access to Perl’s documentation system<br />Command line<br />Web:<br />Documentation overview: perldoc perl<br />Function info:<br />perldoc perlfunc<br />perldoc -f <function name><br />Operator info: perldoc perlop<br />Searching the FAQs: perldoc -q <FAQ keyword><br />
    33. 33. perldoc<br />Looking up module info<br />Documentation: perldoc <module><br />Installation path: perldoc -l <module><br />Source: perldoc -m <module><br />All installed modules: perldoc -q installed<br />
    34. 34. perldoc<br />Move around <br />
    35. 35. perldoc<br />Searching <br />
    36. 36. Creating a script<br />Text editor (vi, textpad, notepad++, ...)<br />IDE (Komodo, Eclipse, EMACS, Geany, ...)<br />See: <br />Shebang (not for modules)<br />#!/usr/bin/perl<br />
    37. 37. Executing a script<br />Command line<br />Windows<br />.pl extension<br />*NIX<br />Shebang line<br />chmod +x script<br />./script<br />
    38. 38. Executing a script<br />Geany IDE<br />
    39. 39.
    40. 40. References(and referents)<br />A reference is a special scalar value which “refers to” or “points to” any value.<br />A variable name is one kind of reference that you are already familiar with. It’s a given name.<br />Reference is a kind of private, internal, computer generated name<br />A referent is the value that the reference is pointing to<br />
    41. 41. Creating References <br />Method 1: references to variables are created by using the backslash (\) operator.<br />$name = 'bioperl';<br /> $reference = \$name;<br /> $array_reference = \@array_name;<br /> $hash_reference = \%hash_name;<br /> $subroutine_ref = \&sub_name;<br />
    42. 42. Creating References <br />Method 2:<br />[ ITEMS ] makes a new, anonymous array and returns a reference to that array.<br />{ ITEMS } makes a new, anonymous hash, and returns a reference to that hash<br />my $array_ref = [ 1, 'foo', undef, 13 ]; <br />my $hash_ref = {one => 1, two => 2};<br />
    43. 43. Dereferencing a Reference <br />Use the appropriate dereferencing symbol<br />Scalar: $<br />Array: @<br />Hash: %<br />Subroutine: &<br />
    44. 44. Dereferencing a Reference <br />Remember $name, ${'name'} ?<br />Means: give me the scalar value that the variable 'name' is pointing to.<br />A reference $reference is a name, so $$reference, ${$reference}<br />Means: give me the scalar value that the reference $reference is pointing to<br />
    45. 45. Dereferencing a Reference <br />The arrow operator: -><br />Arrays and hashes<br />my $array_ref = [ 1, 'foo', undef, 13 ]; <br />my $hash_ref = {one => 1, two => 2};<br />${$array_ref}[1] = ${$hash_ref}{'two'};<br /># can be written as:<br />$array_ref->[1] = $hash_ref->{two};<br />Subroutines<br />&{$sub_ref}($arg1,$arg2);<br /># can be written as:<br />$sub_ref->($arg1, $arg2);<br />
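Creating references with the backslash operator and dereferencing them in both notations, as a runnable sketch. The variable contents and the body of sub_name are assumptions.

```perl
use strict;
use warnings;

my @array_name = (1, 2, 3);
my %hash_name  = (one => 1, two => 2);
sub sub_name { return "called with @_" }

# The backslash operator creates a reference to an existing variable.
my $array_reference = \@array_name;
my $hash_reference  = \%hash_name;
my $subroutine_ref  = \&sub_name;

# Block notation and arrow notation reach the same referent.
${$array_reference}[1] = 'foo';
$array_reference->[2]  = 13;
my $two    = $hash_reference->{two};
my $result = $subroutine_ref->('x', 'y');

# The original array changed: the reference points to it, it is not a copy.
print "array: @array_name\n";
print "two: $two, result: $result\n";
```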
    46. 46. Identifying a referent<br />ref function<br />
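The slide only names the ref function; a minimal sketch of what it returns for the common kinds of referent (the sample values are made up):

```perl
use strict;
use warnings;

my $scalar_ref = \'text';
my $array_ref  = [1, 2];
my $hash_ref   = { one => 1 };
my $code_ref   = sub { return 1 };

# ref returns the type of the referent as a string,
# and an empty string for anything that is not a reference.
my @kinds = map { ref($_) }
    ($scalar_ref, $array_ref, $hash_ref, $code_ref, 'plain string');

print join(',', @kinds), "\n";
```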
    47. 47. References<br />Why do we need references ???<br />Create complex data structures<br />!! Arrays and hashes can only store scalar values <br />Pass arrays, hashes, subroutines, ... as arguments to subroutines and functions<br />!! List flattening<br />
    48. 48. Complex data structures <br />Reminder:<br />Reference is a scalar value<br />Arrays and hashes are sets of scalar values<br />Step by step:<br />my $array_ref = [ 1, 2, 3 ];<br />my $hash_ref = {one => 1, two => 2}; <br />my %data = ( arrayref => $array_ref,<br />hash_ref => $hash_ref);<br />In one go: <br />my %data = ( arrayref => [ 1, 2, 3 ],<br />hash_ref => {one => 1, two => 2} <br /> );<br />
    49. 49. Complex data structures <br />Individual access<br />my %data = ( arrayref => [ 1, 2, 3 ],<br />hash_ref => {one => 1, <br /> two => ['a','b']});<br />How to access this value ?<br />my $wanted_value = $data{hash_ref}->{two}->[1];<br />
    50. 50. Complex data structures<br />my @row1 = (1..3);<br />my @row2 = (2,4,6);<br />my @row3 = (3,6,9);<br />my @rows = (\@row1,\@row2,\@row3);<br />my $table = \@rows;<br />Diagram: $table refers to @rows, whose elements refer to @row1 (1, 2, 3), @row2 (2, 4, 6) and @row3 (3, 6, 9)<br />
    51. 51. Complex data structures<br />my $table = [<br /> [1, 2, 3],<br /> [2, 4, 6],<br /> [3, 6, 9]<br />];<br />Diagram: $table refers to an anonymous array of three anonymous rows: (1, 2, 3), (2, 4, 6), (3, 6, 9)<br />
    52. 52. Complex data structures<br />Individual access<br />my $wanted_value = $table->[1]->[2];<br /># shorter form:<br />$wanted_value = $table->[1][2];<br />Diagram: $table as rows (1, 2, 3), (2, 4, 6), (3, 6, 9)<br />
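The table from the last three slides as a runnable sketch, with a loop added to show how the rows are dereferenced. The tab-separated printout is just one way to display it.

```perl
use strict;
use warnings;

# A reference to an anonymous array of references to anonymous arrays.
my $table = [
    [1, 2, 3],
    [2, 4, 6],
    [3, 6, 9],
];

# Second row, third column; the arrow between subscripts is optional.
my $wanted_value = $table->[1][2];

# @{...} dereferences the outer array, then every row in turn.
my $row_count = @{$table};
for my $row (@{$table}) {
    print join("\t", @{$row}), "\n";
}

print "wanted: $wanted_value, rows: $row_count\n";
```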
    53. 53. Packages and modules<br />2 types of variables:<br />Global aka package variables<br />Lexical variables<br />
    54. 54. Packages and modules<br />Global / package variables<br />Visible everywhere in the program<br />You get them if you don't say otherwise<br />!! Autovivification<br />Name has 2 parts: family name + given name<br />Default family name is 'main'. $John is actually $main::John<br />$Cleese::John has nothing to do with $Wayne::John<br />Family name = package name<br />$var1 = 42;<br />print "$var1, ", ++$var2;<br /># results in:<br />42, 1<br />
    55. 55.
    56. 56. Packages and modules<br />Lexical / private variables<br />Explicitly declared with my<br />Only visible within the boundaries of a code block or file.<br />They cease to exist as soon as the program leaves the code block or the program ends<br />They do not have a family name aka they do not belong to a package<br />ALWAYS USE LEXICAL VARIABLES<br />(except for subroutines ...)<br />my $var1 = 42;<br />In a script, use strict enforces the declaration:<br />#!/usr/bin/perl<br />use strict;<br />my $var1 = 42;<br />
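Package variables and lexical variables side by side, as a sketch. The John variables follow the family metaphor of the slides; $visitor and $copied are hypothetical names.

```perl
use strict;
use warnings;

# Package variables: 'our' lets strict code use the short name,
# the full name is $main::John.
our $John = 'John of main';

# A fully qualified name is always allowed and lives in its own family.
$Cleese::John = 'John of Cleese';

my $copied;
{
    # Lexical variable: exists only inside this block.
    my $visitor = 'only visible inside the block';
    $copied = $visitor;
}
# Using $visitor here would be a compile-time error under 'use strict'.

print "$main::John / $Cleese::John\n";
print "$copied\n";
```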
    57. 57. Packages<br />Family where the (global!) variables (incl. subroutines) live (remember $John)<br />Wikipedia:<br />In general, a namespace is a container that provides context for the identifiers (variable names) it holds, and allows the disambiguation of homonym identifiers residing in different namespaces.<br />
    58. 58. Packages<br />Family has a:<br />name, defined via package declaration<br />House, block or blocks of code that follow the package declaration <br />package Bio::SeqIO::genbank;<br /># welcome to the Bio::SeqIO::genbank family<br />sub write_seq{}<br />package Bio::SeqIO::fasta;<br /># welcome to the Bio::SeqIO::fasta family<br />sub write_seq{}<br />
    59. 59. Packages<br />Why do we need packages ???<br />To organize code<br />To improve maintainability<br />To avoid name space collisions<br />
    60. 60. Modules<br />What ?<br />A text file(with a .pm suffix) containing Perl source code, that can contain any number of namespaces. It must evaluate to a true value.<br />Loading<br />At compile time: use <module><br />At run time: require <expr><br /><expr> and <module>:compiler translates each double-colon '::' into a path separator and appends '.pm'.<br />E.g. Data::Dumper yields Data/<br />use Data::Dumper;<br />require Data::Dumper;<br />require ‘’;<br />require $class; <br />
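The translation from module name to file path can be imitated by hand. This is a sketch: module_to_path is a hypothetical helper, not something Perl provides, and it only mirrors the rule stated on the slide.

```perl
use strict;
use warnings;
use Data::Dumper;    # loaded at compile time

# Hypothetical helper: '::' becomes a path separator, '.pm' is appended.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

my $path = module_to_path('Data::Dumper');

# %INC maps the relative path of every loaded module to the file found in @INC.
print "$path was loaded from $INC{$path}\n";

# require with a string expects such a relative path, not a Module::Name;
# the module is already loaded here, so this just returns true.
my $ok = require $path;
```

Note that the translation only happens for a bare module name; when the argument of require is a string or a variable, it has to hold the path form already.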
    61. 61. Modules<br />A module can contain multiple packages, but convention dictates that each module contains a package of the same name. <br />easy to quickly locate the code in any given package (perldoc -m <module>)<br />not obligatory !!<br />A module name is unique<br />1 to 1 mapping to file system !!<br />Should start with capital letter<br />
    62. 62. Module files<br />Module files are stored in a subdirectory hierarchy that parallels the module name hierarchy.<br />All module files must have an extension of .pm.<br />
    63. 63. Modules<br />Module path is relative. So, where is Perl searching for that module ?<br />Possible module roots<br />@INC<br />[]$ perl -V<br />…<br />@INC:<br /> /etc/perl<br /> /usr/local/lib/perl/5.10.1<br /> /usr/local/share/perl/5.10.1<br /> /usr/lib/perl5<br /> /usr/share/perl5<br /> /usr/lib/perl/5.10<br /> /usr/share/perl/5.10<br /> /usr/local/lib/site_perl<br /> .<br />
    64. 64. Modules<br />Alternative module roots (perldoc -q library)<br />In script<br />use lib '/my/alternative/module/path';<br />Command line <br />[]$ perl -I/my/alternative/module/path<br />Environment<br />export PERL5LIB=$PERL5LIB:/my/alternative/module/path<br />
    65. 65. Modules<br />Test/<br /><br />package My::Package::Says::Hello;<br />sub speak<br />{<br /> print __PACKAGE__, " says: 'Hello'\n";<br />}<br />package My::Package::Says::Blah;<br />sub speak<br />{<br /> print __PACKAGE__, " says: 'Blah'\n";<br />}<br />1;<br />#!/usr/bin/perl<br />use strict;<br />use Test::Speak;<br />My::Package::Says::Hello->speak;<br />My::Package::Says::Blah->speak;<br />
    66. 66. Modules<br />Why do we need modules???<br />To organize packages into files/folders<br />Code reuse (not copy & paste !)<br />Module repository: CPAN<br />Pragma<br />Special module that influences the code (compilation)<br />Lowercase<br />Lexically scoped<br />
    67. 67. Modules<br />Module information<br />In standard distribution: perldoc perlmodlib<br />Manually installed: perldoc perllocal<br />All modules: perldoc -q installed<br />Documentation: perldoc <module name><br />Location: perldoc -l <module name><br />Source: perldoc -m <module name><br />
    68. 68. Packages and Modules - Summary<br />A package is a separate namespace within Perl code.<br />A module can have more than one package defined within it.<br />The default package is main.<br />We can get to variables (and subroutines) within packages by using the fully qualified name<br />To write a package, just write package <package name> where you want the package to start.<br />Package declarations last until the end of the enclosing block, file or until the next package statement<br />The require and use keywords can be used to import the contents of other files for use in a program.<br />Files which are included must end with a true value.<br />Perl looks for modules in a list of directories stored in @INC<br />Module names map to the file system<br />
    69. 69.
    70. 70. Exercises<br />Bioperl Training Exercise 1: perldoc<br />Bioperl Training Exercise 2: thou shalt not forget<br />Bioperl Training Exercise 3: arrays<br />Bioperl Training Exercise 4: hashes<br />Bioperl Training Exercise 5: packages and modules 1<br />Bioperl Training Exercise 6: packages and modules 2<br />Bioperl Training Exercise 7: complex data structures<br />
    71. 71.
    72. 72. Object Oriented Programming in Perl<br />Why do we need objects and OOP ?<br />It’s fun<br />Code reuse<br />Abstraction<br />
    73. 73. Object Oriented Programming in Perl<br />What is an object ?<br />An object is a (complex) data structure representing a new, user defined type with a collection of attributes and a collection of behaviors (functions aka methods)<br />Developer's perspective: 3 little rules<br />To create a class, build a package<br />To create a method, write a subroutine<br />To create an object, bless a referent<br />
    74. 74. Rule 1: To create a class, build a package<br />Defining a class<br />A class is simply a package with subroutines that function as methods. Class name = type = label = namespace<br />package Cat;<br />1;<br />
    75. 75. Rule 2: To create a method, write a subroutine<br />First argument of methods is always class name or object itself (or rather: reference)<br />Subroutine call the OO way (method invocation arrow operator)<br />package Cat;<br />sub meow {<br /> my $self = shift;<br /> print __PACKAGE__, " says: meow !\n";<br />}<br />1;<br />Cat->meow;<br />$cat->meow;<br />
    76. 76. Rule 3: To create an object, bless a referent <br />'Special' method: constructor<br />Any name will do, in most cases new<br />Object can be anything, in most cases hash<br />Reference to object is stored in variable<br />bless<br />Arguments: reference (+ class). The reference itself does not change !!<br />Underlying referent is blessed (= typed, labelled)<br />Returns reference<br />package Cat;<br />sub new {<br /> my ($class, @args) = @_;<br /> my $self = { _name => $args[0] };<br /> bless $self, $class;<br />}<br />
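The three rules put together in one file, as a minimal sketch. The name accessor and the cat called Tom are additions for the example; meow returns its text here instead of printing it so the result is easy to inspect.

```perl
use strict;
use warnings;

# Rule 1: a class is a package.
package Cat;

# Rule 3: the constructor blesses a referent (here an anonymous hash).
sub new {
    my ($class, @args) = @_;
    my $self = { _name => $args[0] };
    return bless $self, $class;
}

# Rule 2: a method is a subroutine; its first argument is the
# object (or the class name when called as Cat->meow).
sub name {
    my $self = shift;
    return $self->{_name};
}

sub meow {
    my $self = shift;
    my $who  = ref($self) ? $self->name : $self;
    return "$who says: meow !";
}

package main;

my $cat = Cat->new('Tom');
print $cat->meow, "\n";    # called on an object
print Cat->meow, "\n";     # called on the class
print ref($cat), "\n";     # the label that bless put on the referent
```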
    77. 77. Objects<br />Perl objects are data structures ( a collection of attributes).<br />To create an object we have to take 3 rules into account:<br />Classes are just packages<br />Methods are just subroutines<br />Blessing a referent creates an object<br />
    78. 78. Objects<br />Objects are passed around as references<br />Calling an object method can be done using the method invocation arrow:<br />Constructor functions in Perl are conventionally called new() and can be called by writing: <br />$object_ref->method()<br />$object_ref = ClassName->new()<br />
    79. 79. Inheritance<br />Concept<br />Way to extend functionality of a class by deriving a (more specific) sub-class from it<br />In Perl:<br />Way of specifying where to look for methods<br />store the name of 1 or more classes in the package variable @ISA<br />package NorthAmericanCat;<br />use Cat;<br />our @ISA = qw(Cat);<br />Multiple inheritance !!<br />package NorthAmericanCat;<br />use Cat;<br />use Animal;<br />our @ISA = qw(Cat Animal);<br />
    80. 80. Inheritance<br />UNIVERSAL, parent of all classes<br />Predefined methods<br />isa('<class name>'): check if the object inherits from a particular class<br />can('<method name>'): check if <method name> is a callable method <br />
    81. 81. Inheritance<br />SUPER: superclass of the current package<br />start looking in @ISA for a class that can() do_something<br />explicitly call a method of a parental class<br />often used by Bioperl to initialize object attributes<br />$self->SUPER::do_something()<br />
    82. 82. Polymorphism<br />Concept<br />methods defined in the derived class will override methods defined in the parent classes<br />same method has different behaviours<br />
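Inheritance, SUPER, the UNIVERSAL methods and polymorphism in one sketch. Both classes live in one file here for brevity, so no use Cat is needed; the _init and speak methods and the _region attribute are hypothetical.

```perl
use strict;
use warnings;

package Cat;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->_init(%args);    # resolved on the object's own class first
    return $self;
}

sub _init {
    my ($self, %args) = @_;
    $self->{_name} = $args{name};
}

sub speak { return 'meow' }

package NorthAmericanCat;
our @ISA = ('Cat');    # where to look for methods not defined here

# Extends the parent's initialisation instead of replacing it.
sub _init {
    my ($self, %args) = @_;
    $self->SUPER::_init(%args);
    $self->{_region} = 'North America';
}

# Overrides Cat::speak: same method name, different behaviour.
sub speak { return 'howdy meow' }

package main;

my @cats   = (Cat->new(name => 'Tom'), NorthAmericanCat->new(name => 'Sam'));
my @sounds = map { $_->speak } @cats;

print join(', ', @sounds), "\n";
print "Sam is a Cat\n"     if $cats[1]->isa('Cat');
print "Tom can speak\n"    if $cats[0]->can('speak');
print "Sam lives in $cats[1]{_region}\n";
```

The constructor is inherited unchanged; only the methods it calls differ per class, which is the pattern the SUPER slide refers to for initialising attributes.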
    83. 83.
    84. 84. Exercises<br />Bioperl Training Exercise 8: OOP<br />Bioperl Training Exercise 9: inheritance, polymorphism<br />Bioperl Training Exercise 10: aggregation, delegation<br />