Data Types in Perl




Bioinformatics master course, ‘11/’12   
   
   
   
Paolo Marcatili
Agenda
 •    Perl Basics
 •    Hello World
 •    Scalars
 •    Arrays
 •    Hashes




Task Today




Parsing

 Parse a file
 Sort its words alphabetically
 Sort its words by number of occurrences
 




Perl Basics

                                            
                                            



PERL

 Practical Extraction
 and Report Language
 
 •  Handle text files
 •  Web (CGI)
 •  Small scripts

 http://www.perltutorial.org/

Install
                                              
 Windows
 http://www.activestate.com/activeperl/
 Cygwin (Linux-like environment)
 
 Linux / OS X
 Native (usually preinstalled)




Hello World!

                                            
                                            



First script
 Open an editor (e.g. gedit)
 
 #!/usr/bin/perl -w
 use strict;
 use warnings;
 print "Hello World!\n";
 
 Save as -> first.pl
 

How to run a script
 Terminal -> move to the script folder
 
 perl first.pl 
 
 or
 chmod a+x first.pl   <- now it is executable by everyone
 ./first.pl           <- ./ means ‘in this folder’
 
 
      

Variable Overview




Overview
   Data types
   Casting
   Variable Scope
   (De)referencing




Data Types



 Examples of data: 5, 6.28, 1.6e-4, J, Hello world!, ACCGACGACGCAGC,
 (1,1,2,3,5,8), Mamma:3381245671


Overview
 Scalars:  5, Hello world!, J, ACCGACGACGCAGC, 6.28, 1.6e-4
 Arrays:   (1,1,2,3,5,8)
 Hashes:   Mamma:3381245671




Scalars




Scalars
      my $scalar;
      $scalar = 5;
      $scalar = $scalar + 3;
      $scalar = "scalar is $scalar\n";
      print $scalar;
      
      > scalar is 8




Scalars - 2 
      •  Scalar data can be a number or a string.
      •  In Perl, strings and numbers can be used
         nearly interchangeably.
      
      •  A scalar variable holds a single piece of scalar data.
      •  A scalar variable name starts with a dollar sign ($)
         followed by a Perl identifier.
      •  A Perl identifier can contain
         letters, digits, and underscores.
      •  It is not allowed to start with a digit.



Examples
           #floating-point values
           my $x = 3.14;
           my $y = -2.78;
       
           #integer values
           my $a = 1000;
           my $b = -2000;
       
           my $s = "2000"; # similar to $s = 2000;
                                                 
       
           #strings
           my $str = "this is a string in Perl";
           my $str2 = 'this is also a string';
        
Operations
   my $x = 5 + 9; # Add 5 and 9, and then store the result in $x
   $x = 30 - 4; # Subtract 4 from 30 and then store the result in $x
   $x = 3 * 7; # Multiply 3 and 7 and then store the result in $x
   $x = 6 / 2; # Divide 6 by 2
   $x = 2 ** 8; # two to the power of 8
   $x = 3 % 2; # Remainder of 3 divided by 2
   $x++; # Increase $x by 1
   $x--; # Decrease $x by 1

   my $y = $x; # Assign $x to $y
   $x += $y; # Add $y to $x
   $x -= $y; # Subtract $y from $x              
   $x .= $y; # Append $y onto $x


Operations - 2
   my $x = 3;
   my $c = "he ";
   my $s = $c x $x; # $c repeated $x times
   my $b = "bye"; 
   print $s . "\n"; # print $s and start a new line
   # similar to
   print "$s\n";
   my $a = $s . $b; # Concatenate $s and $b
   print $a;
   # Interpolation
   $x = 10; # reuse $x: a second "my $x" would mask the first declaration
   $s = "you get $x";
   print $s;
   
   

Type Casting
    (or data conversion, or coercion) is usually silent in Perl
  


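   my $x = "3";
   print $x + 4 . "\n";       # prints 7
   
   Be careful!!
   
   my $x = "3";
   my $y = 1;
   my $z = "uno";
   print $x + $y . "\n";      # 4
   print $x + $z . "\n";      # 3, with a warning: "uno" is not numeric
   print $x + 4 . 1 . "\n";   # 71: + and . have equal precedence, left to right
   print $x + 4.1 . "\n";     # 7.1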
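
Silent conversion can hide mistakes such as adding "uno" to a number. A minimal sketch of one way to guard against this, assuming the core module Scalar::Util and its looks_like_number function. The helper name safe_add is hypothetical and not part of the course material:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: add two values only if both look numeric.
sub safe_add {
    my ($left, $right) = @_;
    return unless looks_like_number($left) && looks_like_number($right);
    return $left + $right;
}

my $sum_ok  = safe_add("3", 1);      # "3" is converted to the number 3
my $sum_bad = safe_add("3", "uno");  # "uno" is not numeric, so undef

print defined($sum_ok)  ? "$sum_ok\n"  : "not numeric\n";
print defined($sum_bad) ? "$sum_bad\n" : "not numeric\n";
```

With these inputs the first line printed is 4 and the second is "not numeric".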
   
Arrays




boxed scalars
    
    




 [Figure: Scalar vs. Array]

 Indices are sequential integers starting from 0
  	
  

array - 1

             
        ("Perl","array","tutorial");
        (5,7,9,10);
        (5,7,9,"Perl","list");
        (1..20);
        ();




array - 2
            
        my @str_array=("Perl","array","tutorial");
        my @num_array=(5,7,9,10);
        my @mixed_array=(5,7,9,"Perl","list");
        my @rg_array=(1..20);
        my @empty_array=();
        
        
        print $str_array[1]; # prints "array": [1] is the 2nd element, the 1st is [0]
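
A few more ways to look inside an array, as a small sketch reusing @str_array from this slide. The variable names $first, $last, $count and $max_i are chosen here for illustration:

```perl
use strict;
use warnings;

my @str_array = ("Perl", "array", "tutorial");

my $first = $str_array[0];        # first element
my $last  = $str_array[-1];       # negative indices count from the end
my $count = scalar(@str_array);   # number of elements
my $max_i = $#str_array;          # highest valid index, i.e. $count - 1

print "first=$first last=$last count=$count max index=$max_i\n";
```

This prints first=Perl last=tutorial count=3 max index=2.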




operations
        my @int = (1,3,5,2);
        push(@int,10);            # add 10 to the end of @int
        print "@int\n";
             
        my $last = pop(@int);     # remove 10 from the end of @int
        print "@int\n";
             
        unshift(@int,0);          # add 0 to the start of @int
        print "@int\n";
        my $start = shift(@int);  # remove 0 from the start of @int
        print "@int\n";
        
        

on array
        my @int = (1,3,5,2);
        
        foreach my $element (@int) {
            print "element is $element\n";
        }
        
        # sort() compares as strings by default; use <=> to sort numbers
        my @sorted = sort { $a <=> $b } @int;
        foreach my $element (@sorted) {
            print "element is $element\n";
        }
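
Why the comparison matters: sort compares elements as strings unless told otherwise, which only shows once the numbers have different lengths. A short sketch with made-up values:

```perl
use strict;
use warnings;

my @int = (10, 9, 100, 2);

my @as_strings = sort @int;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @int;   # numeric, ascending
my @descending = sort { $b <=> $a } @int;   # numeric, descending

print "@as_strings\n";   # 10 100 2 9
print "@as_numbers\n";   # 2 9 10 100
print "@descending\n";   # 100 10 9 2
```

Inside the sort block, $a and $b are special variables set by sort itself, so it is safer not to declare your own $a or $b with my.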
        


Hashes

                                            
                                            



Hashes
    •  Hashes are like arrays: they store collections of scalars
       ... but unlike arrays, indexing is by name (just like in
       real life!!!)
       
    •  Two components to each hash entry:
          –  Key (example: name)
          –  Value (example: phone number)

    •  Hashes are denoted with %
          –  Example: %phoneDirectory

    •  Elements are accessed using {} (like [] in arrays)


Hashes continued ...
    •  Adding a new key-value pair
               $phoneDirectory{"Shirly"} = 7267975;
         
         –  Note the $: a single hash element is a scalar!
    •  Each key can have only one value
               $phoneDirectory{"Shirly"} = 7265797;
              # overwrites previous assignment

    •  Multiple keys can have the same value

    •  Accessing the value of a key
           my $phoneNumber = $phoneDirectory{"Shirly"};
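
Putting these operations together, a minimal runnable sketch. The exists and delete functions are standard Perl built-ins that the slides do not mention. The "Mamma" entry is taken from the Data Types slide, and $has_mamma and $entries are names chosen here for illustration:

```perl
use strict;
use warnings;

my %phoneDirectory = (
    "Shirly" => 7267975,
    "Mamma"  => 3381245671,
);

$phoneDirectory{"Shirly"} = 7265797;          # overwrites the old value

my $has_mamma = exists($phoneDirectory{"Mamma"}) ? 1 : 0;   # is the key there?
delete $phoneDirectory{"Mamma"};              # remove a key-value pair
my $entries = scalar(keys %phoneDirectory);   # number of keys left

print "Shirly: $phoneDirectory{Shirly}\n";
print "entries: $entries\n";
```

This prints Shirly: 7265797 followed by entries: 1.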

Hashes and Foreach
  •  Foreach works on hashes as well!

        foreach my $person (keys %phoneDirectory) {
            print "$person: $phoneDirectory{$person}\n";
        }
        
  •  Never depend on the order in which you put key/value pairs
     into the hash! Perl stores them in its own internal order
     to make hashes amazingly fast!!


Hashes and Sorting
   •  The sort function works with hashes as well 
   •  Sorting on the keys
        foreach my $person (sort keys %phoneDirectory) {
               print "$person : $phoneDirectory{$person}\n";
        }
        –  This will print the phoneDirectory hash table in
           alphabetical order based on the name of the person,
           i.e. the key.




Hash and Sorting cont...
     •  Sorting by value

          foreach my $person (sort { $phoneDirectory{$a} <=>
             $phoneDirectory{$b} } keys %phoneDirectory) {
              print "$person : $phoneDirectory{$person}\n";
          }
          
          –  Prints each person and their phone number in
             increasing order of phone number, i.e. sorted by
             value.
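
Both orderings side by side, as a self-contained sketch. The names and numbers are made-up sample data apart from "Shirly", and @by_name and @by_number are names chosen here for illustration:

```perl
use strict;
use warnings;

# Made-up sample data, for illustration only.
my %phoneDirectory = (
    "Shirly" => 7265797,
    "Anna"   => 9000000,
    "Marco"  => 1234567,
);

# Keys in alphabetical order.
my @by_name = sort keys %phoneDirectory;

# Keys ordered by their value, smallest number first.
my @by_number = sort { $phoneDirectory{$a} <=> $phoneDirectory{$b} }
                keys %phoneDirectory;

print "By name:   @by_name\n";     # Anna Marco Shirly
print "By number: @by_number\n";   # Marco Shirly Anna
```

Swapping $a and $b inside the block reverses the order, which is useful when the largest value should come first.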



Exercise
 •  Choose your own text or download one with wget

 •  Identify the 10 most frequent words

 •  Sort the words alphabetically
    
 •  Sort the words by the number of
    occurrences 

Counting Words
      my %seen;
      my $l = "Lorem ipsum";
      my @w = split(" ", $l);   # split is a new function: it breaks a string into words
      foreach my $word (@w) {
          $seen{$word}++;
      }
      print "Sorted by occurrences\n";
      foreach my $word (sort { $seen{$a} <=> $seen{$b} } keys %seen) {
          print "Word $word N: $seen{$word}\n";
      }
      
      print "Sorted alphabetically\n";
      foreach my $word (sort keys %seen) {
          print "Word $word N: $seen{$word}\n";
      }
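
For the "most frequent words" part of the exercise, one possible approach is to sort in descending order and keep only the first entries. This is a sketch, not the reference solution: the sample sentence stands in for a real file, and $top_n is set to 2 here only to keep the output short.

```perl
use strict;
use warnings;

my $text = "the cat and the dog and the bird";   # stand-in for a real file
my %seen;
$seen{ lc $_ }++ for split ' ', $text;           # count case-insensitively

# Most frequent first; ties are broken alphabetically.
my @ranked = sort { $seen{$b} <=> $seen{$a} or $a cmp $b } keys %seen;

my $top_n = 2;                                   # the exercise asks for 10
$top_n = @ranked if $top_n > @ranked;            # fewer words than requested
my @top = @ranked[0 .. $top_n - 1];              # array slice: first $top_n words

print "$_: $seen{$_}\n" for @top;
```

With this sentence the output is "the: 3" followed by "and: 2".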




Homework
 Download the “Divina commedia”
 (wget
 http://www.gutenberg.org/cache/epub/1000/pg1000.txt )
 
 For each word length, count the number of occurrences (e.g.
 123456 words of length 2, etc.)
 
 Length of a string: length($a)
 
 


Exam rules
   Difficulty: February < June < September

   To take the exam you MUST have sent me
   all the homework assignments
   and one practical exercise
  




