Linux intro 5 extra: makefiles
Lecture for the "Programming for Evolutionary Biology" workshop in Leipzig 2012 (http://evop.bioinf.uni-leipzig.de/)

Presentation Transcript

  • Programming for Evolutionary Biology, March 17th - April 1st 2012, Leipzig, Germany. Introduction to Unix systems. Extra: writing simple pipelines with make. Giovanni Marco Dall'Olio, Universitat Pompeu Fabra, Barcelona (Spain)
  • GNU/make make is a tool to store command-line instructions and re-execute them quickly, along with all their parameters. It is a declarative programming language. It belongs to a class of software called automated build tools.
  • Simplest Makefile example The simplest Makefile contains just the name of a task and the commands associated with it. print_hello is a Makefile rule: it stores the commands needed to print Hello, world! to the screen.
  • Simplest Makefile example [annotated figure: a Makefile rule consists of the target of the rule and the commands associated with the rule; each command line starts with a tabulation (not 8 spaces)]
  • Simplest Makefile example Create a file on your computer and save it as Makefile. Write these instructions in it:
        print_hello:
            echo Hello, world!!
    The whitespace before echo is a tabulation (<Tab> key). Then, open a terminal and type:
        make -f Makefile print_hello
  • Simplest Makefile example
  • Simplest Makefile example – explanation When invoked, the program make looks for a file called Makefile in the current directory. When we type make print_hello, it executes the rule (target) called print_hello in the Makefile. It then shows the commands executed and their output.
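As a minimal sketch, the Makefile described in the slides above could look like this. The only thing to watch is that the command line starts with a real tab character.

```make
# Makefile: one rule named print_hello with a single command.
# The command line below must start with a tab, not spaces.
print_hello:
	echo Hello, world!
```

Running `make print_hello` in the same directory should first show the command itself and then its output.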
  • Tip 1: the Makefile file The -f option allows you to specify the file which contains the instructions for make. If you omit this option, make will look for a file called Makefile in the current directory (GNU make also accepts GNUmakefile and makefile). So make -f Makefile all is equivalent to make all.
  • A slightly longer example You can add as many commands as you like to a rule. For example, this print_hello rule contains 5 commands. Note: ignore the @ for now; it only disables verbose mode (explained later).
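The five commands shown on the slide are not part of the transcript, so the following is only an illustrative guess at what such a rule might look like. The specific commands are assumptions.

```make
# Hypothetical five-command version of print_hello (the slide's own
# commands are not in the transcript). A leading @ stops make from
# echoing that command before running it.
print_hello:
	@echo Hello, world!
	@echo Today is:
	@date
	@echo The current directory is:
	@pwd
```

Removing the @ from any line makes make print that command before its output, which is a quick way to see what verbose mode does.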
  • A more complex example
  • Make - advantages Make allows you to save shell commands along with their parameters and re-execute them. It allows you to use command-line tools, which are more flexible. Combined with revision control software, it makes it possible to reproduce all the operations made to your data.
  • Second part: A closer look at make syntax (target and commands)
  • The target syntax Makefile syntax:
        <target>: (prerequisites)
            <commands associated to the rule>
  • The target syntax The target of a rule can be either a title for the task, or a file name. Every time you call a make rule (example: make all), the program looks for a file named like the target (e.g. all, clean, inputdata.txt, results.txt). The rule is executed only if that file doesn't exist.
  • File names as target names In this makefile, we have two rules: testfile.txt and clean. When we call make testfile.txt, make checks if a file called testfile.txt already exists.
  • File names as target names The commands associated with the rule testfile.txt are executed only if that file doesn't already exist.
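The makefile on these slides is only visible as an image, so here is a small sketch of what a file-name target next to a task-name target could look like. The file contents and the body of clean are assumptions.

```make
# testfile.txt is a file target: its commands run only when the
# file does not exist yet (the rule has no prerequisites).
testfile.txt:
	echo "test contents" > testfile.txt

# clean is a task name; no file called clean is ever created,
# so this rule runs every time it is requested.
clean:
	rm -f testfile.txt
```

Calling `make testfile.txt` twice in a row should create the file the first time and report that it is up to date the second time; `make clean` removes it again.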
  • Multiple target definition A target can also be a list of files. You can retrieve the matched target with the special variable $@.
  • Special characters The % character can be used as a wildcard. For example, a rule with the target %.txt: ... would be activated by any file name ending with .txt: make 1.txt, make 2.txt, etc. We will be able to retrieve the matched expression with $*.
  • Special character %: creating more than one file at a time
  • Makefile – cluster support Note that in the previous example we created three files at the same time, by executing the command touch three times. If we use the -j option when invoking make, the three processes will be launched in parallel.
  • The commands syntax Makefile syntax:
        <target>: (prerequisites)
            <commands associated to the rule>
  • Disabling verbose mode You can disable verbose mode for a line by adding @ at its beginning. [figure: output with and without @, with the differences highlighted]
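A short sketch of a rule with several targets, assuming three illustrative file names. Each file is built by the same commands, and $@ tells the commands which one is currently being made.

```make
# One rule with a list of targets; $@ expands to whichever target
# is being built. File names are illustrative.
file1.txt file2.txt file3.txt:
	echo "this is $@" > $@
```

For example, `make file2.txt` runs the commands once with $@ set to file2.txt and leaves the other two files untouched.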
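The example these slides refer to is not in the transcript; a sketch along the same lines, assuming a rule called all that asks for three .txt files, might be:

```make
# all lists three files; each one matches the pattern rule below.
all: 1.txt 2.txt 3.txt

# % matches any stem; $* holds the matched stem and $@ the full target.
%.txt:
	@echo "creating $@ (stem: $*)"
	touch $@
```

With `make all` the three files are created one after another; with `make -j 3 all` make is allowed to run up to three of these jobs at the same time.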
  • Skipping errors The - modifier tells make to ignore errors returned by a command. Example: mkdir /var will cause an error (the /var directory already exists) and cause GNU make to exit; -mkdir /var will cause the same error, but GNU make will ignore it.
  • Moving through directories A big issue with make is that every line is executed as a different shell process. So, this:
        lsvar:
            cd /var
            ls
    won't work (it will list only the files in the current directory, not /var). The solution is to put everything in a single process:
        lsvar:
            (cd /var; ls)
  • Third part: Prerequisites and conditional execution
  • The commands syntax Makefile syntax:
        <target>: (prerequisites)
            <commands associated to the rule>
    We will look at the prerequisites part of a make rule, which I had skipped before.
  • Real Makefile-rule syntax Complete syntax for a Makefile rule:
        <target>: <list of prerequisites>
            <commands associated to the rule>
    Example:
        result1.txt: data1.txt data2.txt
            cat data1.txt data2.txt > result1.txt
            @echo result1.txt has been calculated
    Prerequisites are files (or rules) that need to exist already in order to create the target file. If data1.txt and data2.txt don't exist, the rule result1.txt will exit with an error (no rule to create them).
  • Piping Makefile rules together You can pipe two Makefile rules together by defining prerequisites.
  • Piping Makefile rules together The rule result1.txt depends on the rule data1.txt, which is executed first.
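To make the difference concrete, here is a hypothetical rule (the name setup and the final echo are assumptions) in which the failing mkdir is prefixed with -.

```make
# Hypothetical rule: the first mkdir fails when /var already exists,
# but the leading - tells make to report the error and carry on.
setup:
	-mkdir /var
	@echo this line still runs
```

Without the leading -, make would stop at the failed mkdir and never reach the echo line.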
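A runnable sketch of the working version follows. It uses && instead of the slide's ; which is a small design choice on my part: with && the ls is skipped if the cd fails, so the wrong directory is never listed by accident.

```make
# Each recipe line runs in its own shell, so the cd must share a line
# with the command that depends on it.
lsvar:
	cd /var && ls
```

The parenthesised form from the slide, (cd /var; ls), behaves the same way when /var exists.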
  • Piping Makefile rules together Lets look at this  example  again: what happens if  we remove the  file result1.txt  we just  created?
  • Piping Makefile rules together Lets look at this  example  again: what happens if  we remove the  file result1.txt  we just  created? The second time  we run the  make  result1.txt  command, it is  not necessary  to create  data1.txt 
  • Other pipe example all: result1.txt result2.txt result1.txt: data1.txt calculate_result.py python calculate_result.txt --input data1.txt result2.txt: data2.txt cut -f 1, 3 data2.txt > result2.txt Make all will calculate result1.txt and result2.txt, if  they dont exist already (and they are older than  their prerequisites)
  • Conditional execution by modification date We have seen how make can be used to create a  file, if it doesnt exists. file.txt: # if file.txt doesnt exists, then create it: echo contents of file.txt > file.txt We can do better: create or update a file only if it is  newer than its prerequisites
  • Conditional execution by modification date Lets have a better look at this example: result1.txt: data1.txt calculate_result.py python calculate_result.txt --input data1.txt A great feature of make is that it execute a rule not  only if the target file doesnt exist, but also if it  has a last modification date earlier than all of its  prerequisites
  • Conditional execution by modification date result1.txt: data1.txt @sed s/b/B/i data1.txt > result1.txt @echo result1.txt has been calculated In this example, result1.txt will be recalculated  every time data1.txt is modified $: touch data1.txt calculate_result.py $: make result1.txt result1.txt has been calculated $: make result1.txt result1.txt is already up-to-date $: touch data1.txt $: make result1.txt result1.txt has been calculated
  • Conditional execution - applications This conditional execution by modification date  comparison feature of make is very useful Lets say you discover an error in one of your input  data: you will be able to repeat the analysis by  executing only the operations needed You can also use it to re­calculate results every time  you modify a script: result.txt: scripts/calculate_result.py python calculate_result.py > result.py
  • Another example
  • Fourth part: Variables and functions
  • Variables and functions You may have already noticed that make's syntax is really old :) In fact, it is a ~40-year-old language. It uses special variables like $@, $^, and it can be worse than perl!!! (perl developers – please don't get mad at me :-) )
  • Variables Variables are declared with a = and by convention are upper case. They are called by including their name in $(). [figure: a makefile in which WORKING_DIR is a variable]
  • Special variables - $@ Make uses some custom variables, with a syntax similar to perl. $@ always corresponds to the target name:
        $: cat > Makefile
        %.txt:
            echo $@
        $: make filename.txt
        echo filename.txt
        filename.txt
    Here $@ took the value of filename.txt.
  • Other special variables
        $@  The rule's target
        $<  The rule's first prerequisite
        $?  All the rule's out-of-date prerequisites
        $^  All prerequisites
  • Functions Usually you don't want to declare functions in make, but there are some built-in utilities that can be useful. Most frequently used functions: $(addprefix <prefix>, list) → adds a prefix to each item of a space-separated list. Example:
        FILES = file1 file2 file3
        $(addprefix /home/user/data/, $(FILES))
    $(addsuffix) works similarly.
  • Full makefile example
        INPUTFILES = lower_DAF lower_maf upper_maf lower_daf upper_daf
        RESULTSDIR = ./results
        RESULTFILES = $(addprefix $(RESULTSDIR)/, $(addsuffix _filtered.txt,$(INPUTFILES)))
        help:
            @echo type "make filter" to calculate results
        all: $(RESULTFILES)
        $(RESULTSDIR)/%_filtered.txt: data/%.txt src/filter_genes.py
            python src/filter_genes.py --genes data/Genes.txt --window $< --output $@
    It looks very complicated, but in the end you always use the same Makefile structure.
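One way to get a feeling for these variables is a rule that simply prints them. The sketch below assumes two input files, part1.txt and part2.txt, that already exist in the current directory; all names are illustrative.

```make
# Illustrative rule with two prerequisites (both files must exist).
merged.txt: part1.txt part2.txt
	@echo "target:               $@"
	@echo "first prerequisite:   $<"
	@echo "out-of-date prereqs:  $?"
	@echo "all prerequisites:    $^"
	cat $^ > $@
```

After a first run, touching only part2.txt and running `make merged.txt` again should show just part2.txt in the $? line, while $^ still lists both files.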
  • Fifth part: Testing, discussion, other examples and alternatives
  • Testing a makefile make -n only shows the commands to be executed. You can pass variables to make:
        $: make say_hello MYNAME="Giovanni"
        hello, Giovanni
    Strongly suggested: use revision control software with support for branching (git, hg, bazaar) and create a branch for testing.
  • Another complex Makefile example
        # make masked sequence
        myseq.m: myseq
            rmask myseq > myseq.m
        # run blast on masked seq
        blastout: mydb.psq myseq.m
            blastx mydb myseq.m > blastout
            echo "ran blast!"
        # index blastable db
        mydb.psq: mydb
            formatdb -p T mydb
        # rules follow this pattern:
        target: subtarget1, ..., subtargetN
            shell command 1
            shell command 2...
    Our starting point is the file myseq, the end point is the blast results blastout. We first want to mask out any repeats using rmask to create myseq.m. We then blastx myseq.m against a protein db called mydb. Before blastx is run the protein db must be indexed using formatdb. (slide taken from biomake web site)
  • The "make" command
        % make blastout
        formatdb -p T mydb
        rmask myseq.fst > myseq.m
        blastx mydb myseq.m > blastout
        % make blastout
        make: blastout is up to date
        % cat newseqs >> mydb
        % make blastout
        formatdb -p T mydb
        blastx mydb myseq.m > blastout
    The Makefile used:
        # run blast on masked seq
        blastout: mydb.psq myseq.m
            blastx mydb myseq.m > blastout
            echo "ran blast!"
        # index blastable db
        mydb.psq: mydb
            formatdb -p T mydb
        # make masked sequence
        myseq.m: myseq
            rmask myseq > myseq.m
    make uses unix file modification timestamps when checking dependencies: if a subtarget is more recent than the goal target, then the action is re-executed. (slide taken from biomake web site)
  • BioMake and alternatives BioMake is an alternative to make, designed for use in bioinformatics. Developed to annotate the Drosophila melanogaster genome (Berkeley university). Cleaner syntax, derived from prolog. Separates the rule's name from the name of the target files.
  • A BioMake example
        formatdb(DB)
            req: DB
            run: formatdb DB
            comment: prepares blastdb for blasting (wublast)
        rmask(Seq)
            flat: masked_seqs/Seq.masked
            req: Seq
            srun: RepeatMasker -lib $(LIB) Seq
            comment: masks out repeats from input sequence
        mblastx(Seq,DB)
            flat: blast_results/Seq.DB.blastx
            req: formatdb(DB) rmask(Seq)
            srun: blastx -filter SEG+XNU DB rmask(Seq)
            comment: this target is for the results of running blastx on a masked input genomic sequence (wublast)
    (slide taken from biomake web site)
  • Other alternatives There are many other alternatives to make: BioMake (prolog?), o/q/dist/etc.. make, Ant (Java), Scons (python), Paver (python), Waf (python). This list is biased because I am a python programmer :) These tools are more oriented to software development.
  • Conclusions Make is very basic for bioinformatics. It is useful for the simpler tasks: logging the operations made to your data files; working with clusters; avoiding re-calculations; applying a pipeline to different datasets. It is installed in almost any unix system and has a standard syntax (interchangeable, reproducible). Study it and understand its logic. Use it in the most basic way, without worrying about prerequisites and special variables. Later you can look for easier tools (biomake, rake, taverna, 
  • Suggested readings
    Software Carpentry for bioinformatics: http://swc.scipy.org/lec/build.html
    A Makefile is a pipeline: http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefil
    BioMake and SKAM: http://skam.sourceforge.net/
    BioWiki Make Manifesto: http://biowiki.org/MakefileManifesto
    Discussion on the BIP mailing list: http://www.mail-archive.com/biology-in-python@lists.idyll.org
    GNU Make manual by R. Stallman and R. McGrath: http://theory.uwinnipeg.ca/gnu/make/make_toc.html
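The say_hello rule itself is not shown in the transcript; a sketch that would produce the output above, assuming a default value for MYNAME, is:

```make
# MYNAME gets a default value; a value given on the command line
# (make say_hello MYNAME="Giovanni") overrides it.
MYNAME = world

say_hello:
	@echo hello, $(MYNAME)
```

`make -n say_hello MYNAME="Giovanni"` only prints the echo command that would be run, which makes it a safe way to check a rule before executing it.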