Linux intro 5 extra: makefiles


Lecture for the "Programming for Evolutionary Biology" workshop in Leipzig 2012

Published in: Technology

  1. Programming for Evolutionary Biology, March 17th - April 1st 2012, Leipzig, Germany
      Introduction to Unix systems
      Extra: writing simple pipelines with make
      Giovanni Marco Dall'Olio, Universitat Pompeu Fabra, Barcelona (Spain)
  2. GNU make
      make is a tool to store command-line instructions and re-execute them quickly, along with all their parameters.
      It is a declarative programming language.
      It belongs to a class of software called automated build tools.
  3. Simplest Makefile example
      The simplest Makefile contains just the name of a task and the commands associated with it.
      print_hello is a Makefile rule: it stores the commands needed to print "Hello, world!" to the screen.
  4. Simplest Makefile example (annotated figure)
      A Makefile rule has two parts: the target of the rule and the commands associated with the rule.
      Each command line starts with a tabulation (not 8 spaces).
  5. Simplest Makefile example
      Create a file on your computer and save it as Makefile. Write these instructions in it:
          print_hello:
          	echo Hello, world!!
      The indentation before echo is a tabulation (<Tab> key).
      Then, open a terminal and type:
          make -f Makefile print_hello
  6. Simplest Makefile example
  7. Simplest Makefile example – explanation
      When invoked, the program make looks for a file called Makefile in the current directory.
      When we type make print_hello, it executes the rule (target) called print_hello in the Makefile.
      It then shows the commands executed and their output.
  8. Tip 1: the Makefile file
      The -f option lets you specify the file that contains the instructions for make.
      If you omit this option, make looks for a file called Makefile in the current directory (GNU make tries GNUmakefile, makefile and Makefile, in that order).
      make -f Makefile all is equivalent to: make all
  9. A slightly longer example
      You can add as many commands as you like to a rule.
      For example, the print_hello rule shown on this slide contains 5 commands.
      Note: ignore the @ prefix; it only disables verbose mode (explained later).
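      A minimal sketch of what such a rule could look like. The slide's figure is not reproduced in this transcript, so the five commands below are illustrative assumptions rather than the author's originals. Recipe lines must begin with a tab character.

      ```make
      # Hypothetical reconstruction: a single rule holding five commands.
      # The leading @ keeps make from echoing each command before running it.
      print_hello:
      	@echo "Hello, world!"
      	@echo "Today is:"
      	@date
      	@echo "You are in:"
      	@pwd
      ```

      Running make print_hello executes the five commands in order, each in its own shell.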
  10. A more complex example
  11. Make - advantages
      Make allows you to save shell commands along with their parameters and re-execute them.
      It allows you to use command-line tools, which are more flexible.
      Combined with revision control software, it makes it possible to reproduce all the operations performed on your data.
  12. Second part: a closer look at make syntax (target and commands)
  13. The target syntax
      Makefile syntax:
          <target>: (prerequisites)
          	<commands associated to the rule>
  14. The target syntax
      The target of a rule can be either a title for the task or a file name.
      Every time you call a make rule (example: make all), the program looks for a file with the same name as the target (e.g. all, clean, inputdata.txt, results.txt).
      The rule is executed only if that file doesn't exist.
  15. File names as target names
      In this makefile, we have two rules: testfile.txt and clean.
  16. File names as target names
      In this makefile, we have two rules: testfile.txt and clean.
      When we call make testfile.txt, make checks whether a file called testfile.txt already exists.
  17. File names as target names
      The commands associated with the rule testfile.txt are executed only if that file doesn't exist already.
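      The makefile from these slides is not included in the transcript. A sketch with the two rules named above could look like this; the commands inside each rule are assumptions made for illustration.

      ```make
      # testfile.txt is a real file name: its commands run only when the file is missing.
      testfile.txt:
      	echo "some content" > testfile.txt

      # clean is just a task title: no file called "clean" is ever created,
      # so its commands run every time it is requested.
      clean:
      	rm -f testfile.txt
      ```

      With this file, the first make testfile.txt creates the file, a second call reports that the target is up to date, and make clean removes the file so that the rule can run again.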
  18. Multiple target definition
      A target can also be a list of files.
      You can retrieve the matched target with the special variable $@.
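      As an illustration (the file names are hypothetical, not taken from the slide), one rule can serve a whole list of targets, with $@ standing for whichever target is being built:

      ```make
      # One rule, three possible targets; $@ expands to the one requested.
      file1.txt file2.txt file3.txt:
      	echo "this is $@" > $@
      ```

      Calling make file2.txt runs the commands with $@ replaced by file2.txt and leaves the other two files untouched.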
  19. Special characters
      The % character can be used as a wildcard.
      For example, a rule with the target
          %.txt: ....
      would be activated by any target ending with .txt: make 1.txt, make 2.txt, etc.
      We can retrieve the matched expression with $*.
  20. Special character % / creating more than one file at a time
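      A possible version of such a makefile, sketched under the assumption that the three files are created with touch (the file names 1.txt, 2.txt and 3.txt follow the previous slide; the all rule is an addition for illustration):

      ```make
      # Asking for "all" requests the three files listed as prerequisites.
      all: 1.txt 2.txt 3.txt

      # Pattern rule: matches any target ending in .txt.
      # $@ is the full target name, $* is the part matched by %.
      %.txt:
      	@echo "creating $@ (matched expression: $*)"
      	touch $@
      ```

      make all runs the pattern rule once per missing file; make -j 3 all lets make start the three jobs in parallel.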
  21. Makefile – cluster support
      Note that in the previous example we created three files at the same time, by executing the command touch three times.
      If we use the -j option when invoking make, the three processes will be launched in parallel.
  22. The commands syntax
      Makefile syntax:
          <target>: (prerequisites)
          	<commands associated to the rule>
  23. Deactivating verbose mode
      You can deactivate verbose mode for a line by adding @ at its beginning (the slide's figure marks the differences in the output).
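      The figure is missing from the transcript; the contrast can be sketched with two hypothetical rules that differ only in the @ prefix:

      ```make
      # Without @: make first prints the command line itself, then its output.
      verbose:
      	echo "printed after the command is echoed"

      # With @: only the output of the command appears.
      quiet:
      	@echo "printed without echoing the command"
      ```

      make verbose shows two lines (the echo command and its output), while make quiet shows only one.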
  24. Skipping errors
      The - modifier tells make to ignore errors returned by a command.
      Example:
      mkdir /var will cause an error (the /var directory already exists) and cause GNU make to exit.
      -mkdir /var will cause an error anyway, but GNU make will ignore it.
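      The two cases can be put side by side in a small makefile. The rule names strict and lenient are made up for this sketch, and it assumes /var already exists, as on a typical Linux system.

      ```make
      # mkdir fails because /var exists; make stops and the echo never runs.
      strict:
      	mkdir /var
      	@echo "not reached when mkdir fails"

      # The leading - tells make to report the error and carry on.
      lenient:
      	-mkdir /var
      	@echo "still runs: the mkdir error was ignored"
      ```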
  25. Moving through directories
      A big issue with make is that every line is executed in a different shell process. So, this:
          lsvar:
          	cd /var
          	ls
      won't work (it will list only the files in the current directory, not those in /var).
      The solution is to put everything in a single process:
          lsvar:
          	(cd /var; ls)
  26. Third part: prerequisites and conditional execution
  27. The commands syntax
      Makefile syntax:
          <target>: (prerequisites)
          	<commands associated to the rule>
      We will now look at the prerequisites part of a make rule, which I skipped before.
  28. Real Makefile-rule syntax
      Complete syntax for a Makefile rule:
          <target>: <list of prerequisites>
          	<commands associated to the rule>
      Example:
          result1.txt: data1.txt data2.txt
          	cat data1.txt data2.txt > result1.txt
          	@echo result1.txt has been calculated
      Prerequisites are files (or rules) that need to exist already in order to create the target file.
      If data1.txt and data2.txt don't exist, the rule result1.txt will exit with an error (no rule to create them).
  29. Piping Makefile rules together
      You can pipe two Makefile rules together by defining prerequisites.
  30. Piping Makefile rules together
      The rule result1.txt depends on the rule data1.txt, which should be executed first.
  31. Piping Makefile rules together
      Let's look at this example again: what happens if we remove the file result1.txt we just created?
  32. Piping Makefile rules together
      Let's look at this example again: what happens if we remove the file result1.txt we just created?
      The second time we run the make result1.txt command, it is not necessary to create data1.txt, because that file already exists.
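      The makefile shown on these slides is not part of the transcript. A sketch consistent with the description, with the commands chosen only for illustration, might be:

      ```make
      # result1.txt needs data1.txt, so make builds data1.txt first if it is missing.
      result1.txt: data1.txt
      	cat data1.txt > result1.txt
      	@echo "result1.txt has been calculated"

      # No prerequisites: runs only when data1.txt does not exist.
      data1.txt:
      	echo "some data" > data1.txt
      ```

      The first make result1.txt runs both rules. After rm result1.txt, a second call runs only the result1.txt rule, since data1.txt is still there.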
  33. Other pipe example
          all: result1.txt result2.txt

          result1.txt: data1.txt
          	python calculate_result.txt --input data1.txt

          result2.txt: data2.txt
          	cut -f 1,3 data2.txt > result2.txt
      make all will calculate result1.txt and result2.txt if they don't exist already (or if they are older than their prerequisites).
  34. Conditional execution by modification date
      We have seen how make can be used to create a file if it doesn't exist.
          file.txt:
          	# if file.txt doesn't exist, then create it:
          	echo contents of file.txt > file.txt
      We can do better: create or update a file only if it is older than its prerequisites.
  35. Conditional execution by modification date
      Let's have a closer look at this example:
          result1.txt: data1.txt
          	python calculate_result.txt --input data1.txt
      A great feature of make is that it executes a rule not only if the target file doesn't exist, but also if its last modification date is earlier than that of any of its prerequisites.
  36. Conditional execution by modification date
          result1.txt: data1.txt
          	@sed s/b/B/i data1.txt > result1.txt
          	@echo result1.txt has been calculated
      In this example, result1.txt will be recalculated every time data1.txt is modified:
          $: touch data1.txt
          $: make result1.txt
          result1.txt has been calculated
          $: make result1.txt
          result1.txt is already up-to-date
          $: touch data1.txt
          $: make result1.txt
          result1.txt has been calculated
  37. Conditional execution - applications
      This conditional execution by modification date comparison is a very useful feature of make.
      Let's say you discover an error in one of your input data files: you will be able to repeat the analysis by executing only the operations needed.
      You can also use it to recalculate results every time you modify a script:
          result.txt: scripts/
          	python >
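      The rule on this slide is truncated in the transcript. The idea can still be sketched with made-up names: scripts/analysis.py and data.txt below are hypothetical and do not come from the original slide.

      ```make
      # Listing the script as a prerequisite makes result.txt out of date
      # whenever the script (or the data) is edited.
      result.txt: scripts/analysis.py data.txt
      	python scripts/analysis.py data.txt > result.txt
      ```

      After editing scripts/analysis.py, the next make result.txt reruns the analysis; if neither prerequisite changed, make does nothing.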
  38. Another example
  39. Fourth part: variables and functions
  40. Variables and functions
      You may have already noticed that Make's syntax is really old :)
      In fact, it is a ~40-year-old language.
      It uses special variables like $@, $^, and it can be worse than perl!!! (perl developers – please don't get mad at me :-) )
  41. Variables
      Variables are declared with = and by convention are upper case.
      They are referenced by enclosing their name in $().
      In the slide's figure, WORKING_DIR is a variable.
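      The figure is not in the transcript. A small sketch using the variable name from the slide follows; its value and the rule around it are assumptions.

      ```make
      # Declared with =, upper case by convention.
      WORKING_DIR = ./results

      # Referenced as $(WORKING_DIR), both in the target name and in the commands.
      $(WORKING_DIR)/output.txt:
      	mkdir -p $(WORKING_DIR)
      	echo "some results" > $(WORKING_DIR)/output.txt
      ```

      Changing the single WORKING_DIR line is enough to move every output of the makefile to another directory.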
  42. Special variables - $@
      Make uses some custom variables, with a syntax similar to perl.
      $@ always corresponds to the target name:
          $: cat >Makefile
          %.txt:
          	echo $@
          $: make filename.txt
          echo filename.txt
          filename.txt
      Here $@ took the value of filename.txt.
  43. Other special variables
          $@   The rule's target
          $<   The rule's first prerequisite
          $?   All the rule's out-of-date prerequisites
          $^   All prerequisites
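      To see the four variables side by side, a rule can simply print them. This is a sketch with hypothetical file names (merged.txt, a.txt, b.txt), not an example from the slides.

      ```make
      # Prints each special variable, then uses $^ and $@ to do the real work.
      merged.txt: a.txt b.txt
      	@echo "target:                    $@"
      	@echo "first prerequisite:        $<"
      	@echo "all prerequisites:         $^"
      	@echo "out-of-date prerequisites: $?"
      	cat $^ > $@

      # Creates the two input files when they are missing.
      a.txt b.txt:
      	echo "contents of $@" > $@
      ```

      Running make merged.txt, then touch b.txt, then make merged.txt again shows how $? shrinks to the prerequisites that changed since the target was last built.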
  44. Functions
      Usually you don't want to declare functions in make, but there are some built-in utilities that can be useful.
      Most frequently used functions:
      $(addprefix <prefix>, list) → adds a prefix to each element of a space-separated list
      example:
          FILES = file1 file2 file3
          $(addprefix /home/user/data/, $(FILES))
      $(addsuffix) works similarly.
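      A quick way to check what these two functions return is to echo their results. The rule name show and the variable names WITH_PREFIX and WITH_SUFFIX are made up for this sketch.

      ```make
      FILES = file1 file2 file3
      WITH_PREFIX = $(addprefix /home/user/data/,$(FILES))
      WITH_SUFFIX = $(addsuffix .txt,$(FILES))

      show:
      	@echo $(WITH_PREFIX)
      	@echo $(WITH_SUFFIX)
      # make show prints:
      #   /home/user/data/file1 /home/user/data/file2 /home/user/data/file3
      #   file1.txt file2.txt file3.txt
      ```

      The prefix is glued on literally, so the trailing slash in /home/user/data/ has to be written explicitly.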
  45. Full makefile example
          INPUTFILES = lower_DAF lower_maf upper_maf lower_daf upper_daf
          RESULTSDIR = ./results
          RESULTFILES = $(addprefix $(RESULTSDIR)/, $(addsuffix _filtered.txt,$(INPUTFILES)))

          help:
          	@echo type "make filter" to calculate results

          all: $(RESULTFILES)

          $(RESULTSDIR)/%_filtered.txt: data/%.txt src/
          	python src/ --genes data/Genes.txt --window $< --output $@
      It looks very complicated, but in the end you always use the same Makefile structure.
  46. Fifth part: testing, discussion, other examples and alternatives
  47. Testing a makefile
      make -n: only shows the commands that would be executed, without running them.
      You can pass variables to make:
          $: make say_hello MYNAME="Giovanni"
          hello, Giovanni
      Strongly suggested: use revision control software with support for branching (git, hg, bazaar) and create a branch for testing.
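      The say_hello rule itself is not shown on the slide. One way it could be written is sketched below; the default value stranger is an assumption.

      ```make
      # Default value, used when MYNAME is not given on the command line.
      MYNAME = stranger

      say_hello:
      	@echo "hello, $(MYNAME)"
      ```

      A variable passed on the command line overrides the assignment in the file, so make say_hello MYNAME="Giovanni" prints hello, Giovanni. With make -n say_hello the echo command is displayed but not executed.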
  48. Another complex Makefile example
          # make masked sequence
          myseq.m: myseq
          	rmask myseq > myseq.m

          # run blast on masked seq
          blastout: mydb.psq myseq.m
          	blastx mydb myseq.m > blastout
          	echo "ran blast!"

          # index blastable db
          mydb.psq: mydb
          	formatdb -p T mydb
      Rules follow this pattern:
          target: subtarget1, ..., subtargetN
          	shell command 1
          	shell command 2...
      Our starting point is the file myseq; the end point is the blast results file blastout.
      We first want to mask out any repeats using rmask to create myseq.m.
      We then blastx myseq.m against a protein db called mydb.
      Before blastx is run, the protein db must be indexed using formatdb.
      (slide taken from biomake web site)
  49. The "make" command
      Terminal session:
          % make blastout
          formatdb -p T mydb
          rmask myseq.fst > myseq.m
          blastx mydb myseq.m > blastout
          % make blastout
          make: blastout is up to date
          % cat newseqs >> mydb
          % make blastout
          formatdb -p T mydb
          blastx mydb myseq.m > blastout
      Makefile used:
          # run blast on masked seq
          blastout: mydb.psq myseq.m
          	blastx mydb myseq.m > blastout
          	echo "ran blast!"

          # index blastable db
          mydb.psq: mydb
          	formatdb -p T mydb

          # make masked sequence
          myseq.m: myseq
          	rmask myseq > myseq.m
      make uses unix file modification timestamps when checking dependencies.
      If a subtarget is more recent than the goal target, then the action is re-executed.
      (slide taken from biomake web site)
  50. BioMake and alternatives
      BioMake is an alternative to make, designed for use in bioinformatics.
      Developed to annotate the Drosophila melanogaster genome (Berkeley university).
      Cleaner syntax, derived from prolog.
      Separates the rule's name from the name of the target files.
  51. A BioMake example
          formatdb(DB)
            req: DB
            run: formatdb DB
            comment: prepares blastdb for blasting (wublast)

          rmask(Seq)
            flat: masked_seqs/Seq.masked
            req: Seq
            srun: RepeatMasker -lib $(LIB) Seq
            comment: masks out repeats from input sequence

          mblastx(Seq,DB)
            flat: blast_results/Seq.DB.blastx
            req: formatdb(DB) rmask(Seq)
            srun: blastx -filter SEG+XNU DB rmask(Seq)
            comment: this target is for the results of running blastx on a masked input genomic sequence (wublast)
      (slide taken from biomake web site)
  52. Other alternatives
      There are many other alternatives to make:
      BioMake (prolog?)
      o/q/dist/etc.. make
      Ant (Java)
      Scons (python)
      Paver (python)
      Waf (python)
      This list is biased because I am a python programmer :)
      These tools are more oriented to software development.
  53. Conclusions
      Make is very basic for bioinformatics.
      It is useful for the simpler tasks:
      Logging the operations made to your data files
      Working with clusters
      Avoiding re-calculations
      Applying a pipeline to different datasets
      It is installed on almost any unix system and has a standard syntax (interchangeable, reproducible).
      Study it and understand its logic. Use it in the most basic way, without worrying about prerequisites and special variables.
      Later you can look for easier tools (biomake, rake, taverna, 
  54. Suggested readings
      Software Carpentry for bioinformatics
      A Makefile is a pipeline
      BioMake and SKAM
      BioWiki Make Manifesto
      Discussion on the BIP mailing list http://www.mail­­in­
      GNU Make manual by R. Stallman and R. McGrath