Linux intro 5 extra: makefiles

Lecture for the "Programming for Evolutionary Biology" workshop in Leipzig 2012 (http://evop.bioinf.uni-leipzig.de/)

Transcript of "Linux intro 5 extra: makefiles"

  1. Programming for Evolutionary Biology, March 17th - April 1st 2012, Leipzig, Germany. Introduction to Unix systems. Extra: writing simple pipelines with make. Giovanni Marco Dall'Olio, Universitat Pompeu Fabra, Barcelona (Spain).
  2. GNU/make. make is a tool to store command-line instructions and re-execute them quickly, along with all their parameters. It is a declarative programming language. It belongs to a class of software called automated build tools.
  3. Simplest Makefile example. The simplest Makefile contains just the name of a task and the commands associated with it. print_hello is a makefile rule: it stores the commands needed to print "Hello, world!" to the screen.
  4. Simplest Makefile example (annotated). A Makefile rule is made of the target of the rule, followed by the commands associated with the rule. Each command line starts with a tabulation (a <Tab> character, not 8 spaces).
  5. Simplest Makefile example. Create a file on your computer and save it as Makefile. Write these instructions in it:
         print_hello:
         <Tab>echo Hello, world!!
     where <Tab> stands for a tabulation (the <Tab> key). Then open a terminal and type:
         make -f Makefile print_hello
  6. Simplest Makefile example
  7. Simplest Makefile example: explanation. When invoked, the program make looks for a file called Makefile in the current directory. When we type make print_hello, it executes the procedure (target) called print_hello in the makefile. It then shows the commands executed and their output.
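The steps on the slides above can be tried with the following minimal sketch. It assumes GNU make and a POSIX shell are installed; the temporary directory and the use of printf to write the tab character are conveniences of this sketch, not part of the original slides.

```shell
# Work in a throwaway directory so no existing Makefile is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write the Makefile. printf turns \t into the real tab that make requires
# at the start of every command line.
{
  printf 'print_hello:\n'
  printf '\techo "Hello, world!"\n'
} > Makefile

# make first shows the command it runs, then the command's output.
make -f Makefile print_hello
```

Running `make print_hello` without `-f` should behave the same way here, because the file is named Makefile.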
  8. Tip 1: the Makefile file. The -f option allows you to define the file which contains the instructions for make. If you omit this option, make will look for a file called Makefile in the current directory. make -f Makefile all is equivalent to make all.
  9. A slightly longer example. You can add as many commands as you like to a rule. For example, this print_hello rule contains 5 commands. Note: ignore the @ for now; it only disables verbose mode (explained later).
  10. A more complex example
  11. Make: advantages. Make allows you to save shell commands along with their parameters and re-execute them. It allows you to use command-line tools, which are more flexible. Combined with revision control software, it makes it possible to reproduce all the operations applied to your data.
  12. Second part: a closer look at make syntax (target and commands)
  13. The target syntax. Makefile syntax:
         <target>: (prerequisites)
             <commands associated to the rule>
  14. The target syntax. The target of a rule can be either a title for the task or a file name. Every time you call a make rule (example: make all), the program looks for a file named like the target (e.g. all, clean, inputdata.txt, results.txt). The rule is executed only if that file doesn't exist.
  15. Filename as target names. In this makefile, we have two rules: testfile.txt and clean.
  16. Filename as target names. In this makefile, we have two rules: testfile.txt and clean. When we call make testfile.txt, make checks if a file called testfile.txt already exists.
  17. Filename as target names. The commands associated with the rule testfile.txt are executed only if that file doesn't exist already.
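The makefile shown on these slides is an image and is not part of the transcript, so the sketch below uses hypothetical rule bodies for testfile.txt and clean. It is only meant to illustrate the file-existence check, assuming GNU make and a POSIX shell.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical rule bodies: the slide's actual commands are not in the transcript.
{
  printf 'testfile.txt:\n'
  printf '\techo "created by make" > testfile.txt\n'
  printf 'clean:\n'
  printf '\trm -f testfile.txt\n'
} > Makefile

make testfile.txt   # the file is missing, so the command runs
make testfile.txt   # the file exists now, so make reports it is up to date
make clean          # no file called "clean" exists, so this rule always runs
make testfile.txt   # the file was removed, so it is created again
```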
  18. Multiple target definition. A target can also be a list of files. You can retrieve the matched target with the special variable $@.
  19. Special characters. The % character can be used as a wild card. For example, a rule with the target %.txt: would be activated by any file ending with .txt (make 1.txt, make 2.txt, etc.). We will be able to retrieve the matched expression with $*.
  20. Special character %: creating more than one file at a time
  21. Makefile: cluster support. Note that in the previous example we created three files at the same time, by executing the command touch three times. If we use the -j option when invoking make, the three processes will be launched in parallel.
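A sketch of the wild card rule and the -j option described above. The `all` target and the echo line are assumptions added for illustration (the slide's own makefile is an image); `$@` is the full target name and `$*` is the part matched by %.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# %% is printf's escape for a literal % in the generated Makefile.
{
  printf 'all: 1.txt 2.txt 3.txt\n'
  printf '%%.txt:\n'
  printf '\t@echo "target=$@ stem=$*"\n'
  printf '\ttouch $@\n'
} > Makefile

# Up to three rules may run at the same time, so the order of the
# output lines is not guaranteed.
make -j 3 all
ls
```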
  22. The commands syntax. Makefile syntax:
         <target>: (prerequisites)
             <commands associated to the rule>
  23. Inactivating verbose mode. You can deactivate verbose mode for a line by adding @ at its beginning. (The slide marks where the output differs.)
  24. Skipping errors. The - modifier tells make to ignore errors returned by a command. Example: mkdir /var will cause an error (the /var directory already exists) and cause GNU make to exit; -mkdir /var will cause an error anyway, but GNU make will ignore it.
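The two modifiers can be compared side by side in a small sketch. To avoid touching system directories, it uses a local directory named existing_dir (a name made up for this example) instead of /var.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir existing_dir

{
  printf 'demo:\n'
  printf '\techo "this command is shown before its output"\n'
  printf '\t@echo "this command is not shown"\n'
  printf '\t-mkdir existing_dir\n'
  printf '\t@echo "make kept going after the failed mkdir"\n'
} > Makefile

# mkdir fails because the directory exists, but the leading - makes
# make report the error as ignored and continue with the next line.
make demo
```

Removing the `-` in front of mkdir should make the run stop at that line with a non-zero exit status.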
  25. Moving through directories. A big issue with make is that every line is executed as a different shell process. So this:
         lsvar:
             cd /var
             ls
     won't work (it will list only the files in the current directory, not /var). The solution is to put everything in a single process:
         lsvar:
             (cd /var; ls)
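The difference can be checked with a sketch like the one below. It uses a local subdir with one file (names invented for the example) rather than /var, and `cd subdir && ls`, which has the same effect as the parenthesised form on the slide.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir subdir
touch subdir/inside.txt

{
  printf 'separate_lines:\n'
  printf '\tcd subdir\n'
  printf '\tls\n'
  printf 'single_line:\n'
  printf '\tcd subdir && ls\n'
} > Makefile

make -s separate_lines   # lists the current directory (Makefile and subdir)
make -s single_line      # lists the contents of subdir (inside.txt)
```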
  26. Third part: prerequisites and conditional execution
  27. The commands syntax. Makefile syntax:
         <target>: (prerequisites)
             <commands associated to the rule>
     We will now look at the prerequisites part of a make rule, which was skipped before.
  28. Real Makefile-rule syntax. Complete syntax for a Makefile rule:
         <target>: <list of prerequisites>
             <commands associated to the rule>
     Example:
         result1.txt: data1.txt data2.txt
             cat data1.txt data2.txt > result1.txt
             @echo result1.txt has been calculated
     Prerequisites are files (or rules) that need to exist already in order to create the target file. If data1.txt and data2.txt don't exist, the rule result1.txt will exit with an error (no rule to create them).
  29. Piping Makefile rules together. You can pipe two Makefile rules together by defining prerequisites.
  30. Piping Makefile rules together. The rule result1.txt depends on the rule data1.txt, which should be executed first.
  31. Piping Makefile rules together. Let's look at this example again: what happens if we remove the file result1.txt we just created?
  32. Piping Makefile rules together. Let's look at this example again: what happens if we remove the file result1.txt we just created? The second time we run the make result1.txt command, it is not necessary to create data1.txt.
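The behaviour described on these slides can be reproduced with a small sketch. The makefile on the slides is an image, so the commands inside the two rules below are assumptions chosen only to make the chaining visible.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical rule bodies; only the target/prerequisite names come from the slides.
{
  printf 'result1.txt: data1.txt\n'
  printf '\t@echo "making result1.txt"\n'
  printf '\t@cp data1.txt result1.txt\n'
  printf 'data1.txt:\n'
  printf '\t@echo "making data1.txt"\n'
  printf '\t@echo "some data" > data1.txt\n'
} > Makefile

make result1.txt   # runs the data1.txt rule first, then result1.txt
rm result1.txt
make result1.txt   # data1.txt already exists, so only result1.txt is rebuilt
```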
  33. Other pipe example.
         all: result1.txt result2.txt
         result1.txt: data1.txt calculate_result.py
             python calculate_result.py --input data1.txt
         result2.txt: data2.txt
             cut -f 1,3 data2.txt > result2.txt
     make all will calculate result1.txt and result2.txt if they don't exist already, or if they are older than their prerequisites.
  34. Conditional execution by modification date. We have seen how make can be used to create a file if it doesn't exist:
         file.txt:
             # if file.txt doesn't exist, then create it:
             echo contents of file.txt > file.txt
     We can do better: create or update a file only if it is missing or older than its prerequisites.
  35. Conditional execution by modification date. Let's have a better look at this example:
         result1.txt: data1.txt calculate_result.py
             python calculate_result.py --input data1.txt
     A great feature of make is that it executes a rule not only if the target file doesn't exist, but also if its last modification date is earlier than that of any of its prerequisites.
  36. Conditional execution by modification date.
         result1.txt: data1.txt
             @sed s/b/B/i data1.txt > result1.txt
             @echo result1.txt has been calculated
     In this example, result1.txt will be recalculated every time data1.txt is modified:
         $: touch data1.txt calculate_result.py
         $: make result1.txt
         result1.txt has been calculated
         $: make result1.txt
         make: 'result1.txt' is up to date.
         $: touch data1.txt
         $: make result1.txt
         result1.txt has been calculated
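The session above can be replayed with the following sketch. Two details are assumptions of this sketch rather than of the slide: the sed expression drops the `i` flag (a GNU extension) so that it also runs with other sed versions, and a one-second sleep guarantees that the touched file gets a strictly later timestamp.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "abc" > data1.txt

{
  printf 'result1.txt: data1.txt\n'
  printf '\t@sed "s/b/B/" data1.txt > result1.txt\n'
  printf '\t@echo "result1.txt has been calculated"\n'
} > Makefile

make result1.txt   # result1.txt is missing, so the rule runs
make result1.txt   # nothing changed, so make reports it is up to date
sleep 1            # make sure the next timestamp is strictly later
touch data1.txt    # data1.txt is now newer than result1.txt
make result1.txt   # the rule runs again
```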
  37. Conditional execution: applications. This conditional execution by modification date comparison is very useful. Let's say you discover an error in one of your input files: you will be able to repeat the analysis by executing only the operations needed. You can also use it to recalculate results every time you modify a script:
         result.txt: scripts/calculate_result.py
             python scripts/calculate_result.py > result.txt
  38. Another example
  39. Fourth part: variables and functions
  40. Variables and functions. You may have already noticed that make's syntax is really old :) In fact, it is a ~40-year-old language. It uses special variables like $@ and $^, and it can be worse than Perl! (Perl developers, please don't get mad at me :-) )
  41. Variables. Variables are declared with a = and by convention are upper case. They are called by including their name in $(). In the slide's example, WORKING_DIR is a variable.
  42. Special variables: $@. Make uses some custom variables, with a syntax similar to Perl. $@ always corresponds to the target name:
         $: cat > Makefile
         %.txt:
             echo $@
         $: make filename.txt
         echo filename.txt
         filename.txt
     Here $@ took the value filename.txt.
  43. Other special variables.
         $@    the rule's target
         $<    the rule's first prerequisite
         $?    all the rule's out-of-date prerequisites (those newer than the target)
         $^    all prerequisites
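A sketch that prints the four special variables listed above. The file names merged.txt, a.txt and b.txt are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo one > a.txt
echo two > b.txt

{
  printf 'merged.txt: a.txt b.txt\n'
  printf '\t@echo "target: $@"\n'
  printf '\t@echo "first prerequisite: $<"\n'
  printf '\t@echo "out-of-date prerequisites: $?"\n'
  printf '\t@echo "all prerequisites: $^"\n'
  printf '\tcat $^ > $@\n'
} > Makefile

make merged.txt   # merged.txt is missing, so $? lists both prerequisites
sleep 1
touch b.txt       # only b.txt is now newer than merged.txt
make merged.txt   # this time $? contains only b.txt
```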
  44. Functions. Usually you don't want to declare functions in make, but there are some built-in utilities that can be useful. Most frequently used functions: $(addprefix <prefix>,<list>) adds a prefix to each element of a space-separated list. Example:
         FILES = file1 file2 file3
         $(addprefix /home/user/data/,$(FILES))
     $(addsuffix <suffix>,<list>) works similarly.
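The two functions can be tried out with a short sketch. The `show` target and the WITH_PREFIX and WITH_SUFFIX variable names are made up for this example; addprefix and addsuffix are standard GNU make functions.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

{
  printf 'FILES = file1 file2 file3\n'
  printf 'WITH_PREFIX = $(addprefix /home/user/data/,$(FILES))\n'
  printf 'WITH_SUFFIX = $(addsuffix .txt,$(FILES))\n'
  printf 'show:\n'
  printf '\t@echo $(WITH_PREFIX)\n'
  printf '\t@echo $(WITH_SUFFIX)\n'
} > Makefile

# Prints the prefixed list on one line and the suffixed list on the next.
make show
```

Note that the prefix is glued directly onto each word, so the trailing slash has to be part of the prefix.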
  45. Full makefile example.
         INPUTFILES = lower_DAF lower_maf upper_maf lower_daf upper_daf
         RESULTSDIR = ./results
         RESULTFILES = $(addprefix $(RESULTSDIR)/, $(addsuffix _filtered.txt,$(INPUTFILES)))
         help:
             @echo type "make all" to calculate results
         all: $(RESULTFILES)
         $(RESULTSDIR)/%_filtered.txt: data/%.txt src/filter_genes.py
             python src/filter_genes.py --genes data/Genes.txt --window $< --output $@
     It looks very complicated, but in the end you always use the same Makefile structure.
  46. Fifth part: testing, discussion, other examples and alternatives
  47. Testing a makefile. make -n only shows the commands to be executed, without running them. You can pass variables to make:
         $: make say_hello MYNAME="Giovanni"
         hello, Giovanni
     Strongly suggested: use revision control software with support for branching (git, hg, bazaar) and create a branch for testing.
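The say_hello rule itself is not shown in the transcript, so the sketch below assumes a one-line rule with a default value for MYNAME. It illustrates both passing a variable on the command line and the -n dry run.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical say_hello rule; the default value "world" is an assumption.
{
  printf 'MYNAME = world\n'
  printf 'say_hello:\n'
  printf '\t@echo "hello, $(MYNAME)"\n'
} > Makefile

make say_hello                    # uses the default set in the Makefile
make say_hello MYNAME="Giovanni"  # the command-line value overrides it
make -n say_hello                 # prints the command without running it
```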
  48. Another complex Makefile example.
         # make masked sequence
         myseq.m: myseq
             rmask myseq > myseq.m
         # run blast on masked seq
         blastout: mydb.psq myseq.m
             blastx mydb myseq.m > blastout
             echo "ran blast!"
         # index blastable db
         mydb.psq: mydb
             formatdb -p T mydb
         # rules follow this pattern:
         target: subtarget1, ..., subtargetN
             shell command 1
             shell command 2...
     Our starting point is the file myseq; the end point is the blast results file blastout. We first want to mask out any repeats using rmask to create myseq.m. We then blastx myseq.m against a protein db called mydb. Before blastx is run, the protein db must be indexed using formatdb. (slide taken from biomake web site)
  49. The "make" command. Using the makefile from the previous slide:
         % make blastout
         formatdb -p T mydb
         rmask myseq.fst > myseq.m
         blastx mydb myseq.m > blastout
         % make blastout
         make: blastout is up to date
         % cat newseqs >> mydb
         % make blastout
         formatdb -p T mydb
         blastx mydb myseq.m > blastout
     make uses unix file modification timestamps when checking dependencies. If a subtarget is more recent than the goal target, then the action is re-executed. (slide taken from biomake web site)
  50. BioMake and alternatives. BioMake is an alternative to make, designed for use in bioinformatics. It was developed to annotate the Drosophila melanogaster genome (Berkeley university). It has a cleaner syntax, derived from prolog. It separates the rule's name from the names of the target files.
  51. A BioMake example.
         formatdb(DB)
             req: DB
             run: formatdb DB
             comment: prepares blastdb for blasting (wublast)
         rmask(Seq)
             flat: masked_seqs/Seq.masked
             req: Seq
             srun: RepeatMasker -lib $(LIB) Seq
             comment: masks out repeats from input sequence
         mblastx(Seq,DB)
             flat: blast_results/Seq.DB.blastx
             req: formatdb(DB) rmask(Seq)
             srun: blastx -filter SEG+XNU DB rmask(Seq)
             comment: this target is for the results of running blastx on a masked input genomic sequence (wublast)
     (slide taken from biomake web site)
  52. Other alternatives. There are many other alternatives to make: BioMake (prolog?); o/q/dist/etc. make; Ant (Java); Scons (Python); Paver (Python); Waf (Python). This list is biased because I am a Python programmer :) These tools are more oriented to software development.
  53. Conclusions. Make is very basic for bioinformatics. It is useful for the simpler tasks: logging the operations made to your data files; working with clusters; avoiding re-calculations; applying a pipeline to different datasets. It is installed on almost any Unix system and has a standard syntax (interchangeable, reproducible). Study it and understand its logic. Use it in the most basic way, without worrying about prerequisites and special variables. Later you can look for easier tools (biomake, rake, taverna, ...
  54. Suggested readings.
         Software Carpentry for bioinformatics: http://swc.scipy.org/lec/build.html
         A Makefile is a pipeline: http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefil
         BioMake and SKAM: http://skam.sourceforge.net/
         BioWiki Make Manifesto: http://biowiki.org/MakefileManifesto
         Discussion on the BIP mailing list: http://www.mail-archive.com/biology-in-python@lists.idyll.org
         GNU Make manual by R. Stallman and R. McGrath: http://theory.uwinnipeg.ca/gnu/make/make_toc.html