Makefiles Bioinfo


make is a basic tool for defining pipelines of shell commands.
It is useful if you have many shell scripts and commands and want to organize them.
Although it was written to automate the build of programs in compiled languages, make is also useful in bioinformatics and other fields.

Published in: Technology
  • Makefiles Bioinfo

    1. 1. BioEvo technical seminars GNU/Make and bioinformatics G.M. Dall'Olio Barcelona, 06/02/2009
    2. 2. Original problem statement <ul><li>Programmers working in compiled languages (C, C++, Fortran, etc.) frequently have to execute complex shell commands: </li><ul><li>gcc -c -Wall -ansi -I/pkg/chempak/include dat2csv.c
    3. 3. g++ -c main.cpp; g++ -c func.cpp; g++ main.o func.o
    4. 4. rm *.o </li></ul><li>These commands are needed to convert a C++/C source code file to a binary file. </li></ul>
    5. 5. Shell commands in bioinformatics <ul><li>In bioinformatics it is common to use command-line tools with complex syntax: </li><ul><li>grep, head, gawk, sed, cat.. (tools for working with flat-file data)
    6. 6. perl/python/R/other scripts
    7. 7. Many suites of binary programs (emboss, phylip, blast, t-coffee, plink, genepop, gromacs, rosetta...)
    8. 8. etc... </li></ul></ul>
    9. 9. Common problem <ul><li>In short, C programmers and many bioinformaticians have two problems in common: </li><ul><li>Have a way to store command-line instructions with different parameters
    10. 10. Execute these commands only when necessary (don't recalculate results that have already been computed) </li></ul></ul>
    11. 11. GNU/make <ul><li>make is a tool to store command-line instructions and re-execute them quickly, along with all their parameters
    12. 12. It is a declarative programming language
    13. 13. It belongs to a class of software called 'automated build tools' </li></ul>
    14. 14. Simplest Makefile example <ul><li>The simplest Makefile contains just the name of a task and the commands associated with it: </li></ul><ul><li>print_hello is a makefile 'rule': it stores the commands needed to say 'Hello, world!' to the screen. </li></ul>
    15. 15. Simplest Makefile example Makefile rule Target of the rule Commands associated with the rule This is a tabulation (not 8 spaces)
    16. 16. Simplest Makefile example <ul><li>Create a file on your computer and save it as 'Makefile'.
    17. 17. Write these instructions in it, starting the second line with a tabulation (<Tab> key): print_hello : echo 'Hello, world!!'
    18. 18. Then, open a terminal and type: </li></ul><ul>make -f Makefile print_hello </ul>
    19. 19. Simplest Makefile example
    20. 20. Simplest Makefile example – explanation <ul><li>When invoked, the program 'make' looks for a file in the current directory called 'Makefile'
    21. 21. When we type 'make print_hello', it executes any procedure (target) called 'print_hello' in the makefile
    22. 22. It then shows the commands executed and their output </li></ul>
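
The steps above can be tried in a throwaway directory. This is only a sketch: it assumes GNU make and a POSIX shell are installed, and it writes the Makefile with `printf` so that the mandatory tab before the command is explicit (`\t`). The greeting is shortened to one exclamation mark for the sketch.

```shell
# Work in a scratch directory so no existing Makefile is overwritten.
cd "$(mktemp -d)"

# Write the Makefile; '\t' becomes the tab that must precede the command.
printf '%b\n' \
  'print_hello :' \
  '\techo "Hello, world!"' > Makefile

# Run the rule and keep what make prints (the command, then its output).
output=$(make -f Makefile print_hello)
echo "$output"
```

Replacing the tab with spaces makes GNU make stop with a "missing separator" error, which is the most common first mistake.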
    23. 23. Tip1: the 'Makefile' file <ul><li>The '-f' option allows you to define the file which contains the instructions for make
    24. 24. If you omit this option, make will look for any file called 'Makefile' in the current directory
    25. 25. make -f Makefile all is equivalent to: make all </li></ul>
    26. 26. A slightly longer example <ul><li>You can add as many commands as you like to a rule
    27. 27. For example, this ' print_hello ' rule contains 5 commands
    28. 28. Note: ignore the '@' thing, it is only to disable verbose mode (explained later) </li></ul>
    29. 29. A more complex example
    30. 30. Make - advantages <ul><li>Make allows you to save shell commands along with their parameters and re-execute them;
    31. 31. It allows you to use command-line tools which are more flexible;
    32. 32. Combined with revision control software, it makes it possible to reproduce all the operations performed on your data; </li></ul>
    33. 33. Second part A closer look at make syntax (target and commands)
    34. 34. The target syntax <ul><li>Makefile syntax: </li><ul><li><target> : (prerequisites) <commands associated to the rule> </li></ul></ul>
    35. 35. The target syntax <ul><li>The target of a rule can be either a title for the task, or a file name.
    36. 36. Every time you call a make rule (example: 'make all'), the program looks for a file with the same name as the target (e.g. 'all', 'clean', 'inputdata.txt', 'results.txt')
    37. 37. The rule is executed only if that file doesn't exist. </li></ul>
    38. 38. Filename as target names <ul><li>In this makefile, we have two rules: 'testfile.txt' and 'clean' </li></ul>
    39. 39. Filename as target names <ul><li>In this makefile, we have two rules: ' testfile.txt ' and ' clean '
    40. 40. When we call ' make testfile.txt ', make checks if a file called 'testfile.txt' already exists. </li></ul>
    41. 41. Filename as target names The commands associated with the rule 'testfile.txt' are executed only if that file doesn't already exist
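
A minimal sketch of the two rules described above, assuming GNU make. The original slide was an image, so the file contents written here are made up for illustration.

```shell
cd "$(mktemp -d)"

# 'testfile.txt' is a file target; 'clean' is just a task name.
printf '%b\n' \
  'testfile.txt :' \
  '\techo "contents of testfile" > testfile.txt' \
  '' \
  'clean :' \
  '\trm -f testfile.txt' > Makefile

first=$(make testfile.txt)    # file is missing: the command runs
second=$(make testfile.txt)   # file exists: make reports it is up to date
echo "$first"
echo "$second"

make clean                    # removes the file, so the rule would run again
```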
    42. 42. Multiple target definition <ul><li>A target can also be a list of files
    43. 43. You can retrieve the matched target with the special variable $@ </li></ul>
    44. 44. Special characters <ul><li>The % character can be used as a wild card
    45. 45. For example, a rule with the target: %.txt : .... would be activated by any file ending with '.txt' </li><ul><li>'make 1.txt', 'make 2.txt', etc.. </li></ul><li>We will be able to retrieve the matched expression with '$*' </li></ul>
    46. 46. Special character % / creating more than a file at a time
    47. 47. Makefile – cluster support <ul><li>Note that in the previous example we created three files at the same time, by executing three times the command 'touch'
    48. 48. If we use the '-j' option when invoking make, the three processes will be launched in parallel </li></ul>
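
The wildcard target and the '-j' option can be combined in a small sketch like the one below. It assumes GNU make. The slide with the original example was an image, so this rule writes a line of text with `echo` instead of the slide's `touch`, to make the matched part visible.

```shell
cd "$(mktemp -d)"

# '%.txt' matches any target ending in '.txt';
# $@ is the full target name, $* is the part matched by '%'.
printf '%b\n' \
  '%.txt :' \
  '\t@echo "stem: $*" > $@' > Makefile

# Ask for three targets at once; -j 3 allows up to three jobs in parallel.
make -j 3 1.txt 2.txt 3.txt
cat 1.txt 2.txt 3.txt
```

The three files do not depend on each other, which is why make is free to build them at the same time.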
    49. 49. The commands syntax <ul><li>Makefile syntax: </li><ul><li><target> : (prerequisites) <commands associated to the rule> </li></ul></ul>
    50. 50. Inactivating verbose mode <ul><li>You can disable the verbose mode for a line by adding '@' at its beginning: </li></ul>Differences here
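
To see the difference, here is a sketch with two hypothetical rules, `loud` and `quiet`, that differ only in the leading '@'. It assumes GNU make.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'loud :' \
  '\techo "printed after the command line itself"' \
  '' \
  'quiet :' \
  '\t@echo "printed alone"' > Makefile

loud=$(make loud)     # two lines: the command, then its output
quiet=$(make quiet)   # one line: only the output
echo "$loud"
echo "$quiet"
```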
    51. 51. Skipping errors <ul><li>The modifier '-' tells make to ignore errors returned by a command
    52. 52. Example: </li><ul><li>'mkdir /var' will cause an error (the '/var' directory already exists) and cause gnu/make to exit
    53. 53. '-mkdir /var' will cause an error anyway, but gnu/make will ignore it </li></ul></ul>
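
The same behaviour can be checked without touching '/var'. The sketch below assumes GNU make and uses a local directory named `existing_dir` (a name chosen for the example) with two hypothetical rules, `strict` and `tolerant`.

```shell
cd "$(mktemp -d)"
mkdir existing_dir

# Without '-', the failing mkdir stops make before the echo is reached.
printf '%b\n' \
  'strict :' \
  '\tmkdir existing_dir' \
  '\t@echo "strict: reached the end"' \
  '' \
  'tolerant :' \
  '\t-mkdir existing_dir' \
  '\t@echo "tolerant: reached the end"' > Makefile

strict_status=0
make strict >/dev/null 2>&1 || strict_status=$?   # non-zero: make gave up
tolerant_out=$(make tolerant 2>/dev/null)         # error ignored, rule continues
echo "strict exit status: $strict_status"
echo "$tolerant_out"
```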
    54. 54. Moving through directories <ul><li>A big issue with make is that every line is executed in a separate shell process.
    55. 55. So, this: lsvar : cd /var ls
    56. 56. Won't work (it will list only the files in the current directory, not /var)
    57. 57. The solution is to put everything in a single process:
    58. 58. lsvar : (cd /var; ls) </li></ul>
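
A sketch of the two variants side by side, assuming GNU make with its default settings. It lists a local `subdir` (created for the example) rather than '/var', and the rule names `separate` and `together` are hypothetical.

```shell
cd "$(mktemp -d)"
mkdir subdir
touch subdir/inside.txt

printf '%b\n' \
  'separate :' \
  '\t@cd subdir' \
  '\t@ls' \
  '' \
  'together :' \
  '\t@(cd subdir; ls)' > Makefile

separate=$(make separate)   # 'cd' is forgotten: lists the current directory
together=$(make together)   # one shell process: lists subdir
echo "separate: $separate"
echo "together: $together"
```

Writing `cd subdir && ls` on a single line works as well, and has the advantage that `ls` is skipped if the `cd` fails.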
    59. 59. Third part Prerequisites and conditional execution
    60. 60. The commands syntax <ul><li>Makefile syntax: </li><ul><li><target> : (prerequisites) <commands associated to the rule> </li></ul><li>We will now look at the 'prerequisites' part of a make rule, which was skipped earlier </li></ul>
    61. 61. Real Makefile-rule syntax <ul><li>Complete syntax for a Makefile rule: <target> : <list of prerequisites> <commands associated to the rule>
    62. 62. Example: result1.txt : data1.txt data2.txt cat data1.txt data2.txt > result1.txt @echo 'result1.txt has been calculated'
    63. 63. Prerequisites are files (or rules) that need to exist already in order to create the target file.
    64. 64. If 'data1.txt' and 'data2.txt' don't exist, the rule 'result1.txt' will exit with an error (no rule to create them) </li></ul>
    65. 65. Piping Makefile rules together <ul><li>You can pipe two Makefile rules together by defining prerequisites </li></ul>
    66. 66. Piping Makefile rules together <ul><li>The rule 'result1.txt' depends on the rule 'data1.txt', which should be executed first </li></ul>
    67. 67. Piping Makefile rules together <ul><li>Let's look at this example again:
    68. 68. what happens if we remove the file 'result1.txt' we just created? </li></ul>
    69. 69. Piping Makefile rules together <ul><li>Let's look at this example again:
    70. 70. what happens if we remove the file 'result1.txt' we just created?
    71. 71. The second time we run the 'make result1.txt' command, it is not necessary to create data1.txt again, so only a rule is executed </li></ul>
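
The slides for this example were images, so what follows is a reconstruction under assumptions rather than the original Makefile: a rule for 'result1.txt' that depends on a rule for 'data1.txt', with made-up contents. It assumes GNU make.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'result1.txt : data1.txt' \
  '\t@echo "building result1.txt"' \
  '\t@cat data1.txt > result1.txt' \
  '' \
  'data1.txt :' \
  '\t@echo "building data1.txt"' \
  '\t@echo "raw data" > data1.txt' > Makefile

first=$(make result1.txt)    # both rules run, data1.txt first
rm result1.txt
second=$(make result1.txt)   # data1.txt still exists: only one rule runs
echo "$first"
echo "--"
echo "$second"
```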
    72. 72. Other pipe example <ul><li>all : result1.txt result2.txt result1.txt : data1.txt calculate_result.py python calculate_result.py --input data1.txt result2.txt : data2.txt cut -f 1,3 data2.txt > result2.txt
    73. 73. 'make all' will calculate result1.txt and result2.txt if they don't exist already (or if they are older than their prerequisites) </li></ul>
    74. 74. Conditional execution by modification date <ul><li>We have seen how make can be used to create a file if it doesn't exist. file.txt: # if file.txt doesn't exist, then create it: echo 'contents of file.txt' > file.txt
    75. 75. We can do better: create or update a file only if it is missing or older than its prerequisites </li></ul>
    76. 76. Conditional execution by modification date <ul><li>Let's have a closer look at this example: result1.txt : data1.txt calculate_result.py python calculate_result.py --input data1.txt
    77. 77. A great feature of make is that it executes a rule not only if the target file doesn't exist, but also if its 'last modification date' is earlier than that of any of its prerequisites </li></ul>
    78. 78. Conditional execution by modification date <ul>result1.txt : data1.txt @sed 's/b/B/i' data1.txt > result1.txt @echo 'result1.txt has been calculated' <li>In this example, result1.txt will be recalculated every time 'data1.txt' is modified
    79. 79. $: touch data1.txt calculate_result.py $: make result1.txt result1.txt has been calculated $: make result1.txt result1.txt is already up-to-date $: touch data1.txt $: make result1.txt result1.txt has been calculated </li></ul>
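
The session above can be replayed as a script. This sketch assumes GNU make, uses a plain `sed 's/b/B/'` (without the GNU-only `i` flag), and adds a one-second pause so that the timestamps differ even on file systems with coarse time resolution.

```shell
cd "$(mktemp -d)"
echo "abc" > data1.txt

printf '%b\n' \
  'result1.txt : data1.txt' \
  '\t@sed "s/b/B/" data1.txt > result1.txt' \
  '\t@echo "result1.txt has been calculated"' > Makefile

run1=$(make result1.txt)   # target missing: the rule runs
run2=$(make result1.txt)   # target not older than data1.txt: nothing to do
sleep 1                    # keep the two timestamps clearly apart
touch data1.txt            # pretend the input data changed
run3=$(make result1.txt)   # prerequisite is newer: the rule runs again
printf '%s\n' "$run1" "$run2" "$run3"
```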
    80. 80. Conditional execution - applications <ul><li>This 'conditional execution by modification date comparison' feature of make is very useful
    81. 81. Let's say you discover an error in one of your input data: you will be able to repeat the analysis by executing only the operations needed
    82. 82. You can also use it to re-calculate results every time you modify a script: result.txt : scripts/calculate_result.py python scripts/calculate_result.py > result.txt </li></ul>
    83. 83. Another example
    84. 84. Fourth part Variables and functions
    85. 85. Variables and functions <ul><li>You may have already noticed that Make's syntax is really old :)
    86. 86. In fact, the language is more than 30 years old
    87. 87. It uses special variables like $@ and $^, and it can be worse than perl!!!
    88. 88. (perl developers – please don't get mad at me :-) ) </li></ul>
    89. 89. Variables <ul><li>Variables are declared with a '=' and by convention are upper case.
    90. 90. They are called by including their name in ' $() ' </li></ul>WORKING_DIR is a variable
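
A short sketch of declaring and using variables, assuming GNU make. `WORKING_DIR` comes from the slide; `OUTPUT`, the rule name `show` and the file name are invented for the example.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'WORKING_DIR = ./results' \
  'OUTPUT = $(WORKING_DIR)/summary.txt' \
  '' \
  'show :' \
  '\t@echo "writing to $(OUTPUT)"' > Makefile

shown=$(make show)
echo "$shown"
```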
    91. 91. Special variables - $@ <ul><li>Make uses some custom variables, with a syntax similar to perl
    92. 92. '$@' always corresponds to the target name: $: cat >Makefile %.txt : echo $@ $: make filename.txt echo filename.txt filename.txt $: </li></ul>$@ took the value of 'filename.txt'
    93. 93. Other special variables $@ The rule's target $< The rule's first prerequisite $? All the rule's out of date prerequisites $^ All Prerequisites
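
The four variables listed above can be printed from a single rule. This is a sketch assuming GNU make. The file names are hypothetical, and the pause is only there to make the timestamps unambiguous.

```shell
cd "$(mktemp -d)"
touch a.txt b.txt

printf '%b\n' \
  'merged.txt : a.txt b.txt' \
  '\t@echo "target=$@ first=$< all=$^ newer=$?"' \
  '\t@cat $^ > $@' > Makefile

first=$(make merged.txt)    # target missing: every prerequisite counts as newer
sleep 1
touch b.txt                 # only b.txt is now newer than merged.txt
second=$(make merged.txt)
echo "$first"
echo "$second"
```

On the second run `$?` shrinks to the single file that changed, while `$^` still lists both prerequisites.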
    94. 94. Functions <ul><li>Usually you don't want to declare functions in make, but there are some built-in utilities that can be useful
    95. 95. Most frequently used functions: $(addprefix <prefix>, list) -> adds a prefix to each item of a space-separated list. Example: FILES = file1 file2 file3 $(addprefix /home/user/data/, $(FILES))
    96. 96. $(addsuffix) works similarly </li></ul>
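
Both functions can be nested, as in this sketch (GNU make assumed). The `data/` prefix, the `.txt` suffix and the rule name `show` are chosen only for illustration.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'FILES = file1 file2 file3' \
  'PATHS = $(addprefix data/, $(addsuffix .txt, $(FILES)))' \
  '' \
  'show :' \
  '\t@echo $(PATHS)' > Makefile

paths=$(make show)
echo "$paths"   # prints: data/file1.txt data/file2.txt data/file3.txt
```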
    97. 97. Full makefile example INPUTFILES = lower_DAF lower_maf upper_maf lower_daf upper_daf RESULTSDIR = ./results RESULTFILES = $(addprefix $(RESULTSDIR)/, $(addsuffix _filtered.txt,$(INPUTFILES))) help : @echo 'type "make filter" to calculate results' all : $(RESULTFILES) $(RESULTSDIR)/%_filtered.txt : data/%.txt src/filter_genes.py python src/filter_genes.py --genes data/Genes.txt --window $< --output $@ <ul><li>It looks very complicated, but in the end you always use the same Makefile structure </li></ul>
    98. 98. Fifth part Testing, discussion, other examples and alternatives
    99. 99. Testing a makefile <ul><li>make -n: only shows the commands to be executed
    100. 100. You can pass variables to make: $: make say_hello MYNAME="Giovanni" hello, Giovanni
    101. 101. Strongly suggested: use a Revision Control Software with support for branching (git, hg, bazaar) and create a branch for testing </li></ul>
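
The two testing aids above can be sketched as follows, assuming GNU make. `say_hello` and `MYNAME` come from the slide. The default value `world` and the `output.txt` rule are assumptions added for the example.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'MYNAME = world' \
  '' \
  'say_hello :' \
  '\t@echo "hello, $(MYNAME)"' \
  '' \
  'output.txt :' \
  '\techo "expensive step" > output.txt' > Makefile

greeting=$(make say_hello MYNAME="Giovanni")   # command line overrides the default
dry_run=$(make -n output.txt)                  # prints the command, runs nothing
echo "$greeting"
echo "$dry_run"
```

After the dry run, 'output.txt' still does not exist, which makes `make -n` a safe way to check what a long pipeline is about to do.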
    102. 102. Another complex Makefile example <ul><li>our starting point is the file myseq , the end point is the blast results blastout
    103. 103. we first want to mask out any repeats using rmask to create myseq.m
    104. 104. we then blastx myseq.m against a protein db called mydb
    105. 105. before blastx is run the protein db must be indexed using formatdb </li></ul>(slide taken from biomake web site)
    106. 106. The “ make ” command <ul><li>make uses unix file modification timestamps when checking dependencies </li><ul><li>if a subtarget is more recent than the goal target, then re-execute action </li></ul></ul>(slide taken from biomake web site)
    107. 107. BioMake and alternatives <ul><li>BioMake is an alternative to make, designed for use in bioinformatics
    108. 108. Developed to annotate the Drosophila melanogaster genome (Berkeley university)
    109. 109. Cleaner syntax, derived from Prolog
    110. 110. Separates the rule's name from the name of the target files </li></ul>
    111. 111. A BioMake example (slide taken from biomake web site)
    112. 112. Other alternatives <ul><li>There are many other alternatives to make: </li><ul><li>BioMake (prolog?)
    113. 113. o/q/dist/etc.. make
    114. 114. Ant (Java)
    115. 115. Scons (python)
    116. 116. Paver (python)
    117. 117. Waf (python) </li></ul><li>This list is biased because I am a python programmer :)
    118. 118. These tools are more oriented to software development </li></ul>
    119. 119. Conclusions <ul><li>Make is very basic for bioinformatics
    120. 120. It is useful for the simpler tasks: </li><ul><li>Logging the operations made to your data files
    121. 121. Working with clusters
    122. 122. Avoid re-calculations
    123. 123. Apply a pipeline to different datasets </li></ul><li>It is installed in almost any unix system and has a standard syntax (interchangeable, reproducible)
    124. 124. Study it and understand its logic. Use it in the most basic way, without worrying about prerequisites and special variables. Later you can look for easier tools (biomake, rake, taverna, galaxy, your own, etc..) </li></ul>
    125. 125. Suggested readings <ul><li>Software Carpentry for bioinformatics http://swc.scipy.org/lec/build.html
    126. 126. A Makefile is a pipeline http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefile
    127. 127. BioMake and SKAM http://skam.sourceforge.net/
    128. 128. BioWiki Make Manifesto http://biowiki.org/MakefileManifesto
    129. 129. Discussion on the BIP mailing list http://www.mail-archive.com/biology-in-python@lists.idyll.org/msg00013.html
    130. 130. GNU Make manual by R. Stallman and R. McGrath </li><ul><li>http://theory.uwinnipeg.ca/gnu/make/make_toc.html </li></ul></ul>
    131. 131. End of talk!! <ul><li>Are you still alive? :-)
    132. 132. Thanks to: </li><ul><li>The author of 'Software carpentry for bioinformatics'
    133. 133. The people in the bip mailing list, for discussion
    134. 134. The author of bioinformaticszen.org and the people on nodalpoint for priming
    135. 135. All the people that have worked on this topic or who wrote a blog post / free internet document on it </li></ul><li>And thanks to you all!! </li></ul>
    136. 137. Inactivating verbose mode <ul><li>In make, the verbose mode is activated by default
    137. 138. Every time a command is called, make shows the exact line which is being executed </li></ul>This is the statement being executed
    138. 139. Makefile syntax <ul><li>Make is also a real programming language, 30 years old, with a syntax similar to bash.
    139. 140. It is a declarative language. In a make source code file, you define a set of rules, each one corresponding to a task, with this syntax: </li><ul><li><target> : <list of prerequisites> <commands associated to the rule>
    140. 141. Example: results.txt : data1.txt cut -f 1-10 data1.txt > results.txt </li></ul></ul>