BioEvo technical seminars GNU/Make and bioinformatics G.M. Dall'Olio Barcelona, 06/02/2009
Original problem statement: programmers working in compiled languages (C, C++, Fortran, etc.) frequently have to execute complex shell commands:
gcc -c -Wall -ansi -I/pkg/chempak/include dat2csv.c
g++ -c main.cpp; g++ -c func.cpp; g++ main.o func.o
rm *.o
These commands are needed to convert C/C++ source files into a binary file and to remove the intermediate object files afterwards.
Shell commands in bioinformatics: in bioinformatics it is common to use command-line tools with complex syntax:
grep, head, gawk, sed, cat... (tools for working with flat-file data)
perl/python/R/other scripts
Many suites of binary programs (emboss, phylip, blast, t-coffee, plink, genepop, gromacs, rosetta...)
etc...
Common problem: in short, C programmers and many bioinformaticians have two needs in common:
A way to store command-line instructions with different parameters
A way to execute these commands only when necessary (without recalculating results that have already been calculated)
GNU/make: make is a tool to store command-line instructions and re-execute them quickly, along with all their parameters
It is a declarative programming language
It belongs to a class of software called 'automated build tools'
Simplest Makefile example: the simplest Makefile contains just the name of a task and the commands associated with it. Here, print_hello is a makefile 'rule': it stores the commands needed to print 'Hello, world!' to the screen.
Simplest Makefile example (annotated figure): the figure labels the Makefile rule, the target of the rule and the commands associated with the rule, and points out that each command line starts with a tabulation (not 8 spaces)
Simplest Makefile example: create a file on your computer and save it as 'Makefile'.
Write these instructions in it (the second line must start with a tabulation, typed with the <Tab> key):
print_hello :
	echo 'Hello, world!!'
Then, open a terminal and type:
make -f Makefile print_hello
Simplest Makefile example, explanation: when invoked, the program 'make' looks for a file in the current directory called 'Makefile'
When we type 'make print_hello', it executes the rule whose target is called 'print_hello' in the makefile
It then shows the commands executed and their output
Tip 1, the 'Makefile' file: the '-f' option allows you to specify the file which contains the instructions for make
If you omit this option, GNU make will look in the current directory for a file called 'GNUmakefile', 'makefile' or 'Makefile' (in that order)
So, when the file is called 'Makefile', 'make -f Makefile all' is equivalent to 'make all'
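The steps of this first example can be tried end to end with the sketch below. It is only an illustration under a few assumptions: it works in a scratch directory created with mktemp, and it writes the Makefile with printf '%b\n' (one argument per Makefile line, with \t standing for the mandatory Tab) so that the tabulation cannot be lost when copying the text. The echo command uses double quotes instead of the single quotes of the slide, which prints the same message.

```shell
# Work in a scratch directory so that no existing Makefile is overwritten.
cd "$(mktemp -d)"

# One argument per Makefile line; %b turns \t into the leading Tab.
printf '%b\n' \
  'print_hello:' \
  '\techo "Hello, world!!"' > Makefile

# make prints the command it runs, then the output of that command.
out_default=$(make print_hello)

# Naming the file explicitly with -f gives the same result.
out_explicit=$(make -f Makefile print_hello)

printf '%s\n' "$out_default"
```

Both invocations read the same file, so the two captured outputs should be identical.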
A slightly longer example: you can add as many commands as you like to a rule
For example, this 'print_hello' rule contains 5 commands
Note: ignore the '@' prefix for now, it only disables verbose mode (explained later)
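The original slide showed the rule as an image, which is not available here. As a stand-in, here is a hypothetical 'print_hello' rule with 5 commands; the commands themselves are invented for illustration and are not the ones from the slide. The Makefile is written with printf '%b\n' in a scratch directory (both are choices of this sketch).

```shell
cd "$(mktemp -d)"

# A rule with five commands; each one is prefixed with @ so that
# make prints only the output, not the command itself.
printf '%b\n' \
  'print_hello:' \
  '\t@echo "Hello, world!!"' \
  '\t@echo "Files in the current directory:"' \
  '\t@ls' \
  '\t@echo "Number of lines in the Makefile:"' \
  '\t@wc -l < Makefile' > Makefile

out=$(make print_hello)
printf '%s\n' "$out"
```

The commands run in the order in which they are written, one after the other.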
A more complex example
Make - advantages: make allows you to save shell commands along with their parameters and re-execute them;
It allows you to use command-line tools, which are more flexible;
Combined with revision control software, it makes it possible to reproduce all the operations performed on your data;
Second part: a closer look at make syntax (target and commands)
The target syntax. Makefile syntax:
<target> : (prerequisites)
	<commands associated with the rule>
The target syntax: the target of a rule can be either a title for the task, or a file name.
Every time you call a make rule (example: 'make all'), the program looks for a file named after the target (e.g. 'all', 'clean', 'inputdata.txt', 'results.txt')
For a rule without prerequisites, the commands are executed only if that file doesn't exist.
Filename as target names: in this makefile, we have two rules: 'testfile.txt' and 'clean'
When we call 'make testfile.txt', make checks whether a file called 'testfile.txt' already exists.
The commands associated with the rule 'testfile.txt' are executed only if that file doesn't already exist
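The makefile of this slide was an image and is not available, so the sketch below is a plausible reconstruction, not the original: the two rule names come from the text, while the commands inside them are assumptions. It shows that the 'testfile.txt' commands run only while the file is missing.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'testfile.txt:' \
  '\techo "some data" > testfile.txt' \
  'clean:' \
  '\trm -f testfile.txt' > Makefile

first=$(make testfile.txt)    # the file is missing, so the command runs
second=$(make testfile.txt)   # the file exists now, so make has nothing to do
make clean                    # no file called "clean" exists, so this always runs
third=$(make testfile.txt)    # the file is missing again, so the command runs again

printf 'first run:  %s\nsecond run: %s\n' "$first" "$second"
```

On the second run GNU make prints a message saying that the target is up to date instead of the command.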
Multiple target definition: a target can also be a list of files
You can retrieve the target that is currently being built with the special variable $@
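A minimal sketch of a rule with a list of targets, assuming three hypothetical file names (file1.txt, file2.txt, file3.txt). The same commands serve all three targets, and $@ expands to whichever one was requested.

```shell
cd "$(mktemp -d)"

# One rule, three targets; $@ is replaced by the target being built.
printf '%b\n' \
  'file1.txt file2.txt file3.txt:' \
  '\t@echo "creating $@"' \
  '\t@touch $@' > Makefile

out=$(make file2.txt)
printf '%s\n' "$out"    # prints: creating file2.txt
```

Only the requested file is created; the other two targets are left alone until they are asked for.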
Special characters: the % character can be used as a wild card
For example, a rule with the target '%.txt : ....' would be activated by any target ending with '.txt': 'make 1.txt', 'make 2.txt', etc. We can retrieve the part of the name matched by % (e.g. '1' for '1.txt') with '$*'
Special character % / creating more than one file at a time
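The slide's makefile is not available, so here is a hedged sketch of what such a pattern rule could look like. The echo line is an addition of this sketch, there only to show the values of $@ (the full target) and $* (the part matched by %).

```shell
cd "$(mktemp -d)"

# A pattern rule: it applies to any target whose name ends in .txt.
printf '%b\n' \
  '%.txt:' \
  '\t@echo "target: $@  stem: $*"' \
  '\ttouch $@' > Makefile

# Asking for three targets at once runs the rule three times.
out=$(make 1.txt 2.txt 3.txt)
printf '%s\n' "$out"
```

For 'make 2.txt', $@ is '2.txt' and $* is '2'.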
Makefile - cluster support: note that in the previous example we created three files at the same time, by executing the command 'touch' three times
If we use the '-j' option when invoking make, the three processes will be launched in parallel (by itself, 'make -j' runs them as parallel jobs on the local machine)
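To make the effect of '-j' visible, the sketch below adds a one-second 'sleep' to a pattern rule; the sleep is an assumption of this example, standing in for a slow analysis step.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  '%.txt:' \
  '\tsleep 1' \
  '\ttouch $@' > Makefile

# Serial run: the three recipes execute one after another (roughly 3 seconds).
make 1.txt 2.txt 3.txt
rm -f 1.txt 2.txt 3.txt

# Parallel run: -j3 allows up to three recipes at once (roughly 1 second).
make -j3 1.txt 2.txt 3.txt
```

Prefixing each invocation with 'time' is a simple way to compare the two runs.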
The commands syntax. Makefile syntax:
<target> : (prerequisites)
	<commands associated with the rule>
Disabling verbose mode: you can disable the verbose mode for a line (that is, stop make from printing the command before running it) by adding '@' at its beginning
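The difference can be seen by running the same command with and without the '@' prefix. The rule names 'loud' and 'quiet' are hypothetical, chosen only for this sketch.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'loud:' \
  '\techo "step done"' \
  'quiet:' \
  '\t@echo "step done"' > Makefile

loud=$(make loud)      # two lines: the command itself, then its output
quiet=$(make quiet)    # one line: only the output

printf 'loud:\n%s\n\nquiet:\n%s\n' "$loud" "$quiet"
```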
Skipping errors: the modifier '-' tells make to ignore errors returned by a command
Example: 'mkdir /var' will cause an error (the '/var' directory already exists) and cause GNU make to exit
'-mkdir /var' will cause the same error, but GNU make will ignore it and continue
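A sketch of the same idea that avoids touching '/var': it creates a directory called 'data' in a scratch directory first, so that 'mkdir data' is guaranteed to fail. The rule names 'strict' and 'lenient' and the log file names are assumptions of this example.

```shell
cd "$(mktemp -d)"
mkdir data    # the directory already exists, so mkdir inside make will fail

printf '%b\n' \
  'strict:' \
  '\tmkdir data' \
  '\t@echo "strict: reached the end"' \
  'lenient:' \
  '\t-mkdir data' \
  '\t@echo "lenient: reached the end"' > Makefile

# Without '-', make stops at the failing command and exits with an error.
strict_status=0
make strict > strict.log 2>&1 || strict_status=$?

# With '-', make reports the error but carries on with the next command.
lenient_status=0
make lenient > lenient.log 2>&1 || lenient_status=$?

echo "strict exit status:  $strict_status"
echo "lenient exit status: $lenient_status"
```

Only 'lenient.log' should contain the 'reached the end' message.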
Moving through directories: a big issue with make is that every line is executed as a different shell process.
So, this:
lsvar :
	cd /var
	ls
Won't work (it will list only the files in the current directory, not /var)
The solution is to put everything in a single process:
lsvar :
	(cd /var; ls)
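The two variants can be compared side by side. This sketch uses a hypothetical subdirectory 'sub' in a scratch directory instead of '/var', and the rule names 'broken' and 'fixed' are invented for the example.

```shell
cd "$(mktemp -d)"
mkdir sub
touch sub/inside.txt

printf '%b\n' \
  'broken:' \
  '\t@cd sub' \
  '\t@ls' \
  'fixed:' \
  '\t@(cd sub; ls)' > Makefile

broken=$(make broken)   # the cd is forgotten on the next line: lists Makefile and sub
fixed=$(make fixed)     # cd and ls share one shell: lists inside.txt

printf 'broken:\n%s\n\nfixed:\n%s\n' "$broken" "$fixed"
```

Writing 'cd sub && ls' instead of 'cd sub; ls' is a slightly safer variant, because 'ls' is then skipped if the 'cd' fails.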
Third part: prerequisites and conditional execution
Makefile syntax:
<target> : (prerequisites)
	<commands associated with the rule>
We will now look at the 'prerequisites' part of a make rule, which I had skipped before
Real Makefile-rule syntax. Complete syntax for a Makefile rule:
<target> : <list of prerequisites>
	<commands associated with the rule>
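As a first taste of the complete syntax, here is a hedged sketch of a rule with one prerequisite. The file names 'input.txt' and 'result.txt' and the 'wc' command are assumptions made for this example. It relies on standard make behaviour: a target is rebuilt when it is missing or when one of its prerequisites is newer than it.

```shell
cd "$(mktemp -d)"
printf 'ACGT\nTTGA\n' > input.txt

# result.txt depends on input.txt.
printf '%b\n' \
  'result.txt: input.txt' \
  '\twc -l < input.txt > result.txt' > Makefile

first=$(make result.txt)     # result.txt is missing: the command runs
second=$(make result.txt)    # result.txt is not older than input.txt: nothing to do

# Backdate result.txt so that input.txt is clearly the newer file.
touch -t 200901010000 result.txt
third=$(make result.txt)     # input.txt is newer than result.txt: the command runs again

printf 'first:  %s\nsecond: %s\nthird:  %s\n' "$first" "$second" "$third"
```

In day-to-day use there is no need to backdate anything: editing 'input.txt' makes it newer than 'result.txt', and the next 'make result.txt' recalculates the result.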

kk vathada _digital transformation frameworks_2024.pdf
 
Patch Tuesday de julio
Patch Tuesday de julioPatch Tuesday de julio
Patch Tuesday de julio
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
 

Makefiles Bioinfo

  • 1. BioEvo technical seminars GNU/Make and bioinformatics G.M. Dall'Olio Barcelona, 06/02/2009
  • 2. Original problem statement Programmers of compiled languages (C, C++, Fortran, etc.) frequently have to execute complex shell commands: gcc -c -Wall -ansi -I/pkg/chempak/include dat2csv.c
  • 3. g++ -c main.cpp; g++ -c func.cpp; g++ main.o func.o
  • 4. rm *.o These commands are needed to convert a C++/C source code file to a binary file.
  • 5. Shell commands in bioinformatics In bioinformatics it is common to use command-line tools with complex syntax: grep, head, gawk, sed, cat... (tools to work with flat-file data)
  • 7. Many suites of binary programs (emboss, phylip, blast, t-coffee, plink, genepop, gromacs, rosetta...)
  • 9. Common problem In short, C programmers and many bioinformaticians have two problems in common: Have a way to store command-line instructions with different parameters
  • 10. Execute these commands only when necessary (don't calculate again some results, if they have already been calculated)
  • 11. GNU/make make is a tool to store command-line instructions and re-execute them quickly, along with all their parameters
  • 12. It is a declarative programming language
  • 13. It belongs to a class of software called 'automated build tools'
  • 14. Simplest Makefile example The simplest Makefile contains just the name of a task and the commands associated with it: print_hello is a makefile 'rule': it stores the commands needed to say 'Hello, world!' to the screen.
  • 15. Simplest Makefile example A Makefile rule consists of the target of the rule and the commands associated with the rule. The commands are indented with a tabulation (not 8 spaces)
  • 16. Simplest Makefile example Create a file on your computer and save it as 'Makefile'.
  • 17. Write these instructions in it: print_hello : echo 'Hello, world!!'
  • 18. Then, open a terminal and type: make -f Makefile print_hello (the indentation before 'echo' in the Makefile is a tabulation, typed with the <Tab> key)
  • 20. Simplest Makefile example – explanation When invoked, the program 'make' looks for a file in the current directory called 'Makefile'
  • 21. When we type 'make print_hello', it executes any procedure (target) called 'print_hello' in the makefile
  • 22. It then shows the commands executed and their output
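
Written out on separate lines, the rule described above looks like this. It is a minimal sketch of the Makefile from the slides; the only thing to watch is that the command line starts with a real tab character.

```make
# Makefile
# 'print_hello' is the target; the indented line is the command of the rule.
# The indentation must be a single tab character, not spaces.
print_hello :
	echo 'Hello, world!!'
```

Running `make -f Makefile print_hello` first shows the command being executed (`echo 'Hello, world!!'`) and then its output (`Hello, world!!`).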
  • 23. Tip1: the 'Makefile' file The '-f' option allows you to define the file which contains the instructions for make
  • 24. If you omit this option, make will look for any file called 'Makefile' in the current directory
  • 25. make -f Makefile all is equivalent to: make all
  • 26. A slightly longer example You can add as many commands as you like to a rule
  • 27. For example, this ' print_hello ' rule contains 5 commands
  • 28. Note: ignore the '@' thing, it is only to disable verbose mode (explained later)
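
The five commands of the original slide are not part of this transcript, so the following is only a hypothetical reconstruction of what such a rule could look like. The commands themselves are assumptions chosen for illustration.

```make
# Hypothetical five-command version of 'print_hello'.
# Each command line starts with a tab; '@' keeps make from echoing the command itself.
print_hello :
	@echo 'Hello, world!!'
	@echo 'This rule contains five commands'
	@date
	@pwd
	@echo 'Bye!'
```

The commands are executed in order, from top to bottom, each time `make print_hello` is called.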
  • 29. A more complex example
  • 30. Make - advantages Make allows you to save shell commands along with their parameters and re-execute them;
  • 31. It allows you to use command-line tools which are more flexible;
  • 32. Combined with revision control software, it makes it possible to reproduce all the operations performed on your data;
  • 33. Second part A closer look at make syntax (target and commands)
  • 34. The target syntax Makefile syntax: <target> : (prerequisites) <commands associated to the rule>
  • 35. The target syntax The target of a rule can be either a title for the task, or a file name.
  • 36. Every time you call a make rule (example: 'make all'), the program looks for a file with the same name as the target (e.g. 'all', 'clean', 'inputdata.txt', 'results.txt')
  • 37. The rule is executed only if that file doesn't exist.
  • 38. Filename as target names In this makefile, we have two rules: 'testfile.txt' and 'clean'
  • 40. When we call ' make testfile.txt ', make checks if a file called 'testfile.txt' already exists.
  • 41. Filename as target names The commands associated with the rule 'testfile.txt' are executed only if that file doesn't already exist
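
A Makefile with these two rules could look like the sketch below. The target names come from the slides, but the commands are assumptions, since the original slide image is not in the transcript.

```make
# 'testfile.txt' is a file-name target: its commands run only
# if no file called 'testfile.txt' exists yet.
testfile.txt :
	echo 'test contents' > testfile.txt

# 'clean' is a task-name target: no file called 'clean' is ever created,
# so this rule runs every time it is called.
clean :
	rm -f testfile.txt
```

The first `make testfile.txt` creates the file; a second call does nothing and reports that the target is up to date, until `make clean` removes the file again.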
  • 42. Multiple target definition A target can also be a list of files
  • 43. You can retrieve the matched target with the special variable $@
  • 44. Special characters The % character can be used as a wild card
  • 45. For example, a rule with the target: %.txt : .... would be activated by any target ending with '.txt': 'make 1.txt', 'make 2.txt', etc. The matched part of the name can be retrieved with '$*'
  • 46. Special character % / creating more than one file at a time
  • 47. Makefile – cluster support Note that in the previous example we created three files at the same time, by executing the command 'touch' three times
  • 48. If we use the '-j' option when invoking make, the three processes will be launched in parallel
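
The pattern rule and the parallel run can be sketched as follows. The file names 1.txt, 2.txt and 3.txt and the 'all' rule are assumptions made for illustration.

```make
# 'all' asks for three files; each one is built by the pattern rule below.
all : 1.txt 2.txt 3.txt

# '%' matches any stem: '$@' is the full target name, '$*' is the matched stem.
%.txt :
	@echo "creating $@ (stem: $*)"
	touch $@
```

`make all` runs 'touch' three times, one file after the other; `make -j 3 all` lets make start up to three of these jobs at the same time.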
  • 49. The commands syntax Makefile syntax: <target> : (prerequisites) <commands associated to the rule>
  • 50. Inactivating verbose mode You can deactivate verbose mode for a line by adding '@' at its beginning
  • 51. Skipping errors The modifier '-' tells make to ignore errors returned by a command
  • 52. Example: 'mkdir /var' will cause an error (the '/var' directory already exists) and cause gnu/make to exit
  • 53. '-mkdir /var' will cause an error anyway, but gnu/make will ignore it
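
The difference between the two forms can be seen with a small sketch. The rule names 'strict' and 'tolerant' are hypothetical; the 'mkdir /var' command is the one from the slides.

```make
# 'mkdir /var' fails because /var already exists:
# make stops here and never reaches the echo.
strict :
	mkdir /var
	@echo 'this line is not reached'

# The leading '-' tells make to ignore the error and carry on.
tolerant :
	-mkdir /var
	@echo 'this line is reached'
```

In both cases mkdir prints its error message; only `make strict` makes make itself exit with an error.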
  • 54. Moving through directories A big issue with make is that every line is executed as a different shell process.
  • 55. So, this: lsvar : cd /var ls
  • 56. Won't work (it will list only the files in the current directory, not /var)
  • 57. The solution is to put everything in a single process:
  • 58. lsvar : (cd /var; ls)
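
Laid out as a Makefile, the two versions look like this. The rule names 'lsvar_wrong' and 'lsvar_and' are added here for illustration; the last rule is an alternative form of the same fix, not taken from the slides.

```make
# Each command line runs in its own shell, so the 'cd' is forgotten
# before 'ls' starts: this lists the current directory.
lsvar_wrong :
	cd /var
	ls

# Both commands on one line share a single shell process.
lsvar :
	(cd /var; ls)

# Same idea, but 'ls' only runs if the 'cd' succeeded.
lsvar_and :
	cd /var && ls
```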
  • 59. Third part Prerequisites and conditional execution
  • 60. The commands syntax Makefile syntax: <target> : (prerequisites) <commands associated to the rule> We will now look at the 'prerequisites' part of a make rule, which I skipped before
  • 61. Real Makefile-rule syntax Complete syntax for a Makefile rule: <target> : <list of prerequisites> <commands associated to the rule>
  • 62. Example: result1.txt : data1.txt data2.txt cat data1.txt data2.txt > result1.txt @echo 'result1.txt has been calculated'
  • 63. Prerequisites are files (or rules) that need to exist already in order to create the target file.
  • 64. If 'data1.txt' and 'data2.txt' don't exist, the rule 'result1.txt' will exit with an error (no rule to create them)
  • 65. Piping Makefile rules together You can pipe two Makefile rules together by defining prerequisites
  • 66. Piping Makefile rules together The rule 'result1.txt' depends on the rule 'data1.txt', which should be executed first
  • 67. Piping Makefile rules together Let's look at this example again:
  • 68. what happens if we remove the file 'result1.txt' we just created?
  • 71. The second time we run the 'make result1.txt' command, it is not necessary to create data1.txt again, so only one rule is executed
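
A pair of rules chained in this way could look like the sketch below. The slide images are not in the transcript, so the commands used to create data1.txt and result1.txt are assumptions.

```make
# result1.txt depends on data1.txt, so make builds data1.txt first if it is missing.
result1.txt : data1.txt
	cat data1.txt > result1.txt
	@echo 'result1.txt has been calculated'

# No prerequisites: this rule runs only when data1.txt does not exist.
data1.txt :
	echo 'some input data' > data1.txt
	@echo 'data1.txt has been created'
```

The first `make result1.txt` executes both rules. After `rm result1.txt`, a second `make result1.txt` executes only the 'result1.txt' rule, because data1.txt is still there.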
  • 72. Other pipe example all : result1.txt result2.txt result1.txt : data1.txt calculate_result.py python calculate_result.py --input data1.txt result2.txt : data2.txt cut -f 1,3 data2.txt > result2.txt
  • 73. 'make all' will calculate result1.txt and result2.txt if they don't already exist (or if they are older than their prerequisites)
  • 74. Conditional execution by modification date We have seen how make can be used to create a file if it doesn't exist. file.txt: # if file.txt doesn't exist, then create it: echo 'contents of file.txt' > file.txt
  • 75. We can do better: create or update a file only if it is missing or older than its prerequisites
  • 76. Conditional execution by modification date Let's have a better look at this example: result1.txt : data1.txt calculate_result.py python calculate_result.py --input data1.txt
  • 77. A great feature of make is that it executes a rule not only if the target file doesn't exist, but also if its 'last modification date' is earlier than that of any of its prerequisites
  • 78. Conditional execution by modification date result1.txt : data1.txt @sed 's/b/B/i' data1.txt > result1.txt @echo 'result1.txt has been calculated' In this example, result1.txt will be recalculated every time 'data1.txt' is modified
  • 79. $: touch data1.txt calculate_result.py $: make result1.txt result1.txt has been calculated $: make result1.txt result1.txt is already up-to-date $: touch data1.txt $: make result1.txt result1.txt has been calculated
  • 80. Conditional execution - applications This 'conditional execution by modification date comparison' feature of make is very useful
  • 81. Let's say you discover an error in one of your input data: you will be able to repeat the analysis by executing only the operations needed
  • 82. You can also use it to re-calculate results every time you modify a script: result.txt : scripts/calculate_result.py python scripts/calculate_result.py > result.txt
  • 84. Fourth part Variables and functions
  • 85. Variables and functions You may have already noticed that Make's syntax is really old :)
  • 86. In fact, it is a ~40-year-old language
  • 87. It uses special variables like $@, $^, and it can be worse than Perl!!!
  • 88. (perl developers – please don't get mad at me :-) )
  • 89. Variables Variables are declared with a '=' and by convention are upper case.
  • 90. They are referenced by enclosing their name in '$()': for example, WORKING_DIR is a variable and $(WORKING_DIR) is its value
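
A variable declaration and its use can be sketched as follows. WORKING_DIR is the name used on the slide; its value and the 'show_dir' rule are assumptions.

```make
# Declared with '=', upper case by convention.
WORKING_DIR = ./results

# make replaces $(WORKING_DIR) with './results' before running each command.
show_dir :
	@echo 'working directory: $(WORKING_DIR)'
	mkdir -p $(WORKING_DIR)
```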
  • 91. Special variables - $@ Make uses some custom variables, with a syntax similar to perl
  • 92. '$@' always corresponds to the target name: $: cat >Makefile %.txt : echo $@ $: make filename.txt echo filename.txt filename.txt $: $@ took the value of 'filename.txt'
  • 93. Other special variables $@: the rule's target; $<: the rule's first prerequisite; $?: all the rule's out-of-date prerequisites; $^: all prerequisites
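
To see these variables in action, a rule can simply print them. This is a sketch with hypothetical file names (merged.txt, part1.txt, part2.txt).

```make
merged.txt : part1.txt part2.txt
	@echo "target:                    $@"
	@echo "first prerequisite:        $<"
	@echo "all prerequisites:         $^"
	@echo "out-of-date prerequisites: $?"
	cat $^ > $@

# Creates the hypothetical input files, so that the example is self-contained.
# With several targets on one rule, '$@' is the one currently being built.
part1.txt part2.txt :
	echo "contents of $@" > $@
```

After a first `make merged.txt`, running `touch part2.txt` and calling make again shows only part2.txt in the '$?' line.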
  • 94. Functions Usually you don't want to declare functions in make, but there are some built-in utilities that can be useful
  • 95. Most frequently used functions: $(addprefix <prefix>, list) -> add a prefix to each word of a space-separated list example: FILES = file1 file2 file3 $(addprefix /home/user/data, $(FILES))
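
A small sketch shows what addprefix produces, together with its counterpart addsuffix. The variable names PATHS and TXT_PATHS and the 'show_paths' rule are assumptions; note that the prefix here ends with a slash, because the function only concatenates strings.

```make
FILES = file1 file2 file3

# The prefix is glued to the front of each word, so the trailing slash matters.
PATHS = $(addprefix /home/user/data/,$(FILES))

# addsuffix does the same at the end of each word.
TXT_PATHS = $(addsuffix .txt,$(PATHS))

show_paths :
	@echo $(PATHS)
	@echo $(TXT_PATHS)
```

`make show_paths` prints `/home/user/data/file1 /home/user/data/file2 /home/user/data/file3`, followed by the same three paths ending in `.txt`.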
  • 97. Full makefile example INPUTFILES = lower_DAF lower_maf upper_maf lower_daf upper_daf RESULTSDIR = ./results RESULTFILES = $(addprefix $(RESULTSDIR)/, $(addsuffix _filtered.txt,$(INPUTFILES))) help : @echo 'type "make filter" to calculate results' all : $(RESULTFILES) $(RESULTSDIR)/%_filtered.txt : data/%.txt src/filter_genes.py python src/filter_genes.py --genes data/Genes.txt --window $< --output $@ It looks very complicated, but in the end you always use the same Makefile structure
  • 98. Fifth part Testing, discussion, other examples and alternatives
  • 99. Testing a makefile make -n: only shows the commands to be executed
  • 100. You can pass variables to make: $: make say_hello MYNAME="Giovanni" hello, Giovanni
  • 101. Strongly suggested: use a Revision Control Software with support for branching (git, hg, bazaar) and create a branch for testing
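
The 'say_hello' rule itself is not shown in the transcript; a minimal sketch of a Makefile that behaves as described could be the following. The default value of MYNAME is an assumption.

```make
# Default value, replaced by whatever is passed on the command line.
MYNAME = world

say_hello :
	@echo 'hello, $(MYNAME)'
```

`make say_hello MYNAME="Giovanni"` prints `hello, Giovanni`, while `make -n say_hello` only shows the command that would be run (`echo 'hello, world'`) without executing it.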
  • 102. Another complex Makefile example our starting point is the file myseq , the end point is the blast results blastout
  • 103. we first want to mask out any repeats using rmask to create myseq.m
  • 104. we then blastx myseq.m against a protein db called mydb
  • 105. before blastx is run the protein db must be indexed using formatdb (slide taken from biomake web site)
  • 106. The "make" command make uses Unix file modification timestamps when checking dependencies: if a subtarget is more recent than the goal target, the action is re-executed (slide taken from biomake web site)
  • 107. BioMake and alternatives BioMake is an alternative to make, designed for use in bioinformatics
  • 108. Developed to annotate the Drosophila melanogaster genome (Berkeley university)
  • 110. Separates the rule's name from the name of the target files
  • 111. A BioMake example (slide taken from biomake web site)
  • 112. Other alternatives There are many other alternatives to make: BioMake (prolog?)
  • 117. Waf (python) This list is biased because I am a python programmer :)
  • 118. These tools are more oriented to software development
  • 119. Conclusions Make is very basic for bioinformatics
  • 120. It is useful for the simpler tasks: Logging the operations made to your data files
  • 123. Apply a pipeline to different datasets. It is installed on almost any Unix system and has a standard syntax (interchangeable, reproducible)
  • 124. Study it and understand its logic. Use it in the most basic way, without worrying about prerequisites and special variables. Later you can look for easier tools (biomake, rake, taverna, galaxy, your own, etc..)
  • 125. Suggested readings Software Carpentry for bioinformatics http://swc.scipy.org/lec/build.html
  • 126. A Makefile is a pipeline http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefile
  • 127. BioMake and SKAM http://skam.sourceforge.net/
  • 128. BioWiki Make Manifesto http://biowiki.org/MakefileManifesto
  • 129. Discussion on the BIP mailing list http://www.mail-archive.com/biology-in-python@lists.idyll.org/msg00013.html
  • 130. GNU Make manual by R. Stallman and R. McGrath http://theory.uwinnipeg.ca/gnu/make/make_toc.html
  • 131. End of talk!! Are you still alive? :-)
  • 132. Thanks to: The author of 'Software carpentry for bioinformatics'
  • 133. The people in the bip mailing list, for discussion
  • 134. The author of bioinformaticszen.org and the people on nodalpoint for priming
  • 135. All the people that have worked on this topic or who wrote a blog post / free internet document on it And thanks to you all!!
  • 137. Inactivating verbose mode In make, verbose mode is activated by default
  • 138. Every time a command is called, make shows the exact line that is being executed
  • 139. Makefile syntax Make is also a real programming language, 30 years old, with a syntax similar to bash.
  • 140. It is a declarative language. In a make source code file, you define a set of rules, each one corresponding to a task, with this syntax: <target> : <list of prerequisites> <commands associated to the rule>
  • 141. Example: results.txt : data1.txt cut -f 1-10 data1.txt > results.txt