biopython, doctest and makefiles

This is a very short, 30-minute talk that I gave at a Barcelona Python developers meeting.

It explains a proposal to use doctest for biopython documentation (and, more generally, in bioinformatics).

It also introduces automated build tools such as make and scons and their use in bioinformatics.

Published in: Technology

  • 1. Barcelona Python Developers Seminars: biopython, doctest and makefiles
  • 2. This is me
    • Giovanni
    • PhD student in a Population Genetics lab
    • Not a biopython dev
    • (that might not be my real photo)
  • 3. Intro
    • BioPython -> a collection of standard Python modules for bioinformatics
    • Advantages of using open source libraries in science:
      • more reproducibility
      • easier to compare results
      • fewer errors
      • less time spent
  • 4. BioPython – some use cases
    • The human genome sequencing project (2001):
    • Up to ~3×10^9 characters
    • Lots of regexes (Perl-ists like them)
    • A genome could be obtained for under $1000 in the near future
  • 5. BioPython – use cases
    • Conversion between different formats
    • Structure data into objects (genes, proteins, species, etc.)
    • Match regular expressions/motifs
    • Launch external tools (web or local)
    • Retrieve data from public online resources
    • Interrogate databases
  • 6. BioPython documentation
    • What should the documentation of a project like biopython look like?
      • follow strict specifications (it already does, with epydoc)
      • always be up to date
      • have many usage examples (there are many in the tutorials)
    • A Python module called 'doctest' can help with this.
  • 7.
          def say_hello(name):
              '''
              print hello <name> to the screen

              example:
              >>> say_hello('Albert Einstein')
              hello Albert Einstein!!!
              '''
              print('hello ' + name + '!!!')
    • doctest lets you embed examples of a function's usage in its docstring and run them as tests.
    Slide labels: the example of say_hello's usage; the function's docstring (everything in green on the slide)
  • 8. The docstring
    • The docstring is what is shown when you ask for help on a function:
          >>> help(say_hello)
          Help on function say_hello in module __main__:

          say_hello(name)
              print hello <name> to the screen

              example:
              >>> say_hello('Albert Einstein')
              hello Albert Einstein!!!
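
The text that help() displays is stored on the function object itself. A minimal sketch in Python 3 syntax, reusing say_hello from the slides; the variable names raw_doc, clean_doc and first_line are only illustrative.

```python
import inspect


def say_hello(name):
    """print hello <name> to the screen

    example:
    >>> say_hello('Albert Einstein')
    hello Albert Einstein!!!
    """
    print('hello ' + name + '!!!')


# The raw docstring lives in the function's __doc__ attribute.
raw_doc = say_hello.__doc__

# inspect.getdoc() returns the same text with the common indentation
# removed, which is the form help() shows.
clean_doc = inspect.getdoc(say_hello)
first_line = clean_doc.splitlines()[0]
```

Because the docstring is ordinary data, tools such as doctest can read it and pick out the lines that start with '>>>'.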
  • 9. doctest – how does it work
          #!/usr/bin/env python

          def sum(x, y):
              '''
              sums two numbers

              example:
              >>> print(sum(1, 2))
              3
              '''
              return x + y

          if __name__ == '__main__':
              import doctest
              doctest.testmod()
    • doctest.testmod() searches the module's docstrings for lines beginning with '>>>' and executes each one as a Python command
    • The output is compared with the lines that follow (the expected output). If they differ, the test fails and the difference is reported.
    • If 'print(sum(1, 2))' doesn't print 3, a failure is reported
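
To see the comparison step in action, the sketch below runs the docstring examples of two functions and counts the failures. It is written in Python 3 syntax and makes a few assumptions: the slide's sum is renamed add so it does not shadow the built-in, broken_add is a deliberately wrong variant, and count_doctest_results is a hypothetical helper built on the standard DocTestFinder and DocTestRunner classes.

```python
import doctest


def add(x, y):
    """sums two numbers

    example:
    >>> add(1, 2)
    3
    """
    return x + y


def broken_add(x, y):
    """sums two numbers (deliberately wrong, to show a failing doctest)

    example:
    >>> broken_add(1, 2)
    3
    """
    return x - y


def count_doctest_results(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    globs = {func.__name__: func}
    for test in finder.find(func, func.__name__, globs=globs):
        # Discard the printed failure report; only the counters are kept.
        runner.run(test, out=lambda text: None)
    return runner.failures, runner.tries
```

Calling count_doctest_results(add) reports no failures, while count_doctest_results(broken_add) reports one, because the function prints -1 where the docstring promises 3.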
  • 10. doctest - examples
    • BioPython - SeqIO.parse
  • 11. doctest – file parsing example
    • In bioinformatics there are many file formats with confusingly similar names
      • ped, tped, bed, tmap, pdb, fasta...
    • It is useful to include an example input file in the docstring of every parser function
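
One way to follow this advice is sketched below in Python 3 syntax. The read_fasta function is a hypothetical, simplified parser written for this example (it is not biopython's SeqIO), and the sequences are made up. The docstring embeds a tiny input file through io.StringIO, so the reader sees the expected format and doctest can check the result.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from an open file handle into a {name: sequence} dict.

    Example input, embedded so that the format is visible to the reader:

    >>> example_file = io.StringIO('''>seq1
    ... ACGT
    ... TTGA
    ... >seq2
    ... GGCC
    ... ''')
    >>> records = read_fasta(example_file)
    >>> sorted(records.items())
    [('seq1', 'ACGTTTGA'), ('seq2', 'GGCC')]
    """
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            name = line[1:]  # header line: start a new record
            records[name] = ''
        elif name is not None:
            records[name] += line  # sequence line: append to current record
    return records
```

Taking a file handle instead of a file name is the design choice that makes this testable: the same function works on a real file opened with open() and on the in-memory example.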
  • 12. Choose good examples
    • Write the doctest together with whoever will use the script (e.g. a fellow scientist)
    • Ask them 'how is this function supposed to behave in this example?'
    • Simplify: round all numbers to multiples of 100, add comments
  • 13. Doctest – Pros and Cons
    • Pros:
      • docs always up to date
      • Usage examples
      • Quick tests when you are coding
    • Cons:
      • Functions that read files are awkward to test (StringIO? NamedTemporaryFile?)
      • You still need to write unit tests
      • Can't use lines longer than 80 characters (PEP 8)
      • Random number generators / statistics / rounding make the output hard to match
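
The rounding and randomness problems can often be worked around inside the docstring. The sketch below (Python 3 syntax) shows two common tricks under stated assumptions: mean and sample_alleles are hypothetical functions invented for this example, and the random example checks only properties that hold for any seed instead of a specific draw.

```python
import random


def mean(values):
    """Return the arithmetic mean of a list of numbers.

    Floating-point noise would make the raw result hard to match, so the
    example rounds it before displaying it:

    >>> round(mean([0.1, 0.2, 0.3]), 3)
    0.2
    """
    return sum(values) / len(values)


def sample_alleles(alleles, n, seed):
    """Draw n alleles at random, reproducibly for a given seed.

    The exact draw depends on the generator, so the example only checks
    properties that do not:

    >>> draw = sample_alleles(['A', 'C', 'G', 'T'], 5, seed=42)
    >>> len(draw)
    5
    >>> draw == sample_alleles(['A', 'C', 'G', 'T'], 5, seed=42)
    True
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.choice(alleles) for _ in range(n)]
```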
  • 14. Bioinformatics – a different approach
    • Programming software and programming experiments call for different approaches:
      • Testing has different dimensions (biological meaning, reproducibility)
      • Usually you write numerous scripts, each one carrying out a small task, and glue them together with a pipeline / wrapper script / makefile / automated build tool / XML-described workflow / (insert others here)
    • I am a makefile guy
  • 15. What is a makefile?
    • GNU Make is a utility for building C/C++ programs.
    • It can be used to save shell commands (...) with their options and re-execute them at will.
    • Example:
          $ make all
          python --option1 --option2
          perl --input inputfile --option3
          perl --inputfile inputfile2
  • 16. Simplest Makefile example
          $ cat Makefile
          help:
              echo 'execute "make all" to carry out the whole analysis'

          get_data:
              python --database ensembl --specie Human --output sequences.fasta

          calculate_results:
              perl --option1 --option2 --input sequences.fasta --output results.txt

          all: get_data calculate_results
    (in a real Makefile each command line must start with a tab)
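
The last rule says that 'all' depends on get_data and calculate_results, so make runs those two first, in the order listed. A rough Python sketch of that ordering logic, not of make's real implementation: RULES copies the targets from the slide, build_order is a hypothetical helper, and cycle detection is left out for brevity.

```python
# Targets and their prerequisites, copied from the Makefile on the slide.
RULES = {
    'help': [],
    'get_data': [],
    'calculate_results': [],
    'all': ['get_data', 'calculate_results'],
}


def build_order(target, rules, done=None):
    """Return the targets to run for `target`, prerequisites first."""
    if done is None:
        done = []
    for prerequisite in rules[target]:
        # Visit every prerequisite before the target itself.
        build_order(prerequisite, rules, done)
    if target not in done:
        done.append(target)
    return done
```

With these rules, build_order('all', RULES) lists get_data, then calculate_results, then all.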
  • 17. Makefiles – Pros
    • Conditional execution
      • If there is no need to execute a command, it is skipped (checks if the expected output file already exists and is up-to-date)
    • Chaining commands
      • You can define the order in which commands must be executed (download sequences first, then read them)
    • Support for clusters
    • Syntax is ugly, but standard
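
The conditional execution described above comes down to comparing file modification times. A minimal Python sketch of that check, assuming targets and prerequisites are plain files; needs_rebuild is a hypothetical helper and ignores everything else make does.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if `target` is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True  # no output file yet: the command must run
    target_time = os.path.getmtime(target)
    # Rebuild when at least one input was modified after the output.
    return any(os.path.getmtime(p) > target_time for p in prerequisites)
```

For example, needs_rebuild('results.txt', ['sequences.fasta']) is true when results.txt does not exist yet or when sequences.fasta has been downloaded again since results.txt was written.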
  • 18. Make - Cons
    • GNU Make has a very ugly syntax
    • Really, I hate its syntax
    • I am looking for substitutes in python:
      • scons
      • paver
      • waf (Google Summer of Code project)
    • I still haven't started using them
    • Implement something in biopython?
  • 19. A more complicated Makefile
    • Variables like %, $@, $<
    • Modifiers like -, @
    • addprefix, addsuffix ??
    • Triple parentheses ??
  • 20. Thanks for the attention! Did you like the talk?
  • 21. BioPython – use cases
    • Single Nucleotide Polymorphisms (SNPs) are positions in the genome that tend to vary the most between individuals
    • We are working with data on 650,000 SNPs in on the order of 1000 individuals
    • We need to organize the data into objects (SNPs, Genotypes, Individuals, Populations), use a database for support, and calculate statistics on them
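
As an illustration of organizing such data into objects, here is a small Python 3 sketch. The SNP class, its attributes, the identifier 'rs_example' and the genotypes are all hypothetical and made up for this example; they are not taken from biopython or from the speaker's code. The statistic is documented with a doctest, in the spirit of the talk.

```python
from collections import Counter


class SNP:
    """A hypothetical container for one SNP and its observed genotypes."""

    def __init__(self, name, chromosome, position):
        self.name = name
        self.chromosome = chromosome
        self.position = position
        self.genotypes = {}  # individual id -> genotype string such as 'AG'

    def add_genotype(self, individual, genotype):
        self.genotypes[individual] = genotype

    def allele_frequencies(self):
        """Return {allele: frequency} over all genotyped individuals.

        >>> snp = SNP('rs_example', '1', 1000)
        >>> snp.add_genotype('ind1', 'AA')
        >>> snp.add_genotype('ind2', 'AG')
        >>> sorted(snp.allele_frequencies().items())
        [('A', 0.75), ('G', 0.25)]
        """
        # Count every allele character across all genotype strings.
        counts = Counter(a for g in self.genotypes.values() for a in g)
        total = sum(counts.values())
        return {allele: n / total for allele, n in counts.items()}
```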
  • 22. Doctest – a closer look
          #!/usr/bin/env python

          def say_hello(name):
              '''
              print hello (name) to the screen

              example:
              >>> say_hello('Albert Einstein')
              hello Albert Einstein!!!
              '''
              print('hello ' + name + '!!!')

          if __name__ == '__main__':
              import doctest
              doctest.testmod()
    Slide labels:
      • new function definition: def say_hello(name):
      • normal doc: print hello (name) to the screen
      • example of function usage: >>> say_hello('Albert Einstein')
      • expected output: hello Albert Einstein!!!
      • body of the function: the final print line
      • call to the doctest module: doctest.testmod()