Prins Bio Lib Bosc 2009

  1. BioLib Development Report (BOSC 2009)
     C and C++ libraries for BioPerl, BioJAVA, BioPython, BioRuby...
     Pjotr Prins (pjotr.prins at wur.nl)
     Wageningen University, Dept. of Nematology; Groningen Bioinformatics Center
  2. The stated problem
     • Many high-level languages are used in biology (Perl, R, Java...)
     • Effort is duplicated across all Bio* projects (BioPerl, BioConductor, BioJAVA...), in particular for data IO, parsing and interpretation (Alan’s keynote)
  3. What if?
     What if you need some functionality (e.g. linear regression) in Perl? You can:
     • Roll your own in Perl (performance?)
     • Bind against an existing C library using Perl-XS (ugh)
     • Bind using SWIG (better, but one-off like Perl::GSL)
     • Bind using SWIG with BioLib (all languages)
     In fact, it may already be there (GSL or Rlib)
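To make the "roll your own" option concrete, here is a minimal sketch of an ordinary least-squares fit written directly in a high-level language. It is in Python rather than Perl, and the function name `linear_regression` is made up for illustration; it is not part of BioLib, GSL or Rlib. Every Bio* project that needs this ends up writing and testing something similar, which is the duplication the slide argues against.

```python
def linear_regression(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

A binding to an existing, well-tested C implementation would replace the body of this function with a single call, at the cost of a build step.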
  4. DRY-DRO
     • Do not repeat yourself (DRY)
     • Do not repeat ourselves (DRO)
     • Bio*: BioPerl, BioPython, BioRuby, BioJAVA, BioConductor, BioHaskell, BioCPP...
     • Limited pool of programmers in bioinformatics
     • Usually 2 or 3 competing implementations
     • Use existing implementations
  5. Why bother?
     Open Source Software is about eyes
  6. Eyes!
     Eyes like these!
  7. Eyes (3)
     Eyes like these!...
  8. Eyes (5)
     Well, realistically...
  9. BioLib project
     Objectives:
     • Utilize existing C/C++ libraries
     • Create mappings to all Bio* languages
     • Focus on correctness and performance
     • A central place (plumbing)
     An OBF affiliated project
  10. Power Trio
      Plumbing power trio:
      • Git - modular version control
      • CMake - makefile generator
      • SWIG - simplified wrapper and interface generator
  11. Power trio (1): Git
      • Version control on steroids
      • What source control should be
      • Easy branching of development
      • Submodules
  12. Power trio (2): CMake
      • Generator for makefiles
      • Very modular approach
      • Resolves complex dependencies
      • Looks like a simple programming language
      • Easy on the eyes and mind
  13. Power trio (3): SWIG
      Code generator for mappings done right:
      • Rules for generating code
      • Macros (DRY)
      • Pattern matching
      • Flexible
      • Supports many languages
  14. Achievements (year one)
      • Affyio: Affymetrix arrays (357 methods; 10K lines)
      • Staden: Sequencer trace files (95; 16K)
      • GSL: GNU Scientific Library (2702; 200K)
      • Rlib: R routines (> 176; 43K)
      • R/qtl: Quantitative genetics (> 100; 10K)*
      • Libsequence: Sequence analysis (> 1000; 21K)*
      • Bio++: Sequence analysis (> 1000; 52K)*
      Code base 350K lines
      USD 10 million R&D
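As a quick arithmetic check, the per-library figures on this slide can be tallied against the stated total. The sketch below assumes the second number in each entry is the source size in thousands of lines and treats the "> N" method counts as lower bounds; the variable names are illustrative only.

```python
# (library, minimum number of mapped methods, source size in K lines),
# copied from the slide. Counts given as "> N" are entered as N.
libraries = [
    ("Affyio", 357, 10),
    ("Staden", 95, 16),
    ("GSL", 2702, 200),
    ("Rlib", 176, 43),
    ("R/qtl", 100, 10),
    ("Libsequence", 1000, 21),
    ("Bio++", 1000, 52),
]

total_klines = sum(klines for _, _, klines in libraries)
min_methods = sum(methods for _, methods, _ in libraries)

print("total source size: %dK lines" % total_klines)
print("mapped methods: at least %d" % min_methods)
```

The line counts add up to 352K, which agrees with the rounded "350K lines" on the slide.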
  15. Source tree
        |-- clibs
        |   |-- affyio-1.8
        |   |-- biolib_R
        |   |-- biolib_microarray
        |   |-- libsequence-1.6.6
        |-- mappings
        |   `-- swig
        |       |-- perl
        |       |   |-- affyio
        |       |   |-- staden_io_lib
        |       |   `-- test
        |       |-- python
        |       |-- ruby

        104 directories, 668 files
  16. Adding a C lib
      • Unpack the C/C++ library in ./src/clibs/modulename
      • Add a CMake file, which compiles the library into a .so shared library
      • Create the Perl mapping in ./src/mappings/swig/perl/module
      • Add a SWIG .i file
      • Add a CMake file, which compiles the mapping into a .pm module and a .so shared library
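The steps above can be sketched as a small helper that lists the files a new module would need. This is only an illustration of the layout: the function `planned_files` is hypothetical, `CMakeLists.txt` is CMake's standard file name, the `<module>.i` name for the SWIG interface is an assumption, and the `mappings` directory name follows the source tree listing on the previous slide.

```python
from pathlib import PurePosixPath


def planned_files(module, language="perl", root="src"):
    """List the files to create when adding a C library and one language mapping."""
    clib_dir = PurePosixPath(root) / "clibs" / module
    mapping_dir = PurePosixPath(root) / "mappings" / "swig" / language / module
    return [
        str(clib_dir / "CMakeLists.txt"),     # builds the C library into a shared object
        str(mapping_dir / (module + ".i")),   # SWIG interface file (assumed file name)
        str(mapping_dir / "CMakeLists.txt"),  # builds the language module and its shared object
    ]


for path in planned_files("staden_io_lib"):
    print(path)
```

Calling `planned_files("staden_io_lib", language="python")` would give the corresponding paths for a Python mapping of the same library.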
  17. CMake goodies
        # Defining a C library build in BioLib:
        SET (M_NAME staden_io_lib)
        SET (M_VERSION 1.11.6)
        FIND_PACKAGE(ZLIB REQUIRED)
        BUILD_CLIB()
        ADD_LIBRARY(${LIBNAME} SHARED
          array.c compress.c compression.c ctfCompress.c
          (...)
        )
        INSTALL_CLIB()
  18. CMake for Perl
        # Defining a C library mapping for Perl
        SET (USE_ZLIB TRUE)
        SET (USE_INCLUDEPATH io_lib)
        FIND_PACKAGE(MapPerl)
        POST_BUILD_PERL_BINDINGS()
        TEST_PERL_BINDINGS()
        INSTALL_PERL_BINDINGS()
  19. SWIG Map
        %include <Read.h>
        #define TT_ANY 0
        #define TT_ZTR 7
        typedef struct {
          int format;
          char *trace_name;
          int NPoints;
          int NBases;
          (...)
        } Read;
        Read *read_reading(char *fn, int format);
  20. Perl
        use biolib::staden_io_lib;
        $result = staden_io_lib::read_reading($fn, $staden_io_lib::TT_ANY);
        print("format=",staden_io_libc::Read_format_get($result));
        print("NBases=",$result->{NBases});
        print("base=",staden_io_libc::Read_base_get($result));

      Outputs:
        format=7
        NBases=766
        base=NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCTCGGTCCCAACTTAATTGTACA...
  21. Python
        import biolib.staden_io_lib as io_lib
        result = io_lib.read_reading(procsrffn, io_lib.TT_ANY)
        print(result.format)
        print(result.NBases)
        print(result.base)

      Outputs:
        7
        766
        NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCTCGGTCCCAACTTAATTGTACA...
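The mapped `Read` object behaves like an ordinary Python object with attributes, so code that consumes it can be tried without the compiled library. The sketch below is an assumption-laden illustration: `summarize_read` and `FORMAT_NAMES` are hypothetical names, the two format codes are the only ones shown on the SWIG slide, and a `SimpleNamespace` stands in for the object that `io_lib.read_reading()` would return.

```python
from types import SimpleNamespace

# Format codes copied from the SWIG interface slide; other codes are not shown there.
FORMAT_NAMES = {0: "TT_ANY", 7: "TT_ZTR"}


def summarize_read(read):
    """Return a one-line summary for any object with format, NBases and base attributes."""
    name = FORMAT_NAMES.get(read.format, "unknown")
    return "format=%d (%s) NBases=%d base=%s..." % (
        read.format, name, read.NBases, read.base[:10])


# Stand-in for the mapped Read object; values are taken from the slide output.
fake_read = SimpleNamespace(
    format=7,
    NBases=766,
    base="NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCT",
)
print(summarize_read(fake_read))
```

With the real bindings installed, the same function should accept the result of `io_lib.read_reading(...)` directly, since it only relies on the three attributes printed on the slide.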
  22. For the Perl coder
      • Adding functionality in language of choice
      • Easier deployment - 'install biolib-perl'
      • Shared correctness testing
      • Generated API documentation
  23. For the authors
      • Independent source trees
      • Increased exposure (Ruby, Perl...)
      • Added unit/integration testing environment
      • Deployment, multi-platform support (Linux, OSX, Windows)
      • No autoconf pain (./configure and friends)
      • Implicit access to other libraries (GSL, Rlib)
      • Online generated API documentation
  24. Future work
      • Automated API documentation (with doctests)
      • More libraries (Emboss, NCBI...)
      • New code (HPC)
      • More languages (JAVA, R, OCaml...)
      • Bio* integration (CPAN, Ruby gems, Python eggs)
      • Debian/Fedora/OSX/Windows packages
      • More platforms (Windows without Cygwin)
  25. Credits
      • Ben Bolstad (Affyio), James Bonfield (Staden), Karl Broman (R/qtl)
      • Jonathan Leto (GSL SWIG)
      • Xin Shuai (Google SoC libsequence)
      • Adam Smith (Google SoC Bio++)
      • Oswaldo Trelles, José Manuel Mateos-Duran and Andrés Rodríguez (UMA)
      • Chris Fields (BioPerl), Mark Jensen (BioPerl), Hilmar Lapp (NESCent, OBF)
      • Jaap Bakker (WU), Geert Smant (WU), Ritsert Jansen (GBIC)
  26. BoF
      BioLib: Birds of a Feather Session (BoF) at 16:50 hours
