Transcript of "Prins Bio Lib Bosc 2009"

  1. BioLib Development Report (BOSC 2009)
     C and C++ libraries for BioPerl, BioJAVA, BioPython, BioRuby...
     Pjotr Prins (pjotr.prins at wur.nl)
     Wageningen University, Dept. of Nematology; Groningen Bioinformatics Center
  2. The stated problem
     Many high-level languages used in Biology (Perl, R, Java...)
     Duplication of effort in all Bio* efforts - BioPerl, BioConductor, BioJAVA...
     in particular for data IO/parsing/interpretation (Alan's keynote)
  3. What if?
     What if you need some functionality (e.g. linear regression) in Perl, you can:
       Roll your own in Perl (performance?)
       Bind against existing clib using Perl-XS (ugh)
       Bind using SWIG (better, but one-off like Perl::GSL)
       Bind using SWIG with Biolib (all languages)
     In fact, it may already be there (GSL or Rlib)
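
To make the first option on slide 3 concrete, here is a minimal sketch of "rolling your own" linear regression. It is written in Python rather than Perl so that it matches the Python slide later in the talk, and the function name `linear_fit` is hypothetical. The binding options on the slide would instead hand this work to an already tested C routine, for example `gsl_fit_linear` in GSL.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Hypothetical helper for illustration; not part of BioLib.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and the x/y cross term.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Points that lie exactly on y = 1 + 2x.
intercept, slope = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)
```

Even this small routine needs its own input checks and tests, which is the duplicated effort the slide argues against.
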
  4. DRY-DRO
     Do not repeat yourself (DRY)
     Do not repeat ourselves (DRO)
     Bio*: BioPerl, BioPython, BioRuby, BioJAVA, BioConductor, BioHaskell, BioCPP, ...
     Limited pool of programmers in bioinformatics
     Usually 2 or 3 competing implementations
     Use existing implementations
  5. Why bother?
     Open Source Software is about eyes
  6. Eyes!
     Eyes like these!
  7. Eyes (3)
     Eyes like these!...
  8. Eyes (5)
     Well, realistically...
  9. BioLib project
     Objectives:
       Utilize existing C/C++ libraries
       Create mappings to all Bio* languages
       Focus on correctness and performance
       A central place (plumbing)
     An OBF affiliated project
  10. Power Trio
      Plumbing power trio:
        Git - modular version control
        CMake - make file generator
        SWIG - simplified wrapper and interface generator
  11. Power trio (1)
      GIT
        Version control on steroids
        What source control should be
        Easy branching of development
        Submodules
  12. Power trio (2)
      CMake
        Generator for make files
        Very modular approach
        Resolves complex dependencies
        Looks like a simple programming language
        Easy on the eyes and mind
  13. Power trio (3)
      SWIG
        Code generator for mappings done right:
          Rules for generating code
          Macros (DRY)
          Pattern matching
        Flexible
        Supports many languages
  14. Achievements (year one)
      Affyio: Affymetrix arrays (357 methods; 10K lines)
      Staden: Sequencer trace files (95; 16K)
      GSL: GNU Scientific Library (2702; 200K)
      Rlib: R routines (> 176; 43K)
      R/qtl: Quantitative genetics (> 100; 10K)*
      Libsequence: Sequence analysis (> 1000; 21K)*
      Bio++: Sequence analysis (> 1000; 52K)*
      Code base 350K lines
      USD 10 million R&D
  15. Source tree
        |-- clibs
        |   |-- affyio-1.8
        |   |-- biolib_R
        |   |-- biolib_microarray
        |   |-- libsequence-1.6.6
        |-- mappings
        |   `-- swig
        |       |-- perl
        |       |   |-- affyio
        |       |   |-- staden_io_lib
        |       |   `-- test
        |       |-- python
        |       |-- ruby
        104 directories, 668 files
  16. Adding a C lib
      Unpack C/C++ library in ./src/clibs/modulename
      Add CMake file - compiles into .so shared library
      Create Perl mapping in ./src/mapping/swig/perl/module
      Add SWIG .i file
      Add CMake file - compiles into .pm and .so shared library
  17. CMake goodies
        # Defining a C library build in Biolib:
        SET (M_NAME staden_io_lib)
        SET (M_VERSION 1.11.6)
        FIND_PACKAGE(ZLIB REQUIRED)
        BUILD_CLIB()
        ADD_LIBRARY(${LIBNAME} SHARED
          array.c compress.c compression.c ctfCompress.c (...)
        INSTALL_CLIB()
  18. CMake for Perl
        # Defining a C library mapping for Perl
        SET (USE_ZLIB TRUE)
        SET (USE_INCLUDEPATH io_lib)

        FIND_PACKAGE(MapPerl)
        POST_BUILD_PERL_BINDINGS()
        TEST_PERL_BINDINGS()
        INSTALL_PERL_BINDINGS()
  19. SWIG Map
        %include <Read.h>

        #define TT_ANY 0
        #define TT_ZTR 7

        typedef struct
        {
            int   format;
            char *trace_name;
            int   NPoints;
            int   NBases;
            (...)
        } Read;

        Read *read_reading(char *fn, int format);
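
As a rough illustration of what the mapping on slide 19 gives a script author, the sketch below mirrors the `Read` declaration with Python's standard `ctypes` module, so that C struct members show up as plain attributes. This is an assumption-laden stand-in: `ctypes` is used here in place of SWIG-generated code, only the four fields visible on the slide are declared (the real struct has more, elided as `(...)`), and the field values are made up, so this layout must not be used against the real io_lib library.

```python
import ctypes

# Format constants copied from the slide.
TT_ANY = 0
TT_ZTR = 7


class Read(ctypes.Structure):
    """Partial mirror of the C `Read` struct (illustration only).

    Only the fields shown on the slide are listed; the real struct
    continues after NBases, so this layout is incomplete.
    """
    _fields_ = [
        ("format", ctypes.c_int),
        ("trace_name", ctypes.c_char_p),
        ("NPoints", ctypes.c_int),
        ("NBases", ctypes.c_int),
    ]


# Illustrative values; the trace name and NPoints are hypothetical.
read = Read(format=TT_ZTR, trace_name=b"example.ztr", NPoints=0, NBases=766)
print(read.format, read.NBases)
```

With SWIG the equivalent accessors are generated from the `.i` file, so nobody has to keep a hand-written mirror like this in sync with the C header.
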
  20. Perl
        use biolib::staden_io_lib;

        $result = staden_io_lib::read_reading($fn, $staden_io_lib::TT_ANY);
        print("format=",staden_io_libc::Read_format_get($result));
        print("NBases=",$result->{NBases});
        print("base=",staden_io_libc::Read_base_get($result));

      Outputs:
        format=7
        NBases=766
        base=NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCT
        CGGTCCCAACTTAATTGTACA...
  21. Python
        import biolib.staden_io_lib as io_lib

        result = io_lib.read_reading(procsrffn, io_lib.TT_ANY)
        print(result.format)
        print(result.NBases)
        print(result.base)

      Outputs:
        7
        766
        NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCT
        CGGTCCCAACTTAATTGTACA...
  22. For the Perl coder
      Adding functionality in language of choice
      Easier deployment - 'install biolib-perl'
      Shared correctness testing
      Generated API documentation
  23. For the authors
      Independent source trees
      Increased exposure (Ruby, Perl...)
      Added unit/integration testing environment
      Deployment, multi-platform support (Linux, OSX, Windows)
      No autoconf pain (./configure and friends)
      Implicit access to other libraries (GSL, Rlib)
      Online generated API documentation
  24. Future work
      Automated API documentation (with doctests)
      More libraries (Emboss, NCBI, ...)
      New code (HPC)
      More languages (JAVA, R, OCaml, ...)
      Bio* integration (CPAN, Ruby gems, Python eggs)
      Debian/Fedora/OSX/Windows packages
      More platforms (Windows without Cygwin)
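
Slide 24 mentions API documentation with doctests. A minimal sketch of the idea in Python follows: the example in the docstring is both the documentation and a test. The function `count_bases` is hypothetical and not part of BioLib; its input is the start of the base string shown on the Python slide.

```python
import doctest


def count_bases(sequence):
    """Count how often each base letter occurs in a sequence string.

    >>> count_bases("NCTTGGGAAA")
    {'A': 3, 'C': 1, 'G': 3, 'N': 1, 'T': 2}
    """
    counts = {}
    # Iterate in sorted order so the resulting dict has a stable key order.
    for base in sorted(sequence):
        counts[base] = counts.get(base, 0) + 1
    return counts


# Collect and run the docstring examples of this one function.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(count_bases, "count_bases", module=False,
                        globs={"count_bases": count_bases}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```

For a whole file, running `python -m doctest` on it does the same check from the command line.
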
  25. Credits
      Ben Bolstad (Affyio), James Bonfield (Staden), Karl Broman (R/qtl)
      Jonathan Leto (GSL SWIG)
      Xin Shuai (Google SoC libsequence)
      Adam Smith (Google SoC Bio++)
      Oswaldo Trelles, José Manuel Mateos-Duran and Andrés Rodríguez (UMA)
      Chris Fields (BioPerl), Mark Jensen (BioPerl), Hilmar Lapp (NESCent, OBF)
      Jaap Bakker (WU), Geert Smant (WU), Ritsert Jansen (GBIC)
  26. BoF
      BioLib: Birds of a Feather Session (BoF) at 16:50 hours