SlideShare a Scribd company logo
BioLib Development Report (BOSC
             2009)
 C and C++ libraries for BioPerl, BioJAVA,
         BioPython, BioRuby. . .
                      Pjotr Prins (pjotr.prins at wur.nl)


Wageningen University, Dept. of Nematology; Groningen Bioinformatics Center




                                                              BioLib Development Report (BOSC 2009) – p.
The stated problem

Many high-level languages used in Biology
(Perl, R, Java. . . )
Duplication of effort in all Bio* efforts -
BioPerl, BioConductor, BioJAVA. . .
in particular for data IO/parsing/interpretation
(Alan’s keynote)




                                         BioLib Development Report (BOSC 2009) – p.
What if?

What if you need some functionality (e.g. linear
regression) in Perl, you can
   Roll your own in Perl (performance?)
   Bind against existing clib using Perl-XS (ugh)
   Bind using SWIG (better, but one-off like
   Perl::GSL)
   Bind using SWIG with Biolib (all languages)
   In fact, it may already be there (GSL or Rlib)

                                         BioLib Development Report (BOSC 2009) – p.
DRY-DRO

Do not repeat yourself (DRY)
Do not repeat ourselves (DRO)
Bio*: BioPerl, BioPython, BioRuby, BioJAVA,
BioConductor, BioHaskell, BioCPP, . . .
Limited pool of programmers in bioinformatics
Usually 2 or 3 competing implementations
Use existing implementations


                                   BioLib Development Report (BOSC 2009) – p.
Why bother?

Open Source Software is about eyes




                               BioLib Development Report (BOSC 2009) – p.
Eyes!

Eyes like these!




                   BioLib Development Report (BOSC 2009) – p.
Eyes (3)

Eyes like these!. . .




                        BioLib Development Report (BOSC 2009) – p.
Eyes (5)

Well, realistically. . .




                           BioLib Development Report (BOSC 2009) – p.
BioLib project

Objectives:
   Utilize existing C/C++ libraries
   Create mappings to all Bio* languages
   Focus on correctness and
   performance
   A central place (plumbing)
   An OBF affiliated project



                                      BioLib Development Report (BOSC 2009) – p.
Power Trio

Plumbing power trio:
   Git - modular version control
   Cmake - make file generator
   SWIG - simplified wrapper and interface
   generator




                                     BioLib Development Report (BOSC 2009) – p. 1
Power trio (1)

GIT
  Version control on steroids
  What source control should be
   Easy branching of development
   Submodules




                                   BioLib Development Report (BOSC 2009) – p. 1
Power trio (2)

CMake
  Generator for make files
  Very modular approach
  Resolves complex dependencies
  Looks like a simple
  programming language
  Easy on the eyes and mind



                                  BioLib Development Report (BOSC 2009) – p. 1
Power trio (3)

SWIG
  Code generator for mappings done right:
    Rules for generating code
    Macros (DRY)
    Pattern matching
    Flexible
    Supports many languages




                                    BioLib Development Report (BOSC 2009) – p. 1
Achievements (year one)

  Affyio: Affymetrix arrays (357 methods; 10K lines)
  Staden: Sequencer trace files (95; 16K)
  GSL: GNU Science Library (2702; 200K)
  Rlib: R routines (> 176; 43K)
  R/qtl: Quantitative genetics (> 100; 10K)*
  Libsequence: Sequence analysis (> 1000; 21K)*
  Bio++: Sequence analysis (> 1000; 52K)*

Code base 350K lines USD 10 million R&D
                                               BioLib Development Report (BOSC 2009) – p. 1
Source tree

|--   clibs
|     |-- affyio-1.8
|     |-- biolib_R
|     |-- biolib_microarray
|     |-- libsequence-1.6.6
|--   mappings
|     ‘-- swig
|         |-- perl
|         |    |-- affyio
|         |    |-- staden_io_lib
|         |    ‘-- test
|         |-- python
|         |-- ruby
104   directories, 668 files




                                        BioLib Development Report (BOSC 2009) – p. 1
Adding a C lib

Unpack C/C++ library in
./src/clibs/modulename
Add CMake file - compiles into .so shared
library
Create Perl mapping in
./src/mapping/swig/perl/module
Add SWIG .i file
Add CMake file - compiles into .pm and .so
shared library

                                  BioLib Development Report (BOSC 2009) – p. 1
CMake goodies

# Defining a C library build in Biolib:
SET (M_NAME staden_io_lib)
SET (M_VERSION 1.11.6)
FIND_PACKAGE(ZLIB REQUIRED)
BUILD_CLIB()

ADD_LIBRARY(${LIBNAME} SHARED
array.c
compress.c
compression.c
ctfCompress.c
(...)

INSTALL_CLIB()




                                          BioLib Development Report (BOSC 2009) – p. 1
CMake for Perl

# Defining a C library mapping for Perl
SET (USE_ZLIB TRUE)
SET (USE_INCLUDEPATH io_lib)

FIND_PACKAGE(MapPerl)

POST_BUILD_PERL_BINDINGS()
TEST_PERL_BINDINGS()
INSTALL_PERL_BINDINGS()




                                          BioLib Development Report (BOSC 2009) – p. 1
SWIG Map

%include <Read.h>

#define TT_ANY 0
#define TT_ZTR 7

typedef struct
{
    int         format;
    char       *trace_name;
    int         NPoints;
    int         NBases;
    (...)
} Read;

Read *read_reading(char *fn, int format);



                                            BioLib Development Report (BOSC 2009) – p. 1
Perl

use biolib::staden_io_lib;

$result = staden_io_lib::read_reading($fn,
                                      $staden_io_lib::TT_ANY);
print("format=",staden_io_libc::Read_format_get($result));
print("NBases=",$result->{NBases});
print("base=",staden_io_libc::Read_base_get($result));

Outputs:

format=7
NBases=766
base=NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCT
     CGGTCCCAACTTAATTGTACA...




                                                     BioLib Development Report (BOSC 2009) – p. 2
Python

import biolib.staden_io_lib as io_lib

result = io_lib.read_reading(procsrffn,
                             io_lib.TT_ANY)
print result.format
print result.NBases
print result.base

7
766
NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCT
CGGTCCCAACTTAATTGTACA...




                                              BioLib Development Report (BOSC 2009) – p. 2
For the Perl coder

Adding functionality in language of choice
Easier deployment - ’install biolib-perl’
Shared correctness testing
Generated API documentation




                                       BioLib Development Report (BOSC 2009) – p. 2
For the authors

Independent source trees
Increased exposure (Ruby, Perl. . . )
Added unit/integration testing environment
Deployment, multi-platform support (Linux,
OSX, Windows)
No autoconf pain (./configure and friends)
Implicit access to other libraries (GSL, Rlib)
Online generated API documentation

                                        BioLib Development Report (BOSC 2009) – p. 2
Future work

Automated API documentation (with doctests)
More libraries (Emboss, NCBI, . . . )
New code (HPC)
More languages (JAVA, R, OCaml, . . . )
Bio* integration (CPAN, Ruby gems, Python
eggs)
Debian/Fedora/OSX/Windows packages
More platforms (Windows without Cygwin)

                                        BioLib Development Report (BOSC 2009) – p. 2
Credits

Ben Bolstad (Affyio), James Bonfield (Staden), Karl Broman (R/qtl)

Jonathan Leto (GSL SWIG)

Xin Shuai (Google SoC libsequence)

Adam Smith (Google SoC Bio++)

Oswaldo Trelles, José Manuel Mateos-Duran and Andrés Rodríguez (UMA)

Chris Fields (BioPerl), Mark Jensen (BioPerl), Hilmar Lap (Nescent, OBF)

Jaap Bakker (WU), Geert Smant (WU), Ritsert Jansen (GBIC)




                                                               BioLib Development Report (BOSC 2009) – p. 2
BoF

BioLib: Birds of a Feather Session (BoF) at 16:50 hours




                                                          BioLib Development Report (BOSC 2009) – p. 2

More Related Content

Similar to Prins Bio Lib Bosc 2009

Bonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_ruby
BOSC 2010
 
Biopython Project Update 2013
Biopython Project Update 2013Biopython Project Update 2013
Biopython Project Update 2013
pjacock
 
Antao Biopython Bosc2008
Antao Biopython Bosc2008Antao Biopython Bosc2008
Antao Biopython Bosc2008
bosc_2008
 
The Parrot VM
The Parrot VMThe Parrot VM
The Parrot VM
François Perrad
 
Talk6 biopython bosc2011
Talk6 biopython bosc2011Talk6 biopython bosc2011
Talk6 biopython bosc2011
Bioinformatics Open Source Conference
 
BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)
Mark Jensen
 
BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)
Mark Jensen
 
Biopython
BiopythonBiopython
Biopython
bosc
 
Open Source .NET
Open Source .NETOpen Source .NET
Open Source .NET
Onyxfish
 
BOSC 2008 Biopython
BOSC 2008 BiopythonBOSC 2008 Biopython
BOSC 2008 Biopython
tiago
 
Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)
Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)
Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)
ngotogenome
 
E Talevich - Biopython project-update
E Talevich - Biopython project-updateE Talevich - Biopython project-update
E Talevich - Biopython project-update
Jan Aerts
 
Biopython Project Update (BOSC 2012)
Biopython Project Update (BOSC 2012)Biopython Project Update (BOSC 2012)
Biopython Project Update (BOSC 2012)
Eric Talevich
 
Cape Cod Web Technology Meetup - 3
Cape Cod Web Technology Meetup - 3Cape Cod Web Technology Meetup - 3
Cape Cod Web Technology Meetup - 3
Asher Martin
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
Samuel Lampa
 
From Java to Kotlin - The first month in practice v2
From Java to Kotlin - The first month in practice v2From Java to Kotlin - The first month in practice v2
From Java to Kotlin - The first month in practice v2
StefanTomm
 
R Dz7.5 Overview
R Dz7.5 OverviewR Dz7.5 Overview
R Dz7.5 Overview
CICS ROADSHOW
 
Efficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native EnvironmentsEfficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native Environments
Gergely Szabó
 
Protelis: Practical Aggregate Programming - Symposium on Applied Computing (S...
Protelis: Practical Aggregate Programming - Symposium on Applied Computing (S...Protelis: Practical Aggregate Programming - Symposium on Applied Computing (S...
Protelis: Practical Aggregate Programming - Symposium on Applied Computing (S...
Danilo Pianini
 
biopython, doctest and makefiles
biopython, doctest and makefilesbiopython, doctest and makefiles
biopython, doctest and makefiles
Giovanni Marco Dall'Olio
 

Similar to Prins Bio Lib Bosc 2009 (20)

Bonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_ruby
 
Biopython Project Update 2013
Biopython Project Update 2013Biopython Project Update 2013
Biopython Project Update 2013
 
Antao Biopython Bosc2008
Antao Biopython Bosc2008Antao Biopython Bosc2008
Antao Biopython Bosc2008
 
The Parrot VM
The Parrot VMThe Parrot VM
The Parrot VM
 
Talk6 biopython bosc2011
Talk6 biopython bosc2011Talk6 biopython bosc2011
Talk6 biopython bosc2011
 
BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)
 
BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)BioPerl (Poster T02, ISMB 2010)
BioPerl (Poster T02, ISMB 2010)
 
Biopython
BiopythonBiopython
Biopython
 
Open Source .NET
Open Source .NETOpen Source .NET
Open Source .NET
 
BOSC 2008 Biopython
BOSC 2008 BiopythonBOSC 2008 Biopython
BOSC 2008 Biopython
 
Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)
Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)
Running Ruby on Solaris (RubyKaigi 2015, 12/Dec/2015)
 
E Talevich - Biopython project-update
E Talevich - Biopython project-updateE Talevich - Biopython project-update
E Talevich - Biopython project-update
 
Biopython Project Update (BOSC 2012)
Biopython Project Update (BOSC 2012)Biopython Project Update (BOSC 2012)
Biopython Project Update (BOSC 2012)
 
Cape Cod Web Technology Meetup - 3
Cape Cod Web Technology Meetup - 3Cape Cod Web Technology Meetup - 3
Cape Cod Web Technology Meetup - 3
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
 
From Java to Kotlin - The first month in practice v2
From Java to Kotlin - The first month in practice v2From Java to Kotlin - The first month in practice v2
From Java to Kotlin - The first month in practice v2
 
R Dz7.5 Overview
R Dz7.5 OverviewR Dz7.5 Overview
R Dz7.5 Overview
 
Efficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native EnvironmentsEfficient System Monitoring in Cloud Native Environments
Efficient System Monitoring in Cloud Native Environments
 
Protelis: Practical Aggregate Programming - Symposium on Applied Computing (S...
Protelis: Practical Aggregate Programming - Symposium on Applied Computing (S...Protelis: Practical Aggregate Programming - Symposium on Applied Computing (S...
Protelis: Practical Aggregate Programming - Symposium on Applied Computing (S...
 
biopython, doctest and makefiles
biopython, doctest and makefilesbiopython, doctest and makefiles
biopython, doctest and makefiles
 

More from bosc

Swertz Molgenis Bosc2009
Swertz Molgenis Bosc2009Swertz Molgenis Bosc2009
Swertz Molgenis Bosc2009
bosc
 
Bosc Intro 20090627
Bosc Intro 20090627Bosc Intro 20090627
Bosc Intro 20090627
bosc
 
Software Patterns Panel Bosc2009
Software Patterns Panel Bosc2009Software Patterns Panel Bosc2009
Software Patterns Panel Bosc2009
bosc
 
Schbath Rmes Bosc2009
Schbath Rmes Bosc2009Schbath Rmes Bosc2009
Schbath Rmes Bosc2009
bosc
 
Kallio Chipster Bosc2009
Kallio Chipster Bosc2009Kallio Chipster Bosc2009
Kallio Chipster Bosc2009
bosc
 
Welch Wordifier Bosc2009
Welch Wordifier Bosc2009Welch Wordifier Bosc2009
Welch Wordifier Bosc2009
bosc
 
Rice Emboss Bosc2009
Rice Emboss Bosc2009Rice Emboss Bosc2009
Rice Emboss Bosc2009
bosc
 
Prlic Bio Java Bosc2009
Prlic Bio Java Bosc2009Prlic Bio Java Bosc2009
Prlic Bio Java Bosc2009
bosc
 
Senger Soaplab Bosc2009
Senger Soaplab Bosc2009Senger Soaplab Bosc2009
Senger Soaplab Bosc2009
bosc
 
Hanmer Software Patterns Bosc2009
Hanmer Software Patterns Bosc2009Hanmer Software Patterns Bosc2009
Hanmer Software Patterns Bosc2009
bosc
 
Snell Psoda Bosc2009
Snell Psoda Bosc2009Snell Psoda Bosc2009
Snell Psoda Bosc2009
bosc
 
Procter Vamsas Bosc2009
Procter Vamsas Bosc2009Procter Vamsas Bosc2009
Procter Vamsas Bosc2009
bosc
 
Drablos Composite Motifs Bosc2009
Drablos Composite Motifs Bosc2009Drablos Composite Motifs Bosc2009
Drablos Composite Motifs Bosc2009
bosc
 
Fauteux Seeder Bosc2009
Fauteux Seeder Bosc2009Fauteux Seeder Bosc2009
Fauteux Seeder Bosc2009
bosc
 
Moeller Debian Bosc2009
Moeller Debian Bosc2009Moeller Debian Bosc2009
Moeller Debian Bosc2009
bosc
 
Wilczynski_BNFinder_BOSC2009
Wilczynski_BNFinder_BOSC2009Wilczynski_BNFinder_BOSC2009
Wilczynski_BNFinder_BOSC2009
bosc
 
Welsh_BioHDF_BOSC2009
Welsh_BioHDF_BOSC2009Welsh_BioHDF_BOSC2009
Welsh_BioHDF_BOSC2009
bosc
 
Varre_Biomanycores_BOSC2009
Varre_Biomanycores_BOSC2009Varre_Biomanycores_BOSC2009
Varre_Biomanycores_BOSC2009
bosc
 
Trelles_QnormBOSC2009
Trelles_QnormBOSC2009Trelles_QnormBOSC2009
Trelles_QnormBOSC2009
bosc
 
Rother_ModeRNA_BOSC2009
Rother_ModeRNA_BOSC2009Rother_ModeRNA_BOSC2009
Rother_ModeRNA_BOSC2009
bosc
 

More from bosc (20)

Swertz Molgenis Bosc2009
Swertz Molgenis Bosc2009Swertz Molgenis Bosc2009
Swertz Molgenis Bosc2009
 
Bosc Intro 20090627
Bosc Intro 20090627Bosc Intro 20090627
Bosc Intro 20090627
 
Software Patterns Panel Bosc2009
Software Patterns Panel Bosc2009Software Patterns Panel Bosc2009
Software Patterns Panel Bosc2009
 
Schbath Rmes Bosc2009
Schbath Rmes Bosc2009Schbath Rmes Bosc2009
Schbath Rmes Bosc2009
 
Kallio Chipster Bosc2009
Kallio Chipster Bosc2009Kallio Chipster Bosc2009
Kallio Chipster Bosc2009
 
Welch Wordifier Bosc2009
Welch Wordifier Bosc2009Welch Wordifier Bosc2009
Welch Wordifier Bosc2009
 
Rice Emboss Bosc2009
Rice Emboss Bosc2009Rice Emboss Bosc2009
Rice Emboss Bosc2009
 
Prlic Bio Java Bosc2009
Prlic Bio Java Bosc2009Prlic Bio Java Bosc2009
Prlic Bio Java Bosc2009
 
Senger Soaplab Bosc2009
Senger Soaplab Bosc2009Senger Soaplab Bosc2009
Senger Soaplab Bosc2009
 
Hanmer Software Patterns Bosc2009
Hanmer Software Patterns Bosc2009Hanmer Software Patterns Bosc2009
Hanmer Software Patterns Bosc2009
 
Snell Psoda Bosc2009
Snell Psoda Bosc2009Snell Psoda Bosc2009
Snell Psoda Bosc2009
 
Procter Vamsas Bosc2009
Procter Vamsas Bosc2009Procter Vamsas Bosc2009
Procter Vamsas Bosc2009
 
Drablos Composite Motifs Bosc2009
Drablos Composite Motifs Bosc2009Drablos Composite Motifs Bosc2009
Drablos Composite Motifs Bosc2009
 
Fauteux Seeder Bosc2009
Fauteux Seeder Bosc2009Fauteux Seeder Bosc2009
Fauteux Seeder Bosc2009
 
Moeller Debian Bosc2009
Moeller Debian Bosc2009Moeller Debian Bosc2009
Moeller Debian Bosc2009
 
Wilczynski_BNFinder_BOSC2009
Wilczynski_BNFinder_BOSC2009Wilczynski_BNFinder_BOSC2009
Wilczynski_BNFinder_BOSC2009
 
Welsh_BioHDF_BOSC2009
Welsh_BioHDF_BOSC2009Welsh_BioHDF_BOSC2009
Welsh_BioHDF_BOSC2009
 
Varre_Biomanycores_BOSC2009
Varre_Biomanycores_BOSC2009Varre_Biomanycores_BOSC2009
Varre_Biomanycores_BOSC2009
 
Trelles_QnormBOSC2009
Trelles_QnormBOSC2009Trelles_QnormBOSC2009
Trelles_QnormBOSC2009
 
Rother_ModeRNA_BOSC2009
Rother_ModeRNA_BOSC2009Rother_ModeRNA_BOSC2009
Rother_ModeRNA_BOSC2009
 

Recently uploaded

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 

Recently uploaded (20)

Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Artificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic WarfareArtificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic Warfare
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 

Prins Bio Lib Bosc 2009

  • 1. BioLib Development Report (BOSC 2009) C and C++ libraries for BioPerl, BioJAVA, BioPython, BioRuby. . . Pjotr Prins (pjotr.prins at wur.nl) Wageningen University, Dept. of Nematology; Groningen Bioinformatics Center BioLib Development Report (BOSC 2009) – p.
  • 2. The stated problem Many high-level languages used in Biology (Perl, R, Java. . . ) Duplication of effort in all Bio* efforts - BioPerl, BioConductor, BioJAVA. . . in particular for data IO/parsing/interpretation (Alan’s keynote) BioLib Development Report (BOSC 2009) – p.
  • 3. What if? What if you need some functionality (e.g. linear regression) in Perl, you can Roll your own in Perl (performance?) Bind against existing clib using Perl-XS (ugh) Bind using SWIG (better, but one-off like Perl::GSL) Bind using SWIG with Biolib (all languages) In fact, it may already be there (GSL or Rlib) BioLib Development Report (BOSC 2009) – p.
  • 4. DRY-DRO Do not repeat yourself (DRY) Do not repeat ourselves (DRO) Bio*: BioPerl, BioPython, BioRuby, BioJAVA, BioConductor, BioHaskell, BioCPP, . . . Limited pool of programmers in bioinformatics Usually 2 or 3 competing implementations Use existing implementations BioLib Development Report (BOSC 2009) – p.
  • 5. Why bother? Open Source Software is about eyes BioLib Development Report (BOSC 2009) – p.
  • 6. Eyes! Eyes like these! BioLib Development Report (BOSC 2009) – p.
  • 7. Eyes (3) Eyes like these!. . . BioLib Development Report (BOSC 2009) – p.
  • 8. Eyes (5) Well, realistically. . . BioLib Development Report (BOSC 2009) – p.
  • 9. BioLib project Objectives: Utilize existing C/C++ libraries Create mappings to all Bio* languages Focus on correctness and performance A central place (plumbing) An OBF affiliated project BioLib Development Report (BOSC 2009) – p.
  • 10. Power Trio Plumbing power trio: Git - modular version control Cmake - make file generator SWIG - simplified wrapper and interface generator BioLib Development Report (BOSC 2009) – p. 1
  • 11. Power trio (1) GIT Version control on steroids What source control should be Easy branching of development Submodules BioLib Development Report (BOSC 2009) – p. 1
  • 12. Power trio (2) CMake Generator for make files Very modular approach Resolves complex dependencies Looks like a simple programming language Easy on the eyes and mind BioLib Development Report (BOSC 2009) – p. 1
  • 13. Power trio (3) SWIG Code generator for mappings done right: Rules for generating code Macros (DRY) Pattern matching Flexible Supports many languages BioLib Development Report (BOSC 2009) – p. 1
  • 14. Achievements (year one) Affyio: Affymetrix arrays (357 methods; 10K lines) Staden: Sequencer trace files (95; 16K) GSL: GNU Science Library (2702; 200K) Rlib: R routines (> 176; 43K) R/qtl: Quantitative genetics (> 100; 10K)* Libsequence: Sequence analysis (> 1000; 21K)* Bio++: Sequence analysis (> 1000; 52K)* Code base 350K lines USD 10 million R&D BioLib Development Report (BOSC 2009) – p. 1
  • 15. Source tree |-- clibs | |-- affyio-1.8 | |-- biolib_R | |-- biolib_microarray | |-- libsequence-1.6.6 |-- mappings | ‘-- swig | |-- perl | | |-- affyio | | |-- staden_io_lib | | ‘-- test | |-- python | |-- ruby 104 directories, 668 files BioLib Development Report (BOSC 2009) – p. 1
  • 16. Adding a C lib Unpack C/C++ library in ./src/clibs/modulename Add CMake file - compiles into .so shared library Create Perl mapping in ./src/mapping/swig/perl/module Add SWIG .i file Add CMake file - compiles into .pm and .so shared library BioLib Development Report (BOSC 2009) – p. 1
  • 17. CMake goodies # Defining a C library build in Biolib: SET (M_NAME staden_io_lib) SET (M_VERSION 1.11.6) FIND_PACKAGE(ZLIB REQUIRED) BUILD_CLIB() ADD_LIBRARY(${LIBNAME} SHARED array.c compress.c compression.c ctfCompress.c (...) INSTALL_CLIB() BioLib Development Report (BOSC 2009) – p. 1
  • 18. CMake for Perl # Defining a C library mapping for Perl SET (USE_ZLIB TRUE) SET (USE_INCLUDEPATH io_lib) FIND_PACKAGE(MapPerl) POST_BUILD_PERL_BINDINGS() TEST_PERL_BINDINGS() INSTALL_PERL_BINDINGS() BioLib Development Report (BOSC 2009) – p. 1
  • 19. SWIG Map %include <Read.h> #define TT_ANY 0 #define TT_ZTR 7 typedef struct { int format; char *trace_name; int NPoints; int NBases; (...) } Read; Read *read_reading(char *fn, int format); BioLib Development Report (BOSC 2009) – p. 1
  • 20. Perl use biolib::staden_io_lib; $result = staden_io_lib::read_reading($fn, $staden_io_lib::TT_ANY); print("format=",staden_io_libc::Read_format_get($result)); print("NBases=",$result->{NBases}); print("base=",staden_io_libc::Read_base_get($result)); Outputs: format=7 NBases=766 base=NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCT CGGTCCCAACTTAATTGTACA... BioLib Development Report (BOSC 2009) – p. 2
  • 21. Python import biolib.staden_io_lib as io_lib result = io_lib.read_reading(procsrffn, io_lib.TT_ANY) print result.format print result.NBases print result.base 7 766 NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCT CGGTCCCAACTTAATTGTACA... BioLib Development Report (BOSC 2009) – p. 2
  • 22. For the Perl coder Adding functionality in language of choice Easier deployment - ’install biolib-perl’ Shared correctness testing Generated API documentation BioLib Development Report (BOSC 2009) – p. 2
  • 23. For the authors Independent source trees Increased exposure (Ruby, Perl. . . ) Added unit/integration testing environment Deployment, multi-platform support (Linux, OSX, Windows) No autoconf pain (./configure and friends) Implicit access to other libraries (GSL, Rlib) Online generated API documentation BioLib Development Report (BOSC 2009) – p. 2
  • 24. Future work Automated API documentation (with doctests) More libraries (Emboss, NCBI, . . . ) New code (HPC) More languages (JAVA, R, OCaml, . . . ) Bio* integration (CPAN, Ruby gems, Python eggs) Debian/Fedora/OSX/Windows packages More platforms (Windows without Cygwin) BioLib Development Report (BOSC 2009) – p. 2
  • 25. Credits Ben Bolstad (Affyio), James Bonfield (Staden), Karl Broman (R/qtl) Jonathan Leto (GSL SWIG) Xin Shuai (Google SoC libsequence) Adam Smith (Google SoC Bio++) Oswaldo Trelles, José Manuel Mateos-Duran and Andrés Rodríguez (UMA) Chris Fields (BioPerl), Mark Jensen (BioPerl), Hilmar Lap (Nescent, OBF) Jaap Bakker (WU), Geert Smant (WU), Ritsert Jansen (GBIC) BioLib Development Report (BOSC 2009) – p. 2
  • 26. BoF BioLib: Birds of a Feather Session (BoF) at 16:50 hours BioLib Development Report (BOSC 2009) – p. 2