Debian Med in 2011

Steffen Möller
for the
Debian Med Community

Bioinformatics Open Source Conference
Vienna, 2011
The Challenge – a technical view
●   Increasing specialisation of tools and databases
    ●   Access to larger number of resources
    ●   Frequent external updates
●   Increase in local administrative work
    ●   Project-specific installations
    ●   Differential or Meta-analyses
●   Increasing platform diversity
    ●   Local vs cloud
    ●   Local vs mobile
The Challenge – a community perspective
●   Work sharing
    ●   Somebody somewhere has installed what you are about to install
    ●   Packaging is tremendously helpful even when you do it just for yourself
    ●   Distributing such packages lets other researchers profit from your effort
●   Skill sharing
    ●   Not everyone knows how to install everything correctly
    ●   Tutorials and mailing list discussions should refer to identically installed packages
●   Collaborative Outreach
    ●   Computational biologists get instant access to software packages
        –   Working out of the box
        –   Compiling out of the box
    ●   Debian Med has professional non-scientists sending patches to the researchers
Science and the role of a Linux Distribution
●   Publication of findings
    ●   World-wide distribution of methods
    ●   Authorship and publications are cited in package
        descriptions
●   Collaboration
    ●   Software packages “meet” in the distribution
    ●   Availability of source code as invitation to contribute
●   Education
    ●   Reach out to students at every level
    ●   No black boxes
“What is special about Debian (Med)?”
 ●   We are not special
     ●   active support of downstream distributions, e.g. Ubuntu
     ●   the converse
 ●   Debian can be you!
     ●   open to everyone
     ●   training “on the job” to get your packages in
     ●   finding volunteers to maintain packages for you
 ●   Med Community support
     ●   shared package maintenance via subversion / git
     ●   portal to bug reports, biological packages
     ●   “ontology-like” tagging of packages
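The "ontology-like" tagging can be pictured as faceted labels attached to each package, which can then be queried by intersection. The following is a minimal sketch only: the tag strings follow a `facet::value` shape, the package-to-tag assignments are illustrative rather than copied from the Debian archive, and the helper `packages_with` is a hypothetical name.

```python
# Illustrative package -> tag assignments; not copied from the Debian archive.
package_tags = {
    "clustalw": {"field::biology", "use::comparing"},
    "mafft": {"field::biology", "use::comparing"},
    "autodocktools": {"field::chemistry", "use::viewing"},
}


def packages_with(tags, required):
    """Return sorted names of packages carrying every tag in `required`."""
    required = set(required)
    return sorted(name for name, have in tags.items() if required <= have)


# Query: packages tagged both as biology and as comparison tools.
biology_aligners = packages_with(package_tags, ["field::biology", "use::comparing"])
print(biology_aligners)
```

Because each tag is an independent facet, new questions (for example "all chemistry packages") need no change to the data, only a different set of required tags.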
What happened since BOSC 2010
●   Many new packages
        NGS (qiime), Ensembl, Blast+, gbrowse
●   Many updates
        Autodocktools, BALLView, Bio*
●   Many new contributors
●   Many new users
●   Bio Cloud environments using Debian
    ●   GSoC project for “Cluster in Cloud” worked with Torque
    ●   Basic cloud images with Debian/Ubuntu became a commodity
●   Debian Med Sprint on Bioinformatics (January)
    ●   Closed loop for packaging with NERC Bio-Linux
    ●   Close ties with Taverna and Eagle Genomics
Steady increase in new users
●   Showing Debian graphs from popcon.debian.org, for
    Ubuntu multiply by 8
●   An average package is installed more frequently than
    there are participants at BOSC+ISMB together
●   Separate listing of recent installations / recent use

                                                  R/qtl ~700




                                                clustalw ~240


                                                       mafft ~240
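The "multiply by 8" rule of thumb can be applied to the figures on this slide. This is a minimal sketch, assuming the three values are Debian popcon installation counts and that the factor of 8 applies uniformly to every package; the function and variable names are illustrative only.

```python
# Approximate Debian popcon counts quoted on the slide.
debian_installs = {"R/qtl": 700, "clustalw": 240, "mafft": 240}

# Rule of thumb from the slide: Ubuntu figures are roughly 8x Debian's.
UBUNTU_FACTOR = 8


def estimate_ubuntu(installs, factor=UBUNTU_FACTOR):
    """Scale each Debian count by the assumed Ubuntu factor."""
    return {name: count * factor for name, count in installs.items()}


ubuntu_estimate = estimate_ubuntu(debian_installs)
for name, count in ubuntu_estimate.items():
    print(f"{name}: ~{debian_installs[name]} on Debian, ~{count} estimated on Ubuntu")
```

The results are order-of-magnitude estimates at best, since popcon only counts systems that opted in to reporting.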
Inter-institutional team building
●   Maintenance by active contributors to the source code
    ●   true interest in bug reports
    ●   barrier-free communication with providers of external libraries
    ●   immediate feedback on incompatibilities
●   Community
    ●   influx of skills with every package supported
    ●   volunteers address the details in packaging work
         –   Translations
         –   Format changes ...
    ●   fluid transition between power user and developer
    ●   appeal to volunteers
         –   Packages can be immediately modified and rebuilt
         –   Students at all levels and non-scientists may contribute
Ongoing development
●   Packaging of classical Java developments
    ●   Taverna's build dependencies are still not all in Debian, or not in the
        right version; this means
    ●   similarly for Jalview – Java is difficult to package because jar files
        ship with the source code
●   Establishment of complete workflows
    ●   closing gaps on tutorials
    ●   packages missing for
        –   In silico docking
        –   Automated genome assembly and annotation
●   Data management
    ●   BioMaj – very nice GUI application
    ●   getData – may be first to link data with Debian packages
“Not even source” packages
●   Some tools, such as VMD or Rosetta, are surprisingly not in Debian, even
    though they allow their source code to be downloaded
●   But they do not allow anything to be redistributed, which is unfortunate
    since these tools are not straightforward to compile
       –   Compiler version likely to be newer than expected
       –   Libraries like BOOST may have deprecated a function
       –   Source code may require patches that are only found in discussion forums
●   Debian Med hence hosts only the automated build instructions
●   Ironically, this “not even source” concept was first adopted by CERN,
    where they have Debian packages for their particle analysis tools - in
    reaction to the BOSC abstract
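The idea of hosting only automated build instructions can be sketched as a plan of commands that each user runs locally. This is a hedged illustration, not Debian Med's actual tooling: the URL, tarball name, patch name, and the function `build_plan` are all hypothetical, the plan is only printed and never executed, and it omits details such as changing into the unpacked directory and providing the `debian/` directory that `dpkg-buildpackage` expects.

```python
import shlex


def build_plan(source_url, tarball, patches):
    """Return the commands a local 'not even source' build might run.

    Nothing here redistributes upstream code: the user downloads it
    directly, and only the instructions and patches are shipped.
    """
    plan = [
        ["wget", "-O", tarball, source_url],  # user fetches the source directly
        ["tar", "-xf", tarball],              # unpack the downloaded archive
    ]
    # Apply locally maintained fixes (e.g. for newer compilers or libraries).
    plan += [["patch", "-p1", "-i", p] for p in patches]
    # Build an unsigned binary package for local installation only.
    plan.append(["dpkg-buildpackage", "-us", "-uc", "-b"])
    return plan


plan = build_plan(
    "https://example.org/tool-src.tar.gz",  # hypothetical URL
    "tool-src.tar.gz",                      # hypothetical tarball name
    ["fix-newer-compiler.patch"],           # hypothetical patch name
)
for cmd in plan:
    print(shlex.join(cmd))
```

Keeping the plan as data rather than running it immediately makes it easy to review what would happen before any upstream code touches the machine.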
What's next
●   We need those problem-solving workflows running
    smoothly
●   Hoping to bring the Medical and Biological parts of
    Debian Med closer together
    ●   Ontologies (nothing in Debian yet, again it's Java) and
        reasoning
    ●   Phenotyping of individuals and respective data
        management
●   Finding more bioinformatics groups already using
    Debian (or its derivatives) to help reduce
    redundancy and to work with us all
Visit Debian Med on
http://debian-med.alioth.debian.org
http://wiki.debian.org/DebianMed

or just use Debian, Ubuntu or any of their derived
distributions
    at home or work
    directly, virtually or in the cloud
