Community-driven computational
  biology with Debian and Taverna

          Steffen Möller, Hajo Krabbenhöft (Lübeck)
Alan Williams, Katy Wolstencroft, Carole Goble (Manchester)
   Andreas Tille, Charles Plessy, David Paleino (Debian)




                       BOSC 2010, Boston
Motivation
●   Open Source Bioinformatics continues to grow and improve
    ●   steadily increasing number of tools and databases
    ●   addressing more and more complex issues
●   Bioinformatics found entry into wet-lab routine
    ●   strong service units with many diverse projects
    ●   single deeply embedded individuals
●   Wanted:
    ●   Exchange of bioinformatics recipes, as a database or eventually
        linked from papers' method sections
    ●   Reliable, instantly available, powerful external resources to perform
        analysis


Dual role of Cloud technologies
●   Sharing of physical resources
    ●   Computation
    ●   Storage
●   Sharing of management resources
    ●   Reference Images
    ●   Pre-downloaded, pre-indexed data
        –   Amazon public data sets
        –   “whatever BOSC 2010 agrees on” for our Eucalyptus
            playground


How to Co-Maintain Cloud Images
●   Cloud images can be maintained just like regular machines
●   The installation of many tools by many people
    ●   works, you get somewhere, but then you don't want to touch it again
    ●   is error-prone because of inter-dependencies of packages (shared
        files, version incompatibilities)
●   The partial update of such co-maintained images
    ●   will most likely break something somewhere → modularity
    ●   you want to know what has been done to an image without a
        dependency on external web pages → introspection




How to Co-Maintain Cloud Images
Wanted:
●   Mechanism to allow the individual upgrading of
    software tools and integrity checks
●   Sharing of the effort
    –   to compile the source code – one wants to install the
        binaries only whenever possible
    –   to describe the packages – should be of little overhead or
        be already available
    This is basically what Linux distributions do.



Dual role of Debian
●   Package provider
    ●   tens of thousands of packages are offered
        –   directly as a Linux distribution
        –   indirectly via descendants such as Ubuntu or BioLinux
    ●   technical excellence
        –   coherent builds across many platforms (PowerPC, Intel 32 and 64 bit, AMD,
            MIPS) and Kernels (Linux, HURD, BSD, OpenSolaris)
        –   separation of documentation from binaries, GUI from command line, ...
●   Community
    ●   bug reports
    ●   mailing lists and special interest groups, where you may discuss
        –   packages that are missing
        –   problems that many of us have that are yet unsolved


bioinformatics blend
●   subversion and git repositories for packages
●   friendly and open community
●   keen on close links with upstream
●   Series of tasks within Debian Med – not only bioinformatics:
          Biology - Debian Med micro-biology packages
          Biology development - Debian Med packages for development of micro-biology applications
          Content management - Debian Med content management systems
          Medical data - Debian Med suggestions for medical databases
          Dental - Debian Med packages related to dental practice
          Epidemiology - Debian Med epidemiology related packages
          Hospital information systems - Debian Med suggestions for Hospital Information Systems
          Imaging - Cross-platform for visualizing, processing and analysing of bioimages
          Imaging development - Debian Med packages for medical image development
          Laboratory - Debian Med suggestions for medical laboratories
          Pharmacy - Debian Med packages for pharmaceutical research
          Physics - Debian Med packages for medical physicists
          Practice - Debian Med packages for practice management
          Psychology - Debian Med packages for psychology
          Statistics - Debian Med statistics
          Tools - Debian Med several tools
          Typesetting - Debian Med support for typesetting and publishing
How to Co-Maintain a Debian Package
 ●   Technically
     ●   Do not touch the original source tree
     ●   Create folder “debian” with files
         –   “control” - description of package + build deps
         –   “changelog” - version of package and what's new
         –   “rules” - how to say “make” and “make install”
         –   “install” - to split documentation from the rest
         This should not be more difficult than executing “make all” directly. Contact me or
         the list when running into problems.
     ●   FTP-upload of package to distribution's server
     ●   Sharing of “debian” folder with community with subversion/git/bazaar
 ●   Community-driven security
     ●   Web of trust: Creator of package signs with his GPG key prior to upload,
         GPG key is signed by others
     ●   Bug reports may block transition of package to “stable” release
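
The “debian” folder described above can be sketched as a small Python helper that writes the four files next to an untouched source tree. This is a minimal illustration under stated assumptions: the package name "mytool", the maintainer, the version and the install path are all hypothetical, and a real package typically needs more than these four files (for example a copyright file).

```python
from pathlib import Path

# Everything below is hypothetical illustration data, not a real package.
FILES = {
    # "control": description of the package plus build dependencies
    "control": """\
Source: mytool
Section: science
Priority: optional
Maintainer: Jane Doe <jane@example.org>
Build-Depends: debhelper (>= 7)
Standards-Version: 3.9.0

Package: mytool
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}
Description: hypothetical sequence analysis tool
 Longer description of the tool goes here.
""",
    # "changelog": version of the package and what is new
    "changelog": """\
mytool (1.0-1) unstable; urgency=low

  * Initial release.

 -- Jane Doe <jane@example.org>  Fri, 09 Jul 2010 12:00:00 +0200
""",
    # "rules": a makefile that delegates "make" and "make install" to debhelper
    "rules": "#!/usr/bin/make -f\n%:\n\tdh $@\n",
    # "install": which built files go where (here: documentation)
    "install": "doc/*.html usr/share/doc/mytool\n",
}


def write_debian_skeleton(source_tree: Path) -> Path:
    """Create source_tree/debian with the four files; upstream files stay untouched."""
    debian = source_tree / "debian"
    debian.mkdir(parents=True, exist_ok=True)
    for name, text in FILES.items():
        (debian / name).write_text(text, encoding="utf-8")
    (debian / "rules").chmod(0o755)  # the rules file must be executable
    return debian
```

Calling `write_debian_skeleton(Path("mytool-1.0"))` would leave the upstream sources as they are and add only the `debian` directory, which is the part that is then shared via subversion/git/bazaar.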

Something's missing
●   We now have the resources.
    ●   packages that auto-transform into Cloud images
    ●   machines and disk to compute and store in-/output
●   We have a sizeable Bio* community
●   Wanted:
    ●   Linking of cloud resources with the desktop
    ●   Linking of web resources into it
    ●   Exchange and referencing of
        –   inter-package
        –   inter-resource
        processes that work (or have worked for someone) and may be adapted


Dual role of Taverna
●   Technology:
    ●   Connects files, web services and applications to
        workflows
    ●   Workflows may comprise other workflows
●   Community:
    ●   Portal to complete and partial solutions as workflows on
        myExperiment.org



Taverna integrates command line
●   Any command executed in the shell can be
    integrated
    ●   local execution, remote execution with ssh or grid
    ●   nicely links clouds, packages and web
●   Introduction of UseCases as workflow elements
    ●   Database with XML-specification of
        –   Inputs, Outputs and their MIME types
        –   Command line and the tools it needs
    ●   Purpose-specific wrappers around binaries or scripts

            Krabbenhöft et al., Bioinformatics, 2008
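
To make the idea of an XML-specified UseCase more concrete, the following Python sketch builds and reads back such a description. The element and attribute names ("usecase", "command", "input", "output", "mime") are assumptions chosen for illustration and are not the actual schema of the Taverna UseCase plugin; the command line and file names are equally hypothetical.

```python
import xml.etree.ElementTree as ET


def build_usecase(name, command, inputs, outputs):
    """Describe a wrapped command-line tool: its command and typed in-/outputs.

    inputs/outputs map a port name to a MIME type.
    The XML layout is a made-up example, not the plugin's real format.
    """
    root = ET.Element("usecase", {"name": name})
    ET.SubElement(root, "command").text = command
    for tag, ports in (("input", inputs), ("output", outputs)):
        for port, mime in ports.items():
            ET.SubElement(root, tag, {"name": port, "mime": mime})
    return root


def port_types(root, tag):
    """Return {port name: MIME type} for all <input> or <output> elements."""
    return {e.get("name"): e.get("mime") for e in root.findall(tag)}


usecase = build_usecase(
    name="align_sequences",
    command="t_coffee sequences.fasta",      # hypothetical invocation
    inputs={"sequences": "text/plain"},
    outputs={"alignment": "text/plain"},
)
xml_text = ET.tostring(usecase, encoding="unicode")
```

A workflow editor could read `port_types(...)` from such a record to decide which outputs of one UseCase may be wired to which inputs of another.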
Shared UseCase management




Example: Clustering many sequences
 ●      Compute times of several hours are generally
        not acceptable for public web services
 ●      Not a problem with integrated clouds

 [Figure: flow diagram with two lanes, “Local” and “Cloud”. Steps shown: Image Selection, Start instance, Inform Taverna about IP number, Workflow Execution, Results Interpretation; the Cloud lane contains “apt-get install t-coffee”.]
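
The package installation step on a freshly started instance can be sketched as follows. This is only an illustration under stated assumptions: the function name, the default user and the IP address are hypothetical, and starting the instance as well as informing Taverna about its IP number are outside the sketch. The commands are assembled but not executed.

```python
import shlex


def remote_setup_commands(ip, user="root", packages=("t-coffee",)):
    """Build the ssh invocations that prepare a cloud instance for the workflow.

    Returns argument lists suitable for subprocess.run(); nothing is run here.
    """
    target = f"{user}@{ip}"
    install = "apt-get install -y " + " ".join(shlex.quote(p) for p in packages)
    return [
        ["ssh", target, "apt-get update"],  # refresh the package index first
        ["ssh", target, install],           # then install the required tools
    ]


# 192.0.2.10 is a documentation-only address standing in for the instance's IP.
commands = remote_setup_commands("192.0.2.10")
```

Because the tools arrive as distribution packages, the same list of package names can be reused for any image derived from Debian or Ubuntu.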



Remaining challenge: sharing public data
●   Could work like the management of software, but
    ●   Often large with frequent updates
        –   users differ in their demands for the latest versions
    ●   Involves post-processing
        –   users differ in whether they want it performed
●   Clouds could help, but
    ●   one would not want to pay for everything all the time
    ●   the installation process would need to be transparent to locally
        recreate or update or … improve the data



Proposal: getData, a shared Perl script
 ●   The script is a large hash table
     ●   extendable by configuration files that may be contributed from
         various packages, like EMBOSS
     ●   Every entry comprises another hash table with attributes
         –   Name – full name of database
         –   Source – how to retrieve it
         –   Post-download – what to do once it has arrived
         –   Recommends – tools suggested to install with the data
 ●   All very simple and extendable
     ●   Direct mirroring of effort performed on the command line
     ●   The community can co-maintain this script more easily than
         some cloud instance
     ●   More on http://wiki.debian.org/getData
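
The proposed data structure can be sketched in Python (getData itself is proposed as a Perl script, so this is a transliteration of the idea, not its code). The database key, URL and commands below are placeholders, and the helper names `merge_registry` and `plan` are assumptions made for this illustration.

```python
from copy import deepcopy

# Hypothetical built-in entry; none of these values come from getData itself.
BUILTIN = {
    "exampledb": {
        "name": "Example protein database",                              # full name
        "source": "wget ftp://ftp.example.org/pub/exampledb.fasta.gz",  # how to retrieve
        "post_download": "gunzip exampledb.fasta.gz",                    # after arrival
        "recommends": ["emboss"],                                        # suggested tools
    },
}

REQUIRED = ("name", "source")


def merge_registry(base, contributed):
    """Extend the table with entries contributed by a package's configuration."""
    merged = deepcopy(base)
    for key, entry in contributed.items():
        missing = [field for field in REQUIRED if field not in entry]
        if missing:
            raise ValueError(f"{key}: missing attributes {missing}")
        merged[key] = deepcopy(entry)  # a contributed entry replaces an older one
    return merged


def plan(registry, key):
    """List the shell commands one would run for a database, in order."""
    entry = registry[key]
    steps = [entry["source"]]
    if entry.get("post_download"):
        steps.append(entry["post_download"])
    return steps
```

Each entry simply mirrors what would otherwise be typed on the command line, which is what keeps such a table easy to review and co-maintain.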
Summary
●   Debian as community and repository for
    bioinformatics software
    ●   Mailing lists, source code management
    ●   FTP servers
●   Clouds introduce dynamics into the collaboration
    ●   Data flow between packages
    ●   Usability
    ●   Shared maintenance of public data
●   Taverna
    ●   Connects web, grid, cloud instances and local machine
    ●   Fosters exchange of experiences with various workflows
References and Acknowledgements
[1] Debian-Med       http://debian-med.alioth.debian.org
[2] getData          http://wiki.debian.org/getData
[3] Eucalyptus       http://www.eucalyptus.com
[4] Taverna          http://www.taverna.org.uk
[5] Taverna UseCases http://taverna.nordugrid.org
[6] myExperiment     http://www.myExperiment.org
The development of the UseCase plugin for Taverna was funded by the
“KnowARC” EU project.




Debian/Ubuntu contributes
●   Impressive number of packages
    ●   Bioinformatics (Bio*, EMBOSS, clustering, ...)
    ●   Cheminformatics (autodock, gromacs, ballview, …)
    ●   General scientific computing tools and libraries
        –   Clustering (Torque, Sun Grid Engine, ...)
        –   Eucalyptus Cloud environment
●   Automation of database updates and indexing
    with the “getData” script



Concept: Distro+Workflows+Cloud
●   Debian/Ubuntu Linux Distribution
    ●   Chem- + Bioinformatics packages
    ●   Friendly Community
●   Taverna Workflow Suite
    ●   Access to services in the web
    ●   Access to command line tools via ssh or grids
    ●   Exchange of ideas via myExperiment.org
●   Eucalyptus or Amazon Clouds
    ●   Sharing of databases and indices
    ●   Readily available or customized images to instantiate

The Cloud contributes
A platform for individuals to share
●   Data (“download only once”)
●   Its management (“update and index only once”)
●   Experiences (“I show you”)
Physical resources
●   To be shared in community (“common cluster”)
●   To be bought on demand (“run at Amazon.com”)
Solutions
●   Readily usable images – by community or industry
●   Adaptability to local demands

       2010, Boston

More Related Content

Viewers also liked

2 de versie 4de lesdag kindfactoren
2 de versie 4de lesdag kindfactoren2 de versie 4de lesdag kindfactoren
2 de versie 4de lesdag kindfactorenCVO-SSH
 
mobility programs for education
mobility programs for educationmobility programs for education
mobility programs for educationRosario Outes
 
iTunesU: iGlue for iPad Learning
iTunesU: iGlue for iPad LearningiTunesU: iGlue for iPad Learning
iTunesU: iGlue for iPad LearningKevin Amboe
 
Drupal theming intro
Drupal theming introDrupal theming intro
Drupal theming introtlattimore
 
Westweaves Profile
Westweaves ProfileWestweaves Profile
Westweaves Profileanantdamani
 
Library Resources in Health Informatics
Library Resources in Health InformaticsLibrary Resources in Health Informatics
Library Resources in Health InformaticsNaz Torabi
 
Primera visita de la Fundacion a Tetouan
Primera visita de la Fundacion a TetouanPrimera visita de la Fundacion a Tetouan
Primera visita de la Fundacion a TetouanNuriajimenez
 
CUEBC 2013: Authentic assessment: digital storytelling - apps that transform ...
CUEBC 2013: Authentic assessment: digital storytelling - apps that transform ...CUEBC 2013: Authentic assessment: digital storytelling - apps that transform ...
CUEBC 2013: Authentic assessment: digital storytelling - apps that transform ...Kevin Amboe
 
iPad integration through an assessment lens
iPad integration through an assessment lensiPad integration through an assessment lens
iPad integration through an assessment lensKevin Amboe
 
Finesse 12 18th aug 2013
Finesse 12 18th aug 2013Finesse 12 18th aug 2013
Finesse 12 18th aug 2013Rishi Kashyap
 
Cuenca Move On 2015: Innovación en el aprendizaje bilingüe/CLIL by L. Davison...
Cuenca Move On 2015: Innovación en el aprendizaje bilingüe/CLIL by L. Davison...Cuenca Move On 2015: Innovación en el aprendizaje bilingüe/CLIL by L. Davison...
Cuenca Move On 2015: Innovación en el aprendizaje bilingüe/CLIL by L. Davison...Rosario Outes
 
Sv code camp-slides-2011
Sv code camp-slides-2011Sv code camp-slides-2011
Sv code camp-slides-2011Todd Davies
 
NRTEE: David McLaughlin
NRTEE: David McLaughlinNRTEE: David McLaughlin
NRTEE: David McLaughlinIzabela Popova
 

Viewers also liked (20)

2 de versie 4de lesdag kindfactoren
2 de versie 4de lesdag kindfactoren2 de versie 4de lesdag kindfactoren
2 de versie 4de lesdag kindfactoren
 
mobility programs for education
mobility programs for educationmobility programs for education
mobility programs for education
 
iTunesU: iGlue for iPad Learning
iTunesU: iGlue for iPad LearningiTunesU: iGlue for iPad Learning
iTunesU: iGlue for iPad Learning
 
Era digital
Era digitalEra digital
Era digital
 
Drupal theming intro
Drupal theming introDrupal theming intro
Drupal theming intro
 
Westweaves Profile
Westweaves ProfileWestweaves Profile
Westweaves Profile
 
Sales transformation management
Sales transformation managementSales transformation management
Sales transformation management
 
Library Resources in Health Informatics
Library Resources in Health InformaticsLibrary Resources in Health Informatics
Library Resources in Health Informatics
 
Primera visita de la Fundacion a Tetouan
Primera visita de la Fundacion a TetouanPrimera visita de la Fundacion a Tetouan
Primera visita de la Fundacion a Tetouan
 
Portfolio acadêmico
Portfolio acadêmicoPortfolio acadêmico
Portfolio acadêmico
 
CUEBC 2013: Authentic assessment: digital storytelling - apps that transform ...
CUEBC 2013: Authentic assessment: digital storytelling - apps that transform ...CUEBC 2013: Authentic assessment: digital storytelling - apps that transform ...
CUEBC 2013: Authentic assessment: digital storytelling - apps that transform ...
 
1.1 Manuele Margni
1.1 Manuele Margni1.1 Manuele Margni
1.1 Manuele Margni
 
Terrestrial Support of Aquatic Food Webs
Terrestrial Support of Aquatic Food WebsTerrestrial Support of Aquatic Food Webs
Terrestrial Support of Aquatic Food Webs
 
iPad integration through an assessment lens
iPad integration through an assessment lensiPad integration through an assessment lens
iPad integration through an assessment lens
 
Finesse 12 18th aug 2013
Finesse 12 18th aug 2013Finesse 12 18th aug 2013
Finesse 12 18th aug 2013
 
Cuenca Move On 2015: Innovación en el aprendizaje bilingüe/CLIL by L. Davison...
Cuenca Move On 2015: Innovación en el aprendizaje bilingüe/CLIL by L. Davison...Cuenca Move On 2015: Innovación en el aprendizaje bilingüe/CLIL by L. Davison...
Cuenca Move On 2015: Innovación en el aprendizaje bilingüe/CLIL by L. Davison...
 
Ibe presentation sept 2011
Ibe presentation sept 2011Ibe presentation sept 2011
Ibe presentation sept 2011
 
Sv code camp-slides-2011
Sv code camp-slides-2011Sv code camp-slides-2011
Sv code camp-slides-2011
 
Grammar
GrammarGrammar
Grammar
 
NRTEE: David McLaughlin
NRTEE: David McLaughlinNRTEE: David McLaughlin
NRTEE: David McLaughlin
 

Similar to Moeller bosc2010 debian_taverna

Packaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reusePackaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reuseMatthew Vaughn
 
Effectively using Open Source with conda
Effectively using Open Source with condaEffectively using Open Source with conda
Effectively using Open Source with condaTravis Oliphant
 
Large-Scale Data Storage and Processing for Scientists with Hadoop
Large-Scale Data Storage and Processing for Scientists with HadoopLarge-Scale Data Storage and Processing for Scientists with Hadoop
Large-Scale Data Storage and Processing for Scientists with HadoopEvert Lammerts
 
Bigdata ready reference
Bigdata ready referenceBigdata ready reference
Bigdata ready referenceHelly Patel
 
Desktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omicsDesktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omicsDavid Wallom
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systemsSri Prasanna
 
Open shift and docker - october,2014
Open shift and docker - october,2014Open shift and docker - october,2014
Open shift and docker - october,2014Hojoong Kim
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes vty
 
Puppet Keynote by Ralph Luchs
Puppet Keynote by Ralph LuchsPuppet Keynote by Ralph Luchs
Puppet Keynote by Ralph LuchsNETWAYS
 
Reproducibility in artificial intelligence
Reproducibility in artificial intelligenceReproducibility in artificial intelligence
Reproducibility in artificial intelligenceCarlos Toxtli
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformaticsStephen Turner
 
Moeller Debian Bosc2009
Moeller Debian Bosc2009Moeller Debian Bosc2009
Moeller Debian Bosc2009bosc
 
The Source Control Landscape
The Source Control LandscapeThe Source Control Landscape
The Source Control LandscapeLorna Mitchell
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 

Similar to Moeller bosc2010 debian_taverna (20)

G04-Misc-Debianmed
G04-Misc-DebianmedG04-Misc-Debianmed
G04-Misc-Debianmed
 
Packaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reusePackaging computational biology tools for broad distribution and ease-of-reuse
Packaging computational biology tools for broad distribution and ease-of-reuse
 
Effectively using Open Source with conda
Effectively using Open Source with condaEffectively using Open Source with conda
Effectively using Open Source with conda
 
Large-Scale Data Storage and Processing for Scientists with Hadoop
Large-Scale Data Storage and Processing for Scientists with HadoopLarge-Scale Data Storage and Processing for Scientists with Hadoop
Large-Scale Data Storage and Processing for Scientists with Hadoop
 
Bigdata ready reference
Bigdata ready referenceBigdata ready reference
Bigdata ready reference
 
Desktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omicsDesktop as a Service supporting Environmental ‘omics
Desktop as a Service supporting Environmental ‘omics
 
F02-Cloud-Cloud BioLinux
F02-Cloud-Cloud BioLinuxF02-Cloud-Cloud BioLinux
F02-Cloud-Cloud BioLinux
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systems
 
Open shift and docker - october,2014
Open shift and docker - october,2014Open shift and docker - october,2014
Open shift and docker - october,2014
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes
 
Puppet Keynote by Ralph Luchs
Puppet Keynote by Ralph LuchsPuppet Keynote by Ralph Luchs
Puppet Keynote by Ralph Luchs
 
Reproducibility in artificial intelligence
Reproducibility in artificial intelligenceReproducibility in artificial intelligence
Reproducibility in artificial intelligence
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
 
Worldwide Deployment
Worldwide DeploymentWorldwide Deployment
Worldwide Deployment
 
Bosc2011 ntino-krampis-full
Bosc2011 ntino-krampis-fullBosc2011 ntino-krampis-full
Bosc2011 ntino-krampis-full
 
Moeller Debian Bosc2009
Moeller Debian Bosc2009Moeller Debian Bosc2009
Moeller Debian Bosc2009
 
The Source Control Landscape
The Source Control LandscapeThe Source Control Landscape
The Source Control Landscape
 
Collabograte
CollabograteCollabograte
Collabograte
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 

More from BOSC 2010

Mercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkMercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkBOSC 2010
 
Langmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsLangmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsBOSC 2010
 
Schultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesSchultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesBOSC 2010
 
Swertz bosc2010 molgenis
Swertz bosc2010 molgenisSwertz bosc2010 molgenis
Swertz bosc2010 molgenisBOSC 2010
 
Rice bosc2010 emboss
Rice bosc2010 embossRice bosc2010 emboss
Rice bosc2010 embossBOSC 2010
 
Morris bosc2010 evoker
Morris bosc2010 evokerMorris bosc2010 evoker
Morris bosc2010 evokerBOSC 2010
 
Kono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorKono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorBOSC 2010
 
Kanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisKanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisBOSC 2010
 
Gautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorGautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorBOSC 2010
 
Gardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfGardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfBOSC 2010
 
Friedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsFriedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsBOSC 2010
 
Fields bosc2010 bio_perl
Fields bosc2010 bio_perlFields bosc2010 bio_perl
Fields bosc2010 bio_perlBOSC 2010
 
Chapman bosc2010 biopython
Chapman bosc2010 biopythonChapman bosc2010 biopython
Chapman bosc2010 biopythonBOSC 2010
 
Bonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBOSC 2010
 
Puton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaPuton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaBOSC 2010
 
Bader bosc2010 cytoweb
Bader bosc2010 cytowebBader bosc2010 cytoweb
Bader bosc2010 cytowebBOSC 2010
 
Talevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloTalevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloBOSC 2010
 
Zmasek bosc2010 aptx
Zmasek bosc2010 aptxZmasek bosc2010 aptx
Zmasek bosc2010 aptxBOSC 2010
 
Wilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiWilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiBOSC 2010
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitBOSC 2010
 

More from BOSC 2010 (20)

Mercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkMercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_framework
 
Langmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsLangmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomics
 
Schultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesSchultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-services
 
Swertz bosc2010 molgenis
Swertz bosc2010 molgenisSwertz bosc2010 molgenis
Swertz bosc2010 molgenis
 
Rice bosc2010 emboss
Rice bosc2010 embossRice bosc2010 emboss
Rice bosc2010 emboss
 
Morris bosc2010 evoker
Morris bosc2010 evokerMorris bosc2010 evoker
Morris bosc2010 evoker
 
Kono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorKono bosc2010 pathway_projector
Kono bosc2010 pathway_projector
 
Kanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisKanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenis
 
Gautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorGautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductor
 
Gardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfGardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasf
 
Friedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsFriedberg bosc2010 iprstats
Friedberg bosc2010 iprstats
 
Fields bosc2010 bio_perl
Fields bosc2010 bio_perlFields bosc2010 bio_perl
Fields bosc2010 bio_perl
 
Chapman bosc2010 biopython
Chapman bosc2010 biopythonChapman bosc2010 biopython
Chapman bosc2010 biopython
 
Bonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_ruby
 
Puton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaPuton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rna
 
Bader bosc2010 cytoweb
Bader bosc2010 cytowebBader bosc2010 cytoweb
Bader bosc2010 cytoweb
 
Talevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloTalevich bosc2010 bio-phylo
Talevich bosc2010 bio-phylo
 
Zmasek bosc2010 aptx
Zmasek bosc2010 aptxZmasek bosc2010 aptx
Zmasek bosc2010 aptx
 
Wilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiWilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadi
 
Venkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkitVenkatesan bosc2010 onto-toolkit
Venkatesan bosc2010 onto-toolkit
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Moeller bosc2010 debian_taverna

  • 1. Community-driven computational biology with Debian and Taverna
    Steffen Möller, Hajo Krabbenhöft (Lübeck)
    Alan Williams, Katy Wolstencroft, Carole Goble (Manchester)
    Andreas Tille, Charles Plessy, David Paleino (Debian)
    BOSC 2010, Boston
  • 2. Motivation
    ● Open Source Bioinformatics continues to grow and improve
        ● steadily increasing number of tools and databases
        ● addressing more and more complex issues
    ● Bioinformatics found entry into wet-lab routine
        ● strong service units with many diverse projects
        ● single deeply embedded individuals
    ● Wanted:
        ● Exchange of bioinformatics recipes, as a database or eventually linked from papers' method sections
        ● Reliable, instantly available, powerful external resources to perform analysis
  • 3. Dual role of Cloud technologies
    ● Sharing of physical resources
        ● Computation
        ● Storage
    ● Sharing of management resources
        ● Reference Images
        ● Pre-downloaded, pre-indexed data
            – Amazon public data sets
            – “whatever BOSC 2010 agrees on” for our Eucalyptus playground
  • 4. How to Co-Maintain Cloud Images
    ● Cloud images can be maintained just like regular machines
    ● The installation of many tools by many people
        ● works, you get somewhere, but then you don't want to touch it again
        ● is error-prone because of inter-dependencies of packages (shared files, version incompatibilities)
    ● The partial update of such co-maintained images
        ● will most likely break something somewhere → modularity
        ● you want to know what has been done to an image without a dependency on external web pages → introspection
  • 5. How to Co-Maintain Cloud Images
    Wanted:
    ● Mechanism to allow the individual upgrading of software tools and integrity checks
    ● Sharing of the effort
        – to compile the source code: one wants to install the binaries only whenever possible
        – to describe the packages: should be of little overhead or be already available
    This is basically what Linux distributions do.
  • 6. Dual role of Debian
    ● Package provider
        ● many tens of thousands of packages are offered
            – directly as a Linux distribution
            – indirectly via descendants Ubuntu or BioLinux
        ● technical excellence
            – coherent builds across many platforms (PowerPC, Intel 32 and 64 bit, AMD, MIPS) and kernels (Linux, HURD, BSD, OpenSolaris)
            – separation of documentation from binaries, GUI from command line, ...
    ● Community
        ● bug reports
        ● mailing lists and special interest groups, where you may discuss
            – packages that are missing
            – problems that many of us have that are yet unsolved
  • 7. bioinformatics blend
    ● subversion and git repositories for packages
    ● friendly and open community
    ● keen on close links with upstream
    ● Series of tasks within Debian Med – not only bioinformatics:
        Biology - Debian Med micro-biology packages
        Biology development - Debian Med packages for development of micro-biology applications
        Content management - Debian Med content management systems
        Medical data - Debian Med suggestions for medical databases
        Dental - Debian Med packages related to dental practice
        Epidemiology - Debian Med epidemiology related packages
        Hospital information systems - Debian Med suggestions for Hospital Information Systems
        Imaging - Cross-platform for visualizing, processing and analysing of bioimages
        Imaging development - Debian Med packages for medical image development
        Laboratory - Debian Med suggestions for medical laboratories
        Pharmacy - Debian Med packages for pharmaceutical research
        Physics - Debian Med packages for medical physicists
        Practice - Debian Med packages for practice management
        Psychology - Debian Med packages for psychology
        Statistics - Debian Med statistics
        Tools - Debian Med several tools
        Typesetting - Debian Med support for typesetting and publishing
  • 8. How to Co-Maintain a Debian Package
    ● Technically
        ● Do not touch the original source tree
        ● Create folder “debian” with files
            – “control” - description of package + build deps
            – “changelog” - version of package and what's new
            – “rules” - how to say “make” and “make install”
            – “install” - to split documentation from the rest
          Should not be more difficult than executing “make all” directly; contact me or the list when running into problems.
        ● FTP-upload of package to distribution's server
        ● Sharing of “debian” folder with community with subversion/git/bazaar
    ● Community-driven security
        ● Web of trust: Creator of package signs with his GPG key prior to upload; GPG key is signed by others
        ● Bug reports may block transition of package to “stable” release
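The four files named on this slide can be illustrated with a small sketch. It writes a minimal "debian" folder next to an untouched source tree. The package name `mytool`, the maintainer, and all field values are hypothetical; a real package needs further files (for example a copyright file) and should follow the current Debian Policy rather than this outline.

```python
import tempfile
from pathlib import Path

# Hypothetical package name and maintainer, used only for illustration.
PACKAGE = "mytool"
MAINTAINER = "Jane Doe <jane@example.org>"

# "control": description of the package plus its build dependencies.
CONTROL = f"""Source: {PACKAGE}
Section: science
Priority: optional
Maintainer: {MAINTAINER}
Build-Depends: debhelper (>= 7)
Standards-Version: 3.8.4

Package: {PACKAGE}
Architecture: any
Depends: ${{shlibs:Depends}}, ${{misc:Depends}}
Description: one-line summary of the tool
 Longer description; continuation lines start with a space.
"""

# "changelog": version of the package and what is new.
CHANGELOG = f"""{PACKAGE} (1.0-1) unstable; urgency=low

  * Initial packaging.

 -- {MAINTAINER}  Mon, 12 Jul 2010 10:00:00 +0200
"""

# "rules": a makefile; with debhelper 7, "dh" runs the usual
# build and install steps (the recipe line must start with a tab).
RULES = "#!/usr/bin/make -f\n%:\n\tdh $@\n"

# "install": one "source destination" pair per line.
INSTALL = f"{PACKAGE} usr/bin\n"


def write_skeleton(source_tree: Path) -> list:
    """Create source_tree/debian and return the sorted file names written."""
    debian = source_tree / "debian"
    debian.mkdir(parents=True, exist_ok=True)
    files = {"control": CONTROL, "changelog": CHANGELOG,
             "rules": RULES, "install": INSTALL}
    for name, text in files.items():
        (debian / name).write_text(text, encoding="utf-8")
    (debian / "rules").chmod(0o755)  # rules has to be executable
    return sorted(p.name for p in debian.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_skeleton(Path(tmp)))
```

Only the "debian" folder is created; the upstream sources stay as they are, which is what makes the folder easy to share through subversion or git.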
  • 9. Something's missing
    ● We now have the resources.
        ● packages that auto-transform into Cloud images
        ● machines and disk to compute and store in-/output
    ● We have quite some Bio* community
    ● Wanted:
        ● Linking of cloud resources with the desktop
        ● Linking of web resources into it
        ● Exchange and reference of
            – Inter-package
            – Inter-resource
          processes that (have) work(ed for someone) and may be adapted
  • 10. Dual role of Taverna
    ● Technology:
        ● Connects files, web services and applications to workflows
        ● Workflows may comprise other workflows
    ● Community: Portal to complete and partial solutions as workflows on myExperiment.org
  • 11. Taverna integrates command line
    ● Any command executed in the shell can be integrated
        ● local execution, remote execution with ssh or grid
        ● nicely links clouds, packages and web
    ● Introduction of UseCases as workflow elements
        ● Database with XML-specification of
            – Inputs, Outputs and their MIME types
            – Command line and tools it needs
        ● Purpose-specific wrappers around binaries or scripts
    Krabbenhöft et al., Bioinformatics, 2008
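To make the idea of a UseCase description concrete, here is a minimal sketch of a record holding inputs, outputs, their MIME types and a command line, serialised to XML. The element and attribute names (`usecase`, `command`, `input`, `output`, `mime`) are hypothetical and are not the schema of the Taverna UseCase plugin; the `-outfile` flag of `t_coffee` is assumed and should be checked against the installed version.

```python
import shlex
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str  # placeholder name used in the command template
    mime: str  # MIME type of the data passed through this port


@dataclass
class UseCase:
    name: str
    command: str  # command template with {port} placeholders
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialise the description (hypothetical element names)."""
        root = ET.Element("usecase", name=self.name)
        ET.SubElement(root, "command").text = self.command
        for tag, ports in (("input", self.inputs), ("output", self.outputs)):
            for port in ports:
                ET.SubElement(root, tag, name=port.name, mime=port.mime)
        return ET.tostring(root, encoding="unicode")

    def command_line(self, **files: str) -> str:
        """Fill the template, quoting every file name for the shell."""
        quoted = {key: shlex.quote(value) for key, value in files.items()}
        return self.command.format(**quoted)


# Hypothetical wrapper around the T-Coffee aligner.
tcoffee = UseCase(
    name="tcoffee_align",
    command="t_coffee {sequences} -outfile={alignment}",
    inputs=[Port("sequences", "text/plain")],
    outputs=[Port("alignment", "text/plain")],
)

if __name__ == "__main__":
    print(tcoffee.to_xml())
    print(tcoffee.command_line(sequences="my seqs.fasta", alignment="out.aln"))
```

Keeping the command as a template with named ports is one way to let the same description serve local execution and remote execution over ssh.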
  • 12. Shared UseCase management
  • 13. Example: Clustering many sequences
    ● Compute times of several hours are generally not acceptable for public web services
    ● Not a problem with integrated clouds
    [Figure: workflow diagram with the steps “Cloud Image Selection”, “Start instance”, “Inform Taverna about IP number”, “Workflow Execution” (in the Cloud: “apt-get install t-coffee”) and “Local Results Interpretation”]
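The step of installing a tool on a freshly started instance could look like the following sketch, which only builds the ssh command and does not run it. The address `192.0.2.10` is a documentation-only placeholder, and root login over ssh is an assumption about how the image is configured.

```python
import shlex


def remote_install_command(host: str, package: str, user: str = "root") -> list:
    """Build the argument list for installing one Debian package over ssh."""
    remote = f"apt-get install -y {shlex.quote(package)}"
    return ["ssh", f"{user}@{host}", remote]


if __name__ == "__main__":
    # Hypothetical instance address, as reported after starting the instance.
    command = remote_install_command("192.0.2.10", "t-coffee")
    print(command)
    # To execute for real: subprocess.run(command, check=True)
```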
  • 14. Remaining challenge: sharing public data
    ● Could work like the management of software, but
        ● Often large with frequent updates
          users differ in their demands for latest versions
        ● Involves post-processing
          users differ in their demand to perform such
    ● Clouds could help, but
        ● one would not want to pay for everything all the time
        ● the installation process would need to be transparent to locally recreate or update or … improve the data
  • 15. Proposal: getData, a shared Perl script
    ● The script is a large hash table
        ● extendable by configuration files that may be contributed from various packages, like EMBOSS
        ● Every entry comprises another hash table with attributes
            – Name: full name of database
            – Source: how to retrieve it
            – Post-download: what to do once it has arrived
            – Recommends: tools suggested to install with the data
    ● All very simple and extendable
        ● Direct mirroring of effort performed on the command line
        ● The community can co-maintain this script more easily than some cloud instance
    ● More on http://wiki.debian.org/getData
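The nested hash table described here can be sketched as a dictionary of dictionaries. This is written in Python rather than Perl and is not the getData script itself; the database `exampledb`, its URL and the contributed entry are hypothetical, and only the four attribute names come from the slide.

```python
import copy

# The four attributes every entry carries, as listed on the slide.
REQUIRED = ("name", "source", "post_download", "recommends")

# Hypothetical core table with a single made-up database.
DATABASES = {
    "exampledb": {
        "name": "Example sequence database (hypothetical)",
        "source": "wget ftp://ftp.example.org/pub/exampledb/exampledb.fasta.gz",
        "post_download": "gunzip -f exampledb.fasta.gz",
        "recommends": ["emboss"],
    },
}


def merge(base: dict, contributed: dict) -> dict:
    """Return a new table extended by entries contributed from a package."""
    merged = copy.deepcopy(base)
    for key, entry in contributed.items():
        missing = [attr for attr in REQUIRED if attr not in entry]
        if missing:
            raise ValueError(f"{key}: missing attributes {missing}")
        merged[key] = copy.deepcopy(entry)
    return merged


def plan(table: dict, key: str) -> list:
    """List the shell commands for one database: retrieve, then post-process."""
    entry = table[key]
    return [entry["source"], entry["post_download"]]


if __name__ == "__main__":
    # Hypothetical entry shipped in some package's configuration file.
    extra = {
        "otherdb": {
            "name": "Another database (hypothetical)",
            "source": "wget http://www.example.org/otherdb.dat",
            "post_download": "echo nothing to do",
            "recommends": [],
        },
    }
    table = merge(DATABASES, extra)
    for command in plan(table, "otherdb"):
        print(command)
```

Because each entry is plain data mirroring what one would type on the command line, contributions from different packages can be merged without touching the core table.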
  • 16. Summary
    ● Debian as community and repository for bioinformatics software
        ● Mailing lists, source code management
        ● FTP servers
    ● Clouds introduce dynamics into the collaboration
        ● Data flow between packages
        ● Usability
        ● Shared maintenance of public data
    ● Taverna
        ● Connects web, grid, cloud instances and local machine
        ● Fosters exchange of experiences with various workflows
  • 17. References and Acknowledgements
    [1] Debian-Med http://debian-med.alioth.debian.org
    [2] getData http://wiki.debian.org/getData
    [3] Eucalyptus http://www.eucalyptus.com
    [4] Taverna http://www.taverna.org.uk
    [5] Taverna UseCases http://taverna.nordugrid.org
    [6] myExperiment http://www.myExperiment.org
    The development of the UseCases plugin to Taverna was funded by the “KnowARC” EU project.
  • 18. Debian/Ubuntu contributes
    ● Impressive number of packages
        ● Bioinformatics (Bio*, EMBOSS, clustering, ...)
        ● Cheminformatics (autodock, gromacs, ballview, ...)
        ● General scientific computing tools and libraries
            – Clustering (Torque, Sun Grid Engine, ...)
            – Eucalyptus Cloud environment
    ● Automation of database updates and indexing with the “getData” script
  • 19. Concept: Distro+Workflows+Cloud
    ● Debian/Ubuntu Linux Distribution
        ● Chem- + Bioinformatics packages
        ● Friendly Community
    ● Taverna Workflow Suite
        ● Access to services in the web
        ● Access to command line tools via ssh or grids
        ● Exchange of ideas via myExperiment.org
    ● Eucalyptus or Amazon Clouds
        ● Sharing of databases and indices
        ● Readily available or customized images to instantiate
  • 20. The Cloud contributes
    A platform for individuals to share
    ● Data (“download only once”)
    ● Its management (“update and index only once”)
    ● Experiences (“I show you”)
    Physical resources
    ● To be shared in community (“common cluster”)
    ● To be bought on demand (“run at Amazon.com”)
    Solutions
    ● Readily usable images
        – by community or industry
    ● Adaptability to local demands