Phenotype Foundation Why? What? How? Kees van Bochove, 6 September 2011
Multiple instances: NMC, NuGO, TNO…
A few screenshots…
Example of a study timeline
Visualization – PPS1 paper figure
Galaxy (toolbox / visualization)
Core reasons Governance of dbNP - now dbXP Communication – many projects, many partners Professionalize open source development Guarantee sustainability of software and data Represent dbNP to outside
Mission statement The Phenotype Foundation mission is to empower scientists, within various communities of practice, with standardized software and knowledge stores to store, manage, and retrieve information and data on genotypes and phenotypes.
My reasons To find more testers! Engage more biologists. One stop place for our efforts – currently all over the web Professionalize open source development – it should be easy, not hard, to join in Deployment issues – configuring servers, databases… who is responsible? Sustainability!
Testing time Estimates of testing time for moderately complex GUI applications as percentage of project time range from 30% to 60% Which means that just hiring developers is not nearly enough to accomplish a successful software development project And… testing is the most complex part of the project to manage!
Current state of dbXP documentation
Current state of documentation Source code: http://trac.nbic.nl Infrastructure: http://nmcdsp.org (Nexus server) Website: http://dbnp.org, http://trac.nbic.nl, http://wiki.nbic.nl
Advantages of open source - in theory Lowers adoption barrier, thereby improving collaboration and standardization Transparency and clarity presumably lead to higher quality of code Actually, open source is the only legally feasible way of sharing program code across multiple short-lived projects, organizations and institutions
Advantages of open source - the reality in bioinformatics The 'open source hype' has led to countless bioinformatics open source tools, some of them mere 'code dumps', others highly active collaborative projects, with or without a (large) user base, in many different programming languages Many projects are interesting, but have no documentation, or the source code is in bad shape (lack of software engineering skills among bioinformaticians) Abundance of non-interoperable web-enabled databases further complicates this picture
Case study: Plone An open source CMS (Content Management System) written in Python, nowadays backing thousands of production grade websites Started by 2 developers in 2000, now an active open source project with hundreds of active developers In 2004, the Plone Foundation was formed to formalize IP and secure the future of Plone
Plone - how it works, legally The Plone Foundation is owner of code and serves to protect and promote Plone. It guarantees that Plone will always be available under an OSI approved license Developers willing to contribute to the Plone core need to sign a contributors agreement, transferring IP of their contributions to the Foundation. In return, they get irrevocable rights to use and distribute their contributions The Foundation is governed by a Board of Directors, which is elected yearly by the members of the Foundation Only 1 person is employed by the Foundation: Release Manager
Geir Baekholt, past president of the Plone Foundation "The Plone Foundation is a meritocracy"
Plone, in practice The active community of developers is the basis of the success of Plone. Some are hobbyists, but many also work for companies There is a large Plone Collective (marketplace for Plone add-ons). Add-ons are not subject to the contributors agreement (although of course they should comply with the Plone license if they import Plone code) Contrary to popular belief, developers in open source projects such as Apache or Plone meet regularly face to face: ApacheCon, Plone Developer Days, sprints…
Plone? Plone Foundation, Apache Software Foundation, Open Bioinformatics Foundation etc. all share one crucial detail: all members and the board are programmers In Phenotype Foundation, we have 2 or even 3 different types of people that are all stakeholders: programmers and scientists (and senior management)
And how does Galaxy do it? Penn State University + Emory University have the lead and are the only core committers! If you have suggestions for improvement of the code, you can suggest them using a ‘pull request’ Creation of a ‘toolshed’ where bioinformaticians can upload their tools
Beware the Empty Chair! Especially if we start integrating different projects, we have to make sure someone is responsible for maintaining a consistent end user experience. The Empty Chair is a metaphor from the book "Adrenaline Junkies and Template Zombies"
Beware mañana Any goal further away than 90 days is not urgent enough to do anything about today. So what is our next step? From "Adrenaline Junkies and Template Zombies"
caBIG – WG recommendations Create a Scientific Advisory Group of scientific, technology and informatics experts Focus on scientific mission, not on creating a ‘software brand’ Semantic data integration (interoperability) is very important Create a legal framework for data sharing
Conclusions – programmer perspective At this moment, communication between programmers goes reasonably well, because the same people are involved But we are not ready for the future For outsiders, it is hard to find our community and at this moment impossible to join without a lot of inside knowledge We need to improve this if we don’t want to end up on the ‘graveyard’!
Foundation outline Board of Directors Technical Community Board member(s) (liaison to board) Release manager Database managers Transfer of code ownership to the Foundation? Scientific Community Board member(s) Product Liaisons Frequent meetings