Mobyle2: why?
● Important requests from scientists for new functionalities:
  ○ data integration
  ○ a more flexible and dynamic user interface
  ○ sharing, collaboration: groupware
● And also requests from Mobyle server administrators:
  ○ easier administration and configuration
● To enable these evolutions:
  ○ a need for an overall modernization of the current technical framework
Complex bioinformatics data handling
● Currently a data item in Mobyle is either:
  ○ a "simple type" value (e.g. Integer, String, etc.), or
  ○ a single file
● These possibilities are sufficient to describe many data items and services, e.g.:
  ○ a sequence stored in a FASTA file
  ○ a structure stored in a PDB file
  ○ the e-value of a BLAST run
But this is not sufficient to describe more complex cases. Example:
● a BLAST bank
  ○ it is a set of files containing the data and some indexes
  ○ currently in Mobyle, a BLAST bank is a string (its name) that is printed on the command line, and the local configuration (environment variables) is used to locate the bank
● but:
  ○ users cannot create custom BLAST banks, save them, and search them multiple times
  ○ the configuration of the banks available on the server cannot be updated easily
If we describe a bank as Mobyle data:
● in BLAST, a bank becomes just another parameter, and we can enable searches in server-owned banks but also in user-owned banks
● it becomes possible not only to browse which banks are available in tool X, but also to list the services that can be used to search bank Y
● the same "client" program can apply different access restrictions depending on the bank used
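The bank-as-data idea can be sketched in Python as follows. This is a minimal sketch under an assumed, hypothetical model in which a bank is a typed data item carrying its own files, an owner and an access flag; `BlastBank`, `Service`, `services_accepting` and `may_search` are illustrative names, not an existing Mobyle API. The file names use the classic protein BLAST database extensions (`.phr`, `.pin`, `.psq`).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class BlastBank:
    """A BLAST bank described as a Mobyle data item (hypothetical model)."""
    name: str
    owner: str                # "server" for shared banks, otherwise a user id
    files: Tuple[str, ...]    # sequence data plus the index files BLAST needs
    public: bool = True       # hypothetical per-bank access restriction


@dataclass(frozen=True)
class Service:
    name: str
    input_types: FrozenSet[str]   # names of the data types the service accepts


def services_accepting(data, services: List[Service]) -> List[str]:
    """Reverse lookup: which services can take this data item as input?"""
    kind = type(data).__name__
    return [s.name for s in services if kind in s.input_types]


def may_search(user: str, bank: BlastBank) -> bool:
    """One client program, different restrictions depending on the bank."""
    return bank.public or bank.owner == user


services = [
    Service("blastp", frozenset({"BlastBank", "Sequence"})),
    Service("seq_stats", frozenset({"Sequence"})),   # hypothetical service
]
mine = BlastBank(
    "my_proteins",
    owner="alice",
    files=("my_proteins.phr", "my_proteins.pin", "my_proteins.psq"),
    public=False,
)
print(services_accepting(mine, services))                  # ['blastp']
print(may_search("alice", mine), may_search("bob", mine))  # True False
```

Because the bank is an ordinary data item, a user-created bank and a server-owned bank go through the same parameter, and the reverse lookup from data to services comes for free.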
Other use cases for "complex" data:
● folders (e.g., for Velvet)
● collections of files
● complex structures linking files and "simple types"
● server-side hosted data: mandatory to manage the import of large data, e.g., in NGS pipelines
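One way to cover these cases is a small recursive data model. The sketch below is an assumption, not the Mobyle2 data model: `Simple`, `File`, `Folder`, `Collection` and `Struct` are hypothetical names, and the `server_side` flag stands for data already hosted on the server, which therefore never needs to be uploaded.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Simple:
    value: Union[int, float, str]


@dataclass
class File:
    path: str
    server_side: bool = False   # already hosted on the server: no upload


@dataclass
class Folder:
    path: str                   # e.g. a Velvet working directory
    server_side: bool = False


@dataclass
class Collection:
    items: list = field(default_factory=list)   # list of Data


@dataclass
class Struct:
    parts: dict = field(default_factory=dict)   # name -> Data


Data = Union[Simple, File, Folder, Collection, Struct]


def paths_to_upload(data: Data) -> List[str]:
    """Walk a (possibly nested) data item and return client-side paths only."""
    if isinstance(data, (File, Folder)):
        return [] if data.server_side else [data.path]
    if isinstance(data, Collection):
        return [p for item in data.items for p in paths_to_upload(item)]
    if isinstance(data, Struct):
        return [p for v in data.parts.values() for p in paths_to_upload(v)]
    return []   # simple values carry no files


run = Struct({
    "reads": Collection([File("/data/ngs/run1_R1.fastq", server_side=True),
                         File("/data/ngs/run1_R2.fastq", server_side=True)]),
    "assembly": Folder("velvet_out"),
    "kmer": Simple(31),
})
print(paths_to_upload(run))   # ['velvet_out']
```

The large NGS reads stay on the server; only the locally produced folder would need to be transferred.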
Enhanced semantic description of data
The current typing mechanism is Mobyle-specific:
● its core is maintained by the Mobyle authors and contributors:
  ○ it is easy to use
  ○ it is well suited to Mobyle's needs
● flexibility is achieved by allowing new types to be defined "on the fly"
Issues:
● hard and costly to maintain
● on-the-fly typing is a consistency issue at the MobyleNet level
● the existing system confuses the semantic and syntactic levels:
  ○ e.g., Sequence data can only be text-based
Solution: use an existing ontology, possibly EDAM, to describe the data and parameters:
  ○ it is easier to use and contribute to an existing effort
  ○ some programs are already described with this ontology:
    ■ EMBOSS
    ■ BioCatalogue
    ■ DRCAT Resource Catalogue
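Separating the semantic level (what the data means) from the syntactic level (how it is encoded) can be sketched as two independent checks. This is a hedged sketch: the concept names follow EDAM's naming style, but the `IS_A` hierarchy is a hand-written stand-in rather than terms loaded from EDAM, and `DataType`, `Parameter` and `accepts` are hypothetical names. The `2bit` format is used only as an example of a non-text sequence encoding.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hand-written stand-in for ontology "is a" links (not loaded from EDAM).
IS_A = {
    "Protein sequence": "Sequence",
    "Nucleic acid sequence": "Sequence",
    "Sequence": "Data",
}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """Walk up the hierarchy until the ancestor is found or the root is passed."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


@dataclass(frozen=True)
class DataType:
    concept: str   # semantic level: what the data means
    format: str    # syntactic level: how it is encoded


@dataclass(frozen=True)
class Parameter:
    name: str
    concept: str
    formats: FrozenSet[str]


def accepts(param: Parameter, dtype: DataType) -> bool:
    """The semantic and syntactic levels are checked independently."""
    return is_a(dtype.concept, param.concept) and dtype.format in param.formats


query = Parameter("query", "Sequence", frozenset({"FASTA", "2bit"}))
print(accepts(query, DataType("Protein sequence", "FASTA")))      # True
print(accepts(query, DataType("Nucleic acid sequence", "2bit")))  # True
print(accepts(query, DataType("Sequence", "PDB")))                # False
```

A Sequence is no longer tied to a text encoding, and a parameter declared on a general concept accepts its more specific subconcepts without redefining types on the fly.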
User interface enhancements
● more dynamic: adapt the service interface to the user's choices
  ○ e.g., if parameter A has been set to X, parameter B is relevant and should be shown; otherwise it should be hidden
● include the possibility to load complete or multiple example sets for a service
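The A/B example can be expressed as a display condition attached to each parameter and re-evaluated whenever a value changes. In this minimal sketch the `shown_if` attribute, the `visible_params` function and the `EXAMPLES` sets are hypothetical names chosen for illustration; a real service description would need its own syntax for such conditions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Param:
    name: str
    default: Any = None
    # Hypothetical display condition evaluated against the current values.
    shown_if: Optional[Callable[[Dict[str, Any]], bool]] = None


def visible_params(params: List[Param], values: Dict[str, Any]) -> List[str]:
    """Names of the parameters to display, given the user's current choices."""
    current = {p.name: values.get(p.name, p.default) for p in params}
    return [p.name for p in params if p.shown_if is None or p.shown_if(current)]


params = [
    Param("A", default="Y"),
    Param("B", default=10, shown_if=lambda v: v["A"] == "X"),
]
# Hypothetical named example sets a user could load in one click.
EXAMPLES = {"basic": {}, "with_B": {"A": "X", "B": 25}}

print(visible_params(params, EXAMPLES["basic"]))   # ['A']
print(visible_params(params, EXAMPLES["with_B"]))  # ['A', 'B']
```

Loading an example set is then just another way of supplying `values`, so the interface adapts to it exactly as it would to manual input.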
Context-dependent composition of services
● Depending on the data provided by users, executing a service may require transforming a single task into a composition of services:
  ○ input data format detection
  ○ input data format conversion
  ○ retrieval of data from databanks
  ○ implicit iteration over user-provided data
● Some of these tasks are already handled in Mobyle, but:
  ○ these helpers are limited to a specific set
  ○ these helpers are executed synchronously in the web server and are thus limited to "small data"
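The four helper tasks can be combined by a planner that expands one user request into a list of steps, which a job system could then run asynchronously instead of inside the web server. This is a hypothetical sketch: `plan`, the `db:` reference syntax and the assumption that fetched entries arrive as FASTA are all illustrative choices; `seqret` is the EMBOSS program for sequence format conversion, used here only as a step label.

```python
from typing import Callable, Dict, List, Set, Tuple


def plan(service: str, accepted: Set[str], inputs: List[str],
         detect: Callable[[str], str],
         converters: Dict[Tuple[str, str], str]) -> List[tuple]:
    """Expand one user request into a flat list of steps (hypothetical planner)."""
    steps = []
    for item in inputs:                      # implicit iteration: one run per input
        if item.startswith("db:"):           # hypothetical databank reference syntax
            steps.append(("fetch", item))
            fmt = "FASTA"                    # assumption: fetched entries arrive as FASTA
        else:
            fmt = detect(item)               # input format detection
            steps.append(("detect", item, fmt))
        if fmt not in accepted:              # input format conversion
            target = next((dst for (src, dst) in converters
                           if src == fmt and dst in accepted), None)
            if target is None:
                raise ValueError(f"no converter from {fmt} for {item!r}")
            steps.append(("convert", item, converters[(fmt, target)]))
        steps.append(("run", service, item))
    return steps


def detect(path: str) -> str:
    """Toy format detection based on the file extension only."""
    return "GenBank" if path.endswith(".gb") else "FASTA"


converters = {("GenBank", "FASTA"): "seqret"}
for step in plan("blastp", {"FASTA"}, ["a.fasta", "b.gb", "db:uniprot/P69905"],
                 detect, converters):
    print(step)
```

Producing a plan rather than executing helpers inline is what would let format conversions and downloads run as regular jobs, without the "small data" limit.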
Enable collaboration: groupware environment
Sharing possibilities: multiple users can work as a team on a shared project
● share data and analyses
● comment on and annotate them
● publish them?
● A user can work in multiple projects
● A project includes:
  ○ data
  ○ analyses
  ○ workflows
● Projects can be shared:
  ○ between multiple users
  ○ with permissions set individually for each user
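A project with per-user permissions can be sketched as below. The ordered permission levels are an assumption made for brevity (each level implies the ones below it, and only admins may share); `Perm`, `Project` and `projects_of` are hypothetical names, not part of Mobyle.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List


class Perm(IntEnum):
    # Assumption: permissions are ordered, each level implying the ones below.
    READ = 1
    COMMENT = 2    # comment on and annotate data and analyses
    WRITE = 3
    ADMIN = 4      # may share the project with others


@dataclass
class Project:
    name: str
    owner: str
    data: List[str] = field(default_factory=list)
    analyses: List[str] = field(default_factory=list)
    workflows: List[str] = field(default_factory=list)
    members: Dict[str, Perm] = field(default_factory=dict)   # per-user permissions

    def allows(self, user: str, needed: Perm) -> bool:
        if user == self.owner:
            return True
        return self.members.get(user, 0) >= needed

    def share(self, by: str, user: str, perm: Perm) -> None:
        if not self.allows(by, Perm.ADMIN):
            raise PermissionError(f"{by} may not share {self.name}")
        self.members[user] = perm


def projects_of(user: str, projects: List[Project]) -> List[str]:
    """A user can belong to several projects, as owner or member."""
    return [p.name for p in projects if p.owner == user or user in p.members]


p = Project("hiv_protease", owner="alice")
p.share("alice", "bob", Perm.COMMENT)
print(p.allows("bob", Perm.READ), p.allows("bob", Perm.WRITE))  # True False
```

A flat set of independent flags per user would be the alternative if the levels should not imply each other.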
Easier maintenance
● Functional tests in the description of the services:
  ○ automated tests to monitor the status of the services
● Web-based administration interface to supervise and maintain the server:
  ○ list jobs and their status
  ○ modify the server configuration
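Functional tests embedded in a service description could be run periodically to report each service's status. In this sketch the description layout (`command`, `tests`, `args`, `expected_stdout`) and `check_service` are hypothetical, and the "service" is a trivial Python one-liner standing in for a real bioinformatics program.

```python
import subprocess
import sys

# Hypothetical service description with embedded functional tests.
SERVICE = {
    "name": "echo_upper",
    "command": [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"],
    "tests": [
        {"args": ["acgt"], "expected_stdout": "ACGT\n"},
        {"args": ["Mobyle"], "expected_stdout": "MOBYLE\n"},
    ],
}


def check_service(desc, timeout=30):
    """Run every embedded test; return ('ok' | 'failing', list of failures)."""
    failures = []
    for test in desc["tests"]:
        try:
            proc = subprocess.run(desc["command"] + test["args"],
                                  capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            failures.append((test["args"], "timeout"))
            continue
        if proc.returncode != 0 or proc.stdout != test["expected_stdout"]:
            failures.append((test["args"], proc.stdout))
    return ("ok" if not failures else "failing", failures)


print(check_service(SERVICE))   # ('ok', [])
```

An administration interface could list the result of the latest run for every installed service next to the job list.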
New architecture
Achieving these goals requires revising the current architecture:
● The revision of the data model (complex structures, ontology-based typing mechanisms) requires extensive modifications in:
  ○ the server code
  ○ the service descriptions
● The CGI-based server architecture is not suited to these changes.
The new architecture relies on:
● Pyramid
  ○ Python-based
● A NoSQL storage solution
  ○ suited to the nature of the data structures handled: deeply nested
  ○ easily integrated with a web-based system
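The fit between deeply nested data and a document store can be illustrated in plain Python. The job document below is hypothetical, and `get_path`/`find` only mimic the dotted-path field addressing offered by document stores such as MongoDB; the slides do not name a specific NoSQL product.

```python
import json

# Hypothetical job document: nested data types, files and owners in one record.
job = {
    "_id": "job-0001",
    "service": "blastp",
    "status": "finished",
    "inputs": {
        "query": {"type": {"concept": "Protein sequence", "format": "FASTA"},
                  "files": ["query.fasta"]},
        "bank": {"type": {"concept": "BlastBank"},
                 "owner": "alice",
                 "files": ["my_proteins.pin", "my_proteins.phr", "my_proteins.psq"]},
    },
}


def get_path(doc, dotted):
    """Resolve 'a.b.c' in a nested document; list indices are given as digits."""
    node = doc
    for key in dotted.split("."):
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def find(docs, dotted, value):
    """Naive filter returning the ids of documents whose nested field equals value."""
    out = []
    for d in docs:
        try:
            if get_path(d, dotted) == value:
                out.append(d["_id"])
        except (KeyError, IndexError, ValueError, TypeError):
            pass   # field absent or not indexable: no match
    return out


stored = json.loads(json.dumps(job))   # stored as-is, no table mapping needed
print(stored == job)                                     # True
print(find([stored], "inputs.query.type.format", "FASTA"))  # ['job-0001']
```

With a relational schema, the same record would have to be split across several tables and joined back together for each request.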