The BlueBRIDGE approach to collaborative research


Gianpaolo Coro, ISTI-CNR, at the BlueBRIDGE workshop on "Data Management services to support stock assessment", held during the Annual ICES Science conference 2016

Published in: Data & Analytics

  1. 1. BlueBRIDGE receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 675680. The BlueBRIDGE approach to collaborative research. Gianpaolo Coro, CNR, Italy
  2. 2. Context: Progress in Information Technology has changed the paradigms of Science • The large and fast increase in the volume and complexity of data requires new approaches to collect, curate and analyse the data • This requires new tools to guarantee the exchange and longevity of the data and the reapplication of the experiments
  3. 3. Big Data • Large volume • High generation velocity • Large variety • Untrustworthy (veracity) • High complexity (variability) Big Data: a dataset with large volume, variety, generation velocity, containing complex and untrustworthy information that requires nonconventional methods to extract, manage and process information within a reasonable time.
  4. 4. New Science Paradigms • Open Science: make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional. Keywords: Open Access, Open research, Open Notebook Science • E-Science: computationally intensive science is carried out in highly distributed network environments that use large data sets and require distributed computing and collaborative tools. Keywords: Provenance of the scientific process, Scientific workflows • Science 2.0: process and publish large data sets using a collaborative approach. Share from raw data to experimental results and processes. Support collaborative experiments and Reproducibility-Repeatability-Reusability (R-R-R) of Science. Keywords: collaborative and repeatable Science
  5. 5. Requirements for IT systems • Support collaborative research and experimentation • Implement Reproducibility-Repeatability-Reusability of Science • Allow sharing data, processes and findings • Grant free access to the produced scientific knowledge • Tackle Big Data challenges • Sustainability: low operational costs, low maintenance costs • Manage heterogeneous access policies for data and processes • Meet the requirements of industrial processes
  6. 6. e-Infrastructures: e-Infrastructures enable researchers at different locations across the world to collaborate in the context of their home institutions or in national or multinational scientific initiatives. • People can work together having shared access to unique or distributed scientific facilities (including data, instruments, computing and communications). Examples: Belief, OpenAire, i-Marine, EU-Brazil OpenBio
  7. 7. Virtual Research Environments • Define sub-communities • Allow temporary dedicated assignment of computational, storage, and data resources • Manage policies • Support data and information sharing [Diagram labels: e-Infrastructure, Unified Resource Space, VREs, WPS, External e-Infrastructures; arrows: Integrates, Enables]
  8. 8. Virtual Research Environments Innovative, web-based, community-oriented, comprehensive, flexible, and secure working environments. • Communities are provided with applications to interact with the VRE services • Client services are provided both with APIs (Java, R) and simple HTTP-REST interfaces
  9. 9. VREs Example: The D4Science e-Infrastructure. D4Science supports scientists in several domains: 1. More than 25 000 taxonomic studies per month 2. More than 60 000 species distribution maps produced and hosted 3. Used to build a pan-European geothermal energy map 4. Processing and management of heterogeneous environmental and Earth system data 5. Enhances communication and exchange in Linguistic Studies, Humanities, Cultural Heritage, History and Archaeology
  10. 10. BlueBRIDGE VREs • Stock Assessment: assess the health status of fisheries stocks (CMSY model) • Marine Protected Areas: reduce the adverse impact of human activities (e.g. fishing, aquaculture, tourism) on ecosystems, and ensure these activities are properly embedded in policy frameworks (impact maps)
  11. 11. Education VREs • Lecture-style: the emphasis on course topics varies with the audience • Interactive: after each explained topic, students do experiments • Experimental: students reproduce the experiment shown by the teacher and possibly repeat it on their own data • Social: students communicate via messaging or the VRE discussion panel. Courses: • 1 course/year in Pisa • 1 course/year in Paris • 12 courses in Copenhagen (International Council for the Exploration of the Sea) • 38 courses all over the world • +1000 attendees
  12. 12. BlueBRIDGE VREs: Social Networking. Social networking is key to sharing information in an e-Infrastructure. BlueBRIDGE offers a continuously updated list of events and news produced by users and applications. [Diagram labels: User-shared News, Application-shared News, Share News]
  13. 13. BlueBRIDGE VREs: The Workspace – an online file storage system. A free-to-use, folder-based file system allows managing and sharing information objects. Information objects can be • files, datasets, workflows, experiments, etc. • organized into folders • shared • disseminated via public URLs
  14. 14. BlueBRIDGE Facilities: Overview • Storage: databases, cloud storage, geospatial data • Data management: metadata generation and management, harmonisation, sharing • Processing: cloud computing, elastic resource assignment, multi-platform (R, Java, Fortran)
  15. 15. Data Processing
  16. 16. Cloud Computing Platform • Experiments on Big Data • Sharing inputs and results • Saving the provenance of experiments • Support for R-R-R of experiments [Diagram labels: WPS, REST; Input/Output, Parameters, Provenance]
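
The slide names WPS (the OGC Web Processing Service standard) as one way to invoke the platform. As a rough illustration only, the sketch below builds a WPS 1.0.0 key-value-pair Execute request URL. The endpoint, the process identifier and the input names are hypothetical placeholders, not values from the talk, and a real service would typically also require an authorisation token.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_wps_execute_url(endpoint, process_id, inputs):
    """Build a WPS 1.0.0 key-value-pair (KVP) Execute request URL."""
    # In the KVP encoding, inputs are "name=value" pairs joined by ';'.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "DataInputs": data_inputs,
    })
    return f"{endpoint}?{query}"


url = build_wps_execute_url(
    "https://wps.example.org/wps",   # hypothetical endpoint
    "org.example.StockAssessment",   # hypothetical process identifier
    {"species": "Gadus morhua", "years": 20},  # hypothetical input names
)
print(url)
# Decoding the query again shows the inputs that the service would receive.
print(parse_qs(urlsplit(url).query)["DataInputs"])
```

Because the request is a plain URL, the same computation can be triggered from a browser, a script, or third-party software that speaks WPS.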
  17. 17. BlueBRIDGE computational capabilities. Project resources: • 6 Virtual Machines (VMs) with 16 virtual CPU cores, 16GB of RAM and 100GB of storage • 100 VMs with 2 virtual CPU cores, 8GB of RAM and 20GB of storage. Processes: • ~200 algorithms hosted in all the VREs • ~20 contributing institutes • ~30,000 requests per month • ~2000 scientists/students in 44 countries using VREs • Programming languages: R, Java, Python, Fortran, Linux-compiled. External providers (European Grid Infrastructure): • 6 VMs: 8 virtual CPU cores, 16GB of RAM and 100GB of storage • 2 VMs: 16 virtual CPU cores, 32GB of RAM and 100GB of storage • 24 VMs: 2 virtual CPU cores, 8GB of RAM and 50GB of storage • 5 VMs: 4 virtual CPU cores, 8GB of RAM and 80GB of disk
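
To get a feel for the overall capacity, the per-group figures on this slide can be added up. The sketch below does that arithmetic, assuming that every listed group is a distinct set of machines and that "storage" and "disk" mean the same thing; the class and variable names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMGroup:
    count: int    # number of virtual machines in the group
    cores: int    # virtual CPU cores per machine
    ram_gb: int   # RAM per machine, in GB
    disk_gb: int  # storage per machine, in GB


# Figures copied from the slide.
PROJECT = [VMGroup(6, 16, 16, 100), VMGroup(100, 2, 8, 20)]
EXTERNAL = [
    VMGroup(6, 8, 16, 100),
    VMGroup(2, 16, 32, 100),
    VMGroup(24, 2, 8, 50),
    VMGroup(5, 4, 8, 80),
]


def totals(groups):
    """Sum machines, cores, RAM and disk over a list of VM groups."""
    return {
        "vms": sum(g.count for g in groups),
        "cores": sum(g.count * g.cores for g in groups),
        "ram_gb": sum(g.count * g.ram_gb for g in groups),
        "disk_gb": sum(g.count * g.disk_gb for g in groups),
    }


for label, groups in [("project", PROJECT), ("external", EXTERNAL),
                      ("combined", PROJECT + EXTERNAL)]:
    print(label, totals(groups))
```

Under those assumptions the combined pool is 143 VMs with 444 virtual cores, 1288GB of RAM and 5000GB of storage.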
  18. 18. Integrating new processes. Integration: putting a script that works offline into the Cloud computing platform. [Diagram labels: R script, SAI - Importing tool, Computing platform, Automatic, Web interface and Web service]
  19. 19. Advantages • The process is available as-a-Service • Invoked via communication standards • Higher computational capabilities • Automatic creation of a Web interface • Provenance management • Storage of results on a high-availability system • Collaboration and sharing • Re-usability, e.g. from other software (e.g. QGIS)
  20. 20. Collaborative experiments [Diagram labels: WS, Shared online folders, Inputs, Outputs, Results, Computational system, In the e-Infrastructure, Through third party software]
  21. 21. Ensemble Model: Implementation of an ensemble model approach to support advice and management in fisheries. Thorpe et al. (2015). Evaluation and management implications of uncertainty in a multispecies size structured model of population and community responses to fishing. Methods in Ecology and Evolution, 6(1), 49-58. • Diet Information • Life history diet information • Historical fishing scenarios • MSY fishing scenarios • Initial abundance values • Life history prior information • Total Biomass • Stock Spawning Biomass • Life history traits [Diagram labels: Input, Process (Python script), Output]
  22. 22. EM Integration • Download the Python script and the user’s data • Execute the script • Collect the output • Destroy local copies of I/O and script • Save the output on the user’s Workspace, with provenance info [Diagram labels: Scientist’s provided script, User’s data, Infrastructure machine]
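
The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BlueBRIDGE implementation: local file copies stand in for the download from the Workspace, a plain directory stands in for the user's Workspace, and the script is assumed to take an input path and an output path as its two arguments. The function name and the provenance fields are hypothetical.

```python
import hashlib
import json
import shutil
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def run_experiment(script: Path, data: Path, workspace: Path) -> Path:
    """Run `script` on `data` in a throwaway directory and keep only the output."""
    workspace.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        sandbox = Path(tmp)
        # 1. "Download" the script and the data (assumption: a local copy).
        local_script = sandbox / script.name
        local_data = sandbox / data.name
        shutil.copy(script, local_script)
        shutil.copy(data, local_data)
        # 2. Execute the script (assumed CLI: script <input> <output>).
        local_output = sandbox / "output.txt"
        subprocess.run(
            [sys.executable, str(local_script), str(local_data), str(local_output)],
            check=True,
            cwd=sandbox,
        )
        # 3. Collect the output before the sandbox disappears.
        saved = workspace / "output.txt"
        shutil.copy(local_output, saved)
    # 4. Leaving the `with` block destroyed the local copies of I/O and script.
    # 5. Save provenance information next to the output.
    provenance = {
        "script": script.name,
        "script_sha256": hashlib.sha256(script.read_bytes()).hexdigest(),
        "input": data.name,
        "output": saved.name,
        "executed_at": datetime.now(timezone.utc).isoformat(),
    }
    (workspace / "provenance.json").write_text(json.dumps(provenance, indent=2))
    return saved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        demo = Path(demo_dir)
        # A toy "scientist's script" that upper-cases its input file.
        (demo / "model.py").write_text(
            "import sys\n"
            "text = open(sys.argv[1]).read()\n"
            "open(sys.argv[2], 'w').write(text.upper())\n"
        )
        (demo / "catch.csv").write_text("year,catch\n2015,10\n")
        result = run_experiment(demo / "model.py", demo / "catch.csv", demo / "workspace")
        print(result.read_text())
```

Running each experiment in a temporary directory means nothing from one user's data or script is left behind on the infrastructure machine once the output has been saved.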
  23. 23. EM Interface User’s private Workspace
  24. 24. EM Interface
  25. 25. EM Interface
  26. 26. EM Interface
  27. 27. Scientific Workflow • The script provider updates the script on his/her private Workspace • The service downloads the script on-the-fly • A user executes an experiment on his/her data • The output, the input and the parameters can be shared with another user • This user can execute the experiment again and share the computation with the other user [Diagram: steps numbered 1 to 10]
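
The last two steps rely on the input, the parameters and the output travelling together, so that a second user can repeat the run and compare results. A minimal sketch of that idea follows. The record layout, the function names and the toy `moving_average` process are all assumptions made for illustration; they are not the format used by the platform.

```python
import hashlib
import json


def digest(obj) -> str:
    """Stable SHA-256 fingerprint of a JSON-serialisable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def run_and_record(process, inputs, parameters):
    """Execute a process and keep everything needed to repeat it."""
    output = process(inputs, **parameters)
    return {
        "inputs": inputs,
        "parameters": parameters,
        "output": output,
        "output_sha256": digest(output),
    }


def repeat(process, record) -> bool:
    """Re-run a shared record and check that the output is identical."""
    rerun = run_and_record(process, record["inputs"], record["parameters"])
    return rerun["output_sha256"] == record["output_sha256"]


def moving_average(values, window):
    """Toy stand-in for a hosted algorithm."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# First user runs the experiment and shares the record (here, as JSON text).
record = run_and_record(moving_average, [1, 2, 3, 4, 5], {"window": 2})
shared = json.loads(json.dumps(record))

# Second user repeats the experiment from the shared record.
print(repeat(moving_average, shared))
```

Comparing digests rather than raw outputs keeps the check cheap even when the output is large.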
  28. 28. Limitations and requirements [Diagram labels: Required (Input, Script, Output) vs Provided (Script)] Issues: • Code is often designed for one specific data set • Prototype scripts often have code that is not separable from the I/O. In the context of e-Infrastructures and Science 2.0: • Modularity is necessary for integration • Scripts should be reorganised so that they can be reused on other data without changing the code
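
The contrast between what is provided and what is required can be illustrated with a toy example. In the sketch below, `prototype` mixes a hard-coded data set, the computation and the output, while the modular version separates reading, computing and writing so that the same code runs on other data. The function names, the CSV layout and the numbers are made up for illustration.

```python
import csv
import io


def prototype():
    """Prototype style: data, computation and output are tangled together."""
    rows = [("2014", 120.0), ("2015", 95.0)]  # data set baked into the code
    print(sum(catch for _, catch in rows) / len(rows))


def read_catches(stream):
    """Input step: parse a CSV stream with a 'catch' column."""
    return [float(row["catch"]) for row in csv.DictReader(stream)]


def mean_catch(catches):
    """Computation step: pure function, no I/O."""
    return sum(catches) / len(catches)


def write_report(mean, stream):
    """Output step: write the result to any text stream."""
    stream.write(f"mean_catch={mean}\n")


def main(in_stream, out_stream):
    """Modular style: the same code runs on any compatible input."""
    write_report(mean_catch(read_catches(in_stream)), out_stream)


# The modular version works on new data without touching the code.
report = io.StringIO()
main(io.StringIO("year,catch\n2014,120\n2015,95\n"), report)
print(report.getvalue(), end="")
```

Only the modular form exposes a clear input, process and output, which is what a hosting platform needs in order to wrap a script as a service.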
  29. 29. Conclusions • E-Infrastructures endow processes with several Science 2.0 features • BlueBRIDGE offers an e-Infrastructure and resources to host processes and collaborate • Effort is required from algorithm providers to comply with service and generalisation requirements [Diagram labels: WS, Self-consistent comp. products, Repeatability, Provenance (Prov-O), Reusability, Use of standards, Reproducibility]