Description and scope of the Project
PHIDIAS HPC aims to develop a consolidated, shared HPC and data service by building on pre-existing and emerging infrastructure, creating a federation of "user to infrastructure" services.
To achieve this purpose and gain a comprehensive picture of the European infrastructure landscape, three data test areas will develop and provide new services to discover, manage and process spatial and environmental data produced by research communities tackling scientific challenges in atmospheric, marine and Earth observation research.
Webinar: How to improve the cloud services for marine data
Observing the ocean is challenging: missions at sea are costly, processes at different scales interact, and conditions are constantly changing, which is why scientists say that "a measurement not made today is lost forever". For these reasons, it is fundamental to properly store both the data and metadata, so that access can be guaranteed for the widest community, in line with the FAIR principles: Findable, Accessible, Interoperable and Reusable.
PHIDIAS HPC has organised a webinar entitled "PHIDIAS: Boosting the use of cloud services for marine management, services and processing" to be held on 4th June 2020 at 11 AM CEST. The webinar aims to introduce the Phidias HPC initiative, in collaboration with the Blue-Cloud project, to the European HPC and Research community, specifically in the Blue economy, to improve the use of (1) cloud services for marine data management, (2) data services to the user in a FAIR perspective, and (3) data processing on demand.
These objectives will be pursued in coherence with the development of the European Open Science Cloud (EOSC) and the Copernicus Data and Information Access Services (DIAS).
Phidias: Steps forward in detection and identification of anomalous atmospheric events – Phidias
PHIDIAS organised a webinar entitled "Steps forward in detection and identification of anomalous atmospheric events", held on 13 October 2020 at 15:00 CEST in collaboration with the ESCAPE project. The webinar showcased how PHIDIAS is going to improve the use of HPC and high-performance data management services to develop intelligent screening approaches for exploiting large amounts of satellite atmospheric data in an operational context.
Experience in managing service portfolio by Pasquale Pagano – BlueBRIDGE
Pasquale Pagano discusses managing a service portfolio, including its current and future challenges.
This work is licensed under the Creative Commons CC-BY 4.0 licence.
Cloud for Research and Innovation – UK/USA HPC workshop, Oxford, July 2015 – Martin Hamilton
How can public cloud and technologies like Docker and OpenStack help to deliver next generation scientific computing infrastructure? My talk for the UK/USA HPC workshop in July 2015, organized by HPC-SIG (UK) and CASC (USA).
Enabling efficient movement of data into & out of a high-performance analysis... – Jisc
From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
Science Demonstrator Session: Physics and Astrophysics – EOSCpilot.eu
The main focus of Science Demonstrator sessions is to provide feedback to the EOSC community on the first experience of science demonstrators in the practical use of the emerging EOSC ecosystem.
Each panel will consist of a representative of a Science Demonstrator that will provide an overview of their experiences in the use of emerging EOSC services.
These sessions will help members of the scientific communities understand the current state of maturity of the EOSC ecosystem and what can be obtained in a given field of scientific research. They are also valuable to prospective service providers who wish to discover the challenges and opportunities that user communities may face as a result of adopting their services.
This session will focus on Physics and Astrophysics.
Gergely Sipos (EGI): Exploiting scientific data in the international context ... – Gergely Sipos
Keynote presentation given at "The Emerging Technology Forum – Data Creates Universe - Scientific Data Innovation Conference" of the "Pujiang Innovation Forum 2021" event.
Wielkopolska activities with potential for cluster-to-cluster collaboration EU... – Raul Palma
We introduce the experiences and lessons learned in developing a smart agriculture infrastructure in the Wielkopolska region, and comment on potential gaps and opportunities for clustering collaborations.
PaNOSC Overview – ExPaNDS kick-off meeting – September 2019 – PaNOSC
This presentation gives an overview of the H2020 INFRAEOSC PaNOSC project, showcasing its activities and expected results, as well as its vision: to create a PaN scientific commons.
On 29 January 2020 ARCHIVER launched its Request for Tender with the purpose to award several Framework Agreements and work orders for the provision of R&D for hybrid end-to-end archival and preservation services that meet the innovation challenges of European Research communities, in the context of the European Open Science Cloud.
The tender was closed on 28 April 2020 and 15 R&D bids were submitted, with consortia that included 43 companies and organisations. The best bids have been selected and will start the first phase of the ARCHIVER R&D (Solution Design) in June 2020.
On Monday 8 June the selected consortia for the ARCHIVER design phase were announced during a Public Award Ceremony starting at 14:00 CEST.
In light of the COVID-19 outbreak and the consequent movement restrictions imposed in several countries, the event was organised as a webinar, virtually hosted by Port d’Informació Científica (PIC), a member of the ARCHIVER consortium's Buyers Group.
The Kick-off marks the beginning of the Solution Design Phase.
Big Data Europe at eHealth Week 2017: Linking Big Data in Health – BigData_Europe
Of the four V's of big data – Volume, Velocity, Variety and Veracity – the most challenging for the health sector is Variety. Health data comes from many sources, formats and standards – how can we bring these together to reap the benefits of big data technologies?
Big Data Europe is tackling this challenge head-on, building a big data infrastructure flexible enough to tackle all seven Societal Challenges identified by Horizon 2020. Here we demonstrate our pilot implementation of Open PHACTS, which integrates life science data for drug discovery.
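The Variety problem described above comes down to harmonising records that describe the same thing under different schemas. A minimal illustrative sketch (toy data, invented field names, not the Open PHACTS pipeline) might look like this:

```python
# Illustrative only: harmonising two toy health-data sources with different
# schemas into one record set. All field names here are invented.

source_a = [{"patient_id": "p1", "hb_g_dl": 13.5}]     # e.g. a lab system
source_b = [{"id": "p1", "haemoglobin": "13.5 g/dL"}]  # e.g. a registry export

def normalise_a(rec):
    """Map lab-system fields onto a common schema."""
    return {"patient": rec["patient_id"], "haemoglobin_g_dl": rec["hb_g_dl"]}

def normalise_b(rec):
    """Map registry fields onto the same schema, stripping the unit string."""
    value = float(rec["haemoglobin"].split()[0])
    return {"patient": rec["id"], "haemoglobin_g_dl": value}

merged = [normalise_a(r) for r in source_a] + [normalise_b(r) for r in source_b]
print(merged)  # both records now share one schema and comparable units
```

Real integrations add ontology mappings and provenance on top of this kind of schema alignment, but the core step is the same: agree on one target schema and normalise every source into it.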
12 May 2017
Towards an e-infrastructure in agriculture? – BlueBRIDGE
Donatella Castelli, CNR-ISTI and BlueBRIDGE coordinator, gave an introductory talk in the "Towards an e-infrastructure in agriculture?" session at the Euragri workshop at INRA, Paris. She discussed leading an e-infrastructure project in marine research, where an e-infrastructure refers to a combination of digital technologies (hardware and software), resources (data, services, digital libraries), communications (protocols, access rights and networks), and the people and organisational structures needed to manage them.
Science Demonstrator Session: Social and Earth Sciences – EOSCpilot.eu
The main focus of Science Demonstrator sessions is to provide feedback to the EOSC community on the first experience of science demonstrators in the practical use of the emerging EOSC ecosystem.
Each panel will consist of a representative of a Science Demonstrator that will provide an overview of their experiences in the use of emerging EOSC services.
These sessions will help members of the scientific communities understand the current state of maturity of the EOSC ecosystem and what can be obtained in a given field of scientific research. They are also valuable to prospective service providers who wish to discover the challenges and opportunities that user communities may face as a result of adopting their services.
This session will focus on Social and Earth Sciences.
DANS Data Trail: Data Management Tools for Archaeologists – ariadnenetwork
With the arrival of ARIADNEplus there is a searchable catalogue of datasets that helps archaeological researchers navigate the “maze” of data and archives. A set of tools has now been developed, aimed especially at archaeological researchers, support staff and data managers, to help in making a data management plan. Hella Hollander, Peter Doorn and Paola Ronzino introduced the tools to the participants during the workshop.
The ARIADNEplus online toolset for data management consists of three parts:
a protocol for archaeological data management,
a template for researchers to create a data management plan with archaeological data,
a manual containing all guidelines, recommendations and practical examples of data management.
In just six steps, the protocol takes you through the entire process of making a Data Management Plan (DMP) for archaeological research. By using the templates and the accompanying manual, with its clear set of guidelines and advice, it becomes much easier to meet the requirements of organisations that fund research. The DMP is then also in line with standards in the archaeological domain, which ultimately makes the data more findable, accessible, interoperable and reusable (FAIR).
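The step-by-step structure above lends itself to a simple checklist model. As a minimal sketch (the section names are hypothetical placeholders, not the actual ARIADNEplus protocol steps), a DMP record and a completeness check could look like this:

```python
# Illustrative sketch of a DMP as a checklist of sections, one per protocol
# step. The section names below are hypothetical, not the official template.

DMP_SECTIONS = [
    "data_collection",
    "documentation_and_metadata",
    "storage_and_backup",
    "legal_and_ethical_issues",
    "data_sharing",
    "preservation",
]

def new_dmp(project_title: str) -> dict:
    """Create an empty DMP with one entry per protocol step."""
    return {"project": project_title,
            "sections": {name: "" for name in DMP_SECTIONS}}

def missing_sections(dmp: dict) -> list:
    """List the sections a researcher still needs to fill in."""
    return [name for name, text in dmp["sections"].items() if not text.strip()]

plan = new_dmp("Bronze Age settlement survey")
plan["sections"]["data_collection"] = "GIS survey data, ceramics database"
print(missing_sections(plan))  # the five sections still to be completed
```

Tools like the ones described in the workshop essentially automate this bookkeeping, while adding domain guidance for each step.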
Building earth observation applications with NextGEOSS – webinar – terradue
Training taster for the NextGEOSS Workshop to be held in Geneva on September 11th, 2018.
A review of the NextGEOSS components and services available to partners for the integration of their applications on the NextGEOSS Platform.
Announcement: https://nextgeoss.eu/second-nextgeoss-training/
Publishing your research: Research Data Management (Introduction) – Jamie Bisset
Publishing your research: Research Data Management (Introduction) (November 2013) slides. Delivered as part of the Durham University Researcher Development Programme. Further Training available at https://www.dur.ac.uk/library/research/training/
Introduction: The Big Data Europe Project at the: CMG-AE Event: Big Data: Strategien, Technologien und Nutzen
19th of May 2015, Expat Center der Wirtschaftsagentur, Vienna, Austria
See: http://www.big-data-europe.eu
A Linked Data Dataset for Madrid Transport Authority's Datasets – Oscar Corcho
Presentation given at the CIT2014 conference in Santander, describing the initial work towards providing a Linked Data dataset for the Consorcio Regional de Transportes de Madrid.
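To make the Linked Data idea concrete, here is a minimal stdlib-only sketch: a few RDF triples for a transport stop, serialised as N-Triples. The namespace and identifiers are placeholders, not the project's actual URIs.

```python
# Toy Linked Data example: (subject, predicate, object) triples for a
# hypothetical transport stop, serialised as N-Triples. All URIs are
# placeholders, not the real CRTM dataset identifiers.

EX = "http://example.org/transport/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

triples = [
    (EX + "stop/sol", RDF_TYPE, EX + "BusStop"),
    (EX + "stop/sol", RDFS_LABEL, '"Puerta del Sol"@es'),
    (EX + "stop/sol", EX + "servedByLine", EX + "line/3"),
]

def to_ntriples(triples):
    """Serialise (s, p, o) tuples as N-Triples; quoted literals pass through."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(triples))
```

Publishing data this way, with stable URIs for stops and lines, is what lets third parties link their own datasets to the transport authority's resources.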
Coupling HPC and Data Resources and services together – EUDAT Workshop at exd... – EUDAT
Giuseppe Fiameni (CINECA)
The goal of this EUDAT workshop is to present the EUDAT services and the results of the collaboration activity achieved so far, and to deliver a hands-on session on how to write a Data Management Plan (DMP). The DMP is a useful instrument for researchers to reflect on and communicate how they will deal with their data, as it prompts them to think about how they will generate, analyse and share data during their research project and afterwards.
PHIDIAS HPC – Building a prototype for Earth Science Data and HPC Services – Phidias
High-Performance Computing (HPC) technology is becoming increasingly important as a key driver of European economic growth and scientific research. It is a comprehensive tool that can support the development of a wide array of scientific domains (such as big data, Earth observation and ocean study) and address societal challenges as well.
The Webinar aims at introducing the Phidias HPC initiative to the European HPC and Research community, including main features, expected impact and advantages for Research & HPC ecosphere. The project is paving the way to increase the HPC and Data capacities of the European Data Infrastructure by pursuing the following objectives:
- Building a prototype for earth scientific data
- Enabling Open Access to HPC Services
- Strengthening FAIRisation
- Creating a framework combining computing, dissemination and archiving resources.
Bridging the gap to facilitate selection and image analysis activities for la... – Phidias
PHIDIAS organised its third and final webinar of the series, dedicated to Use Case 2: Big Data Earth Observations (EO), on 18 February 2021 at 15:00 CET, showcasing how PHIDIAS is taking advantage of HPC architecture to facilitate selection and image analysis activities for land surface monitoring.
The EGI Federation of clusters and research clouds are components of the European Open Science Cloud, and they offer technical solutions and an infrastructure to support the EuroGEOSS pilots, GEOSS and EO data exploitation platforms.
Learn how, by looking at the collaboration of EGI with NextGEOSS, the production support of the Geohazards TEP of Terradue and the EOSC-hub collaboration with GEOSS.
Data management plans – EUDAT best practices and case study | www.eudat.eu – EUDAT
Presentation given by Stéphane Coutin during the PRACE 2017 Spring School, a joint training event with the EU H2020 VI-SEEM project (https://vi-seem.eu/) organised by CaSToRC at The Cyprus Institute. Science, and more specifically projects using HPC, is facing a digital data explosion. Instruments and simulations are producing ever larger volumes; data can be shared, mined, cited, preserved… Data are a great asset, but they face risks: storage can run out, data can be lost or misused. To start this session, we review why it is important to manage research data and how to do so by maintaining a Data Management Plan, based on best practices from the EUDAT H2020 project and European Commission recommendations. During the second part we interactively draft a DMP for a given use case.
Linking EUDAT services to the EGI Fed-Cloud – EUDAT Summer School (Hans van P... – EUDAT
The main goal of the EGI-EUDAT collaboration is to harmonise the two e-infrastructures, covering technical interoperability, authentication, authorisation and identity management, policy, and operations. The main objective of this work is to provide end-users with seamless access to an integrated infrastructure offering both EGI and EUDAT services, pairing data and high-throughput computing resources together. Selected user communities are able to bring requirements and help assign the right priorities to each of them; in this way, the integration activity has been driven by the end users from the start. The use case permits a user of either e-infrastructure to instantiate a VM on the EGI Cloud Federation to execute a computational job consuming data preserved on EUDAT resources. The results of such an analysis can be staged back to EUDAT storage and, if needed, assigned Persistent Identifiers (PIDs) for future use. To implement all the steps of this use case, the following integration activities between the two infrastructures had to be fulfilled: (1) harmonisation of the authentication and authorisation models, and (2) definition and implementation of the interfaces between the involved EGI and EUDAT services.
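The use case above is a pipeline of discrete steps. As a purely schematic sketch, in which every function is a hypothetical placeholder for the real service calls (authentication, staging, VM execution, PID minting), the flow could be expressed as:

```python
# Schematic walk-through of the EGI-EUDAT use case. Every function below is a
# hypothetical placeholder, NOT a real EGI or EUDAT API call.

def authenticate(user):
    """Harmonised AAI step shared by both e-infrastructures (placeholder)."""
    return {"user": user, "token": "demo-token"}

def stage_in(session, dataset):
    """Pull input data preserved on EUDAT resources to local scratch."""
    return f"/scratch/{dataset}"

def run_on_egi_vm(session, data_path):
    """Execute the computational job on an EGI Cloud Federation VM."""
    return data_path + ".results"

def stage_out(session, results):
    """Push results back to EUDAT storage; returns an archive location."""
    return {"location": "eudat://archive" + results}

def allocate_pid(record):
    """Mint a persistent identifier for future reuse (placeholder handle)."""
    record["pid"] = "hdl:example/0001"
    return record

session = authenticate("alice")
staged = stage_in(session, "ocean-obs")
record = allocate_pid(stage_out(session, run_on_egi_vm(session, staged)))
print(record)  # archived results plus their PID
```

The two integration activities named in the text map directly onto this sketch: (1) makes the single `authenticate` step possible across both infrastructures, and (2) defines the interfaces behind `stage_in`/`stage_out`.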
Visit: https://www.eudat.eu/eudat-summer-school
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub – Björn Backeberg
This presentation was given during the Japan Geosciences Union 2019. Session details can be found at http://www.jpgu.org/meeting_e2019/SessionList_en/detail/M-GI31.htm
Similar to PHIDIAS – Boosting the use of cloud services for marine data management, services and processing
Toxic effects of heavy metals: Lead and Arsenic – sanjana502982
Heavy metals are naturally occurring metallic chemical elements that have relatively high density and are toxic even at low concentrations. All toxic metals are termed heavy metals irrespective of their atomic mass and density, e.g. arsenic, lead, mercury, cadmium, thallium and chromium.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... – Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations; further, they do not adapt to individual characteristics. In this talk, I will give an account of deep behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects of interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior composed of discrete entities. I will also discuss how deep behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... – Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... – Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R1/2 ∼ 50–200 pc, stellar masses of M⋆ ∼ 10⁷–10⁸ M⊙, and star-formation rates of SFR ∼ 0.1–1 M⊙ yr⁻¹. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function, without binning in redshift or luminosity, that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
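One of the sampling strategies named above, uniform random sampling of a configuration space, can be sketched in a few lines. This toy version assumes an unconstrained space of boolean options (real feature models add constraints that rule out some combinations); the option names are invented:

```python
# Toy sketch of uniform random sampling over a boolean configuration space.
# Unconstrained here, so all 2^|OPTIONS| combinations are valid; the option
# names are invented for the example.
import random

OPTIONS = ["optimise", "debug_symbols", "static_link", "lto"]

def sample_configurations(n, seed=0):
    """Draw n configurations uniformly at random; seeded for reproducibility."""
    rng = random.Random(seed)
    return [{opt: rng.random() < 0.5 for opt in OPTIONS} for _ in range(n)]

for config in sample_configurations(3):
    print(config)
```

Note the explicit seed: in the frictionless-reproducibility framing of the talk, even the exploration of variability spaces should itself be reproducible.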
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
The thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. It helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results can provide valuable insights into an individual's cognitive abilities, creativity and critical thinking skills.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for the treatment of food to preserve them and irradiation treatment of food is one of them. It is the most common and the most harmless method for the food preservation as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to the human health but still the quality assessment of food is required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The antioxidant capability of liquid food and beverages in mainly performed by spin trapping technique.
Phenomics assisted breeding in crop improvementIshaGoswami9
As the population is increasing and will reach about 9 billion upto 2050. Also due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate
change, and increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional
genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding
the complex characteristics of multiple gene, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can
be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus,
high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes
during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology,
and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
PHIDIAS - Boosting the use of cloud services for marine data management, services and processing
1. The PHIDIAS project has received funding from the European Union's Connecting Europe Facility under grant agreement n° INEA/CEF/ICT/A2018/1810854.
PHIDIAS: Boosting the use of cloud services for marine data management, services and processing
Webinar | June 4, 2020, 11:00 AM CEST
2. PHIDIAS Ocean Use Case
04.06.2020 PHIDIAS Webinar | https://www.phidias-hpc.eu/ | @PhidiasHpc
3. Webinar Agenda
11:00 - 11:05 - Introduction of the PHIDIAS project - Francesco Osimanti, Trust-IT Services, PHIDIAS WP7 Leader
11:05 - 11:15 - PHIDIAS Ocean use case and contribution of HPC to marine studies - Cécile Nys, IFREMER
11:15 - 11:25 - Exploring advanced cloud services for marine and oceanographic data access and data management - Gilbert Maudire, IFREMER
11:25 - 11:30 - Q&A Session
11:30 - 11:40 - Passport photos for plankton: new era for marine biology research - Jukka Seppälä, SYKE
11:40 - 11:50 - Analyzing ocean observations in an HPC infrastructure with DIVAnd - Alexander Barth, University of Liège
11:50 - 12:00 - Blue-Cloud Platform: marine-thematic EOSC services for Marine Research and the Blue Economy - Pasquale Pagano, CNR-ISTI & Blue-Cloud Project
12:00 - 12:05 - Q&A Session
12:05 - 12:10 - Closing remarks
5.
PHIDIAS Ocean use case and contribution of HPC to marine studies
Cécile Nys, IFREMER
Assistant Manager Ocean Data Cluster – ODATIS
PHIDIAS WP6 member
Webinar | June 4, 2020
6. WP6 “Use-case 3 – Ocean” overview
- Combine, collocate and process data from several data sources (in situ & satellite)
- Enhance data archiving (most observations cannot be reproduced) to facilitate data reuse
- Facilitate and speed up co-localisation and processing of data from different sources
7. WP6 “Use-case 3 – Ocean” overview
- Combine and collocate data from several data sources (in situ & satellite)
- Adopt new data structures (based on big-data technologies):
  - DataCubes
  - NoSQL databases (numerical data): Cassandra, MongoDB, etc.
  - Semantic Web (text data)
- Provide on-demand data browsing and processing facilities
8. Case studies
Surface Salinity in North Atlantic:
- CTD (SeaDataNet)
- Argo floats (CMEMS)
- SMOS satellite
Chlorophyll in North-East Atlantic and Baltic Sea:
- CTD and bottles (SeaDataNet)
- BGC Argo floats (ARGO GDAC)
- Ferrybox
- Sentinel-2 images (DIAS WEkEO)
9. Data flow
[Diagram: Data Infrastructure Harmonisation → Collections → Data lake → Processing]
10.
Peter THIJSSE (presented by Gilbert MAUDIRE)
Exploring advanced cloud services for marine and
oceanographic data access and data management
11.
Jukka SEPPÄLÄ
Passport photos for plankton: new era
for marine biology research
12.
Alexander BARTH
Analyzing ocean observations in an HPC
infrastructure with DIVAnd
14.
Cloud services for marine and oceanographic data access and data management
Gilbert Maudire (Ifremer) / Peter Thijsse (MARIS)
June 4, 2020, 11:25 AM CEST
15. Outline
Introduction
Data resources in scope
Discovery service
Prototype Data Lake for processing
16. Main objective recap
To improve the use of cloud services for marine data management, data services to users in a FAIR perspective, and data processing on demand, taking into account the European Open Science Cloud (EOSC) challenge and the Copernicus Data and Information Access Services (DIAS).
17. Marine data resources in scope
- SeaDataNet in-situ
- Euro-Argo in-situ
- CMEMS in-situ
- SMOS and Sentinel-3 remote sensing
18. Discovery service
- Build metadata indexes of the available datasets
- Metadata checks during import (completeness, readability, correct vocabularies)
- Include the DOIs/PIDs of the original datasets
- New DOIs will be assigned to newly processed datasets (SEANOE)
- Use Elasticsearch to support fast responses to searches
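The indexing-and-checking step above can be sketched with a small in-memory index. This is a stdlib stand-in for the Elasticsearch service, not the project's actual code; the record fields and the example identifiers are illustrative.

```python
# Minimal in-memory metadata index: checks required fields on import and
# builds a term index over titles. A toy stand-in for Elasticsearch.
from collections import defaultdict

REQUIRED_FIELDS = {"title", "doi", "vocabulary"}  # illustrative field set

class MetadataIndex:
    def __init__(self):
        self.records = {}
        self.terms = defaultdict(set)  # term -> set of record ids

    def add(self, record_id, record):
        # Metadata check during import: completeness of required fields.
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"incomplete metadata, missing: {sorted(missing)}")
        self.records[record_id] = record
        for word in record["title"].lower().split():
            self.terms[word].add(record_id)

    def search(self, word):
        return sorted(self.terms.get(word.lower(), set()))

idx = MetadataIndex()
idx.add("cdi-1", {"title": "Surface Salinity North Atlantic",
                  "doi": "doi:example/0001", "vocabulary": "P02"})
print(idx.search("salinity"))  # -> ['cdi-1']
```

A record missing a required field is rejected at import time, which is the "metadata checks during import" idea from the slide.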
19. Metadata is important
The PHIDIAS catalogue metadata model will be based on Dublin Core elements, extended with ISO 19115 where necessary: records are compliant with the Dublin Core standard and, where relevant (e.g. for geo-referenced data), made compatible with the ISO 19115 standard (e.g. by adding the geographical extent). The main managed information is:
General metadata (Dublin Core):
Title | Author(s) and affiliations (link with ORCID) | Publication date | Abstract | References | Use conditions (possible limitations…) | Reference to data user's manual (if any)
Access conditions:
Data license (Creative Commons license, ...) | Provided data citation in DataCite format | Access service(s) | Data format and size
Keywords (code lists provided):
Variables (link with the Essential Ocean Variables code list) | Method(s) | Instrument(s) | Project(s)
Geographical extent:
Min and max latitudes and longitudes | Location map
Temporal extent
Data preview(s)
List of citing publications
…
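As a sketch, the record structure above could be modelled like this. The field names follow the slide but are illustrative, not the project's actual schema; the example values are invented.

```python
# Sketch of a PHIDIAS-style catalogue record: Dublin Core-style core
# fields, optionally extended with an ISO 19115-like geographic extent
# for geo-referenced data. Names and values are illustrative.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GeoExtent:          # ISO 19115-style extension
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

@dataclass
class CatalogueRecord:    # Dublin Core-style core elements
    title: str
    creators: list
    date: str
    description: str = ""
    license: str = ""
    keywords: list = field(default_factory=list)
    geo: Optional[GeoExtent] = None  # only set for geo-referenced data

rec = CatalogueRecord(
    title="Surface salinity, North Atlantic",
    creators=["A. Author (ORCID 0000-0000-0000-0000)"],
    date="2020-06-04",
    license="CC-BY 4.0",
    keywords=["salinity"],
    geo=GeoExtent(30.0, 65.0, -60.0, 0.0),
)
print(rec.geo is not None)  # -> True: a geo-referenced record
```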
20. Prototype Data Lake for processing
Two data types:
In-situ datasets:
- not extremely large, but split across many small files;
- heterogeneous data types: vertical profiles, time series, underway data...
Satellite datasets:
- may be very large (several tens of petabytes in total), which makes them difficult to transfer over networks.
The "Data Lake" will be synchronized periodically (e.g. daily).
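The periodic synchronization could look like this minimal sketch: copy into the lake only the files that are new or changed since the last run. The paths and the mtime-based change test are assumptions for illustration, not the project's actual procedure.

```python
# Incremental sync sketch: copy new/updated files from a source tree into
# the data lake, comparing modification times. Stdlib only; illustrative.
import shutil
from pathlib import Path

def sync(source: Path, lake: Path) -> list:
    """Copy new or updated files from `source` into `lake`; return the copies made."""
    lake.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = lake / src.relative_to(source)
        # Copy if the file is new, or newer than the lake's copy.
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves the modification time
            copied.append(dst)
    return copied
```

Running this daily (e.g. from a scheduler) gives the "periodically synchronized" behaviour from the slide: an unchanged tree produces no copies on the second run.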
21. Different use cases, different storage (1)
For in-situ datasets: online selection and visualization of data using a two-step discovery service via a common catalogue:
1) selection of data collections/datasets, then
2) selection of the subset of data of interest.
Example: exploring SeaDataNet (Common Data Index) and Copernicus Marine Services data collections, including fast detection of co-localized data.
Access to data will have to be optimized to select and retrieve a small amount of data among a large number of metadata records, using different selection criteria: geographical, temporal...
Prototype: Elasticsearch on top of a (No)SQL database, to allow faceting in the web selection portal with optimized response times.
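The two-step selection can be illustrated with a toy catalogue. The collection names follow the slide, but the records, parameter codes and dates are invented for the example.

```python
# Two-step selection sketch: step 1 picks a data collection, step 2
# subsets it by geographical bounding box and time window.
from datetime import date

CATALOGUE = {  # illustrative contents
    "SeaDataNet-CDI": [
        {"lat": 55.2, "lon": 19.1, "date": date(2019, 6, 1), "var": "PSAL"},
        {"lat": 40.0, "lon": -30.5, "date": date(2018, 2, 3), "var": "PSAL"},
    ],
    "CMEMS-insitu": [
        {"lat": 59.9, "lon": 24.9, "date": date(2019, 7, 12), "var": "CPHL"},
    ],
}

def select(collection, bbox, t0, t1):
    """Step 2: subset one collection by bounding box and time window."""
    lat0, lat1, lon0, lon1 = bbox
    return [r for r in CATALOGUE[collection]
            if lat0 <= r["lat"] <= lat1 and lon0 <= r["lon"] <= lon1
            and t0 <= r["date"] <= t1]

# Step 1 is choosing the collection; step 2 is the subset query:
baltic = select("SeaDataNet-CDI", (53, 66, 9, 30),
                date(2019, 1, 1), date(2019, 12, 31))
print(len(baltic))  # -> 1 record in the Baltic box for 2019
```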
22. Different use cases, different storage (2)
Facilitate and improve access to data (especially in-situ data) for fast, interoperable visualization and subsetting via a web portal: "access few data among many data".
Output: small extracted data subsets and web-based maps and diagrams (representation of time series and vertical profiles).
Prototype: set up the Data Lake by implementing a NoSQL database (e.g. Cassandra). This includes the synchronization procedures from distributed data sources into the data structure adopted within the Data Lake.
23. Different use cases, different storage (3)
Support on-demand processing of large data subsets using DIVA or Pangeo.
Requires high-performance browsing and processing of large amounts of data (e.g. salinity and chlorophyll), preferably in parallel: "access many data among many data".
Output: gridded fields of salinity and chlorophyll.
Data Lake prototype: "Data Cubes" accessed through the Pangeo software component suite, e.g. Zarr format, Xarray, Parquet, Arrow.
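Why a chunked data-cube layout (Zarr-style, as named above) helps "access many data among many data" can be shown with a toy chunk-index calculation: the grid is stored as fixed-size chunks, and a spatial query only needs to read the chunks that overlap it, which also parallelises naturally. Stdlib-only sketch; real implementations would use Zarr/Xarray.

```python
# Chunked-grid access sketch: for a query window on a gridded field,
# compute which storage chunks must be read. Chunk size is illustrative.
CHUNK = 100  # grid cells per chunk side

def chunks_for_query(i0, i1, j0, j1):
    """Chunk indices overlapping the half-open query window [i0,i1) x [j0,j1)."""
    return [(ci, cj)
            for ci in range(i0 // CHUNK, (i1 - 1) // CHUNK + 1)
            for cj in range(j0 // CHUNK, (j1 - 1) // CHUNK + 1)]

# A 1000x1000 grid has 100 chunks; a 150x150 window touches only 4 of them,
# so ~96% of the stored data never has to be read or transferred:
print(len(chunks_for_query(25, 175, 25, 175)))  # -> 4
```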
24. Thank you
Gilbert Maudire (Ifremer), PHIDIAS WP6 Leader
Peter Thijsse (peter@maris.nl) and the PHIDIAS WP6 group
25.
PHIDIAS: Boosting the use of cloud services for marine data management, services and processing
Passport photos for plankton: new era for marine biology research
Jukka Seppälä, Seppo Kaitala, Kaisa Kraft, Otso Velhonoja (SYKE)
Webinar | June 4, 2020, 11:00 AM CEST
26. Phytoplankton abundance is typically estimated using ocean colour, in situ sensors or lab analysis
Phytoplankton contribute 50% of global photosynthesis: CO2 fixation and O2 production.
Due to measurement uncertainties and undersampling, the role of the oceans – and phytoplankton – is one of the key unknowns in the global carbon budget.
We can observe the abundance of phytoplankton using Chlorophyll a as a proxy.
[Figures: long-term average chlorophyll concentration at the ocean's surface in milligrams per cubic meter of water, data provided by the Joint Research Centre (JRC), source EMODnet; seasonal chlorophyll concentration in the Baltic Sea between Helsinki (FI) and Travemünde (DE), measured with the Ferrybox, source Alg@line project, SYKE.]
27. Species/group-specific information is crucial to understand biogeochemical fluxes
- Bulk biomass estimates by Chlorophyll a do not reflect the diversity of phytoplankton.
- Phytoplankton community composition is largely affected by environmental and anthropogenic forcing (light, nutrients, temperature).
- Phytoplankton community composition responds very quickly to the chaotic rhythms of aquatic environments.
- Phytoplankton community composition (and functional types) largely affects aquatic elemental fluxes (carbon and nutrients) and the structure of the food web (up to fish).
Photos of phytoplankton, taken by the Imaging FlowCytobot at Utö station, Gulf of Finland.
28. Why plankton imaging
- Traditional microscopy is slow and costly (though an accurate and important reference method!)
- New technologies based on optics, fluidics and imaging offer rapid, automated, unattended, quantitative, and cost-efficient analysis of individual cells and colonies of plankton organisms.
- The digital raw data gathered can be stored permanently, which allows re-analyses and the creation of open data archives within the international scientific community.
Cyanobacterial bloom in the Baltic 2018, with 3 main species recorded at 20-min intervals (Kraft et al., in prep.).
29. Plankton imaging – state of the art
- Various technologies are available, many in the beta/demonstration phase. Some forerunner technologies (e.g. CytoSense) have well-established user communities and common vocabularies for metadata.
- Machine-learning algorithms are available, but optimisation and development are ongoing.
- Central data storage is not available; there is no agreed way to connect to data aggregators.
- The EcoTaxa web application is a European forerunner for visual exploration and taxonomic annotation of images. Initiated by the Laboratoire d'Océanographie de Villefranche (LOV): https://ecotaxa.obs-vlfr.fr/
30. Imaging technology
Imaging FlowCytobot at SYKE:
- Images of phytoplankton cells (range 10-150 µm)
- Operates remotely on the Utö island flow-through system
- Samples of 5 ml at approx. 20-min intervals
- Camera triggered by chlorophyll-a fluorescence
- Up to 30,000 high-resolution images per hour
- Random Forest algorithm for image recognition, moving towards Convolutional Neural Networks
31. Plankton imaging – PHIDIAS
Demonstration: from image to information
[Data flow: Imaging FlowCytobot (Finnish Environment Institute, Utö) → Finnish Meteorological Institute's server → CSC (Center for Scientific Computing, FI) Allas object storage (data storage and sharing during the project's duration) → data aggregators / other users (EcoTaxa; long-term data storage).
cPouta (cloud computing): development of CNN models; GPU flavor is needed.
Puhti (high-performance computing): CNN in production mode (classification of new images); GPU or CPU flavor; potential real-time usage.]
32. PHIDIAS, at the focal point for multiplatform detection of phytoplankton: EO algorithms – sensor validation – ML, CNN – DIVA
Picture: Lauri Laakso, FMI
33. Thank you, stay tuned, and see you again!
Jukka Seppälä, SYKE
jukka.seppala@ymparisto.fi
Special thanks to the SYKE, FMI, LUT and CSC staff supporting the various steps of plankton imaging!
34.
Analyzing ocean observations in an HPC infrastructure with DIVAnd
Alexander Barth, Charles Troupin, University of Liège
35. The ocean is complex...
- Many ocean processes are present simultaneously
- Non-linear interaction between them
- Wide time/space spectrum of scales
- → High diversity of ocean observations
Image creation: Center for Environmental Visualization, University of Washington
36. … and is complex to observe
- The types of observations are quite diverse
- Ocean observations are sparse (because they are expensive)
- Yet scientifically very valuable (a measurement not taken is lost forever; the state of the climate, and of the ocean in particular, changes)
Image credits: ICTS SOCIB
37. Challenges in ocean data analysis
- Fast access to data; a multitude of formats, with a general trend towards netCDF
- Different programming environments/languages used by scientists:
  - Fortran (still used in numerical models)
  - Matlab (very widespread ~10 years ago, but less used today)
  - Python
  - R
  - but also Julia, C, C++, shell scripts, ...
38. Switching to the Julia language
- At GHER, ULiège: started using Julia in 2017
- Julia version 1.0 was released on 8 August 2018
39. DIVAnd
- DIVA: Data Interpolating Variational Analysis
- Objective: derive a gridded climatology from in situ observations
- The variational inverse method aims to derive a continuous field which is:
  - close to the observations (it should not necessarily pass through all observations, because observations have errors)
  - "smooth"
- Spline interpolation
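A one-dimensional toy version of the variational idea above (a field close to the observations plus a smoothness penalty) can be written down directly. This illustrates the cost function only; it is not the DIVAnd algorithm, and the grid size, weights and solver are chosen purely for the example.

```python
# 1-D toy of the variational analysis idea: minimise
#   J(x) = sum_k (x[k] - obs[k])^2 + lam * sum_i (x[i+1] - x[i])^2
# over a gridded field x, by solving the normal equations A x = b.
def analyse(n, obs, lam=1.0):
    """obs: {grid_index: observed value}; lam: smoothness weight."""
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in range(n - 1):            # smoothness term
        A[i][i] += lam; A[i + 1][i + 1] += lam
        A[i][i + 1] -= lam; A[i + 1][i] -= lam
    for k, y in obs.items():          # data (closeness) term
        A[k][k] += 1.0; b[k] += y
    # Tiny Gaussian elimination with partial pivoting (fine for a toy).
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]; b[c], b[p] = b[p], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for cc in range(c, n):
                A[r][cc] -= f * A[c][cc]
            b[r] -= f * b[c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

field = analyse(5, {0: 1.0, 4: 3.0})
# The field rises smoothly between the two observations; it does not pass
# exactly through them, because the smoothness term pulls it inwards.
```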
40. DIVAnd
- Workshops
- Virtual Research Environment (VRE) in SeaDataCloud
- Jupyter Notebooks
- CI (Continuous Integration) testing (Linux, macOS, Windows)
- Docker and Singularity images with preconfigured software
41. DIVAnd in a virtual research environment
https://vre.seadatanet.org/
42. BlueCloud VRE
The Blue-Cloud VRE will also include DIVAnd
43. Computing resources
- DIVAnd needs to solve a large matrix system
- The solvers:
  - the direct solver (SuiteSparse, CHOLMOD) requires a significant amount of memory but is very fast
  - iterative solvers (preconditioned conjugate gradient) are more memory-efficient but slower
- In practice, the direct solver is preferred as long as the problem fits into the available memory
- But access to computing resources with sufficient memory has been a problem for our users (SeaDataCloud, EMODnet Chemistry)
- Code portability via Singularity containers
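The memory-efficient iterative alternative named above can be sketched with a plain (unpreconditioned) conjugate-gradient solver: it only needs matrix-vector products, so the matrix is never factorised, unlike the direct Cholesky route. Toy dense-matrix version for illustration; real solvers work on large sparse systems.

```python
# Conjugate gradient for a symmetric positive-definite system A x = b.
# Only matrix-vector products are needed, hence the low memory footprint.
def cg(A, b, tol=1e-10, maxiter=1000):
    n = len(b)
    x = [0.0] * n
    r = b[:]                 # residual b - A x, with x = 0 initially
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(maxiter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

# Small symmetric positive-definite example system:
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cg(A, b)  # ~ [1/11, 7/11]
```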
44. DINCAE
- Paper: Data INterpolating Convolutional Auto-Encoder
- Neural network to reconstruct missing data in satellite images (in particular clouds in remotely sensed Sea Surface Temperature)
- Originally written in Python using TensorFlow 1
- Many changes in TensorFlow 2 → better alternatives?
- Rewritten in Julia with the Knet library
- Training time of the network was reduced from 3.5 hours to 1.9 hours (on an NVIDIA 1080 GPU)
- We use "data augmentation" (in particular perturbing the input data and adding extra clouds) with vectorized NumPy code, but it could be made significantly faster using Julia instead.
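The cloud-augmentation step described above can be sketched as random masking of an input grid, so the network learns to reconstruct values it cannot see. Stdlib toy on nested lists; DINCAE itself operates on satellite SST arrays with vectorized code.

```python
# Data-augmentation sketch: mask a fraction of pixels ("extra clouds")
# in a copy of the input field. Toy version; real code is vectorized.
import random

def add_clouds(image, fraction, missing=None, seed=0):
    """Return a copy of `image` with `fraction` of its pixels masked out."""
    rng = random.Random(seed)  # seeded for reproducibility
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    n_mask = int(fraction * h * w)
    cells = [(i, j) for i in range(h) for j in range(w)]
    for i, j in rng.sample(cells, n_mask):
        out[i][j] = missing
    return out

# A 4x4 toy SST field; mask 25% of it:
sst = [[20.0 + 0.1 * j for j in range(4)] for _ in range(4)]
cloudy = add_clouds(sst, fraction=0.25)
print(sum(v is None for row in cloudy for v in row))  # -> 4 masked pixels
```

The original field is left untouched, so the masked copy can serve as the network input while the full field remains the reconstruction target.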
45. Some results with DINCAE
- Sea Surface Temperature (SST) reconstruction with DINCAE
- Some data is withheld during the reconstruction (i.e. additional clouds)
- SST is reconstructed and the expected error standard deviation is computed
DINCAE reconstruction using MODIS sea surface temperature in the Adriatic
46. Conclusions
- The types of available ocean data are quite diverse
- Fortran is still widely used in the oceanographic HPC community
  - but there are significant challenges in supporting users outside a typical HPC environment
  - Julia has been a good fit for us for data analysis
- The original Fortran tool DIVA has been rewritten in Julia (DIVAnd)
- Jupyter notebooks provide users a convenient interface that can also be used in a Virtual Research Environment (especially for data exploration)
- In future: adapt existing tools or adopt new algorithms able to leverage GPUs (or other accelerators)
49. The mission
Blue-Cloud aims to pilot a cyber platform bringing together and providing access to:
1. multidisciplinary data from observations and models
2. analytical tools
3. computing facilities
to support research to better understand and manage the many aspects of ocean sustainability.
4 June 2020 | Boosting the use of cloud services for marine data management, services and processing
50. The Leading Concepts
- Developing and deploying a cloud platform with a Virtual Research Environment (VRE) with an array of services for configuring Virtual Labs for specific analytical workflows, use cases and demonstrators
- Applying common standards and interoperability solutions for providing harmonized data and metadata
- Developing and deploying harmonised discovery of, and access to, a series of established European marine data management and processing infrastructures that deal with major marine and ocean data collections, related data centres, and their data providers
[Diagram: discovery and access to datasets from many sources → upstream services → VRE – cloud platform → downstream services → added-value services and applications; standards: OGC, ISO, W3C & vocabularies]
51. The Technical Framework
- A component to serve federated discovery and access
  - bridging blue data infrastructures and their multi-disciplinary data from observations (in-situ and remote sensing), data products and outputs of numerical models
- A component to serve as the Blue-Cloud Virtual Research Environment (VRE)
  - federating computing platforms and analytical services; this will include Virtual Labs for each of the use-case Demonstrators
52. Blue-Cloud federation of major infrastructures
Blue Data infrastructures | E-infrastructures
53. Blue-Cloud Virtual Research Environment
- Exploits the Blue-Cloud data discovery and access service
- Federates computing platforms and algorithms
- Interacts with external systems
- Exposes all repositories, algorithms, and computing platforms as a common unified space of resources
- Serves diverse communities of researchers
54. Blue-Cloud Framework satisfies Open Science Requirements
- Support collaborative research and experimentation
- Implement Reproducibility-Repeatability-Reusability of Science
- Allow sharing of data, processes and findings
- Grant open access to the produced scientific knowledge
- Tackle Big Data challenges
- Manage heterogeneous data/process access policies
- Sustainability: low operational costs, low maintenance prices
55. Tuning, testing and promoting with five demonstrators
- Zoo- and Phytoplankton EOV products
- Plankton Genomics
- Marine Environmental Indicators
- Fish, a matter of scales
- Aquaculture Monitor
Domains: Biodiversity, Environment, Fishery, Aquaculture, Genomics
56. Function of Demonstrators
Demonstrate how the services developed contribute to unlocking innovation potential:
- to derive requirements and specifications for the pilot Blue-Cloud platform development
- to demonstrate the potential of cloud-based open science in the marine community
- to serve as a catalyst for wider community engagement, identifying longer-term challenges, and planning future developments from pilot to a full-scale Blue-Cloud infrastructure.
Identify the scientific communities' requirements:
- storage (repositories, warehouses, …)
- multidisciplinary data access and harmonisation
- analytical processes
- computing requirements
57. Piloting an EOSC "thematic cloud"
58. Blue-Cloud project
- Funding: H2020 'Future of Seas and Oceans Flagship Initiative' (BG-07-2019-2020), topic [A] 2019 – Blue Cloud services
- Timing: 36 months (start October 2019)
- Budget: 5.9 million euro
- Partnership: 20 partners
59. Any questions?
https://blue-cloud.org
Speaker notes:
- Marine data come from different sources.
- The diversity of data requires good descriptions of them: metadata, catalogues, common vocabularies, ... in a FAIR-principles perspective (introduction to Peter's presentation).
- Some datasets are quite large or include numerous observations (such as plankton images); in addition, having different data collections stored in different locations (satellite ocean colour, plankton, ...) makes it necessary to improve data access for better processing performance (introduction to Jukka's presentation).
- Processing data then requires data-analysis software and powerful IT infrastructures (HPC, HPDA) available to users (introduction to Charles's presentation).
We focus in this presentation on the data access and storage to support processing.