This document summarizes an presentation about automating CI/CD testing, installation, and deployment of Dataverse in the European Open Science Cloud. It discusses using Docker and Kubernetes for deployment, a community-driven QA plan using pyDataverse for test automation, and providing quality assurance as a service. The presentation also covers topics like the CESSDA maturity model, integrating Dataverse on Google Cloud, and using serverless computing for some Dataverse applications and services.
Ontologies, controlled vocabularies and Dataversevty
Presentation on Semantic Web technologies for Dataverse Metadata Working Group running by Institute for Quantitative Social Science (IQSS) of Harvard University.
The presentation for the W3C Semantic Web in Health Care and Life Sciences community group by Slava Tykhonov, DANS-KNAW, the Royal Netherlands Academy of Arts and Sciences (October 2020). The recording is available https://www.youtube.com/watch?v=G9oiyNM_RHc
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
This presentation is about external CVs support in Dataverse, Open Source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as a basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
CLARIN CMDI use case and flexible metadata schemes vty
Presentation for CLARIAH IG Linked Open Data on the latest developments for Dataverse FAIR data repository. Building SEMAF workflow with external controlled vocabularies support and Semantic API. Using the theory of inventive problem solving TRIZ for the further innovation in Linked Data.
Ontologies, controlled vocabularies and Dataversevty
Presentation on Semantic Web technologies for Dataverse Metadata Working Group running by Institute for Quantitative Social Science (IQSS) of Harvard University.
The presentation for the W3C Semantic Web in Health Care and Life Sciences community group by Slava Tykhonov, DANS-KNAW, the Royal Netherlands Academy of Arts and Sciences (October 2020). The recording is available https://www.youtube.com/watch?v=G9oiyNM_RHc
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
This presentation is about external CVs support in Dataverse, Open Source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as a basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
CLARIN CMDI use case and flexible metadata schemes vty
Presentation for CLARIAH IG Linked Open Data on the latest developments for Dataverse FAIR data repository. Building SEMAF workflow with external controlled vocabularies support and Semantic API. Using the theory of inventive problem solving TRIZ for the further innovation in Linked Data.
Controlled vocabularies and ontologies in Dataverse data repositoryvty
External controlled vocabularies support implementation is one of the most asked features by research communities. Slides for the Dataverse Community Meeting 2021 at Harvard University
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
Presentation at ISKO Knowledge Organisation Research Observatory. RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
Building collaborative Machine Learning platform for Dataverse network. Lecture by Slava Tykhonov (DANS-KNAW, the Netherlands), DANS seminar series, 29.03.2022
Presentation for CLARIAH IG Linked Open Data on the latest developments for Dataverse FAIR data repository. Building SEMAF workflow with external controlled vocabularies support and Semantic API.
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...Andrea Scharnhorst
Presentation given at ISKO UK: research observatory, November 24, 2021
RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
Vyacheslav Tykhonov, Jerry de Vries, Eko Indarto, Femmy Admiraal, Mike Priddy, and Andrea Scharnhorst: Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the DANS EASY Research Data Repository
Abstract:
The development of metadata schemes in data repositories (and other content providers) has always been a process of negotiation between the needs of the designated user communities and the content of the collection on the one side and standards developed in the field. Automatisation has both enabled and enforced standardisation and alignment of metadata schemes (see as an example). But, while designated user communities turned from being local users to global ones (due to web services), their specific needs have not vanished. Technology offers possibilities to give the aforementioned negotiation a new form. In this presentation, we present the Dataverse platform, used by many data repositories. We show - using the case of the CMDI metadata and the CLARIN (Common Language Resources and Technology Infrastructure)community - how the Dataverse common core set of metadata called Citation Block can be extended with custom fields defined as a discipline specific metadata block. In particular, we show how these custom fields can be connected to a distributed network of authoritative controlled vocabularies. So, that at the end semantic search is possible. The presentation highlights opportunities and challenges, based on our own experiences. Related work has been presented at the CLARIN Annual Conference 2021 (see Proceedings).
Flexible metadata schemes for research data repositories - Clarin Conference...Vyacheslav Tykhonov
The development of the Common Framework in Dataverse and the CMDI use case. Building AI/ML based workflow for the prediction and linking concepts from external controlled vocabularies to the CMDI metadata values.
Open Source project failure often stems from not setting clear objectives or having a shared vision from the start. That said there are many success stories, including two well known Statistical examples: Demetra; and Eurostat SDMX tools (SDMX-RI). However, in all these examples there was at first a founding organisation/entity that created the right environment for its successful path into a new paradigm. In the context of my presentation this being the Statistical Information System Collaboration Community (SIS-CC / http://siscc.oecd.org).
Presented at the International Marketing and Output DataBase Conference, Gozd Martuljek, September 18 - 22, 2016.
Dataverse repository for research data in the COVID-19 Museumvty
The Covid-19 Museum has an ambition to create a platform to deposit, consult, aggregate and study heterogeneous data about the pandemics using features of a distributed web service. To achieve this purpose, Dataverse has been selected as a reliable FAIR data repository with built-in search engine and functionality that allows adding computing resources to explore archived resources both on data and metadata. Presentation by
Slava Tykhonov, DANS-KNAW (The Royal Netherlands Academy of Arts and Sciences). Université Paris Cité, 19 April 2022.
Controlled vocabularies and ontologies in Dataverse data repositoryvty
External controlled vocabularies support implementation is one of the most asked features by research communities. Slides for the Dataverse Community Meeting 2021 at Harvard University
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
Presentation at ISKO Knowledge Organisation Research Observatory. RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
Building collaborative Machine Learning platform for Dataverse network. Lecture by Slava Tykhonov (DANS-KNAW, the Netherlands), DANS seminar series, 29.03.2022
Presentation for CLARIAH IG Linked Open Data on the latest developments for Dataverse FAIR data repository. Building SEMAF workflow with external controlled vocabularies support and Semantic API.
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...Andrea Scharnhorst
Presentation given at ISKO UK: research observatory, November 24, 2021
RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
Vyacheslav Tykhonov, Jerry de Vries, Eko Indarto, Femmy Admiraal, Mike Priddy, and Andrea Scharnhorst: Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the DANS EASY Research Data Repository
Abstract:
The development of metadata schemes in data repositories (and other content providers) has always been a process of negotiation between the needs of the designated user communities and the content of the collection on the one side and standards developed in the field. Automatisation has both enabled and enforced standardisation and alignment of metadata schemes (see as an example). But, while designated user communities turned from being local users to global ones (due to web services), their specific needs have not vanished. Technology offers possibilities to give the aforementioned negotiation a new form. In this presentation, we present the Dataverse platform, used by many data repositories. We show - using the case of the CMDI metadata and the CLARIN (Common Language Resources and Technology Infrastructure)community - how the Dataverse common core set of metadata called Citation Block can be extended with custom fields defined as a discipline specific metadata block. In particular, we show how these custom fields can be connected to a distributed network of authoritative controlled vocabularies. So, that at the end semantic search is possible. The presentation highlights opportunities and challenges, based on our own experiences. Related work has been presented at the CLARIN Annual Conference 2021 (see Proceedings).
Flexible metadata schemes for research data repositories - Clarin Conference...Vyacheslav Tykhonov
The development of the Common Framework in Dataverse and the CMDI use case. Building AI/ML based workflow for the prediction and linking concepts from external controlled vocabularies to the CMDI metadata values.
Open Source project failure often stems from not setting clear objectives or having a shared vision from the start. That said there are many success stories, including two well known Statistical examples: Demetra; and Eurostat SDMX tools (SDMX-RI). However, in all these examples there was at first a founding organisation/entity that created the right environment for its successful path into a new paradigm. In the context of my presentation this being the Statistical Information System Collaboration Community (SIS-CC / http://siscc.oecd.org).
Presented at the International Marketing and Output DataBase Conference, Gozd Martuljek, September 18 - 22, 2016.
Dataverse repository for research data in the COVID-19 Museumvty
The Covid-19 Museum has an ambition to create a platform to deposit, consult, aggregate and study heterogeneous data about the pandemics using features of a distributed web service. To achieve this purpose, Dataverse has been selected as a reliable FAIR data repository with built-in search engine and functionality that allows adding computing resources to explore archived resources both on data and metadata. Presentation by
Slava Tykhonov, DANS-KNAW (The Royal Netherlands Academy of Arts and Sciences). Université Paris Cité, 19 April 2022.
Docker Bday #5, SF Edition: Introduction to DockerDocker, Inc.
In celebration of Docker's 5th birthday in March, user groups all around the world hosted birthday events with an introduction to Docker presentation and hands-on-labs. We invited Docker users to recognize where they were on their Docker journey and the goal was to help them take the next step of their journey with the help of mentors. This presentation was done at the beginning of the events (this one is from the San Francisco event in HQ) and gives a run down of the birthday event series, Docker's momentum, a basic explanation of containers, the benefits of using the Docker platform, Docker + Kubernetes and more.
Docker Enterprise Edition Overview by Steven Thwaites, Technical Solutions En...Ashnikbiz
This was presented by Steven Thwaites, Technical Solutions Engineer at Docker at Cloud Expo Asia. Docker is the only Containers-as-a-Service platform for IT that manages and secures diverse applications across disparate infrastructure, both on-premises and in the cloud. It covers topics like:
VMs vs Containers
The Docker Ecosystem
How to Build and Ship your Docker Image
Unique Advantages with Docker EE and more
Presentation of OCCIware, a standard, extensible Cloud consumer platform at P...OCCIware
OCCIware - standard, extensible Cloud consumer platform : an end-to-end demo (IoT, Linked Data, Spark, Docker)
Who uses multi cloud today ? Everybody. Alas, this leads to a lot of "technical glue". Enter OCCIware's Studio and Runtime : manage all layers and domains of the Cloud (XaaS) in a uniform, standard, extensible way - the Cloud consumer platform.
This presentation first introduces the OCCIware platform - the result of 3 years of R&D by French Open Source companies and labs led byb Smile and Inria. It then shows a live demonstration of how its component helps an IoT, Linked & Big Data, containerized Cloud solution to let electricity consumption be monitored across territories by all actors - individuals, utility providers, up to regional public bodies.
The presentation includes demos of OCCIware's visual Docker & Linked Data Studios, OCCInterface web playground.
OCCIware @ Paris Open Source Summit 2017 - a standard, extensible Cloud consu...Marc Dutoo
Who uses multi cloud today ? Everybody. Alas, this leads to a lot of "technical glue". Enter OCCIware's Studio and Runtime : manage all layers and domains of the Cloud (XaaS) in a uniform, standard, extensible way - the Cloud consumer platform.
This presentation first introduces the OCCIware platform - the result of 3 years of R&D by French Open Source companies and labs led byb Smile and Inria. It then shows a live demonstration of how its component helps an IoT, Linked & Big Data, containerized Cloud solution to let electricity consumption be monitored across territories by all actors - individuals, utility providers, up to regional public bodies.
Keywords : nodeMCU/ESP8266, JSON-LD, Spark, react.js, Docker, and obviously Open Cloud Computing Interface (OCCI).
With demos of OCCIware's visual Docker & Linked Data Studios, OCCInterface web playground.
OCCIware @ Cloud Computing World 2016 - year 1 milestone & Linked Data demoMarc Dutoo
OCCIware @ Cloud Computing World 2016 - introduction to OCCI and OCCIware, year 1 main outputs (Docker Studio & Studio Factory, erocci OCCI bus) & Linked Data as a Service (LDaaS)-themed end-to-end demo
This 2nd version of the last year workshop will shed light on a modern solution to solve application portability, building, delivery, packaging, and system dependency issues. Containers especially Docker have seen accelerated adoption in the web, cloud and recently the enterprise. HPC environments are seeing something similar to the introduction of HPC containers Singularity and Shifter. They provide a good use case for solving software portability, not to mention ensure repeatability of results. Not to mention their ECO system provides for the better development, delivery, testing workflows that were alien to most of HPC environments. This workshop will cover the Theory and hands-on of containers and Its ecosystem. Introducing Docker and singularity containers; Docker as a general-purpose container for almost any app, Singularity as the particular container technology for HPC. The workshop will go over the foundations of the containers platform, including an overview of the platform system components: images, containers, repositories, clustering, and orchestration. The strategy is to demonstrate through "live demo, and hands-on exercises." The reuse case of containers in building a portable distributed application cluster running a variety of workloads including HPC workload.
Similar to Automated CI/CD testing, installation and deployment of Dataverse infrastructure on a Cloud (20)
Decentralised identifiers and knowledge graphs vty
Building an Operating System for Open Science: data integration challenges, Dataverse data repository and knowledge graphs. Lecture by Slava Tykhonov, DANS-KNAW, for the Journées Scientifiques de Rochebrune 2023 (JSR'23).
Decentralised identifiers for CLARIAH infrastructure vty
Slides of the presentation for CLARIAH community on the ideas how to make controlled vocabularies sustainable and FAIR (Findable, Accessible, Interoperable, Reusable) with the help of Decentralized Identifiers (DIDs).
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
The development of the Common Framework in Dataverse and the CMDI use case. Building AI/ML based workflow for the prediction and linking concepts from external controlled vocabularies to the CMDI metadata values.
hematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...Wasswaderrick3
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/ velocity and then from this we derive the Pouiselle flow equation, the transition flow equation and the turbulent flow equation. In the situations where there are no viscous effects , the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes equation of terminal velocity and turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for the treatment of food to preserve them and irradiation treatment of food is one of them. It is the most common and the most harmless method for the food preservation as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to the human health but still the quality assessment of food is required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The antioxidant capability of liquid food and beverages in mainly performed by spin trapping technique.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxMAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation makes them the most convenient, least labor-intensive, live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poorquality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larva. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing the study of interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflect spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light received by the analyte.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
What is greenhouse gasses and how many gasses are there to affect the Earth.moosaasad1975
What are greenhouse gasses how they affect the earth and its environment what is the future of the environment and earth how the weather and the climate effects.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Toxic effects of heavy metals : Lead and Arsenicsanjana502982
Heavy metals are naturally occuring metallic chemical elements that have relatively high density, and are toxic at even low concentrations. All toxic metals are termed as heavy metals irrespective of their atomic mass and density, eg. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Nucleic Acid-its structural and functional complexity.
Automated CI/CD testing, installation and deployment of Dataverse infrastructure on a Cloud
1. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
SSHOC
Dataverse in the
European Open Science Cloud
Automated CI/CD Testing, Installation and Deployment
Slava Tykhonov (DANS-KNAW)
lead software engineer, DANS R&D
Dataverse Community Meeting 2021
14 June 2021
Harvard University
2. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Overview
● SSHOC in the European Open Science Cloud (EOSC)
● CESSDA Maturity Model
● Docker
● Kubernetes
● Serverless
● Community-driven QA plan
● CI/CD and test automation with pyDataverse
● Quality Assurance as a Service, EOSC Synergy
● Q&A
3. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Duration: 40 months
(January 2019 – 30 April 2022)
Partners: 47
(20 beneficiaries + 27 LTPs)
SSH ESFRI Landmarks and Projects
& international SSH data infrastructures
Project budget:
€ 14,455,594.08
Type of action & funding:
Research and Innovation action
(INFRAEOSC-04-2018)
Project website:
www.SSHOpenCloud.eu
Objectives:
• creating the social sciences and humanities (SSH) part of European Open Science Cloud (EOSC)
• maximising re-use through Open Science and FAIR principles (standards, common catalogue, access control, semantic techniques, training)
• interconnecting existing and new infrastructures (clustered cloud infrastructure)
• establishing appropriate governance model for SSH-EOSC
Project:
4. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
EOSC Synergy
5. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
DANS Data Stations - Future Data Services
6. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Interoperability in EOSC
● Technical interoperability defined as the “ability of different information technology
systems and software applications to communicate and exchange data”. It should allow
“to accept data from each other and perform a given task in an appropriate and
satisfactory manner without the need for extra operator intervention”.
● Semantic interoperability is “the ability of computer systems to transmit data with
unambiguous, shared meaning. Semantic interoperability is a requirement to enable
machine computable logic, inferencing, knowledge discovery, and data”.
● Organisational interoperability refers to the “way in which organisations align their
business processes, responsibilities and expectations to achieve commonly agreed and
mutually beneficial goals. Focus on the requirements of the user community by making
services available, easily identifiable, accessible and user-focused”.
● Legal interoperability covers “the broader environment of laws, policies, procedures
and cooperation agreements”
Source: EOSC Interoperability Framework v1.0
7. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
SSHOC task 5.2 Hosting and sharing data repositories
Objective
Development of a research data repository service on EOSC, for
SSH institutions currently without such a facility for their
designated communities
Deliverables
After 38 months: Data repository service running on EOSC
After 40 months: Report on principles of governance and
sustainability of the data repository service
8. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Services in European Open Science Cloud (EOSC)
● EOSC requires the level 8 of maturity (at
least)
● we need the highest quality of software to
be accepted as a service
● clear and transparent evaluation of
services is essential
● the evidence of technical maturity is the
key to success
● the limited warranty will allow to stop out-
of-warranty services
9. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Dataverse deployment in CESSDA Cloud
Docker Compose for the local development and testing
Kubernetes (K8s) for the Cloud deployment of services
(CI/CD pipeline with Jenkins and Helm)
10. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Dataverse Docker module (CESSDA Dataverse, 2018)
Source: https://github.com/IQSS/dataverse-docker
11. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Docker introduction
• Extremely powerful configuration tool
• Allows to install software on any platform (Linux, Mac, Windows)
• Any software can be installed from Docker as standalone container or container
delivering Microservices (database, search engine, core service)
• Docker allows to host unlimited amount of the same software tools on different
ports
• Docker can be used to organise multilingual interfaces, for example
• Deployments with orchestrator tools: Docker Swarm, Kubernetes
• Faster development and deployments
• Isolation of running containers allows to scale up apps
• Portability saves time to run the same image on the local computer or in the cloud
• Snapshotting allows to archive Docker images state
• Resource limitation can be adjusted
12. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Dataverse Kubernetes
Project maintained by Oliver Bertuch (FZ Julich) and available in Global Dataverse
Community Consortium github (GDCC)
Google Cloud, Amazon AWS, Microsoft Azure platforms supported
Open Source, community pull requests are welcome
http://github.com/IQSS/dataverse-kubernetes
13. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
How to deploy Dataverse on Google Cloud Platform
Resources needed to deploy Dataverse:
» GCP VM-instance(s)
» Kubernetes (K8S) cluster
» Docker images
» K8S Services
» K8S Deployments (workloads)
» Load Balancer or Ingress (SSL certificates)
» Mail relay (default port 25 is blocked)
» Persistent volumes (storage) for the Solr, Postgres, Dataverse and SSL services
14. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Integration with Google Cloud
Google Cloud Platform
» GCloud
a command-line (CLI) tool for running commands against Google Cloud Platform.
GCloud is part of the Google Cloud SDK
Kubernetes cluster
» Kubectl
A command line interface (CLI) for running commands against Kubernetes clusters
» Use of .yaml configuration files to describe the desired resources
Note: Kubernetes software is used by all major Cloud providers, the different is the storage
specification!
15. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Dataverse Cloud Architecture
Ingress
HTTP(S) Load Balancer
Kubernetes Engine
Dataverse Service
Kubernetes Cluster
K8S Cluster Node
Dataverse Deployment Dataverse Service
Solr Deployment Solr Service
PostgreSQL
Service
PostgreSQL Deployment
Users
16. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
How to scale up Kubernetes horizontally
Kubernetes Engine
Compute Engine
Dataverse Service
Kubernetes Cluster
K8S Cluster Node1
Users
K8S Cluster Node2
Docker Hub
Container Registry
17. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
The importance of Persistent Storage
Docker containers write files to disk (I/O) for state or storage, both in /data and
/docroot folders. If a Docker container is restarted for some reason, all data will be
lost.
Solution: mount Persistent storage into the container on external disk hosted in the
Cloud.
18. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
SQA process with Selenium tests for Dataverse
Selenium IDE allows
to create and replay
all UI tests in your
browser
Shared tests can be
reused by community
to increase
reproducibility
SQA for the service maturity = unit tests + integration tests
http://github.com/GlobalDataverseCommunityConsortium/dataverse-test-suite
18
19. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
CI/CD pipeline with SQAaaS (S)
1
2
3
git
push
Push GCP
container
registry
webhook
Create
docker
image
Kubernetes
Deployment
git clone
Jenkins pipeline (Jenkinsfile)
9
7
Run SQA
S 8
1. Developer pushes code to GitHub
2. Jenkins receives notification - build trigger
3. Jenkins clones the workspace
4. (S) Runs SQA tests and does FAIRness check
5. (S) Issuing digital badge according to the results
6. (S) SQAaaS API triggers appropriate workflow
7. Creates docker image if success
8. Pushes new docker image to container registry
9. Updates the kubernetes deployment
19
Source: EOSC Synergy project
20. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Dataverse App Store
Data preview: DDI Explorer, Spreadsheet/CSV, PDF, Text files, HTML, Images, video render, audio,
JSON, GeoJSON/Shapefiles/Map, XML
Interoperability: external controlled vocabularies (SKOSMOS, CESSDA CV Manager)
Data processing: NESSTAR DDI migration tool
Linked Data: RDF compliance including SPARQL endpoint (FDP)
Multilingual support: Weblate as a Service http://translation.cessda.eu
Federated login: eduGAIN, EGI Check-in, PIONIER ID
CLARIN Switchboard integration: Natural Language Processing tools
Visualization tools (maps, charts, timelines), Apache Superset integration
21. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
GUI translation with Weblate as a service
Source: SSHOC Weblate
22. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Dataverse and CLARIN tools integration
23. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Dataverse Apps in Serverless - Function as a Service
Infrastructure maintenance = human resources and money.
Are you sure you need a server to provide all of your services?
The Serverless application framework allows to upload your code to the Cloud
space and it will provision and run it without any extra efforts! Scaling is done
automatically based on the workfload.
Costs - only pay for the time when your application or computing process is running.
AWS Lambda can be used to spin up application based on triggers/events.
24. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Serverless support by leading Cloud Providers
Function as a Service (FaaS) model:
Main costs:
1. Number of requests: typically billed for every million requests per month
2. Compute time: this is a measurement of the duration of the life of the function and the
amount of memory provisioned, measured in GB-seconds (higher-memory execution will
cost more than lower-memory execution of the same duration)
Source: Serverless Showdown
25. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Serverless Use Cases for Dataverse
Remember: inactive Serverless application will require some additional time to
respond to a request. This initial delay encountered is known as a “cold start”.
● Can we run Dataverse instance as Serverless app?
Yes, but just for demo, Kubernetes setup is for Dataverse as a service
● Data Previewers in Serverless?
Ideal case, cheap and efficient!
● Data Computing tools?
Yes, will fit for services running on demand
● GIS tools integration with Dataverse?
Function as a Service (FaaS) is the best
26. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Practical example - Caching function for CVV
Linked Data Serverless Resolver implemented as Lambda Function and deployed
on Harvard AWS cloud
https://github.com/Dans-labs/ld-serverless-resolver
27. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Microsoft Visual Studio Code
Features:
● source-code editor made by Microsoft
for Windows, Linux and macOS
● The Integrated CLI (Command Line
Interface)
● support for debugging, syntax
highlighting, intelligent code
completion, snippets, code refactoring,
and embedded Git
● cloud-powered development
environments for any activity
Try it here: https://code.visualstudio.com
28. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Maturity of Dataverse core and apps in EOSC
Testing process follows the CESSDA maturity model
https://zenodo.org/record/2591055#.XKR6ny2B2u5
Important: every change of Dataverse functionality should be supplied with
unit tests, changes of external functionality should get Selenium scenarios.
Goal: automate all tests, workflows and integrations, follow the Software
Quality as a Service (SQaaS) framework and score as high as possible
according to the CESSDA maturity model.
29. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Our speakers
Stefan Kasberger is DevOp and Python Developer at AUSSDA (Austria) and involved
in the Dataverse community during last 3 years. He contributes Open Source tools in
Python to support Dataverse. He has done several data migrations into Dataverse.
Lately, he started to focus on creating standardized tests and test data for Dataverse
installations, to support automation/cloud/devops activities in the international
community. Stefan is creator of pyDataverse module.
Samuel Bernardo is Software Engineer at LIP in Portugal. Graduated in Lisbon
University with a degree in Computer Science and Engineering and finishing the
master degree dissertation. Worked on several projects around distributed systems,
data science and computing, Currently working in BigHPC and SQAaaS platforms
development in the EOSC Synergy and other European projects, such as Opencoasts
and WORSICA. Also a open source contributor for several years supporting the
community and free software.
30. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
Questions?
vyacheslav.tykhonov@dans.knaw.nl
https://www.sshopencloud.eu
info@sshopencloud.eu
@SSHOpenCloud
/in/sshopencloud
Join our community