Dataverse can be deployed using Docker containers to improve maintainability and portability. The document discusses how Docker can isolate applications and their dependencies into portable containers. It provides an example of deploying Dataverse as a set of microservices within Docker containers. Instructions are included on building Docker images, running containers, and managing the containers and images through commands and tools like Docker Desktop, Docker Hub, and Docker Compose.
The world of Docker and Kubernetes
1. dans.knaw.nl
DANS is an institute of KNAW and NWO
The world of Docker and Kubernetes
How to create, set up and manage
Kubernetes cluster at DANS: Dataverse pilot
Slava Tykhonov, Senior Information Scientist
Wilko Steinhoff, Senior Software Developer
(DANS-KNAW, The Hague, Netherlands)
11.02.2020
2. Why do we need Cloud Computing?
“Cloud computing is a style of computing in which scalable and
elastic IT is delivered as a service using Internet technologies.”
“Cloud Computing is transforming the way organisations
consume computer services.”
“We can run all our workload data of applications and
processes online over the internet remotely instead of using
physical hardware and software.”
“It’s less expensive and more secure.”
Dataverse is our Pilot Cloud Service
3. Dataverse as a FOSS product: good news
• Dataverse is Open Source software
• Great community with more than 100 contributors
• Contributions are coming from all continents
• Maintenance costs drop because all community members use the
same software and help each other
• Governance models can be reused by different countries
• Innovation in Dataverse community goes very fast
4. Dataverse as a FOSS product: bad news
• Open Source doesn’t mean Free!
• Consider all required resources: both hardware and human
• Building a service is difficult, maintenance is expensive
• Integration with other services requires the management of
changes, and is sometimes not even possible
• Technical development is fast, but expertise often isn’t up to date
• Requires continuous training and very good communication
between all partners
6. Installation problems
Dataverse’s basic infrastructure seems very simple:
- application (Java, deployed on the Glassfish web server)
- database (PostgreSQL)
- search engine (Solr)
If you follow the guide and do the installation manually…
there is a good chance that it will not work.
Why?!
7. You never know where the problem lies...
● OS specific issues
● application specific bugs
● the difference between the
database version(s)
● search engine update(s)
● security patches
● hardware issues
● open/closed ports on your server
It’s even more complicated if you need
to patch the software and update a
working infrastructure every time…
locally, on test/acceptance/production.
8. Typical infrastructure issues
And after it finally works the security
guy is telling you that all microservices
ports on all servers should be closed…
or there is an update of software
pieces that can break the service
or a brand-new Chinese bot is taking
your service down
or something else is happening...
Do you remember? You have to reproduce and fix it
locally, on test/acceptance/production?
10. Maintenance vs development
Typical outcome: hundreds or thousands of hours lost, $$$,
and maintenance efforts dominating over development.
15. Dataverse Unleashed
Dataverse isn’t competing against Figshare, Zenodo,
DSpace, CKAN, EASY or others…
Dataverse is a platform to build new innovative things
together, and to integrate all the other services.
Using Dataverse means you can join the Sharing
Economy in data and speed up your own innovation based
on the community’s developments.
16. Shared economy in the data landscape
● all partners are running the same basic data infrastructure
● source code is Open Source and shared
● community is making decisions about priorities
● new custom requirements can be implemented
independently by anyone and merged with master
(upstream)
● sustainability of software: unmaintained components are
usually replaced with well-maintained ones as the product
evolves
● two or more technical solutions to the same problem are
more than welcome
● the maturity of the community reflects the maturity of the software
Do you want to join? Use Docker for your software!
17. Sometimes innovation means less communication
“Docker offered a way to create independence between the
application and the infrastructure through a standardized
container format that could be created with easy-to-use
tooling.”
David Messina, CMO at Docker
And now honestly ask yourself: how much time do you spend talking to
sysadmins, convincing them to enable or install the tools you need?
Talking to another developer working on the same code?
Reproducing the same bug on test/acceptance/production?
18. Docker features
• Extremely powerful configuration tool
• Lets you install software on any platform (Linux, Mac,
Windows)
• Any software can run as a standalone container or as a set
of containers delivering microservices (database, search
engine, core service)
• Lets you host any number of instances of the same software
on different ports
• Can be used, for example, to serve multilingual interfaces
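The claim that Docker can host many instances of the same software on different ports can be sketched with two containers started from one image (a minimal illustration, assuming a local Docker daemon and the public nginx image; the names web1/web2 and host ports are arbitrary):

```shell
# Two containers from the same image, mapped to different host ports
# (assumes a running Docker daemon and access to Docker Hub)
docker run -d --name web1 -p 8081:80 nginx
docker run -d --name web2 -p 8082:80 nginx

# Each instance answers on its own host port
curl -sf http://localhost:8081/ > /dev/null && echo "web1 is up"
curl -sf http://localhost:8082/ > /dev/null && echo "web2 is up"

# Clean up both containers
docker rm -f web1 web2
```

The same pattern applies to running several Dataverse or Solr instances side by side for testing.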
19. Docker advantages
• Faster development and deployments
• Isolation of running containers makes apps easy to scale up
• Portability: the same image runs on a local computer or in
the cloud
• Snapshotting lets you archive the state of Docker images
• Resource limits can be adjusted
20. Dataverse Docker module
This module was developed in the one-year CESSDA DataverseEU
project, led by DANS and aimed at CESSDA Service Providers who
have limited technical resources.
The goal was to deploy the Dataverse software on the CESSDA
Technical Infrastructure (Google Cloud). The project was funded
under the CESSDA 2018 work plan.
DataverseEU partners: ADP (Slovenia), AUSSDA (Austria),
GESIS (Germany), SND (Sweden), TARKI (Hungary),
Sciences Po (France), UKDA (UK), UniData (Italy), SODA
(Belgium), LSZDA (Latvia), DANS (Netherlands)
21. Docker deployment with k8s in Clouds
• Google Cloud (policy for CESSDA SaW)
• Microsoft Azure
• Amazon Cloud
• OpenShift Cloud
• local Docker installation (minikube)
24. Docker Desktop (Community Edition)
Ideal for developers and small teams looking to get started
with Docker https://www.docker.com/community-edition
Features:
- docker-for-desktop
- docker-compose support
- integrated Kubernetes (minikube)
- Kitematic: visual Docker container management
26. Docker concepts
• Containers are runnable instances of images
• Images are filesystem snapshots from which containers are
created
• Running containers can be committed back to images and
executed in different clouds
• Images can be preserved in repositories, e.g.
https://act.dataverse.nl/dataset.xhtml?persistentId=hdl:10695/9VCRBR
• Data folders can be hosted outside of containers on
persistent volumes.
27. Hello world app (Flask application)
Dockerfile: https://github.com/DANS-KNAW/parthenos-widget/blob/master/Dockerfile
# Base image with Python 2.7 (as used by the original project)
FROM python:2.7
MAINTAINER Vyacheslav Tykhonov
# Copy the application code into the image
COPY . /widget
WORKDIR /widget
# Install Python dependencies
RUN pip install -r requirements.txt
# Run the Flask app by default
ENTRYPOINT ["python"]
CMD ["app.py"]
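The Dockerfile above can be turned into a running service with two commands (a sketch, assuming a local Docker daemon; the image name `parthenos-widget` is illustrative, and port 8081 follows the run example later in this deck):

```shell
# Build the image from the Dockerfile in the current directory
docker build -t parthenos-widget .

# Start the Flask app, mapping container port 8081 to the host
docker run -d --name widget -p 8081:8081 parthenos-widget

# Verify the container is running, then clean up
docker ps | grep widget
docker rm -f widget
```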
28. Docker command line usage
The command line lets you manage containers and images and
execute Docker commands
$ docker help run
$ docker ps
$ docker login
$ docker pull, push, commit
$ docker build, run
$ docker exec
$ docker stop, rm, rmi
29. Typical Docker pipeline
Install all dependencies and build the tool from scratch:
$ docker build -t parthenos:latest .
Run the image from the command line
$ docker run -p 8081:8081 --name parthenos parthenos
Check if the container is running
$ docker ps | grep parthenos
Log in to a shell inside the container
$ docker exec -it [CONTAINER_ID] /bin/bash
Copy configuration into the container
$ docker cp ./parthenos.config [CONTAINER_ID]:/widget
Copy from the container to a local folder
$ docker cp [CONTAINER_ID]:/widget/. ./
Ship the “dockerized” app to the world (Docker Hub or another registry)
$ docker push [IMAGE_NAME]
31. Docker archiving process
Easy process to archive running software, metadata and data
separately
https://docs.docker.com/engine/reference/commandline/save/
• postgresql database with metadata and users information
• datasets files in separate folder
• software image with some individual settings
$ docker save -o archive.tar [IMAGE_ID]
Easy to restore the complete system with data and metadata
using Docker Compose.
$ docker load -i archive.tar
32. Docker Compose
A management tool for multi-container Docker configurations
All connections, networks, containers and port specifications are stored
in one file (YML specification)
Example (DataverseEU):
http://github.com/IQSS/dataverse-docker
Tool to turn Docker Compose to Kubernetes config called Kompose:
https://github.com/kubernetes/kompose
Usage:
$ docker-compose [something]
Docker Compose is perfect tool to keep the PROVenance of software
(versions control, etc)
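A minimal sketch of such a Compose file for a Dataverse-style stack; the image names, ports and volume layout here are illustrative assumptions, not the actual DataverseEU configuration (see the repository above for the real file):

```yaml
version: "3"
services:
  postgres:
    image: postgres:9.6              # database with metadata and users
    environment:
      POSTGRES_PASSWORD: secret      # placeholder credential
    volumes:
      - db-data:/var/lib/postgresql/data   # persistent volume for the DB
  solr:
    image: solr:7                    # search index
  dataverse:
    image: iqss/dataverse            # hypothetical application image name
    ports:
      - "8085:8080"                  # host:container
    depends_on:
      - postgres
      - solr
volumes:
  db-data:
```

One `docker-compose up -d` then brings up all three containers with their network and volumes, which is exactly the provenance value: the whole topology lives in a single versioned file.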
33. Dataverse Docker containers exploration
# Show Docker images
docker images
# Show all running containers
docker ps
# Remove a Docker image by image_id (don’t execute)
docker rmi image_id
# Delete old images (don’t execute)
docker rmi `docker images -aq`
# To access Dataverse container, type exit to quit
docker exec -it dataverse /bin/bash
# PostgreSQL container, exit to quit
docker exec -it postgres /bin/bash
# Solr container, exit to quit
docker exec -it solr /bin/bash
# Copy files and folders to the running container
docker cp ./testfile dataverse:/tmp/
# Copy files and folders from the running container to your disk space
docker cp dataverse:/opt/dv/dvinstall.zip /tmp/
# Stop Dataverse container
docker stop dataverse
# Run Dataverse container
docker start dataverse
34. Dataverse maintenance with Docker
# Open the page with the latest Dataverse release: https://github.com/IQSS/dataverse/releases
# Follow the upgrade instructions covering the war and zip files, optionally the .tsv or .xml schema
docker exec -it dataverse /bin/bash
wget https://github.com/IQSS/dataverse/releases/download/v4.18.1/dataverse-4.18.1.war -O dataverse.war
asadmin undeploy dataverse
rm -rf glassfish4/glassfish/domains/domain1/generated
asadmin deploy ./dataverse.war
asadmin restart-domain
# After Glassfish restarts, go to 0.0.0.0:8085 and check the version of Dataverse
# Remember: you’ll lose all changes in your Docker container after restart!
35. Maintenance of Docker infrastructure
# Go to hub.docker.com and create an account.
# Login with your credentials, remember your_docker_name
docker login
# Let’s create an image out of the running Dataverse container
docker commit dataverse
# New image will be available on top
docker images
# Let’s put a tag on image and update internal Docker registry, replace your_docker_name
docker tag new_dataverse_image_id [your_docker_name]/dataverse:4.18.1
# Push new image to Docker Hub
docker push [your_docker_name]/dataverse:4.18.1
# Go to Docker Hub to check if the repo was updated:
https://hub.docker.com/r/[your_docker_name]/dataverse
# Visit https://docs.docker.com/docker-hub/repos/#pushing-a-docker-container-image-to-docker-hub if you need more information about updating Docker images
36. dans.knaw.nl
DANS is an institute of KNAW and NWO
How to set up, configure and manage Kubernetes clusters managed by DANS, with emphasis on architecture, ICT support and DevOps
POC: Azure management
37. Azure
Best practices in using and managing the DANS Azure subscription.
Azure: cloud computing platform by Microsoft.
Azure@DANS is provided by SURFcumulus.
Cloud resources, like:
⮚ Virtual Machine (VM)
⮚ Storage (disk)
⮚ SQL database
⮚ Kubernetes (AKS)
38. Kubernetes
Open-source container-orchestration system for automating application deployment, scaling and management.
- Docker container orchestration
- Infrastructure as Code
- Health checks, restarting applications
- (Auto)scaling clusters (horizontally and vertically)
- Controlled use of resources (CPU, memory)
- Setup of the application stack for local development
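A minimal sketch of a Kubernetes Deployment illustrating the health-check and resource-control points above; the image name, port and probe path are hypothetical, not the actual DANS manifests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dataverse
spec:
  replicas: 2                        # horizontal scaling
  selector:
    matchLabels:
      app: dataverse
  template:
    metadata:
      labels:
        app: dataverse
    spec:
      containers:
      - name: dataverse
        image: iqss/dataverse        # hypothetical image name
        ports:
        - containerPort: 8080
        resources:                   # controlled use of CPU and memory
          requests: { cpu: "500m", memory: "1Gi" }
          limits:   { cpu: "1",    memory: "2Gi" }
        livenessProbe:               # health check; failing pods are restarted
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 60
```

Because the whole stack is declared in files like this, the same manifests double as Infrastructure as Code for a local development setup.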
39. Best K8S practices
In this project we’ll look into some best K8S
practices for DANS.
Based on issues raised in earlier POCs:
- Docker@DANS (2018)
- HUC2 POC (2019)
40. Cluster architecture
Application-wide or organisation-wide?
DTAP: Development, Testing, Acceptance and Production.
- How to separate different applications on a cluster.
- Can we separate responsibilities between ICT-Support and
developers?
ICT support supplies Persistent Storage classes that can be claimed by developers.
Use of Role Based Access Control (RBAC).
- Tooling used to develop and deploy to a cluster?
Skaffold (build automation/deployment) and Helm (package manager)
41. - Use Infrastructure as Code (IaC) to provision and manage "Azure" cloud infrastructure.
Bash scripts or Terraform.
- How to use "external" resources in a cluster.
SURF object storage (SWIFT), VANCIS
- Cluster cost management.
Downscaling a (development) cluster. Resource caps.
- Provide cluster-wide services.
Sending email, auto-SSL certification, monitoring (Prometheus), pipelining, etc.
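For the Terraform route mentioned above, a minimal sketch of provisioning an AKS cluster; all names, sizes and counts are hypothetical placeholders, not the DANS configuration:

```hcl
resource "azurerm_resource_group" "dans" {
  name     = "dans-k8s-rg"           # hypothetical resource group name
  location = "westeurope"
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "dans-aks"   # hypothetical cluster name
  location            = azurerm_resource_group.dans.location
  resource_group_name = azurerm_resource_group.dans.name
  dns_prefix          = "dansaks"

  default_node_pool {
    name       = "default"
    node_count = 2                   # keep small for a development cluster (cost management)
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}
```

Keeping the cluster definition in versioned HCL like this makes downscaling or recreating a development cluster a one-line change plus `terraform apply`.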
42. Dataverse Cloud architecture
[Diagram: users reach the Dataverse Service through an HTTP(S) Load Balancer (Ingress) in front of the Kubernetes Engine; inside the Kubernetes cluster, a cluster node runs the Dataverse, Solr and PostgreSQL Deployments, each exposed as a Service.]
44. How to scale up Kubernetes horizontally
[Diagram: users reach the Dataverse Service on a Kubernetes cluster whose nodes (Node1, Node2) are provisioned by the Compute Engine; container images are pulled from Docker Hub / a Container Registry.]
45. The importance of Persistent Storage
Docker containers write files to disk (I/O) for state or storage, both in the /data and /docroot folders. If a Docker container is restarted for some reason, all data will be lost.
Solution: mount persistent storage into the container from an external disk hosted in the cloud.
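In Kubernetes terms this is a PersistentVolumeClaim backed by a cloud disk, mounted into the container. A minimal sketch, with a hypothetical storage class and size:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataverse-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: managed-premium  # hypothetical cloud disk storage class
  resources:
    requests:
      storage: 50Gi
---
# In the pod spec of the Dataverse Deployment (sketch):
#   volumes:
#   - name: data
#     persistentVolumeClaim:
#       claimName: dataverse-data
#   containers:
#   - name: dataverse
#     volumeMounts:
#     - name: data
#       mountPath: /data             # data here survives container restarts
```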
46. Running Dataverse in production
[Diagram: users reach the Dataverse Service through an HTTP(S) Load Balancer; a Kubernetes cluster node runs the Dataverse, Solr and PostgreSQL Deployments with their Services, plus a Certbot Cronjob/Service for SSL certificates and an Email Relay Deployment/Service; images are pulled from a Container Registry.]
47. Continuous deployment pipeline
[Diagram: Jenkins pipeline (Jenkinsfile) connecting git push, webhook, git clone, test run, Docker image build, push to the GCP container registry and the Kubernetes Deployment.]
1. Developer pushes code to Bitbucket
2. Jenkins receives notification - build trigger
3. Jenkins clones the workspace
4. Runs tests
5. Creates docker image
6. Pushes the docker image to GCP
container registry
7. Updates the kubernetes deployment
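Steps 3–7 of this pipeline could be expressed as a declarative Jenkinsfile along these lines; the registry path, project name and test command are hypothetical, not the actual pipeline:

```groovy
pipeline {
  agent any
  stages {
    stage('Checkout') {              // 3. Jenkins clones the workspace
      steps { checkout scm }
    }
    stage('Test') {                  // 4. run tests
      steps { sh 'python -m pytest' }
    }
    stage('Build image') {           // 5. create the Docker image
      steps { sh 'docker build -t eu.gcr.io/my-project/dataverse:$BUILD_NUMBER .' }
    }
    stage('Push image') {            // 6. push to the GCP container registry
      steps { sh 'docker push eu.gcr.io/my-project/dataverse:$BUILD_NUMBER' }
    }
    stage('Deploy') {                // 7. update the Kubernetes deployment
      steps { sh 'kubectl set image deployment/dataverse dataverse=eu.gcr.io/my-project/dataverse:$BUILD_NUMBER' }
    }
  }
}
```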
48. Distributed Dataverse infra on Kubernetes
● Network of Dataverses with central portal to host metadata and
multiple Dataverse nodes
● Testing strategies with Selenium and Cypress
● Unit tests, integration tests and Jenkins CI/CD pipeline
● Running external applications on Kubernetes infrastructure,
OpenAIRE Amnesia tool
● Multiple languages support and maintenance, Weblate as a
service
● Using iRODS to support multiple storages for different datasets
49. Maintenance of distributed networks
● The maintenance of distributed applications is very difficult and expensive
● requires the highest level of service maturity
● increasing code coverage does not necessarily lead to more functionality coverage
● writing integration tests is even more important than adding more unit tests
● it’s almost impossible to run distributed services without help from the community
50. Quality Assurance (QA) as a community service
Selenium IDE allows you to create and replay all UI tests in your browser.
Shared tests can be reused by the Dataverse CI/CD pipeline.
Let’s work together on it!
51. Example of a Selenium .side file
● .side is the extension for the new Selenium IDE tests
● JSON format; every section describes some action
● template rules can be used by Selenium WebDriver
● can be easily integrated in a continuous deployment pipeline with Jenkins jobs
● running SIDE Runner with the given parameters can even test different components!
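To make the format concrete, here is a minimal, hypothetical .side document and a sketch of reading its commands programmatically; real files exported by Selenium IDE contain more fields, and the URL and target IDs below are invented for illustration:

```python
import json

# A minimal, hypothetical .side document (real exports contain more fields)
side_doc = """
{
  "id": "demo-suite",
  "name": "Dataverse UI tests",
  "url": "https://demo.dataverse.org",
  "tests": [
    {
      "id": "t1",
      "name": "open landing page",
      "commands": [
        {"command": "open", "target": "/", "value": ""},
        {"command": "click", "target": "id=searchButton", "value": ""}
      ]
    }
  ]
}
"""

def list_commands(doc):
    """Return (test name, command) pairs from a .side JSON string."""
    data = json.loads(doc)
    return [(t["name"], c["command"])
            for t in data["tests"]
            for c in t["commands"]]

print(list_commands(side_doc))
# → [('open landing page', 'open'), ('open landing page', 'click')]
```

Because every section is plain JSON describing an action, a Jenkins job can inspect or filter the shared tests with a few lines like these before handing them to SIDE Runner.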