DANS is an institute of KNAW and NWO
The world of Docker and Kubernetes
How to create, set up and manage a Kubernetes cluster at DANS: Dataverse pilot
Slava Tykhonov, Senior Information Scientist
Wilko Steinhoff, Senior Software Developer
(DANS-KNAW, The Hague, Netherlands)
Why do we need Cloud Computing?
“Cloud computing is a style of computing in which scalable and
elastic IT is delivered as a service using Internet technologies.”
“Cloud Computing is transforming the way organisations
consume computer services.”
“We can run all our workload data of applications and
processes online over the internet remotely instead of using
physical hardware and software.”
“It’s less expensive and more secure.”
Dataverse is our Pilot Cloud Service
Dataverse as a FOSS product: good news
• Dataverse is Open Source software
• Great community with more than 100 contributors
• Contributions are coming from all continents
• Maintenance costs are reduced because all community members
use the same software and help each other
• Governance models can be reused by different countries
• Innovation in the Dataverse community moves very fast
Dataverse as a FOSS product: bad news
• Open Source doesn’t mean Free!
• Consider all required resources, both hardware and human
• Building a service is difficult; maintenance is expensive
• Integration with other services requires change management
and is sometimes not even possible
• Technical development is fast, so expertise quickly goes out of date
• Requires continuous training and very good communication
between all partners
Dataverse Installation Guide
Before you start: installation requires preparation!
Installation problems
Dataverse's basic infrastructure seems very simple:
- application (Java application deployed on the Glassfish web server)
- database (PostgreSQL)
- search engine (Solr)
If you follow the guide and do the installation manually…
there is a good chance that it will not work.
You never know where the problem lies...
● OS-specific issues
● application-specific bugs
● differences between database versions
● search engine update(s)
● security patches
● hardware issues
● open/closed ports on your server
It’s even more complicated if you need
to patch the software and update a
working infrastructure every time…
locally, on test/acceptance/production.
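Many of these problems first show up as a service that cannot be reached. The following minimal Python sketch (standard library only) checks whether the TCP ports of the three Dataverse components accept connections. The port numbers are assumptions based on the usual defaults (Glassfish 8080, PostgreSQL 5432, Solr 8983); adjust them to your own installation.

```python
import socket

# Assumed default ports of the Dataverse components; change to match your setup.
DATAVERSE_PORTS = {"glassfish": 8080, "postgres": 5432, "solr": 8983}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat the port as closed.
        return False


def check_services(host: str, ports: dict[str, int]) -> dict[str, bool]:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Usage: `check_services("localhost", DATAVERSE_PORTS)`. Run it on the server itself and from another machine: a port that is open locally but closed remotely usually points to a firewall rule rather than a broken service.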
Typical infrastructure issues
And after it finally works, the security
guy tells you that all microservice
ports on all servers should be closed…
or a software update
breaks the service,
or a brand-new Chinese bot takes
your service down,
or something else happens...
Remember? You have to reproduce and fix it
locally and on test/acceptance/production.
Software Testing Process
Maintenance vs development
Typical outcome: hundreds or thousands of hours are lost, $$$,
and maintenance efforts dominate development.
Quiet software development
That’s how unmaintainable projects typically die… R.I.P.
FAIRness of Software
Open Source vs Closed Source
Dark side of the Moon
Source: V. Tykhonov, API economy: transformation from closed to open innovation
Open Source paradigm for Sharing economy
Dataverse Unleashed
Dataverse isn’t competing against Figshare, Zenodo,
DSpace, CKAN, EASY or others…
Dataverse is a platform to build new innovative things
together, and to integrate all the other services.
Using Dataverse means you can join the Sharing
Economy in data and speed up your own innovation based
on community developments.
Shared economy in the data landscape
● all partners run the same basic data infrastructure
● the source code is Open Source and shared
● the community makes decisions about priorities
● new custom requirements can be implemented
independently by anyone and merged into master
● sustainability of software: unmaintained components are
usually replaced by well-maintained ones as the product
evolves
● two or more technical solutions to the same problem are
more than welcome
● the maturity of the community means the maturity of the software
Do you want to join? Use Docker for your software!
Sometimes innovation means less communication
“Docker offered a way to create independence between the
application and the infrastructure through a standardized
container format that could be created with easy-to-use
David Messina, CMO at Docker
And now honestly ask yourself: how much time are you spending talking to
sysadmins and convincing them to enable or install the tools you need?
Talking to another developer working on the same code?
Reproducing the same bug on test/acceptance/production?
Docker features
• Extremely powerful configuration tool
• Allows you to install software on any platform (Linux, Mac,
• Any software can be installed with Docker as a standalone
container or as containers delivering microservices (database,
search engine, core service)
• Docker can host as many instances of the same software
tool as resources allow, each on a different port
• Docker can be used to organise multilingual interfaces, for
Docker advantages
• Faster development and deployments
• Isolation of running containers makes it easy to scale up apps
• Portability saves time: the same image runs on a local
computer or in the cloud
• Snapshotting makes it possible to archive the state of containers as images
• Resource limits can be adjusted
Dataverse Docker module
This module was developed in the one-year CESSDA DataverseEU
project and is aimed at CESSDA Service Providers with
limited technical resources. DANS led this project.
The goal was to deploy the Dataverse software on the CESSDA
Technical Infrastructure (Google Cloud). The project was funded
by the CESSDA 2018 work plan.
DataverseEU partners: ADP (Slovenia), AUSSDA (Austria),
GESIS (Germany), SND (Sweden), TARKI (Hungary),
Sciences Po (France), UKDA (UK), UniData (Italy), SODA
(Belgium), LSZDA (Latvia), DANS (Netherlands)
Docker deployment with Kubernetes (k8s) in the cloud
• Google Cloud (policy for CESSDA SaW)
• Microsoft Azure
• Amazon Cloud
• OpenShift Cloud
• local Docker installation (minikube)
Example: Dataverse as a set of Docker microservices
Docker Desktop (Community Edition)
Ideal for developers and small teams looking to get started with Docker
- docker-for-desktop
- docker-compose support
- integrated Kubernetes (single-node cluster)
- kitematic: visual Docker container management
Docker Hub
Docker Hub is a registry of container images
$ docker pull httpd
Push images to Docker Hub:
$ docker login
$ docker tag my_image $DOCKER_ID_USER/my_image
$ docker push $DOCKER_ID_USER/my_image
Docker concepts
• Containers are running (or runnable) instances of images
• Images are read-only templates (a layered filesystem plus
metadata) from which containers are created
• Containers can be saved as images and run in
different clouds
• Images can be preserved in repositories (registries such as Docker Hub)
• Data folders can be hosted outside containers on
persistent volumes.
Hello world app (Flask application)
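FROM python:2.7
LABEL maintainer="Vyacheslav Tykhonov"
WORKDIR /widget
# Install dependencies first so this layer is cached when only the code changes
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . /widget
ENTRYPOINT ["python"]
# app.py is an assumed file name: replace it with the Flask app's entry script
CMD ["app.py"]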
Docker command line usage
The command line lets you manage containers and images and
execute Docker commands
$ docker help run
$ docker ps
$ docker login
$ docker pull, push, commit
$ docker build, run
$ docker exec
$ docker stop, rm, rmi
Typical Docker pipeline
Install all dependencies and build the tool from scratch:
$ docker build -t parthenos:latest .
Run the image from the command line (in the background)
$ docker run -d -p 8081:8081 --name parthenos parthenos
Check if the container is running
$ docker ps | grep parthenos
Open a shell inside the container
$ docker exec -it [CONTAINER_ID] /bin/bash
Copy configuration into the container
$ docker cp ./parthenos.config [CONTAINER_ID]:/widget
Copy files from the container to a local folder
$ docker cp [CONTAINER_ID]:/widget/. ./
Ship the “dockerized” app to the world (Docker Hub or another registry)
$ docker tag parthenos:latest [DOCKER_ID_USER]/parthenos:latest
$ docker push [DOCKER_ID_USER]/parthenos:latest
Pipeline explanation
Credits: Arun Gupta, Package your Java EE Application using Docker and Kubernetes
Docker archiving process
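An easy way to archive running software, metadata and data:
• PostgreSQL database with metadata and user information
• dataset files in a separate folder
• software image with individual settings
$ docker commit [CONTAINER_ID] [IMAGE_NAME]
$ docker save -o archive.tar [IMAGE_NAME]
Data stored in volumes is not included by docker commit or docker save, so back up those folders separately.
To restore, load the image and bring the complete system with data and metadata back up with Docker Compose:
$ docker load -i archive.tar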
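Before restoring an archive on another machine, it can help to check which images it contains. A minimal sketch, assuming the archive was produced by `docker save`, which writes a top-level manifest.json listing each image's repository tags and layers; the function name describe_archive is hypothetical.

```python
import json
import tarfile


def describe_archive(path: str) -> list[dict]:
    """List the images in a `docker save` archive with their tags and layer count."""
    with tarfile.open(path) as tar:
        # manifest.json sits at the top level of the archive
        manifest = json.load(tar.extractfile(tar.getmember("manifest.json")))
    return [
        {"tags": entry.get("RepoTags") or [], "layers": len(entry.get("Layers", []))}
        for entry in manifest
    ]
```

An image without tags (for example one saved by ID) shows up with an empty tag list, which is a hint to tag it before pushing it anywhere.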
Docker Compose
A tool for managing the Docker configuration of multi-container solutions.
All connections, networks, containers and port specifications are stored in one file
(YAML specification).
Example (DataverseEU):
Kompose is a tool that converts a Docker Compose file into Kubernetes configuration:
$ kompose convert -f docker-compose.yml
Docker Compose is the perfect tool to keep track of the PROVenance of software
(version control, etc.)
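To illustrate keeping services, ports and volumes in one file, this sketch generates a Compose specification for the three Dataverse microservices. It is not the DataverseEU configuration: the image names, the PostgreSQL version and the password are placeholders. It writes JSON, which is valid YAML, so Compose should read it like a .yml file.

```python
import json


def dataverse_compose(dataverse_image: str, solr_image: str,
                      postgres_image: str = "postgres:9.6") -> dict:
    """Compose specification for Dataverse, Solr and PostgreSQL (placeholder values)."""
    return {
        "version": "3",
        "services": {
            "postgres": {
                "image": postgres_image,
                # Placeholder credential: replace it or inject it from a secret store
                "environment": {"POSTGRES_PASSWORD": "change-me"},
                # Named volume, so the database outlives the container
                "volumes": ["pgdata:/var/lib/postgresql/data"],
            },
            "solr": {"image": solr_image, "ports": ["8983:8983"]},
            "dataverse": {
                "image": dataverse_image,
                "ports": ["8080:8080"],
                # Start the database and the search engine before Dataverse
                "depends_on": ["postgres", "solr"],
            },
        },
        "volumes": {"pgdata": {}},
    }


def write_compose(spec: dict, path: str = "docker-compose.json") -> None:
    """Write the specification to disk as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(spec, fh, indent=2)
```

The file can then be started with `docker-compose -f docker-compose.json up -d`. On the default Compose network the containers reach each other by service name (postgres, solr); note that depends_on only orders start-up and does not wait until PostgreSQL or Solr is ready.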
Dataverse Docker containers exploration
# Show Docker images
docker images
# Show all running containers
docker ps
# Remove a Docker image by image_id (don’t execute)
docker rmi image_id
# Delete all images (don’t execute)
docker rmi $(docker images -aq)
# Open a shell in the Dataverse container; type exit to quit
docker exec -it dataverse /bin/bash
# PostgreSQL container, exit to quit
docker exec -it postgres /bin/bash
# Solr container, exit to quit
docker exec -it solr /bin/bash
# Copy files and folders to the running container
docker cp ./testfile dataverse:/tmp/
# Copy files and folders from the running container to your disk space
docker cp dataverse:/opt/dv/ /tmp/
# Stop Dataverse container
docker stop dataverse
# Start the stopped Dataverse container again
docker start dataverse
Dataverse maintenance with Docker
# Open the page with the latest Dataverse release
# Follow the upgrade instructions: they list the .war and .zip files and, optionally, .tsv or .xml schema files
docker exec -it dataverse /bin/bash
wget -O dataverse.war [URL_OF_DATAVERSE_WAR]
asadmin undeploy dataverse
rm -rf glassfish4/glassfish/domains/domain1/generated
asadmin deploy ./dataverse.war
asadmin restart-domain
# After Glassfish restarts, open your Dataverse installation and check its version
# Remember: these changes live only in this container and are lost when it is recreated from the image (for example after docker rm), unless you commit them to a new image!
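Checking the version can also be scripted, which is handy after every upgrade. A minimal sketch using the Dataverse native API endpoint /api/info/version; the base URL you pass (for example http://localhost:8080) is an assumption about where Glassfish listens.

```python
import json
from urllib.request import urlopen


def dataverse_version(base_url: str, timeout: float = 10.0) -> str:
    """Version string reported by the Dataverse native API (/api/info/version)."""
    url = base_url.rstrip("/") + "/api/info/version"
    with urlopen(url, timeout=timeout) as resp:
        # Expected shape: {"status": "OK", "data": {"version": "...", "build": "..."}}
        payload = json.load(resp)
    if payload.get("status") != "OK":
        raise RuntimeError(f"unexpected answer from {url}: {payload}")
    return payload["data"]["version"]
```

After the upgrade, `dataverse_version("http://localhost:8080")` should return the release you just deployed; anything else means the old .war is still active.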
Maintenance of Docker infrastructure
# Go to Docker Hub and create an account.
# Log in with your credentials and remember your_docker_name
docker login
# Create an image from the running Dataverse container
docker commit dataverse
# The new image will be listed at the top
docker images
# Tag the image (replace new_dataverse_image_id and your_docker_name)
docker tag new_dataverse_image_id [your_docker_name]/dataverse:4.18.1
# Push the new image to Docker Hub
docker push [your_docker_name]/dataverse:4.18.1
# Check on Docker Hub that the repository [your_docker_name]/dataverse was updated
# Visit the page
# image-to-docker-hub if you need more information about updating Docker images
How to set up, configure and manage the Kubernetes clusters run by
DANS, with emphasis on their architecture, ICT support and DevOps
POC Azure
Best practices in using and managing the DANS Azure-
Azure: cloud computing platform by Microsoft.
Azure@DANS is provided by SURFcumulus.
Cloud resources, such as:
⮚ Virtual Machine (VM)
⮚ Storage (disk)
⮚ SQL database
⮚ Kubernetes (AKS)
Kubernetes: open-source container-orchestration system for
automating application deployment, scaling, and management.
- Docker container orchestration.
- Infrastructure as Code.
- Use of health checks, restarting applications.
- (Auto)scaling of the cluster (horizontally and vertically).
- Controlled use of resources (CPU, memory).
- Setting up an application stack for local development.
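Health checks and resource control are declared per container in a Deployment. The sketch below builds such a manifest as a Python dict and writes it as JSON, which kubectl accepts as well as YAML. The image name, the CPU/memory values, the probe timings and the use of /api/info/version as health endpoint are assumptions to tune for your own cluster.

```python
import json


def dataverse_deployment(image: str, replicas: int = 1,
                         cpu: str = "1", memory: str = "4Gi") -> dict:
    """Kubernetes Deployment (as a dict) with health checks and resource limits."""
    labels = {"app": "dataverse"}
    # Assumed health endpoint: the Dataverse API answers once the app is deployed
    http_check = {"path": "/api/info/version", "port": 8080}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "dataverse", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "dataverse",
                    "image": image,
                    "ports": [{"containerPort": 8080}],
                    # requests are used for scheduling, limits cap what the container may use
                    "resources": {
                        "requests": {"cpu": cpu, "memory": memory},
                        "limits": {"cpu": cpu, "memory": memory},
                    },
                    # Send traffic to the pod only once the API answers...
                    "readinessProbe": {"httpGet": dict(http_check),
                                       "initialDelaySeconds": 60, "periodSeconds": 10},
                    # ...and let Kubernetes restart the container if it stops answering
                    "livenessProbe": {"httpGet": dict(http_check),
                                      "initialDelaySeconds": 300, "periodSeconds": 30},
                }]},
            },
        },
    }


def write_manifest(manifest: dict, path: str) -> None:
    """Write the manifest as JSON for `kubectl apply -f`."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2)
```

Apply it with `kubectl apply -f dataverse-deployment.json`. The long liveness delay is a deliberate choice: Glassfish can take minutes to deploy Dataverse, and a too-eager liveness probe would restart the container before it ever becomes ready.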
Best K8S practices
In this project we’ll look into some K8S best
practices for DANS,
based on issues raised in earlier POCs:
- Docker@DANS (2018)
- HUC2 POC (2019)
- Cluster Architecture
Application-wide or organisation-wide?
DTAP: Development, Testing, Acceptance and Production.
- How to separate different applications on a cluster.
- Can we separate responsibilities between ICT-Support and
Supply Persistent Storage classes by ICT-support that can be claimed by
Use of Role Based Access Control (RBAC).
- Tooling used to develop and deploy to a cluster?
Skaffold (build automation/deployment) and Helm (package manager)
- Use Infrastructure as Code (IaC) to provision and manage
"Azure" cloud infrastructure.
Bash scripts or Terraform.
- How to use "external" resources in a cluster.
SURF-object-storage (SWIFT), VANCIS
- Cluster cost management.
Downscaling a (development) cluster. Resource caps.
- Provide cluster-wide services.
Sending email, automatic SSL certificates, monitoring (Prometheus),
pipelines, etc.
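One way to combine DTAP separation, resource caps and RBAC is one namespace per application and stage, each with a ResourceQuota and a RoleBinding that gives the development team edit rights only inside its own namespaces, while cluster-wide administration stays with ICT-support. A minimal sketch that generates these objects as a Kubernetes List; the naming scheme, the quota values and the group name are hypothetical.

```python
DTAP = ("development", "testing", "acceptance", "production")
RBAC_API = "rbac.authorization.k8s.io"


def stage_objects(app: str, stage: str, cpu: str, memory: str, team_group: str) -> list[dict]:
    """Namespace, ResourceQuota and RoleBinding for one application stage."""
    ns = f"{app}-{stage}"
    return [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": ns, "labels": {"app": app, "stage": stage}}},
        # Hard caps on the total CPU/memory that all pods in the namespace may request or use
        {"apiVersion": "v1", "kind": "ResourceQuota",
         "metadata": {"name": "compute-quota", "namespace": ns},
         "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory,
                           "limits.cpu": cpu, "limits.memory": memory}}},
        # Bind the built-in "edit" ClusterRole to the team, scoped to this namespace only
        {"apiVersion": f"{RBAC_API}/v1", "kind": "RoleBinding",
         "metadata": {"name": f"{app}-team-edit", "namespace": ns},
         "roleRef": {"apiGroup": RBAC_API, "kind": "ClusterRole", "name": "edit"},
         "subjects": [{"apiGroup": RBAC_API, "kind": "Group", "name": team_group}]},
    ]


def cluster_layout(app: str, caps: dict, team_group: str) -> dict:
    """Kubernetes List with the objects for every DTAP stage of one application."""
    items = []
    for stage in DTAP:
        cpu, memory = caps[stage]
        items.extend(stage_objects(app, stage, cpu, memory, team_group))
    return {"apiVersion": "v1", "kind": "List", "items": items}
```

Once a quota sets limits.cpu or limits.memory, Kubernetes rejects pods in that namespace that do not declare those limits (unless a LimitRange supplies defaults), which is one more reason to set resources in every Deployment.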
Dataverse Cloud architecture
[Figure: HTTP(S) Load Balancer, Dataverse Service, Kubernetes Engine cluster (Compute Engine) with K8S Cluster Node1 and Node2 running the Dataverse, Solr and PostgreSQL Deployments, Container Registry]
How to scale up Kubernetes horizontally
[Figure: Kubernetes Engine cluster (Compute Engine) with K8S Cluster Node1 and Node2 behind the Dataverse Service; images from Docker Hub and the Container Registry]
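When scaling horizontally with a HorizontalPodAutoscaler, Kubernetes decides how many pods to run with the rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below implements only that core rule, plus the default 10% tolerance and the min/max bounds; the real controller also handles unready pods and stabilisation windows, which are left out here, and the default bounds are arbitrary.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Core HorizontalPodAutoscaler rule: scale the replica count by the metric ratio."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current number of pods
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below the minimum or above the maximum replica count
    return max(min_replicas, min(max_replicas, desired))
```

For example, two pods at 90% CPU against a 50% target give ceil(2 × 1.8) = 4 pods. In a cluster, a policy like this can be set with `kubectl autoscale deployment dataverse --cpu-percent=50 --min=1 --max=4` (the deployment name is an assumption); adding nodes such as Node2 when pods no longer fit is a separate job for the cluster autoscaler.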
The importance of Persistent Storage
Docker containers write files to disk (I/O) for state or storage,
in both the /data and /docroot folders. If a container is
recreated (for example when Kubernetes restarts or reschedules it),
all data written inside it is lost.
Solution: mount persistent storage, backed by an external
disk hosted in the cloud, into the container.
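In Kubernetes this is done with PersistentVolumeClaims mounted into the pod. A minimal sketch that creates one claim per folder and mounts them at /data and /docroot; the claim names, sizes and the storage class (managed-premium is one of the classes offered on AKS) are assumptions, and the pod spec is expected to follow the usual Deployment layout with a containers list.

```python
from typing import Optional


def pvc(name: str, size: str, storage_class: Optional[str] = None) -> dict:
    """PersistentVolumeClaim requesting `size` of storage (ReadWriteOnce)."""
    spec = {"accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}}}
    if storage_class:
        # Without it, the cluster's default storage class (if any) is used
        spec["storageClassName"] = storage_class
    return {"apiVersion": "v1", "kind": "PersistentVolumeClaim",
            "metadata": {"name": name}, "spec": spec}


def mount_claims(pod_spec: dict, container_name: str, mounts: dict) -> dict:
    """Add one PVC-backed volume per (claim name -> mount path) to a container."""
    volumes = pod_spec.setdefault("volumes", [])
    container = next(c for c in pod_spec["containers"] if c["name"] == container_name)
    volume_mounts = container.setdefault("volumeMounts", [])
    for claim, path in mounts.items():
        volumes.append({"name": claim, "persistentVolumeClaim": {"claimName": claim}})
        volume_mounts.append({"name": claim, "mountPath": path})
    return pod_spec
```

ReadWriteOnce volumes can be mounted read-write by only one node at a time, so running Dataverse pods on several nodes would need a ReadWriteMany class (such as Azure Files) or object storage instead.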
Running Dataverse in production
[Figure: HTTP(S) Load Balancer, Dataverse Service, Kubernetes Engine cluster node running the Dataverse, Solr, PostgreSQL and Email Relay Deployments and a Certbot CronJob, Container Registry]
Continuous deployment pipeline
1. Developer pushes code to Bitbucket
2. Jenkins receives a notification (build trigger)
3. Jenkins clones the workspace
4. Runs tests
5. Creates a Docker image
6. Pushes the Docker image to the GCP container registry
7. Updates the Kubernetes deployment
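Steps 3 to 7 can be sketched as the sequence of commands a Jenkins job would execute. This Python version only builds and prints the command lines unless dry_run is False; the repository URL, the test command, the gcr.io/PROJECT_ID/dataverse image name and the deployment/container names are hypothetical.

```python
import shlex
import subprocess


def pipeline_commands(repo_url: str, image: str, tag: str,
                      test_cmd: list[str], deployment: str = "dataverse",
                      container: str = "dataverse") -> list[list[str]]:
    """Commands for steps 3-7: clone, test, build, push, roll out."""
    ref = f"{image}:{tag}"
    return [
        ["git", "clone", repo_url, "workspace"],          # 3. clone the workspace
        test_cmd,                                        # 4. run the tests
        ["docker", "build", "-t", ref, "workspace"],     # 5. create the Docker image
        ["docker", "push", ref],                         # 6. push it to the registry
        # 7. point the Deployment at the new image; Kubernetes rolls it out
        ["kubectl", "set", "image", f"deployment/{deployment}", f"{container}={ref}"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False, stopping on failure."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Tagging the image with the Git commit hash rather than latest makes each rollout traceable and lets `kubectl rollout undo` return to a known build.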
Distributed Dataverse infrastructure on Kubernetes
● Network of Dataverses with a central portal to host metadata, and
multiple Dataverse nodes
● Testing strategies with Selenium and Cypress
● Unit tests, integration tests and a Jenkins CI/CD pipeline
● Running external applications on Kubernetes infrastructure, such as the
OpenAIRE Amnesia tool
● Multilingual support and maintenance, Weblate as a
● Using iRODS to support multiple storage backends for different datasets
Maintenance of distributed networks
● Maintaining distributed applications is very
difficult and expensive
● requires the highest level of service maturity
● increasing code coverage does not necessarily increase
functional coverage
● writing integration tests is even more important than adding
more unit tests
● it’s almost impossible to run distributed services without
help from the community
Quality Assurance (QA) as a community service
Selenium IDE allows you to create and replay all UI tests in your
Shared tests can be reused by Dataverse
CI/CD pipeline
Let’s work together on it!
Example of Selenium .side file
● .side is the file extension of the new Selenium IDE
● JSON format: every section describes an action
● template rules can be used by Selenium
● can easily be integrated into a continuous deployment
pipeline with Jenkins jobs
● running SIDE Runner with the given parameters can
even test the different

DataverseEU as multilingual repository

Recently uploaded

SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
Michel Dumontier
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
plant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptxplant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptx
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf

Recently uploaded (20)

SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
plant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptxplant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptx
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf

The world of Docker and Kubernetes

  • 1. DANS is an institute of KNAW and NWO. The world of Docker and Kubernetes: How to create, set up and manage a Kubernetes cluster at DANS (Dataverse pilot). Slava Tykhonov, Senior Information Scientist; Wilko Steinhoff, Senior Software Developer (DANS-KNAW, The Hague, Netherlands), 11.02.2020
  • 2. Why do we need Cloud Computing? “Cloud computing is a style of computing in which scalable and elastic IT is delivered as a service using Internet technologies.” “Cloud Computing is transforming the way organisations consume computer services.” “We can run all our workload data of applications and processes online over the internet remotely instead of using physical hardware and software.” “It’s less expensive and more secure.” Dataverse is our Pilot Cloud Service
  • 3. Dataverse as a FOSS product: good news • Dataverse is Open Source software • Great community with more than 100 contributors • Contributions come from all continents • Maintenance costs go down because all community members use the same software and help each other • Governance models can be reused by different countries • Innovation in the Dataverse community moves very fast
  • 4. Dataverse as a FOSS product: bad news • Open Source doesn’t mean Free! • Consider all required resources, both hardware and human • Building a service is difficult, and maintenance is expensive • Integration with other services requires change management and is sometimes not possible at all • Technical development is fast, so expertise quickly goes out of date • It requires continuous training and very good communication between all partners
  • 6. Installation problems The basic Dataverse infrastructure seems very simple: - application (Java, deployed on the GlassFish web server) - database (PostgreSQL) - search engine (Solr) If you follow the guide and install everything manually… there is a good chance that it will not work. Why?!
  • 7. You never know where the problem lies... ● OS-specific issues ● application-specific bugs ● differences between database versions ● search engine updates ● security patches ● hardware issues ● open/closed ports on your server It’s even more complicated if you need to patch the software and update a working infrastructure every time… locally, and on test/acceptance/production.
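The "open/closed ports" item can be checked quickly before digging into the other causes. Below is a minimal Python sketch that tries a TCP connection to each of the three Dataverse components. It assumes the stock default ports: 8080 for GlassFish's HTTP listener, 5432 for PostgreSQL and 8983 for Solr. The names `is_port_open`, `check_services` and `DEFAULT_PORTS` are illustrative, not part of Dataverse.

```python
import socket

# Assumed stock default ports; adjust them to your own installation.
DEFAULT_PORTS = {
    "glassfish": 8080,  # application server (HTTP listener)
    "postgres": 5432,   # PostgreSQL database
    "solr": 8983,       # Solr search engine
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


def check_services(host: str, ports: dict) -> dict:
    """Map each service name to whether its port is reachable on host."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_services("localhost", DEFAULT_PORTS).items():
        print(f"{name:10} {'open' if ok else 'closed/unreachable'}")
```

A closed port only tells you that nothing answered. It does not distinguish a firewall rule from a service that failed to start, so the container or process logs are still the next place to look.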
  • 8. Typical infrastructure issues And after it finally works, the security guy tells you that all microservice ports on all servers should be closed… or a software update breaks the service, or a brand-new Chinese bot takes your service down, or something else happens... And remember: you have to reproduce and fix it locally, and on test/acceptance/production.
  • 10. Maintenance vs development Typical outcome: hundreds or thousands of hours lost, $$$ spent, and maintenance effort dominating over development.
  • 11. Quiet software development That’s how unmaintainable projects typically die… R.I.P.
  • 12. FAIRness of Software Open Source vs Closed Source
  • 13. Dark side of the Moon Source: V. Tykhonov, API economy: transformation from closed to open innovation
  • 14. Open Source paradigm for Sharing economy
  • 15. Dataverse Unleashed Dataverse isn’t competing against Figshare, Zenodo, DSpace, CKAN, EASY or others… Dataverse is a platform for building innovative things together and for integrating all the other services. Using Dataverse means you can join the Sharing Economy in data and speed up your own innovation by building on community developments.
  • 16. Shared economy in the data landscape ● all partners run the same basic data infrastructure ● the source code is Open Source and shared ● the community makes decisions about priorities ● new custom requirements can be implemented independently by anyone and merged into master (upstream) ● sustainability of software: unmaintained components are usually replaced by well-maintained ones as the product evolves ● two or more technical solutions to the same problem are more than welcome ● a mature community means mature software Do you want to join? Use Docker for your software!
  • 17. Sometimes innovation means less communication “Docker offered a way to create independence between the application and the infrastructure through a standardized container format that could be created with easy-to-use tooling.” David Messina, CMO at Docker And now honestly ask yourself: how much time do you spend talking to sysadmins and convincing them to enable or install the tools you need? Talking to another developer working on the same code? Reproducing the same bug on test/acceptance/production?
  • 18. Docker features • Extremely powerful configuration tool • Allows you to install software on any platform (Linux, Mac, Windows) • Any software can be installed with Docker, either as a standalone container or as containers delivering microservices (database, search engine, core service) • Docker allows you to host many instances of the same software tool on different ports • Docker can be used, for example, to organise multilingual interfaces
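Hosting several instances of the same software works because every container can listen on the same internal port while Docker maps each one to a different host port with `-p HOST:CONTAINER`. The Python sketch below only builds the `docker run` command lines. The scenario of one instance per interface language, the image tag and the port numbers are all hypothetical.

```python
import shlex
from typing import List


def docker_run_commands(image: str, base_name: str, instances: List[str],
                        first_host_port: int, container_port: int) -> List[str]:
    """Build one `docker run` command per instance, each on its own host port.

    Every container listens on the same container_port; Docker maps it to
    first_host_port, first_host_port + 1, ... on the host.
    """
    commands = []
    for offset, name in enumerate(instances):
        args = [
            "docker", "run", "-d",                                # run detached
            "--name", f"{base_name}-{name}",                      # unique container name
            "-p", f"{first_host_port + offset}:{container_port}",  # host:container
            image,
        ]
        commands.append(shlex.join(args))  # shell-safe quoting (Python 3.8+)
    return commands


if __name__ == "__main__":
    # Hypothetical: one Dataverse instance per interface language.
    for cmd in docker_run_commands("dataverse:4.18.1", "dataverse",
                                   ["en", "nl", "de"], 8081, 8080):
        print(cmd)
```

The first generated line is `docker run -d --name dataverse-en -p 8081:8080 dataverse:4.18.1`. The container names must be unique, which is why the instance name is appended.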
  • 19. Docker advantages • Faster development and deployments • Isolation of running containers makes it easier to scale up apps • Portability saves time: the same image runs on a local computer or in the cloud • Snapshotting allows you to archive the state of containers as Docker images • Resource limits can be adjusted
  • 20. Dataverse Docker module This module was developed in the one-year CESSDA DataverseEU project and is aimed at CESSDA Service Providers who have limited technical resources. DANS led this project. The goal was to deploy the Dataverse software on the CESSDA Technical Infrastructure (Google Cloud). The project was funded by the CESSDA 2018 workplan. DataverseEU partners: ADP (Slovenia), AUSSDA (Austria), GESIS (Germany), SND (Sweden), TARKI (Hungary), SciencesPo (France), UKDA (UK), UniData (Italy), SODA (Belgium), LSZDA (Latvia), DANS (Netherlands)
  • 21. Docker deployment with k8s in Clouds • Google Cloud (policy for CESSDA SaW) • Microsoft Azure • Amazon Cloud • OpenShift Cloud • local Docker installation (minikube)
  • 23. Example: Dataverse as a set of Docker microservices
  • 24. Docker Desktop (Community Edition) Ideal for developers and small teams looking to get started with Docker. Features: - docker-for-desktop - docker-compose support - integrated single-node Kubernetes cluster (comparable to minikube) - Kitematic: visual Docker container management
  • 25. Docker Hub
    Docker Hub is a registry containing images. Example:
      $ docker pull httpd
    Push images to Docker Hub: cloud/builds/push-images/
      $ docker login
      $ docker tag my_image $DOCKER_ID_USER/my_image
      $ docker push $DOCKER_ID_USER/my_image
  • 26. Docker concepts • Containers are runnable instances of images • Images are read-only templates (a layered filesystem plus metadata) from which containers are created • Containers can be archived as images (docker commit) and executed in different clouds • Images can be preserved in repositories 95/9VCRBR • Data folders can be hosted outside containers on persistent volumes.
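  • 27. Hello world app (Flask application)
    Dockerfile widget/blob/master/Dockerfile
      # python:2.7 is end-of-life; a python:3 base image is preferable for new builds
      FROM python:2.7
      # MAINTAINER is deprecated; a label carries the same information
      LABEL maintainer="Vyacheslav Tykhonov"
      COPY . /widget
      WORKDIR /widget
      RUN pip install -r requirements.txt
      ENTRYPOINT ["python"]
      # Assumption: the Flask entry point is app.py (an empty CMD makes python fail)
      CMD ["app.py"]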
  • 28. Docker command line usage
    The command line lets you manage containers and images and execute Docker commands:
      $ docker help run
      $ docker ps
      $ docker login
      $ docker pull, push, commit
      $ docker build, run
      $ docker exec
      $ docker stop, rm, rmi
  • 29. Typical Docker pipeline
    Install all dependencies and build the tool from scratch:
      $ docker build -t parthenos:latest .
    Run the image in the background (detached), publishing port 8081:
      $ docker run -d -p 8081:8081 --name parthenos parthenos
    Check if the container is running:
      $ docker ps | grep parthenos
    Open a shell inside the container:
      $ docker exec -it [CONTAINER_ID] /bin/bash
    Copy configuration into the container:
      $ docker cp ./parthenos.config [CONTAINER_ID]:/widget
    Copy from the container to a local folder (docker cp does not expand wildcards; a trailing /. copies the folder contents):
      $ docker cp [CONTAINER_ID]:/widget/. ./
    Ship the “dockerized” app to the world (Docker Hub or another registry); docker push expects a repository name, not an image ID:
      $ docker tag parthenos:latest [DOCKER_ID_USER]/parthenos:latest
      $ docker push [DOCKER_ID_USER]/parthenos:latest
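The build → run → tag → push sequence can be scripted so it is repeated the same way on every machine. The Python sketch below is a minimal version of that idea. By default it only prints the commands as a dry run. With `dry_run=False` it executes them through `subprocess.run(..., check=True)`, so a failing step raises `CalledProcessError` and the later steps do not run. The function names and the `registry_user` parameter are hypothetical.

```python
import shlex
import subprocess
from typing import List


def pipeline(image: str, name: str, port: int, registry_user: str) -> List[List[str]]:
    """Return the build, run, tag and push steps as argument lists."""
    remote = f"{registry_user}/{image}"  # push needs a repository name, not an ID
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "run", "-d", "-p", f"{port}:{port}", "--name", name, image],
        ["docker", "tag", image, remote],
        ["docker", "push", remote],
    ]


def run_pipeline(steps: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print every step; execute it only when dry_run is False.

    When executing, check=True raises CalledProcessError on the first
    failing command, so the remaining steps are skipped.
    """
    shown = []
    for args in steps:
        line = shlex.join(args)
        print("$", line)
        if not dry_run:
            subprocess.run(args, check=True)
        shown.append(line)
    return shown


if __name__ == "__main__":
    run_pipeline(pipeline("parthenos:latest", "parthenos", 8081, "your_docker_name"))
```

Keeping the dry run as the default makes it safe to review the exact commands before anything touches a registry.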
  • 30. Pipeline explanation Credits: Arun Gupta, Package your Java EE Application using Docker and Kubernetes
  • 31. Docker archiving process
    An easy process to archive the running software, metadata and data separately:
    • PostgreSQL database with metadata and user information
    • dataset files in a separate folder
    • software image with some individual settings
    docker save works on images, so commit the container to an image first:
      $ docker commit [CONTAINER_ID] [IMAGE_NAME]
      $ docker save -o archive.tar [IMAGE_NAME]
    To restore, load the image back and bring the complete system, with data and metadata, up again with Docker Compose:
      $ docker load -i archive.tar
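`docker save` covers only the software image. The dataset files live in their own folder and need a separate archive. Below is a minimal Python sketch using the standard `tarfile` module: it packs one folder into a timestamped `.tar.gz`. The folder path and the label are hypothetical. The PostgreSQL part would typically come from a `pg_dump` file, which can be archived the same way.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def archive_folder(src: str, dest_dir: str, label: str) -> Path:
    """Pack src into <dest_dir>/<label>-<UTC timestamp>.tar.gz and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(dest_dir) / f"{label}-{stamp}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        # arcname stores paths relative to the folder name, not the absolute path
        tar.add(src, arcname=Path(src).name)
    return target


if __name__ == "__main__":
    # Hypothetical location of the Dataverse dataset files on the host.
    print(archive_folder("/data/dataverse/files", "/backup", "dataverse-files"))
```

Writing the archive outside the folder being archived avoids packing the archive into itself.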
  • 32. Docker Compose A management tool for the Docker configuration of multi-container solutions. All connections, networks, containers and port specifications are stored in one file (YAML specification). Example (DataverseEU): The tool that converts Docker Compose files into Kubernetes configuration is called Kompose: Usage: $ docker-compose [command], for example $ docker-compose up -d. Docker Compose is a perfect tool for keeping the PROVenance of software (version control, etc.)
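Below is a minimal sketch of what such a single-file specification can look like for the three Dataverse components. It is written as Python that emits the Compose structure as JSON; since JSON is valid YAML, `docker-compose -f docker-compose.json up -d` should accept it. Several pieces are placeholders: the `dataverse` image name, the password and the volume name. The official `postgres` and `solr` images are used unpinned only for brevity. The Dataverse Solr schema setup is not shown.

```python
import json
from pathlib import Path


def compose_config(dataverse_image: str, db_password: str) -> dict:
    """Three-service Compose layout mirroring the Dataverse stack (hypothetical)."""
    return {
        "version": "3",
        "services": {
            "postgres": {
                # Pin the PostgreSQL version your Dataverse release supports.
                "image": "postgres",
                "environment": {"POSTGRES_PASSWORD": db_password},
                # Named volume so the database survives container re-creation.
                "volumes": ["pgdata:/var/lib/postgresql/data"],
            },
            "solr": {
                # The Dataverse schema still has to be added to Solr (not shown).
                "image": "solr",
                "ports": ["8983:8983"],
            },
            "dataverse": {
                "image": dataverse_image,
                "ports": ["8080:8080"],
                # Start order only; Compose does not wait for readiness.
                "depends_on": ["postgres", "solr"],
            },
        },
        "volumes": {"pgdata": {}},
    }


if __name__ == "__main__":
    cfg = compose_config("your_docker_name/dataverse:4.18.1", "change-me")
    Path("docker-compose.json").write_text(json.dumps(cfg, indent=2))
```

Because the whole stack lives in one versioned file, a diff of that file shows exactly which component versions changed between deployments. This is the provenance point made above.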
  • 33. Dataverse Docker containers exploration
      # Show Docker images
      docker images
      # Show all running containers
      docker ps
      # Remove a Docker image by its image ID (don’t execute)
      docker rmi image_id
      # Delete all images (don’t execute)
      docker rmi `docker images -aq`
      # Access the Dataverse container; type exit to quit
      docker exec -it dataverse /bin/bash
      # PostgreSQL container; type exit to quit
      docker exec -it postgres /bin/bash
      # Solr container; type exit to quit
      docker exec -it solr /bin/bash
      # Copy files and folders to the running container
      docker cp ./testfile dataverse:/tmp/
      # Copy files and folders from the running container to your disk
      docker cp dataverse:/opt/dv/ /tmp/
      # Stop the Dataverse container
      docker stop dataverse
      # Start the stopped Dataverse container again
      docker start dataverse
  • 34. Dataverse maintenance with Docker
      # Open the page with the latest Dataverse release
      # Follow the upgrade instructions (they involve a .war and a .zip file, optionally .tsv or .xml schema files)
      docker exec -it dataverse /bin/bash
      wget -O dataverse.war [URL of the .war file from the release page]
      asadmin undeploy dataverse
      rm -rf glassfish4/glassfish/domains/domain1/generated
      asadmin deploy ./dataverse.war
      asadmin restart-domain
      # After GlassFish restarts, open your Dataverse site and check the version
      # Remember: these changes exist only inside the container and are lost when it is recreated from the image (for example, when Kubernetes restarts it)!
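The final "check the version" step can also be done from a script. Dataverse exposes the running version through its native API at `/api/info/version`. The Python sketch below queries that endpoint and reads the version string. It assumes the response has the shape `{"status": "OK", "data": {"version": ...}}`. The base URL and the expected version are placeholders.

```python
import json
from urllib.request import urlopen


def parse_version(payload: str) -> str:
    """Return data.version from a Dataverse /api/info/version JSON response."""
    body = json.loads(payload)
    if body.get("status") != "OK":
        raise ValueError(f"unexpected status: {body.get('status')!r}")
    return body["data"]["version"]


def installed_version(base_url: str, timeout: float = 10.0) -> str:
    """Ask a running Dataverse (e.g. http://localhost:8080) which version it runs."""
    url = base_url.rstrip("/") + "/api/info/version"
    with urlopen(url, timeout=timeout) as resp:
        return parse_version(resp.read().decode("utf-8"))


if __name__ == "__main__":
    expected = "4.18.1"  # placeholder: the release you just deployed
    found = installed_version("http://localhost:8080")
    print("upgrade OK" if found == expected else f"still running {found}")
```

Keeping parsing separate from the HTTP call makes the check easy to reuse in a CI job that compares the deployed version with the tagged image.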
  • 35. Maintenance of Docker infrastructure
      # Go to Docker Hub and create an account.
      # Log in with your credentials; remember your_docker_name
      docker login
      # Create an image from the running Dataverse container
      docker commit dataverse
      # The new image will be listed at the top
      docker images
      # Tag the image in your local image store; replace your_docker_name
      docker tag new_dataverse_image_id [your_docker_name]/dataverse:4.18.1
      # Push the new image to Docker Hub
      docker push [your_docker_name]/dataverse:4.18.1
      # Check on Docker Hub that the repository was updated: [your_docker_name]/dataverse
      # Visit the page image-to-docker-hub if you need more information about updating Docker images
  • 36. How to set up, configure and manage DANS Kubernetes clusters, with emphasis on architecture, ICT support and DevOps. POC: Azure management
  • 37. Azure Best practices for using and managing the DANS Azure subscription. Azure: cloud computing platform by Microsoft. Azure@DANS is provided by SURFcumulus. Cloud resources, such as: ⮚ Virtual Machine (VM) ⮚ Storage (disk) ⮚ SQL database ⮚ Kubernetes (AKS)
  • 38. Kubernetes Open-source container-orchestration system for automating application deployment, scaling and management. - Docker container orchestration - Infrastructure as Code - Health checks and automatic restarting of applications - (Auto)scaling of the cluster (horizontally and vertically) - Controlled use of resources (CPU, memory) - Setting up the application stack for local development
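Kubernetes health checks work by probing the container, most often with an HTTP GET; the kubelet restarts the container when a liveness probe keeps failing. The Python sketch below serves such an endpoint using only the standard library. The `/healthz` path and port 8080 are a common convention chosen here as assumptions, not something the Dataverse image provides. The matching probe configuration is shown as a comment.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Matching (hypothetical) container spec fragment in Kubernetes:
#   livenessProbe:
#     httpGet: {path: /healthz, port: 8080}
#     periodSeconds: 10


class HealthHandler(BaseHTTPRequestHandler):
    """Answer GET /healthz with 200 OK; every other path gets 404."""

    def do_GET(self):
        if self.path == "/healthz":
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the logs


def serve(port: int = 8080) -> HTTPServer:
    """Create (but do not start) the health-check server on all interfaces."""
    return HTTPServer(("0.0.0.0", port), HealthHandler)


if __name__ == "__main__":
    serve().serve_forever()
```

A real health endpoint would typically also check its dependencies, such as the database connection, before answering 200.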
  • 39. Best K8S practices In this project we’ll look into some best K8S practices for DANS, based on issues raised in earlier POCs: -Docker@DANS (2018) -HUC2 POC (2019)
  • 40. - Cluster architecture: application-wide or organisation-wide? DTAP: Development, Testing, Acceptance and Production. - How do we separate different applications on a cluster? - Can we separate responsibilities between ICT-Support and developers? ICT-Support supplies persistent storage classes that developers can claim. Use Role-Based Access Control (RBAC). - Which tooling do we use to develop and deploy to a cluster? Skaffold (build automation/deployment) and Helm (package manager)
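One common way to combine these points is one namespace per application and DTAP stage. ICT-Support then creates a RoleBinding that gives the developer group Kubernetes' built-in `edit` ClusterRole inside that namespace only. The Python sketch below produces those two manifests as plain dicts. The naming scheme `<app>-<stage>`, the stage abbreviations and the group name are assumptions.

```python
import json
from typing import List

STAGES = ("dev", "test", "acc", "prod")  # DTAP: one namespace per stage
RBAC = "rbac.authorization.k8s.io"


def namespace_manifests(app: str, stage: str, dev_group: str) -> List[dict]:
    """Namespace for app/stage plus a RoleBinding granting dev_group `edit` in it."""
    if stage not in STAGES:
        raise ValueError(f"unknown DTAP stage: {stage}")
    ns = f"{app}-{stage}"
    return [
        {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": ns}},
        {
            "apiVersion": f"{RBAC}/v1",
            "kind": "RoleBinding",
            "metadata": {"name": f"{dev_group}-edit", "namespace": ns},
            "subjects": [{"kind": "Group", "name": dev_group, "apiGroup": RBAC}],
            # Built-in ClusterRole; bound by a RoleBinding it applies only inside ns.
            "roleRef": {"kind": "ClusterRole", "name": "edit", "apiGroup": RBAC},
        },
    ]


if __name__ == "__main__":
    items = namespace_manifests("dataverse", "dev", "dataverse-developers")
    # kubectl apply -f accepts a v1 List written as JSON.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

ICT-Support would apply this with cluster-admin rights. Developers can then deploy freely in their own namespace but cannot touch other applications or cluster-wide resources.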
  • 41. - Use Infrastructure as Code (IaC) to provision and manage the Azure cloud infrastructure: Bash scripts or Terraform. - How to use external resources in a cluster: SURF object storage (SWIFT), VANCIS. - Cluster cost management: downscaling a (development) cluster, resource caps. - Provide cluster-wide services: sending email, automatic SSL certificates, monitoring (Prometheus), pipelines, etc.
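A resource cap per namespace can be expressed as a Kubernetes `ResourceQuota`. It limits the total CPU and memory that all pods in the namespace may request and use. The Python sketch below builds one. The quota name and the values are hypothetical. Node-level downscaling happens on the Azure side instead, shown as a comment.

```python
import json


def resource_quota(namespace: str, cpu: str, memory: str) -> dict:
    """Cap total CPU/memory requests and limits for all pods in namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-cap", "namespace": namespace},  # hypothetical name
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical cap for a development namespace: 2 CPUs and 4 GiB in total.
    print(json.dumps(resource_quota("dataverse-dev", "2", "4Gi"), indent=2))
    # Downscaling the nodes of an AKS cluster, e.g. outside office hours:
    #   az aks scale --resource-group <group> --name <cluster> --node-count 1
```

Once such a quota exists, pods in that namespace must declare CPU and memory requests and limits, either themselves or through a LimitRange default. Otherwise the API server rejects them.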
  • 42. Dataverse Cloud architecture [Diagram: Users → Ingress / HTTP(S) Load Balancer → Dataverse Service in a Kubernetes cluster (Kubernetes Engine); a K8S cluster node runs the Dataverse, Solr and PostgreSQL Deployments, each exposed through its own Service]
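In this architecture each component is a Deployment, which keeps the pods running, paired with a Service, which gives those pods a stable name. The Service finds its pods purely through a label selector. The Python sketch below builds such a pair. The `app=<name>` label convention and the image names are assumptions; the ports are the usual defaults of the three components.

```python
import json
from typing import List


def deployment_and_service(name: str, image: str, port: int,
                           replicas: int = 1) -> List[dict]:
    """Deployment plus a Service that selects its pods through the same label."""
    labels = {"app": name}  # the Service selector must match the pod labels
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                }]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {"selector": labels, "ports": [{"port": port, "targetPort": port}]},
    }
    return [deployment, service]


if __name__ == "__main__":
    stack = [("dataverse", "your_docker_name/dataverse:4.18.1", 8080),
             ("solr", "solr", 8983), ("postgres", "postgres", 5432)]
    items = [m for n, i, p in stack for m in deployment_and_service(n, i, p)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Inside the same namespace, the Dataverse pod can then reach Solr simply at `solr:8983` and PostgreSQL at `postgres:5432` through cluster DNS, wherever those pods are scheduled.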
  • 43. [Diagram: Users → Dataverse Service on a Kubernetes cluster (Kubernetes Engine on Compute Engine) with two nodes, K8S Cluster Node1 and Node2; images come from Docker Hub / Container Registry]
  • 44. How to scale up Kubernetes horizontally [Diagram: Users → Dataverse Service on a Kubernetes cluster (Kubernetes Engine on Compute Engine), scaled out from K8S Cluster Node1 to K8S Cluster Node2; images come from Docker Hub / Container Registry]
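The slide shows scaling out by adding nodes. Kubernetes also scales horizontally at the pod level with the HorizontalPodAutoscaler, whose documented core rule is `desired = ceil(current * currentMetric / targetMetric)`. Scaling is skipped when the ratio is within a tolerance (0.1 by default), and the result is clamped to `minReplicas`/`maxReplicas`. The Python sketch below implements only that rule. It deliberately ignores stabilization windows, missing metrics and pod readiness, which the real controller also takes into account.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: leave replicas unchanged
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 2 Dataverse pods at 90% average CPU with a 60% target -> scale to 3.
    print(desired_replicas(2, 90, 60))
```

More pods only help while the nodes have room for them. Once they do not, the node-level scaling shown in the diagram (manual or via a cluster autoscaler) has to add capacity.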
  • 45. The importance of Persistent Storage Docker containers write files to disk (I/O) for state or storage, both in the /data and /docroot folders. If a container is restarted or recreated (for example, by Kubernetes), all data written inside it is lost. Solution: mount persistent storage into the container, backed by an external disk hosted in the cloud.
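In Kubernetes this usually means a PersistentVolumeClaim that the pod mounts at the folder that must survive restarts. The Python sketch below builds a claim and adds it to a pod spec as a volume plus a `volumeMount`. The claim name, the size and the `/data` mount path are assumptions. When `storage_class` is omitted, the cluster's default StorageClass is used; this is where a class supplied by ICT-Support would be named.

```python
from typing import Optional


def persistent_claim(name: str, size: str, storage_class: Optional[str] = None) -> dict:
    """PersistentVolumeClaim asking the cluster for `size` of disk (e.g. "20Gi")."""
    spec = {
        "accessModes": ["ReadWriteOnce"],  # read-write, mounted by a single node
        "resources": {"requests": {"storage": size}},
    }
    if storage_class:
        spec["storageClassName"] = storage_class
    return {"apiVersion": "v1", "kind": "PersistentVolumeClaim",
            "metadata": {"name": name}, "spec": spec}


def mount_claim(pod_spec: dict, container: str, claim: str, mount_path: str) -> dict:
    """Add the claim as a volume to pod_spec and mount it into one container (in place)."""
    volume = f"{claim}-volume"
    pod_spec.setdefault("volumes", []).append(
        {"name": volume, "persistentVolumeClaim": {"claimName": claim}})
    for c in pod_spec["containers"]:
        if c["name"] == container:
            c.setdefault("volumeMounts", []).append(
                {"name": volume, "mountPath": mount_path})
    return pod_spec


if __name__ == "__main__":
    print(persistent_claim("dataverse-data", "20Gi"))
    spec = {"containers": [{"name": "dataverse", "image": "dataverse:4.18.1"}]}
    print(mount_claim(spec, "dataverse", "dataverse-data", "/data"))
```

The same call with a second claim would cover `/docroot`. Files written there then live on the cloud disk rather than in the container's own filesystem.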
  • 46. Running Dataverse in production [Diagram: Users → HTTP(S) Load Balancer → Kubernetes Engine cluster node running Deployments and Services for Dataverse, Solr and PostgreSQL, plus a Certbot CronJob and Service and an Email Relay Deployment and Service; images come from the Container Registry]
  • 47. Continuous deployment pipeline (Jenkins pipeline defined in a Jenkinsfile) 1. Developer pushes code to Bitbucket (git push) 2. Jenkins receives a webhook notification (build trigger) 3. Jenkins clones the workspace (git clone) 4. Runs tests 5. Creates a Docker image 6. Pushes the Docker image to the GCP container registry 7. Updates the Kubernetes deployment
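The value of this ordering is that each stage gates the next. If the tests fail, no image is built or pushed, and the Kubernetes deployment is never touched. Jenkins pipelines behave this way by default. The Python sketch below models only that fail-fast ordering. The stage functions are hypothetical stand-ins for the real work.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Each stage is a (name, step) pair; step() returns True on success.
    Returns the names of the stages that completed.
    """
    completed = []
    for name, step in stages:
        if not step():
            print(f"stage '{name}' failed; skipping the remaining stages")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical stand-ins for the real work of each stage.
    pipeline = [
        ("clone", lambda: True),
        ("test", lambda: False),              # a failing test run...
        ("build image", lambda: True),        # ...means no image is built,
        ("push image", lambda: True),         # nothing is pushed,
        ("update deployment", lambda: True),  # and the cluster is untouched
    ]
    print(run_stages(pipeline))
```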
  • 48. Distributed Dataverse infrastructure on Kubernetes ● Network of Dataverses with a central portal hosting metadata and multiple Dataverse nodes ● Testing strategies with Selenium and Cypress ● Unit tests, integration tests and a Jenkins CI/CD pipeline ● Running external applications on the Kubernetes infrastructure, e.g. the OpenAIRE Amnesia tool ● Support and maintenance of multiple languages, with Weblate as a service ● Using iRODS to support multiple storage backends for different datasets
  • 49. Maintenance of distributed networks ● Maintaining distributed applications is very difficult and expensive ● it requires the highest level of service maturity ● increasing code coverage does not necessarily increase functionality coverage ● writing integration tests is even more important than adding more unit tests ● it’s almost impossible to run distributed services without help from the community
  • 50. Quality Assurance (QA) as a community service Selenium IDE allows you to record and replay all UI tests in your browser. Shared tests can be reused by the Dataverse CI/CD pipeline. Let’s work together on it!
  • 51. Example of a Selenium .side file ● .side is the file extension for tests created with the new Selenium IDE ● JSON format; every section describes an action ● template rules can be used by Selenium WebDriver ● can easily be integrated into a continuous deployment pipeline with Jenkins jobs ● running the SIDE Runner with the given parameters can even test the different components!
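Because a .side project is plain JSON, shared tests are easy to inspect before they go into the pipeline. The Python sketch below lists each test with its steps. It assumes the common layout: a top-level `tests` array whose entries carry a `name` and a `commands` list of `command`/`target`/`value` objects. The script name in the usage comment is hypothetical. Running the project itself is the job of the SIDE Runner (`selenium-side-runner project.side`).

```python
import json
import sys
from typing import Dict, List, Tuple


def summarize_side(text: str) -> Dict[str, List[Tuple[str, str, str]]]:
    """Map each test name in a .side project to its (command, target, value) steps."""
    project = json.loads(text)
    return {
        test["name"]: [
            (cmd["command"], cmd.get("target", ""), cmd.get("value", ""))
            for cmd in test.get("commands", [])
        ]
        for test in project.get("tests", [])
    }


if __name__ == "__main__":
    # Usage (hypothetical script name): python side_summary.py project.side
    with open(sys.argv[1], encoding="utf-8") as f:
        for name, steps in summarize_side(f.read()).items():
            print(f"{name}: {len(steps)} steps")
            for command, target, value in steps:
                print(f"  {command} {target} {value}".rstrip())
```

A summary like this can be diffed in code review when a community member contributes or changes a shared UI test.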