Open economic data related to public budgeting, spending and prices are characterized by high volume, velocity, variety and veracity.
The infrastructure consists of 10 virtual machines with RAM ranging from 2GB to 8GB and storage ranging from 20GB to 100GB, as well as a non-commodity (physical) server with 12 CPUs, 64GB RAM and a storage capacity of more than 4TB.
This map shows which municipalities are the most expensive for a specific product, e.g. milk, fruit or petrol. The color scale indicates the price of the product in each municipality: the redder, the more expensive.
We also use QGIS to display geoinformation for supermarkets and other POIs on the map.
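The "more red, more expensive" color scale on the price map can be sketched in a few lines of Python. This is only an illustration of the idea, not how the map is actually rendered (the notes attribute the map display to QGIS); the function name price_to_red and the white-to-red palette are assumptions made for this example.

```python
def price_to_red(price, min_price, max_price):
    """Map a price to an RGB tuple between white (cheapest) and pure red (most expensive).

    Hypothetical helper: the linear white-to-red ramp is an assumption, not the
    palette used on the real map.
    """
    if max_price == min_price:
        # All municipalities have the same price: no gradient to show.
        return (255, 255, 255)
    # Normalize the price to [0, 1] and clamp values outside the observed range.
    t = (price - min_price) / (max_price - min_price)
    t = max(0.0, min(1.0, t))
    # Green and blue fade out as the price rises, leaving pure red at the top.
    fade = round(255 * (1 - t))
    return (255, fade, fade)


if __name__ == "__main__":
    # Made-up prices per litre for three unnamed municipalities.
    for price in (1.10, 1.35, 1.60):
        print(price, price_to_red(price, 1.10, 1.60))
```

A linear ramp is the simplest choice; a real map might instead bin prices into a few classes so that small differences do not produce nearly identical shades.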
The system consists of: CKAN data portal, Drupal, Virtuoso, MySQL instances, QGIS server, CouchDB and many scripts of different technologies and scope. We use this set of applications to process information from different data sources.
As mentioned before, the system runs on the cloud-based infrastructure ~okeanos. In some cases the system needs to be moved to, or backed up on, a different cloud or physical infrastructure. This is where Docker helped us, and with little effort.
We started to dockerize the services one by one until we decided to use the new Compose 2. Compose creates the entire system with a single command: docker-compose up -d
In addition, it creates an internal network and automatically attaches the containers to it.
Restart policies:
no: Do not automatically restart the container when it exits. This is the default.
on-failure[:max-retries]: Restart only if the container exits with a non-zero exit status. Optionally, limit the number of restart retries the Docker daemon attempts.
always: Always restart the container regardless of the exit status. When you specify always, the Docker daemon will try to restart the container indefinitely. The container will also always start on daemon startup, regardless of the current state of the container.
unless-stopped: Always restart the container regardless of the exit status, but do not start it on daemon startup if the container has been put to a stopped state before.
An ever increasing delay (double the previous delay, starting at 100 milliseconds) is added before each restart to prevent flooding the server. This means the daemon will wait for 100 ms, then 200 ms, 400, 800, 1600, and so on until either the on-failure limit is hit, or when you docker stop or docker rm -f the container. If a container is successfully restarted (the container is started and runs for at least 10 seconds), the delay is reset to its default value of 100 ms.
You can specify the maximum amount of times Docker will try to restart the container when using the on-failure policy. The default is that Docker will try forever to restart the container. The number of (attempted) restarts for a container can be obtained via docker inspect. For example, to get the number of restarts for container “my-container”;
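Returning to the restart delay described above: the doubling schedule can be sketched in a few lines of Python. This illustrates the arithmetic only and is not Docker's implementation; the function name restart_delays and its parameters are made up for this note.

```python
def restart_delays(attempts, base_ms=100):
    """Return the wait in milliseconds before each of the first `attempts` restarts.

    Hypothetical helper: each delay is double the previous one, starting at
    `base_ms`, as described in the restart policy notes.
    """
    return [base_ms * 2 ** i for i in range(attempts)]


if __name__ == "__main__":
    # First five waits: 100, 200, 400, 800 and 1600 ms.
    print(restart_delays(5))
    # Total time spent waiting across those five restarts, in ms.
    print(sum(restart_delays(5)))
```

The sketch does not model the reset to 100 ms after a container has run for at least 10 seconds, nor the on-failure retry limit.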
Cluster management integrated with Docker Engine: Use the Docker Engine CLI to create a Swarm of Docker Engines where you can deploy application services. You don’t need additional orchestration software to create or manage a Swarm.
Decentralized design: Instead of handling differentiation between node roles at deployment time, the Docker Engine handles any specialization at runtime. You can deploy both kinds of nodes, managers and workers, using the Docker Engine. This means you can build an entire Swarm from a single disk image.
Declarative service model: Docker Engine uses a declarative approach to let you define the desired state of the various services in your application stack. For example, you might describe an application comprised of a web front end service with message queueing services and a database backend.
Scaling: For each service, you can declare the number of tasks you want to run. When you scale up or down, the swarm manager automatically adapts by adding or removing tasks to maintain the desired state.
Desired state reconciliation: The swarm manager node constantly monitors the cluster state and reconciles any differences between the actual state and your expressed desired state. For example, if you set up a service to run 10 replicas of a container, and a worker machine hosting two of those replicas crashes, the manager will create two new replicas to replace the ones that crashed. The swarm manager assigns the new replicas to workers that are running and available.
Multi-host networking: You can specify an overlay network for your services. The swarm manager automatically assigns addresses to the containers on the overlay network when it initializes or updates the application.
Service discovery: Swarm manager nodes assign each service in the swarm a unique DNS name and load balances running containers. You can query every container running in the swarm through a DNS server embedded in the swarm.
Load balancing: You can expose the ports for services to an external load balancer. Internally, the swarm lets you specify how to distribute service containers between nodes.
Secure by default: Each node in the swarm enforces TLS mutual authentication and encryption to secure communications between itself and all other nodes. You have the option to use self-signed root certificates or certificates from a custom root CA.
Rolling updates: At rollout time you can apply service updates to nodes incrementally. The swarm manager lets you control the delay between service deployment to different sets of nodes. If anything goes wrong, you can roll back a task to a previous version of the service.
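The desired state reconciliation item in the list above (10 replicas, a worker hosting two of them crashes) can be made concrete with a toy model in Python. This is a sketch of the idea only, not Swarm's scheduler: the function name reconcile, the worker names and the "least-loaded worker first" placement rule are all assumptions made for this example.

```python
def reconcile(desired, placement, healthy_workers):
    """Return a new worker -> replica-count mapping that totals `desired`.

    placement: current replicas per worker, including workers that may have crashed.
    healthy_workers: workers that are running and available.
    Hypothetical model: replicas on crashed workers are lost and recreated on
    the least-loaded healthy worker.
    """
    # Only replicas on healthy workers still count toward the actual state.
    current = {worker: placement.get(worker, 0) for worker in healthy_workers}
    if not current:
        raise ValueError("no healthy workers available")
    running = sum(current.values())
    # Scale up: create missing replicas, one at a time, on the least-loaded worker.
    while running < desired:
        target = min(current, key=current.get)
        current[target] += 1
        running += 1
    # Scale down: remove surplus replicas from the most-loaded worker.
    while running > desired:
        target = max(current, key=current.get)
        current[target] -= 1
        running -= 1
    return current


if __name__ == "__main__":
    before = {"worker-1": 4, "worker-2": 4, "worker-3": 2}
    # worker-3 crashes, taking two replicas with it.
    print(reconcile(10, before, ["worker-1", "worker-2"]))
```

In the example, the two replicas lost with worker-3 are recreated on the two surviving workers, so the service is back at 10 replicas.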
What is Consul?
Consul has multiple components, but as a whole, it is a tool for discovering and configuring services in your infrastructure. It provides several key features:
Service Discovery: Clients of Consul can provide a service, such as api or mysql, and other clients can use Consul to discover providers of a given service. Using either DNS or HTTP, applications can easily find the services they depend upon.
Health Checking: Consul clients can provide any number of health checks, either associated with a given service ("is the webserver returning 200 OK"), or with the local node ("is memory utilization below 90%"). This information can be used by an operator to monitor cluster health, and it is used by the service discovery components to route traffic away from unhealthy hosts.
Key/Value Store: Applications can make use of Consul's hierarchical key/value store for any number of purposes, including dynamic configuration, feature flagging, coordination, leader election, and more. The simple HTTP API makes it easy to use.
Multi Datacenter: Consul supports multiple datacenters out of the box. This means users of Consul do not have to worry about building additional layers of abstraction to grow to multiple regions.
Dockerizing a multi-component Open Data app
Athens Docker Meetup, June 2016
Dimitris Negkas, Stergios Tsiafoulis
Description and Scope
is a publicly available web platform and linked data
its scope is to transform, curate, aggregate, interlink and publish economic data in machine-readable format, to enable research with unprecedented data
Sources Currently used:
Transparency – DIAVGEIA
Central Electronic Registry of Public Procurement - E-
National Strategic Reference Framework (NSRF)
Central Market of Thessaloniki (CMT)
Municipality of Athens, Municipality of Thessaloniki
Government of Australia
we use OpenLink Virtuoso for 15 different sources, holding nearly 1B triples
we host 27 datasets in CKAN from 15 organizations
the data grows accordingly each month
Each data source is separately handled and processed as its
available data are not uniformly provided or in machine-
Diavgeia, “NSRF” and Observatories for product and fuel
prices provide a rich API interface that can be easily
queried in order to provide machine-readable data in JSON
In the cases of E-Procurement, “CMT” and “Municipalities
of Athens and Thessaloniki” there is no API available.
Thus, we have developed a software module, which gathers
online information in an automated way, storing it in a
Open economic data related to public budgeting,
spending and prices are characterized by high
volume, velocity, variety and veracity
We have to build custom components under the
common logic of transforming static data to
linked open data streams.
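As a rough idea of what "transforming static data to linked open data streams" can look like, the Python sketch below turns flat records into N-Triples lines one at a time. It is not the project's actual component: the example.org namespace, the function name records_to_ntriples and the sample record are all hypothetical, and every value is emitted as a plain string literal.

```python
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    return text


def records_to_ntriples(records):
    """Yield one N-Triples line per field of each (record_id, fields) pair.

    Written as a generator so records can be processed as a stream rather
    than loaded all at once.
    """
    for record_id, fields in records:
        subject = f"<{BASE}resource/{quote(str(record_id))}>"
        for key, value in fields.items():
            predicate = f"<{BASE}property/{quote(str(key))}>"
            yield f'{subject} {predicate} "{escape_literal(value)}" .'


if __name__ == "__main__":
    # Made-up record standing in for one harvested price observation.
    sample = [("p1", {"product": "milk", "price": 1.2})]
    for line in records_to_ntriples(sample):
        print(line)
```

A real pipeline would map fields to an agreed vocabulary and use typed literals; the sketch only shows the record-to-triple step.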
Process model: Nucleus
The nucleus of our
approach is semantic
Data are stored in raw
(as harvested from
sources), in RDF and
Process model : Data distribution
Enriched data are
distributed through five
1. Data dumps (CKAN),
2. SPARQL queries,
4. Social media
5. Structured inputs to
Business Intelligence (BI)
Additionally, data can be
further analysed and
exchanged with relevant
platforms (e.g. SPARQL to
Process model : Validation and
throughout the whole
process in order to
safeguard high data
quality by detecting
component works as an
internal messaging and
alert system for all
Save your data !!
Will build the image from
Do not use flag “always”
in your development
Will start the service only
after MySQL service
Will link the container
with MySQL container
#Wherever you want to mount your data from
#Unix socket for X11
Build the system
Clone the repository from GitHub
Create the directories where you are going to link your
Enter docker-compose up -d and that’s it !!
Why Docker ?
o Move to different cloud infrastructures
and to Physical servers
o Run on Virtual Machines for
development and testing
o Easily Scale
o Easy Delivery and deployment
o Run Anywhere (regardless host distro,
physical, cloud or not )
o Run Anything