This document discusses dockerizing a multi-component open data application called LinkedEconomy, which obtains data from sources such as government procurement records, fuel prices and economic statistics, transforms it, and publishes it as linked open data. It proposes dockerizing each component (Drupal, the Virtuoso triplestore, QGIS, etc.) for scalability and portability, provides example Docker Compose files, and discusses next steps such as running the applications on a Swarm for auto-scaling and using Consul for service discovery.
1. Dockerizing a multi-component Open Data app
Athens Docker Meetup, June 2016
Dimitris Negkas, Stergios Tsiafoulis
dimneg@gmail.com, s.tsiafoulis@gmail.com
2. Description and Scope
LinkedEconomy (http://linkedeconomy.org/)
is a publicly available web platform and linked data repository.
Its scope is to transform, curate, aggregate, interlink and publish economic data in machine-readable format, to enable:
citizen awareness
research with unprecedented data
evidence-based policy
3. Data Sources
Sources Currently used:
Transparency – DIAVGEIA
Central Electronic Registry of Public Procurement - E-
Procurement
National Strategic Reference Framework (NSRF)
Central Market of Thessaloniki (CMT)
e-Prices
Fuel Prices
Municipality of Athens, Municipality of Thessaloniki
Government of Australia
4. Data growth
We use OpenLink Virtuoso for 15 different sources totalling nearly 1B triples.
We host 27 datasets in CKAN from 15 organizations.
The data grows accordingly each month.
5. Data processing
Each data source is handled and processed separately, as its available data are not uniformly provided, nor in a machine-readable format.
Diavgeia, NSRF and the observatories for product and fuel prices provide rich API interfaces that can easily be queried for machine-readable data in JSON format.
For E-Procurement, CMT and the municipalities of Athens and Thessaloniki there is no API available. We have therefore developed a software module which gathers the online information in an automated way, storing it in a machine-readable format.
6. General Architecture
Process model
Open economic data related to public budgeting, spending and prices are characterized by high volume, velocity, variety and veracity.
We have to build custom components under the
common logic of transforming static data to
linked open data streams.
7. Process model: Nucleus
The nucleus of our
approach is semantic
modelling, data
enrichment and
interconnections.
Data are stored raw (as harvested from the sources), as well as in RDF and JSON formats.
8. Process model : Data distribution
Enriched data are distributed through five channels:
1. Data dumps (CKAN),
2. SPARQL queries,
3. Web,
4. Social media
5. Structured inputs to
Business Intelligence (BI)
systems.
Additionally, data can be
further analysed and
exchanged with relevant
platforms (e.g. SPARQL to
R).
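Channel 2 (SPARQL) can be exercised with any plain HTTP client; a minimal sketch, assuming the public endpoint lives at the conventional Virtuoso path /sparql (the exact URL is an assumption, not stated in the deck):

```shell
# Ask the (assumed) public SPARQL endpoint for a handful of triples,
# requesting the standard JSON results serialization.
curl -G 'http://linkedeconomy.org/sparql' \
  --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 5' \
  -H 'Accept: application/sparql-results+json'
```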
9. Process model : Validation and
messenger
The validation
component runs
throughout the whole
process in order to
safeguard high data
quality by detecting
errors.
The messaging
component works as an
internal messaging and
alert system for all
components.
19. Docker MySQL
version: '2'
services:
  mysql:
    build: ./mysql-docker/5.6
    container_name: eLodDrupalmySQL
    volumes:
      - /mysql_drupal:/var/lib/mysql
    environment:
      - MYSQL_DATABASE=drupalelod
      - MYSQL_ROOT_PASSWORD=eLodmysqlpass
    restart: on-failure
Save your data !!
“build” will build the image from your directory.
Do not use the “always” restart flag in your development environment!
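The bind mount above ties the database files to a fixed host path. A Compose-managed named volume is an alternative worth sketching (the volume name elodmysql is illustrative, not part of the repo):

```yaml
version: '2'
services:
  mysql:
    build: ./mysql-docker/5.6
    volumes:
      # Named volume instead of a host path; Compose creates and owns it
      - elodmysql:/var/lib/mysql
volumes:
  elodmysql: {}
```

Named volumes survive docker-compose down and are easier to move between hosts with docker volume commands, at the cost of the data no longer sitting at a predictable host path.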
20. Docker Drupal
drupal:
  build: ./docker-drupal
  command:
    - /start.sh
  depends_on:
    - mysql
  container_name: eLodDrupal
  #image: eLodDrupal
  ports:
    - "8081:80"
  volumes:
    - "/data_drupal:/var/www/html"
  links:
    - "mysql"
  environment:
    - MYSQL_DATABASE=drupalelod
    - MYSQL_USER=root
    - MYSQL_PASSWORD=eLodmysqlpass
    - DRUPAL_ADMIN_PW=eLODDR
    - DRUPAL_ADMIN=admin
    - MYSQL_HOST=eLodDrupalmySQL
    - DRUPAL_ADMIN_EMAIL=stetsiafoulis@gmail.com
  restart: on-failure
Will start the service only after the MySQL service.
Will link the container with the MySQL container.
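Note that depends_on only orders container start-up; it does not wait for MySQL to actually accept connections. A hedged sketch of a retry loop that a start script could run first (wait_for and the mysqladmin probe are illustrative, not part of the repo):

```shell
#!/bin/sh
# Generic retry helper: run a probe command until it succeeds or we give up.
# $1 = probe command (a shell snippet), $2 = maximum number of attempts.
wait_for() {
  i=0
  while [ "$i" -lt "$2" ]; do
    if sh -c "$1" >/dev/null 2>&1; then
      return 0            # probe succeeded
    fi
    i=$((i + 1))
    sleep 1               # back off before the next attempt
  done
  return 1                # probe never succeeded
}

# Inside the Drupal container one would probe MySQL before starting, e.g.:
# wait_for "mysqladmin ping -h \"$MYSQL_HOST\" --silent" 30 && exec /start.sh
```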
22. Docker QGIS
qgisdesktop:
  #image: kartoza/qgis-desktop:2.14
  build: ./qgis-desktop/2.14
  hostname: qgis-server
  volumes:
    # Wherever you want to mount your data from
    - ./gis:/gis
    # Unix socket for X11
    - "/tmp/.X11-unix:/tmp/.X11-unix"
  links:
    - db:db
  environment:
    - DISPLAY=unix:1
  command: /usr/bin/qgis
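Because the container draws through the host's X11 socket mounted above, the host must allow the connection; a sketch, assuming an X session is running on the host:

```shell
# Allow local (non-network) clients to talk to the X server,
# then bring up the QGIS desktop container.
xhost +local:
docker-compose up -d qgisdesktop
```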
23. Build the system
Clone the repository from GitHub:
https://github.com/stetsiafoulis/eLOD
Create the directories where you are going to mount your data.
Enter docker-compose up -d and that’s it !!
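The steps above, spelled out as commands (the host directories match the bind mounts in the Compose files; this is a sketch requiring Docker, Compose and network access, not something to run blindly):

```shell
# 1. Clone the repository
git clone https://github.com/stetsiafoulis/eLOD.git
cd eLOD

# 2. Create the host directories that the Compose files bind-mount
sudo mkdir -p /mysql_drupal /data_drupal
mkdir -p ./gis

# 3. Build the images and start the whole stack in the background
docker-compose up -d

# Check that the containers are up
docker-compose ps
```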
24. Why Docker ?
o Portable
o Lightweight
o Move to different cloud infrastructures
and to Physical servers
o Run on Virtual Machines for
development and testing
o Easily Scale
o Easy Delivery and deployment
o Run Anywhere (regardless of host distro;
physical, cloud or not)
o Run Anything
26. Scaling per Source
(Diagram: one stack per data source, e.g. Di@vgeia and KHMDHS, each comprising Virtuoso, Drupal, MySQL, CouchDB, QGIS Server, QGIS Desktop and small applications.)
28. Next Steps - Swarm
(Diagram: the Virtuoso, Drupal, MySQL, CouchDB and QGIS Server containers deployed on a Swarm.)
Cluster management
Scaling
State reconciliation
Multi-host networking
Service discovery
Load balancing
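With the swarm mode that shipped in Docker 1.12, the features above map onto a few Engine CLI commands; a sketch (service name, image and replica counts are illustrative):

```shell
# Turn this Engine into a swarm manager
docker swarm init

# Run Drupal as a replicated service on the swarm
docker service create --name drupal --replicas 2 -p 8081:80 drupal

# Scale it up; the manager reconciles actual vs. desired state
docker service scale drupal=4

# Roll out an image update incrementally, with a delay between tasks
docker service update --update-delay 10s --image drupal:8 drupal
```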
29. Next Steps - Consul
Service Discovery
Health Checking
Multi Datacenter support
31. Appendix - Data Sources links
LinkedEconomy (http://linkedeconomy.org/).
linkedeconomy@gmail.com
Sources Currently used:
Transparency - DIAVGEIA: https://diavgeia.gov.gr
Central Electronic Registry of Public Procurement - E-Procurement (KHDMHS):
http://www.eprocurement.gov.gr
National Strategic Reference Framework (NSRF): https://www.espa.gr/en
Central Market of Thessaloniki (CMT): http://www.kath.gr/
e-Prices: http://www.e-prices.gr/
Fuel Prices: http://www.fuelprices.gr/
Municipality of Athens: https://www.cityofathens.gr/khe/proypologismos
Municipality of Thessaloniki:
http://www.thessaloniki.gr/portal/page/portal/DioikitikesYpiresies/GenDnsiDioikOikonYpiresion/DnsiDiafanEksipirDimoton/TmimaDiafaneias/AnoiktiDdiathesiDedomenon/DimosiefsiEktelesisProipologismou/ektelesi-proypologismou
Government of Australia: http://data.gov.au/
Editor's Notes
Open economic data related to public budgeting, spending and prices are characterized by high volume, velocity, variety and veracity.
10 virtual machines with memory and storage capacities that span from 2GB to 8GB RAM and 20GB to 100GB respectively, as well as a non-commodity (physical) server of 12 CPUs, 64GB RAM and a storage capacity of more than 4TB.
This map shows which municipalities are the most expensive for a specific product, e.g. milk, fruit or petrol.
The colour scale gives a perception of the product’s price in a municipality: the redder, the more expensive.
We also use QGIS to display on the map geo-information about supermarkets and other POIs.
The system consists of : CKAN data portal, Drupal, Virtuoso, MySQLs, QGIS server, CouchDB and many scripts of different technologies and scope.
We use such a system of apps in order to process information from different data sources.
As we mentioned before, the system is established on a cloud-based infrastructure (~okeanos).
In some cases there is a need to move the system, or back it up, to different cloud or physical infrastructures.
This is where Docker came in and helped us achieve that, almost effortlessly.
We started to dockerize the services one by one, until we decided to use the new Compose 2.
Compose creates the entire system with a single command:
docker-compose up -d
And not only that: it also creates an internal network and attaches the containers to it automatically.
Restart policies:
no
Do not automatically restart the container when it exits. This is the default.
on-failure[:max-retries]
Restart only if the container exits with a non-zero exit status. Optionally, limit the number of restart retries the Docker daemon attempts.
always
Always restart the container regardless of the exit status. When you specify always, the Docker daemon will try to restart the container indefinitely. The container will also always start on daemon startup, regardless of the current state of the container.
unless-stopped
Always restart the container regardless of the exit status, but do not start it on daemon startup if the container has been put to a stopped state before.
An ever increasing delay (double the previous delay, starting at 100 milliseconds) is added before each restart to prevent flooding the server. This means the daemon will wait for 100 ms, then 200 ms, 400, 800, 1600, and so on until either the on-failure limit is hit, or when you docker stop or docker rm -f the container.
If a container is successfully restarted (the container is started and runs for at least 10 seconds), the delay is reset to its default value of 100 ms.
You can specify the maximum amount of times Docker will try to restart the container when using the on-failure policy. The default is that Docker will try forever to restart the container. The number of (attempted) restarts for a container can be obtained via docker inspect. For example, to get the number of restarts for container “my-container”;
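The inspect invocation referred to above is, to the best of our knowledge, the one from the Docker documentation:

```shell
# Number of restart attempts for "my-container" (requires a running daemon)
docker inspect -f "{{ .RestartCount }}" my-container
```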
Cluster management integrated with Docker Engine: Use the Docker Engine CLI to create a Swarm of Docker Engines where you can deploy application services. You don’t need additional orchestration software to create or manage a Swarm.
Decentralized design: Instead of handling differentiation between node roles at deployment time, the Docker Engine handles any specialization at runtime. You can deploy both kinds of nodes, managers and workers, using the Docker Engine. This means you can build an entire Swarm from a single disk image.
Declarative service model: Docker Engine uses a declarative approach to let you define the desired state of the various services in your application stack. For example, you might describe an application comprised of a web front end service with message queueing services and a database backend.
Scaling: For each service, you can declare the number of tasks you want to run. When you scale up or down, the swarm manager automatically adapts by adding or removing tasks to maintain the desired state.
Desired state reconciliation: The swarm manager node constantly monitors the cluster state and reconciles any differences between the actual state and your expressed desired state. For example, if you set up a service to run 10 replicas of a container, and a worker machine hosting two of those replicas crashes, the manager will create two new replicas to replace the ones that crashed. The swarm manager assigns the new replicas to workers that are running and available.
Multi-host networking: You can specify an overlay network for your services. The swarm manager automatically assigns addresses to the containers on the overlay network when it initializes or updates the application.
Service discovery: Swarm manager nodes assign each service in the swarm a unique DNS name and load balances running containers. You can query every container running in the swarm through a DNS server embedded in the swarm.
Load balancing: You can expose the ports for services to an external load balancer. Internally, the swarm lets you specify how to distribute service containers between nodes.
Secure by default: Each node in the swarm enforces TLS mutual authentication and encryption to secure communications between itself and all other nodes. You have the option to use self-signed root certificates or certificates from a custom root CA.
Rolling updates: At rollout time you can apply service updates to nodes incrementally. The swarm manager lets you control the delay between service deployment to different sets of nodes. If anything goes wrong, you can roll-back a task to a previous version of the service.
What is Consul?
Consul has multiple components, but as a whole, it is a tool for discovering and configuring services in your infrastructure.
It provides several key features:
Service Discovery: Clients of Consul can provide a service, such as api or mysql, and other clients can use Consul to discover providers of a given service. Using either DNS or HTTP, applications can easily find the services they depend upon.
Health Checking: Consul clients can provide any number of health checks, either associated with a given service ("is the webserver returning 200 OK"), or with the local node ("is memory utilization below 90%"). This information can be used by an operator to monitor cluster health, and it is used by the service discovery components to route traffic away from unhealthy hosts.
Key/Value Store: Applications can make use of Consul's hierarchical key/value store for any number of purposes, including dynamic configuration, feature flagging, coordination, leader election, and more. The simple HTTP API makes it easy to use.
Multi Datacenter: Consul supports multiple datacenters out of the box. This means users of Consul do not have to worry about building additional layers of abstraction to grow to multiple regions.
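The two interfaces can be tried against a throwaway local agent; a hedged sketch (the service name mysql is illustrative, ports 8500/8600 are Consul's defaults):

```shell
# Start a single-node development agent in the background
consul agent -dev &

# Service discovery over DNS: Consul's DNS interface listens on 8600
dig @127.0.0.1 -p 8600 mysql.service.consul SRV

# Key/value store over the HTTP API: write a key, then read it back
curl -X PUT --data 'bar' http://localhost:8500/v1/kv/foo
curl http://localhost:8500/v1/kv/foo?raw
```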