Analytical platforms for PoCs and evaluation can be built in the cloud in an hour - with ready-made setup scripts. But if you put the services together freely, it gets more difficult. The open-source platform-in-a-box "Platys" (https://github.com/TrivadisPF/platys) shows that it is easier for test and PoC environments. In addition to possible uses and examples, we explain services and "just briefly" set up a data lake with a database, event broker, stream processing, blob store, SQL access and data science notebook.
2. Guido Schmutz
@gschmutz https://guidoschmutz.wordpress.com/
• Working at Trivadis for more than 24 years
• Platform Architect
• Big Data, Fast Data, Data Lake, Kafka, NoSQL, RDBMS,
SOA, Integration
• Oracle Groundbreaker Ambassador & Oracle ACE Director
4. Reasons for platys
For Labs, PoC and PoV projects as well as trainings we wanted a platform, which
1. is easy to deploy
2. supports modular, extensible and configurable set of services
3. supports adding new services quickly
4. requires no infrastructure specialist for provisioning
5. is lightweight, low footprint (infrastructure requirements)
6. runs on-premises and in the cloud
7. can later be replaced by production ready platform with minimal effort
5. Platform for Lab, PoC and PoV projects
Bare Metal VM Cloud PaaS Docker & Docker Compose
1 Easy Deployment + ++ +++ +++
2 Modular, extensible ++ ++ ++ +++
3 Add new services quickly + + ++ (1) +++
4 No infrastructre specialist needed + ++ +++ +++
5 Lightweight footprint + ++ + +++
6 Simple, lightweight transportation - + ++ (2) +++
7 On prem & cloud + +++ ++ +++ (3)
1) If exists as PaaS
2) Within same cloud provider
3) Runs anywhere Docker & Docker Compose runs
6. What is Docker, what is Docker Compose?
Docker – running a single service
Docker Compose – running a set of services belonging together
docker run -d –p 18630:18630 –e SDC_CONF_MONITOR_MEMORY:true -v sdc-data:/data
--name streamsets-dc streamsets/datacollector:3.15.0
docker-compose up -d streamsets-1:
image: streamsets/datacollector:3.15.0
ports:
- 18630:18630
environment:
SDC_CONF_MONITOR_MEMORY: 'true'
SDC_CONF_PIPELINE_MAX_RUNNERS_COUNT: 50
volumes:
- ./sdc-data:/data
kafka-1:
image: confluentinc/cp-enterprise-kafka:5.5.0
depends_on:
- zookeeper-1
ports:
- 9092:9092
...
7. platys – Trivadis Platform in a Box
• open-source toolset, developed by the Trivadis Platform Factory
• Hosted on GitHub: https://github.com/TrivadisPF/platys & https://github.com/TrivadisPF/platys-
modern-data-platform
• Helps provisioning Modern Data Platforms container-based on Docker and Docker Compose
• Its main use is for
• small-scale Data Lab projects
• Proof-of-Concepts (PoC) or Proof-of-value (PoV) projects
• Trivadis Trainings (internal, external, on-site)
• Development environments (each developer can have its own setup)
• Platys is a LABORATORY, not a production system - so it offers primarily functional aspects for
learning, developing and testing. But no (production-ready) security, HA, backup, high performance
etc.
9. Trivadis platys – How to install it?
For example on a freshly installed Ubuntu Linux machine, perform these steps:
# Install Docker
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs)
stable edge"
apt-get install -y docker-ce
# Install Docker Compose
curl -L "https://github.com/docker/compose/releases/download/1.25.3/docker-compose-$(uname -s)-
$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
# Install Platys
curl -L
"https://github.com/TrivadisPF/platys/releases/download/2.4.0/platys_2.4.0_linux_x86_64.tar.gz" -o
/tmp/platys.tar.gz
tar zvxf /tmp/platys.tar.gz
mv platys /usr/local/bin/
chown root:root /usr/local/bin/platys
rm platys.tar.gz
10. Trivadis platys – Platform in a Box
Steps for working with platys
1. Init a platform based on a given platform
stack
2. Edit the config.yml files to enable services
needed
3. Generate the docker-compose platform
4. Start the platform
11. Platys – How does it work? (I)
• Let’s see Platys in Action by
generating an Analytics Platform with
the services in green
12. Platys – How does it work? (II)
1. Create a folder as your platform working directory
2. Initialize the platform using the platys init command
3. A config.yml file is created locally, holding all the possible options of the modern-data-
platform 1.12.0 stack. See here for a documentation of the various options.
platys init -n platys-demo-platform
-s trivadis/platys-modern-data-platform -w 1.12.0
mkdir platys-demo
dd platys-demo
13. Platys – How does it work? (III)
4. Use an editor to set the needed services to true.
# Default values for the generator
# this file can be used as a template for a custom configuration
# or to know about the different variables available for the generator
platys:
platform-name: ‘platys-demo'
stack-image-name: 'trivadis/platys-modern-data-platform'
stack-image-version: ‘1.12.0’
structure: ‘flat’
# ===== Apache Zookeeper ========
KAFKA_enable: true
# one of enterprise, community
KAFKA_edition: 'community'
KAFKA_volume_map_data: false
KAFKA_broker_nodes: 3
KAFKA_delete_topic_enable: false
KAFKA_auto_create_topics_enable: false
14. Platys – How does it work? (IV)
5. Run platys gen to generate your platform
6. All the necessary artefacts for the Docker Compose platform have been generated and are ready to
be started
7. Start the platform using docker-compose up
platys gen
docker-compose up -d
15. Summary
• Platys makes it easy to provision analytics platforms in a very agile manner
• Minimal infrastructure requirements (just a “beefy” Linux machine with Docker
& Docker Compose is good enough)
• To be used for PoC, PoV projects and demos, trainings …
• Constantly enhanced by new services with all docker images available publicly
on Docker Hub (except Oracle RDBMS)
• Solutions build on top of the platform can later be migrated to Cloud managed
services or to a Kubernetes-orchestrated platform with minimal efforts