OpenShift as a cloud for Data Science
Yegor Maksymchuk, SoftServe, Ukraine
Agenda
• Kubernetes
• OpenShift
• Kubernetes vs OpenShift
• Apache Spark
• Radanalytics and OSHINKO
whoami
Yegor Maksymchuk
Software engineer, SoftServe Ukraine
Telegram: @QAStudy.online
GitHub: YegorMaksymchuk
LinkedIn: ymaksymchuk
Problems
Integrating Apache Spark with OpenShift.
Usability at the UI and API level: it should be easy to use.
OSHINKO
Data Science in the Cloud
Kubernetes
Kubernetes: Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-demo
      labels:
        name: pod-demo
    spec:
      containers:
        - name: pod-demo
          image: yemax/pod-demo:1
          ports:
            - containerPort: 8081
Kubernetes: Namespace
    {
      "kind": "Namespace",
      "apiVersion": "v1",
      "metadata": {
        "name": "development",
        "labels": {
          "name": "development"
        }
      }
    }
Kubernetes: Replica Sets
K8s: Deployment
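A minimal sketch of a Deployment that wraps the pod-demo Pod from the earlier slide. The replica count is an assumption chosen for illustration; the image, label, and port are reused from the Pod example. The Deployment creates and manages a ReplicaSet that keeps the requested number of Pods running.

```yaml
# Sketch only: "replicas: 3" is an assumed value, not from the talk.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-demo
spec:
  replicas: 3                 # desired number of identical Pods
  selector:
    matchLabels:
      name: pod-demo          # must match the Pod template labels below
  template:
    metadata:
      labels:
        name: pod-demo
    spec:
      containers:
        - name: pod-demo
          image: yemax/pod-demo:1
          ports:
            - containerPort: 8081
```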
Kubernetes: Ingress
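A minimal sketch of an Ingress that exposes the pod-demo application over HTTP. It assumes a Service named pod-demo on port 8081 already exists and that an ingress controller is installed in the cluster; the host name is a placeholder.

```yaml
# Sketch only: the host and the backing Service "pod-demo" are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pod-demo
spec:
  rules:
    - host: pod-demo.example.com   # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pod-demo     # assumed Service in front of the Pods
                port:
                  number: 8081
```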
K8s: Architecture
OpenShift
OpenShift: Deployment
OpenShift: S2I
s2i-lighttpd/
● Dockerfile – a standard Dockerfile where we’ll define the builder image
● Makefile – a helper script for building and testing the builder image
● test/
○ run – test script, testing if the builder image works correctly
○ test-app/ – directory for your test application
● .s2i/bin/
○ assemble – script responsible for building the application
○ run – script responsible for running the application
○ save-artifacts – script responsible for incremental builds, covered in a future article
○ usage – script responsible for printing the usage of the builder image
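Once a builder image like the one above is available in the cluster, an S2I build can be described with a BuildConfig. This is a rough sketch: the image stream names, the application name, and the Git URL are all hypothetical placeholders.

```yaml
# Sketch only: every name and the Git URI below are hypothetical.
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: lighttpd-app
spec:
  source:
    type: Git
    git:
      uri: https://example.com/your/app.git   # placeholder repository
  strategy:
    type: Source                # S2I: runs the builder's assemble script
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: s2i-lighttpd:latest             # assumed builder image stream
  output:
    to:
      kind: ImageStreamTag
      name: lighttpd-app:latest               # resulting application image
```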
OpenShift vs Kubernetes
K8s:
Orchestration tool
Ingress based on NGINX
Namespaces are not “secure”
OpenShift:
Platform as a Service
Routes based on HAProxy
Namespaces are “secure” and more understandable.
S2I
Builds new images after new source is pushed.
Pool of prepared images
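For the Routes item in the comparison above, a minimal sketch of an OpenShift Route, the counterpart of a Kubernetes Ingress. It assumes a Service named pod-demo on port 8081; the host name is a placeholder.

```yaml
# Sketch only: the host and the backing Service "pod-demo" are assumptions.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: pod-demo
spec:
  host: pod-demo.example.com   # placeholder host
  to:
    kind: Service
    name: pod-demo             # assumed Service in front of the Pods
  port:
    targetPort: 8081
```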
Data Science uses Spark
Data Science in the Cloud
Apache Spark
Spark on OpenShift
OSHINKO: S2I
OSHINKO: Spark integrator
DEMO
1. oc cluster up
2. oc new-project devops-stage-demo
3. oc create -f https://radanalytics.io/resources.yaml
4. oc create -f https://radanalytics.io/assets/zeppelin-example/zeppelin-openshift.yaml
5. oc new-app oshinko-webui
6. oshinko create devops-spark-cluster
7. oshinko get devops-spark-cluster
8. oc new-app --template=$namespace/apache-zeppelin-openshift \
   --param=APPLICATION_NAME=apache-zeppelin \
   --param=GIT_URI=https://github.com/rimolive/zeppelin-notebooks.git \
   --param=ZEPPELIN_INTERPRETERS=md
Questions?
yegor maksymchuk - open shift as a cloud for data science
