The document discusses the Spark Operator, which allows deploying, managing, and monitoring Spark clusters on Kubernetes. It describes how the operator extends Kubernetes by defining custom resources and reacting to events from those resources, such as SparkCluster, SparkApplication, and SparkHistoryServer. The operator takes care of common tasks to simplify running Spark on Kubernetes and hides the complexity through an abstract operator library.
6. Operator Pattern
• Extends Kubernetes
• Resources and Controllers
• Custom Resource Definitions (CRD)
• Reacts to events when a resource is created, updated, or deleted (CRUD)
• Sometimes referred to as custom controllers
#UnifiedDataAnalytics #SparkAISummit
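The controller half of the pattern can be sketched as a small event handler that keeps the desired state in sync with CRUD events on a custom resource. This is a minimal illustration, not the operator's real code; the `Event` and `SparkClusterController` names are made up for the example.

```python
# Sketch of the operator pattern: a controller reacts to CRUD events
# on a custom resource (here, a SparkCluster) and updates its view of
# the desired state. A real operator would watch the Kubernetes API
# server and reconcile actual cluster objects.
from dataclasses import dataclass, field

@dataclass
class Event:
    action: str               # "ADDED", "MODIFIED", or "DELETED"
    resource_name: str
    spec: dict = field(default_factory=dict)

class SparkClusterController:
    def __init__(self):
        self.clusters = {}    # desired state, keyed by resource name

    def handle(self, event: Event):
        if event.action in ("ADDED", "MODIFIED"):
            self.clusters[event.resource_name] = event.spec
        elif event.action == "DELETED":
            self.clusters.pop(event.resource_name, None)

ctrl = SparkClusterController()
ctrl.handle(Event("ADDED", "my-cluster", {"workers": 2}))
ctrl.handle(Event("MODIFIED", "my-cluster", {"workers": 5}))
```

In the real pattern the reconcile step would also compare desired state against what is actually running and create or delete pods to close the gap.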
11. Comparison
An operator can be seen as merely a deployment mechanism, but it can do much more than the alternatives:
• Kubernetes manifests
• Helm Chart
• Ansible
• Kustomize
• Ksonnet
13. Spark Operator
• Started as a toy project
• Adopted by the AI CoE project OpenDataHub.io
• Compatible with the Spark operator from Google, to avoid vendor lock-in
• Also available on operatorhub.io, as a Helm chart, or via an Ansible role
19. Fabric8 Kubernetes client
• Fluent API
• Type safety
Takes the credentials from:
• the kube config file
• the service account token & mounted CA cert
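The credential lookup order above can be sketched as a fallback chain: try the kube config file first, then the in-cluster service account token and mounted CA certificate. Fabric8 itself is a Java library; this Python sketch only illustrates the resolution order, and the function name and return shape are assumptions.

```python
# Sketch of the credential resolution order described in the slides:
# kube config file first, then the mounted service account credentials.
import os

# Standard in-cluster mount point for service account credentials.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"

def resolve_credentials(kubeconfig_path=None, sa_dir=SA_DIR):
    # 1. Prefer an explicit or default kube config file.
    path = kubeconfig_path or os.path.expanduser("~/.kube/config")
    if os.path.isfile(path):
        return {"source": "kubeconfig", "path": path}
    # 2. Fall back to the service account token and mounted CA cert.
    token = os.path.join(sa_dir, "token")
    ca = os.path.join(sa_dir, "ca.crt")
    if os.path.isfile(token) and os.path.isfile(ca):
        return {"source": "serviceaccount", "token": token, "ca_cert": ca}
    raise RuntimeError("no Kubernetes credentials found")
```

Outside the cluster the first branch wins; inside a pod, where no kube config exists, the mounted service account credentials are used.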
20. Abstract Operator Library
• Automates the common tasks
• The user only has to extend a class and override a couple of methods
• Supports JSON schema as the representation of the configuration
• Both CRDs and ConfigMaps (CMs) are supported
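The "extend a class and override a couple of methods" idea can be sketched as follows. The real abstract-operator library is written in Java; the class and method names here (`AbstractOperator`, `on_add`, `on_delete`) are illustrative placeholders for the hooks a user would override.

```python
# Sketch of the abstract operator library's contract: the base class
# owns the common plumbing (watching resources, dispatching events),
# and the user subclass supplies only the resource-specific callbacks.
from abc import ABC, abstractmethod

class AbstractOperator(ABC):
    def dispatch(self, action, spec):
        # In the real library these events come from the Kubernetes
        # API server; here we just route them to the user's hooks.
        if action == "ADDED":
            self.on_add(spec)
        elif action == "DELETED":
            self.on_delete(spec)

    @abstractmethod
    def on_add(self, spec): ...

    @abstractmethod
    def on_delete(self, spec): ...

class SparkClusterOperator(AbstractOperator):
    """User code: only the two callbacks are implemented."""
    def __init__(self):
        self.running = set()

    def on_add(self, spec):
        self.running.add(spec["name"])

    def on_delete(self, spec):
        self.running.discard(spec["name"])
```

Everything else (watch loops, retries, deserializing the configuration against a JSON schema) stays in the library, which is what keeps the user-facing surface down to a couple of methods.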
22. Tooling
• Soit – a Python CLI that verifies whether a container image is “operator compliant”
• Ansible role – also supports deploying Prometheus together with the operator
• Oshinko-temaki – a CLI that produces valid YAML custom resources for the operator
All the tools are listed in the readme file
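To make the last point concrete, a generator like oshinko-temaki essentially emits a custom-resource manifest that the operator understands. The exact schema is defined by the tool and the operator; the `apiVersion`, `kind`, and field names below are assumptions for illustration only.

```python
# Illustrative sketch of generating a SparkCluster custom resource,
# similar in spirit to what oshinko-temaki produces. The field names
# and apiVersion are assumed, not taken from the real schema.
import json

def spark_cluster_manifest(name, workers=2):
    return {
        "apiVersion": "radanalytics.io/v1",   # assumed group/version
        "kind": "SparkCluster",
        "metadata": {"name": name},
        "spec": {"worker": {"instances": workers}},
    }

print(json.dumps(spark_cluster_manifest("my-cluster", 3), indent=2))
```

Applying such a manifest with `kubectl apply -f` is what triggers the operator's event handling described earlier.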