SlideShare a Scribd company logo
Running secured Spark job in
Kubernetes compute cluster and
integrating with Kerberized HDFS
Joy Chakraborty
June 21, 2018
Email: joychak1@[yahoo/gmail].com]]
• To run Spark job on Elastic
Compute platform (Kubernetes)
accessing data stored in HDFS
Essentially what we will be doing -
3
Who am I ???
I learn and apply ….
- Design and write software for living
- for last 18 years …
4
Disclaimer
We will be just scratching the Surface !!!
Agenda
5
Kubernetes as Elastic Compute1
HDFS as secured distributed storage2
Configuring Spark to run in Kubernetes & accessing HDFS3
Demo - Build & setup the Kubernetes/Spark/HDFS environment4
6
Why Kubernetes?
7
Compute Requirements
8
Compute Requirements
• Support Elasticity
• Flexibility and variability in work load
• Process massive amount of data in parallel
• Support Reliability & Multitenancy
• Ease in Accessibility without compromising
Security
9
Distributed
Containerized
10
• Container Management
• Scheduling
• Resource Management
Distributed Containerized System
11
• Container Management
• Scheduling
• Resource Management
Kubernetes
Distributed Containerized System
12
what?
13
Kubernetes
An open-source system for automating deployment, scaling,
and management of containerized applications.
• Portable
• Extensible: modular, plug-able, compos-able
• Self-healing: auto-placement, auto-restart, auto-
replication, auto-scaling
• Latest Release: 1.8.13
• Community driven completely
• Written in GoLang
14
Kubernetes High Level Architecture
15
Kubernetes High Level Architecture
• The Kubernetes Master is a collection
of three processes that run on a single
node in your cluster, which is
designated as the master node.
• kube-apiserver
• kube-controller-manager
• kube-scheduler
• Each individual non-master node in
your cluster runs two processes –
• kubelet, which communicates with
the Kubernetes Master
• kube-proxy, a network proxy which
reflects Kubernetes networking
services on each node.
Kube-apiserver
Kube-Controller-manager
16
Kubernetes - basic
components
17
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
18
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
• ReplicaSet
• Deployment
• Job
19
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
• ReplicaSet
• Deployment
• Job
• RBAC
20
Kubernetes Object/Resource (out of the box)
• Namespace
• Pod (a basic unit of work)
• Service
• Volume
• ReplicaSet
• Deployment
• Job
• RBAC
*** Custom Resource
21
Kubernetes Pod
• Pod is the basic building block of Kubernetes–
• the smallest and simplest unit in the Kubernetes object model
that you create or deploy
• represents a running process on your cluster.
• encapsulates an application container (or, in some cases,
multiple containers), storage resources, a unique network IP,
and options that govern how the container(s) should run
• Docker is the most common container runtime used in a Pod,
but Pods support other container runtimes as well
22
How to Interact with Kubernetes
23
How to Interact with Kubernetes
• REST API
• Curl or browser
• Command Line Interface
• Kubectl
• Kube Config
• Created during cluster
creation
• Programmatic
• GoLang, Python, Scala
24
How Secured
(kerberized) HDFS
works?
25
KDC
AS TGS
Active Directory
Client Machine
Client Application
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
Secured/Kerberized HDFS
26
KDC
AS TGS
Active Directory
Client Machine
Client Application
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
2. Client-app requests Ticket
3. KDC sends TGT
1. Service
Principles/Keys
6. Sends Service Ticket and requests for Authentication
5. User Authenticated
using Service Principle/key
Retrieves
User roles/permissions
Secured/Kerberized (keytab) HDFS
27
KDC
AS TGS
Active Directory
Client Machine
Client Application
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
2. Client-app requests Ticket
3. KDC sends TGT
1. Service
Principles/Keys
6. Sends Service Ticket and requests for Delegation Token
7. User Authenticated
using Service Principle/key
Retrieves
User roles/permissions
Secured/Kerberized (delegation token) HDFS
8. Name node sends delegation token
28
Spark + Kubernetes +
HDFS
29
Spark Running in Kubernetes
Kubernetes ClusterKube Master
30
Spark Running in Kubernetes
Kubernetes Cluster
Spark-Submit
(--master k8s://…)
Kube Master
31
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Spark-Submit
Spark
Master
Kube Master
32
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
33
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
34
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
Kube Secret
Keytab
35
Spark Running in Kubernetes
Kubernetes Cluster
Worker
Pod
Worker
Pod
Worker
Pod
Worker
Pod
Spark Job
Driver
Pod
Docker Hub
Spark Docker Images
Spark-Submit
Spark
Master
Kube Master
HDFS
Namenode
Data
Node
Data
Node
Data
Node
Data
Node
Kube Secret
Keytab
36
Lets build the
Kubernetes/Spark/HDFS
environment !!!
37
What to use
• To install Kubernetes
• vagrant-kubeadm (https://github.com/c9s/vagrant-kubeadm)
• Run local Docker Hub (https://docs.docker.com/registry/deploying )
• Run Docker registry container and map localhost to
some ip (10.0.2.2)
• Build Spark Docker image from Spark download Dockerfile
under Kubernetes directory (spark-2.3.0-bin-hadoop2.7/ kubernetes)
• Publish the image to Docker hub
• Run Spark-submit
38
Demo !!!
Email: joychak1@[yahoo/gmail].com]
Thank You
39

More Related Content

What's hot

Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
DataWorks Summit
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
DataWorks Summit
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
DataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
DataWorks Summit
 
Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?
DataWorks Summit
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Databricks
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
DataWorks Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
DataWorks Summit
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to Core
DataWorks Summit
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Dinesh Chitlangia
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
DataWorks Summit
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
DataWorks Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
DataWorks Summit
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Spark Summit
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
DataWorks Summit
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
DataWorks Summit/Hadoop Summit
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 

What's hot (20)

Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
Big Data Analytics from Edge to Core
Big Data Analytics from Edge to CoreBig Data Analytics from Edge to Core
Big Data Analytics from Edge to Core
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by T...
 
CBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFSCBlocks - Posix compliant files systems for HDFS
CBlocks - Posix compliant files systems for HDFS
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 

Similar to Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS

01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
VMUG IT
 
LISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningLISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground Running
Chris McEniry
 
Container orchestration k8s azure kubernetes services
Container orchestration  k8s azure kubernetes servicesContainer orchestration  k8s azure kubernetes services
Container orchestration k8s azure kubernetes services
Rajesh Kolla
 
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBuilding Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Bitnami
 
Nodeless and serverless kubernetes
Nodeless and serverless kubernetesNodeless and serverless kubernetes
Nodeless and serverless kubernetes
Nills Franssens
 
Kubernetes Intro @HaufeDev
Kubernetes Intro @HaufeDev Kubernetes Intro @HaufeDev
Kubernetes Intro @HaufeDev
Haufe-Lexware GmbH & Co KG
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
Martin Danielsson
 
Kubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-HassanKubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-Hassan
Syed Murtaza Hassan
 
Fabio rapposelli pks-vmug
Fabio rapposelli   pks-vmugFabio rapposelli   pks-vmug
Fabio rapposelli pks-vmug
VMUG IT
 
Container orchestration and microservices world
Container orchestration and microservices worldContainer orchestration and microservices world
Container orchestration and microservices world
Karol Chrapek
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
DataWorks Summit
 
Kubermatic.pdf
Kubermatic.pdfKubermatic.pdf
Kubermatic.pdf
LibbySchulze
 
Kubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdfKubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdf
LibbySchulze
 
Pro2516 10 things about oracle and k8s.pptx-final
Pro2516   10 things about oracle and k8s.pptx-finalPro2516   10 things about oracle and k8s.pptx-final
Pro2516 10 things about oracle and k8s.pptx-final
Michel Schildmeijer
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
EDB
 
RedisConf18 - Redis Enterprise on Cloud Native Platforms
RedisConf18 - Redis Enterprise on Cloud  Native  Platforms RedisConf18 - Redis Enterprise on Cloud  Native  Platforms
RedisConf18 - Redis Enterprise on Cloud Native Platforms
Redis Labs
 
Introduction to Virtual Kubelet
Introduction to Virtual KubeletIntroduction to Virtual Kubelet
Introduction to Virtual Kubelet
Mitchell Pronschinske
 
Big data and Kubernetes
Big data and KubernetesBig data and Kubernetes
Big data and Kubernetes
Anirudh Ramanathan
 
Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.
Jooho Lee
 
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless  - Serverless Summit 2017 - Krishna KumarKubernetes for Serverless  - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
CodeOps Technologies LLP
 

Similar to Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS (20)

01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
 
LISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground RunningLISA2017 Kubernetes: Hit the Ground Running
LISA2017 Kubernetes: Hit the Ground Running
 
Container orchestration k8s azure kubernetes services
Container orchestration  k8s azure kubernetes servicesContainer orchestration  k8s azure kubernetes services
Container orchestration k8s azure kubernetes services
 
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and KubelessBuilding Cloud-Native Applications with Kubernetes, Helm and Kubeless
Building Cloud-Native Applications with Kubernetes, Helm and Kubeless
 
Nodeless and serverless kubernetes
Nodeless and serverless kubernetesNodeless and serverless kubernetes
Nodeless and serverless kubernetes
 
Kubernetes Intro @HaufeDev
Kubernetes Intro @HaufeDev Kubernetes Intro @HaufeDev
Kubernetes Intro @HaufeDev
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
Kubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-HassanKubernetes-Presentation-Syed-Murtaza-Hassan
Kubernetes-Presentation-Syed-Murtaza-Hassan
 
Fabio rapposelli pks-vmug
Fabio rapposelli   pks-vmugFabio rapposelli   pks-vmug
Fabio rapposelli pks-vmug
 
Container orchestration and microservices world
Container orchestration and microservices worldContainer orchestration and microservices world
Container orchestration and microservices world
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
 
Kubermatic.pdf
Kubermatic.pdfKubermatic.pdf
Kubermatic.pdf
 
Kubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdfKubermatic CNCF Webinar - start.kubermatic.pdf
Kubermatic CNCF Webinar - start.kubermatic.pdf
 
Pro2516 10 things about oracle and k8s.pptx-final
Pro2516   10 things about oracle and k8s.pptx-finalPro2516   10 things about oracle and k8s.pptx-final
Pro2516 10 things about oracle and k8s.pptx-final
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
 
RedisConf18 - Redis Enterprise on Cloud Native Platforms
RedisConf18 - Redis Enterprise on Cloud  Native  Platforms RedisConf18 - Redis Enterprise on Cloud  Native  Platforms
RedisConf18 - Redis Enterprise on Cloud Native Platforms
 
Introduction to Virtual Kubelet
Introduction to Virtual KubeletIntroduction to Virtual Kubelet
Introduction to Virtual Kubelet
 
Big data and Kubernetes
Big data and KubernetesBig data and Kubernetes
Big data and Kubernetes
 
Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.Docker, Atomic Host and Kubernetes.
Docker, Atomic Host and Kubernetes.
 
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless  - Serverless Summit 2017 - Krishna KumarKubernetes for Serverless  - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
Vlad Stirbu
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 

Recently uploaded (20)

SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 

Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS

  • 1. Running secured Spark job in Kubernetes compute cluster and integrating with Kerberized HDFS Joy Chakraborty June 21, 2018 Email: joychak1@[yahoo/gmail].com]]
  • 2. • To run Spark job on Elastic Compute platform (Kubernetes) accessing data stored in HDFS Essentially what we will be doing -
  • 3. 3 Who am I ??? I learn and apply …. - Design and write software for living - for last 18 years …
  • 4. 4 Disclaimer We will be just scratching the Surface !!!
  • 5. Agenda 5 Kubernetes as Elastic Compute1 HDFS as secured distributed storage2 Configuring Spark to run in Kubernetes & accessing HDFS3 Demo - Build & setup the Kubernetes/Spark/HDFS environment4
  • 8. 8 Compute Requirements • Support Elasticity • Flexibility and variability in work load • Process massive amount of data in parallel • Support Reliability & Multitenancy • Ease in Accessibility without compromising Security
  • 10. 10 • Container Management • Scheduling • Resource Management Distributed Containerized System
  • 11. 11 • Container Management • Scheduling • Resource Management Kubernetes Distributed Containerized System
  • 13. 13 Kubernetes An open-source system for automating deployment, scaling, and management of containerized applications. • Portable • Extensible: modular, plug-able, compos-able • Self-healing: auto-placement, auto-restart, auto- replication, auto-scaling • Latest Release: 1.8.13 • Community driven completely • Written in GoLang
  • 14. 14 Kubernetes High Level Architecture
  • 15. 15 Kubernetes High Level Architecture • The Kubernetes Master is a collection of three processes that run on a single node in your cluster, which is designated as the master node. • kube-apiserver • kube-controller-manager • kube-scheduler • Each individual non-master node in your cluster runs two processes – • kubelet, which communicates with the Kubernetes Master • kube-proxy, a network proxy which reflects Kubernetes networking services on each node. Kube-apiserver Kube-Controller-manager
  • 17. 17 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume
  • 18. 18 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume • ReplicaSet • Deployment • Job
  • 19. 19 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume • ReplicaSet • Deployment • Job • RBAC
  • 20. 20 Kubernetes Object/Resource (out of the box) • Namespace • Pod (a basic unit of work) • Service • Volume • ReplicaSet • Deployment • Job • RBAC *** Custom Resource
  • 21. 21 Kubernetes Pod • Pod is the basic building block of Kubernetes– • the smallest and simplest unit in the Kubernetes object model that you create or deploy • represents a running process on your cluster. • encapsulates an application container (or, in some cases, multiple containers), storage resources, a unique network IP, and options that govern how the container(s) should run • Docker is the most common container runtime used in a Pod, but Pods support other container runtimes as well
  • 22. 22 How to Interact with Kubernetes
  • 23. 23 How to Interact with Kubernetes • REST API • Curl or browser • Command Line Interface • Kubectl • Kube Config • Created during cluster creation • Programmatic • GoLang, Python, Scala
  • 25. 25 KDC AS TGS Active Directory Client Machine Client Application HDFS Namenode Data Node Data Node Data Node Data Node Secured/Kerberized HDFS
  • 26. 26 KDC AS TGS Active Directory Client Machine Client Application HDFS Namenode Data Node Data Node Data Node Data Node 2. Client-app requests Ticket 3. KDC sends TGT 1. Service Principles/Keys 6. Sends Service Ticket and requests for Authentication 5. User Authenticated using Service Principle/key Retrieves User roles/permissions Secured/Kerberized (keytab) HDFS
  • 27. 27 KDC AS TGS Active Directory Client Machine Client Application HDFS Namenode Data Node Data Node Data Node Data Node 2. Client-app requests Ticket 3. KDC sends TGT 1. Service Principles/Keys 6. Sends Service Ticket and requests for Delegation Token 7. User Authenticated using Service Principle/key Retrieves User roles/permissions Secured/Kerberized (delegation token) HDFS 8. Name node sends delegation token
  • 29. 29 Spark Running in Kubernetes Kubernetes ClusterKube Master
  • 30. 30 Spark Running in Kubernetes Kubernetes Cluster Spark-Submit (--master k8s://…) Kube Master
  • 31. 31 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Spark-Submit Spark Master Kube Master
  • 32. 32 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master
  • 33. 33 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master HDFS Namenode Data Node Data Node Data Node Data Node
  • 34. 34 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master HDFS Namenode Data Node Data Node Data Node Data Node Kube Secret Keytab
  • 35. 35 Spark Running in Kubernetes Kubernetes Cluster Worker Pod Worker Pod Worker Pod Worker Pod Spark Job Driver Pod Docker Hub Spark Docker Images Spark-Submit Spark Master Kube Master HDFS Namenode Data Node Data Node Data Node Data Node Kube Secret Keytab
  • 37. 37 What to use • To install Kubernetes • vagrant-kubeadm (https://github.com/c9s/vagrant-kubeadm) • Run local Docker Hub (https://docs.docker.com/registry/deploying ) • Run Docker registry container and map localhost to some ip (10.0.2.2) • Build Spark Docker image from Spark download Dockerfile under Kubernetes directory (spark-2.3.0-bin-hadoop2.7/ kubernetes) • Publish the image to Docker hub • Run Spark-submit

Editor's Notes

  1. Artificial intelligence is intelligence exhibited by machines. In computer science, the field of AI defines itself as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of success at some goal