© 2017 Google Inc. All rights reserved. Google
and the Google logo are trademarks of Google Inc.
All other company and product names may be
trademarks of the respective companies with
which they are associated.
An Early Evaluation of Running
Spark on Kubernetes
Christopher Crosbie MPH, MS
Product Manager, Open Data Analytics
Customers Using Google Cloud
Google has 20+ years of experience solving data problems
[Figure: timeline from 2002 to 2016 with rows for Google Research, Open Source, and Google Cloud Products. Labels shown: GFS, MapReduce, BigTable, Dremel, FlumeJava, Millwheel, TensorFlow, Apache Beam, BigQuery, Pub/Sub, Dataflow, Bigtable, ML, Dataproc]
GCP Open Data Ecosystem
Cloud Composer
(Apache Airflow)
DataFlow
(Apache Beam)
Cloud Dataproc
(Apache Spark/Hadoop)
What is Cloud Dataproc?
Rapid cluster creation
Familiar open source tools
Google Cloud Platform’s
fully-managed Apache Spark
and Apache Hadoop service
Ephemeral clusters on demand
Customizable machines
Tightly Integrated
with other Google Cloud
Platform services
Cloud Dataproc in 2018…
More than 30 features launched
Fast
- Clusters from YAML
- Cloud Storage connector optimizations
- Weekly updates
- OSS performance tuning
Easy
- Custom images
- Stackdriver monitoring
- Workflow templates
- Workflow parameters
- Optional components
Cost-effective
- Autoscaling
- Granular IAM
- CMEK support for multiple products
- Graceful decommissioning
- Global expansion to 6 new regions
Cloud Dataproc Internals
[Figure: Cloud Dataproc architecture. Labels shown: User Land, Google Borg, Apache Hadoop YARN, Job Dispatcher, Spanner, Dataproc Agent, TaskService, Frontend, CLI, GUI, API, GFE (dataproc.googleapis.com), GFE (dataproc.control.googleapis.com), JobService, Task Dispatcher, GCS, $ hadoop jar ...]
What is Kubernetes?
Kubernetes (a.k.a "k8s")
● An open source project
● Framework for container
management and automation
● Based on Google's systems
● Developing rapidly - complex
○ Only covering the basics in this class
○ Only covering GKE in this class
● More information
○ kubernetes.io (also, k8s.io)
The Kubernetes Vision
UI
API
Container
Cluster
How Kubernetes accomplishes this
[Figure: users, master, and nodes. The master holds the apiserver, scheduler, controllers, and etcd; each node runs a kubelet]
Container Cluster
Container Cluster - a group of GCE
instances running Kubernetes
[Figure: a master and three nodes]
Each node runs:
● Docker runtime
● Kubelet agent
○ Manages scheduled
Docker containers
● Network proxy
Kubernetes master endpoint
Kubernetes Master
node node node
● Endpoint -- doorway to the cluster
● Kubernetes API server
○ Services REST requests
○ Schedules pod creation/deletion on nodes
○ Syncs pod info with service info
● Cloud Services integration
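To make "services REST requests" concrete, here is a minimal sketch that builds the URL a client would call to list the Pods in one namespace. The /api/v1/namespaces/<namespace>/pods path belongs to the core Kubernetes API; the helper name pods_url and the master address are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote


def pods_url(master: str, namespace: str = "default") -> str:
    """Return the REST URL for listing Pods in one namespace."""
    # Core-API resources live under /api/v1; namespaced resources sit
    # below /namespaces/<name>/.
    base = master.rstrip("/")
    return f"{base}/api/v1/namespaces/{quote(namespace, safe='')}/pods"


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-range placeholder, not a real cluster.
    print(pods_url("https://203.0.113.10", "spark-jobs"))
```

A GET on that URL, with suitable credentials, is roughly what kubectl get pods does on your behalf.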
Pods
Pod
Containers
10.1.0.100
● A Pod is the K8s abstraction
that represents an
application
● It holds one or more
containers
● The containers in the
pod share:
○ A single IP address
○ A single namespace
IP
nginx
Spark
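As a rough illustration of a Pod that holds two containers (nginx and Spark, as in the figure), the sketch below builds a Pod manifest as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. The Pod name, the helper make_pod, and the Spark image reference are placeholders, not values from the talk.

```python
import json
from typing import Dict


def make_pod(name: str, images: Dict[str, str]) -> dict:
    """Build a minimal Pod manifest with one container per (name, image) pair."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Every container listed here shares the Pod's single IP address.
            "containers": [
                {"name": cname, "image": image} for cname, image in images.items()
            ]
        },
    }


# "example.com/spark:placeholder" is a hypothetical image reference.
pod = make_pod(
    "web-and-spark",
    {"nginx": "nginx:stable", "spark": "example.com/spark:placeholder"},
)

if __name__ == "__main__":
    print(json.dumps(pod, indent=2))
```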
Pods
Pod
Containers
10.1.0.100
Cloud Storage Disk
● Pods can share other
items
○ Access to storage
IP
Volumes
nginx
Spark
Pods are scheduled onto nodes
[Figure: a Deployment with Pod "A" and Pod "B", a master, and Node "1" and Node "2". Other labels shown: Containers, Cluster, Container]
Here's a complete overview of a cluster
[Figure: a cluster with a master (apiserver) and several nodes, each running a kubelet and hosting pods. Other labels shown: networking, services, data storage, kubectl, app1, app2]
Operators to the rescue
● Method of packaging, deploying and managing a Kubernetes application.
● Deployed on Kubernetes and managed using the Kubernetes APIs and kubectl tooling.
● Set of cohesive APIs to extend in order to service and manage your applications that run on Kubernetes.
YARN translation: *think* Spark Job Server or Livy
● Integrates with BigQuery, Google’s Serverless Data Warehouse
● Provides Google Cloud Storage as replacement for HDFS
● Ships logs to Stackdriver Monitoring
○ via Prometheus server with the Stackdriver sidecar
● Contains sparkctl, a command line tool that simplifies client-local application dependencies in a Kubernetes environment.
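For a sense of what the operator manages, the sketch below assembles a SparkApplication custom resource as a Python dict and prints it as JSON. It is an assumption-laden illustration, not the talk's own example: the apiVersion string depends on the operator release you install, the field names follow the operator's published examples as best recalled, and the image, jar path, and application name are placeholders. Check everything against the operator version you deploy.

```python
import json


def spark_application(name: str, image: str, main_class: str, jar: str,
                      executors: int = 2) -> dict:
    """Build a minimal SparkApplication resource for the Spark operator."""
    return {
        # Assumed API group/version; earlier operator releases used v1beta1.
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": image,
            "mainClass": main_class,
            "mainApplicationFile": jar,
            "sparkVersion": "2.4.0",
            "driver": {"cores": 1, "memory": "512m"},
            "executor": {"cores": 1, "instances": executors, "memory": "512m"},
        },
    }


# Image and jar location are hypothetical placeholders.
app = spark_application(
    name="spark-pi",
    image="example.com/spark:2.4.0",
    main_class="org.apache.spark.examples.SparkPi",
    jar="local:///opt/spark/examples/jars/spark-examples.jar",
    executors=3,
)

if __name__ == "__main__":
    print(json.dumps(app, indent=2))
```

Saved to a file, output like this could be submitted with kubectl apply -f, after which the operator creates the driver and executor pods.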
Spark Operator Walkthrough
Deployment options
Creating Workloads with Deployments
Workloads are controller objects that correspond
to one of the following workload types
● Stateless applications
● Stateful applications
● Batch jobs
● Daemons
The Spark operator uses Deployments to initiate
the Workload
● A Deployment describes a desired state
● The Deployment controller changes the actual
state to the desired state at a controlled
rate.
● Deployments can create new ReplicaSets,
or remove existing Deployments and
adopt all their resources with new
Deployments
10.1.0.2 10.1.0.3
Deployment
+1
Autoscaling
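The idea of moving the actual state toward the desired state "at a controlled rate" can be sketched as a toy loop. This is a simplified model, not the Deployment controller's implementation; the function name rollout and the max_step parameter are made up for illustration.

```python
from typing import List


def rollout(actual: int, desired: int, max_step: int = 1) -> List[int]:
    """Return the replica count after each reconciliation step."""
    if max_step < 1:
        raise ValueError("max_step must be at least 1")
    history = [actual]
    while actual != desired:
        # Never change more than max_step replicas in one step.
        step = min(max_step, abs(desired - actual))
        actual += step if desired > actual else -step
        history.append(actual)
    return history


if __name__ == "__main__":
    # Scaling from two pods to three, like the "+1" in the figure.
    print(rollout(2, 3))
    # Scaling down from three pods to zero, two at a time.
    print(rollout(3, 0, max_step=2))
```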
Deployments rely on ReplicaSets to manage and run pods
[Figure: a Deployment manages a ReplicaSet, which manages three pods]
Deployment
- name: hello
ReplicaSet
- replicas: 3
- selector:
  - app: hello
Pod
- containers:
  - image: hello1
What a ReplicaSet contains
● A selector to specify how to
identify Pods it can acquire
● A number indicating how
many Pods should be
maintained
● A Pod template that
describes what runs on the
pod
What a ReplicaSet does
● Creates Pods using the
template until it reaches the
number of replicas
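The three parts of a ReplicaSet and its "create Pods until the count is reached" behaviour can be sketched in a few lines. This is a toy model using the slide's values (replicas: 3, selector app: hello, image hello1); the class and method names are illustrative, not Kubernetes code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    labels: Dict[str, str]
    image: str


@dataclass
class ReplicaSet:
    replicas: int               # how many Pods should be maintained
    selector: Dict[str, str]    # how to identify Pods it can acquire
    template: Pod               # what each new Pod runs

    def matches(self, pod: Pod) -> bool:
        """A Pod is selected when it carries every label in the selector."""
        return all(pod.labels.get(k) == v for k, v in self.selector.items())

    def reconcile(self, pods: List[Pod]) -> List[Pod]:
        """Return the Pod list after creating any missing replicas."""
        owned = [p for p in pods if self.matches(p)]
        missing = max(0, self.replicas - len(owned))
        new = [Pod(dict(self.template.labels), self.template.image)
               for _ in range(missing)]
        return pods + new


if __name__ == "__main__":
    rs = ReplicaSet(3, {"app": "hello"}, Pod({"app": "hello"}, "hello1"))
    print(len(rs.reconcile([])))
```

Calling reconcile again on its own output adds nothing, which is the property a controller loop relies on.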
10.1.0.1 10.1.0.2
10.8.244.100
Service
Pod
"A"
Pod
"B"
A Service provides a persistent internal or external IP for pods
SparkUI is exposed as a Service
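A small sketch of why a Service gives a stable address: clients talk to one persistent IP while the set of backing Pods is looked up by label. The IPs are taken from the figure; the app: spark-ui label and the round-robin choice are assumptions made for illustration, and real cluster load balancing may behave differently.

```python
from itertools import cycle, islice
from typing import Dict, List


def endpoints(selector: Dict[str, str], pods: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the IPs of Pods whose labels satisfy the Service selector."""
    return [ip for ip, labels in pods.items()
            if all(labels.get(k) == v for k, v in selector.items())]


def round_robin(ips: List[str], n: int) -> List[str]:
    """Return the backends chosen for n consecutive requests."""
    return list(islice(cycle(ips), n)) if ips else []


# Pod "A" and Pod "B" from the figure; the label is a hypothetical example.
pods = {"10.1.0.1": {"app": "spark-ui"}, "10.1.0.2": {"app": "spark-ui"}}
service = {"clusterIP": "10.8.244.100", "selector": {"app": "spark-ui"}}

if __name__ == "__main__":
    backends = endpoints(service["selector"], pods)
    print(service["clusterIP"], "->", round_robin(backends, 3))
```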
First Impressions
The Good: Unified interface to the cluster environment (assuming you have K8s but not YARN)
The Bad: Another cluster environment (assuming you have YARN but not K8s)
The Good: Lets data scientists and developers tap into unused resources of an existing cluster (assuming you are running K8s for other applications and not at full utilization)
The Bad: Re-tuning all the Spark applications for K8s instead of YARN.
The Good: Developers can use custom configurations that may be at the OS level.
The Bad: Audit logs. YARN has logs split into resource manager and node manager logs. Most enterprises have monitoring and alerting set up and can look at different class paths. All of this would need to be revisited.
The Good: Package containers with libraries for your application (removes the need for Conda environments). Allows for more targeted upgrades.
The Bad: Forced into dealing with networking for an application, usually another team's job in traditional clusters
The Good: Can break out users easily into distinct workloads and isolate resources based on max memory and CPU throttling. Can get away from queue management and mapping users to YARN queues.
The Bad: Distributed stateful data. Spark 2.4 opens up volumes, but not much work has been done on tying K8s StatefulSets back to the Spark operator. (Mitigated with GCS as data source and sink.)
The Good: Secrets management for user connections to various data sources
The Bad: Security resembling a matryoshka doll: alpha Kerberos within K8s RBAC controls, within a VM service account, within Cloud IAM, backed by Cloud Identity, often synced to something else.
The Good: Scale to zero
The Bad: Poor shuffle performance (work in flight to address)
What are your first impressions?

AWS Community Day CPH - Three problems of Terraform
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

An Early Evaluation of Running Spark on Kubernetes

  • 1. © 2017 Google Inc. All rights reserved. Google and the Google logo are trademarks of Google Inc. All other company and product names may be trademarks of the respective companies with which they are associated. An Early Evaluation of Running Spark on Kubernetes Christopher Crosbie MPH, MS Product Manager, Open Data Analytics
  • 2.
  • 4. Google has 20+ years experience solving Data Problems. Timeline figure (2002 to 2016) with three rows: Google Research, Open Source, Google Cloud Products. Items shown: GFS, MapReduce, BigTable, Dremel, Flume Java, Millwheel, TensorFlow, Apache Beam, BigQuery, Pub/Sub, Dataflow, Bigtable, ML, Dataproc.
  • 5. GCP Open Data Ecosystem Cloud Composer (Apache Airflow) DataFlow (Apache Beam) Cloud Dataproc (Apache Spark/Hadoop)
  • 6. What is Cloud Dataproc? Google Cloud Platform’s fully-managed Apache Spark and Apache Hadoop service. Rapid cluster creation. Familiar open source tools. Ephemeral clusters on-demand. Customizable machines. Tightly integrated with other Google Cloud Platform services.
  • 7. Cloud Dataproc in 2018: more than 30 features launched. Fast: clusters from YAML, Cloud Storage connector optimizations, weekly updates, OSS performance tuning. Easy: custom images, Stackdriver monitoring, workflow templates, workflow parameters, optional components. Cost-effective: autoscaling, granular IAM, CMEK support for multiple products, graceful decommissioning, global expansion to 6 new regions.
  • 8. Cloud Dataproc Internals User Land Google Borg Apache Hadoop YARN Job Dispatcher Spanner Dataproc Agent TaskService Frontend CLI GUI API GFE (dataproc.googleapis.com) GFE (dataproc.control.googleapis.com) JobService Task Dispatcher GCS $ hadoop jar ...
  • 9.
  • 10. What is Kubernetes?
  • 11. Kubernetes (a.k.a. "k8s") ● An open source project ● Framework for container management and automation ● Based on Google's systems ● Developing rapidly - complex ○ Only covering the basics in this class ○ Only covering GKE in this class ● More information ○ kubernetes.io (also, k8s.io)
  • 12. The Kubernetes Vision: UI, API, Container Cluster
  • 13. How Kubernetes accomplishes this. Diagram labels: users, master (apiserver, etcd, scheduler, controllers), nodes (a kubelet on each).
  • 14. Container Cluster: a group of GCE instances running Kubernetes (a master and several nodes). Each node runs: ● Docker runtime ● Kubelet agent ○ Manages scheduled Docker containers ● Network proxy
  • 15. Kubernetes master endpoint ● Endpoint: doorway to the cluster ● Kubernetes API server ○ Services REST requests ○ Schedules pod creation/deletion on nodes ○ Syncs pod info with service info ● Cloud Services integration
  • 16. Pods ● A Pod is the K8s abstraction that represents an application ● It holds one or more containers ● The containers in the pod share: ○ A single IP address ○ A single namespace. Diagram labels: Pod, IP 10.1.0.100, Containers (nginx, Spark).
  • 17. Pods ● Pods can share other items ○ Access to storage. Diagram labels: Pod, IP 10.1.0.100, Containers (nginx, Spark), Volumes (Cloud Storage, Disk).
  • 18. Pods are scheduled onto nodes. Diagram labels: Container Cluster, master, Deployment, nodes (Node "1", Node "2"), Pod "A", Pod "B", Containers.
  • 19. Here's a complete overview of a cluster. Diagram labels: kubectl, masters (apiserver), nodes (kubelet, pods for app1 and app2), networking services, data storage services.
  • 20. Operators to the rescue ● Method of packaging, deploying and managing a Kubernetes application. ● Deployed on Kubernetes and managed using the Kubernetes APIs and kubectl tooling. ● Set of cohesive APIs to extend in order to service and manage your applications that run on Kubernetes. YARN translation: *think* Spark Job Server or Livy
  • 21. ● Integrates with BigQuery, Google’s Serverless Data Warehouse ● Provides Google Cloud Storage as replacement for HDFS ● Ships logs to Stackdriver Monitoring ○ via Prometheus server with the Stackdriver sidecar ● Contains sparkctl, a command line tool that simplifies client-local application dependencies in a Kubernetes environment.
  • 22. Spark Operator Walkthrough
  • 23. Deployment options
  • 24.
  • 25. Creating Workloads with Deployments. Workloads are controller objects that correspond to one of the following workload types ● Stateless applications ● Stateful applications ● Batch jobs ● Daemons. The Spark operator uses Deployments to initiate the Workload ● Describes a desired state ● The Deployment controller changes the actual state to the desired state at a controlled rate. ● Deployments can create new ReplicaSets, or remove existing Deployments and adopt all their resources with new Deployments. Diagram labels: Deployment, pods at 10.1.0.2 and 10.1.0.3, +1 Autoscaling.
  • 26. Deployments rely on ReplicaSets to manage and run pods. Example in the diagram: Deployment (name: hello), ReplicaSet (replicas: 3, selector: app: hello), Pod (containers: image: hello1), three pods. What a ReplicaSet contains ● A selector to specify how to identify Pods it can acquire ● A number indicating how many Pods should be maintained ● A Pod template that describes what runs on the pod. What a ReplicaSet does ● Creates Pods using the template until it reaches the number of replicas
  • 27.
  • 28. A Service provides a persistent internal or external IP for pods. SparkUI is exposed as a Service. Diagram labels: Service, Pod "A", Pod "B", 10.1.0.1, 10.1.0.2, 10.8.244.100.
  • 29.
  • 30. First Impressions
  • 31. The Good: Unified interface to cluster environment (assuming you have K8 but not YARN). The Bad: Another cluster environment (assuming you have YARN but not K8).
  • 32. The Good: Lets data scientists and developers tap into unused resources of an existing cluster (assuming you are running K8 for other applications and not at full utilization). The Bad: Re-tuning all the Spark applications for K8 instead of YARN.
  • 33. The Good: Developers can use custom configurations that may be at the OS level. The Bad: Audit logs. YARN has logs split into resource manager and node manager logs. Most enterprises have set up monitoring and alerting and can look at different class paths. All of this would need to be revisited.
  • 34. The Good: Package containers with libraries for your application (removes the need for Conda environments). Allows for more targeted upgrades. The Bad: Forced into dealing with networking for an application, which is usually another team's job in traditional clusters.
  • 35. The Good: Can break out users easily into distinct workloads and isolate resources based on max memory and CPU throttling; can get away from queue management and mapping users to YARN queues. The Bad: Distributed stateful data. Spark 2.4 opens up volumes, but not much work has been done on tying K8 StatefulSets back to the Spark operator (mitigated with GCS as data source and sink).
  • 36. The Good: Secrets management for user connections to various data sources. The Bad: Security resembling a matryoshka doll: alpha Kerberos within K8 RBAC controls, within VM service account, within cloud IAM, backed by Cloud Identity, often synced to something else.
  • 37. The Good: Scale to zero. The Bad: Poor shuffle performance (work in flight to address).
  • 38. What are your first impressions?
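
Slide 26 lists what a ReplicaSet contains (a selector, a replica count, a Pod template) and what it does (creates Pods from the template until the count is reached). A minimal sketch of that loop in plain Python follows. It is a toy model, not Kubernetes code: the `Pod` and `ReplicaSet` classes, the `reconcile` method and the `hello-N` naming are all hypothetical, and trimming surplus Pods is added here for completeness even though the slide only mentions creation.

```python
# Toy model of the ReplicaSet behaviour described on slide 26.
# Plain Python only; nothing here talks to a real cluster, and all
# class, method and Pod names are hypothetical.
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Pod:
    name: str
    labels: dict
    image: str


@dataclass
class ReplicaSet:
    replicas: int    # how many Pods should be maintained
    selector: dict   # labels that identify Pods it can acquire
    template: dict   # what each new Pod runs: {"labels": ..., "image": ...}
    _ids: count = field(default_factory=count, repr=False)

    def matches(self, pod):
        # A Pod is selected when it carries every selector label.
        return all(pod.labels.get(k) == v for k, v in self.selector.items())

    def reconcile(self, pods):
        """Return a new Pod list holding exactly `replicas` matching Pods."""
        owned = [p for p in pods if self.matches(p)]
        others = [p for p in pods if not self.matches(p)]
        while len(owned) < self.replicas:   # too few: create from template
            owned.append(Pod(f"hello-{next(self._ids)}",
                             dict(self.template["labels"]),
                             self.template["image"]))
        return others + owned[:self.replicas]   # too many: drop the extras


rs = ReplicaSet(replicas=3, selector={"app": "hello"},
                template={"labels": {"app": "hello"}, "image": "hello1"})
pods = rs.reconcile([])          # starts empty, so three Pods are created
pods = rs.reconcile(pods[:-1])   # one Pod disappears, a replacement appears
print([p.name for p in pods])
```

The second `reconcile` call stands in for a Pod failing: the set notices it is one short and fills the gap from the template, which is the "desired state" idea from slide 25 in miniature.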

Editor's Notes

  1. Intro to my presentation: story about boss coming to me and asking for our K8s plan. What is K8s? Anyone else have that happen yet? Got a good thing going with YARN, why am I ripping all that out? Where does this fit? Going to walk through my findings to save you time.
  2. Most of you have probably seen the Google Search box. Behind Google Search is a tremendous amount of infrastructure, along with a machine learning and analytics foundation, that makes those wonderfully simple experiences possible. 20 years of experience in building secure and trusted infrastructure for processing massive volumes of data! The universe of data! Google has also been a leader in the field of machine learning and AI to better organize the world’s information and make it available to everyone.
  3. From EBC deck: https://docs.google.com/presentation/d/1Am33uS23Hkdew-OsFLAmzB1XjPIQSo3bq4CAtvNaQps/edit#slide=id.g404c2253c6_0_928 Enterprises, planet-scale internet companies, disruptive start-ups are innovating and transforming their businesses with Google Cloud data analytics solutions.
  4. Speaker Notes: Google has a long history of innovating in the data space. Papers on GFS and MapReduce are widely credited for the creation of the first versions of Hadoop. Continued papers have resulted in projects such as HBase, Crunch, Drill, Beam, and TensorFlow. Google has been democratizing these internal technologies through GCP projects since ~2011, starting with BigQuery.
  5. Dataproc as a processing engine gives customers a managed cloud experience but without having to re-architect applications and code. It also provides deep integrations with the rest of GCP making it easy to mix open source solutions alongside native GCP services.
  6. https://kubernetes.io/ Kubernetes is Greek for "helmsman" or "pilot". The project started in 2014. It is based on experience with Google's internal container management system.
  7. Most users only really care that you provide them with an API. And most operators only really care that they have a running container cluster.
  8. Kubernetes is an open source project (available on kubernetes.io) that can run on many different environments, from laptops to high-availability multi-node clusters, from public clouds to on-premise deployments, from virtual machines to bare metal. At the highest level, it is a set of APIs that you can use to deploy containers on a set of nodes. The system is divided into a set of master components that run as the control plane and a set of nodes that run containers. Users access your API via a command-line interface, HTTP, or a user interface.
  9. A Container Cluster is a Google abstraction that relates Kubernetes to the GCE infrastructure. It is a collection of GCE VM instances, consisting of the Kubernetes Master Endpoint and one or more node instances.
  10. Now that you understand the physical relationship between GCE and Kubernetes, we need to focus on Kubernetes abstractions and understand how Kubernetes works. Then we will return to the discussion of containers and see how those Kubernetes abstractions interact with nodes.
  11. One purpose of GKE is to enable you to manage applications, not machines. To accomplish this, you need to understand the GKE abstractions for applications. Any data access mounted to a pod, called a Volume, is available to all containers in the pod. Containers that are part of the same pod are guaranteed to be scheduled together on the same VM and can share state via local volumes. Persistent Volumes, using persistent disks in GCE, survive instance and container restarts.
  12. One purpose of GKE is to enable you to manage applications, not machines. To accomplish this, you need to understand the GKE abstractions for applications. Any data access mounted to a pod, called a Volume, is available to all containers in the pod. Containers that are part of the same pod are guaranteed to be scheduled together on the same VM and can share state via local volumes. Persistent Volumes, using persistent disks in GCE, survive instance and container restarts.
  13. Deployments handle the scheduling of the pods onto the machines, which are called Nodes. So now that you understand Pods and Deployments, we will return to the relationship with GCE.
  14. Here's a complete overview of a cluster with its key components. You have a set of master servers and worker nodes. The masters provide the control plane for the cluster. Worker nodes run pods with containers in them. Cluster administrators configure the cluster by sending requests to apiservers on masters using a command-line tool called kubectl. Kubectl can be installed and run anywhere. From there, the apiserver communicates with the cluster in two primary ways: to the kubelet process that runs on each node, and to any node, pod, or service through the apiserver's proxy functionality (not shown). Then pods are started on various nodes. In this example, there are two types of pods running (shown in yellow and green). There is also a process on each node called kube-proxy (not shown) that sets up networking rules and connection forwarding for services and pods on the host. Although networking and data storage services are shown outside nodes, most functionality resides on nodes. You can also access the apiserver using a web interface called the dashboard via kubectl proxy (not shown).
  15. Image from external Google Slide deck: https://docs.google.com/presentation/d/1lJ2F7e-nYHU1eZq3M9H61rRsIE75s-eQQ_mkoY5a7Ro/edit#slide=id.g2865abe94e_0_1069 You can think of Operators as the runtime that manages this type of application on Kubernetes. CoreOS, bought by Red Hat, bought by IBM. Y.
  16. Behind the scenes, a deployment relies on a ReplicaSet to manage and run a given number of pods at a given time. In this example, there is a Deployment named hello. When you create that deployment, it's going to create a ReplicaSet of size 3. You add the label selector of app: hello. Inside of the pod, you have a single image called hello1.
  17. Very first impression is that this is great if your primary business need is having to calculate Pi and you don’t mind that the driver node sometimes fails to start without any error messages. But completely overhauling a cluster scheduler is a lot - we can expect it to improve. When it does, looking ahead, here are some of the tradeoffs as I see them.
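
Note 16 describes a Deployment named hello that creates a ReplicaSet of size 3, with the label selector app: hello and a single image called hello1. As a rough illustration, the sketch below builds that object as a Python dict and prints it as JSON. The field layout follows the Kubernetes `apps/v1` Deployment schema; the container name `hello` and the function name `hello_deployment` are assumptions made for the example, since the note gives only the image.

```python
# Sketch of the "hello" Deployment from note 16 as a JSON manifest.
# The container name and this helper function are assumptions; the
# Deployment name, replica count, selector label and image come from the note.
import json


def hello_deployment(replicas=3):
    labels = {"app": "hello"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "hello"},
        "spec": {
            "replicas": replicas,                  # size of the ReplicaSet
            "selector": {"matchLabels": labels},   # which Pods it manages
            "template": {                          # what each Pod runs
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{"name": "hello", "image": "hello1"}]
                },
            },
        },
    }


manifest = hello_deployment()
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`, assuming a cluster that can actually pull an image called hello1.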