Google Tech Talk with Dr. Eric Brewer in Korea Apr.27.2015

Google Tech Talk.
Containers: What, Why and How
Google Cloud Innovation
Dr. Eric Brewer
VP of Infrastructure
April 27, 2015

Google confidential │ Do not distribute

Management MobileDeveloper
Tools
Compute
Networking
Big Data
Storage

“Cloud Native” Applications
Middle of a great transition
• unlimited “ethereal” resources in the Cloud
• an environment of services not machines
• thinking in APIs and co-designed services
• high availability offered and expected
(this is how Google already works internally)

Google has been developing
and using containers to
manage our applications for
over 10 years.
Images by Connie Zhou
2B launched per week
● simplifies management
● performance isolation
● efficiency

Google confidential │ Do not distributeGoogle confidential │ Do not distribute
What are containers?

VMs vs. Containers
Physical Processor
Virtual Processor
Operating System
Libraries
User Code Private
Copy
Shared
Virtual Machines
Physical Processor
Virtual Processor
Operating System
Libraries
User Code
Containers
ISA
syscall
Containers: less overhead, enable more “magic”

Merging Two Kinds of Containers
Docker
• It’s about packaging
• Control:
• packages
• versions
• (some config)
• Layered file system
• => Prod matches testing
Linux Containers
• It’s about isolation
… performance isolation
• not security isolation
… use VMs for that
• Manage CPUs, memory,
bandwidth, …
• Nested groups

Google Platform Layering
Infrastructure: Machines
App Engine: Language-based
Containers: Process-based
GCE
Kubernetes
GKE
GAE
Easy to use,
Flexible

Why do we use them?

Why Containers
Ease of development:
• User makes jobs based on containers; we schedule those jobs for them
• They don’t worry about machines or the OS
Utilization (much more efficient)
• We pack many containers per machine
• Mix or live services and batch workloads
• Batch fills in the “holes”
Ease of Operation: fewer staff per job
• E.g. all jobs use latest security patches
• More shared code among projects (shared services and code)

How do we use them?

Services Model
Each app lives in an environment of shared services
• storage, monitoring, logging
• master election, deployment, testing
Services are updated independently
• Multiple versions running at a time
• Required for “canary” testing -- deploy new version a little bit
• => Need version numbers
• Usually new versions are backward compatible
• Sometimes not => must eventually update ALL clients
• Running multiple versions gives clients some time to update

Borg Scheduler
Borg is our primary scheduler for the last 10 years
• Schedules over a single very large cluster (10,000+ machines)
• Starts, moves, restarts container-based jobs
• Also handles some logging, monitoring
• Users don’t worry about where their jobs run
Recent paper on Borg:
https://research.google.com/pubs/pub43438.html

Kubernetes
• Container orchestrator
• Runs Docker containers
• Supports multiple cloud and bare-metal environments
• Inspired and informed by Google’s experiences and
internal systems
• Open source, written in Go
Manage applications, not machines
κυβερνήτης: Greek for “pilot” or “helmsman of a ship”
the open source cluster manager from Google

Kubernetes: Higher level of Abstraction
Don’t Worry About
• OS details
• Packages — no conflicts
• Machine sizes (much)
• Mixing languages
• Port conflicts
Think About
• Composition of services
• Load-balancing
• Names of services
• State management
• Monitoring and Logging
• Upgrading

Kubernetes Partners

Open sourced in June 2014
Google launched Google Container Engine (GKE)
• Full managed and hosted Kubernetes
• https://cloud.google.com/container-engine/
Roadmap:
• https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/roadmap.md
Working towards v1.0 over next few months
• Key use cases: Redis, Memcache, MySQL, Mongo, Cassandra
Kubernetes Status & Plans

Evolution is the Real Value
Apps Structured as Independent Microservices
● Encapsulated state with APIs (like “objects”)
● Mixture of languages
● Mixture of teams
Services are Abstract
● A “Service” is just a long-lived abstract name
● Varied implementations over time (versions)
● Kubernetes routes to the right implementation

Pod
/data
Containers:
• Handle package dependencies
• Different versions, same machine
• No “DLL hell”
Managing Dependencies
python 3.4.2
glibc 2.21
MyService
python 2.7.9
glibc 2.19
MySQL
Pods:
• Co-locate containers
• Shared volumes
• IP address, independent port space
• Unit of deployment, migration

Dependencies: Services
Services:
• Replicated pods
• Source pod is a template
• Auto-restart member pods
• Abstract name (DNS)
• IP address for the service
• in addition to the members
• Load balancing among replicas
Load
Balancer
Service IP

Example: Rolling Upgrade with Labels
Servers:
Labels:
frontend
v1.2
frontend
v1.2
frontend
v1.2
frontend
v1.2
frontend
v1.3
frontend
v1.3
frontend
v1.3
frontend
v1.3
frontend
Replication
Controller
replicas: 4
v1.2
Replication
Controller
replicas: 1
v1.3
replicas: 3 replicas: 2replicas: 3replicas: 2replicas: 1 replicas: 4replicas: 0

Summary
A new path for Cloud Native applications:
• Collection of independent (micro) services
• Each service evolves on its own
• Scale as needed
• Update as needed
• Mix versions as needed
• Pods are a building block
• Template for service members
• Group containers and volumes
• Dedicated IP and thus port space
• Containers simplify deployment

2012 20132002 2004 2006 2008 2010
Cloud Dataflow
History of Big Data at Google
Why Cloud Dataflow?
MapReduce
GFS Big Table
Dremel
Pregel
Flume
Colossus
Spanner
MillWheel

Time to answer some Big Data questions
1M
Devices
16.6K Events/sec
43B Events/month
518B Events/year
What was the average viewing time over the
past 7 days, compared to the last year?
How many active viewers did I have in the
last minute?
How many sales were made in the last
hour due to advertising conversion?

“One” approach
1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month

Big Data on Google Cloud
BigQuery
High throughput ingestion,
store and query petabytes
Dataflow
Stream & batch
processing, unified and
simplified
Pub/Sub
Scalable, flexible, and
globally available
messaging
Fully Managed, No-Ops Services
Capture Process Analyze

Cloud Dataflow
Cloud Dataflow is a
collection of SDKs for
building batch or
streaming parallelized
data processing pipelines.
Cloud Dataflow is a fully
managed & elastic service
for executing optimized
parallelized data processing
pipelines.

Benefits of Cloud Dataflow
❯ No Ops - truly elastic data processing for the cloud
• On demand resource allocation w/intelligent auto-scaling
• Automated worker optimization
❯ Unified model for batch & stream based processing
• Flexible I/O support
• Fine grained correctness primitives
❯ Open sourced SDK @ github
• Java 7 today @ /GoogleCloudPlatform/DataflowJavaSDK
• Python 2 in progress
• Scala @/darkjh/scalaflow & /jhlch/scala-dataflow-dsl
• Spark runner@ /cloudera/spark-dataflow
• Flink runner @ /dataArtisans/flink-dataflow

Optimizing Your Time To Answer
More time to dig
into your data
Programming
Resource
provisioning
Performance
tuning
Monitoring
Reliability
Deployment &
configuration
Handling
Growing Scale
Utilization
improvements
Data Processing with Cloud DataflowTypical Data Processing
Programming

• Movement
• Filtering
• Enrichment
• Shaping
• Reduction
• Batch computation
• Continuous
computation
• Composition
• External
orchestration
• Simulation
Where might you use Cloud Dataflow?
AnalysisETL Orchestration

The tension and polarity of Big Data
AccuracySpeed
Cost control Complexity
Time to answer

Let’s build something
1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month
Cloud Pub/Sub Cloud Dataflow BigQuery
How many active viewers did I have in the last
minute?

1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month
Cloud Dataflow BigQuery
minute?

1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month
Cloud Dataflow
minute?

1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month
Cloud Dataflow SDK
+
minute?

Thank you!
http://cloud.google.com

Google Tech Talk with Dr. Eric Brewer in Korea Apr.27.2015

More Related Content

What's hot

Viewers also liked

Similar to Google Tech Talk with Dr. Eric Brewer in Korea Apr.27.2015

Recently uploaded

Google Tech Talk with Dr. Eric Brewer in Korea Apr.27.2015