Google Tech Talk.
Containers: What, Why and How
Google Cloud Innovation
Dr. Eric Brewer
VP of Infrastructure
April 27, 2015
Google confidential │ Do not distribute
Google confidential │ Do not distribute
Google confidential │ Do not distribute
Google confidential │ Do not distribute
Management MobileDeveloper
Tools
Compute
Networking
Big Data
Storage
Google confidential │ Do not distribute
“Cloud Native” Applications
Middle of a great transition
• unlimited “ethereal” resources in the Cloud
• an environment of services not machines
• thinking in APIs and co-designed services
• high availability offered and expected
(this is how Google already works internally)
Google confidential │ Do not distribute
Google has been developing
and using containers to
manage our applications for
over 10 years.
Images by Connie Zhou
2B launched per week
● simplifies management
● performance isolation
● efficiency
Google confidential │ Do not distributeGoogle confidential │ Do not distribute
What are containers?
Google confidential │ Do not distribute
VMs vs. Containers
Physical Processor
Virtual Processor
Operating System
Libraries
User Code Private
Copy
Shared
Virtual Machines
Physical Processor
Virtual Processor
Operating System
Libraries
User Code
Containers
ISA
syscall
Containers: less overhead, enable more “magic”
Google confidential │ Do not distribute
Merging Two Kinds of Containers
Docker
• It’s about packaging
• Control:
• packages
• versions
• (some config)
• Layered file system
• => Prod matches testing
Linux Containers
• It’s about isolation
… performance isolation
• not security isolation
… use VMs for that
• Manage CPUs, memory,
bandwidth, …
• Nested groups
Google confidential │ Do not distribute
Google Platform Layering
Infrastructure: Machines
App Engine: Language-based
Containers: Process-based
GCE
Kubernetes
GKE
GAE
Easy to use,
Flexible
Google confidential │ Do not distributeGoogle confidential │ Do not distribute
Why do we use them?
Google confidential │ Do not distribute
Why Containers
Ease of development:
• User makes jobs based on containers; we schedule those jobs for them
• They don’t worry about machines or the OS
Utilization (much more efficient)
• We pack many containers per machine
• Mix or live services and batch workloads
• Batch fills in the “holes”
Ease of Operation: fewer staff per job
• E.g. all jobs use latest security patches
• More shared code among projects (shared services and code)
Google confidential │ Do not distributeGoogle confidential │ Do not distribute
How do we use them?
Google confidential │ Do not distribute
Services Model
Each app lives in an environment of shared services
• storage, monitoring, logging
• master election, deployment, testing
Services are updated independently
• Multiple versions running at a time
• Required for “canary” testing -- deploy new version a little bit
• => Need version numbers
• Usually new versions are backward compatible
• Sometimes not => must eventually update ALL clients
• Running multiple versions gives clients some time to update
Google confidential │ Do not distribute
Borg Scheduler
Borg is our primary scheduler for the last 10 years
• Schedules over a single very large cluster (10,000+ machines)
• Starts, moves, restarts container-based jobs
• Also handles some logging, monitoring
• Users don’t worry about where their jobs run
Recent paper on Borg:
https://research.google.com/pubs/pub43438.html
Google confidential │ Do not distribute
Kubernetes
• Container orchestrator
• Runs Docker containers
• Supports multiple cloud and bare-metal environments
• Inspired and informed by Google’s experiences and
internal systems
• Open source, written in Go
Manage applications, not machines
κυβερνήτης: Greek for “pilot” or “helmsman of a ship”
the open source cluster manager from Google
Google confidential │ Do not distribute
Kubernetes: Higher level of Abstraction
Don’t Worry About
• OS details
• Packages — no conflicts
• Machine sizes (much)
• Mixing languages
• Port conflicts
Think About
• Composition of services
• Load-balancing
• Names of services
• State management
• Monitoring and Logging
• Upgrading
Google confidential │ Do not distribute
Kubernetes Partners
Google confidential │ Do not distribute
Open sourced in June 2014
Google launched Google Container Engine (GKE)
• Full managed and hosted Kubernetes
• https://cloud.google.com/container-engine/
Roadmap:
• https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/roadmap.md
Working towards v1.0 over next few months
• Key use cases: Redis, Memcache, MySQL, Mongo, Cassandra
Kubernetes Status & Plans
Google confidential │ Do not distribute
Evolution is the Real Value
Apps Structured as Independent Microservices
● Encapsulated state with APIs (like “objects”)
● Mixture of languages
● Mixture of teams
Services are Abstract
● A “Service” is just a long-lived abstract name
● Varied implementations over time (versions)
● Kubernetes routes to the right implementation
Google confidential │ Do not distribute
Pod
/data
Containers:
• Handle package dependencies
• Different versions, same machine
• No “DLL hell”
Managing Dependencies
python 3.4.2
glibc 2.21
MyService
python 2.7.9
glibc 2.19
MySQL
Pods:
• Co-locate containers
• Shared volumes
• IP address, independent port space
• Unit of deployment, migration
Google confidential │ Do not distribute
Dependencies: Services
Services:
• Replicated pods
• Source pod is a template
• Auto-restart member pods
• Abstract name (DNS)
• IP address for the service
• in addition to the members
• Load balancing among replicas
Load
Balancer
Service IP
Google confidential │ Do not distribute
Example: Rolling Upgrade with Labels
Servers:
Labels:
frontend
v1.2
frontend
v1.2
frontend
v1.2
frontend
v1.2
frontend
v1.3
frontend
v1.3
frontend
v1.3
frontend
v1.3
frontend
Replication
Controller
replicas: 4
v1.2
Replication
Controller
replicas: 1
v1.3
replicas: 3 replicas: 2replicas: 3replicas: 2replicas: 1 replicas: 4replicas: 0
Google confidential │ Do not distribute
Summary
A new path for Cloud Native applications:
• Collection of independent (micro) services
• Each service evolves on its own
• Scale as needed
• Update as needed
• Mix versions as needed
• Pods are a building block
• Template for service members
• Group containers and volumes
• Dedicated IP and thus port space
• Containers simplify deployment
Latest Google Innovation
2012 20132002 2004 2006 2008 2010
Cloud Dataflow
History of Big Data at Google
Why Cloud Dataflow?
MapReduce
GFS Big Table
Dremel
Pregel
Flume
Colossus
Spanner
MillWheel
Time to answer some Big Data questions
1M
Devices
16.6K Events/sec
43B Events/month
518B Events/year
What was the average viewing time over the
past 7 days, compared to the last year?
How many active viewers did I have in the
last minute?
How many sales were made in the last
hour due to advertising conversion?
“One” approach
1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month
Big Data on Google Cloud
BigQuery
High throughput ingestion,
store and query petabytes
Dataflow
Stream & batch
processing, unified and
simplified
Pub/Sub
Scalable, flexible, and
globally available
messaging
Fully Managed, No-Ops Services
Capture Process Analyze
Cloud Dataflow
Cloud Dataflow is a
collection of SDKs for
building batch or
streaming parallelized
data processing pipelines.
Cloud Dataflow is a fully
managed & elastic service
for executing optimized
parallelized data processing
pipelines.
Benefits of Cloud Dataflow
❯ No Ops - truly elastic data processing for the cloud
• On demand resource allocation w/intelligent auto-scaling
• Automated worker optimization
❯ Unified model for batch & stream based processing
• Flexible I/O support
• Fine grained correctness primitives
❯ Open sourced SDK @ github
• Java 7 today @ /GoogleCloudPlatform/DataflowJavaSDK
• Python 2 in progress
• Scala @/darkjh/scalaflow & /jhlch/scala-dataflow-dsl
• Spark runner@ /cloudera/spark-dataflow
• Flink runner @ /dataArtisans/flink-dataflow
Optimizing Your Time To Answer
More time to dig
into your data
Programming
Resource
provisioning
Performance
tuning
Monitoring
Reliability
Deployment &
configuration
Handling
Growing Scale
Utilization
improvements
Data Processing with Cloud DataflowTypical Data Processing
Programming
• Movement
• Filtering
• Enrichment
• Shaping
• Reduction
• Batch computation
• Continuous
computation
• Composition
• External
orchestration
• Simulation
Where might you use Cloud Dataflow?
AnalysisETL Orchestration
The tension and polarity of Big Data
AccuracySpeed
Cost control Complexity
Time to answer
Let’s build something
1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month
Cloud Pub/Sub Cloud Dataflow BigQuery
How many active viewers did I have in the last
minute?
Let’s build something
1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month
Cloud Dataflow BigQuery
How many active viewers did I have in the last
minute?
Let’s build something
1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month
Cloud Dataflow
How many active viewers did I have in the last
minute?
Let’s build something
1M
Devices
16.6K Events/sec
43B Events/month
518B Events/month
Cloud Dataflow SDK
+
How many active viewers did I have in the last
minute?
Thank you!
http://cloud.google.com

Google Tech Talk with Dr. Eric Brewer in Korea Apr.27.2015

  • 1.
    Google Tech Talk. Containers:What, Why and How Google Cloud Innovation Dr. Eric Brewer VP of Infrastructure April 27, 2015
  • 2.
    Google confidential │Do not distribute
  • 3.
    Google confidential │Do not distribute
  • 4.
    Google confidential │Do not distribute
  • 5.
    Google confidential │Do not distribute Management MobileDeveloper Tools Compute Networking Big Data Storage
  • 6.
    Google confidential │Do not distribute “Cloud Native” Applications Middle of a great transition • unlimited “ethereal” resources in the Cloud • an environment of services not machines • thinking in APIs and co-designed services • high availability offered and expected (this is how Google already works internally)
  • 7.
    Google confidential │Do not distribute Google has been developing and using containers to manage our applications for over 10 years. Images by Connie Zhou 2B launched per week ● simplifies management ● performance isolation ● efficiency
  • 8.
    Google confidential │Do not distributeGoogle confidential │ Do not distribute What are containers?
  • 9.
    Google confidential │Do not distribute VMs vs. Containers Physical Processor Virtual Processor Operating System Libraries User Code Private Copy Shared Virtual Machines Physical Processor Virtual Processor Operating System Libraries User Code Containers ISA syscall Containers: less overhead, enable more “magic”
  • 10.
    Google confidential │Do not distribute Merging Two Kinds of Containers Docker • It’s about packaging • Control: • packages • versions • (some config) • Layered file system • => Prod matches testing Linux Containers • It’s about isolation … performance isolation • not security isolation … use VMs for that • Manage CPUs, memory, bandwidth, … • Nested groups
  • 11.
    Google confidential │Do not distribute Google Platform Layering Infrastructure: Machines App Engine: Language-based Containers: Process-based GCE Kubernetes GKE GAE Easy to use, Flexible
  • 12.
    Google confidential │Do not distributeGoogle confidential │ Do not distribute Why do we use them?
  • 13.
    Google confidential │Do not distribute Why Containers Ease of development: • User makes jobs based on containers; we schedule those jobs for them • They don’t worry about machines or the OS Utilization (much more efficient) • We pack many containers per machine • Mix or live services and batch workloads • Batch fills in the “holes” Ease of Operation: fewer staff per job • E.g. all jobs use latest security patches • More shared code among projects (shared services and code)
  • 14.
    Google confidential │Do not distributeGoogle confidential │ Do not distribute How do we use them?
  • 15.
    Google confidential │Do not distribute Services Model Each app lives in an environment of shared services • storage, monitoring, logging • master election, deployment, testing Services are updated independently • Multiple versions running at a time • Required for “canary” testing -- deploy new version a little bit • => Need version numbers • Usually new versions are backward compatible • Sometimes not => must eventually update ALL clients • Running multiple versions gives clients some time to update
  • 16.
    Google confidential │Do not distribute Borg Scheduler Borg is our primary scheduler for the last 10 years • Schedules over a single very large cluster (10,000+ machines) • Starts, moves, restarts container-based jobs • Also handles some logging, monitoring • Users don’t worry about where their jobs run Recent paper on Borg: https://research.google.com/pubs/pub43438.html
  • 17.
    Google confidential │Do not distribute Kubernetes • Container orchestrator • Runs Docker containers • Supports multiple cloud and bare-metal environments • Inspired and informed by Google’s experiences and internal systems • Open source, written in Go Manage applications, not machines κυβερνήτης: Greek for “pilot” or “helmsman of a ship” the open source cluster manager from Google
  • 18.
    Google confidential │Do not distribute Kubernetes: Higher level of Abstraction Don’t Worry About • OS details • Packages — no conflicts • Machine sizes (much) • Mixing languages • Port conflicts Think About • Composition of services • Load-balancing • Names of services • State management • Monitoring and Logging • Upgrading
  • 19.
    Google confidential │Do not distribute Kubernetes Partners
  • 20.
    Google confidential │Do not distribute Open sourced in June 2014 Google launched Google Container Engine (GKE) • Full managed and hosted Kubernetes • https://cloud.google.com/container-engine/ Roadmap: • https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/roadmap.md Working towards v1.0 over next few months • Key use cases: Redis, Memcache, MySQL, Mongo, Cassandra Kubernetes Status & Plans
  • 21.
    Google confidential │Do not distribute Evolution is the Real Value Apps Structured as Independent Microservices ● Encapsulated state with APIs (like “objects”) ● Mixture of languages ● Mixture of teams Services are Abstract ● A “Service” is just a long-lived abstract name ● Varied implementations over time (versions) ● Kubernetes routes to the right implementation
  • 22.
    Google confidential │Do not distribute Pod /data Containers: • Handle package dependencies • Different versions, same machine • No “DLL hell” Managing Dependencies python 3.4.2 glibc 2.21 MyService python 2.7.9 glibc 2.19 MySQL Pods: • Co-locate containers • Shared volumes • IP address, independent port space • Unit of deployment, migration
  • 23.
    Google confidential │Do not distribute Dependencies: Services Services: • Replicated pods • Source pod is a template • Auto-restart member pods • Abstract name (DNS) • IP address for the service • in addition to the members • Load balancing among replicas Load Balancer Service IP
  • 24.
    Google confidential │Do not distribute Example: Rolling Upgrade with Labels Servers: Labels: frontend v1.2 frontend v1.2 frontend v1.2 frontend v1.2 frontend v1.3 frontend v1.3 frontend v1.3 frontend v1.3 frontend Replication Controller replicas: 4 v1.2 Replication Controller replicas: 1 v1.3 replicas: 3 replicas: 2replicas: 3replicas: 2replicas: 1 replicas: 4replicas: 0
  • 25.
    Google confidential │Do not distribute Summary A new path for Cloud Native applications: • Collection of independent (micro) services • Each service evolves on its own • Scale as needed • Update as needed • Mix versions as needed • Pods are a building block • Template for service members • Group containers and volumes • Dedicated IP and thus port space • Containers simplify deployment
  • 26.
  • 27.
    2012 20132002 20042006 2008 2010 Cloud Dataflow History of Big Data at Google Why Cloud Dataflow? MapReduce GFS Big Table Dremel Pregel Flume Colossus Spanner MillWheel
  • 28.
    Time to answersome Big Data questions 1M Devices 16.6K Events/sec 43B Events/month 518B Events/year What was the average viewing time over the past 7 days, compared to the last year? How many active viewers did I have in the last minute? How many sales were made in the last hour due to advertising conversion?
  • 29.
  • 30.
    Big Data onGoogle Cloud BigQuery High throughput ingestion, store and query petabytes Dataflow Stream & batch processing, unified and simplified Pub/Sub Scalable, flexible, and globally available messaging Fully Managed, No-Ops Services Capture Process Analyze
  • 31.
    Cloud Dataflow Cloud Dataflowis a collection of SDKs for building batch or streaming parallelized data processing pipelines. Cloud Dataflow is a fully managed & elastic service for executing optimized parallelized data processing pipelines.
  • 32.
    Benefits of CloudDataflow ❯ No Ops - truly elastic data processing for the cloud • On demand resource allocation w/intelligent auto-scaling • Automated worker optimization ❯ Unified model for batch & stream based processing • Flexible I/O support • Fine grained correctness primitives ❯ Open sourced SDK @ github • Java 7 today @ /GoogleCloudPlatform/DataflowJavaSDK • Python 2 in progress • Scala @/darkjh/scalaflow & /jhlch/scala-dataflow-dsl • Spark runner@ /cloudera/spark-dataflow • Flink runner @ /dataArtisans/flink-dataflow
  • 33.
    Optimizing Your TimeTo Answer More time to dig into your data Programming Resource provisioning Performance tuning Monitoring Reliability Deployment & configuration Handling Growing Scale Utilization improvements Data Processing with Cloud DataflowTypical Data Processing Programming
  • 34.
    • Movement • Filtering •Enrichment • Shaping • Reduction • Batch computation • Continuous computation • Composition • External orchestration • Simulation Where might you use Cloud Dataflow? AnalysisETL Orchestration
  • 35.
    The tension andpolarity of Big Data AccuracySpeed Cost control Complexity Time to answer
  • 36.
    Let’s build something 1M Devices 16.6KEvents/sec 43B Events/month 518B Events/month Cloud Pub/Sub Cloud Dataflow BigQuery How many active viewers did I have in the last minute?
  • 37.
    Let’s build something 1M Devices 16.6KEvents/sec 43B Events/month 518B Events/month Cloud Dataflow BigQuery How many active viewers did I have in the last minute?
  • 38.
    Let’s build something 1M Devices 16.6KEvents/sec 43B Events/month 518B Events/month Cloud Dataflow How many active viewers did I have in the last minute?
  • 39.
    Let’s build something 1M Devices 16.6KEvents/sec 43B Events/month 518B Events/month Cloud Dataflow SDK + How many active viewers did I have in the last minute?
  • 40.