Reddit Deployment Infrastructure: Past, Present, and Future
Ed Ceaser - /u/heselite
K8s At Reddit
● Multi-year project
● 10 separate k8s clusters
● 11 production services, across 7 teams
● Up to 10 deploys per day on some services
The past
Helm, Helmfile, Drone
● Helm provides basic release management
● Helmfile provides declarative Helm chart management
● Helmfiles stored in a central Git repo
● Drone CI runs in each K8s cluster, runs Helmfile
What did we learn???
Too many moving parts!
Too brittle!
● Tiller would sometimes require manual intervention
● Drone only reports pass/fail, doesn’t understand deploys
● Too many layers, hard to debug
● Everything serialized around a single Git repo
Requirements for Deploys V2
● Self-service deploys for service owners
● Deploy needs to support Helm
● Deploy status should be directly visible to service owners
● Deployment system should simplify things for service owners, not complicate them
The present
Enter Spinnaker
● Other tools were either CI job executors or very immature
● Spinnaker was relatively mature for non-k8s deployments
● V2 K8s Provider seemed to be a reasonable approach, under active development
● Had relationships with other users
What are the risks?
Solving Operational Complexity
● We run Spinnaker in k8s
● Use Halyard
● We use Helm to install and manage Halyard. Halyard then deploys Spinnaker
● We set up metrics and alerting immediately out of the gate
Solving Developer Complexity
● Pipelines need to be templated
● Point-and-click style pipeline configuration needs to be avoided
● Pipeline configuration should be declarative
● Managed Pipeline Templates 1.0 were already deprecated
● We discovered Sponnet, which gave us a way to address these issues
Jsonnet??
Jsonnet Example
Jsonnet source:

{
  person1: {
    name: "Alice",
    welcome: "Hello " + self.name + "!",
  },
  person2: self.person1 { name: "Bob" },
}

Rendered JSON:

{
  "person1": {
    "name": "Alice",
    "welcome": "Hello Alice!"
  },
  "person2": {
    "name": "Bob",
    "welcome": "Hello Bob!"
  }
}

Equivalent definition using a function:

local Person(name='Alice') = {
  name: name,
  welcome: 'Hello ' + name + '!',
};
{
  person1: Person(),
  person2: Person('Bob'),
}
reddit.jsonnet
● ‘Sponnet’ jsonnet library is still very low-level
● Sponnet mainly consists of helpers to construct pipeline primitives
● Users of Sponnet still need to understand how Spinnaker pipelines work
● We created a Jsonnet library that exposes a simple DSL
● That DSL is what is exposed to service owners to configure their pipelines
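As a rough illustration of what a DSL layered over pipeline primitives might look like, the sketch below builds a pipeline object whose methods each return an extended copy of it. This is not Reddit's library: the function names mirror the Deployment Definition Example, but the bodies, the S3 bucket layout, the refId scheme, and the notification fields are assumptions made for illustration.

```jsonnet
// Hypothetical sketch only; not the actual reddit.libsonnet.

// Describes a packaged chart. The S3 location format is an assumption.
local helmChart(name, version) = {
  name: name,
  version: version,
  artifactId: 's3://helm-charts/%s-%s.tgz' % [name, version],
};

local pipeline(application, name) = {
  application: application,
  name: name,
  stages: [],
  notifications: [],

  // Adds one bake stage and one deploy stage per cluster.
  // Declared hidden (::) so the method itself is not rendered into the JSON.
  deployHelmChart(chart, namespace, cluster):: self {
    stages+: std.flattenArrays([
      [
        {
          type: 'bakeManifest',
          refId: 'bake-' + c,
          namespace: namespace,
          inputArtifacts: [{ account: 's3', id: chart.artifactId }],
          requisiteStageRefIds: [],
        },
        {
          type: 'deployManifest',
          refId: 'deploy-' + c,
          account: c,
          requisiteStageRefIds: ['bake-' + c],
        },
      ]
      for c in cluster
    ]),
  },

  // Appends a Slack notification entry (field names are illustrative).
  notifySlackChannel(channel):: self {
    notifications+: [{ type: 'slack', address: channel }],
  },
};

pipeline(application='derp', name='Deploy')
.deployHelmChart(helmChart('derp', '0.1.0'), namespace='derp', cluster=['prod-ue1d', 'prod-ue1e'])
.notifySlackChannel('derp-notifications')
```

Because each method returns `self` extended with `stages+:` or `notifications+:`, calls can be chained, and service owners never need to write stage objects by hand.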
Deployment Definition Example
local reddit = import 'reddit.libsonnet';
local chart = reddit.helmChart(name='derp', version='0.1.0')
  .withGithubValues('reddit/reddit-service-derp', 'deploy/values.yaml');
local pipeline = reddit.pipeline(application='derp', name='Deploy')
  .deployHelmChart(chart, namespace='derp', cluster=['prod-ue1d', 'prod-ue1e'])
  .notifySlackChannel('derp-notifications');
reddit.render(pipeline)
Which turns into...
{
  "expectedArtifacts": [
    {
      "defaultArtifact": {
        "kind": "default.s3",
        "name": "s3://helm-charts/derp-0.1.0.tgz",
        "reference": "s3://helm-charts/derp-0.1.0.tgz",
        "type": "s3/object",
        "version": "/derp-0.1.0.tgz"
      },
      "id": "s3://helm-charts/derp-0.1.0.tgz",
      "matchArtifact": {
        "kind": "s3",
        "name": "s3://helm-charts/derp-0.1.0.tgz",
        "type": "s3/object"
      },
      "useDefaultArtifact": true,
      "usePriorArtifact": false
    }
  ],
  "keepWaitingPipelines": false,
  "lastModifiedBy": "anonymous",
  "limitConcurrent": true,
  "notifications": [],
  "parameterConfig": [
    {
      "default": "master",
      "description": "Docker image tag",
      "hasOptions": false,
      "label": "Image Tag",
      "name": "image_tag",
      "required": false
    },
    {
      "default": true,
      "description": "Deploy to prod-2-ue1d",
      "hasOptions": true,
      "label": "Deploy to prod-2-ue1d",
      "name": "prod-2-ue1d",
      "options": [
        {
          "value": false
        }
      ],
      "required": false
    },
    {
      "default": true,
      "description": "Deploy to prod-2-ue1e",
      "hasOptions": true,
      "label": "Deploy to prod-2-ue1e",
      "name": "prod-2-ue1e",
      "options": [
        {
          "value": false
        }
      ],
      "required": false
    }
  ],
  "stages": [
    {
      "expectedArtifacts": [
        {
          "id": "derp",
          "matchArtifact": {
            "kind": "base64",
            "name": "derp",
            "type": "embedded/base64"
          }
        }
      ],
      "inputArtifacts": [
        {
          "account": "s3",
          "id": "s3://helm-charts/derp-0.1.0.tgz"
        }
      ],
      "name": "Bake Manifest (derp) - prod-2-ue1d",
      "namespace": "derp",
      "outputName": "derp",
      "overrides": {
        "image.tag": "${parameters.image_tag}"
      },
      "refId": "Bake Manifest (derp) - prod-2-ue1d",
      "requisiteStageRefIds": [],
      "stageEnabled": {
        "expression": "parameters[\"prod-2-ue1d\"] == true",
        "type": "expression"
      },
      "templateRenderer": "HELM2",
      "type": "bakeManifest"
    },
    {
      "account": "prod-2-ue1d",
      "cloudProvider": "kubernetes",
      "manifestArtifactAccount": "embedded-artifact",
      "manifestArtifactId": "derp",
      "moniker": {
        "app": "derp"
      },
      "name": "Deploy Manifest (derp) - prod-2-ue1d",
      "overrideTimeout": true,
      "refId": "Deploy Manifest (derp) - prod-2-ue1d",
      "requisiteStageRefIds": [
        "Bake Manifest (derp) - prod-2-ue1d"
      ],
      "source": "artifact",
      "stageEnabled": {
        "expression": "parameters[\"prod-2-ue1d\"] == true",
        "type": "expression"
      },
      "stageTimeoutMs": 600000,
      "type": "deployManifest"
    },
    {
      "expectedArtifacts": [
        {
          "id": "derp",
          "matchArtifact": {
            "kind": "base64",
            "name": "derp",
            "type": "embedded/base64"
          }
        }
      ],
      "inputArtifacts": [
        {
          "account": "s3",
          "id": "s3://helm-charts/derp-0.1.0.tgz"
        }
      ],
      "name": "Bake Manifest (derp) - prod-2-ue1e",
      "namespace": "derp",
      "outputName": "derp",
      "overrides": {
        "image.tag": "${parameters.image_tag}"
      },
      "refId": "Bake Manifest (derp) - prod-2-ue1e",
      "requisiteStageRefIds": [],
      "stageEnabled": {
        "expression": "parameters[\"prod-2-ue1e\"] == true",
        "type": "expression"
      },
      "templateRenderer": "HELM2",
      "type": "bakeManifest"
    },
    {
      "account": "prod-2-ue1e",
      "cloudProvider": "kubernetes",
      "manifestArtifactAccount": "embedded-artifact",
      "manifestArtifactId": "derp",
      "moniker": {
        "app": "derp"
      },
      "name": "Deploy Manifest (derp) - prod-2-ue1e",
      "overrideTimeout": true,
      "refId": "Deploy Manifest (derp) - prod-2-ue1e",
      "requisiteStageRefIds": [
        "Bake Manifest (derp) - prod-2-ue1e"
      ],
      "source": "artifact",
      "stageEnabled": {
        "expression": "parameters[\"prod-2-ue1e\"] == true",
        "type": "expression"
      },
      "stageTimeoutMs": 600000,
      "type": "deployManifest"
    }
  ],
  "triggers": [
    ...
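The per-cluster toggle parameters and the matching stageEnabled expressions in the rendered JSON follow a regular pattern, which is the kind of repetition a template is well suited to generating. A minimal Jsonnet sketch, assuming hypothetical helper names (clusterToggle, enabledWhen) and a hypothetical stageGuards field that do not come from the talk:

```jsonnet
// Hypothetical helpers; the field values mirror the rendered JSON in the slides.
local clusters = ['prod-2-ue1d', 'prod-2-ue1e'];

// One boolean pipeline parameter per cluster, defaulting to true.
local clusterToggle(cluster) = {
  default: true,
  description: 'Deploy to ' + cluster,
  hasOptions: true,
  label: 'Deploy to ' + cluster,
  name: cluster,
  options: [{ value: false }],
  required: false,
};

// Guard that skips a stage when its cluster's toggle is switched off.
local enabledWhen(cluster) = {
  expression: 'parameters["%s"] == true' % [cluster],
  type: 'expression',
};

{
  parameterConfig: [clusterToggle(c) for c in clusters],
  // Keyed by cluster purely for display; a real template would attach
  // each guard to the stages for that cluster as stageEnabled.
  stageGuards: { [c]: enabledWhen(c) for c in clusters },
}
```

Adding a cluster then means adding one entry to the list rather than copying a parameter block and two stage guards.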
Rendering the Pipelines
● Continue to use Drone for rendering pipeline definitions
● We use the Spin CLI to talk to Spinnaker’s API
● Pipeline templating is done in a versioned Docker container
● This allows for managed upgrades to newer pipeline templates
● Container lints and validates pipelines
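As one example of the kind of check a validation step could perform (the talk does not say which checks the container runs), the Jsonnet sketch below aborts rendering when a stage depends on a refId that no stage defines. The validate function and the sample pipeline are hypothetical.

```jsonnet
// Hypothetical lint check, not taken from the talk.
// Returns the pipeline unchanged, or aborts rendering with an error message.
local validate(p) =
  local known = [s.refId for s in p.stages];
  local missing = [
    ref
    for s in p.stages
    for ref in s.requisiteStageRefIds
    if std.count(known, ref) == 0
  ];
  assert std.length(missing) == 0 : 'unknown stage refs: ' + std.join(', ', missing);
  p;

validate({
  stages: [
    { refId: 'bake', requisiteStageRefIds: [] },
    { refId: 'deploy', requisiteStageRefIds: ['bake'] },
  ],
})
```

Running the check at render time means a broken dependency is caught in CI, before the pipeline definition is sent to Spinnaker.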
Where are we today?
● Spinnaker has been in production for ~5 months
● Spinnaker drives all of our production k8s service deploys
● Services are deployed to multiple clusters
● All pipelines are rendered by Jsonnet, nothing managed by the UI
● Service owner response has been very positive
● Two Spinnaker deployments, production and staging
The Future
Lessons Learned
● Simply providing tools is not enough if they’re hard to use
● Log aggregation is a must for Spinnaker
● Jsonnet is very powerful. Devs need to be encouraged to stay within guardrails
● Helm is a difficult abstraction for new k8s developers
Future Improvements
● More visibility in Helm chart rendering
● Traffic control / service mesh support
● Managed Pipeline Templates 2.0
● Automated canarying
Contact Info
twitter.com/asdf
reddit.com/u/heselite
Interesting Links
● Helmfile: https://github.com/roboll/helmfile
● Drone: https://drone.io/
● Jsonnet: https://jsonnet.org/
● Sponnet: https://github.com/spinnaker/spinnaker/tree/master/sponnet
● K8s-pipeliner: https://github.com/namely/k8s-pipeliner

Feb 2018 Spinnaker Meetup Reddit Presentation

  • 1. Reddit Deployment Infrastructure: Past, Present, and Future Ed Ceaser - /u/heselite
  • 2. K8s At Reddit ● Multi-year project ● 10 separate k8s clusters ● 11 production services, across 7 teams ● Up to 10 deploys per day on some services
  • 4. Helm, Helmfile, Drone ● Helm provides basic release management ● Helmfile provides declarative Helm chart management ● Helmfiles stored in a central Git repo ● Drone CI runs in each K8s cluster, runs Helmfile
  • 5. What did we learn???
  • 7. Too brittle! ● Tiller sometimes would require manual intervention ● Drone only reports pass/fail, doesn’t understand deploys ● Too many layers, hard to debug ● Everything serialized around a single Git repo
  • 8. Requirements for Deploys V2 ● Self-service deploys for service owners ● Deploy needs to support Helm ● Deploy status should be directly visible to service owners ● Deployment system should simplify things for service owners, not complicate things
  • 10. Enter Spinnaker ● Other tools were either CI job executors, or very immature ● Spinnaker was relatively mature for non-k8s deployments ● V2 K8s Provider seemed to be a reasonable approach, under active development ● Had relationships with other users
  • 11. What are the risks?
  • 12. Solving Operational Complexity ● We run Spinnaker in k8s ● Use Halyard ● We use Helm to install and manage Halyard. Halyard then deploys Spinnaker ● We set up metrics and alerting immediately out of the gate
  • 13. Solving Developer Complexity ● Pipelines need to be templated ● Point-and-click style pipeline configuration needs to be avoided ● Pipeline configuration should be declarative ● Managed Pipeline Templates 1.0 were already deprecated ● We discovered Sponnet, which gave us a way to address these issues
  • 15. Jsonnet Example { person1: { name: "Alice", welcome: "Hello " + self.name + "!", }, person2: self.person1 { name: "Bob" }, } { "person1": { "name": "Alice", "welcome": "Hello Alice!" }, "person2": { "name": "Bob", "welcome": "Hello Bob!" } } local Person(name='Alice') = { name: name, welcome: 'Hello ' + name + '!', }; { person1: Person(), person2: Person('Bob'), }
  • 16. reddit.jsonnet ● ‘Sponnet’ jsonnet library is still very low-level ● Sponnet mainly consists of helpers to construct pipeline primitives ● Users of Sponnet still need to understand how Spinnaker pipelines work ● We created a Jsonnet library that exposes a simple DSL ● That DSL is what is exposed to service owners to configure their pipelines
  • 17. Deployment Definition Example local reddit = import 'reddit.libsonnet'; local chart = reddit.helmChart(name=’derp', version='0.1.0') .withGithubValues(‘reddit/reddit-service-derp’, ‘deploy/values.yaml’); local pipeline = reddit.pipeline(application='derp', name='Deploy') .deployHelmChart(chart, namespace=’derp’, cluster=['prod-ue1d', ‘prod-ue1e’]) .notifySlackChannel(‘derp-notifications’); reddit.render(pipeline)
  • 18. Which turns into...
    {
      "expectedArtifacts": [
        {
          "defaultArtifact": {
            "kind": "default.s3",
            "name": "s3://helm-charts/derp-0.1.0.tgz",
            "reference": "s3://helm-charts/derp-0.1.0.tgz",
            "type": "s3/object",
            "version": "/derp-0.1.0.tgz"
          },
          "id": "s3://helm-charts/derp-0.1.0.tgz",
          "matchArtifact": {
            "kind": "s3",
            "name": "s3://helm-charts/derp-0.1.0.tgz",
            "type": "s3/object"
          },
          "useDefaultArtifact": true,
          "usePriorArtifact": false
        }
      ],
      "keepWaitingPipelines": false,
      "lastModifiedBy": "anonymous",
      "limitConcurrent": true,
      "notifications": [],
      "parameterConfig": [
        {
          "default": "master",
          "description": "Docker image tag",
          "hasOptions": false,
          "label": "Image Tag",
          "name": "image_tag",
          "required": false
        },
        {
          "default": true,
          "description": "Deploy to prod-2-ue1d",
          "hasOptions": true,
          "label": "Deploy to prod-2-ue1d",
          "name": "prod-2-ue1d",
          "options": [ { "value": false } ],
          "required": false
        },
        {
          "default": true,
          "description": "Deploy to prod-2-ue1e",
          "hasOptions": true,
          "label": "Deploy to prod-2-ue1e",
          "name": "prod-2-ue1e",
          "options": [ { "value": false } ],
          "required": false
        }
      ],
      "stages": [
        {
          "expectedArtifacts": [
            {
              "id": "derp",
              "matchArtifact": { "kind": "base64", "name": "derp", "type": "embedded/base64" }
            }
          ],
          "inputArtifacts": [ { "account": "s3", "id": "s3://helm-charts/derp-0.1.0.tgz" } ],
          "name": "Bake Manifest (derp) - prod-2-ue1d",
          "namespace": "derp",
          "outputName": "derp",
          "overrides": { "image.tag": "${parameters.image_tag}" },
          "refId": "Bake Manifest (derp) - prod-2-ue1d",
          "requisiteStageRefIds": [],
          "stageEnabled": {
            "expression": "parameters[\"prod-2-ue1d\"] == true",
            "type": "expression"
          },
          "templateRenderer": "HELM2",
          "type": "bakeManifest"
        },
        {
          "account": "prod-2-ue1d",
          "cloudProvider": "kubernetes",
          "manifestArtifactAccount": "embedded-artifact",
          "manifestArtifactId": "derp",
          "moniker": { "app": "derp" },
          "name": "Deploy Manifest (derp) - prod-2-ue1d",
          "overrideTimeout": true,
          "refId": "Deploy Manifest (derp) - prod-2-ue1d",
          "requisiteStageRefIds": [ "Bake Manifest (derp) - prod-2-ue1d" ],
          "source": "artifact",
          "stageEnabled": {
            "expression": "parameters[\"prod-2-ue1d\"] == true",
            "type": "expression"
          },
          "stageTimeoutMs": 600000,
          "type": "deployManifest"
        },
        {
          "expectedArtifacts": [
            {
              "id": "derp",
              "matchArtifact": { "kind": "base64", "name": "derp", "type": "embedded/base64" }
            }
          ],
          "inputArtifacts": [ { "account": "s3", "id": "s3://helm-charts/derp-0.1.0.tgz" } ],
          "name": "Bake Manifest (derp) - prod-2-ue1e",
          "namespace": "derp",
          "outputName": "derp",
          "overrides": { "image.tag": "${parameters.image_tag}" },
          "refId": "Bake Manifest (derp) - prod-2-ue1e",
          "requisiteStageRefIds": [],
          "stageEnabled": {
            "expression": "parameters[\"prod-2-ue1e\"] == true",
            "type": "expression"
          },
          "templateRenderer": "HELM2",
          "type": "bakeManifest"
        },
        {
          "account": "prod-2-ue1e",
          "cloudProvider": "kubernetes",
          "manifestArtifactAccount": "embedded-artifact",
          "manifestArtifactId": "derp",
          "moniker": { "app": "derp" },
          "name": "Deploy Manifest (derp) - prod-2-ue1e",
          "overrideTimeout": true,
          "refId": "Deploy Manifest (derp) - prod-2-ue1e",
          "requisiteStageRefIds": [ "Bake Manifest (derp) - prod-2-ue1e" ],
          "source": "artifact",
          "stageEnabled": {
            "expression": "parameters[\"prod-2-ue1e\"] == true",
            "type": "expression"
          },
          "stageTimeoutMs": 600000,
          "type": "deployManifest"
        }
      ],
      "triggers": [ ...
  • 19. Rendering the Pipelines ● Continue to use Drone for rendering pipeline definitions ● We use the Spin CLI to talk to Spinnaker’s API ● Pipeline templating is done in a Docker container which is versioned ● This allows for managed upgrades to newer pipeline templates ● Container lints and validates pipelines
  • 20. Where are we today? ● Spinnaker has been in production for ~5 months ● Spinnaker drives all of our production k8s service deploys ● Services are deployed to multiple clusters ● All pipelines are rendered by Jsonnet, nothing managed by the UI ● Service owner response has been very positive ● Two Spinnaker deployments, production and staging
  • 22. Lessons Learned ● Simply providing tools is not enough if they’re hard to use ● Log aggregation is a must for Spinnaker ● Jsonnet is very powerful. Devs need to be encouraged to stay within guardrails ● Helm is a difficult abstraction for new k8s developers
  • 23. Future Improvements ● More visibility into Helm chart rendering ● Traffic control / service mesh support ● Managed Pipeline Templates 2.0 ● Automated canarying
  • 25. Interesting Links ● Helmfile: https://github.com/roboll/helmfile ● Drone: https://drone.io/ ● Jsonnet: https://jsonnet.org/ ● Sponnet: https://github.com/spinnaker/spinnaker/tree/master/sponnet ● K8s-pipeliner: https://github.com/namely/k8s-pipeliner

Editor's Notes

  1. My name is Ed Ceaser, and I work at Reddit on the infrastructure team. More specifically, I work on release engineering, which is the group responsible for migrating Reddit’s infrastructure to k8s.
  2. First, I’ll give a brief introduction to K8s at Reddit. We’ve been working for the last year and a half or so to migrate our infrastructure to k8s. To ease operational overhead, we’re treating our k8s clusters as relatively throwaway/lightweight. This allows us to more easily operate our 10 separate clusters. Currently we have 11 production services running in k8s, managed across 7 teams. Some of these services are under heavy development, and can see 10 or more deploys per day. So deploys need to be fast and reliable.
  3. So where did we start with handling these deployments? Our initial priority was to get a Kubernetes workflow in the hands of devs as quickly as possible, so we could start learning lessons. So initially we prioritized using off-the-shelf tooling and existing systems that we already had to save time and reduce risk.
  4. The basic building block that we started with was Helm, the package manager for Kubernetes. Since we treat our k8s clusters as throwaway components, we needed some way to repeatably declare what should be deployed onto a cluster so we could recover the workload after replacing a cluster. For this, we used Helmfile, a tool which wraps Helm and allows one to specify a list of Helm charts to deploy onto a cluster. Helmfile then loops through this list and deploys each chart with Helm. We stored these Helmfiles in a single git repo. To orchestrate all of this, we use Drone, our Continuous Integration tool.
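The "declare a list of charts, then loop and deploy each one" idea from this note can be sketched in a few lines. This is only an illustration of the pattern, not how Helmfile is implemented: the `Release` shape, the chart paths, and the `metrics` release are hypothetical, and real Helmfiles are YAML with many more options. The sketch builds `helm upgrade --install` command lines rather than running them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One entry in a declarative per-cluster chart list (hypothetical shape)."""
    name: str
    chart: str
    namespace: str


def helm_commands(releases):
    """Return one `helm upgrade --install` argv per release, in list order."""
    return [
        ["helm", "upgrade", "--install", r.name, r.chart, "--namespace", r.namespace]
        for r in releases
    ]


# Hypothetical cluster contents, used only for illustration.
CLUSTER_RELEASES = [
    Release(name="derp", chart="charts/derp", namespace="derp"),
    Release(name="metrics", chart="charts/metrics", namespace="monitoring"),
]

for argv in helm_commands(CLUSTER_RELEASES):
    print(" ".join(argv))
```

Because the list is plain data, replaying it against a freshly built cluster is what lets the workload be recovered after a cluster is replaced.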
  5. We learned a great deal from this process :) Not everything we learned was technical. We also learned a lot about the implicit assumptions that teams had about how deploys should work, which in many cases they weren’t aware of ahead of time either. So it was very beneficial that we had this V1 process in place to tease these requirements out.
  6. The main issue was that this system had too many disjointed components. The diagram here (the one on the right) is basically how everything was wired up, except we had a separate flow for each cluster, because at the time we didn’t have a story for the cross-account AWS access that we would need to centralize the deployments. This made things difficult to operate in practice: a failure could occur at any point in any of the parallel implementations of this deployment workflow.
  7. Compounding this was the brittleness of some of the components in this system. For example, Tiller, Helm’s deploy management component, has various dead ends in its state machine that caused releases to get stuck in broken states and required operator intervention. The other major issue is that Drone is simply a CI pipeline executor. It doesn’t have any first-class knowledge of deployments and monitoring their state. It just reports success or failure for its individual tasks. Furthermore, since we’re not even executing Helm directly for deployments (we’re executing Helmfile, which then runs a bunch of Helm commands), the end result is that there is very little direct feedback to developers when they attempt to deploy their services. The feedback that developers do get is generally very opaque, and buried in noise. Finally, our choice of using a single git repo to store all of our clusters’ state means that all deployments to ANY cluster must be serial. This doesn’t improve on the status quo in our non-Kubernetes infrastructure. And if we’re not improving on that infrastructure, there’s less organizational value in moving to Kubernetes.
  8. From all this we distilled things down to requirements for the next deployment system. Some of these seem obvious in hindsight, but it was very useful to explicitly identify them. A few of the more important requirements: Deployments should be self-service. No more serialization around a central git repo. This serialization slowed down deploys, and service owners hated that. We also needed the deploy system to support deploying Helm charts. We didn’t want to couple changing the deployment system with changing how we expressed our k8s resources. We found that service owners REALLY needed deployment status to be easily visible. That means that we need to do better than just CI-style pass/fail. We need to, as much as possible, expose more granular deploy status to the service owners. And a final, more nebulous requirement: we didn’t want to introduce a system that was more complex than the previous one for service owners, at least in terms of the interface it exposes. Onboarding devs onto Kubernetes is a lot of work already, and throwing a complex deployment system into the mix only makes that worse.
  9. So what did we do with those requirements?
  10. Since we didn’t have the time or appetite to build something in-house, we went hunting for deployment tools that may have cropped up since we initially started our k8s migration. We looked at OSS things like Argo CD (which was immature at the time), and some others. The obvious more mature choice was Jenkins, but Jenkins X, its Kubernetes-focused offshoot, was still very new at the time, and we weren’t really excited about the prospect of running Jenkins JUST to do deployments, since we already had Drone for CI. So, Spinnaker was the other major alternative at this point. At the time, the V2 Kubernetes provider was still in beta, but we got some confidence from the fact that there were people successfully using it already, and we liked that the underlying Spinnaker infra was relatively mature. So we decided to investigate and do a proof of concept. A few things stood out during the evaluation process which gave us a bit more confidence. First, we liked that Spinnaker was built for deployments. It has an underlying model for what it means to deploy a service and how to reason about its state. This is opposed to the standard CI-style pass/fail model. Secondly, we had relationships with other orgs / teams that were already using it: Datadog, Google, a few others. We liked that we got very honest feedback from these peers about what it is like to run Spinnaker.
  11. From our own proof of concept and from talking to peers who were using it, it was clear that Spinnaker wasn’t going to be a drop-in replacement for our existing system that also fulfilled the requirements we identified, at least not immediately. We distilled what we found down into two major sources of risk: complexity for us as the operators, and complexity for the service owners deploying their services via Spinnaker.
  12. As operators, our primary goal was to avoid a state where we have to manage Spinnaker full-time. Our Infra team is small and resource constrained, so it is very difficult for us to adopt technology that we have to manage, as the metaphor goes, as “pets” rather than “cattle.” So, to reduce our complexity we tried to focus on making sure our Spinnaker deployment was as declarative / repeatable as possible, much like how we treat our Kubernetes clusters. The goal is to always be able to converge Spinnaker to a working state with minimal operator intervention. The primary thing that helps us achieve this goal is running Spinnaker in Kubernetes from the start. Keeping Spinnaker’s environment immutable prevents us from accidentally painting ourselves into a corner where we can’t safely redeploy everything from scratch. Second, we got our hands on an early version of a Spinnaker Helm chart. We deploy Halyard with this Helm chart, and then use Halyard to deploy Spinnaker itself. We encode all Halyard configuration into the Helm chart. We treat interacting with Halyard directly as somewhat of a smell; it should only be done when testing new things or responding to incidents. The chart also gives us a place to hook in custom logic specific to Reddit in a repeatable fashion. E.g. we pull our clusters’ k8s API credentials from Vault when we deploy the Helm chart. Finally, we made sure that we had metrics and alerting set up BEFORE we even started onboarding teams. This let us learn about how to measure and detect failures from the very start. Spinnaker’s relative maturity as a production deployment system worked to our benefit here, as its out-of-the-box metrics exposure is fairly complete.
  13. The other major pain point that we found was the complexity of managing and configuring deployment pipelines. We wanted the ability to centrally manage deployment strategies so we can easily deliver new deployment patterns to services and teams without tons of organizational elbow grease. We also wanted to avoid the point-and-click, UI-driven style of pipeline management. That style would require teams to immediately learn how Spinnaker pipelines are configured and work, e.g. about artifacts, etc. It would also mean a bunch of boilerplate in common between all of our deployments (notifications, etc). Both these things would lead developers to cargo cult everyone else’s pipelines without really understanding them. This already happens with our Helm charts, and so we knew it would make managing deploys difficult. Finally, this style of pipeline management meant that Spinnaker would be very stateful. Like with the deployment of Spinnaker itself, we really wanted to make sure that pipeline state was also declarative and repeatable. We wanted to store pipeline configuration in services’ git repos, like we do with Drone’s CI configuration. At the time, Spinnaker didn’t really provide a good story for this. Managed Pipeline Templates 1.0 were already deprecated, and there was nothing to replace them. The pattern we saw in the community that we liked the best was from Namely. They built a tool, k8s-pipeliner. This is a separate service that can take pipeline definitions expressed in YAML and write them to Spinnaker. This wasn’t an out-of-the-box solution, however. We weren’t really excited about (a) running a custom service to broker Spinnaker’s pipelines, and (b) having YET ANOTHER YAML configuration. YAML works, but as a markup language, it’s hard for developers to immediately understand how that structure maps to a deployment’s behavior. We then stumbled across Sponnet, a Jsonnet library that popped up in the Spinnaker GitHub repo. This was being built to support the nascent Managed Pipeline Templates 2.0, but we found it to be immediately useful for our needs.
  14. For us, this was our first real encounter with Jsonnet. For those that haven’t worked with it before, Jsonnet is a formal templating language for rendering JSON (or YAML).
  15. Jsonnet is object-oriented. It also allows you to define methods that can implement repeatable logic, and write libraries that expose these methods and objects. These are a couple of simple Jsonnet examples copied from their website. The top example shows the object orientation, where you can inherit and override one object from another. The bottom example shows how you can define functions that encapsulate repeatable structures.
  16. However, Jsonnet is a relatively complex language, and Spinnaker’s JSON-encoded pipelines are themselves very complex. The goal here is to reduce complexity. The Sponnet Jsonnet library is useful for building Spinnaker’s pipelines, but is very low level. It mainly consists of helpers to construct Spinnaker pipelines’ JSON structures, but leaves it to you to hook everything together. So, Sponnet by itself doesn’t do much to reduce the cognitive load on service owners just getting started with Spinnaker and possibly Kubernetes as well. After playing with Jsonnet for a bit, I realized that it had a great deal of value not just as a template engine for JSON, but as a way to express a custom Domain Specific Language to generate JSON. I wrote a Jsonnet library that wraps Sponnet and abstracts away Spinnaker’s pipelines to provide a simple fluent DSL that is easy for service owners to understand.
  17. Here’s an example of what this looks like. We have methods that allow you to define Helm charts and the clusters to deploy these charts to. There are additional methods to wire up notifications and GitHub triggers, and to add Helm values files from Git. The library handles wiring up all the Spinnaker primitives (artifact management, stages, and the like) and adds sensible defaults, like triggers for GitHub webhooks.
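To make the "library wires up the primitives" idea concrete, here is a minimal Python sketch of a fluent builder that expands one `deploy_helm_chart` call into a bake stage and a deploy stage per cluster, each gated by a per-cluster parameter. It is an analogy to the Jsonnet DSL on slide 17, not Reddit's library: the `Pipeline` class, its method names, and the trimmed-down field set are assumptions for illustration, and the emitted dictionaries are far from a complete Spinnaker pipeline.

```python
import copy
import json


def _enabled(cluster):
    """Expression that lets a deployer switch one cluster off at launch time."""
    return {"type": "expression", "expression": f'parameters["{cluster}"] == true'}


class Pipeline:
    """Hypothetical fluent builder; every method returns a new Pipeline."""

    def __init__(self, application, name):
        self.doc = {"application": application, "name": name,
                    "parameterConfig": [], "stages": [], "notifications": []}

    def _extend(self, **additions):
        clone = copy.deepcopy(self)  # keep builders immutable, like Jsonnet objects
        for key, items in additions.items():
            clone.doc[key].extend(items)
        return clone

    def deploy_helm_chart(self, chart, namespace, clusters):
        params, stages = [], []
        for cluster in clusters:
            bake = f"Bake Manifest ({chart}) - {cluster}"
            deploy = f"Deploy Manifest ({chart}) - {cluster}"
            params.append({"name": cluster, "default": True,
                           "label": f"Deploy to {cluster}"})
            stages.append({"type": "bakeManifest", "refId": bake, "name": bake,
                           "namespace": namespace, "requisiteStageRefIds": [],
                           "stageEnabled": _enabled(cluster)})
            # The deploy stage runs only after its own cluster's bake stage.
            stages.append({"type": "deployManifest", "refId": deploy, "name": deploy,
                           "account": cluster, "requisiteStageRefIds": [bake],
                           "stageEnabled": _enabled(cluster)})
        return self._extend(parameterConfig=params, stages=stages)

    def notify_slack_channel(self, channel):
        # Simplified notification entry, not the full schema.
        return self._extend(notifications=[{"type": "slack", "address": channel}])


pipeline = (Pipeline(application="derp", name="Deploy")
            .deploy_helm_chart("derp", namespace="derp",
                               clusters=["prod-ue1d", "prod-ue1e"])
            .notify_slack_channel("derp-notifications"))
print(json.dumps(pipeline.doc, indent=2))
```

The service owner writes three chained calls; the per-cluster parameters, stage names, and stage dependencies all come from the library, which is where centrally managed defaults can live.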
  18. So that pipeline definition ends up turning into this, which is Spinnaker’s JSON-encoded pipeline.
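In the rendered JSON, stages refer to each other through `refId` and `requisiteStageRefIds`, which is easy to get wrong by hand. The sketch below shows one consistency check that could be run over a rendered pipeline. It is an assumed example of such a check, not a description of what Reddit's tooling actually validates, and `lint_stage_graph` is a hypothetical helper.

```python
def lint_stage_graph(pipeline):
    """Return human-readable problems with stage references (empty list if none)."""
    problems = []
    stages = pipeline.get("stages", [])
    ref_ids = set()
    # First pass: every stage needs a unique refId.
    for stage in stages:
        ref_id = stage.get("refId")
        if ref_id is None:
            problems.append(f"stage {stage.get('name')!r} has no refId")
        elif ref_id in ref_ids:
            problems.append(f"duplicate refId {ref_id!r}")
        else:
            ref_ids.add(ref_id)
    # Second pass: every dependency must point at a stage that exists.
    for stage in stages:
        for dep in stage.get("requisiteStageRefIds", []):
            if dep not in ref_ids:
                problems.append(f"{stage.get('refId')!r} requires unknown stage {dep!r}")
    return problems


good = {"stages": [
    {"refId": "bake", "type": "bakeManifest", "requisiteStageRefIds": []},
    {"refId": "deploy", "type": "deployManifest", "requisiteStageRefIds": ["bake"]},
]}
broken = {"stages": [
    {"refId": "deploy", "type": "deployManifest", "requisiteStageRefIds": ["bake"]},
]}

print(lint_stage_graph(good))
print(lint_stage_graph(broken))
```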
  19. How do we orchestrate the rendering of the pipeline templates into Spinnaker? We continue to utilize Drone for that. In Drone, each task in a CI pipeline is a Docker container. We built a container that Drone can execute that has Jsonnet, our Jsonnet library, and the spin CLI, which we use to write to Spinnaker’s API. The container is semantically versioned. This allows us to transparently deploy new container versions for non-breaking template changes by simply updating the version tags and pushing the container. We can then fix bugs in the template rendering or improve the pipelines without any involvement from service owners. It also lets us make breaking changes and opt teams into them safely by incrementing a major version number. The container lets us lint the pipelines as part of CI. It only hits the Spinnaker API to update the pipeline on merges to master.
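The semantic-versioning contract described here (non-breaking changes flow to everyone, breaking changes require opting into a new major version) can be illustrated with a small resolver. The talk does not say how tags are actually resolved, so this is a sketch under assumptions: `newest_compatible` and the example tag list are hypothetical.

```python
import re

_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse(tag):
    """Turn '1.2.3' or 'v1.2.3' into a comparable (major, minor, patch) tuple."""
    match = _SEMVER.match(tag)
    if match is None:
        raise ValueError(f"not a semantic version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def newest_compatible(pinned_major, available_tags):
    """Pick the highest version sharing the pinned major version, or None."""
    same_major = [v for v in map(parse, available_tags) if v[0] == pinned_major]
    if not same_major:
        return None
    return "{}.{}.{}".format(*max(same_major))


# Hypothetical published container tags.
TAGS = ["1.0.0", "1.2.1", "1.10.0", "2.0.0"]

print(newest_compatible(1, TAGS))  # a team pinned to major 1
print(newest_compatible(2, TAGS))  # a team that opted into major 2
```

Comparing parsed tuples rather than raw strings matters: as text, "1.2.1" sorts after "1.10.0", which would silently hand teams an older renderer.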
  20. We’ve had Spinnaker in production now for the past 5 months or so. It drives all of our kubernetes deploys, some into multiple clusters, and gives direct feedback on deployment health to service owners. All pipelines are rendered by Jsonnet, none of them are managed by the UI. Service owners only interact with the UI to manually launch deploys. Spinnaker has been very stable and has required minimal effort to scale up, thanks to running it in Kubernetes. And because deploying Spinnaker is relatively easy for us due to the helm chart, we can keep a staging environment around for experimentation and for validating upgrades. This allows us to keep up with Spinnaker’s release cycle and opt into new features without much risk. This is important because the k8s v2 provider is still under rapid development.
  21. However, we’ve learned some valuable lessons as a part of all of this.
  22. The most important lesson that we learned is that it’s not enough to provide tools, no matter how good they are, if they’re hard for your organization to use. This applies to both Spinnaker and Kubernetes itself. It’s important to think about the abstractions that you’re providing, so you can help encourage and guide inexperienced users. Additionally, there were a few small things about operating Spinnaker that weren’t very obvious from the start. For example, it is very beneficial to have some form of log aggregation. Many failures are not exposed through Spinnaker’s UI, so you have to dig through all of the microservice logs to figure out exactly what went wrong and where. With time and experience this becomes easier as you know where to look, but it is very difficult to onboard new operators to managing Spinnaker without tooling to help. On the template side, since Jsonnet is a powerful language, developers can do pretty much anything if they want to. The risk here is that people start going nuts and we have a million different pipeline definition styles floating around. It is important to encourage developers to stay within the DSL-provided guardrails. This means that the DSL needs to be well-documented, and that we need to stay on top of how teams want their deployments to work so we can support that as part of the DSL. If we lag behind teams’ needs, devs will start messing with the plumbing to implement what they want. Finally, Helm is great for templating k8s resources if you’re experienced with k8s already. But for devs new to k8s (which most of our devs are), it doesn’t really provide a useful abstraction, as it’s mainly a text templating engine on top of YAML. Devs still need to tackle the fairly steep k8s learning curve in order to get their services deployed. We address this somewhat by templating our Helm charts, but that is adding ANOTHER layer of abstraction on top of Helm, so it’s not a perfect solution.
  23. There are some future improvements that we are excited to work on. Many of these have come up in the Spinnaker Kubernetes working group. One big annoyance is that rendering Helm charts in Spinnaker is a bit opaque, which makes it hard to debug chart bugs. There really needs to be more visibility there. Currently we rely on standard Kubernetes deployment strategies to control service rollout. We’re in the process of evaluating service meshes for Kubernetes, and one of the promises there is fine-grained traffic control during deploys. Having Spinnaker be aware of this would be really great. Additionally, all of our templating work was done prior to the release of MPT 2.0. Some of the Jsonnet logic could potentially be moved into that. Another thing that’s come up recently is better support in Spinnaker for automated canaries. Right now our canary deploys are manual, as is our observation of the canaries. As our services grow, we definitely want to start automating that flow.
  24. That’s it. You can reach me on Twitter at @asdf, or on Reddit as heselite.