SlideShare a Scribd company logo
CONTINUOUS DEPLOYMENT
                          The Dirty Details


Mike Brittain
ENGINEERING DIRECTOR   @mikebrittain
                       mike@etsy.com
CONTINUOUS DEPLOYMENT
       The Dirty Details
“OK, sounds cool. But I have some questions...”
CD 100- & 200-levels
- CI environment for automated tests
- Committing to trunk
- Branching in code
- Config flags (a.k.a. feature flags)
- DevOps mentality
- Metrics and alerting
- Automated deploy scripts




                                        credit: photobookgirl (flickr)
CD 100- & 200-levels
- CI environment for automated tests
- Committing to trunk
- Branching in code
- Config flags (a.k.a. feature flags)
- DevOps mentality                       CD 300 level
- Metrics and alerting                - Deploys vs. releases
- Automated deploy scripts
                                   - Decoupled systems, schema changes
                                   - How we work: Arch. & Process
                                   - Integration and Operations


                                                           credit: photobookgirl (flickr)
Continuous Deployment: The Dirty Details
www.   .com
GROSS MERCHANDISE SALES




http://www.etsy.com/blog/news/2013/notes-from-chad-2012-year-in-review/
DECEMBER 2012
                                                                                        1.5 Billion page views
                                                                                        $117 Million of goods sold
                                                                                        6 Million items sold




Items by anjaysdesigns, betwixxt, OneStarLeatherGoods, mediumcontrol, TheDesignPallet    http://www.etsy.com/blog/news/2013/etsy-statistics-december-2012-weather-report/
175+ Committers, everyone deploys




credit: martin_heigan (flickr)
DEPLOYMENTS PER DAY




       Very end of 2009
                          Today
Continuous delivery is a pattern language in growing use
          in software development to improve the process of
          software delivery.
                 Techniques such as automated testing, continuous integration,
           and continuous deployment allow software to be developed to a high
           standard and easily packaged and deployed to test environments,
           resulting in the ability to rapidly, reliably and repeatedly push out
           enhancements and bug fixes to customers at low risk and with
           minimal manual overhead.
                                                                            ~wikipedia
credit: Stewart, redgen (flickr)
Architecture Stack
                                  Linux, Apache, MySQL, PHP
                                  Memcache, Gearman, Postgresql, Solr,
                                  Java, Traffic Server, Hadoop, HBase
                                  Git, Jenkins




credit: Saire Elizabeth (flickr)
Continuous Deployment: The Dirty Details
Then             Now
     2009          2010-today

 Just before we
started using CD
Then                 Now
    6-14 hours            15 mins
“Deployment Army”         1 person

 Special event and    Part of everyday
highly orchestrated      workflow
Then           Now
Blocked for   Blocked for
6-14 hours.   15 minutes.

6+ hours to   15 minutes to
 redeploy       redeploy
Then                 Now
  Release branch,        Mainline,
 database schemas,     minimal linking
  data transforms,      and building,
     packaging,            rsync,
   rolling restarts,       site up
   cache purging,
scheduled downtime
1st day
Put your face on Etsy.com.
2nd day
                         Complete tax, insurance, and
                              benefits forms.




credit: ktpupp (flickr)
Continuous Deployment: The Dirty Details
WARNING
GROSS MERCHANDISE SALES
Continuous Deployment
      Small, frequent changes.
Constantly integrating into production.
        30+ deploys per day.
“Wow... 30 deploys a day.
How do you build features so quickly?”
Software Deploy ≠ Product Launch
Deploys frequently gated by config flags
             (“dark” releases)
$cfg[‘new_search’]   =   array('enabled'   =>   'off');
$cfg[‘sign_in’]      =   array('enabled'   =>   'on');
$cfg[‘checkout’]     =   array('enabled'   =>   'on');
$cfg[‘homepage’]     =   array('enabled'   =>   'on');
$cfg[‘new_search’] = array('enabled' => 'off');
$cfg[‘new_search’] = array('enabled' => 'off');

// Meanwhile...




# old and boring search
$results = do_grep();
$cfg[‘new_search’] = array('enabled' => 'off');

// Meanwhile...

if ($cfg[‘new_search’] == ‘on’) {
  # New and fancy search
  $results = do_solr();
} else {
  # old and boring search
  $results = do_grep();
}
$cfg[‘new_search’] = array('enabled' => 'on');

// or...

$cfg[‘new_search’] = array('enabled' => 'staff');

// or...

$cfg[‘new_search’] = array('enabled' => '1%');

// or...

$cfg[‘new_search’] = array('enabled' => 'users',
                           'user_list' => 'mike,john,kellan');
Validate in production, hidden from public.
What’s in a deploy?
Small incremental changes to the application
New classes, methods, controllers
Graphics, stylesheets, templates
Copy/content changes

Turning flags on, off, or % ramp up
Low MTTR (response times)
Latent bugs and security holes
Traffic management, load shedding
Adding and removing infrastructure

Tweaking config flags or releasing patches.
http://www.flickr.com/photos/flyforfun/2694158656/
Config flags
                          Operator
                                                        Metrics




http://www.flickr.com/photos/flyforfun/2694158656/
Favorites
 “on”
Favorites
 “off”
Many deploys eventually lead to a product launch.
“How do you continuously deploy
  database schema changes?”
Code deploys ~15-20 minutes
Schema changes
Code deploys ~15-20 minutes
Schema changes THURSDAYS!
Our web application is largely monolithic.
     Etsy.com, Support & Back-office tools,
     Developer API, Gearman (async work)
Our web application is largely monolithic.
     Etsy.com, Support & Back-office tools,
     Developer API, Gearman (async work)


                                PHP, Apache, Memcache
External “services” are not deployed with
          the main application.
e.g. Databases, Search, Photo storage, Payments
External “services” are not deployed with
            the main application.
   e.g. Databases, Search, Photo storage, Payments


    MYSQL                                                        PCI
                                           PROXY CACHE,
(schema changes)                                                 (controlled access)
                                        FILERS, AMAZON S3
                      SOLR, JVM
                                          (specialized infra.)
                   (rolling restarts)
For every config flag, there are two states
  we can support — present and future.
For every config flag, there are two states
  we can support — present and future.
                            ... or past and present.
“Non-Breaking Expansions”
 Expose new version in a service interface;
support multiple versions in the consumer.
Example: Changing a Database Schema
    Merging “users” and “users_prefs”
C
     RULE OF THUMB:
     Prefer ADDs over ALTERs (non-breaking expansion)
1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
0. Add new version to schema
1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
0. Add new version to schema
Schema change to add prefs columns to “users” table.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “off”
“read_prefs_from_users_table” => “off”
1. Write to both versions
Write code for writing prefs to the “users” table.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “off”
2. Backfill historical data
Offline process to sync existing data from “user_prefs”
to new columns in “users”
3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “staff”
3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “1%”
3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “5%”
3. Read from new version
Data validation tests. Ensure consistency both internally
and in production.

“write_prefs_to_user_prefs_table” => “on”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “on”


(“on” == “100%”)
4. Cut-off writes to old version
After running on the new table for a significant amount
of time, we can cut off writes to the old table.

“write_prefs_to_user_prefs_table” => “off”
“write_prefs_to_users_table” => “on”
“read_prefs_from_users_table” => “on”
“Branch by Astraction”

                            Controller                               Controller




                                              Users Model                    (Abstraction)




“users” (old)                     “user_prefs”                                          “users”

                  old schema                                                          new schema


                              http://paulhammant.com/blog/branch_by_abstraction.html
     http://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/
“The Migration 4-Step”
1. Write to both versions
2. Backfill historical data
3. Read from new version
4. Cut-off writes to old version
“When do you clean up all of those config flags?
We might remove config flags for the old version when...
It is no longer valid for the business.
It is no longer stable, maintained, or trusted.
It has poor performance characteristics.
The code is a mess, or difficult to read.
We can afford to spend time on it.
Promote “dev flags” to “feature flags”
// Feature flag
$cfg[‘mobilized_pages’] = array('enabled' => 'on');

// Dev flags
$cfg[‘mobile_templates_seller_tools’]     =   array('enabled'   =>   'on');
$cfg[‘mobile_templates_account_tools’]    =   array('enabled'   =>   'on');
$cfg[‘mobile_templates_member_profile’]   =   array('enabled'   =>   'on');
$cfg[‘mobile_templates_search’]           =   array('enabled'   =>   'off');
$cfg[‘mobile_templates_activity_feed’]    =   array('enabled'   =>   'off');

...

if ($cfg[‘mobilized_pages’] == ‘on’ && $cfg[‘mobile_templates_search’] == ‘on’) {
     // ...
     // ...
}
// Feature flags
$cfg[‘search’]                = array('enabled' => 'on');
$cfg[‘developer_api’]         = array('enabled' => 'on');
$cfg[‘seller_tools’]          = array('enabled' => 'on');

$cfg[‘the_entire_web_site’]                       = array('enabled' => 'on');
Continuous Deployment: The Dirty Details
// Feature flags
$cfg[‘search’]              = array('enabled' => 'on');
$cfg[‘developer_api’]       = array('enabled' => 'on');
$cfg[‘seller_tools’]        = array('enabled' => 'on');

$cfg[‘the_entire_web_site’]                     = array('enabled' => 'on');
$cfg[‘the_entire_web_site_no_really_i_mean_it’] = array('enabled' => 'on');
Architecture and Process
Deploying is cheap.
Releasing is cheap.
Some philosophies on product development...
Gathering data should be cheap, too.
        staff, opt-in prototypes, 1%
Treat first iterations as experiments.
Get into code as quickly as possible.
“Where a new system concept or new technology is used, one has to build a
system to throw away, for even the best planning is not so omniscient as to
get it right the first time. Hence plan to throw one away; you will, anyhow.”

                                             ~ Fred Brooks, The Mythical Man-Month
Architecture largely doesn’t matter.
Kill things that don’t work.
Your assumptions will be wrong
    once you’ve scaled 10x.
“We don’t optimize for being right. We optimize
  for quickly detecting when we’re wrong.”
                                ~Kellan Elliott-McCrea, CTO
Become really good at changing
     your architecture.
Invest time in architecture by the
       2nd or 3rd iteration.
Integration and Operations
WARNING

          REMEMBER THIS?
Continuous Deployment
      Small, frequent changes.
Constantly integrating into production.
         30 deploys per day.
Code review before commit
Automated tests before deploy
Why Integrate with Production?
Dev ≠ Prod
Verify frequently and in small batches.
Integrating with production is a test in itself.
    We do this frequently and in small batches.
More database servers in prod.
Bigger database hardware in prod.
More web servers.
Various replication schemes.
Different versions of server and OS software.
Schema changes applied at different times.
Physical hardware in prod.
More data in prod.
Legacy data (7 years of odd user states).
More traffic in prod.
Wait, I mean MUCH more traffic in prod.
Fewer elves.
Faster disks (SSDs) in prod.
Using a MySQL database in dev for an application that will be running
on Oracle in production: Priceless
Verify frequently and in small batches.
Dev ⇾ QA ⇾ Staging ⇾ Prod
Dev ⇾ QA ⇾ Staging ⇾ Prod
Dev ⇾ Pre-Prod ⇾ Prod
Test and integrate where you’ll see value.
Config flags (again)
off, on, staff, opt-in prototypes, user list, 0-100%
Config flags (again)
off, on, staff, opt-in prototypes, user list, 0-100%

                    “canary pools”
Automated tests after deploy
Real-time metrics and dashboards
Network & Servers, Application, Business
Continuous Deployment: The Dirty Details
SERVER METRICS
Apache requests/sec, Busy processes,
CPU utilization, Script exec time (med. & 95th)
APPLICATION METRICS
Logins, Registrations, Checkouts,
Listings created, Forum posts


Time and event correlated.
Continuous Deployment: The Dirty Details
Real humans reporting trouble!
“Theoretical” vs. “Practical” Engineering
Config flags
                          Operator
                                                        Metrics




http://www.flickr.com/photos/flyforfun/2694158656/
Managing risk during Holiday Shopping season
  Thanksgiving, “Black Friday,” “Cyber Monday” ➔ Christmas
                           (~30 days)



                    Code Freeze?
DEPLOYMENTS PER DAY




                      “Code Slush”
Tighten your feedback cycles
Integrate with production and validate early in cycle.
Use tools that allow you to detect issues early.
Optimize for quick response times.

Applied to both feature development and operability.
Thank you
                                                           ... and questions?


These slides will be available later today at http://mikebrittain.com/talks



    Mike Brittain
     ENGINEERING DIRECTOR       @mikebrittain
                                mike@etsy.com
Ad

Recommended

超実践 Cloud Spanner 設計講座
超実践 Cloud Spanner 設計講座
Samir Hammoudi
 
Chaos Engineering
Chaos Engineering
Yury Roa
 
The Microservices world in. NET Core and. NET framework
The Microservices world in. NET Core and. NET framework
Massimo Bonanni
 
Chaos Engineering Kubernetes
Chaos Engineering Kubernetes
Alex Soto
 
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
Getting up to speed with MirrorMaker 2 | Mickael Maison, IBM and Ryanne Dolan...
HostedbyConfluent
 
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Ana Medina
 
Introduction to Chaos Engineering with Microsoft Azure
Introduction to Chaos Engineering with Microsoft Azure
Ana Medina
 
アラート、カスタムダッシュボード、タイムラインを運用現場で活用する
アラート、カスタムダッシュボード、タイムラインを運用現場で活用する
Elasticsearch
 
Docker volume
Docker volume
MyoungSu Shin
 
Kubernetes Architecture - beyond a black box - Part 1
Kubernetes Architecture - beyond a black box - Part 1
Hao H. Zhang
 
Cross Site Scripting - Web Defacement Techniques
Cross Site Scripting - Web Defacement Techniques
Ronan Dunne, CEH, SSCP
 
Automation with Packer and TerraForm
Automation with Packer and TerraForm
Wesley Charles Blake
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
Kazuaki Ishizaki
 
HBase スキーマ設計のポイント
HBase スキーマ設計のポイント
daisuke-a-matsui
 
Serverless
Serverless
Young Yang
 
CMDBuild Ready2Use紹介資料
CMDBuild Ready2Use紹介資料
OSSラボ株式会社
 
Introduction to Git Commands and Concepts
Introduction to Git Commands and Concepts
Carl Brown
 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Databricks
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Building Repeatable Infrastructure using Terraform
Building Repeatable Infrastructure using Terraform
Jeeva Chelladhurai
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary Differences
ScyllaDB
 
マルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのこと
Amazon Web Services Japan
 
Introduction to Git and Github
Introduction to Git and Github
Roland Emmanuel Salunga
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Grafana introduction
Grafana introduction
Rico Chen
 
From Pivotal to VMware Tanzu: What you need to know
From Pivotal to VMware Tanzu: What you need to know
VMware Tanzu
 
Pivotal Container Service Overview
Pivotal Container Service Overview
VMware Tanzu
 
DevOps: Infrastructure as Code
DevOps: Infrastructure as Code
Julio Aziz Flores Casab
 
Database compatibility
Database compatibility
Pascal-Louis Perez
 
Continuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two Approaches
Ross Snyder
 

More Related Content

What's hot (20)

Docker volume
Docker volume
MyoungSu Shin
 
Kubernetes Architecture - beyond a black box - Part 1
Kubernetes Architecture - beyond a black box - Part 1
Hao H. Zhang
 
Cross Site Scripting - Web Defacement Techniques
Cross Site Scripting - Web Defacement Techniques
Ronan Dunne, CEH, SSCP
 
Automation with Packer and TerraForm
Automation with Packer and TerraForm
Wesley Charles Blake
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
Kazuaki Ishizaki
 
HBase スキーマ設計のポイント
HBase スキーマ設計のポイント
daisuke-a-matsui
 
Serverless
Serverless
Young Yang
 
CMDBuild Ready2Use紹介資料
CMDBuild Ready2Use紹介資料
OSSラボ株式会社
 
Introduction to Git Commands and Concepts
Introduction to Git Commands and Concepts
Carl Brown
 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Databricks
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Building Repeatable Infrastructure using Terraform
Building Repeatable Infrastructure using Terraform
Jeeva Chelladhurai
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary Differences
ScyllaDB
 
マルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのこと
Amazon Web Services Japan
 
Introduction to Git and Github
Introduction to Git and Github
Roland Emmanuel Salunga
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Grafana introduction
Grafana introduction
Rico Chen
 
From Pivotal to VMware Tanzu: What you need to know
From Pivotal to VMware Tanzu: What you need to know
VMware Tanzu
 
Pivotal Container Service Overview
Pivotal Container Service Overview
VMware Tanzu
 
DevOps: Infrastructure as Code
DevOps: Infrastructure as Code
Julio Aziz Flores Casab
 
Kubernetes Architecture - beyond a black box - Part 1
Kubernetes Architecture - beyond a black box - Part 1
Hao H. Zhang
 
Cross Site Scripting - Web Defacement Techniques
Cross Site Scripting - Web Defacement Techniques
Ronan Dunne, CEH, SSCP
 
Automation with Packer and TerraForm
Automation with Packer and TerraForm
Wesley Charles Blake
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
Kazuaki Ishizaki
 
HBase スキーマ設計のポイント
HBase スキーマ設計のポイント
daisuke-a-matsui
 
Introduction to Git Commands and Concepts
Introduction to Git Commands and Concepts
Carl Brown
 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Databricks
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Building Repeatable Infrastructure using Terraform
Building Repeatable Infrastructure using Terraform
Jeeva Chelladhurai
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary Differences
ScyllaDB
 
マルチテナント化で知っておきたいデータベースのこと
マルチテナント化で知っておきたいデータベースのこと
Amazon Web Services Japan
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Grafana introduction
Grafana introduction
Rico Chen
 
From Pivotal to VMware Tanzu: What you need to know
From Pivotal to VMware Tanzu: What you need to know
VMware Tanzu
 
Pivotal Container Service Overview
Pivotal Container Service Overview
VMware Tanzu
 

Viewers also liked (20)

Database compatibility
Database compatibility
Pascal-Louis Perez
 
Continuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two Approaches
Ross Snyder
 
Principles and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at Etsy
Mike Brittain
 
Continuous delivery for databases
Continuous delivery for databases
DevOpsGroup
 
Managing (Schema) Migrations in Cassandra
Managing (Schema) Migrations in Cassandra
DataStax Academy
 
From Building a Marketplace to Building Teams
From Building a Marketplace to Building Teams
Mike Brittain
 
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
John Allspaw
 
How to Get to Second Base with Your CDN
How to Get to Second Base with Your CDN
Mike Brittain
 
Take My Logs. Please!
Take My Logs. Please!
Mike Brittain
 
Continuous Deployment with Cassandra
Continuous Deployment with Cassandra
DataStax Academy
 
Advanced Topics in Continuous Deployment
Advanced Topics in Continuous Deployment
Mike Brittain
 
Fungus on White Bread
Fungus on White Bread
Gaurav Lochan
 
Scaling Up Continuous Deployment
Scaling Up Continuous Deployment
Timothy Fitz
 
Continuous Delivery in the AWS Cloud
Continuous Delivery in the AWS Cloud
Nigel Fernandes
 
Metrics-Driven Engineering at Etsy
Metrics-Driven Engineering at Etsy
Mike Brittain
 
Metrics-Driven Engineering
Metrics-Driven Engineering
Mike Brittain
 
Web Performance Culture and Tools at Etsy
Web Performance Culture and Tools at Etsy
Mike Brittain
 
Analysis of TLS in SMTP World
Analysis of TLS in SMTP World
Binu Ramakrishnan
 
The Hard Problems of Continuous Deployment
The Hard Problems of Continuous Deployment
Timothy Fitz
 
AppSec++ Take the best of Agile, DevOps and CI/CD into your AppSec Program
AppSec++ Take the best of Agile, DevOps and CI/CD into your AppSec Program
Matt Tesauro
 
Continuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two Approaches
Ross Snyder
 
Principles and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at Etsy
Mike Brittain
 
Continuous delivery for databases
Continuous delivery for databases
DevOpsGroup
 
Managing (Schema) Migrations in Cassandra
Managing (Schema) Migrations in Cassandra
DataStax Academy
 
From Building a Marketplace to Building Teams
From Building a Marketplace to Building Teams
Mike Brittain
 
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
John Allspaw
 
How to Get to Second Base with Your CDN
How to Get to Second Base with Your CDN
Mike Brittain
 
Take My Logs. Please!
Take My Logs. Please!
Mike Brittain
 
Continuous Deployment with Cassandra
Continuous Deployment with Cassandra
DataStax Academy
 
Advanced Topics in Continuous Deployment
Advanced Topics in Continuous Deployment
Mike Brittain
 
Fungus on White Bread
Fungus on White Bread
Gaurav Lochan
 
Scaling Up Continuous Deployment
Scaling Up Continuous Deployment
Timothy Fitz
 
Continuous Delivery in the AWS Cloud
Continuous Delivery in the AWS Cloud
Nigel Fernandes
 
Metrics-Driven Engineering at Etsy
Metrics-Driven Engineering at Etsy
Mike Brittain
 
Metrics-Driven Engineering
Metrics-Driven Engineering
Mike Brittain
 
Web Performance Culture and Tools at Etsy
Web Performance Culture and Tools at Etsy
Mike Brittain
 
Analysis of TLS in SMTP World
Analysis of TLS in SMTP World
Binu Ramakrishnan
 
The Hard Problems of Continuous Deployment
The Hard Problems of Continuous Deployment
Timothy Fitz
 
AppSec++ Take the best of Agile, DevOps and CI/CD into your AppSec Program
AppSec++ Take the best of Agile, DevOps and CI/CD into your AppSec Program
Matt Tesauro
 
Ad

Similar to Continuous Deployment: The Dirty Details (20)

Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty Details
Mike Brittain
 
ServerTemplate Deep Dive
ServerTemplate Deep Dive
RightScale
 
Working Software Over Comprehensive Documentation
Working Software Over Comprehensive Documentation
Andrii Dzynia
 
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
KenAtIndeed
 
[RHFSeoul2017]6 Steps to Transform Enterprise Applications
[RHFSeoul2017]6 Steps to Transform Enterprise Applications
Daniel Oh
 
Cloud Best Practices
Cloud Best Practices
Eric Bottard
 
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
DevOpsDays Tel Aviv
 
Web app and more
Web app and more
faming su
 
Web Apps and more
Web Apps and more
Yan Shi
 
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
edm00se
 
The Ember.js Framework - Everything You Need To Know
The Ember.js Framework - Everything You Need To Know
All Things Open
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Timothy Fitz
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Eric Ries
 
How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...
Jos Boumans
 
Real-World Pulsar Architectural Patterns
Real-World Pulsar Architectural Patterns
Devin Bost
 
Everything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the Web
James Rakich
 
Dev Ops without the Ops
Dev Ops without the Ops
Konstantin Gredeskoul
 
App engine devfest_mexico_10
App engine devfest_mexico_10
Chris Schalk
 
The Web Scale
The Web Scale
Guille -bisho-
 
Google App Engine for Java v0.0.2
Google App Engine for Java v0.0.2
Matthew McCullough
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty Details
Mike Brittain
 
ServerTemplate Deep Dive
ServerTemplate Deep Dive
RightScale
 
Working Software Over Comprehensive Documentation
Working Software Over Comprehensive Documentation
Andrii Dzynia
 
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
Engineering Velocity @indeed eng presented on Sept 24 2014 at Beyond Agile
KenAtIndeed
 
[RHFSeoul2017]6 Steps to Transform Enterprise Applications
[RHFSeoul2017]6 Steps to Transform Enterprise Applications
Daniel Oh
 
Cloud Best Practices
Cloud Best Practices
Eric Bottard
 
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
DevOpsDays Tel Aviv
 
Web app and more
Web app and more
faming su
 
Web Apps and more
Web Apps and more
Yan Shi
 
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
AD113 Speed Up Your Applications w/ Nginx and PageSpeed
edm00se
 
The Ember.js Framework - Everything You Need To Know
The Ember.js Framework - Everything You Need To Know
All Things Open
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Timothy Fitz
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Eric Ries
 
How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...
Jos Boumans
 
Real-World Pulsar Architectural Patterns
Real-World Pulsar Architectural Patterns
Devin Bost
 
Everything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the Web
James Rakich
 
App engine devfest_mexico_10
App engine devfest_mexico_10
Chris Schalk
 
Google App Engine for Java v0.0.2
Google App Engine for Java v0.0.2
Matthew McCullough
 
Ad

Recently uploaded (20)

A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
Priyanka Aash
 
Mastering AI Workflows with FME by Mark Döring
Mastering AI Workflows with FME by Mark Döring
Safe Software
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
cnc-processing-centers-centateq-p-110-en.pdf
cnc-processing-centers-centateq-p-110-en.pdf
AmirStern2
 
OpenPOWER Foundation & Open-Source Core Innovations
OpenPOWER Foundation & Open-Source Core Innovations
IBM
 
AI vs Human Writing: Can You Tell the Difference?
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
AI VIDEO MAGAZINE - June 2025 - r/aivideo
AI VIDEO MAGAZINE - June 2025 - r/aivideo
1pcity Studios, Inc
 
Curietech AI in action - Accelerate MuleSoft development
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
 
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Priyanka Aash
 
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
revolcs10
 
Oh, the Possibilities - Balancing Innovation and Risk with Generative AI.pdf
Oh, the Possibilities - Balancing Innovation and Risk with Generative AI.pdf
Priyanka Aash
 
PyCon SG 25 - Firecracker Made Easy with Python.pdf
PyCon SG 25 - Firecracker Made Easy with Python.pdf
Muhammad Yuga Nugraha
 
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
 
The Future of Technology: 2025-2125 by Saikat Basu.pdf
The Future of Technology: 2025-2125 by Saikat Basu.pdf
Saikat Basu
 
Lessons Learned from Developing Secure AI Workflows.pdf
Lessons Learned from Developing Secure AI Workflows.pdf
Priyanka Aash
 
MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
 
"Database isolation: how we deal with hundreds of direct connections to the d...
"Database isolation: how we deal with hundreds of direct connections to the d...
Fwdays
 
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Safe Software
 
Cyber Defense Matrix Workshop - RSA Conference
Cyber Defense Matrix Workshop - RSA Conference
Priyanka Aash
 
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
Priyanka Aash
 
Mastering AI Workflows with FME by Mark Döring
Mastering AI Workflows with FME by Mark Döring
Safe Software
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
cnc-processing-centers-centateq-p-110-en.pdf
cnc-processing-centers-centateq-p-110-en.pdf
AmirStern2
 
OpenPOWER Foundation & Open-Source Core Innovations
OpenPOWER Foundation & Open-Source Core Innovations
IBM
 
AI vs Human Writing: Can You Tell the Difference?
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
AI VIDEO MAGAZINE - June 2025 - r/aivideo
AI VIDEO MAGAZINE - June 2025 - r/aivideo
1pcity Studios, Inc
 
Curietech AI in action - Accelerate MuleSoft development
Curietech AI in action - Accelerate MuleSoft development
shyamraj55
 
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Priyanka Aash
 
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
revolcs10
 
Oh, the Possibilities - Balancing Innovation and Risk with Generative AI.pdf
Oh, the Possibilities - Balancing Innovation and Risk with Generative AI.pdf
Priyanka Aash
 
PyCon SG 25 - Firecracker Made Easy with Python.pdf
PyCon SG 25 - Firecracker Made Easy with Python.pdf
Muhammad Yuga Nugraha
 
OWASP Barcelona 2025 Threat Model Library
OWASP Barcelona 2025 Threat Model Library
PetraVukmirovic
 
The Future of Technology: 2025-2125 by Saikat Basu.pdf
The Future of Technology: 2025-2125 by Saikat Basu.pdf
Saikat Basu
 
Lessons Learned from Developing Secure AI Workflows.pdf
Lessons Learned from Developing Secure AI Workflows.pdf
Priyanka Aash
 
MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
 
"Database isolation: how we deal with hundreds of direct connections to the d...
"Database isolation: how we deal with hundreds of direct connections to the d...
Fwdays
 
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Smarter Aviation Data Management: Lessons from Swedavia Airports and Sweco
Safe Software
 
Cyber Defense Matrix Workshop - RSA Conference
Cyber Defense Matrix Workshop - RSA Conference
Priyanka Aash
 

Continuous Deployment: The Dirty Details

  • 1. CONTINUOUS DEPLOYMENT The Dirty Details Mike Brittain ENGINEERING DIRECTOR @mikebrittain mike@etsy.com
  • 2. CONTINUOUS DEPLOYMENT The Dirty Details “OK, sounds cool. But I have some questions...”
  • 3. CD 100- & 200-levels - CI environment for automated tests - Committing to trunk - Branching in code - Config flags (a.k.a. feature flags) - DevOps mentality - Metrics and alerting - Automated deploy scripts credit: photobookgirl (flickr)
  • 4. CD 100- & 200-levels - CI environment for automated tests - Committing to trunk - Branching in code - Config flags (a.k.a. feature flags) - DevOps mentality CD 300 level - Metrics and alerting - Deploys vs. releases - Automated deploy scripts - Decoupled systems, schema changes - How we work: Arch. & Process - Integration and Operations credit: photobookgirl (flickr)
  • 6. www. .com
  • 8. DECEMBER 2012 1.5 Billion page views $117 Million of goods sold 6 Million items sold Items by anjaysdesigns, betwixxt, OneStarLeatherGoods, mediumcontrol, TheDesignPallet http://www.etsy.com/blog/news/2013/etsy-statistics-december-2012-weather-report/
  • 9. 175+ Committers, everyone deploys credit: martin_heigan (flickr)
  • 10. DEPLOYMENTS PER DAY Very end of 2009 Today
  • 11. Continuous delivery is a pattern language in growing use in software development to improve the process of software delivery. Techniques such as automated testing, continuous integration, and continuous deployment allow software to be developed to a high standard and easily packaged and deployed to test environments, resulting in the ability to rapidly, reliably and repeatedly push out enhancements and bug fixes to customers at low risk and with minimal manual overhead. ~wikipedia credit: Stewart, redgen (flickr)
  • 12. Architecture Stack Linux, Apache, MySQL, PHP Memcache, Gearman, Postgresql, Solr, Java, Traffic Server, Hadoop, HBase Git, Jenkins credit: Saire Elizabeth (flickr)
  • 14. Then Now 2009 2010-today Just before we started using CD
  • 15. Then Now 6-14 hours 15 mins “Deployment Army” 1 person Special event and Part of everyday highly orchestrated workflow
  • 16. Then Now Blocked for Blocked for 6-14 hours. 15 minutes. 6+ hours to 15 minutes to redeploy redeploy
  • 17. Then Now Release branch, Mainline, database schemas, minimal linking data transforms, and building, packaging, rsync, rolling restarts, site up cache purging, scheduled downtime
  • 18. 1st day Put your face on Etsy.com.
  • 19. 2nd day Complete tax, insurance, and benefits forms. credit: ktpupp (flickr)
  • 23. Continuous Deployment Small, frequent changes. Constantly integrating into production. 30+ deploys per day.
  • 24. “Wow... 30 deploys a day. How do you build features so quickly?”
  • 25. Software Deploy ≠ Product Launch
  • 26. Deploys frequently gated by config flags (“dark” releases)
  • 27. $cfg[‘new_search’] = array('enabled' => 'off'); $cfg[‘sign_in’] = array('enabled' => 'on'); $cfg[‘checkout’] = array('enabled' => 'on'); $cfg[‘homepage’] = array('enabled' => 'on');
  • 29. $cfg[‘new_search’] = array('enabled' => 'off'); // Meanwhile... # old and boring search $results = do_grep();
  • 30. $cfg[‘new_search’] = array('enabled' => 'off'); // Meanwhile... if ($cfg[‘new_search’] == ‘on’) { # New and fancy search $results = do_solr(); } else { # old and boring search $results = do_grep(); }
  • 31. $cfg[‘new_search’] = array('enabled' => 'on'); // or... $cfg[‘new_search’] = array('enabled' => 'staff'); // or... $cfg[‘new_search’] = array('enabled' => '1%'); // or... $cfg[‘new_search’] = array('enabled' => 'users', 'user_list' => 'mike,john,kellan');
  • 32. Validate in production, hidden from public.
  • 33. What’s in a deploy? Small incremental changes to the application New classes, methods, controllers Graphics, stylesheets, templates Copy/content changes Turning flags on, off, or % ramp up
  • 34. Low MTTR (response times) Latent bugs and security holes Traffic management, load shedding Adding and removing infrastructure Tweaking config flags or releasing patches.
  • 36. Config flags Operator Metrics http://www.flickr.com/photos/flyforfun/2694158656/
  • 39. Many deploys eventually lead to a product launch.
  • 40. “How do you continuously deploy database schema changes?”
  • 41. Code deploys ~15-20 minutes Schema changes
  • 42. Code deploys ~15-20 minutes Schema changes THURSDAYS!
  • 43. Our web application is largely monolithic. Etsy.com, Support & Back-office tools, Developer API, Gearman (async work)
  • 44. Our web application is largely monolithic. Etsy.com, Support & Back-office tools, Developer API, Gearman (async work) PHP, Apache, Memcache
  • 45. External “services” are not deployed with the main application. e.g. Databases, Search, Photo storage, Payments
  • 46. External “services” are not deployed with the main application. e.g. Databases, Search, Photo storage, Payments MYSQL PCI PROXY CACHE, (schema changes) (controlled access) FILERS, AMAZON S3 SOLR, JVM (specialized infra.) (rolling restarts)
  • 47. For every config flag, there are two states we can support — present and future.
  • 48. For every config flag, there are two states we can support — present and future. ... or past and present.
  • 49. “Non-Breaking Expansions” Expose new version in a service interface; support multiple versions in the consumer.
  • 50. Example: Changing a Database Schema Merging “users” and “users_prefs”
  • 51. C RULE OF THUMB: Prefer ADDs over ALTERs (non-breaking expansion)
  • 52. 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  • 53. 0. Add new version to schema 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  • 54. 0. Add new version to schema Schema change to add prefs columns to “users” table. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “off” “read_prefs_from_users_table” => “off”
  • 55. 1. Write to both versions Write code for writing prefs to the “users” table. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “off”
  • 56. 2. Backfill historical data Offline process to sync existing data from “user_prefs” to new columns in “users”
  • 57. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “staff”
  • 58. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “1%”
  • 59. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “5%”
  • 60. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “on” (“on” == “100%”)
  • 61. 4. Cut-off writes to old version After running on the new table for a significant amount of time, we can cut off writes to the old table. “write_prefs_to_user_prefs_table” => “off” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “on”
  • 62. “Branch by Astraction” Controller Controller Users Model (Abstraction) “users” (old) “user_prefs” “users” old schema new schema http://paulhammant.com/blog/branch_by_abstraction.html http://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/
  • 63. “The Migration 4-Step” 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  • 64. “When do you clean up all of those config flags?
  • 65. We might remove config flags for the old version when... It is no longer valid for the business. It is no longer stable, maintained, or trusted. It has poor performance characteristics. The code is a mess, or difficult to read. We can afford to spend time on it.
  • 66. Promote “dev flags” to “feature flags”
  • 67. // Feature flag $cfg[‘mobilized_pages’] = array('enabled' => 'on'); // Dev flags $cfg[‘mobile_templates_seller_tools’] = array('enabled' => 'on'); $cfg[‘mobile_templates_account_tools’] = array('enabled' => 'on'); $cfg[‘mobile_templates_member_profile’] = array('enabled' => 'on'); $cfg[‘mobile_templates_search’] = array('enabled' => 'off'); $cfg[‘mobile_templates_activity_feed’] = array('enabled' => 'off'); ... if ($cfg[‘mobilized_pages’] == ‘on’ && $cfg[‘mobile_templates_search’] == ‘on’) { // ... // ... }
  • 68. // Feature flags $cfg[‘search’] = array('enabled' => 'on'); $cfg[‘developer_api’] = array('enabled' => 'on'); $cfg[‘seller_tools’] = array('enabled' => 'on'); $cfg[‘the_entire_web_site’] = array('enabled' => 'on');
  • 70. // Feature flags $cfg[‘search’] = array('enabled' => 'on'); $cfg[‘developer_api’] = array('enabled' => 'on'); $cfg[‘seller_tools’] = array('enabled' => 'on'); $cfg[‘the_entire_web_site’] = array('enabled' => 'on'); $cfg[‘the_entire_web_site_no_really_i_mean_it’] = array('enabled' => 'on');
  • 74. Some philosophies on product development...
  • 75. Gathering data should be cheap, too. staff, opt-in prototypes, 1%
  • 76. Treat first iterations as experiments.
  • 77. Get into code as quickly as possible.
  • 78. “Where a new system concept or new technology is used, one has to build a system to throw away, for even the best planning is not so omniscient as to get it right the first time. Hence plan to throw one away; you will, anyhow.” ~ Fred Brooks, The Mythical Man-Month
  • 80. Kill things that don’t work.
  • 81. Your assumptions will be wrong once you’ve scaled 10x.
  • 82. “We don’t optimize for being right. We optimize for quickly detecting when we’re wrong.” ~Kellan Elliott-McCrea, CTO
  • 83. Become really good at changing your architecture.
  • 84. Invest time in architecture by the 2nd or 3rd iteration.
  • 86. WARNING REMEMBER THIS?
  • 87. Continuous Deployment Small, frequent changes. Constantly integrating into production. 30 deploys per day.
  • 90. Why Integrate with Production?
  • 92. Verify frequently and in small batches.
  • 93. Integrating with production is a test in itself. We do this frequently and in small batches.
  • 94. More database servers in prod. Bigger database hardware in prod. More web servers. Various replication schemes. Different versions of server and OS software. Schema changes applied at different times. Physical hardware in prod. More data in prod. Legacy data (7 years of odd user states). More traffic in prod. Wait, I mean MUCH more traffic in prod. Fewer elves. Faster disks (SSDs) in prod.
  • 95. Using a MySQL database in dev for an application that will be running on Oracle in production: Priceless
  • 96. Verify frequently and in small batches.
  • 97. Dev ⇾ QA ⇾ Staging ⇾ Prod
  • 98. Dev ⇾ QA ⇾ Staging ⇾ Prod
  • 99. Dev ⇾ Pre-Prod ⇾ Prod
  • 100. Test and integrate where you’ll see value.
  • 101. Config flags (again) off, on, staff, opt-in prototypes, user list, 0-100%
  • 102. Config flags (again) off, on, staff, opt-in prototypes, user list, 0-100% “canary pools”
  • 104. Real-time metrics and dashboards Network & Servers, Application, Business
  • 106. SERVER METRICS Apache requests/sec, Busy processes, CPU utilization, Script exec time (med. & 95th) APPLICATION METRICS Logins, Registrations, Checkouts, Listings created, Forum posts Time and event correlated.
  • 110. Config flags Operator Metrics http://www.flickr.com/photos/flyforfun/2694158656/
  • 111. Managing risk during Holiday Shopping season Thanksgiving, “Black Friday,” “Cyber Monday” ➔ Christmas (~30 days) Code Freeze?
  • 112. DEPLOYMENTS PER DAY “Code Slush”
  • 113. Tighten your feedback cycles Integrate with production and validate early in cycle. Use tools that allow you to detect issues early. Optimize for quick response times. Applied to both feature development and operability.
  • 114. Thank you ... and questions? These slides will be available later today at http://mikebrittain.com/talks Mike Brittain ENGINEERING DIRECTOR @mikebrittain mike@etsy.com