2. “Continuous deployment involves
deploying early and often, so as to
avoid the pitfalls of "deployment hell".
The practice aims to reduce timely rework and thus
reduce cost and development time.”
3. “Our highest priority is to satisfy
the customer through early
and continuous delivery of
valuable software.”
- Principles behind the Agile Manifesto (2001)
4. The Hard Problems
(The ones I’m going to cover)
• Data
• Mobile
• Scale
5. Data makes CD harder
• Updating data is slow
• Moving data is slow
• Schemas live outside code deploy process
• Rollbacks are often hard or impossible
6. Schema rev. N, N+1 compatible
• Add the column in production
• Push the code that writes to that column
• Optionally, run a data migration to populate
the existing rows with data
• Push the code that reads from that column.
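The four steps above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `users` table, the `nickname` column and the backfill rule (copy `name`) are made-up assumptions, and a production database would run each step as a separate deploy.

```python
import sqlite3

def add_column(conn):
    # Step 1: additive schema change; code written for rev. N ignores the column.
    conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")

def write_user(conn, name, nickname=None):
    # Step 2: new code writes the column.
    conn.execute(
        "INSERT INTO users (name, nickname) VALUES (?, ?)", (name, nickname)
    )

def backfill(conn):
    # Step 3 (optional): populate rows that existed before the column did.
    # Copying `name` is a hypothetical backfill rule for this sketch.
    conn.execute("UPDATE users SET nickname = name WHERE nickname IS NULL")

def read_nickname(conn, name):
    # Step 4: only now does any code read the column.
    row = conn.execute(
        "SELECT nickname FROM users WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")  # row from rev. N
add_column(conn)
write_user(conn, "bob", "bobby")
backfill(conn)
```

Each function can be shipped and reverted on its own, which is what keeps rev. N and rev. N+1 compatible at every point.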
7. Apply only cheap changes
• Only apply changes that are cheap enough to
not affect live traffic
• More complex changes split into tiny steps:
– Create new table
– Write to both
– Cut over eventually
– Drop old table
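A rough sketch of the create / write-to-both / cut-over / drop sequence. Plain dicts stand in for the two tables, and the class and method names are hypothetical, chosen only to mirror the steps on this slide.

```python
class DualWriteStore:
    """Toy model of a table swap; dicts stand in for database tables."""

    def __init__(self, existing_rows):
        self.old_table = dict(existing_rows)
        self.new_table = {}          # step 1: create new table
        self.read_from_new = False   # reads stay on the old table for now

    def put(self, key, value):
        # Step 2: write to both for as long as the old table exists.
        if self.old_table is not None:
            self.old_table[key] = value
        self.new_table[key] = value

    def get(self, key):
        table = self.new_table if self.read_from_new else self.old_table
        return table.get(key)

    def cut_over(self):
        # Step 3: backfill rows that predate dual writes, then flip reads.
        for key, value in self.old_table.items():
            self.new_table.setdefault(key, value)
        self.read_from_new = True

    def drop_old(self):
        # Step 4: only safe once nothing reads the old table.
        if not self.read_from_new:
            raise RuntimeError("cut over before dropping the old table")
        self.old_table = None
```

Until `drop_old()` runs, every step can be undone by flipping `read_from_new` back, because the old table keeps receiving writes.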
8. Apply change to standby
• Run two DB instances
• Apply change to standby
• Failover if successfully applied
• Might run 3rd db instance for availability
9. Blue/Green Deploy
• Run two copies of entire cluster
• All databases are replicated
• Lets you test, update and rollback both code
and schema in one step
13. Mobile
• Users must opt-in to every update
• iOS submissions take a week to be approved
• Luckily, lots of tools aimed at the space
14. Remember the basics
• CI server / automated tests are critical
• Can’t fall back on production alarms / rollback
• Hosted options: http://cisimple.com
15. Data-drive everything
• Build your views / content from data files
• Ping server for updates
• Hosted: http://appgrok.com/
– Lets you deploy txt, png and xib dynamically!
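One way the "ping server for updates" step might work is to compare a local manifest of content hashes against one fetched from the server. The sketch below shows only that comparison; the manifest shape and the function name are assumptions, and it is not the API of any hosted service named on this slide.

```python
def files_to_fetch(local_manifest, remote_manifest):
    """Return the names whose hash is new or changed on the server.

    Both manifests map a file name to a content hash (hypothetical format).
    """
    return sorted(
        name
        for name, digest in remote_manifest.items()
        if local_manifest.get(name) != digest
    )

local = {"home.txt": "aaa", "logo.png": "bbb"}
remote = {"home.txt": "aaa", "logo.png": "ccc", "promo.xib": "ddd"}
changed = files_to_fetch(local, remote)
```

Here `changed` holds `logo.png` (hash differs) and `promo.xib` (new file); `home.txt` is skipped because its hash matches.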
16. 99% HTML
• Entire app is a single UIWebView
• Glue native code to allow access to APIs
• Clutch.io is awesome (and FOSS now)
– Live reloading for local dev
– Streamlined deploys
– https://github.com/clutchio
17. Hybrid HTML/Native
• Core app is native
• Sections can be replaced by HTML
– e.g. Facebook stream entries fall back to HTML
• Infrequently used sections are 100% HTML
18. Recap
• Remember the basics
• Data-drive everything
• 99% HTML
• Hybrid HTML/Native
24. How do you make tests fast?
• Tests can exercise large amounts of code
without being slow
• Minimize system calls (no I/O, no disk)
• Minimize test data size
• Make sure all systems are cheap to
instantiate/teardown
• No external state makes tests more reliable
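A minimal sketch of the "no I/O, tiny test data, cheap to instantiate" idea: the code under test receives its lookup table as a parameter, so a test can hand it a two-line in-memory fake instead of loading a full table from disk. All names here are hypothetical.

```python
class InMemoryGeo:
    """Test double: a tiny fixed table instead of a real IP-geo database."""

    def __init__(self, table):
        self._table = table

    def country_for(self, ip):
        return self._table.get(ip, "??")


def greeting_for(ip, geo):
    # The dependency is passed in, so tests never touch disk or network.
    return "Bonjour" if geo.country_for(ip) == "FR" else "Hello"


geo = InMemoryGeo({"203.0.113.7": "FR"})  # minimal test data
```

Building `InMemoryGeo` costs one dict allocation, so every test can create its own instance and no state leaks between tests.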
25. Run Tests in Parallel
• Multiprocess
• Multimachine
• Multi-VM
• Instant multi-VM: http://circleci.com
26. Hardware Scale
• CI Cluster will get huge
– Function of cumulative engineering man-months
– Rule of thumb: 10% of your cluster size
• You will need a CI/CD DevOps person
– CI cluster monitoring / alerting
– Configuration Management critical
27. Scale testing infrastructure recap
• Write the right kind of tests
• Make those tests as fast as possible
• Run those tests in parallel
28. People / Roles
• Sheriff
– Designated reverter / problem troubleshooter
– Common pattern (IMVU, Chromium, Firefox)
• CD “Product Owner”
– Held accountable for SLA / Performance
– Manage infrastructure backlog
29. Single trunk
• Do this until it doesn’t work for you
• Gets painful in the 16 – 32 developer range
• Faster commit->deploy reduces the pain
– But effort becomes prohibitive
30. “Try” pipeline
• Conceptually, a second tree that “doesn’t
matter” but still gets tested for feedback
• Buildbot implements a patch-pushing version
• Takes a significant amount of pressure off of
trunk builds
31. CI Server takes active role
• Server automatically reverts red commits
• Server merges green commits to trunk
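The two behaviors reduce to a small decision rule. A sketch, with a hypothetical function name and return values; a real server would call its VCS and build system where this returns a tuple:

```python
def next_action(commit_id, tests_passed, on_trunk):
    """Decide what the CI server does with a tested commit."""
    if on_trunk and not tests_passed:
        return ("revert", commit_id)   # red commit on trunk: back it out
    if not on_trunk and tests_passed:
        return ("merge", commit_id)    # green candidate: land it on trunk
    return ("none", commit_id)         # green on trunk, or red candidate
```

A red candidate is simply left unmerged, which is why the merge-on-green mode keeps trunk green without any reverts.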
32. Feature branches
• All incremental development happens on
branches, branches land when feature is
“ready”
• If “feature” is kept small, can be 2-3 per
engineer per week on average
• Less continuous, but scales much better
– Feature branches tested before merge
33. Merge tree
• Tree per team / feature
• Trees merged into trunk daily (if green)
• Scale up via tree of trees (of trees…)
• Again, less continuous
34. Federation
• Each team gets their own deploy pipeline
• Requires SOA / component architecture
• Each team can set their own CD pace
• “Enterprise Ready”
35. Recap
• Single trunk + Try pipeline / Autorevert
• Feature Branches
• Merge Tree
• Federation
About me: IMVU, Canvas, Continuous Deployment. Ground rules: I don’t demand your attention, please tweet / follow links while I’m talking. If you have questions, shoot up a hand. If I don’t see you, yell at me.
Continuous Deployment vs. Continuous Delivery (next slide)
Very similar. Continuous Delivery is “agile as it should be”. Continuous Delivery does NOT advocate for deploy-on-commit. (The book and the term are a reaction to Continuous Deployment.) Continuous Deployment advocates for MORE THAN deploy-on-commit. (August 2007, 5 years old!) At the end of the day, I don’t care. People seem to be using both terms interchangeably. Whatever you call it, however far you take it, it’s better than not deploying very often.
These are the problems I get asked about all the time. No silver bullets.
In general I assume some sort of LAMP stack.
Seems straightforward, but often means what would’ve been a 1 step process (with downtime) is now a 15 step process. BUT! If you follow it well, you can always step backwards, one small step at a time, in the event that it doesn’t work out / is wrong
Sharding your data and keeping your shards balanced (and measured) means you can quantitatively assess this, especially if you keep an offline copy of a single shard. E.g. keep 100k users per shard, letting you do almost anything to any O(user) table.
Not hitting on master-master / master-standby / etc distinctions, because the exact setup depends on your db / replication / required availability.
Hybrid SQL db and NoSQL db, use each where appropriate
Most real world scenarios are a hybrid
Experience heavily biased to iOS, would love to hear Android perspective
Don’t just throw up your arms and say “CD can’t be done”. Apparently cozying up with Apple can lower your submission turnaround time from 1 week to 1 day.
Trades development speed for performance. For most apps that’s probably the right trade (users don’t really notice).
Get back the performance you were missing, and native scrolling. Lose some of the dynamic properties. Much more expensive to develop.
Commit to deploy:
• < 5 minutes: stay in flow
• 5 – 15 minutes: can keep working on feature
• > 15 minutes: failures are surprises and require expensive rewinding
Local dev loop:
• < 2s: stay in tight flow
• 2s – 10s: tab away from terminal, looser flow
• 10s – 1min: start thinking or coding on next thing, failures require rewinding
• 1min – 5min: significant rewinding, high distraction, painful testing (rolling chair jousting)
Example: All metrics green, everything looks great, but they got to those metrics by shaming anyone who breaks the build. A culture of tip-toeing through the build system leads to reduced happiness (and reduced throughput!). Sidenote: Measure throughput! Deploys per engineer (avg/median/extremes): is it scaling up with the org, or are you getting fewer deploys?
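A quick sketch of the deploys-per-engineer measurement, assuming you can export one author name per deploy from your deploy log (the input format and function name are hypothetical):

```python
from collections import Counter
from statistics import mean, median

def deploys_per_engineer(deploy_authors, engineers):
    """Summarize deploy counts across the whole team, including zeros."""
    counts = Counter(deploy_authors)
    per_engineer = [counts.get(e, 0) for e in engineers]
    return {
        "mean": mean(per_engineer),
        "median": median(per_engineer),
        "min": min(per_engineer),
        "max": max(per_engineer),
    }

stats = deploys_per_engineer(
    ["ann", "ann", "bob", "ann", "cy"],   # one entry per deploy
    ["ann", "bob", "cy", "dee"],          # full roster, so non-deployers count
)
```

Passing the full roster matters: engineers with zero deploys pull the median down, which is exactly the signal a tip-toeing culture would otherwise hide.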
• GUI Tests: e.g. Selenium
• Integration Tests: i.e. uses the database
• Unit Tests: small fast tests with no external state
E.g. don’t reload your country code or ipgeo tables, even into memory from disk, on every test
Highly recommend Buildbot when you hit scale. It’s proven at huge scale (500+ nodes) and growing, and allows way better pipeline customization at that scale
Test time is O(cumulative man-months). Doubling staff means halving test time, in spite of ever-marching-on increase in tests.
GitHub model. Most FOSS is moving to or using this model.
Lots of people/process overhead. Classical way of scaling up to extremely large teams.
Fortune 500-proof.
Remember: Don’t overinvest up front. Better to do something simple until it doesn’t work, than to overbuild. Don’t underbuild either. Use