Michael Ewins 
@ewinsmi 
Highland Fling Sessions – 22 November 2014
Share some experiences shipping software 
products 
Share a playlist of 80’s music 
Can you spot all the songs and bands?
Don’t You (Forget About Me)
 1
Homepage showing a list of games 
Game page 
HTML5 Games 
Adverts 
Run it on a server 
Test on multiple devices 
Is That All?
Does it work? 
Will customers like it? 
Can it handle the load? 
Is it fast enough? 
How to measure activity? 
How to spot things going wrong? 
What to do when things go wrong? 
Who does what day-to-day? 
How can we make changes? 
How do customers get help? 
Do we keep customers safe? 
How will I get customers? 
What is the fulfilment process? 
Can the team continue to deliver?
Where does the data in the UI come from? 
How frequently will the data be updated? 
Does it behave the same on all channels? 
How to measure the success of this change? 
Does this require any new reports?
Identify pause points 
Quick to use 
Simple 
Exact 
The real world
Please, Please, Please Let Me Get 
What I Want 
 2 
Match existing platform capabilities 
PLUS custom / one-off solutions 
PLUS new items we’ve been waiting on 
PLUS migrate ALL the partners and 
customers seamlessly 
And do it in 18 months
Limit the scope 
Defer the rest 
Minimise risks 
Maximise the chance of success 
Get something sooner
[Diagram labels: Minimum Product; Custom Logon; Custom Payment; Custom UI; i18n; Migrate; Migrate + i18n; Migrate + EN]
Show progress 
Avoid surprises 
Remove potential for misunderstanding 
Build credibility 
Real feedback 
Sense of urgency
High Impact, High Frequency: 1 – CRITICAL
• Critical path
• Blocks progress / Data loss
• Crash
• Severe performance
High Impact, Low Frequency: 2 – MAJOR
• Infrequently used
• Crash
• Complex / time-consuming workaround
Low Impact, High Frequency: 3 – INTRUSIVE
• Likely to encounter
• Simple workaround
• Annoying cosmetic issue
Low Impact, Low Frequency: 4 – MINOR
• Obscure
• Cosmetic
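The impact / frequency grid above can be read as a two-question lookup. The sketch below is one possible encoding, not the team's actual tooling; the names `Severity` and `classify` are made up for illustration.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = 1
    MAJOR = 2
    INTRUSIVE = 3
    MINOR = 4


def classify(high_impact: bool, high_frequency: bool) -> Severity:
    """Map the two yes/no questions from the grid onto a severity level."""
    if high_impact:
        return Severity.CRITICAL if high_frequency else Severity.MAJOR
    return Severity.INTRUSIVE if high_frequency else Severity.MINOR


# A crash in a rarely used screen: high impact, low frequency.
print(classify(high_impact=True, high_frequency=False).name)  # prints "MAJOR"
```

Reducing the decision to two booleans is what removes the subjectivity: the argument moves from "how bad is this bug?" to "is it on a path users hit often, and does it block them?".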
Build + The Light Is Always Green
 3
Commit to version control linked to issue 
tracking 
Triggers build & unit tests / code analysis 
Triggers creation of versioned release 
artifacts & release notes 
Triggers deployment & functional tests
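The chain of triggers above can be sketched as a list of stages where each stage only runs if the previous one passed. This is a toy model under stated assumptions: the stage names are taken from the slide, while `run_pipeline` and the lambda steps are hypothetical stand-ins for a real CI server.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure, so later stages
    never run against a broken build. Returns a log of what happened."""
    log: List[str] = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # halt the line until the build is fixed
    return log


stages: List[Stage] = [
    ("build & unit tests / code analysis", lambda: True),
    ("versioned release artifacts & release notes", lambda: True),
    ("deployment & functional tests", lambda: False),  # simulated failure
]
for line in run_pipeline(stages):
    print(line)
```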
Over the Wall
 4
42 people organised into feature teams 
Feature branches & merge to trunk 
Handover to Production Support 
‘Release train’ every 1-2 weeks 
39 major releases in 2 years (2012-2013) 
Each release took approx 5 - 6 hours 
Also patched releases in between major releases 
Let me explain some of the pain…
[Diagram: feature branching. Labels: TRUNK; Create branch (twice); Feature development; Update branch; Merge (twice)]
[Diagram: release branching. Labels: TRUNK; Bug fix (rev 100); Create 2-1-0 branch (CANDIDATE RELEASE); Bug fix (rev 101); Merge bug fix needed in 2-1-0; New feature (rev 102); Create 2-2-0 branch (CANDIDATE RELEASE); Hot fix (rev 103); Merge bug fix to trunk; Hot fix (rev 104); Merge bug fix to trunk and also 2-2-0; 2-1-0 end of life]
Create branches 
Any open issues in trunk from handover? 
Run the standard tests (automated but slow) 
Any specific issues that need regression testing?
Rehearse the deployment 
Sign-offs 
Release to production 
Post release checking 
Enable feature flags
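The last step above, enabling feature flags after the release, relies on code shipping with the feature switched off. A minimal in-memory sketch of that idea follows; `FeatureFlags`, `render_homepage` and the flag name `new_leaderboard` are all hypothetical, and a real system would keep flags in configuration or a database.

```python
class FeatureFlags:
    """In-memory flag store; any flag not explicitly enabled is off."""

    def __init__(self) -> None:
        self._enabled = set()

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def render_homepage(flags: FeatureFlags) -> str:
    # "new_leaderboard" is a made-up flag name for illustration.
    if flags.is_enabled("new_leaderboard"):
        return "homepage with leaderboard"
    return "homepage"


flags = FeatureFlags()
print(render_homepage(flags))    # deployed, feature still off
flags.enable("new_leaderboard")  # switched on after post-release checks
print(render_homepage(flags))
```

Defaulting unknown flags to off means a deployment on its own never changes behaviour; turning the feature on is a separate, reversible step.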
Small multi-disciplinary team of 8 people 
11 x Microservices 
Mixture of deployment models & tools 
Heavy reliance on test automation 
396 releases in 2014 (Jan – Nov) 
75% of team have performed deployments 
Each release takes 2 – 15 min
Easier to understand / transparent 
Align to small teams 
Independently deployable 
Easier to debug / test / change / replace 
Lower lead times & cycle times 
Less technical debt 
Easier to experiment with new ideas / tech
Stick to one version or release multiple 
versions of the same service? 
What happens if a change depends on 
multiple services? 
What about deploying database changes?
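For the database question, the speaker notes describe an add-first, delete-last sequence: add the new table or column and populate it, roll out code that uses it, roll out code that stops using the old structure, then delete it once nobody uses it. The sketch below walks through that sequence with SQLite in memory. The schema (`player`, `player_email`) and the helper `email_for` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE player_email (player_id INTEGER, email TEXT);
    INSERT INTO player VALUES (1, 'alice');
    INSERT INTO player_email VALUES (1, 'alice@example.com');
""")

# Step 1: add the new column and back-fill it.
# Code that still reads player_email keeps working.
conn.execute("ALTER TABLE player ADD COLUMN email TEXT")
conn.execute("""
    UPDATE player
    SET email = (SELECT pe.email FROM player_email AS pe
                 WHERE pe.player_id = player.id)
""")


# Step 2: roll out code that reads the new column.
def email_for(db: sqlite3.Connection, player_id: int):
    row = db.execute(
        "SELECT email FROM player WHERE id = ?", (player_id,)
    ).fetchone()
    return row[0] if row else None


# Step 3: roll out code that no longer touches player_email (a code
# deployment, so there is nothing to run against the database here).

# Step 4: once nothing uses the old table, delete it.
conn.execute("DROP TABLE player_email")

print(email_for(conn, 1))
```

Each step is safe to deploy on its own, which is what keeps the database change compatible with releasing one service at a time.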
Performance
 5
Ref: http://www.webpagetest.org/result/140304_0C_DAA/
DNS lookup 
TCP handshake 
HTTP request 
HTTP response
Review the purpose & cost of every asset 
Make fewer requests 
Make responses smaller 
Reduce round trip time 
Understand critical rendering 
Establish a performance budget & stick to it
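A performance budget is easier to stick to when it is checked automatically. The sketch below uses the mobile web figures from the speaker notes for this slide (less than 50 requests, less than 600K, 0 redirects). Treating "600K" as 600 × 1024 bytes is an assumption, and `PageStats` and `budget_violations` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    requests: int
    total_bytes: int
    redirects: int


def budget_violations(stats: PageStats) -> List[str]:
    """Return one message per broken budget line; empty means within budget."""
    problems: List[str] = []
    if stats.requests >= 50:
        problems.append(f"{stats.requests} requests (budget: fewer than 50)")
    if stats.total_bytes >= 600 * 1024:  # assumption: 600K means 600 KiB
        problems.append(f"{stats.total_bytes} bytes (budget: less than 600K)")
    if stats.redirects > 0:
        problems.append(f"{stats.redirects} redirects (budget: 0)")
    return problems


page = PageStats(requests=62, total_bytes=450 * 1024, redirects=1)
for problem in budget_violations(page):
    print("OVER BUDGET:", problem)
```

Run as part of the build, a non-empty result could fail the pipeline in the same way as a failing test.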
Monitor
 6 
Monitor from the outside 
 Response times 
 Transactions 
 RUM 
System level monitoring 
Email alerts link to the runbook 
Aggregate patterns in log files 
Graphite dashboard including release 
indicators
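The speaker notes for this section describe two routes for an alert: call or text via PagerDuty when users are affected, and email for other threshold breaches, with the email linking to the runbook. The sketch below models only that routing decision. It does not call any real paging or email API; `Reading`, `route`, `alert_text` and the runbook URL are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    check: str
    value: float
    threshold: float
    user_impact: bool
    runbook_url: str  # hypothetical wiki link


def route(reading: Reading) -> Optional[str]:
    """Return None when within threshold, otherwise 'page' or 'email'."""
    if reading.value <= reading.threshold:
        return None
    return "page" if reading.user_impact else "email"


def alert_text(reading: Reading) -> str:
    """Build the alert body, always including the runbook link."""
    return (f"{reading.check}: {reading.value} exceeds {reading.threshold}. "
            f"Runbook: {reading.runbook_url}")


slow_homepage = Reading(
    check="homepage response time (s)",
    value=4.2,
    threshold=2.0,
    user_impact=True,
    runbook_url="https://wiki.example.com/runbooks/homepage",
)
print(route(slow_homepage))
print(alert_text(slow_homepage))
```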
Frequency 
Location 
Scripting language 
 Goto URL 
 Form filling including logons 
 Interact via id / selector / label 
Versioning 
Escalation policy
It’s The End Of The World As We Know It
 7
A BOMB HAS LANDED IN THE COMPOUND... 
Henry: (reading instructions) And carefully cut 
the wires leading to the clockwork fuse at the 
head. 
(Trapper cuts the wires) 
Henry: But first, remove the fuse. 
(Everyone exchanges panicked looks, Trapper 
listens to the bomb with a stethoscope) 
Trapper: It stopped ticking. 
Hawkeye: Let's get the hell out of here. We've 
only got 2 minutes...maybe.
Public readme on GitHub for 
our Game SDK 
One page / 9 quick steps 
Post a score 
Save / restore game data 
Trigger adverts
Each service has a one pager on the wiki 
Overview Diagram 
Links to code + CI build + test artifacts 
Dependencies (link to APIs / integrations) 
API + how to use 
How to deploy, validate & rollback 
Monitoring (links) 
Troubleshooting advice 
Data & Database
Where are the inputs? 
What is happening? 
Where are the outputs?
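These three questions map directly onto log statements. The speaker notes suggest INFO / WARN / ERROR levels, a time stamp, the input and output of each step, and the error context. Here is a minimal sketch using Python's standard `logging` module; the service name `score-service` and the function `post_score` are invented for the example.

```python
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("score-service")  # hypothetical service name


def post_score(player_id: str, raw_score: str) -> int:
    # Where are the inputs?
    log.info("post_score input player_id=%s raw_score=%r", player_id, raw_score)
    try:
        score = int(raw_score)
    except ValueError:
        # Report enough context to diagnose without reading the code.
        log.error("post_score rejected player_id=%s raw_score=%r "
                  "reason=not-an-integer", player_id, raw_score)
        raise
    if score < 0:
        # What is happening?
        log.warning("post_score clamped negative score player_id=%s score=%d",
                    player_id, score)
        score = 0
    # Where are the outputs?
    log.info("post_score output player_id=%s score=%d", player_id, score)
    return score


post_score("p1", "42")
```

Someone who did not write the code can follow a single request through the log by matching the input and output lines for each step.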
Michael Ewins 
@ewinsmi
What the music of the 1980s taught me about shipping software


Editor's Notes

  • #3 Start – The Jam - 1980
  • #4 Simple Minds – peaked at number 7 in the UK Singles Chart on 4th May 1985 http://www.officialcharts.com/archive-chart/_/1/1985-05-04
  • #5
    1. Greenfield HTML5 Games Portal for mobile / tablet / PC running on multiple channels.
    2. 6 weeks of effort: started development in the first week of December 2013, took 2 weeks of holidays, and were ready to launch by the last week of January 2014.
    3. Launched in mid-February 2014, as we needed a decision on the launch games.
    4. Team of 7: 3 back-end, 1 front-end, 1 Ops, 1 QA + me (and we also had to deal with other requests).
  • #6 What You Need – opening track on the 1985 album Listen Like Thieves by INXS. Is That All? – from the U2 album October, released 1981. These are multiple features, so each feature should have its own definition of done.
  • #7 https://www.nccgroup.com/media/57196/ncc_group_55_killer_questions_website_performance.pdf Checklists help us avoid mistakes – where the knowledge already exists yet we run the risk of failing to apply it correctly
  • #8 Ripple effect Feature flags & deployment sequence – always deploy with feature switched off
  • #9 They help to improve outcomes with no increase in skill. Pause points: ready to start; ready to merge (a feature branch); and so on. 5-9 questions / points in the domain language Not a comprehensive how-to guide. We use checklists all the time in our development process. The Real World – album track by The Mighty Lemon Drops from their 1989 album Laughter
  • #10 Hatful of Hollow released in 1984. Number 28 in John Peel Festive 50 of 1984 There is a tension that can exist in some teams or between teams. In my experience this can sometimes arise between people in different locations not having an appreciation of the challenges in other half of the team. This can happen at product and feature level.
  • #11 Proposed in 2009 Delay in signing contracts into Jan 2010 (platform tech + external dev) May 2010: project tech handover by Architect – 1 week handover & scope not defined. Go live was meant to be September 2010. Phoenix – The Cult “Love” album
  • #12 May 2010: project handover; External dev to internal team; Architect – 1 week handover & scope not defined. Go live was meant to be September 2010.
    Existing: Multi-tenant (275 channels), i18n, Ecommerce (merchandise, catalog, checkout / tax, fulfilment), Support, GS subscriptions, digital downloads & fulfilment, analytics & reporting, emails, etc.
    Custom: SSO, Billing, UI, Text, Feeds.
    New: time to make decision + merchandising workflow, personalisation / recommendations, PayPal + Adyen, UI flows, client software, ratings, FB integration, etc. Some new capabilities gained by ecommerce platform being used.
    18 months: means 30 partners per month.
    The Clash – technically London Calling was released 14-Dec 1979 but really an album most listened to in the 80s.
  • #13 https://twitter.com/paulg/status/389213926769958912 A wish away – The Wonder Stuff – 8 legged groove machine
  • #14 V2 – single by That Petrol Emotion in 1984. Re-released on their Manic Pop Thrill album. Core platform: content merchandising workflow + channel merchandising + storefront + purchase / fulfilment + pay & get paid + user support (self-service UI). 1 partner + no custom integrations (SSO / payment) + no custom UI + English language only. GameSaver was initially deferred but added before we launched. ATG 9 / Commerce Reference Store – working software.
  • #15 Iterate and evolve. Extend platform >> what channels can we migrate now? Started Feb 11 and went live in Aug 11. We migrated approx 85 partners in total including custom capabilities. This approach gave us a path to evolve and split the work. We shut down approx 150 partners. We never did 2-byte support + custom OEM features.
  • #16 A distributed team exaggerates the communication issues (product team in NY). We gave our first demo after 2 weeks and then gave weekly demos. This applies to demos or the full feature / product. Ultimately these steps are core to what we did after we were acquired by iWin.
  • #17 QA reported issues: blame culture historically existed if they missed things Product team: every bug is MUST fix We wanted to remove the subjective nature of issues Out by 1 pixel bugs Lack of test coverage – more bugs, more stuff you will never fix
  • #18 Double A-side from The People Who Grinned Themselves to Death (second album). Build – reached number 41 in the New Zealand charts in 1987. Norman Cook – Fatboy Slim. Paul Heaton – The Beautiful South. Stan Cullimore. Hugh Whitaker – jail / axe.
  • #19 I would often use this as an interview question – not for a right or wrong answer but to see if the candidate would understand the change process. For some of our services we can do this process in < 15 mins – automation, some steps are optimised out. Rails apps (branch) v Varnish (release package). Small atomic check-ins. Code reviews – sometimes pre-check-in & sometimes post-check-in.
  • #20 We have lots of products / services – this is a subset. REPEATABLE. Visible: dashboard & email notification on failure (for some builds we notify on success). Fire and forget – it worked locally. Fast: individual builds/tests (<10 mins), slow builds break flow, and queue of builds (competing for build servers). Fix the build – halt the line. Last Success – want to build everything at least weekly. From 1988 JAMC album Barbed Wire Kisses.
  • #21 Start with 100% or as close to it and stick to it as you go. If you don’t start that way it can be difficult to find time & have the discipline to plug the gaps. We ran our Java coverage overnight. Needs instrumented code. Takes approx 10 more minutes (i.e. 45 mins) Rails we run as is.
  • #22 Echo and the Bunnymen – Heaven Up Here – 1981 This song makes me think about software deployment. Software is sometimes thrown over a wall for someone else to deploy to production and make it work end-to-end. http://c2.com/cgi/wiki?ThrownOverTheWall You can write code but there are often whole other challenges in getting it out the door. Two scenarios http://www.slideshare.net/beamrider9/continuous-deployment-at-etsy-a-tale-of-two-approaches
  • #23 Painful to merge Long lived branches were painful to keep up to date If you missed the train then business pressure to get it out. Every release was a big deal. Microservices: recommendations, adverts, FES, etc See https://svn.iplay.com/atg/internal/site/tags/archives/release/ for archive
  • #24 Branches are good to avoid polluting trunk. But merging can be painful (merge conflicts). Test feature before re-merge and again afterwards – WASTEFUL. And long-lived branches can be painful because you merge in both directions. We were also using feature flags so that we could try to get code out quicker.
  • #25 QA dominant
  • #27 Microservices: chat, logon, Catalog, Event, front end, advert, etc.
    Deployment models:
    1. Advert: build / unit test >> deploy / cucumber >> GATED deploy to production (no stage environment). For a major change we will deploy a new production instance and cut over that way.
    2. Varnish: deploy & test against mock (Sinatra) >> auto build a release >> GATED deploy to production.
    3. MP + CMS: build / test + deploy / cuke >> candidate branch + stage env (preview / training) >> production.
    We don’t have version branches or feature branches. We develop to trunk and fix broken builds immediately. Usually there are not many hours or days of difference between trunk and the production branch.
  • #28 For devs / ops / non-devs Less stuff to cram in your head – less code / faster Lead time: from an idea being proposed to being in production Cycle time = when start working to when ready for delivery Difference between trunk and production is minimal Less ripple effect (well defined APIs) The barrier to testing / deploying is lower Find a bug: just fix it and get it out
  • #29
    1. Whatever means zero downtime & less complexity. Whatever works in your context. Tidy up afterwards.
    2. Roll out one service at a time and don’t break things. A simple approach is to make sure it works as is and enable via feature flags. Add things, don’t change them (e.g. adverts). Enable code with feature flags. Tidy up afterwards.
    3. DB: add table / column and populate data. Roll out code that uses it. Roll out code that stops using a column or table in the db. When nobody is using it, delete it.
  • #30 Happy Mondays – Performance is an album track from their 1988 album Bummed I saw them support James at the QMU and went out and bought that album the next week.
  • #31
    1. HTML page download – the initial request
    2. Critical path JS / CSS
    3. Other resources
    http://www.webperformancetoday.com/2014/03/18/waterfalls-101-how-to-use-a-waterfall-chart-to-diagnose-performance-pains/
  • #32
    1. Dark green = DNS lookup. One lookup per domain. Try to use fewer domains & pre-fetch.
    2. Orange = TCP connection. Round trip. Browsers can have approx 6 connections to the same domain. Use a CDN – closer. HTTP 1.1.
    3. We’re ignoring SSL / HTTPS – it will incur 1-2 additional round trips.
    4. Bright green = time to first byte. This is the time from the request being made until the first byte is served back. Fewer requests. Concatenate. Sprites. Leverage the browser cache. Server time.
    5. Blue = content download. This indicates how long it takes for a server to fully serve the requested resource. Compression, minify, gzip, review all headers. Congestion window.
  • #33 Critical rendering: async JS, inline critical CSS. Speed Index Budget – in tech language or measure that matters to users Mobile web: less than 50 requests, less than 600K, 0 redirects, etc. Mobile web: page speed of less than 2000 on 3G HTML5 game: less than 500K of audio, one audio sprite, less than 50 resource requests, etc. HTML5 game: fully loaded in less than 10 seconds on cable
  • #34 Monitor – Siouxsie and the Banshees, from the 1981 album Juju. Featuring John McGeoch from Greenock. He was also a member of Public Image Ltd. Self-healing systems. High availability – survive single point failures. We call / text out of hours via PagerDuty if user impact (i.e. do something now). We email for other monitoring alerts – thresholds.
  • #36 We have minimum of two instances of all apps – sometimes behind R53 or ELB Opsview CPU, Memory, Load Average, Disk space, specific processes, etc Alert on thresholds
  • #38 Monitor from end user perspective. PagerDuty integration – this means we don’t have people sitting watching servers.
  • #39 REM – Document – 1987
    1. Sometimes / often dev teams want to run a mile if they are asked to document anything.
    2. If you have lots of services then you need documentation, even if you have things automated.
    3. “The code is the documentation” is sometimes the attitude, but that is unhelpful. http://martinfowler.com/bliki/CodeAsDocumentation.html
    4. It’s all about seeing what you are doing from the shoes of someone else.
  • #40 This was broadcast in 1973 but I watched it in the 1980s
  • #41 Who: a game developer who wants to start earning money from their game asap Simple instructions – a list. Doable in under an hour for a HTML5 game developer. Can dig deeper if necessary. Writing simple instructions for a different audience does not come naturally to all developers.
  • #42 One page Our team is small – we have more services than people. Some services see frequent development / release – some are more mature and releases are infrequent. Easy to forget steps. If you are context switching. We have a template that we use to write up each of our services. Many of these steps are automated but still worth having the notes as a memory aid.
  • #43 Quiet Life – Japan. Released 4 January 1980 Think about what happens if the server has issues at 3am Welcome message – status & cut & paste commands Apps all in same place
  • #44 Log files are also documentation. Can I find the log file? Can someone who didn’t write the code diagnose what is happening by reviewing the log file? Poor logging means you need to patch production with improved logging before you can diagnose an issue. INFO / WARN / ERROR. Time stamp + before / input + after / output for each step. Report error context.