10th August 2016
John Clegg at ScaleConf
Performance as a Feature
The story of the Impossible Mission Force (IMF) Team at Xero
Performance as a Feature
Performance thinking as a part of dev
The Impossible Mission Force
Our team’s ultimate goal is to “self destruct”
Why is “Performance as a Feature” important?
Customer satisfaction
Speed affects conversion rates
Cost to serve
Mobile experience
Why don’t we all do “Performance as a Feature”?
Premature optimisation is EVIL
“Premature optimization is the root of all evil” - Donald Knuth (1974)
The right optimisation is NOT EVIL
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth (1974)
We’ll look at performance at the
Functional vs non-functional
Features Features Features
We don’t know it’s a problem?
Product growth
Customers’ use of the system gets bigger
People growth
How to make “Performance as a Feature” happen!
1. Get business buy in
2. Get devs to take ownership
3. Education + culture
Make “Performance as a Feature” happen!
Business buy in
Prove there is a problem
Quantify investment
Reduce Risk: Product Spikes
Product Spike
● A discovery story used to analyse or answer a question
○ Yes - Further define story and continue
○ No - Save analysis
● Time-boxed
● Quantified against our goals
Performance is a feature
Adapt as your product / business grows
Get devs to take ownership
Metrics
How fast are your pages?
Measure all things
Understand what metrics are important
Outliers
Decode metrics for the business
The “Pain” ratio
Find out what percentage of users will experience a given “percentile” time in a user session
p = fraction of page loads slower than the percentile time (e.g. 0.05 for the 95th percentile)
n = average number of pages per session
Pain ratio = 1 - (1 - p)^n
The “Pain” ratio example
Example Scenario:
Average number of pages per session = 20 pages
95th percentile total page time for your site = 6s
1 - (1 - p)^n, with p = 0.05 and n = 20
1 - (1 - 0.05)^20 = 1 - 0.358 = 0.642 => 64.2%
64% chance that a user will hit a 6s page
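The pain ratio arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not something shown in the talk: the function name `pain_ratio` and its parameter names are made up for illustration, and it assumes every page load in a session independently has the same chance of landing in the slow tail.

```python
def pain_ratio(tail_fraction: float, pages_per_session: float) -> float:
    """Chance that a session contains at least one page load in the slow tail.

    tail_fraction: share of page loads slower than the chosen percentile time,
        e.g. 0.05 for the 95th percentile (hypothetical parameter name).
    pages_per_session: average number of pages viewed in one session.
    Assumes page loads are independent of each other.
    """
    if not 0.0 <= tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be between 0 and 1")
    if pages_per_session < 0:
        raise ValueError("pages_per_session must not be negative")
    # Probability that every page is fast, subtracted from 1.
    return 1.0 - (1.0 - tail_fraction) ** pages_per_session


if __name__ == "__main__":
    # Scenario from the slide: 20 pages per session, 95th percentile = 6s.
    print(f"{pain_ratio(0.05, 20):.1%}")  # 64.2%
```

The same function shows why a higher percentile still matters: with 20 pages per session even the 99th percentile time (`pain_ratio(0.01, 20)`) is hit in roughly 18% of sessions.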
Get developers the right tools
What’s going on in my stack?
Make performance tests easy
Performance A/B tests
Continuous testing
Feature flagging
Scientist
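The Scientist idea (run the existing code path and a candidate path side by side, keep using the existing result, and log how the candidate compared) can be sketched in Python as below. This is a hand-rolled illustration, not the API of GitHub's Scientist library, which is a Ruby gem; `run_experiment` and its arguments are hypothetical names.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("experiment")


def run_experiment(name: str, control: Callable[[], T], candidate: Callable[[], T]) -> T:
    """Run both code paths, log the comparison, always return the control result."""
    start = time.perf_counter()
    control_result = control()
    control_seconds = time.perf_counter() - start

    try:
        start = time.perf_counter()
        candidate_result = candidate()
        candidate_seconds = time.perf_counter() - start
        log.info(
            "%s: match=%s control=%.6fs candidate=%.6fs",
            name,
            candidate_result == control_result,
            control_seconds,
            candidate_seconds,
        )
    except Exception:
        # A failing candidate must never affect what the user sees.
        log.exception("%s: candidate raised", name)

    return control_result


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    total = run_experiment(
        "sum-of-range",
        control=lambda: sum(range(1000)),   # existing, slower path
        candidate=lambda: 999 * 1000 // 2,  # new closed-form path
    )
    print(total)  # 499500
```

Because the caller only ever receives the control result, a mismatch or an exception in the candidate shows up in the logs rather than in front of customers.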
Education + Culture
Training + Workshops
Performance “Coach”
Work with the teams
Carrot + Stick
What does success look like?
Teams take ownership
Added to “Definition of Done”
Code Quality
Takeaways
Enable teams to take ownership
1. Get the right tooling in place
2. Own the metrics from the start
3. Automate performance testing
Show business the ROI
Show business the Return on Investment
1. Prove you have a problem
2. Quantify Investment
3. Reduce Risk - Spikes
Education and Training
1. Make it easy to measure & test
2. “Hands on” training
3. “Coach” the teams
IMF team pic
Resources
Metrics
YouTube “How to NOT measure latency” - Gil Tene
YouTube “Fast and Resilient Apps” - Ilya Grigorik (Google I/O 16)
Tools
“Scientist: Measure Twice, Cut Over Once” - http://githubengineering.com/scientist/
“Feature Toggles” - Martin Fowler - http://martinfowler.com/articles/feature-toggles.html
Pictures
https://www.flickr.com/photos/christian-dl/3754582699
https://www.youtube.com/watch?v=WhgcDeARanQ
https://www.flickr.com/photos/whatknot/15781706315
https://www.flickr.com/photos/developersteve/28219486931/
https://www.flickr.com/photos/deanhochman/13623115913
https://www.flickr.com/photos/srvega/15140121578/
https://www.flickr.com/photos/jarodcarruthers/8489647601
https://www.flickr.com/photos/87007001@N04/15126712086/
https://www.flickr.com/photos/conalg/17250403565
https://www.flickr.com/photos/99783447@N07/9431062947
https://www.flickr.com/photos/62141688@N08/16645686899/
https://www.flickr.com/photos/thomashawk/45974145
https://www.flickr.com/photos/abeles/1390183044
Questions


Editor's Notes

  • #3 I’m John Clegg. I come from an Ops background and have been involved with building and scaling websites for a long time. I now work at Xero, a company that makes global accounting software.
  • #4 My talk is about “Performance as a feature”
  • #5 This talk is really about how to make Performance “thinking” part of the dev and Ops culture
  • #6 It’s also the story of my team, the Impossible Mission Force (IMF). We are the performance and scalability team of Xero. We’re not a team that focuses exclusively on fixing slow web pages. Our mission is to: get the right tooling in place; create standard metrics for the business; educate and train teams; assist teams with learning perf tooling.
  • #7 Our goal is that once our mission is complete, we’ll “self destruct”.
  • #9 Research shows users hate slow pages. 57% of users will abandon a site after 3 seconds. What most users do is CTRL-T / something else!
  • #10 All the stats show us faster pages = faster conversions
  • #11 In the world of cloud infrastructure, saving ms = saving $$$. We made a change to one of our most popular pages that saved 1.3 seconds, which adds up to 41,422 minutes of server time saved EVERY day.
  • #12 Most NZ sites are not mobile friendly. Slow websites = more data = higher cost ($$) and a bad user experience.
  • #14 We’ll figure it out when it’s a problem...
  • #15 We’ll figure it out when it’s a problem...
  • #16 Performance tuning at the last step. Or you simply run out of time in the rush to get features out the door. Sometimes you can’t do that because you need infrastructure changes and that can take time
  • #17 Non-functional requirements like security and performance are often ignored or given low priority.
  • #18 The eternal push for features. When feature usage ramps up, the team has already moved on to the next feature (performance becomes part of V2 of a product).
  • #19 Minimal metrics or not the right metrics. Or customers are telling you that you are slow
  • #20 You’ve not taken account of product growth. Your metrics become a sea of data and you find it hard to spot issues.
  • #21 Internal processes need to change when the number of devs increases and teams become distributed (i.e. new offices).
  • #25 Get data and metrics for your site. We delivered a “State of the Nation” performance report for the business. Put it in terms that the business can understand. In customer terms: the number of customers who experience a problem every day. Percentages can mask the “real” impact. In customer support terms: e.g. tickets. In customer experience terms: wasted seconds, one of our favourites.
  • #26 We need to show progress on what we’re doing and understand the investment in building metrics and tooling. We’re always thinking about ROI. We have to be careful we don’t get trapped looking for the perfect solution. There is always low-hanging fruit, and then optimisations get harder and take longer.
  • #27 Figure out how you can deliver incremental improvement. Proof-of-concept spikes help the business reduce risk and help teams understand the effort.
  • #28 Teams need time and resources allocated to measure and test properly. It is part of “feature signoff”. In practice this is something that can be measured and tested throughout the development process.
  • #29 Customer and product needs change. You have to scale your performance metrics and testing to cater for the changes.
  • #31 We wanted to know where teams were at with performance thinking, so we started with what they know about their pages.
  • #32 We asked the teams a simple question: how fast are your pages in production? We got mixed results: some teams knew and some teams didn’t. Who was looking after features that didn’t have active teams? We realised we needed to surface better metrics to teams.
  • #33 Make all data available and shareable - Datadog + Sumo. Train teams on how to use it and what to look for. We made templates; teams add application-specific metrics to our templates.
  • #34 We live in data and metrics; what matters is understanding which ones are important. Synthetic vs real user metrics. Medians, averages and percentiles (e.g. the 95th percentile) can all be affected. Think about worst cases and outliers. See “How NOT to measure latency” - Gil Tene.
  • #35 Convert metrics to the number of customers affected. We converted some metrics to a simple traffic light, e.g. page response time to “5% of customers affected” (how many customers are affected - State of the Nation report).
  • #39 Find out how your code is running on your stack. Application performance monitoring tools like Dev perspective - tools to help isolate and identify problems. We’ve found the best ROI for these tools is when you are delivering new features and triaging problems.
  • #40 Simplify what’s needed for a team to get started: simple templates and training, on their own environments. We consciously decided NOT to have a dedicated environment for testing.
  • #41 You need the ability to test before and after feature changes. Create a simple template to test before and after and to be able to compare results. It’s important that you can identify subcomponents, e.g. Ajax calls, to isolate potential changes.
  • #42 Performance testing should be a part of the build process. Devs need to be “flagged” early on that there are performance issues.
  • #43 Feature flagging is not only the ability to turn a feature on / off. You can also limit a feature to internal users or to a percentage subset of users. This enables the devs and the business to gain confidence in the quality and perf of a feature.
  • #44 Scientist path, popularised by GitHub. Run two code paths and log the results of the second code path. This enables devs to test in production and check results. It really helps with edge cases.
  • #45 We don’t know what we don’t know
  • #46 Facets: making the training really approachable. Two-phased approach: 1st, an introductory, low-entry, practical session; 2nd, workshops where teams work on their own problems. Aimed at QAs + senior devs.
  • #47 Assist the teams and try not to do the work
  • #48 Attend team reviews and be part of technical kick-off discussions. Promote early discussion of getting performance metrics and testing.
  • #49 Celebrate the wins. Speed demon award: a 2kg pack of jet planes. The stick: putting warnings into the build and eventually failing builds???
  • #51 Convert metrics to the number of customers affected, e.g. page response time to “5% of customers affected” (how many customers are affected - State of the Nation report).
  • #52 Metrics. Performance testing. GitHub: a story isn’t complete until it’s fast.
  • #53 Performance becomes part of the code quality discussion. It’s one of the criteria for pull requests and a criterion for build success.
  • #55 Get them the tools Metrics Tools Training Carrot and Stick
  • #56 Get them the tools Metrics Tools Training Carrot and Stick
  • #57 Prove it’s a problem. Show ROI. Quantify investment.
  • #58 Prove it’s a problem. Show ROI. Reduce risk: spikes, feature flagging, Scientist.
  • #59 Prove it’s a problem. Show ROI. Quantify investment.
  • #60 Prove it’s a problem. Show ROI. Reduce risk: spikes, feature flagging, Scientist.
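
The feature flagging described in note #43 (an on/off switch, internal users first, then a percentage of users) could look roughly like the Python sketch below. It is an illustration under assumptions, not Xero's implementation: the names `bucket` and `is_enabled`, the parameters, and the choice of a SHA-256 hash to assign users to buckets are all hypothetical.

```python
import hashlib
from typing import AbstractSet


def bucket(flag: str, user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(
    flag: str,
    user_id: str,
    *,
    enabled: bool,
    internal_users: AbstractSet[str] = frozenset(),
    rollout_percent: int = 0,
) -> bool:
    """Decide whether a user sees the feature behind `flag`."""
    if not enabled:                 # global kill switch
        return False
    if user_id in internal_users:   # internal users always get the feature
        return True
    # Everyone else: only users whose bucket falls below the rollout percentage.
    return bucket(flag, user_id) < rollout_percent


if __name__ == "__main__":
    staff = {"staff-1"}
    print(is_enabled("new-dashboard", "staff-1", enabled=True, internal_users=staff))  # True
    print(is_enabled("new-dashboard", "customer-42", enabled=False, rollout_percent=100))  # False
```

Hashing the flag name together with the user id keeps each user's bucket stable between requests, so raising `rollout_percent` only ever adds users to the feature rather than reshuffling who sees it.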