Agile Metrics...That Matter


Given at ALM Chicago 2013. For more info and highlights, visit

  • Meet Bruno. Bruno is a portfolio manager at JP Morgan. He's a smart guy, highly educated, and he put in decades of work before getting a senior position. He manages the Synthetic Credit Portfolio as a hedge – this was supposed to be a safe position, an insurance policy if you will, against risk in other, riskier portfolios. Well, these things get pretty complex, and in April 2012, in a matter of days, it became clear that something was very wrong. A 6-9 billion dollar loss.
  • Matt Levine @ Dealbreaker: How should one read JPMorgan's Whale Report? One way to read it is as a depressing story about measurement. There were some people and whales, and there was a pot of stuff, and the people and whales sat around looking at the stuff and asking themselves, and each other, "what is up with that stuff?" The stuff was in some important ways unknowable: you could list what the stuff was, if you had a big enough piece of paper, but it was hard to get a handle on what it would do. But that was their job. And the way you normally get such a handle, at a bank, is with a number, or numbers, and so everyone grasped at a number. Everyone tried to understand the pool of stuff through one or two or three numbers, and everyone failed dismally through some combination of myopia and the fact that each of those numbers was sort of horrible or tampered or both, each in its own special way. When we're dealing with complex things (like a synthetic credit portfolio), it becomes harder and harder to manage them with metrics.
  • I've been there. I've come to believe that more metrics and more data don't necessarily mean more understanding.
  • When you're on a long project – 6 months, a year or longer – we need some way to gauge these things. Developing software is a complex system that is mostly intangible, so we use these measurements as a window into that world. What's going on here? When will we be done? What's our quality like? Etc. It's human nature to explain things we can't see.
  • What do you think about this metric? Actually it's a really bad one – there are correlation/causation errors going on, and overall "project success" is far too complicated a system to judge with one metric. Chaos Report from 1995 to 2010: project success rate goes from 16% to 30%.
  • Agile takes all that worry and all that risk and packages it up into cute little time boxes. Agile inherently limits risk. Even if one of these boxes explodes, the project isn't a failure. And every few weeks we produce a valuable increment of product; we have the chance to inspect it and adapt our approach, reprioritize, replan, etc. Managers no longer need to carry this anxiety over predicting project performance across months and months. We have real, tangible results every few weeks. We can inspect them and determine the ACTUAL characteristics of the product that we used to use metrics to try to get at. Agile projects inherently limit risk: time boxes, WIP, DoD, AC, fast feedback. (Lead in) So that's nice, but how do you define quality on this increment and on the product as a whole?
  • Two ways. On any single increment we use the above mindset. These are not strict equations – I'm not doing any math here – it's just a way to think about quality in the agile world. DoD: a shared definition among the team of what "done" means. Typically you see things like coding standards, unit test coverage, tests pass, deployable, reviewed, etc. Every piece of work must adhere to the DoD. AC: the Product Owner's business-language criteria for how a specific piece of work must function, sometimes written in the GIVEN-WHEN-THEN format, a practice associated with ATDD. So as we string increments of working software together, how do we get at the quality of the product? We use the mindset at the bottom for this. On the product level, it's no longer so much about defining quality in a quantitative sense as it is about having a development process that can easily react to change – to negative customer feedback as well as to suggestions for new features and whatever is most important to the customer at the moment. Stakeholders that don't show up at the Sprint Review will still be nervous, and rightly so. The corollary is: every time a manager/stakeholder/etc. asks for a report, instead of giving it to them, stress the importance of showing up at the Sprint Review.
  • You have clear development principles that help limit risk (DoD) (verification) and clear business objectives that help limit risk (Acceptance Criteria) (validation). This ensures some base level of quality in your product, and then through frequent stakeholder and customer feedback we ensure the ongoing quality and value of the product. Our chief metric in Scrum is working software. That said, what other metrics do we need? Right?
  • Explain the Hawthorne experiment at Western Electric. A select group of workers was told they were being studied, and their productivity changed. All the researchers did was minutely change the lighting levels. Also called demand characteristics: an experimental artifact where participants form an interpretation of the experiment's purpose and unconsciously change their behavior to fit that interpretation.
  • For example, measuring test pass/fail status always causes the pass percentage to rise. But it is an artificial rise, due to people not wanting to fail tests, or splitting tests into smaller and smaller units to drive the percentage up (which just creates waste).
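As a rough illustration of that splitting effect (all numbers hypothetical), watch what happens to the pass percentage when the passing tests are chopped into smaller units while the failing ones stay untouched:

```python
# Toy illustration of gaming a test pass-rate metric (hypothetical numbers).

def pass_rate(passed, failed):
    """Pass percentage over all test cases."""
    return 100.0 * passed / (passed + failed)

before = pass_rate(passed=80, failed=20)   # 80 of 100 tests pass -> 80%

# Same suite, but each passing test is split into 3 tiny tests;
# the 20 failures are untouched, so quality is unchanged.
after = pass_rate(passed=80 * 3, failed=20)

print(round(before), round(after))  # 80 vs 92 -- an artificial rise
```

Nothing about the product improved; the denominator just got padded with easy passes.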
  • If people can't seem to find good examples: Hawthorne: the JCI example of the sprint before/after we started measuring test pass/fail metrics. A test was originally failed, and then passed with a bug. Gaming: Schwaber's example of a Clear Channel employee auto-generating the sprint burndown because managers were getting on their case about it.
  • Robert Austin, Measuring and Managing Performance in Organizations. Nucor Steel based plant managers' salaries on productivity – of ALL plants, not just theirs. The obvious example here is defect counts. W. Edwards Deming, the noted quality expert, insisted that most quality defects are not caused by individuals, but by management systems that make error-free performance all but impossible.
  • Easy to Measure. Zero Value.
  • "There are so many possible measures in a software process that a selection of metrics will not likely turn up something of value" – Watts Humphrey. Metrics used in isolation probably don't measure what you think they do. The system is more complex than this; we're probably never going to be able to measure enough to give us a simple indicator of the system. Isolated metrics entice people to draw system-wide conclusions. -> Primary/Secondary Metric. Beware low hanging fruit – and note that old literature praises low hanging fruit! -> Just because we can measure something easily doesn't mean it's meaningful.
  • Ask: does everyone agree this is an easy metric to gather? What is this metric really telling us? Stakeholders: "How come we have fewer tests than a few sprints ago? That can't be right. We must not be testing enough." Stakeholders: "On my last project we had thousands of tests; why are there only a couple hundred? That can't be right, we must not be testing enough, I bet this thing is littered with bugs." This is an example of things that are easy to measure, and things measured in isolation. The system – the software development machine – is far too complex to support broad quality statements based on such isolated measurements. But we're so used to doing that. So you can start to see that some traditional metrics might not really fit the bill. Let's go on.
  • In his 14 Points, Deming said “Eliminate management by numbers and numerical goals. Instead substitute with leadership.”  The more we rely on metrics to tell us what happened, the more we distance ourselves from the actual work being done.
  • We realize that measuring a system as complex as the software development machine doesn't really provide understanding, just data. Sometimes bad data, sometimes good data. And we realize that the obvious answer isn't always right – like blaming bad developers for buggy products – "it must be the developers" – we respect that there is likely more going on in the system than any one root cause of anything. Further, if we use metrics the wrong way, we build games and systems that reward paying attention to the metric and not the success of the company. Overall, we believe that being agile is important to the goal – our goal being making really good software products that have high value and delight customers. So we will use metrics that help us be agile, that encourage us to embrace Lean and XP and good development practices.
  • Trends over static numbers: tear the labels off the y-axis. Is this setting up stakeholders to draw a system conclusion from an isolated metric? Understand and respect the complex system. No single prescription – figure out what makes sense for you, taking these considerations into account. We'll go over a bunch of possible metrics next, but I'm not advocating a simple recipe for anyone. I'm certainly not saying you have to use all of these.
  • "You will see these; they are very useful for teams. They aren't really what you should be chiefly interested in – in fact, the more you care about these, the more they garner negative Hawthorne effects, possibly gaming. These are low-level things that need to be driven from empirical data at the team level, so the team can be honest and transparent with their work. This is a good thing. Too much focus here is too low-level. So as managers and executives, here are some ways you can measure up..."
  • Indicates team progress. A way to visualize what's done, what's WIP, and what's left to do. A tool to see when we'll be DONE with a particular chunk of value. Don't like hours? Don't want a graph? Fine: use a task board, count tasks, stories-to-done, whatever. It's just a tool so that you as a team know how work is progressing, and can visualize that and discuss it as a team. If it's not given to management, there is little risk of negative Hawthorne effects or gaming.
  • Forecasts what can get to DONE in a sprint. Measures throughput, not capacity. Not individuals. No comparing across teams. Not really for management, and certainly not for incentives (risk of gaming).
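A minimal sketch of velocity as a throughput measure, using made-up sprint data – it is just the average of the story points the team actually got to DONE:

```python
# Hypothetical sprint data: story points completed (to DONE) each sprint.
# Velocity counts only finished work -- throughput, not capacity.
completed_points = [21, 18, 25, 20, 22]

velocity = sum(completed_points) / len(completed_points)
print(velocity)  # 21.2 -- a planning input for the team, not a score
```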
  • Helps the business know when a larger chunk of functionality might be DONE. Not really part of Scrum, but also something you usually can't get away without doing. At least this method of planning is based on empirical evidence: past sprints' velocity and what's actually on the backlog now. Also, look at the cone of uncertainty there – we're not promising a date, we're just giving a forecast as accurately as we can while still being able to sleep at night. Increments are great, and this tells us when enough increments put together will satisfy some large business objective.
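One way to sketch that forecast (all numbers hypothetical): instead of a single date, divide the remaining backlog by the best and worst recent velocities to get a range – a crude cone of uncertainty:

```python
import math

# Sketch of a velocity-range release forecast (hypothetical numbers).
backlog_points = 180
recent_velocities = [21, 18, 25, 20, 22]  # points per sprint, observed

best, worst = max(recent_velocities), min(recent_velocities)
optimistic = math.ceil(backlog_points / best)    # fewest sprints
pessimistic = math.ceil(backlog_points / worst)  # most sprints

print(f"Forecast: {optimistic}-{pessimistic} sprints to empty the backlog")
```

The range narrows as more sprints of real data accumulate, which is the point: a forecast grounded in evidence, not a promise.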
  • Our chief metric is working software. Did we get to the end of the sprint and have potentially shippable product? How do you measure this? A simple thumbs up or thumbs down. Get everyone in a room and do it. Not good enough? Then document it. We keep a running go/no-go document. If you can gather everybody who had a hand in creating the increment and get them to give a thumbs up/thumbs down, this is more powerful than management by numbers. Humans can dissect the complexity of software development, and in the right environment they will process all the information from the past sprint and come to a conclusion on whether or not the increment is good to ship. Try not to focus on what didn't get done – keep the positive Hawthorne effect going by asking for and getting working software. Teams should be transparent about what doesn't get done, but keep the focus positive. Why not just do this in waterfall? Get everyone in a room after a year-long project and give it the thumbs up? Well, in some sense you do – we often ignore all those other metrics we've spent so long gathering. We rationalize sev 1s down to 2s, etc. In agile you can do this more safely because YOU HAVE CONTEXT. You have really good context and memory within a timebox. The risk is limited.
  • You will need to define what throughput means to you. We’ll talk about revenue here. You may define value/throughput in terms of cost savings, compliance to regulations, etc.
  • Alternative: Average Cycle Time per Feature.
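A minimal sketch of that alternative, assuming hypothetical start and done dates for each feature – cycle time is just done-date minus start-date, averaged:

```python
from datetime import date

# Hypothetical (start, done) dates for three completed features.
features = [
    (date(2013, 1, 7), date(2013, 1, 18)),
    (date(2013, 1, 10), date(2013, 2, 1)),
    (date(2013, 1, 21), date(2013, 2, 8)),
]

cycle_days = [(done - start).days for start, done in features]
avg_cycle_time = sum(cycle_days) / len(cycle_days)
print(f"Average cycle time: {avg_cycle_time:.1f} days")  # 17.0 days
```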
  • Measuring revenue is obvious. It's the highest level we can go. But we still have correlation/causation problems. Without structuring specific experiments like variant A/B testing or cohort analysis, we can never really know if our development dollars are a wise investment. Perhaps the revenue growth is due to our recent ad campaign, or our awesome salesforce, or something else. Measuring revenue per feature in some way allows us to get at the specific ROI of developing specific features. Overall, measuring features delivered or value-point velocity (from the previous slide) is dangerous if you don't quickly take it to the next level: cold hard cash.
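A toy version of the A/B comparison (fabricated numbers): compare revenue per visitor between the control and the variant that includes the new feature. A real analysis would also need a statistical significance test, which is omitted here:

```python
# Toy A/B revenue comparison (fabricated numbers, no significance test).

def revenue_per_visitor(total_revenue, visitors):
    return total_revenue / visitors

a = revenue_per_visitor(total_revenue=5200.0, visitors=1000)  # control
b = revenue_per_visitor(total_revenue=6100.0, visitors=1000)  # with feature

lift = (b - a) / a
print(f"Revenue lift from variant B: {lift:.1%}")
```

This ties a specific feature to cold hard cash instead of stopping at "features delivered."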
  • Modern social science and positive psychology have shown that happiness is a prerequisite to success. http://happiily.com Encourages self-awareness. A leading indicator. Columns in the happiness-index table: Name; How happy are you with Crisp? (scale 1-5); Last update of this row (timestamp); What feels best right now?; What feels worst right now?; What would increase your happiness index?; Other comments.
  • Isn't this what we want? If we start with the premise that what we want is a zero-defect product, we're naturally driven to measure build-side things, like defect counts and test statuses. But if we look at this a different way, and say the opposite … is a high-revenue, highly satisfied customer base and a quickly changing, adapting product, we're driven to measure other, higher-level customer-side things, and lower-level metrics seem less important.
  • The rationale behind the use of ACSOI is that marketing and subscriber acquisition expenses have value long into the future: they build a brand, therefore, they should be spread out over time. In the first quarter of 2011, Groupon reported a $117 million operating loss, but ACSOI was almost $82 million. That's because some $180 million of online marketing spending -- plus more than $18 million of stock-based employee compensation -- had been stripped out.
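The arithmetic behind that quarter, using the approximate figures above (in millions of dollars): strip the marketing spend and stock compensation back out of the loss and it turns into "income":

```python
# Groupon Q1 2011, approximate figures in millions of dollars.
operating_loss = -117
online_marketing = 180    # "builds a brand", so excluded from ACSOI
stock_based_comp = 18     # also excluded

acsoi = operating_loss + online_marketing + stock_based_comp
print(acsoi)  # 81 -- a $117M loss reported as roughly $82M of "income"
```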
  • Andrew Mason Feb 28th, 2013

    1. 1. Agile Metrics … That Matter Erik Weber
    2. 2. About Erik. Work Stuff: • Healthcare, Finance, Green Energy • Huge Conglomerates, Small Employee Owned, Fortune 500 • Agile Solutions Manager • Scrum Coach & Trainer • Passionate about Agile. Me Stuff: • Huge foodie and amateur cook • Wearer of bowties • Homebrewer and beverage imbiber • Passionate about Agile (have multiple kanban boards up in my house)
    3. 3. About a Whale…LET ME TELL YOU A STORY
    4. 4. © 1993 - 2012, All Rights Reserved 4
    5. 5. Bruno
    6. 6. [Diagram: roles around the Code – Tester, Manager, Project Manager, Programmer, Technical Writer, Scalability Engineer, Database Architect]
    7. 7. Confession
    8. 8. AGENDA
    9. 9. •Why Metrics? •Measuring in Traditional vs. Agile•The Human Side of Metrics•Agile Metrics•Wrap up
    10. 10. History and comparison of traditional and agile environmentsWHY METRICS?
    11. 11. We Need Tangibles• As gauges or indicators - For status, quality, doneness, cost, etc.• As predictors - What can we expect in the future?• As decision making tools - Can we release yet?• A visual way to peer into a mostly non-visual world - Because we don't completely understand what's going on in the software/project and we need to
    12. 12. History• Tons of research, mostly from the '80s-'90s• Based on industrial metrics• Implementation of metrics in project management has grown exponentially• Hasn't really affected project success (what a metric!) [Chart: Metrics Usage vs. Software Project Success Rate, 1980-2010] Chaos Report from 1995 to 2010: project success rate goes from 16% to 30%
    13. 13. Traditional to Agile: Long time horizon → Short sprints; Intangible for months → Inspection every sprint; Manual risk mitigation → Inherent risk mitigation; May sacrifice quality by fixing schedule/scope → Builds quality in; Many metrics used and needed → Chief metric is working software
    14. 14. SCRUM BUILDS QUALITY IN: Definition of Done + Acceptance Criteria → Quality; Sprint Review + Stakeholder and Customer Feedback → Quality
    15. 15. The only metric that really matters is what I say about your product.
    16. 16. MetricsTHE HUMAN SIDE OF METRICS
    17. 17. The Hawthorne Effect• When you measure something, you influence it• You can exploit this effect in a positive way• Most traditional metrics have a negative Hawthorne effect• Gaming = Hawthorne Effect * Deliberate Personal Gain “Tell me how you will measure me and I will tell you how I will behave”
    18. 18. Hawthorne Effect 5 min• Where have you seen this in software development?• Where have you experienced gaming? – What have you gamed?• What do you measure now that might have negative Hawthorne effects or easily be gamed?
    19. 19. The Hawthorne Effect. TRY: • Identify positive/negative Hawthorne effects on each metric that exists • Measuring things you want more of • No-questions-asked policy of reporting gaming (so you can simply stop wasting your time gathering that metric). AVOID: • Using metrics with negative Hawthorne effects • Easily gamed systems • Measuring things you don't really want more of, and don't really have an effect on outcomes
    20. 20. Measur e Up!
    21. 21. Measure Up• Austin Corollary: You get what you measure, and only what you measure; and you tend to lose the things you cannot measure: collaboration, creativity, happiness, dedication to customer service …• Suggests "measuring up" – Measure the team, not the individual – Measure the business, not the team• Helps keep focus on outcomes, not output
    22. 22. Measure Up 5 min• What are some possible outcomes of the following common metrics: – Lines of code – Defects/person – Defects/week – Velocity
    23. 23. Measure Up 5 min• How about these? – Accepted Features or Features/Month – Revenue or Revenue/Feature – Customer Retention or Churn Rate – Net Promoter Score – Happiness
    24. 24. Measure Up. TRY: • Customer reported defects • Team Throughput • Accepted Features • Customer LTV • Value. AVOID: • Defects during development • Capacity/Efficiency • Velocity (or worse: LoC) • New Customers • Cost
    25. 25. The Measurement Paradox "Not everything that can be counted counts, and not everything that counts can be counted" – Albert Einstein• Software development is a complex system – Metrics used in isolation don't measure what you think they do – Stakeholders are focused on the system• Beware 'low hanging fruit' – Value of Measurement = 1/Ease of Measuring
    26. 26. Easy to Measure. Too Isolated. 5 min [Chart: Number of Test Cases per month, December through March]
    27. 27. The Measurement Paradox. TRY: • Measuring up! • Making measurements visible only at the appropriate level • Measuring what really matters, and has a direct line-of-sight contribution to outcomes. AVOID: • "If we just had more data…" • Management by metrics • Sets of easy-to-gather metrics that purport to tell you something about the system/outcome.
    28. 28. AGILE METRICS
    29. 29. Guiding Principles• We no longer view or use metrics as isolated gauges, predictors, or decision making tools; rather they indicate a need to investigate something and have a conversation, nothing more.• We realize now that the system is more complex than could ever be modeled by a discrete set of measurements; we respect this.• We understand there are some behavioral psychology concepts associated with measuring people and their work; we respect this.
    30. 30. No Single Prescription• What really matters? – Listen to the customer – Trends over static numbers• Will this help us be more agile?• For each one, let's ask: – What is this really measuring? – Who is the metric for? Who should see it? – What behaviors will this drive? – What's the risk of negative Hawthorne effects or gaming? – Are we measuring at the right level? Up?
    31. 31. Metrics for the Team• These are primarily for the team (can be communicated to management) – Sprint Burndown – Velocity – Release Burndown• From the management level, intense focus or incentivizing on these is not good• Allow the team to use empirical data, and remain transparent and honest
    32. 32. Sprint Burndown
    33. 33. Velocity
    34. 34. Release Burndown
    35. 35. Metrics for Management• These are for the team and for management – Working Software – Throughput – Happiness• Higher level measurements (measure up!)• Positive Hawthorne effects
    36. 36. Working SoftwareCan everybody confidently give the “thumbs up” to the increment?
    37. 37. Throughput• Measures how much “stuff” is: – Getting Done – Adding Value – The right “stuff”• Need to view team AND business throughput simultaneously – Careful with correlation and causation – Empirical way to gauge value/spend• In place of direct capacity or productivity measures
    38. 38. Throughput 5 min• What does this mean to you? – How to you define “the right stuff”• How would you measure it? – What does “value” mean in your context?
    39. 39. Throughput: Team – Delivered Features or Value Points [Chart: quarterly values, 2010.Q1 through 2012.Q2]
    40. 40. Throughput: Business. Revenue: • If we're delivering features all the time, how is that affecting revenue? • Are our development efforts affecting revenue? Or is it something else? Revenue/Feature: • Revenue-data-driven decision making • Split A/B testing – Does variant A or B result in more revenue? • Cohort Analysis – How is revenue changing across cross-sections of prospects/customers?
    41. 41. Throughput: what to look for
    42. 42. Happiness
    43. 43. Final Thoughts…WHERE IS THIS ALL GOING?
    44. 44. Build & Measure, Learn Too Late [Diagram: PLAN, RELEASE, BUILD, TEST, HEADACHE]
    46. 46. Shifting Mindsets. What's the opposite of a fragile, defect-ridden, return-to-sender, crappy product? 1st Premise: Zero Defects! Meets requirements! Better Premise: High Value / High Revenue; High Customer Satisfaction; A quick-to-change Agile Product
    47. 47. About a Pilgrim…LET ME TELL YOU ANOTHER STORY
    48. 48. ACSOI – Adjusted Consolidated Segment Operating Income. Income, without marketing costs or stock-based employee comp (basically). ACSOI looks good – let's go public! (and use ACSOI in the S-1 filing to the SEC for our upcoming $1 Billion IPO)
    49. 49. (This is for Groupon employees, but I'm posting it publicly since it will leak anyway) After four and a half intense and wonderful years as CEO of Groupon, I've decided that I'd like to spend more time with my family. Just kidding – I was fired today. If you're wondering why... you haven't been paying attention. From controversial metrics in our S-1 [IPO] to … the events of the last year and a half speak for themselves. … If there's one piece of wisdom that this simple pilgrim would like to impart upon you: have the courage to start with the customer. My biggest regrets are the moments that I let a lack of data override my intuition on what's best for our customers. This leadership change gives you some breathing room to break bad habits and deliver sustainable customer happiness - don't waste the opportunity! …
    50. 50. Final Thoughts• Measure Up. Start with the Customer.• Build it quick enough & often enough to make measuring on the build side irrelevant. Focus measurements on the Customer side. There‟s no place like Prod.• The only metric that really matters is what your customers say about your product. What are they saying about yours?
    51. 51. Erik Weber@erikjweber
    52. 52. Resources• Goldratt – The Goal• Mike Griffiths – Leading Answers: "Smart Metrics"• Elisabeth Hendrickson – Test Obsessed: "Question from the Mailbox: What Metrics Do You Use in Agile?"• Ian Spence – Measurements for Agile Software Development Organizations: "Better Faster Cheaper Happier"• N.E. Fenton – "Software Metrics: Successes, Failures & New Directions"• Robert Austin – "Measuring and Managing Performance in Organizations"• Mary Poppendieck – Lean Software Development: "Measure Up"• Jeff Sutherland – Scrum Log: "Happiness Metric – The Wave of the Future"
    53. 53. Professional Scrum Master – Improving the Profession of Software. Early Bird Pricing: sign up by Wednesday, March 13. Plus: 20% promo code "ALMCHICAGO". Erik Weber, Mar 27-28, 2013, v3.1
    54. 54. May 9th fast. forward. thinking. 20% off Promo Code: ALMCHICAGO