Your Own Metric System


“What should I work on next?” Code metrics can help you answer that question. They can single out sections of your code that are likely to contain bugs. They can help you get a toehold on a legacy system that’s poorly covered by tests.

Published in: Technology, Business
  • Hello, and welcome. I’m Ian.
  • By day, I make oscilloscopes. By night, I play guitar irresponsibly.
  • I also write books, mostly about Ruby topics. A group of us (me, plus two major contributors to the Cucumber test framework) are working on a new book of specific testing techniques. Ruby and its various test frameworks were my gateway drug to code metrics, though for this talk we’ll be concentrating on other languages.
  • Oscilloscopes have been available commercially since the 1940s. Their architecture changes slowly. Software needs to last, and it tends to last whether we wish for a rewrite or not. Our team’s exploration of this mix of old and new code led to our interest in code metrics.
  • And even if you’re not working on a large legacy code base, there are likely issues that we face in common.
  • There are a lot of forces that push on us and our teams. Today, I want to talk about two very different forces that have surprisingly similar effects: the entropy that drags our code down over time, and the apathy that drags us down personally over time.
  • How do we fight these forces? How do we keep our interest after our tenth straight hour wading into the weeds of an incomprehensible legacy routine? How do we prevent the code we write today from being someone’s nightmare tomorrow?
  • We have many tools in our chest; one is a good set of metrics: information about our code base. My hope is that you’ll consider code metrics at least as an intriguing, low-cost possibility for making the day go by a little better.
  • The risk with doing this (and there’s always a risk) is that we might waste our time making changes we don’t need, or worse, end up trashing our code in the name of blindly satisfying some target number.
  • How do we address that risk? By letting our project needs dictate our metric choices, not the other way around. It sounds simple. But as we’ll see, it’s possible to misapply a metric and make a big mess.
  • Since getting the reasons right is so important, let’s talk about why we’re gathering this data.
  • The purpose of any metric should be to help you answer a question. Since we’re developers who maybe also do a little testing, let’s ask a few example questions now.
  • For example, if several files need some love, where should I concentrate my efforts?
  • Something else may be giving us guidance on what part of the code to work in, like the product backlog. But you may be in a situation where you’ve got a little more leeway, like an explicit charter to pay down technical debt.
  • That said, when you do wander off the map, you do risk creating a bug. With legacy code, you may also uncover an existing bug and get the blame nonetheless. One way to address this risk is to improve your test coverage and make small changes, one at a time. Another is to choose the right metrics; anecdotally, fixing static analysis warnings has been one of the lowest-risk change activities I’ve ever seen.
  • Here’s another question we might ask. Where are the bugs? Where are the old bugs we haven’t found yet? Where are the new ones we might have created recently?
  • We can also turn to our code for ideas of what questions to ask. Has anyone seen something like this comment in production code? The number of these red flags in your code is a kind of code metric you can measure and reduce.
  • That quantitative measurement (the number of bad comments in the code) is helping us make a qualitative determination.
  • The questions we’ve heard so far are things we might ask,...
  • ...not things someone else might ask.
  • Not that other people’s questions aren’t legitimately interesting, or that they might not apply metrics of their own.
  • For example, the SQA team might be looking for red flags that could hold up the release.
  • So they might look at aggregate errors per thousand lines of code. Not something I necessarily use to make decisions as a developer, but it doesn’t scare me if this metric is in use somewhere.
  • On a more sinister note, tracking individuals’ rates of code production or error creation/resolution is outright destructive to teams.
  • There was apparently a brief, dark time at Apple when employees were tracked by lines of code produced, until Bill Atkinson showed that you can improve and shorten the code at the same time.
  • Another, more subtle trap is setting absolute thresholds for various metrics.
  • Doing so is like blindly obeying a GPS device: sooner or later, you’ll drive off a cliff.
  • Metrics are supposed to be here for our benefit.
  • And indeed, in addition to answering specific questions about our projects, they can make coding seem a little bit like a game where the side effect is to produce better code...
  • ...as long as we still get around to writing the code eventually.
  • We have to be careful not to spend all day writing fancier shell scripts and slapping our stats onto elaborate dashboards (though there are quick-and-cheap dashboards I like; see the Tranquil project).
  • Now that we have a few questions in mind about our code base, let’s look at some metrics commonly used by many projects. (Later, we’ll look at writing our own.) The nice thing about prefab metrics is that we can find open source implementations and supporting research.
  • Rather than present you with a laundry list, I’m going to stick to a few targeted examples in C and Perl. But similar tools likely exist for your language; catch me in the hall afterwards if you’d like to explore that together.
  • The code samples you’re about to see are on GitHub; feel free to send a pull request if you’d like your favorite language to be included.
  • The granddaddy of modern code metrics is McCabe Cyclomatic Complexity. It’s meant to be a loose measure of how many different paths there are through a piece of code.
  • The fancy explanation is that you draw a graph of control flow through your function, then calculate a score from the number of edges, nodes, and return points.
  • The simpler explanation is that we walk through the code and add a point for each decision the code has to make.
  • So we’d start with a value of 1 for this code sample...
  • ...add 1 point for the if statement...
  • ...and add 1 final point for the boolean operator. Depending on the implementation, we might add a point for the multiple returns.
  • One easy-to-use implementation of this metric for C code is pmccabe.
  • When we run it, it prints the complexity, size, and location of each function in our project.
  • CPAN has several metrics modules for Perl; Perl::Metrics::Simple is an easy one to get started with.
  • Here’s a Perl subroutine similar to the one we saw.
  • Similar to pmccabe, Perl::Metrics::Simple gives us the size and complexity of each method.
  • Speaking of size and complexity, this paper reexamined several previous studies and found that several popular code metrics were effectively just expensive ways...
  • ...of counting lines. The paper didn’t consider cyclomatic complexity alone (and there were other issues dealt with in subsequent papers by other authors), but we should always be skeptical of our own metrics. Fortunately, most tools give us both a line count and a complexity metric; we can decide for ourselves.
  • Some teams set complexity targets. In the degenerate case, they turn their code into a bunch of tiny functions that do nothing, making the overall code base more complex and prone to bugs.
  • Another widely used metric is the percentage of your code that gets executed by your tests.
  • Measuring this typically involves instrumenting your code, so that you can watch it as it runs your tests.
  • Knowing our test coverage helps address the “epic confidence” problem that Laura Thomson described in her Open Source Bridge talk, “How Not To Release Software.” Teams afflicted by this bug assert without evidence that their tests are great.
  • In addition to combating hubris, measuring coverage helps us make our code more testable. Testability is not an end in itself, but a property with beneficial side effects.
  • For C projects, it’s easy to measure coverage. GCC comes with the gcov coverage tool.
  • Here’s a test that exercises just one branch of our code from earlier.
  • First, we’d compile and link our program with the couple of flags that gcov requires.
  • Then, we’d run our tests and run gcov on the source file; it reads the data files that the instrumented run left behind.
  • The result is a list of what lines did and didn’t get executed. In this case, we never ran the “else” clause.
  • Not to be outdone, Perl provides Devel::Cover.
  • You just point Devel::Cover at your tests, and it produces an HTML report for you.
  • Devel::Cover gives us more information than gcov did. We executed line 26 once, but didn’t exercise both sides of the “&&”.
  • Which brings us to another thing to keep in mind. Hitting each line of code once isn’t the same as hitting each combination of branches. Code coverage is meant to help you look for holes, not to lull you into false security.
  • The advantage of applying commonly used measurements is good support. The downside is lack of context; the creators of those metrics have nowhere near the knowledge of your project that you do. So you may want to supplement common metrics with a few of your own. I can’t tell you what those metrics are, but I can tell you a couple of the ones I’ve seen used.
  • First, let’s look at what I’ll call X-rated-ness.
  • Just as George Carlin gave us his famous list of words you can’t say on television,...
  • ...teams have their own lists of bad words.
  • (Sorry, I should have blurred those out. ;-)
  • This is dead simple to do with ack, the modern-day replacement for grep. Just count string occurrences across your files, and optionally do a little sorting.
  • Grepping works on nearly every language, of course. But Perl has its own specific implementation of this metric.
  • All you have to do is throw a couple of lines into a “.t” file...
  • ...and Perl won’t even let your tests pass if you’ve got a naughty word in your code.
  • Another metric that’s not universally used, but can still come in handy, is code churn: how often does a given piece of code change?
  • Churn can tell us what parts have changed recently; those parts may have new bugs.
  • Churn can also tell us what parts change often; those parts can become trouble spots.
  • You can get as crazy as you want with churn: examining which lines have changed the most, which functions have had the most people working on them, and so on. Git can tell you a lot more than a simple metric can, but if you’re on a centralized system you may want to just grab the data yourself and pick it apart with UNIX tools.
  • If you’re writing code that’s going to get used by developers outside of your team, you might use a metric like documentation coverage to identify the parts of the code that most badly need docs.
  • Most of the metrics we’ve seen so far have been one-shot numbers. But it’s also possible to track things over time, like occurrences of compiler errors or test failures.
  • Zed Shaw does a great demo of this in his Play by Play screencast with PeepCode.
  • We’ve talked about the kinds of questions we want to ask about our code, and the metrics that can help us answer those questions. Now for the bigger question: what’s the effect on our software? Well, here are some of the things that happened with my team.
  • One, I found a surprisingly high complexity number in what was supposed to be a simple math routine. Somebody had snuck in an unwanted dependency on an unrelated system.
  • While looking for untested code, we found some code that didn’t need any tests, because it was never called anyway!
  • I personally like to look at what features have changed when it’s time to do manual testing.
  • Some designs come at a time when our understanding of the domain is imperfect. As our understanding improves, we refactor the code. Complexity metrics can be handy for prioritizing.
  • One of the common themes woven through much of this discussion is that absolute limits for code metrics are not as helpful as relative measures within a project.
  • My hope is that you come away from this session with a couple of ideas for metrics you’d like to try, and with the well-founded belief that you can get started with very little time investment.
  • I hope you find the answers you need for your project, and that you have fun getting them.
  • Thank you, and have a fantastic OSCON.
  • The images in this presentation were used by permission under the terms of a Creative Commons license.
  • Your Own Metric System

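
To make the notes' point about lines versus branch combinations concrete, here is a small C sketch built around the talk's speaking_volume function. The Volume enum definition, the buggy variant, and the helper names are assumptions added for illustration; the slides show only the function itself.

```c
#include <stdbool.h>

/* Assumed definition; the slides do not show how Volume is declared. */
typedef enum { INAUDIBLE, INTELLIGIBLE } Volume;

typedef Volume (*VolumeFn)(bool correct_room, bool correct_time);

/* The function from the talk. */
Volume speaking_volume(bool correct_room, bool correct_time) {
    if (correct_room && correct_time) {
        return INTELLIGIBLE;
    } else {
        /* rehearsing */
        return INAUDIBLE;
    }
}

/* Hypothetical bug: the time check has been dropped. */
Volume speaking_volume_buggy(bool correct_room, bool correct_time) {
    (void)correct_time;
    if (correct_room) {
        return INTELLIGIBLE;
    } else {
        return INAUDIBLE;
    }
}

/* Two calls that together execute every line of either function above. */
bool passes_line_suite(VolumeFn fn) {
    return fn(true, true) == INTELLIGIBLE
        && fn(false, true) == INAUDIBLE;
}

/* The remaining case: the right side of && is evaluated and comes out false. */
bool passes_branch_case(VolumeFn fn) {
    return fn(true, false) == INAUDIBLE;
}
```

Both versions pass passes_line_suite, even though that suite would report every line as executed. Only passes_branch_case tells them apart, which is the kind of hole that a per-line report cannot show.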
    1. Your Own Metric System · Ian Dees · @undees · OSCON 2012
    2.
    3. Setting
    4. The forces against us
        ❥ Entropy drags our code down
        ❥ Apathy drags us down
    5. Stay engaged and productive
    6. Knowing our code can help us do our jobs and have more fun
    7. Risk #1: Missing or poor information can waste our time or lead us to cause harm
    8. Two steps forward
        1. Ask questions about your code
        2. Choose metrics that answer those questions
    9. Purpose of metrics
    10. purpose of metrics: Help you answer a question
    11. purpose of metrics: What mess should I clean up next?
    12. purpose of metrics: The product backlog isn’t a substitute for your brain
    13. Risk #2: Making structural changes can introduce new bugs (or expose existing ones)
    14. purpose of metrics: Where are the bugs (likely to be)?
    15. purpose of metrics:
        /**
         * REMOVE THIS CODE
         * BEFORE WE SHIP!
         */
    16. purpose of metrics: Have we forgotten anything for this release?
    17. purpose of metrics: These questions are for us
    18. purpose of metrics: Not for someone else
    19. purpose of metrics: Questions from others (outside the scope of our metrics)
    20. purpose of metrics: Should we hold the release?
    21. purpose of metrics: [chart: errors/KLOC versus time]
    22. purpose of metrics: Who’s got the best KLOC or error rate?
    23. purpose of metrics: “It was time to fill out the management form for the first time. When he got to the lines of code part, he thought about it for a second, and then wrote in the number: -2000. After a couple more weeks, they stopped asking Bill to fill out the form, and he gladly complied.”
    24. purpose of metrics: Have we met our target complexity or coverage?
    25. purpose of metrics: Metrics serve you, not the other way around
    26. purpose of metrics: Keep the job fun
    27. purpose of metrics: More fun than actually working?
    28. Risk #3: There is a trap here for the distractible
    29. Common metrics
    30. common metrics: Languages
        ❥ C: a case study
        ❥ Perl: the beginner’s experience
        ❥ <your lang> just ask!
    31. common metrics: Repo for this talk
    32. common metrics: Cyclomatic complexity
    33. common metrics: E - N + 2P
    34. common metrics:
        1. Start with a score of 1
        2. Add 1 for each if, case, for, or boolean condition
    35. complexity: 1
        Volume speaking_volume(
            bool correct_room,
            bool correct_time) {
            if (correct_room && correct_time) {
                return INTELLIGIBLE;
            } else {
                // rehearsing
                return INAUDIBLE;
            }
        }
    36. The same function, after counting the if. complexity: 2
    37. The same function, after counting the &&. complexity: 3
    38. common metrics
    39. $ pmccabe *.c | sort -nr | head -10
        3  3  3  6   8  oscon.c(6): speaking_volume
        1  1  2  16  5  oscon.c(16): main
    40. common metrics: Perl::Metrics::Simple
    41. sub speaking_volume {
            my $correct_room = shift;
            my $correct_time = shift;
            if ($correct_room && $correct_time) {
                return 'intelligible';
            } else {
                # rehearsing
                return 'inaudible';
            }
        }
    42. $ countperl lib
        ...
        Tab-delimited list of subroutines, with most complex at top
        -----------------------------------------------------------
        complexity  sub              path  size
        4           speaking_volume  lib/  9
        ...
    43. $ wc -l oscon.c
    44. Risk #4: Blindly reducing one number can add complexity and bugs
    45. common metrics: Test coverage
    46. common metrics:
        1. Instrument your program
        2. Watch your tests run
        3. Report which lines get executed
    47. common metrics: Addresses “epic confidence” fail
    48. common metrics: Testable code is more... testable
    49. common metrics: gcov
    50. int main() {
            assert(speaking_volume(true, true) == INTELLIGIBLE);
            return 0;
        }
    51. $ gcc -fprofile-arcs -ftest-coverage -c oscon.c
        $ gcc -fprofile-arcs oscon.o
    52. $ gcov oscon.c
        $ cat oscon.c.gcov
    53.     1:    6:Volume speaking_volume(bool correct_room, bool correct_time) {
            1:    7:    if (correct_room && correct_time) {
            1:    8:        return INTELLIGIBLE;
            -:    9:    } else {
            -:   10:        // rehearsing
        #####:   11:        return INAUDIBLE;
            -:   12:    }
            -:   13:}
    54. common metrics: Devel::Cover
    55. $ cover -test
        $ cat cover_db/coverage.html
    56. Risk #5: High code coverage can make you think your code is good
    57. Custom metrics
    58. X-rated-ness
    59. custom metrics: Carlin’s 7 Dirty Words (seven numbered entries, all blank)
    60. custom metrics: Our 7 Dirty Words
        1. XXX
        2. TODO
        3. FIXME
        4. TBD
        5. HACK
        6. #if 0
        7. #ifndef TESTING
    61. custom metrics: Our 7 Dirty Words (seven numbered entries, all blank)
    62. $ ack -cl 'XXX|TODO|FIXME'
        oscon.c:1
    63. custom metrics: Test::Fixme
    64. use Test::Fixme;
        run_tests(where => 'lib', match => qr/XXX|TODO|FIXME/);
    65. $ make test
        ...
        t/test-fixme.t .. 1/1
        # Failed test lib/ at t/test-fixme.t line 2.
        # File: lib/ 34 # XXX:remove the temp limit before we deploy
        # Looks like you failed 1 test of 1.
        t/test-fixme.t .. Dubious, test returned 1 (wstat 256, 0x100)
        Failed 1/1 subtests
    66. custom metrics: Churn
    67. custom metrics: Recently changed code may have new bugs
    68. custom metrics: Frequently-changed code may have problems
    69. git log --pretty=oneline --since=2012-05-04 oscon.c | wc -l
    70. custom metrics: Missing documentation
    71. custom metrics: Errors by time of day
    72. custom metrics: Play by Play: Zed Shaw
    73. What do we get from all this?
    74. Found a real dependency problem with pmccabe
    75. Found dead code with gcov
    76. Did a quick churn check at manual test time
    77. Found places we can DRY up the code
    78. Relative, not absolute!
    79. Content-Type: multipart/wish
    80. ❥ Find the answers you need
        ❥ Look like heroes
        ❥ Have fun
    81. Fin
    82.
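
The dirty-word count from the ack slide can also be sketched as a small C function, for a team that wants the number inside a test program rather than a shell pipeline. The function names and the marker list passed in are illustrative assumptions, and the scan is a plain substring match over text that has already been read into memory.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of one marker string in src. */
size_t count_marker(const char *src, const char *marker) {
    size_t len = strlen(marker);
    size_t count = 0;
    if (len == 0) {
        return 0; /* an empty marker would never advance the scan */
    }
    for (const char *p = src; (p = strstr(p, marker)) != NULL; p += len) {
        count++;
    }
    return count;
}

/* Hypothetical helper: total occurrences of every marker in the list. */
size_t count_dirty_words(const char *src, const char **markers, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += count_marker(src, markers[i]);
    }
    return total;
}
```

Called with the markers XXX, TODO, and FIXME, this returns the total number of occurrences in the text. Note one difference from the slide: ack -c reports matching lines per file, while this sketch counts every occurrence, so a line containing two markers counts twice here.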