Lies, Damn Lies, and Benchmarks


How to avoid Benchmark Stuff ("BS") when evaluating the performance of code. This installment uses the "time" command to compare the execution speed of Perl and various shell commands, with and without plumbing.

Published in: Technology

  1. Lies, Damn Lies & Benchmarks. Steven Lembark, Workhorse Computing
  2. “Perl is too slow” Heard that before? Yeah... Mostly wrong – can't refute it without data. Need to benchmark the times.
  3. Damn lies... Good benchmarks find realistic times. Most benchmarks prove a point. They get ignored. Ignored results are not lazy.
  4. Benchmarking perl The *NIX “time” command. Good enough to answer most questions. Avoids much Benchmarking Stuff (“BS”).
  5. Simplest tool: “time” real, system, and user times. real time heavily affected by system load. system + user better indication of “work”. real – work = blocked.
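
The "real – work = blocked" rule can be worked through as plain arithmetic. This is a minimal sketch, not part of the original slides: the helper names to_ms and blocked_ms are hypothetical, and it assumes bash's default "0m0.005s" time format with exactly three decimal places. The sample figures are the two timings quoted on slide 17 of this deck.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the slides).
# Assumes bash's default time format, e.g. "0m29.823s", with three decimals.

# to_ms "0m29.823s" -> 29823 (integer milliseconds)
to_ms() {
    local t=${1%s} min sec whole frac
    min=${t%%m*}          # minutes, before the "m"
    sec=${t#*m}           # seconds, after the "m"
    whole=${sec%%.*}      # whole seconds
    frac=${sec#*.}        # three-digit millisecond part
    # 10# forces base 10 so "005" is not read as octal
    echo $(( 10#$min * 60000 + 10#$whole * 1000 + 10#$frac ))
}

# blocked_ms REAL USER SYS -> real - (user + sys), in milliseconds
blocked_ms() {
    echo $(( $(to_ms "$1") - $(to_ms "$2") - $(to_ms "$3") ))
}

blocked_ms 0m29.823s 0m0.710s 0m4.220s   # shell pipeline, slide 17: prints 24893
blocked_ms 0m0.301s  0m0.170s 0m0.130s   # perl one-liner, slide 17: prints 1
```

Read this way, the shell pipeline spent roughly 25 of its 30 seconds blocked, while the perl run was blocked for about a millisecond.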
  6. “bash takes less time to start up” perl isn't any slower: Zero work for both. Real is all blocked.
       $ time perl -e 0
       real 0m0.005s  user 0m0.000s  sys 0m0.000s
       $ time bash /dev/null
       real 0m0.005s  user 0m0.000s  sys 0m0.000s
  7. BS: Startup Times If something just ran, it is probably still in core (memory). That saves overhead running it the second time. Run everything twice to benchmark startups. Multiple runs or single-user mode help manage background noise.
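
The "run everything twice" advice can be wrapped in a small helper. This is a sketch under stated assumptions rather than anything from the slides: warm_then_time is a hypothetical name, and it relies on bash's time keyword, which reports on the shell's stderr even when the command's own output is redirected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the slides).
# warm_then_time COUNT CMD [ARGS...]
# Runs CMD once untimed so it is likely cached, then times COUNT more runs.
warm_then_time() {
    local count=$1 i
    shift
    "$@" > /dev/null 2>&1                # warm-up run, result discarded
    for (( i = 1; i <= count; i++ )); do
        echo "run $i:" >&2
        time "$@" > /dev/null 2>&1       # timing still goes to the shell's stderr
    done
}

# Startup baseline from slide 6, repeated three times after a warm-up.
warm_then_time 3 bash /dev/null
```

Comparing the timed runs against each other gives a rough feel for the background noise on the machine.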
  8. Minimizing startup issues Save kernel calls, context switches, interrupts, latency, transfer I/O... tmpfs on linux minimizes overhead. Test with un-loaded system. Avoid “virtual” systems (CPU, EMC) unless that is what you are testing.
  9. What does startup time tell us? Opterons are fast? Useless by itself. Necessary baseline. Differences are a warning.
  10. Analyzing startup times. Big differences usually indicate a problem: Mis-compiled: “-O0” “-g” on production code. Mixing 32- and 64-bit code and O/S. Background noise from other running jobs. Botched startups leave everything else suspect.
  11. Do something! OK, let's time an operation. Listing a directory is common enough. “ls” lists the contents, sorts lexically. Perl's “glob” is similar.
  12. Trivial pursuit: ls vs glob. Mostly blocked: 7ms bash vs. 9ms perl. Failing to clear the screen can skew results! Remote display, virtual machines.
        lembark@dizzy etl $ time bash -c '/bin/ls -d /tmp/*'
        real 0m0.007s  user 0m0.000s  sys 0m0.000s
        lembark@dizzy etl $ time perl -e '$\="\n"; $,=" "; print glob "/tmp/*"'
        real 0m0.019s  user 0m0.010s  sys 0m0.000s
  13. BS: Milliseconds matter Really care about 12ms? OK, perl is slower. Most of the difference is in blocked time. Hint: perl and shell block at the same rate. perl compiles a statement, which adds overhead. Use “ls” for what it is.
  14. Doing more Search files using their basenames: Find all of the basenames from “2012.05.05” through “2012.05.16”. First step: How many files are there?
  15. Times Compare File::Find with /bin/find. Roughly same system time, added user for compile. Shell is faster because it is single-purpose.
        $ time find . -type f | wc -l;
        18583
        real 0m0.080s  user 0m0.020s  sys 0m0.050s
        $ time perl -MFile::Find -e 'my $i = 0; find sub { -l or -d or ++$i },"."; print $i, "\n"'
        18583
        real 0m0.274s  user 0m0.220s  sys 0m0.050s
  16. Multi-layer pipes Compare the basename to a regex. Find files, extract basenames, and search with extended syntax (largely borrowed from Perl). One-liner with perl, File::Find & File::Basename.
        Shell: find . -type f | xargs -l1 basename | egrep -E '2012.05.(?:0[5-9]|1[0-6])'
  17. BS: Forks & pipes are “free”. Real, user, and system time are higher for bash. xargs has to fork/exec many copies of basename. system overhead from buffering pipes is also higher. Plumbing is expensive!
        $ time find . -type f | xargs -l1 basename | egrep -E '2012.05.(?:0[5-9]|1[0-6])' | wc -l
        1604
        real 0m29.823s  user 0m0.710s  sys 0m4.220s
        $ time perl -MFile::Find=find -MFile::Basename=basename -e 'my $i=0; find sub { -l || -d and return; /2012.05.(?:0[5-9]|1[0-6])/ and ++$i }, "."; print $i, "\n"'
        1604
        real 0m0.301s  user 0m0.170s  sys 0m0.130s
  18. Replacing content “in place” perl's “-i” replaces files in place. Shell pre-opens files, can't “sort -d < a > a”. Shell requires “sort -d < a > b && mv b a”. Now imagine filtering a few thousand files...
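
The pre-open problem is easy to reproduce on throwaway files. This is a minimal sketch, assuming a scratch directory from mktemp and made-up three-line sample data: the shell truncates the output file before sort reads a single byte, so the first form leaves an empty file and the second form is the one that works.

```shell
#!/usr/bin/env bash
# Scratch directory and sample data are made up for illustration.
dir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$dir/a"
cp "$dir/a" "$dir/broken"

# Broken: the shell opens "broken" for writing (and truncates it)
# before sort starts, so sort reads an empty file.
sort -d < "$dir/broken" > "$dir/broken"
wc -c < "$dir/broken"            # file is now 0 bytes

# Working: write to a second file, then rename it over the first.
sort -d < "$dir/a" > "$dir/b" && mv "$dir/b" "$dir/a"
cat "$dir/a"                     # apple, fig, pear on separate lines
```

Remove the scratch directory with rm -rf "$dir" when finished.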
  19. perl -n & -p with -i Say you have to update the package names for a few hundred modules from “::Source” to “::RDS”. Mixing shell with perl: find . -type f | xargs perl -i -p -e 's/::Source\b/::RDS/g'; Exercise: Try writing this in pure shell.
  20. Running it doesn't take long either Nice division of labor: find & xargs deal with the names. perl deals with the regex. not much typing either way. not much time either.
        $ time find . -type f | xargs perl -i -p -e 's/::Source\b/::RDS/g'
        real 0m0.112s  user 0m0.044s  sys 0m0.016s
  21. What this means to you. Plumbing and forks are not free. Single-purpose programs faster for one thing. Chaining the simpler tools adds overhead. Languages faster for multi-stage tasks.
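
The cost of forks can be seen without a large directory tree. This is a rough sketch, not the author's benchmark: the 200 paths are made up, per_item and one_proc are hypothetical names, and xargs -n1 stands in for the -l1 used on slide 17. Both functions produce the same basenames. One forks basename once per path, the other uses a single sed process.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: 200 made-up paths, one per line.
paths=$(for (( i = 1; i <= 200; i++ )); do
            echo "/tmp/demo/dir$i/file$i.txt"
        done)

# One fork/exec of basename for every path.
per_item() { printf '%s\n' "$paths" | xargs -n1 basename; }

# A single sed process strips everything up to the last slash.
one_proc() { printf '%s\n' "$paths" | sed 's|.*/||'; }

time per_item > /dev/null
time one_proc > /dev/null
```

Actual times depend on the machine and its load, so none are quoted here. The comparison that matters is the gap between the two runs on the same system.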