Devel::NYTProf v5 at YAPC::NA 201406

Slides of my talk on Devel::NYTProf and optimizing perl code at YAPC::NA in June 2014. It covers use of NYTProf and outlines a multi-phase approach to optimizing your perl code.

A video of the talk and questions is available at https://www.youtube.com/watch?v=T7EK6RZAnEA&list=UU7y4qaRSb5w2O8cCHOsKZDw

  1. Devel::NYTProf: Perl Source Code Profiler
      Tim Bunce - YAPC::NA - 2014
  2. Devel::DProf Is Broken
        $ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});\n" for 1..1000' > x.pl
        $ perl -d:DProf x.pl
        $ dprofpp -r
        Total Elapsed Time = 0.108 Seconds
          Real Time = 0.108 Seconds
        Exclusive Times
        %Time ExclSec CumulS #Calls sec/call Csec/c  Name
         9.26   0.010  0.010      1   0.0100 0.0100  main::s76
         9.26   0.010  0.010      1   0.0100 0.0100  main::s323
         9.26   0.010  0.010      1   0.0100 0.0100  main::s626
         9.26   0.010  0.010      1   0.0100 0.0100  main::s936
         0.00       - -0.000      1        -      -  main::s77
         0.00       - -0.000      1        -      -  main::s82
  3. Profiling 101: The Basics
  4. What To Measure?
      CPU Time or Real Time? Subroutines or Statements?
  5. Subroutine vs Statement
      • Subroutine profiling
        - Measures time between subroutine entry and exit
        - That’s the Inclusive time; Exclusive time is derived by subtraction
        - Reasonably fast, reasonably small data files
      • Problems
        - Can be confused by funky control flow (goto &sub)
        - No insight into where time is spent within large subs
        - Doesn’t measure code outside of a sub
  6. Subroutine vs Statement
      • Line/statement profiling
        - Measures time from the start of one statement to the start of the next statement, wherever that might be
        - Fine-grained detail
      • Problems
        - Very expensive in CPU & I/O
        - Assigns too much time in some cases
        - Too much detail for large subs
        - Hard to get overall subroutine times
  7. CPU Time vs Real Time
      • CPU time
        - Measures time the CPU spent executing your code
        - Not (much) affected by other load on the system
        - Doesn’t include time spent waiting for I/O etc.
      • Real time
        - Measures the elapsed time-of-day
        - Affected by other load on the system
        - Includes time spent waiting for I/O etc.
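A minimal sketch of the CPU time vs real time distinction on slide 7, using Perl's built-in times() for CPU seconds and Time::HiRes::time() for wall-clock seconds. The measure() helper and both workloads are illustrative assumptions, and sleep stands in for waiting on I/O; the exact numbers will depend on your machine and its load.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: run $code and report CPU and wall-clock seconds.
# times() returns (user, system, child user, child system) CPU seconds.
sub measure {
    my ($code) = @_;
    my ($user0, $sys0) = times;
    my $wall0 = time();
    $code->();
    my ($user1, $sys1) = times;
    my $wall1 = time();
    return {
        cpu  => ($user1 - $user0) + ($sys1 - $sys0),
        real => $wall1 - $wall0,
    };
}

# CPU-bound work: CPU time and real time should be roughly equal.
my $busy = measure(sub { my $x = 0; $x += sqrt($_) for 1 .. 1_000_000 });

# Waiting (standing in for I/O): real time grows, CPU time barely moves.
my $idle = measure(sub { sleep 0.5 });

printf "busy: cpu=%.2fs real=%.2fs\n", @{$busy}{qw(cpu real)};
printf "idle: cpu=%.2fs real=%.2fs\n", @{$idle}{qw(cpu real)};
```

On a busy machine the "busy" real time can exceed its CPU time, which is exactly the "affected by other load" caveat above.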
  8. Devel::NYTProf
  9. Public Service Announcement!
      The NYTProf name is an accident of history.
      I do not work for the New York Times.
      I have never worked for the New York Times.
      I have no affiliation with the New York Times.
      The New York Times last contributed in 2008.
  10. Running NYTProf
        perl -d:NYTProf ...
        perl -MDevel::NYTProf ...
      Configure the profiler via the NYTPROF env var.
      See perldoc Devel::NYTProf for the details.
      To profile code that’s invoked elsewhere:
        PERL5OPT=-d:NYTProf
        NYTPROF=file=/tmp/nytprof.out:addpid=1:...
  11. Reporting: KCachegrind
      • KCachegrind call graph
        - new and cool
        - contributed by C. L. Kao
        - requires KCachegrind
        $ nytprofcg      # generates nytprof.callgraph
        $ kcachegrind    # load the file via the GUI
  12. KCachegrind
  13. Reporting: HTML
      • HTML report
        - page per source file, annotated with times and links
        - subroutine index table with sortable columns
        - interactive Treemap of subroutine times
        - generates Graphviz dot file of call graph
        - -m (--minimal): faster generation but less detailed
        $ nytprofhtml    # writes HTML report in ./nytprof/...
        $ nytprofhtml --file=/tmp/nytprof.out.793 --open
  14. Summary
      Links to annotated source code
      Timings for perl builtins
      Link to sortable table of all subs
  15. Inclusive vs Exclusive Time
      [Diagram: sub foo, which calls bar() twice, annotated with Inclusive and Exclusive time for foo and Inclusive time for bar]
  16. Inclusive vs. Exclusive
      • Inclusive time is best for top down
        - Overview of time spent “in and below this sub”
        - Useful to prioritize structural optimizations
      • Exclusive time is best for bottom up
        - Detail of time spent “in the code of this sub”
        - Where the time actually gets spent
        - Useful for localized (peephole) optimization
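To make “Exclusive by subtraction” (slide 5) and the inclusive/exclusive split (slides 15-16) concrete, here is a toy timer. It is an illustrative sketch, not how NYTProf works internally: each call records its inclusive time and charges it to its caller's running total of child time, and exclusive time is inclusive minus children. The timed() helper and the foo/bar bodies are hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my (%inclusive, %exclusive);
my @stack;    # one frame per active call: [name, start_time, child_time]

# Hypothetical helper: run $code as the body of sub $name and record its times.
sub timed {
    my ($name, $code) = @_;
    push @stack, [$name, time(), 0];
    $code->();
    my (undef, $start, $child_time) = @{ pop @stack };
    my $incl = time() - $start;
    $inclusive{$name} += $incl;
    $exclusive{$name} += $incl - $child_time;   # exclusive by subtraction
    $stack[-1][2] += $incl if @stack;            # our inclusive time is child time for the caller
    return;
}

sub bar { timed(bar => sub { my $x = 0; $x += $_ for 1 .. 200_000 }) }
sub foo { timed(foo => sub { my $x = 0; $x += $_ for 1 .. 100_000; bar(); bar() }) }

foo();

printf "%-3s inclusive=%.4fs exclusive=%.4fs\n", $_, $inclusive{$_}, $exclusive{$_}
    for sort keys %inclusive;
```

Because bar calls nothing, its inclusive and exclusive times are the same; foo's inclusive time minus its exclusive time is the time spent in its two bar() calls.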
  17. Annotated Source View
  18. Overall time spent in and below this sub (in + below)
      Color coding based on Median Average Deviation relative to the rest of this file
      Timings for each location that calls this subroutine
      Time between starting this perl statement and starting the next, so it includes the overhead of calls to perl subs
      Timings for each subroutine called by each line
  19. Treemap showing relative proportions of exclusive time
      Boxes represent subroutines
      Colors are only used to show packages (and aren’t pretty yet)
      Hover over a box to see details
      Click to drill down one level in the package hierarchy
  20. Calls between packages
  21. Calls to/from/within package
  22. Let’s take a look...
  23. DEMO
  24. Optimizing Hints & Tips
  25. Beware My Examples!
      Do your own testing
      With your own perl binary
      On your own hardware
  26. Beware 2!
      Take care comparing code fragments!
      Edge-effects at loop and scope boundaries.
      Statement time includes time getting to the next perl statement, wherever that may be.
  27. Beware Your Examples!
      Consider the effect of CPU-level data and code caching.
      Tends to make the second case look faster!
      Swap the order to double-check alternatives.
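A sketch of the “swap the order” advice on slide 27: time two alternatives in one order, then again in the reverse order. The two ways of building a lookup hash, the repeat count, and the time_it() helper are only illustrative assumptions. If the ranking flips between rounds, the difference is probably noise or caching rather than a real win; the core Benchmark module's cmpthese() is the usual tool for more careful comparisons.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my @words = map { "word$_" } 1 .. 1_000;

# Two hypothetical alternatives that build the same lookup hash.
my %alternatives = (
    slice => sub { my %h; @h{@words} = (); return scalar keys %h },
    loop  => sub { my %h; $h{$_} = undef for @words; return scalar keys %h },
);

# Run $code $n times and return the elapsed wall-clock seconds.
sub time_it {
    my ($code, $n) = @_;
    my $t0 = time();
    $code->() for 1 .. $n;
    return time() - $t0;
}

# Time both alternatives in one order, then again in the reverse order.
my %elapsed;
for my $order (['slice', 'loop'], ['loop', 'slice']) {
    for my $name (@$order) {
        push @{ $elapsed{$name} }, time_it($alternatives{$name}, 200);
    }
}

printf "%-5s %s\n", $_, join ' ', map { sprintf '%.4fs', $_ } @{ $elapsed{$_} }
    for sort keys %elapsed;
```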
  28. Phase 0: Before you start
  29. DON’T DO IT!
  30. “The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson
  31. Why not?
  32. “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.” - W.A. Wulf
  33. “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” - Donald Knuth
  34. “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” - Donald Knuth
  35. How?
  36. “Throw hardware at it!”
      Hardware == Cheap
      Programmers == Expensive (& error prone)
      Hardware upgrades are usually much less risky than software optimizations.
  37. “Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.” - Rob Pike
  38. “Measure twice, cut once.” - Old Carpenter’s Maxim
  39. Phase 1: Low Hanging Fruit
  40. Low Hanging Fruit
      1. Profile code running a representative workload.
      2. Look at the Exclusive Time of subroutines.
      3. Do they look reasonable?
      4. Examine the worst offenders.
      5. Fix only simple local problems.
      6. Profile again.
      7. Fast enough? Then STOP!
      8. Rinse and repeat once or twice, then move on.
  41. “Simple Local Fixes”
      Changes unlikely to introduce bugs
  42. Move invariant expressions out of loops
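A small before/after sketch of hoisting a loop invariant (slide 42). The %config hash and its tax_rate key are hypothetical; the point is that 1 + $config{tax_rate} doesn't change inside the loop, so it only needs to be computed once.

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);
my %config = (tax_rate => 0.2);    # hypothetical configuration

# Before: the hash lookup and addition are repeated on every iteration.
my @before;
for my $price (@prices) {
    push @before, $price * (1 + $config{tax_rate});
}

# After: the loop-invariant multiplier is computed once, before the loop.
my $multiplier = 1 + $config{tax_rate};
my @after;
for my $price (@prices) {
    push @after, $price * $multiplier;
}
```

With three prices the saving is negligible; it only matters in loops hot enough to show up in the profile.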
  43. Avoid->repeated->chains
          ->of->accessors(...);
      Avoid->repeated->chains
          ->of->accessors(...);
      Use a temporary variable
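A runnable version of slide 43's advice. The App and DbConfig classes and their accessors are hypothetical stand-ins; the change is simply to walk the shared part of the chain once and keep the intermediate object in a temporary variable.

```perl
use strict;
use warnings;

# Hypothetical classes with simple read-only accessors.
package DbConfig;
sub new  { my ($class, %args) = @_; return bless {%args}, $class }
sub dsn  { $_[0]{dsn} }
sub user { $_[0]{user} }

package App;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub db  { $_[0]{db} }

package main;

my $app = App->new(db => DbConfig->new(dsn => 'dbi:SQLite:dbname=app.db', user => 'app'));

# Before: $app->db is called again for each field.
my $dsn_before  = $app->db->dsn;
my $user_before = $app->db->user;

# After: walk the chain once and reuse the intermediate object.
my $db         = $app->db;
my $dsn_after  = $db->dsn;
my $user_after = $db->user;
```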
  44. Use faster accessors
        Class::Accessor
        -> Class::Accessor::Fast
        --> Class::Accessor::Faster
        ---> Class::Accessor::Fast::XS
        ----> Class::XSAccessor
      These aren’t all compatible, so consider your actual usage.
      (The list above is out of date.)
  45. Avoid calling subs that don’t do anything!
        my $unused_variable = $self->get_foo;   # result never used: drop the call

        my $is_logging = $log->is_info;         # check once, outside the loop
        while (...) {
            $log->info(...) if $is_logging;
            ...
        }
  46. Exit subs and loops early
      Delay initializations
        return if not ...a cheap test...;
        return if not ...a more expensive test...;
        my $foo = ...initializations...;
        ...body of subroutine...
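A sketch of slide 46's pattern: cheap tests first, then more expensive ones, and initialization only once the sub knows it has real work to do. The expensive_check() function is a hypothetical stand-in that counts how often it gets called.

```perl
use strict;
use warnings;

my $expensive_calls = 0;

# Hypothetical costly test; here it just counts calls and checks divisibility.
sub expensive_check {
    my ($n) = @_;
    $expensive_calls++;
    return $n % 3 == 0;
}

sub process {
    my ($n) = @_;
    return if $n < 0;                    # cheap test first
    return if not expensive_check($n);   # more expensive test second
    my %state = (n => $n, squared => $n * $n);   # initialize only when needed
    return $state{squared};
}

my @results = grep { defined } map { process($_) } (-3 .. 6);
```

The negative inputs never reach expensive_check(), and %state is only built for inputs that pass both tests.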
  47. Fix silly code
        - return exists $nav_type{$country}{$key}
        -     ? $nav_type{$country}{$key}
        -     : undef;
        + return $nav_type{$country}{$key};
  48. Beware pathological regular expressions
      Devel::NYTProf shows regular expression opcodes.
      Consider using no feature 'unicode_strings';
  49. Avoid unpacking args in very hot subs
        sub foo { shift->delegate(@_) }

        sub bar {
            return shift->{bar} if @_ == 1;   # getter: only $self was passed
            return $_[0]->{bar} = $_[1];      # setter
        }
  50. Avoid unnecessary (capturing parens) in regex
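Slide 50 in miniature: both patterns match the same lines, but the second uses a non-capturing (?:...) group, so perl doesn't need to record the matched text for $1. The sample request lines are made up; measure on your own data before assuming a gain.

```perl
use strict;
use warnings;

my @lines = ('GET /index.html', 'POST /form', 'HEAD /');

# Capturing group: perl saves the matched text for $1 even though it's never used.
my $capturing = grep { /^(GET|HEAD) / } @lines;

# Non-capturing group: same alternation and same matches, no capture saved.
my $noncapturing = grep { /^(?:GET|HEAD) / } @lines;
```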
  51. Retest. Fast enough? STOP!
      Put the profiler down and walk away.
  52. Phase 2: Deeper Changes
  53. Profile with a known workload
      E.g., 1000 identical requests
  54. Check subroutine call counts
      Reasonable for the workload?
  55. Check Inclusive Times (especially top-level subs)
      Reasonable percentage for the workload?
  56. Add caching if appropriate to reduce calls
      Remember cache invalidation!
  57. Walk up the call chain to find good spots for caching
      Remember cache invalidation!
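A minimal sketch of slides 56-57: a hash cache in front of a hypothetical slow fetch_price(), plus an explicit invalidation hook (price_changed(), also hypothetical). Where to put the cache, and which event should invalidate it, depends on your code; this only shows the shape.

```perl
use strict;
use warnings;

my %price_cache;
my $fetches = 0;

# Hypothetical slow lookup standing in for a database or network call.
sub fetch_price {
    my ($sku) = @_;
    $fetches++;
    return length($sku) * 10;
}

# Cached wrapper: only calls fetch_price() on a cache miss.
# (//= means an undef result is not cached and will be fetched again.)
sub price {
    my ($sku) = @_;
    return $price_cache{$sku} //= fetch_price($sku);
}

# Invalidation: call this whenever the underlying price changes.
sub price_changed {
    my ($sku) = @_;
    delete $price_cache{$sku};
    return;
}

price('ABC-1') for 1 .. 3;   # one fetch, two cache hits
price_changed('ABC-1');
price('ABC-1');              # fetched again after invalidation
```

The core Memoize module can wrap an existing function with a cache like this, but it can't know when your data changes, so invalidation remains your job.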
  58. Creating many objects that don’t get used?
      Try a lightweight proxy, e.g. DateTime::Tiny, DateTimeX::Lite, DateTime::LazyInit
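The idea behind the lightweight and lazy classes on slide 58, sketched with a hypothetical LazyDate class (this is not the API of any module named there): keep the cheap raw string and only parse it when a caller actually asks for a component.

```perl
use strict;
use warnings;

package LazyDate;    # hypothetical lightweight proxy

my $parses = 0;      # counts how often the expensive step actually runs

sub new {
    my ($class, $string) = @_;
    return bless { string => $string }, $class;
}

sub as_string { $_[0]{string} }

# Parse "YYYY-MM-DD" on first use, then reuse the cached result.
sub _parts {
    my ($self) = @_;
    return $self->{parts} //= do {
        $parses++;
        my ($y, $m, $d) = $self->{string} =~ /^(\d{4})-(\d{2})-(\d{2})$/
            or die "Bad date: $self->{string}\n";
        +{ year => $y, month => $m, day => $d };
    };
}

sub year  { $_[0]->_parts->{year} }
sub month { $_[0]->_parts->{month} }

package main;

my @dates = map { LazyDate->new("2014-06-$_") } qw(23 24 25);
my $year  = $dates[0]->year;    # parses only the first date
```

Creating the three objects costs almost nothing; only the one whose fields are read ever pays for parsing, and only once.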
  59. Reconfigure your Perl
      Can yield useful gains with little effort:
        thread support costs ~2-30%
        debugging support costs ~15%
      Also consider: usemymalloc, use64bitint, use64bitall, uselongdouble, optimize, disable taint mode.
      Consider using a different compiler.
  60. Upgrade your Perl
      Newer versions are often faster at some things (though occasionally slower at others).
      Sometimes they have specific micro-optimizations.
      Many memory usage and performance improvements from 5.8 through 5.20.
  61. Retest. Fast enough? STOP!
      Put the profiler down and walk away.
  62. Phase 3: Structural Changes
  63. Push loops down
        - $object->walk($_) for @dogs;
        + $object->walk_these(@dogs);
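Slide 63 as a runnable sketch. The Walker class and its walk() and walk_these() methods are hypothetical; the gain comes from paying method-call overhead once per batch instead of once per dog.

```perl
use strict;
use warnings;

package Walker;    # hypothetical class

sub new { my ($class) = @_; return bless { walked => [] }, $class }

# One method call per dog: call overhead is paid for every item.
sub walk {
    my ($self, $dog) = @_;
    push @{ $self->{walked} }, $dog;
    return;
}

# One method call for the whole list: the loop runs inside the method.
sub walk_these {
    my ($self, @dogs) = @_;
    for my $dog (@dogs) {
        push @{ $self->{walked} }, $dog;
    }
    return;
}

package main;

my @dogs = qw(rex fido spot);

my $one_by_one = Walker->new;
$one_by_one->walk($_) for @dogs;

my $batched = Walker->new;
$batched->walk_these(@dogs);
```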
  64. Use faster modules
        sort -> Sort::Key
        Storable -> Sereal
        LWP -> HTTP::Tiny -> HTTP::Lite -> *::Curl -> Hijk
      These aren’t all compatible or full-featured or ‘better’.
      Consider your actual needs.
      See http://neilb.org/reviews/
  65. Change the data structure
      hashes <-> arrays
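One form of slide 65's hashes <-> arrays swap: store a record in an array and name the slots with constants, trading some flexibility for cheaper element access. The point fields are illustrative; whether this helps enough to matter is something to profile, not assume.

```perl
use strict;
use warnings;

# Hash-based record: flexible and self-describing, but each access hashes the key.
my %point_h = (x => 3, y => 4);

# Array-based record: named constants map fields to fixed indices.
use constant { X => 0, Y => 1 };
my @point_a = (3, 4);

my $len_h = sqrt($point_h{x}**2 + $point_h{y}**2);
my $len_a = sqrt($point_a[X]**2 + $point_a[Y]**2);
```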
  66. Change the algorithm
      What’s the “Big O”? O(n²) or O(log n) or ...
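An illustration of slide 66 with two hypothetical ways to ask “does this list contain a duplicate?”. The nested loops compare every pair, which is O(n²); the hash version makes a single pass with (amortized) constant-time lookups, which is O(n).

```perl
use strict;
use warnings;

# O(n²): compare every pair of items.
sub has_dup_quadratic {
    my @items = @_;
    for my $i (0 .. $#items) {
        for my $j ($i + 1 .. $#items) {
            return 1 if $items[$i] eq $items[$j];
        }
    }
    return 0;
}

# O(n): remember what we've seen in a hash.
sub has_dup_linear {
    my %seen;
    for my $item (@_) {
        return 1 if $seen{$item}++;
    }
    return 0;
}
```

For a handful of items the difference is invisible; for tens of thousands it usually dwarfs any micro-optimization from the earlier phases.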
  67. Rewrite hot-spots in XS / C
      Consider Inline::C, but beware of deployment issues
  68. Small changes add up!
      “I achieved my fast times by multitudes of 1% reductions” - Bill Raymond
  69. See also “Top 10 Perl Performance Tips”
      • A presentation by Perrin Harkins
      • Covers higher-level issues, including
        - Good DBI usage
        - Fastest modules for serialization, caching, templating, HTTP requests etc.
      • http://docs.google.com/present/view?id=dhjsvwmm_26dk9btn3g
  70. Questions?
      Tim.Bunce@pobox.com
      http://blog.timbunce.org
      @timbunce on twitter
