Profiling and optimization

  1. - Gayatri Nittala
  2. About:
     - Optimization - what, when & where?
     - General optimization strategies
     - Python-specific optimizations
     - Profiling
  3. Golden rule: "First make it work. Then make it right. Then make it fast!" - Kent Beck
  4. The process:
     1. Get it right.
     2. Test that it's right.
     3. Profile if it's slow.
     4. Optimize.
     5. Repeat from 2.
     Keep test suites and source control in place throughout.
  5. Make it right & pretty! Good programming :-)
     - General optimization strategies
     - Python optimizations
  6. Optimization aims to improve the result, not to make it perfect - programming with performance tips in mind.
  7. When to start?
     Need for optimization:
     - Are you sure you need to do it at all?
     - Is your code really so bad?
     - Benchmarking
     - "Fast enough" vs. "faster"
     Time for optimization:
     - Is it worth the time to tune it?
     - How much time is going to be spent running that code?
  8. When to start?
     Cost of optimization:
     - Costly developer time
     - Time taken away from adding new features
     - New bugs in algorithms
     - Speed vs. space trade-offs
     Optimize only if necessary!
  9. Where to start?
     - Are you sure you're done coding? Optimizing too early is frosting a half-baked cake.
     - "Premature optimization is the root of all evil!" - Don Knuth
     - Working, well-architected code is always a must.
  10. General strategies
      - Algorithms - the big-O notation
      - Architecture
      - Choice of data structures
      - LRU (caching) techniques
      - Move loop-invariant code out of loops (see the sketch after this list)
      - Watch out for nested loops
      - try...except instead of if...else
      - Multithreading for I/O-bound code
      - A DBMS instead of flat files
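      A minimal sketch of hoisting loop-invariant work out of a loop; the names words and the pattern are made up for illustration:

          import re

          words = ["alpha", "beta42", "gamma"] * 1000

          # Invariant work inside the loop: the pattern lookup happens on every pass.
          hits = 0
          for w in words:
              if re.match(r"^[a-z]+$", w):
                  hits += 1

          # Hoisted: compile once and bind the bound method to a local name.
          match = re.compile(r"^[a-z]+$").match
          hits = sum(1 for w in words if match(w))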
  11. General strategies
      Big-O - the boss! The performance of an algorithm as a function of N, the input size:
      - O(1) - constant time
      - O(ln n) - logarithmic
      - O(n) - linear
      - O(n^2) - quadratic
  12. Common big-O's

      Order      Said to be "... time"   Examples
      -----------------------------------------------------------------------
      O(1)       constant                key in dict; dict[key] = value; list.append(item)
      O(ln n)    logarithmic             binary search
      O(n)       linear                  item in sequence; str.join(list)
      O(n ln n)                          list.sort()
      O(n^2)     quadratic               nested loops (with constant-time bodies)
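      A small timing sketch (not from the deck) that makes the O(N) vs. O(1) membership rows concrete:

          import timeit

          setup = "data_list = list(range(100000)); data_set = set(data_list)"
          print(timeit.timeit("99999 in data_list", setup=setup, number=100))  # O(N) linear scan
          print(timeit.timeit("99999 in data_set", setup=setup, number=100))   # O(1) hash lookup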
  13. Note the notation

      # O(N^2)
      def slow(it):
          result = []
          for item in it:
              result.insert(0, item)
          return result

      # O(N)
      def fast(it):
          result = []
          for item in it:
              result.append(item)
          result.reverse()
          return result

      # or simply: result = list(it)
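      A hedged usage sketch timing the two versions above, assuming both are defined in the script being run:

          import timeit

          setup = "from __main__ import slow, fast; data = list(range(10000))"
          print(timeit.timeit("slow(data)", setup=setup, number=10))  # quadratic
          print(timeit.timeit("fast(data)", setup=setup, number=10))  # linear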
  14. Big-O's of Python building blocks
      - lists - vectors
      - dictionaries - hash tables
      - sets - hash tables
  15. Big-O's of Python building blocks
      Let L be any list, T any string (plain or Unicode), D any dict, and S any set, with (say) numbers as items (with O(1) hashing and comparison), and x any number:
      O(1): len(L), len(T), len(D), len(S), L[i], T[i], D[i], del D[i], if x in D, if x in S, S.add(x), S.remove(x), additions or removals to/from the right end of L
  16. Big-O's of Python building blocks
      O(N): loops on L, T, D, S; general additions or removals to/from L (not at the right end); all methods on T; if x in L; if x in T; most methods on L; all shallow copies
      O(N log N): L.sort() in general (but O(N) if L is already nearly sorted or reverse-sorted)
  17. Right data structure
      - lists, sets, dicts, tuples
      - collections - deque, defaultdict, namedtuple
      Choose them based on the functionality you need:
      - searching for an element in a sequence
      - appending
      - intersection
      - removing from the middle
      - dictionary initialization
  18. Right data structure

      # Membership test: list is O(N) ...
      my_list = range(n)
      n in my_list

      # ... set is O(1)
      my_set = set(range(n))
      n in my_set

      # Removing a middle slice: list vs. deque (see the runnable sketch below)
      my_list[start:end] = []

      my_deque.rotate(-end)
      for counter in range(end - start):
          my_deque.pop()
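      A runnable sketch of the deque-based slice removal (the names my_deque, start and end are assumed), including the restoring rotate that the slide leaves out:

          from collections import deque

          my_deque = deque(range(10))
          start, end = 2, 5

          my_deque.rotate(-end)            # bring items start..end-1 to the right end
          for _ in range(end - start):
              my_deque.pop()               # popping from the right end is O(1)
          my_deque.rotate(start)           # restore the original order
          print(my_deque)                  # deque([0, 1, 5, 6, 7, 8, 9])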
  19. Right data structure

      from collections import defaultdict

      s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

      # With defaultdict
      d = defaultdict(list)
      for k, v in s:
          d[k].append(v)
      d.items()
      # [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

      # With setdefault
      d = {}
      for k, v in s:
          d.setdefault(k, []).append(v)
      d.items()
      # [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
  20. Python performance tips
      - built-in modules
      - string concatenation
      - lookups and local variables
      - dictionary initialization
      - dictionary lookups
      - import statements
      - loops
  21. Built-ins
      Highly optimized. Example: sorting a list of tuples by its n-th field.

      # Hand-rolled decorate-sort-undecorate
      def sortby(somelist, n):
          nlist = [(x[n], x) for x in somelist]
          nlist.sort()
          return [val for (key, val) in nlist]

      # The built-in key argument instead
      import operator
      n = 1
      somelist.sort(key=operator.itemgetter(n))
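      A short usage sketch (data made up) for the itemgetter approach:

          import operator

          rows = [('yellow', 3), ('blue', 2), ('red', 1)]
          rows.sort(key=operator.itemgetter(1))   # sort by the second field
          print(rows)                             # [('red', 1), ('blue', 2), ('yellow', 3)]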
  22. String concatenation

      # Slow: each += may copy the whole string built so far
      s = ""
      for substring in list:
          s += substring

      # Fast
      s = "".join(list)

      # Repeated '+' on strings
      out = "<html>" + head + prologue + query + tail + "</html>"

      # A single formatting operation instead
      out = "<html>%s%s%s%s</html>" % (head, prologue, query, tail)
      out = "<html>%(head)s%(prologue)s%(query)s%(tail)s</html>" % locals()
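      A minimal timing sketch (not from the deck) for the two concatenation styles:

          import timeit

          setup = "parts = ['x'] * 1000"
          print(timeit.timeit("s = ''\nfor p in parts: s += p", setup=setup, number=1000))
          print(timeit.timeit("''.join(parts)", setup=setup, number=1000))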
  23. Searching
      Using 'in':
      - O(1) if the RHS is a set/dictionary
      - O(N) if the RHS is a string/list/tuple
      Using 'hasattr':
      - cheap if the searched value is an attribute
      - noticeably slower if it is not (an AttributeError is raised and caught internally); see the sketch below
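      A hedged sketch of that asymmetry:

          import timeit

          print(timeit.timeit("hasattr(int, '__abs__')", number=100000))   # attribute exists
          print(timeit.timeit("hasattr(int, 'no_such')", number=100000))   # lookup fails internally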
  24. Loops
      List comprehensions, and map as a for loop moved to C - a win when the body of the loop is a function call.

      # Explicit loop
      newlist = []
      for word in oldlist:
          newlist.append(word.upper())

      # List comprehension
      newlist = [s.upper() for s in oldlist]

      # map
      newlist = map(str.upper, oldlist)
  25. Lookups and local variables
      - Avoid re-evaluating function references inside loops.
      - Accessing local variables is faster than accessing globals.

      # Hoist the lookups into local names before the loop
      upper = str.upper
      newlist = []
      append = newlist.append
      for word in oldlist:
          append(upper(word))
  26. Dictionaries
      - Initialization: try...except (see the sketch below)
      - Lookups: string.maketrans
      Regular expressions:
      - REs are better than writing a loop
      - Built-in string functions are better than REs
      - Compiled REs are significantly faster

      re.search('^[A-Za-z]+$', source)

      x = re.compile('^[A-Za-z]+$').search
      x(source)
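      A hedged sketch of the try...except initialization idiom named above, with a made-up word count:

          words = ['a', 'b', 'a']
          counts = {}
          for w in words:
              try:
                  counts[w] += 1       # common case: the key is already present
              except KeyError:
                  counts[w] = 1        # rare case: first occurrence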
  27. Imports
      - Avoid 'import *'.
      - Import inside functions only when required (lazy imports).
      exec and eval:
      - Better to avoid.
      - If unavoidable, compile once and evaluate the code object (see the sketch below).
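      A minimal sketch (assumed, not from the slides) of compiling once before repeated eval:

          code = compile('x * 2', '<string>', 'eval')     # parse once
          results = [eval(code, {'x': i}) for i in range(5)]
          print(results)                                  # [0, 2, 4, 6, 8]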
  28. Summary on loop optimization (extracted from an essay by Guido):
      - Only optimize when there is a proven speed bottleneck.
      - Small is beautiful.
      - Use intrinsic operations.
      - Avoid calling functions written in Python in your inner loop.
      - Local variables are faster than globals.
      - Try to use map(), filter() or reduce() to replace an explicit for loop (map wins with a built-in function; the for loop wins with inline code).
      - Check your algorithms for quadratic behaviour.
      - And last but not least: collect data. Python's excellent profile module can quickly show the bottleneck in your code.
  29. Bottlenecks may be unintentional - and better not left to intuition! The right answer for improving performance: use PROFILERS.
  30. Spot it right!
      Hotspots - fact and fake (the profiler vs. the programmer's intuition!):
      - Threads
      - I/O operations
      - Logging
      - Encoding and decoding
      - Lookups
      Rewrite just the hotspots:
      - Psyco/Pyrex
      - C extensions
  31. Profilers
      - timeit / time.clock
      - profile / cProfile
      Visualization:
      - RunSnakeRun
      - Gprof2Dot
      - PyCallGraph
  32. timeit
      - Measures the precise performance of small code snippets.
      - Two convenience functions, timeit and repeat:
        timeit.repeat(stmt[, setup[, timer[, repeat=3[, number=1000000]]]])
        timeit.timeit(stmt[, setup[, timer[, number=1000000]]])
      - Can also be used from the command line:
        python -m timeit [-n N] [-r N] [-s S] [-t] [-c] [-h] [statement ...]
  33. timeit

      import timeit

      timeit.timeit('for i in xrange(10): oct(i)', 'gc.enable()')
      1.7195474706909972
      timeit.timeit('for i in range(10): oct(i)', 'gc.enable()')
      2.1380978155005295

      python -m timeit -n1000 -s 'x=0' 'x+=1'
      1000 loops, best of 3: 0.0166 usec per loop
      python -m timeit -n1000 -s 'x=0' 'x=x+1'
      1000 loops, best of 3: 0.0169 usec per loop
  34. timeit

      python -m timeit "try:" "  str.__nonzero__" "except AttributeError:" "  pass"
      1000000 loops, best of 3: 1.53 usec per loop

      python -m timeit "try:" "  int.__nonzero__" "except AttributeError:" "  pass"
      10000000 loops, best of 3: 0.102 usec per loop

      (str has no __nonzero__, so the first version pays for raising and catching the exception; int has one, so the second is fast.)
  35. timeit

      # test_timeit.py
      def f():
          try:
              str.__nonzero__
          except AttributeError:
              pass

      if __name__ == '__main__':
          f()

      python -m timeit -s "from test_timeit import f" "f()"
      100000 loops, best of 3: 2.5 usec per loop
  36. cProfile/profile
      - Deterministic profiling of run-time performance.
      - Reports statistics; small snippets can bring big changes!

      import cProfile
      cProfile.run(command[, filename])

      python -m cProfile [-o output_file] [-s sort_order] myscript.py
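      The statistics on the next slide could come from a script shaped like this hypothetical reconstruction of profile_example.py (the file itself is not shown in the deck): a check() helper called 100,000 times from example():

          def check(n):
              return n % 2 == 0    # hypothetical per-item work

          def example():
              total = 0
              for i in range(100000):
                  if check(i):
                      total += 1
              return total

          import cProfile
          cProfile.run('example()')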
  37. cProfile statistics

      E:\pycon12>profile_example.py
      100004 function calls in 0.306 CPU seconds
      Ordered by: standard name

      ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
           1    0.014    0.014    0.014    0.014  :0(setprofile)
           1    0.000    0.000    0.292    0.292  <string>:1(<module>)
           1    0.000    0.000    0.306    0.306  profile:0(example())
           0    0.000             0.000           profile:0(profiler)
           1    0.162    0.162    0.292    0.292  profile_example.py:10(example)
      100000    0.130    0.000    0.130    0.000  profile_example.py:2(check)
  38. Using the stats
      The pstats module - view and compare statistics.

      import cProfile
      cProfile.run('foo()', 'fooprof')

      import pstats
      p = pstats.Stats('fooprof')
      p.strip_dirs().sort_stats(-1).print_stats()
      p.sort_stats('cumulative').print_stats(10)
      p.sort_stats('file').print_stats('__init__')
  39. Visualization
      A picture is worth a thousand words! Tools to visualize profiles:
      - KCachegrind
      - RunSnakeRun
      - GProf2Dot
      - PyCallGraph
      - PyProf2CallTree
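      For example, GProf2Dot's documented pipeline turns a cProfile dump into a call-graph image (file names assumed):

          gprof2dot -f pstats fooprof | dot -Tpng -o fooprof.png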
  40. RunSnakeRun

      E:\pycon12>runsnake D:\simulation_gui.profile
  41. Don't be too clever. Don't sweat it too much. Develop an instinct for the sort of code that Python runs well.
  42. References
      - http://docs.python.org
      - http://wiki.python.org/moin/PythonSpeed/PerformanceTips/
      - http://sschwarzer.com/download/optimization_europython2006.pdf
      - http://oreilly.com/python/excerpts/python-in-a-nutshell/testing-debugging.html
  43. Questions?
