Down the rabbit hole, profiling in Django

Presentation given at the Dutch Django Meetup of 19-04-2012

Published in: Technology
  • Introduction: how many of you do profiling? Those who don't: who would say they work on more complex projects? What do you do? How well do you know your code? I'm sorry, but you don't.
  • Python has helped us tremendously to think on an abstract level about the problems we solve. I totally believe in being less worried about how code gets executed at a low level, because most of the time I just don't care (TM). Why don't I care? Even though I'm an engineer, I care about solving problems for people (and making a buck out of it). We build Recharted, which means our clients want to give their customers the best experience possible, and that means we should return requested info in a timely manner. Ever been irritated because you were waiting for Gmail? We had an issue (discussed during my logging presentation) with the SUDS SOAP library making a LOT of debug logging calls, which cost us 10-15 seconds on complex responses from a flight-booking SOAP API!
  • Scalability vs. performance: we hear a lot about scaling, but sometimes we forget performance. Scalability means you can do the same thing for a lot of people, and that more people have only a small impact on your performance. But that still means you can have the same shitty baseline performance. Actually, it is not at all hard to scale shitty performance :)
  • You can make time.sleep() scale very well (with the right server infrastructure of course)
  • So what is profiling? Basically, profiling is running your code in the interpreter in such a way that statistics are recorded during the actual run (yes, this has a performance impact, so you can't just do it in production; for that there are other ways). Then you look at those statistics, which gives you a lot of insight into what happens. I know what happens in the local scope, but what actually happens from the moment an API WSGI request comes in until we deliver the response? You would be surprised how much stuff happens in between. This is part of actually getting to know your code. And your code isn't just yours: what about the libraries you use, and the systems you interface with? They all have an impact on your performance.
  • Profiling allows you to zoom in on low-hanging fruit; you should ALWAYS balance the amount of work and code change against the relative win in performance. In practice you'll most often end up fixing just the top two entries :)
  • ncalls: the number of calls. tottime: the total time spent in the given function, excluding time spent in calls to sub-functions. percall: the quotient of tottime divided by ncalls. cumtime: the total time spent in this function and all subfunctions (from invocation till exit); this figure is accurate even for recursive functions. percall: the quotient of cumtime divided by primitive calls. filename:lineno(function): identifies each function.
  • Just because a language is garbage collected doesn't mean it can't leak memory: C modules can leak (harder to find), some globally available data structure can live and grow without you knowing (Django actually has a built-in memory leak while running in debug mode), and cyclic-reference funky stuff can make the interpreter think memory cannot be released. Also (as with profiling), knowing the memory profile of your application helps. Maybe your application server instances are 64 MB each; if a quarter of that is unnecessary stuff, you could run more instances on the same hardware, leading towards faster world domination!
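The "cyclic reference funky stuff" mentioned above can be demonstrated in a few lines; here is a minimal sketch showing CPython's cycle collector reclaiming a reference cycle that plain reference counting cannot free (the Node class is made up for illustration):

```python
import gc

class Node:
    def __init__(self):
        self.other = None

gc.collect()  # start from a clean slate

# build a reference cycle: each object keeps the other alive
a, b = Node(), Node()
a.other, b.other = b, a
del a, b  # refcounts never reach zero, so refcounting alone can't free them

# CPython's cycle detector finds and frees them anyway;
# collect() returns the number of unreachable objects it found
unreachable = gc.collect()
```

If such cycles pile up faster than collection runs, or involve objects the collector cannot handle, memory grows even though "nothing" references it anymore.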
  • So now you find out that your code behaves beautifully on your local machine. And then when in production... borkbork.
  • In conclusion: I think every serious Python developer should know about these things; they are part of your toolkit.
  • Transcript of "Down the rabbit hole, profiling in Django"

    1. Down the rabbit hole. Know your code. Profiling Python.
    2. Get those hands dirty
    3. So……
    4. The car engine analogy
    5. Python in that analogy
    6. Scaling
    7. Doing a lot of slow things at the same time
    8. Profiling
    9. An act of balancing
    10. Types of profiling: • (code) profiling (CPU/IO-bound problems) • memory profiling (obviously memory-related problems)
    11. (Code) profiling
    12. Tools: • cProfile • profile • hotshot (deprecated? it splits analysis) • line_profiler • trace
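Of the tools listed on slide 12, trace never gets its own example later in the deck. Besides `python -m trace`, the stdlib trace module can also be driven from Python code; a minimal sketch (the fib function is just a stand-in workload):

```python
import trace

def fib(n):
    # stand-in workload, nothing special about it
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# count=True records per-line execution counts; trace=False suppresses
# the live line-by-line printout
tracer = trace.Trace(count=True, trace=False)
result = tracer.runfunc(fib, 10)

# results() gives a CoverageResults object; .counts maps
# (filename, lineno) -> number of executions
counts = tracer.results().counts
```

Unlike cProfile, trace answers "how often did each line run" rather than "where did the time go", which makes it more of a coverage/hotspot tool than a timer.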
    13. Getting dirty:
        import cProfile
        cProfile.run('foo()')
        Or:
        python -m cProfile myscript.py
    14. More interactive:
        python -m cProfile myscript.py -o foo.profile

        import pstats
        p = pstats.Stats('foo.profile')
        p.sort_stats('cumulative').print_stats(10)
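The two steps on slide 14 (dump stats to a file, then inspect them with pstats) can also be done in one self-contained script using the cProfile API directly; a minimal sketch (the slow() function and the 'foo.profile' filename are placeholders, not from the slides):

```python
import cProfile
import pstats

def slow():
    # stand-in workload; substitute whatever you want to measure
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
slow()
profiler.disable()

# dump the raw stats to a file, as `python -m cProfile -o foo.profile` would
profiler.dump_stats('foo.profile')

# load them back and show the ten most expensive entries by cumulative time
stats = pstats.Stats('foo.profile')
stats.sort_stats('cumulative').print_stats(10)
```

Separating the dump from the analysis is what makes this "more interactive": you can re-sort and re-filter the same recorded run as often as you like without re-running the workload.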
    15. More complex (source: http://www.doughellmann.com/PyMOTW/profile/):
        import profile

        def fib(n):
            # from http://en.literateprograms.org/Fibonacci_numbers_(Python)
            if n == 0:
                return 0
            elif n == 1:
                return 1
            else:
                return fib(n-1) + fib(n-2)

        def fib_seq(n):
            seq = []
            if n > 0:
                seq.extend(fib_seq(n-1))
            seq.append(fib(n))
            return seq

        print 'RAW'
        print '=' * 80
        profile.run('print fib_seq(20); print')
    16. Output:
        $ python profile_fibonacci_raw.py
        RAW
        ================================================================================
        [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

        57356 function calls (66 primitive calls) in 0.746 CPU seconds

        Ordered by: standard name

          ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
              21    0.000    0.000    0.000    0.000  :0(append)
              20    0.000    0.000    0.000    0.000  :0(extend)
               1    0.001    0.001    0.001    0.001  :0(setprofile)
               1    0.000    0.000    0.744    0.744  <string>:1(<module>)
               1    0.000    0.000    0.746    0.746  profile:0(print fib_seq(20); print)
               0    0.000             0.000           profile:0(profiler)
        57291/21    0.743    0.000    0.743    0.035  profile_fibonacci_raw.py:13(fib)
            21/1    0.001    0.000    0.744    0.744  profile_fibonacci_raw.py:22(fib_seq)
    18. Memoize!
        @memoize
        def fib(n):
            if n == 0:
                return 0
            elif n == 1:
                return 1
            else:
                return fib(n-1) + fib(n-2)
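The deck never shows the memoize decorator itself (it comes from the PyMOTW memoized example, which is class-based); a minimal dict-based sketch that behaves the same way:

```python
import functools

def memoize(func):
    # minimal cache keyed on the positional arguments; the PyMOTW
    # original this deck uses is a class, but the effect is the same
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
```

Each distinct fib(n) now runs once and is served from the cache afterwards, which is why ncalls for fib drops from 57291/21 in the raw run to 21 in the memoized one.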
    19. Yeah!
        $ python profile_fibonacci_memoized.py
        MEMOIZED
        ================================================================================
        [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]

        145 function calls (87 primitive calls) in 0.003 CPU seconds

        Ordered by: standard name

          ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
              21    0.000    0.000    0.000    0.000  :0(append)
              20    0.000    0.000    0.000    0.000  :0(extend)
               1    0.001    0.001    0.001    0.001  :0(setprofile)
               1    0.000    0.000    0.002    0.002  <string>:1(<module>)
               1    0.000    0.000    0.003    0.003  profile:0(print fib_seq(20); print)
               0    0.000             0.000           profile:0(profiler)
           59/21    0.001    0.000    0.001    0.000  profile_fibonacci.py:19(__call__)
              21    0.000    0.000    0.001    0.000  profile_fibonacci.py:26(fib)
            21/1    0.001    0.000    0.002    0.002  profile_fibonacci.py:36(fib_seq)
    20. Line profiling (source: http://packages.python.org/line_profiler/):
        Line #    Hits    Time   Per Hit  % Time  Line Contents
        ==============================================================
           149                                    @profile
           150                                    def Proc2(IntParIO):
           151    50000   82003      1.6    13.5      IntLoc = IntParIO + 10
           152    50000   63162      1.3    10.4      while 1:
           153    50000   69065      1.4    11.4          if Char1Glob == 'A':
           154    50000   66354      1.3    10.9              IntLoc = IntLoc - 1
           155    50000   67263      1.3    11.1              IntParIO = IntLoc - IntGlob
           156    50000   65494      1.3    10.8              EnumLoc = Ident1
           157    50000   68001      1.4    11.2          if EnumLoc == Ident1:
           158    50000   63739      1.3    10.5              break
           159    50000   61575      1.2    10.1      return IntParIO
    21. More complex == visualize: RunSnakeRun
    22. KCachegrind: pyprof2calltree
    23. What to look for? • Things you didn't expect ;) • much time spent in one function • lots of calls to the same function
    24. Performance solutions: • caching • getting stuff out of inner loops • removing logging
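The "getting stuff out of inner loops" fix is mostly mechanical; a toy sketch (function names made up for illustration):

```python
def normalize_slow(values):
    # max(values) is recomputed on every single iteration of the loop
    return [v / max(values) for v in values]

def normalize_fast(values):
    peak = max(values)  # hoisted out: computed exactly once
    return [v / peak for v in values]
```

The slow version is O(n²) because the loop-invariant max() is re-evaluated n times; hoisting it out restores O(n). Profilers make exactly this kind of repeated call stand out via the ncalls column.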
    25. Memory profiling: Django has a memory leak. In debug mode ;)
    26. Tools: • heapy (and pysizer) • meliae (more like hotshot)
    27. Meliae! • Memory issues sometimes happen in production (long-living processes); you want to get info there, then analyze it locally • runsnakerun has support for it
    28. Dumping (source: http://jam-bazaar.blogspot.com/2009/11/memory-debugging-with-meliae.html):
        from meliae import scanner
        scanner.dump_all_objects('filename.json')
    29. Analyzing:
        >>> from meliae import loader
        >>> om = loader.load('filename.json')
        >>> s = om.summarize(); s
        This dumps out something like:
        Total 17916 objects, 96 types, Total size = 1.5MiB (1539583 bytes)
        Index   Count   %     Size   %   Cum     Max   Kind
            0     701   3   546460  35    35   49292   dict
            1    7138  39   414639  26    62    4858   str
            2     208   1    94016   6    68     452   type
            3    1371   7    93228   6    74      68   code
            4    1431   7    85860   5    80      60   function
            5    1448   8    59808   3    84     280   tuple
            6     552   3    40760   2    86     684   list
            7      56   0    29152   1    88     596   StgDict
            8    2167  12    26004   1    90      12   int
            9     619   3    24760   1    91      40   wrapper_descriptor
           10     570   3    20520   1    93      36   builtin_function_or_method
        ...
    30. Run s n a k e run
    31. Your production system: a completely different story
    32. What can you do to prevent this? • From an API perspective: – maybe one WSGI process that is different – it gets a small amount of requests (load balancing) – this process takes care of the profiling (preferably in the hotshot way)
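A hypothetical sketch of what such a dedicated profiling worker could do, written here as a WSGI wrapper that profiles a configurable fraction of requests with cProfile (the names and the sampling approach are my own, not from the slides):

```python
import cProfile
import io
import pstats
import random

def profiling_middleware(app, sample_rate=0.01):
    """Hypothetical WSGI wrapper: profile a small fraction of requests
    and report the hottest entries, passing the rest through untouched."""
    def wrapped(environ, start_response):
        if random.random() >= sample_rate:
            return app(environ, start_response)  # the common, unprofiled path
        profiler = cProfile.Profile()
        response = profiler.runcall(app, environ, start_response)
        out = io.StringIO()
        pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
        print(out.getvalue())  # a real setup would log this instead of printing
        return response
    return wrapped

def dummy_app(environ, start_response):
    # stand-in application, just to exercise the wrapper
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello world']

app = profiling_middleware(dummy_app, sample_rate=1.0)  # profile everything for the demo
```

Keeping sample_rate low (or routing only a slice of traffic to the instrumented process, as the slide suggests) confines the profiler's overhead to a small share of requests while still collecting real production data.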
    33. Or….. • PyCounters • Author is in tha house: Boaz Leskes
    34. Finally • You should know about this • Part of your professional toolkit • This should be in IDEs! – Komodo already has it, what about PyCharm?? (can you blog this, Reinout? ;)
    35. Questions?
    36. Links
        Articles:
        http://www.doughellmann.com/PyMOTW/profile/
        http://www.doughellmann.com/PyMOTW/trace/
        http://jam-bazaar.blogspot.com/2010/08/step-by-step-meliae.html
        https://code.djangoproject.com/wiki/ProfilingDjango
        Videos:
        http://www.youtube.com/watch?v=Iw9-GckD-gQ
        http://blip.tv/pycon-us-videos-2009-2010-2011/introduction-to-python-profiling-1966784
        Software:
        http://www.vrplumber.com/programming/runsnakerun/
        http://kcachegrind.sourceforge.net/html/Home.html
        http://pypi.python.org/pypi/pyprof2calltree/
        https://launchpad.net/meliae
        http://pypi.python.org/pypi/line_profiler
        http://pypi.python.org/pypi/Dozer