Meliae!
• Memory issues sometimes happen in
production (long-living processes). Capture the
info there, then analyze it locally
• runsnakerun supports it
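The meliae workflow is dump-in-production, analyze-locally (`scanner.dump_all_objects()` writes a JSON dump that runsnakerun can visualize). As a hedged sketch of the same dump-then-analyze idea using only the standard library, here is the equivalent with `tracemalloc` (Python 3.4+); the file name is arbitrary:

```python
# Sketch of the dump-in-production, analyze-locally workflow using the
# stdlib tracemalloc module. meliae's scanner/loader API is analogous.
import os
import tempfile
import tracemalloc

tracemalloc.start()

# ... the long-living process does its work; simulate some allocations ...
leaky = ["x" * 100 for _ in range(1000)]

# In production: snapshot the current allocations and dump them to disk.
dump_path = os.path.join(tempfile.mkdtemp(), "memory.dump")
snapshot = tracemalloc.take_snapshot()
snapshot.dump(dump_path)

# Locally: load the dump and inspect the biggest allocation sites.
snapshot = tracemalloc.Snapshot.load(dump_path)
top = snapshot.statistics("lineno")
for stat in top[:3]:
    print(stat)
```

The point is that the heavy analysis happens on your own machine, not in the production process.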
What can you do to prevent this?
• From an API perspective:
– Dedicate one WSGI process that is different
– Route it a small share of requests (load balancing)
– That process takes care of the profiling
(preferably in the hotshot way)
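The dedicated-profiling-instance idea above could be wired up as WSGI middleware. A minimal sketch, with hypothetical names (`ProfilingMiddleware`, `demo_app`), using cProfile rather than the hotshot module mentioned above, since hotshot was removed in Python 3:

```python
# Hypothetical WSGI middleware that profiles every request it serves.
# You would install it only on the one instance that gets a small
# share of traffic via the load balancer.
import cProfile
import io
import pstats


class ProfilingMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        profiler = cProfile.Profile()
        # Run the wrapped app under the profiler.
        result = profiler.runcall(self.app, environ, start_response)
        # Render the top 10 entries by cumulative time.
        stream = io.StringIO()
        pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
        # In production you would write this to a log or stats file instead.
        print(stream.getvalue())
        return result


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = ProfilingMiddleware(demo_app)
body = app({}, lambda status, headers: None)
```

Because the middleware only wraps one instance, the profiling overhead never touches the bulk of your traffic.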
Finally
• You should know about this
• Part of your professional toolkit
• This should be in IDEs!
– Komodo already has it; what about
PyCharm? (can you blog this, Reinout? ;)
Introduction:
- how many of you do profiling?
- those who don't: who would say they work on more complex projects?
- what do you do instead?
- how well do you know your code? I'm sorry, but you don't.
Python has helped us tremendously to think about the problems we solve on an abstract level. I totally believe in worrying less about how code gets executed at a low level, because most of the time I just don't care (TM). Why don't I care? Even though I'm an engineer, I care about solving problems for people (and making a buck out of it). We build Recharted, which means our clients want to give their customers the best experience possible. That means we should return requested info in a timely manner. Ever sat there irritated, waiting for Gmail to load? We had an issue (discussed during my logging presentation) with the SUDS SOAP library making a LOT of debug logging calls, which cost us 10-15 seconds on complex responses from a flight-booking SOAP API!
Scalability vs. performance. We hear a lot about scaling, but sometimes we forget performance. Scalability means you can do the same thing for a lot of people, and that more people have only a small impact on your performance. But that still means you can have the same shitty baseline performance. Actually, it is not hard at all to scale shitty performance :)
You can make time.sleep() scale very well (with the right server infrastructure of course)
So what is profiling? Basically, profiling is running your code in the interpreter in a way that records statistics during the actual run. (Yes, this has a performance impact, so you can't just do it in production; for that there are other ways.) Then you look at those statistics, which gives you a lot of insight into what happens. I know what happens in the local scope, but what actually happens from the moment an API WSGI request comes in until we deliver the response? You would be surprised how much happens in between. This is part of actually getting to know your code. By the way, your code isn't just what you wrote: what about the libraries you use, the systems you interface with? They all have an impact on your performance.
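Running code under the profiler and looking at the statistics can be done entirely from Python with the stdlib cProfile and pstats modules. A minimal sketch (the `handler`/`slow_helper` functions are made up to stand in for real request code):

```python
# Run a function under cProfile and print the statistics sorted by
# cumulative time, the way you would inspect a slow request handler.
import cProfile
import io
import pstats


def slow_helper(n):
    return sum(i * i for i in range(n))


def handler():
    # Pretend this is the work done for one request.
    return [slow_helper(10_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
output = stream.getvalue()
print(output)
```

For whole scripts, `python -m cProfile -s cumulative yourscript.py` gives the same kind of table without touching the code.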
Profiling allows you to zoom in on the low-hanging fruit. You should ALWAYS balance the amount of work and code change against the relative win in performance. Most often you'll end up fixing just the top two entries :)
- ncalls: the number of calls
- tottime: the total time spent in the given function (excluding time spent in calls to sub-functions)
- percall: the quotient of tottime divided by ncalls
- cumtime: the total time spent in this function and all subfunctions (from invocation till exit); this figure is accurate even for recursive functions
- percall: the quotient of cumtime divided by primitive calls
- filename:lineno(function): the respective data of each function
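The tottime/cumtime distinction is easiest to see with a function that does almost nothing itself but calls an expensive helper. A small sketch (function names are made up; the tuple layout of `pstats.Stats.stats` is `(cc, ncalls, tottime, cumtime, callers)`):

```python
# outer() spends almost no time in its own frame, so its tottime is
# tiny while its cumtime includes all the work inner() does.
import cProfile
import pstats


def inner():
    return sum(range(200_000))


def outer():
    return [inner() for _ in range(20)]


profiler = cProfile.Profile()
profiler.runcall(outer)

timings = {}
for (filename, lineno, func), (cc, ncalls, tottime, cumtime, callers) in pstats.Stats(profiler).stats.items():
    if func in ("inner", "outer"):
        timings[func] = {"ncalls": ncalls, "tottime": tottime, "cumtime": cumtime}
        print(f"{func}: ncalls={ncalls} tottime={tottime:.4f} cumtime={cumtime:.4f}")
```

If outer's tottime were large as well, the function itself would be the problem; here the cumtime tells you to look one level down.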
Just because a language is garbage collected doesn't mean it can't leak memory:
- C modules can leak (harder to find)
- Some globally available data structure could live and grow without you knowing (Django actually has a built-in memory leak while running in debug mode: it keeps every executed query in memory)
- Cyclic-reference funkiness can make the interpreter think memory cannot be released.
Also (as with profiling), knowing the memory profile of your application helps. Maybe your application server instances are 64 MB each. If a quarter of that is unnecessary stuff, you could run more instances on the same hardware, leading towards faster world domination!
So now you find that your code behaves beautifully on your local machine. And then in production... borkbork.
In conclusion: I think every serious Python developer should know about these things; they are part of your toolkit.