Diagnosing performance issues is usually one of the most difficult tasks for developers, especially when those issues are reported by customers in production systems that cannot easily be taken offline for instrumentation.
The current state of performance monitoring in Python applications can be divided into what we do before a system goes into production and what we do after. At Nexedi, before systems go into production, we perform extensive “functional” performance tests: full-stack tests that simulate typical scenarios of users interacting with the system, including realistic pauses between accesses. These tests are run in parallel using a library built on top of “mechanize”. The number of parallel user simulations matches (and surpasses) the expected use of the system according to the SLA agreed with the client.
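To make this concrete, here is a minimal sketch of such a simulated user. The staging URL, scenario paths, pause length and user count are all illustrative; the real tests are driven by a richer library built on top of “mechanize”, but the shape is the same:

```python
import threading
import time

import mechanize

BASE_URL = "http://staging.example.com"   # hypothetical test target
SCENARIO = ["/", "/login", "/dashboard"]  # hypothetical user scenario
timings = []                              # (user, path, seconds)

def simulated_user(user_id):
    br = mechanize.Browser()
    br.set_handle_robots(False)
    for path in SCENARIO:
        start = time.time()
        br.open(BASE_URL + path)
        timings.append((user_id, path, time.time() - start))
        time.sleep(2.0)                   # realistic pause between accesses

# as many parallel simulated users as the SLA requires
users = [threading.Thread(target=simulated_user, args=(i,)) for i in range(50)]
for t in users:
    t.start()
for t in users:
    t.join()

print("slowest request: %.3fs" % max(t[2] for t in timings))
```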
After the system is put into production, performance issues invariably arrive, as not all combinations of system use can be predicted in advance. Aside from some proactive monitoring that can be added to the system, and about which we will say more later on, the current status quo of performance monitoring can be summarized as: wait for the angry mob to complain about the performance and then ask them for reproduction instructions. For without reproduction instructions it is very difficult to understand what is making the system slow.
The “statprof” module, which can be downloaded from PyPI, works differently from cProfile. Instead of being a deterministic profiler that records every call, it is a sampling profiler: it samples the call stack at regular intervals, accumulating and reporting on the lines that appear most often in the samples. This is a marked improvement over cProfile, as it shows which lines of code are actually slow, and thereby preserves more of the relationship between callers and callees. Being a sampling profiler, its performance impact is also much lighter. However, by discarding the stack traces and only aggregating statistics, it still fails to completely “tell the story” of why the system is slow, or why certain functions were called in a certain way.
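Using it is straightforward; the following is a minimal sketch based on the start/stop/display functions the PyPI package provides, with a toy function standing in for real application code:

```python
import statprof

def busy():
    # stand-in for real application code
    return sum(i * i for i in range(10 ** 6))

statprof.start()           # begin sampling the call stack periodically
try:
    busy()
finally:
    statprof.stop()        # stop sampling
    statprof.display()     # per-line report of where the samples landed
```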
Without reproduction instructions, the best we can hope for is to “corral” the performance issue by adding successive timing calls in the middle of the code. However, doing this in the production environment is time consuming and is bound to cause conflicts with the system administrators, who are, most of the time, responsible for making sure that you are not allowed to restart the server at your own fancy. In Zope land there is a product called DeadlockDebugger, which is really useful when you can reproduce the performance issue yourself, without help from the client.
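For reference, the “corral” approach usually looks something like the sketch below: a small timing helper (the helper name and threshold here are mine) wrapped around successively narrower suspects:

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("perf")

@contextmanager
def timed(label, threshold=0.5):
    # log only the calls that exceed the threshold, in seconds
    start = time.time()
    try:
        yield
    finally:
        elapsed = time.time() - start
        if elapsed > threshold:
            log.warning("%s took %.3fs", label, elapsed)

# inside the suspect code path:
with timed("render_report"):
    pass  # ... the code being corralled ...
```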
But before we talk about DeadlockDebugger, we must briefly go over the architecture of Zope. It has a single thread that listens for HTTP requests, parses them and hands them over to a few worker threads. These worker threads are responsible for turning a request into a method call on an object, which is usually stored in Zope's object-oriented database, the ZODB. The ZODB can also be clustered.
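The toy sketch below shows the shape of that architecture. It is not Zope's actual code; the “requests” are plain strings standing in for parsed HTTP requests:

```python
import queue
import threading

incoming = queue.Queue()

def listener():
    # the single listening thread only accepts and parses requests
    for i in range(10):
        incoming.put("request-%d" % i)
    incoming.put(None)                    # sentinel: no more work

def worker():
    # worker threads turn each request into application work
    # (in Zope, a method call on an object stored in the ZODB)
    while True:
        request = incoming.get()
        if request is None:
            incoming.put(None)            # let the other workers stop too
            return
        print("handling", request)

threads = [threading.Thread(target=listener)]
threads += [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```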
Conventional monitoring sources each tell only part of the story: the HTTP server logs show which URLs are slow, watching the request queue tends to produce false positives, and for database queries there is only the slow query log.
DeadlockDebugger is executed directly on the listening thread, so it works even when the rest of the system is completely blocked.
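The core trick can be sketched in a few lines of standard-library Python. The function name is mine, and the real product wires this into Zope rather than just returning a string:

```python
import sys
import threading
import traceback

def dump_all_threads():
    # capture the current stack of every thread in the process
    out = []
    for thread_id, frame in sys._current_frames().items():
        out.append("Thread %s:\n" % thread_id)
        out.extend(traceback.format_stack(frame))
        out.append("\n")
    return "".join(out)

if __name__ == "__main__":
    print(dump_all_threads())
```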
Without this kind of tooling in place, you are just trying to train the angry mob to work for you.
Unfortunately, as noted above, statprof throws away exactly what is most valuable: the stack traces.
Proactive monitoring, by contrast, captures even the most un-reproducible issues and lets you diagnose and optimize before the complaints arrive.
GIL contention could be alleviated by monitoring all threads from a single additional thread, but not if the sampling itself is too heavy.
This is where sys._current_frames() comes in: X-raying your software in real time.
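Here is a minimal sketch of that idea, assuming an illustrative sampling interval and a naive in-memory store for the samples: one extra thread periodically snapshots the stack of every other thread, keeping the raw traces instead of throwing them away:

```python
import sys
import threading
import time
import traceback

samples = []          # (timestamp, {thread_id: formatted stack}) tuples
INTERVAL = 0.1        # seconds between samples; keep the overhead light

def sampler():
    own_id = threading.get_ident()
    while True:
        snapshot = {}
        for thread_id, frame in sys._current_frames().items():
            if thread_id == own_id:
                continue                    # don't profile the profiler
            snapshot[thread_id] = traceback.format_stack(frame)
        samples.append((time.time(), snapshot))
        time.sleep(INTERVAL)

t = threading.Thread(target=sampler)
t.daemon = True       # never keep the process alive just to sample it
t.start()
```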