Debugging of
(C)Python applications
June the 20th, KharkivPy
Roman Podoliaka (@amd4ever)
http://bit.ly/1LpjXGL
Why debugging?
• open source cloud platform
• dozens of (micro-)services
• new features are important, but
making OpenStack stable, scalable
and HA is even more important
• every day performance testing on
hundreds of bare metal nodes
• nightly CI jobs running functional
and destructive tests
• things break… pretty much all the
time!
A little humble OpenStack
Typical environment
• CentOS 6 or Ubuntu 14.04
• CPython 2.6 or 2.7
• eventlet-based concurrency model for Python
services
• MySQL (Galera), memcache [, MongoDB]
• RabbitMQ
Credits
• “Debugging Python applications in Production”
by Vladimir Kirillov (https://www.youtube.com/watch?v=F9FHIghn_Vk)
• Brendan Gregg’s Blog (http://www.brendangregg.com/blog/index.html)
printf() debugging
printf() debugging: python-memcache
def _get_server(self, key):
    if isinstance(key, tuple):
        serverhash, key = key
    else:
        serverhash = serverHashFunction(key)

    if not self.buckets:
        return None, None

    for i in range(Client._SERVER_RETRIES):
        server = self.buckets[serverhash % len(self.buckets)]
        if server.connect():
            # print("(using server %s)" % server,)
            return server, key
        serverhash = serverHashFunction(str(serverhash) + str(i))
    return None, None
printf() debugging: just don’t do that!
• the most primitive way of introspection at runtime
• either always enabled or explicitly commented out in
the code
• limited to the stdout/stderr streams
• information is only (barely) usable for developers
• pollutes the code when committed to VCS
repositories
Logging
Logging: basics
import logging
FORMAT = "%(asctime)-15s %(clientip)s %(user)-8s %(message)s"
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logging.warning("Protocol problem: %s", "connection reset", extra=d)
2006-02-08 22:20:02,165 192.168.0.1 fbloggs Protocol problem: connection reset
Logging: log levels
if is_pid_cmdline_correct(pid, conffile.split('/')[-1]):
    try:
        _execute('kill', '-HUP', pid, run_as_root=True)
        _add_dnsmasq_accept_rules(dev)
        return
    except Exception as exc:
        LOG.error(_LE('kill -HUP dnsmasq threw %s'), exc)
else:
    LOG.debug('Pid %d is stale, relaunching dnsmasq', pid)
Level       Numeric value
CRITICAL    50
ERROR       40
WARNING     30
INFO        20
DEBUG       10
NOTSET      0
Logging: log records propagation
import logging
LOG = logging.getLogger('sqlalchemy.orm')
...
LOG.debug('Instance changed state from `%(prev_state)s` to `%(new_state)s`',
          {'prev_state': prev_state, 'new_state': new_state})
sqlalchemy.orm -> sqlalchemy -> (root)
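The propagation chain above can be sketched in a few lines. This is a minimal illustration, not code from the talk: `ListHandler` is a hypothetical helper that keeps formatted messages in memory, and the level and message text are assumptions. A record logged on `sqlalchemy.orm` reaches the handler attached to its parent `sqlalchemy`, and the child inherits the parent's level.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in a list."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


parent = logging.getLogger('sqlalchemy')
child = logging.getLogger('sqlalchemy.orm')

handler = ListHandler()
handler.setFormatter(logging.Formatter('%(name)s %(levelname)s %(message)s'))
parent.addHandler(handler)       # the handler lives on the parent only
parent.setLevel(logging.INFO)    # the child has no level of its own and inherits INFO

child.debug('dropped: below the effective level')
child.info('state changed from `%(prev)s` to `%(new)s`',
           {'prev': 'pending', 'new': 'persistent'})

print(handler.messages)
```

Setting `child.propagate = False` would stop the record at `sqlalchemy.orm`, so the parent's handler would see nothing.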
Logging: context matters
cfg.StrOpt('logging_context_format_string',
           default='%(asctime)s.%(msecs)03d %(process)d %(levelname)s '
                   '%(name)s [%(request_id)s %(user_identity)s] '
                   '%(instance)s%(message)s')
2015-06-10 12:42:00.765 27516 INFO nova.osapi_compute.wsgi.server [req-58f233ab-f2b6-452f-b4fe-0c781ce8f8d0 None] 192.168.0.1 "GET /v2/fc7f78f1c53d4443976514d2fd16e5cb/images/detail HTTP/1.1" status: 200 len: 905 time: 0.1043971
2015-06-10 12:41:57.004 2760 AUDIT nova.virt.block_device [req-209db629-0d06-4f81-92ad-b910f1a72b36 None] [instance: a0d1c6ef-1fa8-46f9-a19d-f8fb7d2df6a2] Booting with volume 8bad9533-9d6f-4be8-939d-b7a28a536a1a at /dev/vda
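One way to get a per-request identifier into every log line with the standard library alone is `logging.LoggerAdapter`. The sketch below is an assumption-laden illustration rather than the way OpenStack implements its context format string: the logger name `demo.api`, the `handle_request` function and the request ids are all hypothetical.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    '%(levelname)s %(name)s [%(request_id)s] %(message)s'))

base = logging.getLogger('demo.api')   # hypothetical logger name
base.addHandler(handler)
base.setLevel(logging.INFO)
base.propagate = False                 # keep the demo output in one place


def handle_request(request_id):
    # The adapter injects request_id into every record it emits.
    log = logging.LoggerAdapter(base, {'request_id': request_id})
    log.info('listing images')


handle_request('req-1')
handle_request('req-2')
print(stream.getvalue())
```

With the request id in every line, all records belonging to one API call can be found with a single search, which is what makes the log-processing queries in the following slides practical.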
Logging: log processing
• logs are collected from different sources and parsed
(Logstash)
• then they are imported into a full-text search system
(ElasticSearch)
• Web UI is used for providing easy access to results and
querying (Kibana)
Logging: log processing
title: Kernel Neighbour table overflow
query: >
  filename:kernel.log
  AND level:warning
  AND message:neighbour AND message:overflow

title: Neutron Skipping router removal
query: >
  filename:neutron-l3-agent.log
  AND location:neutron.agent.l3_agent
  AND message:skipping AND message:removal

title: Neutron OVS lib errors and warnings
query: >
  filename:neutron-openvswitch-agent.log
  AND location:neutron.agent.linux.ovs_lib
  AND level:(error OR warning)

title: Neutron race condition at subnet deletion
query: >
  filename:neutron
  AND level:trace
  AND message:AttributeError
Logging: summary
• useful for both developers and operators
• developers define verbosity by means of
logging levels
• configurable handlers (file, syslog, network, etc)
• advanced tooling for log processing / monitoring
Logging: useful links
• General info: https://docs.python.org/3.3/howto/logging.html#logging-howto
• Adding contextual information: https://docs.python.org/2/howto/logging-cookbook.html#adding-contextual-information-to-your-logging-output
• Logstash/ElasticSearch/Kibana: http://www.logstash.net/docs/1.4.2/tutorials/getting-started-with-logstash
pdb
pdb: basics
def _binary_search(arr, left, right, key):
    if left == right:
        return -1

    middle = left + (right - left) / 2

    if key == arr[middle]:
        return middle
    elif key > arr[middle]:
        return _binary_search(arr, middle, right, key)
    else:
        return _binary_search(arr, left, middle, key)


def binary_search(arr, key):
    return _binary_search(arr, 0, len(arr), key)


l = list(range(10))
assert binary_search(l, 5) == 5
assert binary_search(l, 0) == 0
assert binary_search(l, 9) == 9
assert binary_search(l, 10) == -1
assert binary_search(l, -5) == -1
pdb: basics
Romans-MacBook-Air:03-pdb malor$ python -m pdb basics.py
> /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(1)<module>()
-> def _binary_search(arr, left, right, key):
(Pdb) break binary_search
Breakpoint 1 at /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py:15
(Pdb) continue
> /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
-> return _binary_search(arr, 0, len(arr), key)
(Pdb) args
arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
key = 5
(Pdb) step
--Call--
> /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(1)_binary_search()
-> def _binary_search(arr, left, right, key):
(Pdb) next
> /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(2)_binary_search()
-> if left == right:
pdb: basics
(Pdb) list
  1     def _binary_search(arr, left, right, key):
  2  ->     if left == right:
  3             return -1
  4
  5         middle = left + (right - left) / 2
  6
  7         if key == arr[middle]:
  8             return middle
  9         elif key > arr[middle]:
 10             return _binary_search(arr, middle, right, key)
 11         else:
(Pdb) where
  /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/bdb.py(400)run()
-> exec cmd in globals, locals
  <string>(1)<module>()
  /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(20)<module>()
-> assert binary_search(l, 5) == 5
  /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
-> return _binary_search(arr, 0, len(arr), key)
> /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(2)_binary_search()
-> if left == right:
pdb: post-mortem debugging
Romans-MacBook-Air:03-pdb malor$ python -m pdb basics.py
> /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(1)<module>()
-> def _binary_search(arr, left, right, key):
(Pdb) continue
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pdb.py", line 1314, in main
    pdb._runscript(mainpyfile)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pdb.py", line 1233, in _runscript
    self.run(statement)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/bdb.py", line 400, in run
    exec cmd in globals, locals
  File "<string>", line 1, in <module>
  File "basics.py", line 1, in <module>
    def _binary_search(arr, left, right, key):
  File "basics.py", line 16, in binary_search
    return _binary_search(arr, 0, len(arr), key)
…
RuntimeError: maximum recursion depth exceeded
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
-> return _binary_search(arr, middle, right, key)
py.test --pdb
nosetests --pdb -s
. . .
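Post-mortem debugging can also be triggered from code, which is roughly what the test runner flags above do for failing tests. The decorator below is a sketch under assumptions: `post_mortem_on_error` and `broken` are hypothetical names, and the `debugger` parameter exists only so the demo can run without an interactive prompt. By default it hands the traceback to `pdb.post_mortem`.

```python
import functools
import pdb
import sys


def post_mortem_on_error(fn, debugger=pdb.post_mortem):
    """Run fn; on an uncaught exception pass the traceback to the debugger."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            debugger(sys.exc_info()[2])   # inspect the frames of the failure
            raise                         # then let the error propagate as usual
    return wrapper


def broken(values, index):
    return values[index]


# Non-interactive demo: record the traceback instead of opening a prompt.
captured = []
safe = post_mortem_on_error(broken, debugger=captured.append)
try:
    safe([1, 2, 3], 10)
except IndexError:
    pass

print(captured[0].tb_next.tb_frame.f_code.co_name)
```

Dropping the `debugger=` argument gives the interactive behaviour: the `(Pdb)` prompt opens in the frame where the exception was raised.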
pdb: commands
(Pdb) break
Num Type         Disp Enb   Where
1   breakpoint   keep yes   at /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py:15
        breakpoint already hit 2 times
(Pdb) commands 1
(com) args
(com) where
(com) end
(Pdb) continue
arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
key = 5
  /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/bdb.py(400)run()
-> exec cmd in globals, locals
  <string>(1)<module>()
  /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(20)<module>()
-> assert binary_search(l, 5) == 5
> /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
-> return _binary_search(arr, 0, len(arr), key)
> /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
-> return _binary_search(arr, 0, len(arr), key)
pdb: conditional break points
(Pdb) break binary_search
Breakpoint 1 at /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py:15
(Pdb) break
Num Type         Disp Enb   Where
1   breakpoint   keep yes   at /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py:15
(Pdb) condition 1 key == 10
(Pdb) continue
> /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
-> return _binary_search(arr, 0, len(arr), key)
(Pdb) args
arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
key = 10
pdb: summary
• bread and butter of Python developers
• usually the easiest and the quickest way of debugging scripts/apps
• integrated with popular test runners
• greenlet-friendly
• requires stdin/stdout, thus not usable for debugging daemons or
embedded Python code (like Gimp or Blender plugins)
• not suitable for debugging of multithreaded/multiprocessing
applications
• can’t attach to a running process (if not modified in advance)
winpdb
winpdb: attaching to a process
rpodolyaka@rpodolyaka-pc:~/sandbox/debugging$ rpdb2 -d search.py
A password should be set to secure debugger client-server communication.
Please type a password:r00tme
Password has been set
rpodolyaka@rpodolyaka-pc:~$ rpdb2
RPDB2 - The Remote Python Debugger, version RPDB_2_4_8,
Copyright (C) 2005-2009 Nir Aides.
> password r00tme
Password is set to: "r00tme"
> attach
Connecting to 'localhost'...
Scripts to debug on 'localhost':
pid name
--------------------------
3706 /home/rpodolyaka/sandbox/debugging/search.py
> attach 3706
> *** Attaching to debuggee...
winpdb: attaching to a process
> bp binary_search
> bl
List of breakpoints:
Id State Line Filename-Scope-Condition-Encoding
------------------------------------------------------------------------------
0 enabled 15 /home/rpodolyaka/sandbox/debugging/search.py
binary_search
> go
> *** Debuggee is waiting at break point for further commands.
> stack
Stack trace for thread 140416296978176:
Frame File Name Line Function
------------------------------------------------------------------------------
> 0 ...ndbox/debugging/search.py 15 <module>
1 ....7/dist-packages/rpdb2.py 14220 StartServer
2 ....7/dist-packages/rpdb2.py 14470 main
3 /usr/bin/rpdb2 31 <module>
winpdb: embedded debugging
def add_lease(mac, ip_address):
    """Set the IP that was assigned by the DHCP server."""
    import rpdb2; rpdb2.start_embedded_debugger('r00tme')

    api = network_rpcapi.NetworkAPI()
    api.lease_fixed_ip(context.get_admin_context(), ip_address, CONF.host)
dnsmasq daemon forks and executes this like:
nova-dhcpbridge add AA:BB:CC:DD:EE:FF 10.0.0.2
winpdb: debugging of threads
def allocate_ips(engine, host):
    while True:
        with engine.begin() as conn:
            result = conn.execute(
                ip_addresses.select()
                .where(ip_addresses.c.host.is_(None))
            ).first()
            if result is None:
                # no IPs left
                break

            id, address = result.id, result.address
            rows = conn.execute(
                ip_addresses.update()
                .values(host=host)
                .where(ip_addresses.c.id == id)
                .where(ip_addresses.c.address == address)
                .where(ip_addresses.c.host.is_(None))
            )
            if not rows:
                # concurrent update
                continue
winpdb: debugging of threads
t1 = threading.Thread(target=allocate_ips, args=(eng, 'host1'))
t1.start()
t2 = threading.Thread(target=allocate_ips, args=(eng, 'host2'))
t2.start()
t1.join()
t2.join()
> attach $PID
…
> thread
List of active threads known to the debugger:
No Tid Name State
-----------------------------------------------
0 140456866166528 MainThread waiting at break point
> 1 140456389068544 Thread-1 waiting at break point
2 140456380675840 Thread-2 waiting at break point
winpdb: debugging of threads
> thread 2
Focus was set to chosen thread.
> stack
Stack trace for thread 140456380675840:
Frame File Name Line Function
------------------------------------------------------------------------------
> 0 /home/rpodolyaka/sa.py 30 allocate_ips
1 ...ib/python2.7/threading.py 763 run
> go
> break
> *** Debuggee is waiting at break point for further commands.
> stack
Stack trace for thread 140456380675840:
Frame File Name Line Function
------------------------------------------------------------------------------
> 0 ...alchemy/engine/default.py 409 do_commit
1 ...sqlalchemy/engine/base.py 525 _commit_impl
2 ...sqlalchemy/engine/base.py 1364 _do_commit
winpdb: summary
• allows debugging of multithreaded Python
applications
• remote debugging (which effectively means no
stdin/stdout limitations as with pdb)
• wxWidgets-based GUI
• to attach to a running process you need to
modify it in advance (embedded debugging) or
start it with rpdb2
cProfile
cProfile: basics
def count_freq(stream):
    res = {}
    for i in iter(lambda: stream.read(1), ''):
        try:
            res[i] += 1
        except KeyError:
            res[i] = 1
    return res


def build_tree(stream):
    queue = [Node(freq=v, symb=k) for k, v in count_freq(stream).items()]
    while len(queue) > 1:
        queue.sort(key=lambda k: k.freq)
        first = queue.pop(0)
        second = queue.pop(0)
        queue.append(
            Node(freq=(first.freq + second.freq), left=first, right=second)
        )
    return queue[0]
cProfile: Amdahl's law
cProfile: basics
Romans-MacBook-Air:07-cprofile malor$ python -m cProfile -s cumtime huffman.py ~/Downloads/kharkivpy-debugging.key
24868775 function calls in 14.059 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.008 0.008 14.059 14.059 huffman.py:1(<module>)
1 0.001 0.001 14.051 14.051 huffman.py:33(build_tree)
1 5.029 5.029 14.035 14.035 huffman.py:23(count_freq)
12417038 3.863 0.000 9.006 0.000 huffman.py:25(<lambda>)
12417038 5.143 0.000 5.143 0.000 {method 'read' of 'file' objects}
255 0.009 0.000 0.014 0.000 {method 'sort' of 'list' objects}
32895 0.005 0.000 0.005 0.000 huffman.py:36(<lambda>)
511 0.001 0.000 0.001 0.000 huffman.py:7(__init__)
510 0.000 0.000 0.000 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 functools.py:53(total_ordering)
1 0.000 0.000 0.000 0.000 {open}
256 0.000 0.000 0.000 0.000 {len}
255 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {dir}
1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
3 0.000 0.000 0.000 0.000 {setattr}
3 0.000 0.000 0.000 0.000 {getattr}
1 0.000 0.000 0.000 0.000 {max}
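The profile above attributes almost all of the run time to `count_freq` and its 12,417,038 one-character `read()` calls. A possible fix, sketched here as an assumption and not taken from the talk, is to read the stream in larger chunks and let `collections.Counter` do the counting. The name `count_freq_chunked` and the chunk size are hypothetical, and any speed-up would have to be confirmed by profiling again.

```python
import collections
import io


def count_freq_chunked(stream, chunk_size=64 * 1024):
    """Count symbol frequencies, reading chunk_size symbols per read() call."""
    counts = collections.Counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:            # an empty result means end of stream
            break
        counts.update(chunk)     # counts every symbol in the chunk
    return dict(counts)


# A tiny chunk size is used here only to exercise the loop more than once.
freq = count_freq_chunked(io.StringIO('abracadabra'), chunk_size=4)
print(sorted(freq.items()))
```

The result has the same shape as the original `count_freq` return value (a dict mapping symbol to count), so `build_tree` would not need to change.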
cProfile: visualisation
cProfile: context matters
import cProfile as profiler
import gc, pstats, time
def profile(fn):
    def wrapper(*args, **kw):
        elapsed, stat_loader, result = _profile("out.prof", fn, *args, **kw)
        stats = stat_loader()
        stats.sort_stats('cumulative')
        stats.print_stats()
        return result
    return wrapper


def _profile(filename, fn, *args, **kw):
    load_stats = lambda: pstats.Stats(filename)
    gc.collect()

    began = time.time()
    profiler.runctx('result = fn(*args, **kw)', globals(), locals(),
                    filename=filename)
    ended = time.time()

    return ended - began, load_stats, locals()['result']
cProfile: context matters
from werkzeug.contrib.profiler import ProfilerMiddleware
app = ProfilerMiddleware(app)
cProfile: context matters
PATH: '/6e0f43cd74db46f5b95f2142fe0c9431/flavors/detail'
2732 function calls (2602 primitive calls) in 1.294 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.287 1.287 /usr/lib/python2.7/dist-packages/nova/api/compute_req_id.py:38(__call__)
2/1 0.008 0.004 1.287 1.287 /usr/lib/python2.7/dist-packages/webob/request.py:1300(send)
2/1 0.000 0.000 1.287 1.287 /usr/lib/python2.7/dist-packages/webob/request.py:1262(call_application)
1 0.000 0.000 1.287 1.287 /usr/lib/python2.7/dist-packages/nova/api/openstack/__init__.py:121(__call__)
1 0.000 0.000 1.271 1.271 /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:686(__call__)
1 0.000 0.000 1.270 1.270 /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:829(_validate_token)
1 0.000 0.000 1.270 1.270 /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:1669(get)
1 0.000 0.000 1.270 1.270 /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:1726(_cache_get)
cProfile: summary
• easy CPU profiling of Python code with low
overhead
• text/binary representation of profiling results (the
latter can be used for merging results and/or
visualisation done by external tools)
• can’t attach to a running process
• can’t profile Python interpreter-level code
(PyEval_EvalFrameEx, etc)
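Merging binary profiling results, as mentioned in the summary, can be done with `pstats.Stats.add`. The sketch below is illustrative only: `fib`, `profile_to_file` and the file names are hypothetical, and the two dump files stand in for profiles collected from separate runs or workers.

```python
import cProfile
import os
import pstats
import tempfile


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def profile_to_file(path, n):
    """Profile one call of fib(n) and dump the binary stats to path."""
    profiler = cProfile.Profile()
    profiler.runcall(fib, n)
    profiler.dump_stats(path)


workdir = tempfile.mkdtemp()
first = os.path.join(workdir, 'first.prof')
second = os.path.join(workdir, 'second.prof')
profile_to_file(first, 10)
profile_to_file(second, 12)

merged = pstats.Stats(first)
calls_in_first = merged.total_calls
merged.add(second)                      # accumulate the second dump into the first
merged.sort_stats('cumulative').print_stats(5)
```

The same dump files are what external visualisation tools typically take as input.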
objgraph
objgraph: basics
In [1]: import objgraph
In [2]: objgraph.show_most_common_types()
function 4530
dict 2483
tuple 1428
wrapper_descriptor 1260
weakref 981
list 911
builtin_function_or_method 897
method_descriptor 705
getset_descriptor 531
type 473
objgraph: basics
In [3]: objgraph.show_growth()
function 4530 +4530
dict 2412 +2412
tuple 1353 +1353
wrapper_descriptor 1272 +1272
weakref 985 +985
list 904 +904
builtin_function_or_method 897 +897
method_descriptor 706 +706
getset_descriptor 535 +535
type 473 +473
In [4]: objgraph.show_growth()
weakref 986 +1
list 905 +1
tuple 1354 +1
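The idea behind these growth reports can be approximated with the standard library alone. The sketch below is not objgraph's implementation; it simply counts the objects tracked by the garbage collector by type name, before and after a deliberate "leak". `type_counts` and `Leaky` are hypothetical names.

```python
import collections
import gc


def type_counts():
    """Count objects tracked by the garbage collector, grouped by type name."""
    return collections.Counter(type(o).__name__ for o in gc.get_objects())


class Leaky(object):
    """Hypothetical class whose instances are 'leaked' on purpose."""

    def __init__(self):
        self.payload = []


before = type_counts()
leaked = [Leaky() for _ in range(1000)]   # stays referenced, so never freed
after = type_counts()

growth = after - before    # Counter subtraction keeps only positive deltas
print(growth.most_common(3))
```

Comparing two such snapshots taken around a suspicious operation is the same workflow as calling `objgraph.show_growth()` twice.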
objgraph: graphs
>>> x = []
>>> y = [x, [x], {'x': x}]
>>> objgraph.show_refs([y], filename='sample-graph.png')
strace
strace: tracing syscalls
rpodolyaka@rpodolyaka-pc:~$ strace -e network python sa.py
. . .
socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
connect(5, {sa_family=AF_INET6, sin6_port=htons(5432), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)
getsockopt(5, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getsockname(5, {sa_family=AF_INET6, sin6_port=htons(36894), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
sendto(5, "\0\0\0\10\4\322\26/", 8, MSG_NOSIGNAL, NULL, 0) = 8
recvfrom(5, "S", 16384, 0, NULL, NULL) = 1
. . .
strace: tracing syscalls
root@node-13:~# strace -p 1508 -s 4096 -tt
. . .
16:53:29.532770 epoll_wait(7, {}, 1023, 0) = 0
16:53:29.532832 epoll_wait(7, {}, 1023, 0) = 0
16:53:29.532892 epoll_wait(7, {}, 1023, 0) = 0
16:53:29.532953 epoll_wait(7, {}, 1023, 0) = 0
16:53:29.533022 epoll_wait(7, {{EPOLLIN, {u32=9, u64=39432335262744585}}}, 1023, 915) = 1
16:53:29.596409 epoll_ctl(7, EPOLL_CTL_DEL, 9, {EPOLLRDNORM|EPOLLWRBAND|EPOLLMSG|0x28c45820, {u32=32644, u64=22396489217113988}}) = 0
16:53:29.596494 accept(9, 0x7ffe1ef32b10, [16]) = -1 EAGAIN (Resource temporarily unavailable)
16:53:29.596638 epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=9, u64=39432335262744585}}) = 0
16:53:29.596747 epoll_wait(7, {{EPOLLIN, {u32=9, u64=39432335262744585}}}, 1023, 851) = 1
16:53:29.611852 epoll_ctl(7, EPOLL_CTL_DEL, 9, {EPOLLRDNORM|EPOLLWRBAND|EPOLLMSG|0x28c45820, {u32=32644, u64=22396489217113988}}) = 0
16:53:29.611937 accept(9, 0x7ffe1ef32b10, [16]) = -1 EAGAIN (Resource temporarily unavailable)
. . .
strace: summary
• allows tracing of an application's interactions with the
`outside world`
• points out possible performance problems (like
excessive system calls, polling for events with too
small a timeout, etc)
• limited to tracing of system calls of one process and
its forks
• use cautiously in production environments as it
greatly affects performance
gdb
gdb: prerequisites
• Ubuntu/Debian:
• sudo apt-get install gdb python-dbg
• CentOS/RHEL/Fedora (separate debuginfo
package repository):
• sudo yum install gdb python-debuginfo
gdb: basics
• python-dbg is a CPython binary built with the
'--with-pydebug -g' options. It’s slow and verbose
about memory management
• you can debug regular CPython processes in
production using the debug symbols shipped separately
• gdb has Python bindings for writing scripts for it
• CPython ships with a gdb script that analyses
interpreter-level stack frames to produce app-level
backtraces
gdb: `hanging` app
def allocate_ips(eng, host):
    while True:
        with eng.begin() as conn:
            row = conn.execute(
                ip_addresses.select()
                .where(ip_addresses.c.host.is_(None))
            ).fetchone()
            if row is None:
                break

            id, address = row.id, row.address
            updated_rows = conn.execute(
                ip_addresses.update()
                .values(host=host)
                .where(ip_addresses.c.id == id)
                .where(ip_addresses.c.host.is_(None))
            )
            if not updated_rows:
                continue


t = threading.Thread(target=allocate_ips, args=(eng, 'host1'))
t.start()
t.join()
gdb: `hanging` app
rpodolyaka@rpodolyaka-pc:~$ strace -p 20267
Process 20267 attached
futex(0x7fea50000c10, FUTEX_WAIT_PRIVATE, 0, NULL
rpodolyaka@rpodolyaka-pc:~$ gdb /usr/bin/python3.4 -p 20216
(gdb) t a a frame
Thread 2 (Thread 0x7f7702c83700 (LWP 20353)):
#0 sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:101
101 ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S: No such file or directory.
Thread 1 (Thread 0x7f770a03b700 (LWP 20350)):
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
85 ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S: No such file or directory.
gdb: `hanging` app
(gdb) t a 2 py-bt
Thread 2 (Thread 0x7f7702c83700 (LWP 20353)):
Traceback (most recent call first):
  File "/usr/lib/python3.4/threading.py", line 294, in wait
    gotit = waiter.acquire(True, timeout)
  File "/home/rpodolyaka/venv3/lib/python3.4/site-packages/sqlalchemy/util/queue.py", line 157, in get
    self.not_empty.wait(remaining)
  File "/home/rpodolyaka/venv3/lib/python3.4/site-packages/sqlalchemy/pool.py", line 1039, in _do_get
    return self._pool.get(wait, self._timeout)
  File "/home/rpodolyaka/venv3/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 2037, in contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/home/rpodolyaka/venv3/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1906, in begin
    conn = self.contextual_connect(close_with_result=close_with_result)
  File "sa.py", line 31, in allocate_ips
    with eng.begin() as conn:
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.4/threading.py", line 888, in _bootstrap
    self._bootstrap_inner()
gdb: virtualenv pitfalls
rpodolyaka@rpodolyaka-pc:~$ gdb -p 20656 # WARN: executable not passed!
(gdb) py-bt
Undefined command: "py-bt". Try "help".
(gdb) bt
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00000000004cdff5 in PyThread_acquire_lock_timed ()
#2 0x0000000000522039 in ?? ()
#3 0x00000000004ee01a in PyEval_EvalFrameEx ()
#4 0x00000000004ec9fc in PyEval_EvalCodeEx ()
#5 0x00000000004f25a9 in PyEval_EvalFrameEx ()
#6 0x00000000004ec9fc in PyEval_EvalCodeEx ()
#7 0x00000000004f25a9 in PyEval_EvalFrameEx ()
#8 0x00000000004ec9fc in PyEval_EvalCodeEx ()
#9 0x0000000000581115 in ?? ()
#10 0x00000000005ab019 in PyRun_FileExFlags ()
#11 0x00000000005aa194 in PyRun_SimpleFileExFlags ()
#12 0x00000000004cb4cb in Py_Main ()
#13 0x00000000004ca8ef in main ()
gdb: summary
• allows debugging of multithreaded applications
• allows attaching to a running process at any given moment
• can be used for analysing core dumps (e.g. if we don’t want to
stop a process, or if it died unexpectedly)
• can be used for debugging C extensions, CFFI calls, etc
• success depends on how CPython was built and whether you
have installed debug symbols or not
• used by pyringe to provide a pdb-like experience
(https://github.com/google/pyringe)
htop
htop
lsof
lsof: lsof -p $PID
nova-api 5910 nova mem REG 252,0 141574 3586 /lib/x86_64-linux-gnu/libpthread-2.19.so
nova-api 5910 nova mem REG 252,0 149120 3582 /lib/x86_64-linux-gnu/ld-2.19.so
nova-api 5910 nova mem REG 252,0 26258 52555 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
nova-api 5910 nova 0u CHR 1,3 0t0 1029 /dev/null
nova-api 5910 nova 1u CHR 136,13 0t0 16 /dev/pts/13
nova-api 5910 nova 2u CHR 136,13 0t0 16 /dev/pts/13
nova-api 5910 nova 3w REG 252,0 34967268 135756 /var/log/nova/nova-api.log
nova-api 5910 nova 4u unix 0xffff880850b92a00 0t0 260406 socket
nova-api 5910 nova 5r FIFO 0,8 0t0 260407 pipe
nova-api 5910 nova 6w FIFO 0,8 0t0 260407 pipe
nova-api 5910 nova 7u IPv4 260408 0t0 TCP node-13.domain.tld:8773 (LISTEN)
nova-api 5910 nova 8r CHR 1,9 0t0 1034 /dev/urandom
nova-api 5910 nova 9u IPv4 260409 0t0 TCP node-13.domain.tld:8774 (LISTEN)
nova-api 5910 nova 10u IPv4 260420 0t0 TCP *:8775 (LISTEN)
nova-api 5910 nova 15u 0000 0,9 0 7380 anon_inode
netstat
netstat: netstat -nlap
tcp 8 0 192.168.0.16:52819 192.168.0.11:5673 ESTABLISHED 5975/python
tcp 0 0 192.168.0.16:36901 192.168.0.11:5673 ESTABLISHED 1513/python
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 3042/sshd
tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN 13888/mysqld
tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN 7433/master
tcp 0 0 0.0.0.0:3260 0.0.0.0:* LISTEN 19704/tgtd
tcp 0 0 192.168.0.16:35357 0.0.0.0:* LISTEN 5546/python
perf_events
perf_events: perf top
perf_events: perf trace
254.663 ( 0.001 ms): sshd/22802 clock_gettime(which_clock: 7, tp: 0x7ffd0e807970 ) = 0
254.666 ( 0.003 ms): sshd/22802 read(fd: 14</dev/ptmx>, buf: 0x7ffd0e8038b0, count: 16384 ) = 4095
254.672 ( 0.243 ms): chrome/11973 epoll_wait(epfd: 16, events: 0x6a6a1b73480, maxevents: 32, timeout: 4294967295) = 1
254.678 ( 0.003 ms): chrome/11973 read(fd: 24<socket:[147806]>, buf: 0x6a6a2d5b018, count: 4096 ) = 32
254.685 ( 0.003 ms): chrome/11973 write(fd: 11<pipe:[147797]>, buf: 0x7f940dfa55e7, count: 1 ) = 1
254.688 ( 0.001 ms): chrome/11973 read(fd: 24<socket:[147806]>, buf: 0x6a6a2d5b018, count: 4096 ) = -1 EAGAIN Resource temporarily unavailable
254.691 ( 0.001 ms): chrome/11973 epoll_wait(epfd: 16, events: 0x6a6a1b73480, maxevents: 32 ) = 0
254.693 ( 0.001 ms): chrome/11973 epoll_wait(epfd: 16, events: 0x6a6a1b73480, maxevents: 32 ) = 0
perf_events: perf stat
Performance counter stats for 'python sa.py':
125.242831 task-clock (msec) # 0.004 CPUs utilized
945 context-switches # 0.008 M/sec
14 cpu-migrations # 0.112 K/sec
6,996 page-faults # 0.056 M/sec
408,133,256 cycles # 3.259 GHz
213,117,410 stalled-cycles-frontend # 52.22% frontend cycles idle
<not supported> stalled-cycles-backend
432,245,331 instructions # 1.06 insns per cycle
# 0.49 stalled cycles per insn
91,417,607 branches # 729.923 M/sec
3,937,108 branch-misses # 4.31% of all branches
30.130596204 seconds time elapsed
Questions?
slides: http://bit.ly/1LpjXGL
twitter: @amd4ever

Debugging of (C)Python applications

  • 1.
    Debugging of (C)Python applications Junethe 20th, KharkivPy Roman Podoliaka (@amd4ever) http://bit.ly/1LpjXGL
  • 2.
    Why debugging? • opensource cloud platform • dozens of (micro-)services • new features are important, but making OpenStack stable, scalable and HA is even more important • every day performance testing on hundreds of bare metal nodes • nightly CI jobs running functional and destructive tests • things break… pretty much all the time!
  • 3.
    A little humbleOpenStack
  • 4.
    Typical environment • CentOS6 or Ubuntu 14.04 • CPython 2.6 or 2.7 • eventlet-based concurrency model for Python services • MySQL (Galera), memcache [, MongoDB] • RabbitMQ
  • 5.
    Credits • “Debugging Pythonapplications in Production” by Vladimir Kirillov (https://www.youtube.com/ watch?v=F9FHIghn_Vk) • Brendan Gregg’s Blog (http:// www.brendangregg.com/blog/index.html)
  • 6.
  • 7.
    printf() debugging: python-memcache def_get_server(self, key): if isinstance(key, tuple): serverhash, key = key else: serverhash = serverHashFunction(key) if not self.buckets: return None, None for i in range(Client._SERVER_RETRIES): server = self.buckets[serverhash % len(self.buckets)] if server.connect(): # print("(using server %s)" % server,) return server, key serverhash = serverHashFunction(str(serverhash) + str(i)) return None, None
  • 8.
    printf() debugging: justdon’t do that! • the most primitive way of introspection at runtime • either always enabled or explicitly commented in the code • limited to stdout/stderror streams • information is only (barely) usable for developers • pollutes the code when committed to VCS repositories
  • 9.
  • 10.
    Logging: basics import logging FORMAT= "%(asctime)-15s %(clientip)s %(user)-8s %(message)s" logging.basicConfig(format=FORMAT) d = {'clientip': '192.168.0.1', 'user': 'fbloggs'} logging.warning("Protocol problem: %s", "connection reset", extra=d) 2006-02-08 22:20:02,165 192.168.0.1 fbloggs Protocol problem: connection reset
  • 11.
    Logging: log levels ifis_pid_cmdline_correct(pid, conffile.split('/')[-1]): try: _execute('kill', '-HUP', pid, run_as_root=True) _add_dnsmasq_accept_rules(dev) return except Exception as exc: LOG.error(_LE('kill -HUP dnsmasq threw %s'), exc) else: LOG.debug('Pid %d is stale, relaunching dnsmasq', pid) Level Numeric value CRITICAL 50 ERROR 40 WARNING 30 INFO 20 DEBUG 10 NOTSET 0
  • 12.
    Logging: log recordspropagation import logging LOG = logging.getLogger('sqlalchemy.orm') ... LOG.debug('Instance changed state from `%(prev_state)s` to `%(new_state)s`', prev_state=prev_state, new_state=new_state) sqlalchemy.orm -> sqlalchemy -> (root)
  • 13.
    Logging: context matters cfg.StrOpt('logging_context_format_string', default='%(asctime)s.%(msecs)03d%(process)d %(levelname)s ' '%(name)s [%(request_id)s %(user_identity)s] ' ‘%(instance)s%(message)s’) 2015-06-10 12:42:00.765 27516 INFO nova.osapi_compute.wsgi.server [req-58f233ab- f2b6-452f-b4fe-0c781ce8f8d0 None] 192.168.0.1 "GET /v2/ fc7f78f1c53d4443976514d2fd16e5cb/images/det ail HTTP/1.1" status: 200 len: 905 time: 0.1043971 2015-06-10 12:41:57.004 2760 AUDIT nova.virt.block_device [req-209db629-0d06-4f81-92ad-b910f1a72b36 None] [instance: a0d1c6ef-1fa8-46f9-a19d- f8fb7d2df6a2] Booting with volume 8bad9533-9d6f-4be8-939d-b7a28a536a1a at /dev/vda
  • 14.
    Logging: log processing •logs are collected from different sources and parsed (Logstash) • then they are imported into a full-text search system (ElasticSearch) • Web UI is used for providing easy access to results and querying (Kibana)
  • 15.
    Logging: log processing title:Kernel Neighbour table overflow query: > filename:kernel.log AND level:warning AND message:neighbour AND message:overflow title: Neutron Skipping router removal query: > filename:neutron-l3-agent.log AND location:neutron.agent.l3_agent AND message:skipping AND message:removal title: Neutron OVS lib errors and warnings query: > filename:neutron-openvswitch-agent.log AND location:neutron.agent.linux.ovs_lib AND level:(error OR warning) title: Neutron race condition at subnet deletion query: > filename:neutron AND level:trace AND message:AttributeError
  • 16.
    Logging: summary • usefulfor both developers and operators • developers define verbosity by the means of logging levels • configurable handlers (file, syslog, network, etc) • advanced tooling for log processing / monitoring
  • 17.
    Logging: useful links •General info: https://docs.python.org/3.3/howto/ logging.html#logging-howto • Adding contextual information: https:// docs.python.org/2/howto/logging- cookbook.html#adding-contextual-information-to- your-logging-output • Logstash/ElasticSearch/Kibana: http:// www.logstash.net/docs/1.4.2/tutorials/getting- started-with-logstash
  • 18.
  • 19.
    pdb: basics def _binary_search(arr,left, right, key): if left == right: return -1 middle = left + (right - left) / 2 if key == arr[middle]: return middle elif key > arr[middle]: return _binary_search(arr, middle, right, key) else: return _binary_search(arr, left, middle, key) def binary_search(arr, key): return _binary_search(arr, 0, len(arr), key) l = list(range(10)) assert binary_search(l, 5) == 5 assert binary_search(l, 0) == 0 assert binary_search(l, 9) == 9 assert binary_search(l, 10) == -1 assert binary_search(l, -5) == -1
  • 20.
    pdb: basics
    Romans-MacBook-Air:03-pdb malor$ python -m pdb basics.py
    > /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(1)<module>()
    -> def _binary_search(arr, left, right, key):
    (Pdb) break binary_search
    Breakpoint 1 at /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py:15
    (Pdb) continue
    > /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
    -> return _binary_search(arr, 0, len(arr), key)
    (Pdb) args
    arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    key = 5
    (Pdb) step
    --Call--
    > /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(1)_binary_search()
    -> def _binary_search(arr, left, right, key):
    (Pdb) next
    > /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(2)_binary_search()
    -> if left == right:
  • 21.
    pdb: basics
    (Pdb) list
      1     def _binary_search(arr, left, right, key):
      2  ->     if left == right:
      3             return -1
      4
      5         middle = left + (right - left) / 2
      6
      7         if key == arr[middle]:
      8             return middle
      9         elif key > arr[middle]:
     10             return _binary_search(arr, middle, right, key)
     11         else:
    (Pdb) where
      /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/bdb.py(400)run()
    -> exec cmd in globals, locals
      <string>(1)<module>()
      /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(20)<module>()
    -> assert binary_search(l, 5) == 5
      /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
    -> return _binary_search(arr, 0, len(arr), key)
    > /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(2)_binary_search()
    -> if left == right:
  • 22.
    pdb: post-mortem debugging
    Romans-MacBook-Air:03-pdb malor$ python -m pdb basics.py
    > /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(1)<module>()
    -> def _binary_search(arr, left, right, key):
    (Pdb) continue
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pdb.py", line 1314, in main
        pdb._runscript(mainpyfile)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pdb.py", line 1233, in _runscript
        self.run(statement)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/bdb.py", line 400, in run
        exec cmd in globals, locals
      File "<string>", line 1, in <module>
      File "basics.py", line 1, in <module>
        def _binary_search(arr, left, right, key):
      File "basics.py", line 16, in binary_search
        return _binary_search(arr, 0, len(arr), key)
      …
    RuntimeError: maximum recursion depth exceeded
    Uncaught exception. Entering post mortem debugging
    Running 'cont' or 'step' will restart the program
    -> return _binary_search(arr, middle, right, key)

    Test runners can drop into post-mortem debugging on failure as well:
    py.test --pdb
    nosetests --pdb -s
    . . .
  • 23.
    pdb: commands
    (Pdb) break
    Num Type         Disp Enb   Where
    1   breakpoint   keep yes   at /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py:15
            breakpoint already hit 2 times
    (Pdb) commands 1
    (com) args
    (com) where
    (com) end
    (Pdb) continue
    arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    key = 5
      /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/bdb.py(400)run()
    -> exec cmd in globals, locals
      <string>(1)<module>()
      /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(20)<module>()
    -> assert binary_search(l, 5) == 5
    > /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
    -> return _binary_search(arr, 0, len(arr), key)
    > /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
    -> return _binary_search(arr, 0, len(arr), key)
  • 24.
    pdb: conditional breakpoints
    (Pdb) break binary_search
    Breakpoint 1 at /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py:15
    (Pdb) break
    Num Type         Disp Enb   Where
    1   breakpoint   keep yes   at /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py:15
    (Pdb) condition 1 key == 10
    (Pdb) continue
    > /Users/malor/Dropbox/talks/kharkivpy-debugging/examples/03-pdb/basics.py(16)binary_search()
    -> return _binary_search(arr, 0, len(arr), key)
    (Pdb) args
    arr = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    key = 10
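For reference, the infinite recursion that `key == 10` triggers comes from the right-hand branch recursing on `(middle, right)`: once the range narrows to `(9, 10)`, `middle` stays at 9 and the range never shrinks. One possible fix is sketched below. It treats the range as half-open `[left, right)` and skips `middle` when moving right; it also uses `//` so that it runs on Python 3, which is an adaptation and not part of the original slides.

```python
def _binary_search(arr, left, right, key):
    # Search the half-open range [left, right).
    if left >= right:
        return -1

    middle = left + (right - left) // 2

    if key == arr[middle]:
        return middle
    elif key > arr[middle]:
        # Exclude `middle` itself, so the range shrinks on every call.
        return _binary_search(arr, middle + 1, right, key)
    else:
        return _binary_search(arr, left, middle, key)


def binary_search(arr, key):
    return _binary_search(arr, 0, len(arr), key)


l = list(range(10))
results = [binary_search(l, k) for k in (5, 0, 9, 10, -5)]
print(results)
```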
  • 25.
    pdb: summary
    • bread and butter of Python developers
    • usually the easiest and the quickest way of debugging scripts/apps
    • integrated with popular test runners
    • greenlet-friendly
    • requires stdin/stdout, thus not usable for debugging daemons or embedded Python code (like Gimp or Blender plugins)
    • not suitable for debugging of multithreaded/multiprocessing applications
    • can’t attach to a running process (if not modified in advance)
  • 26.
  • 27.
    winpdb: attaching to a process
    rpodolyaka@rpodolyaka-pc:~/sandbox/debugging$ rpdb2 -d search.py
    A password should be set to secure debugger client-server communication.
    Please type a password: r00tme
    Password has been set

    rpodolyaka@rpodolyaka-pc:~$ rpdb2
    RPDB2 - The Remote Python Debugger, version RPDB_2_4_8,
    Copyright (C) 2005-2009 Nir Aides.
    > password r00tme
    Password is set to: "r00tme"
    > attach
    Connecting to 'localhost'...
    Scripts to debug on 'localhost':
       pid    name
    --------------------------
       3706   /home/rpodolyaka/sandbox/debugging/search.py
    > attach 3706
    > *** Attaching to debuggee...
  • 28.
    winpdb: attaching to a process
    > bp binary_search
    > bl
    List of breakpoints:
    Id  State    Line  Filename-Scope-Condition-Encoding
    ------------------------------------------------------------------------------
     0  enabled    15  /home/rpodolyaka/sandbox/debugging/search.py
                       binary_search
    > go
    > *** Debuggee is waiting at break point for further commands.
    > stack
    Stack trace for thread 140416296978176:
      Frame  File Name                      Line  Function
    ------------------------------------------------------------------------------
    >     0  ...ndbox/debugging/search.py     15  <module>
          1  ....7/dist-packages/rpdb2.py  14220  StartServer
          2  ....7/dist-packages/rpdb2.py  14470  main
          3  /usr/bin/rpdb2                   31  <module>
  • 29.
    winpdb: embedded debugging
    def add_lease(mac, ip_address):
        """Set the IP that was assigned by the DHCP server."""
        import rpdb2; rpdb2.start_embedded_debugger('r00tme')

        api = network_rpcapi.NetworkAPI()
        api.lease_fixed_ip(context.get_admin_context(), ip_address, CONF.host)

    The dnsmasq daemon forks and executes this like:
    nova-dhcpbridge add AA:BB:CC:DD:EE:FF 10.0.0.2
  • 30.
    winpdb: debugging of threads
    def allocate_ips(engine, host):
        while True:
            with engine.begin() as conn:
                result = conn.execute(
                    ip_addresses.select()
                                .where(ip_addresses.c.host.is_(None))
                ).first()
                if result is None:
                    # no IPs left
                    break

                id, address = result.id, result.address
                rows = conn.execute(
                    ip_addresses.update()
                                .values(host=host)
                                .where(ip_addresses.c.id == id)
                                .where(ip_addresses.c.address == address)
                                .where(ip_addresses.c.host.is_(None))
                )
                if not rows:
                    # concurrent update
                    continue
  • 31.
    winpdb: debugging of threads
    t1 = threading.Thread(target=allocate_ips, args=(eng, 'host1'))
    t1.start()
    t2 = threading.Thread(target=allocate_ips, args=(eng, 'host2'))
    t2.start()
    t1.join()
    t2.join()

    > attach $PID
    …
    > thread
    List of active threads known to the debugger:
      No  Tid              Name        State
    -----------------------------------------------
       0  140456866166528  MainThread  waiting at break point
    >  1  140456389068544  Thread-1    waiting at break point
       2  140456380675840  Thread-2    waiting at break point
  • 32.
    winpdb: debugging of threads
    > thread 2
    Focus was set to chosen thread.
    > stack
    Stack trace for thread 140456380675840:
      Frame  File Name                     Line  Function
    ------------------------------------------------------------------------------
    >     0  /home/rpodolyaka/sa.py          30  allocate_ips
          1  ...ib/python2.7/threading.py   763  run
    > go
    > break
    > *** Debuggee is waiting at break point for further commands.
    > stack
    Stack trace for thread 140456380675840:
      Frame  File Name                     Line  Function
    ------------------------------------------------------------------------------
    >     0  ...alchemy/engine/default.py   409  do_commit
          1  ...sqlalchemy/engine/base.py   525  _commit_impl
          2  ...sqlalchemy/engine/base.py  1364  _do_commit
  • 33.
    winpdb: summary
    • allows debugging of multithreaded Python applications
    • remote debugging (which effectively means no stdin/stdout limitations as with pdb)
    • wxWidgets-based GUI
    • to attach to a running process you need to modify it in advance (embedded debugging) or start it with rpdb2
  • 34.
  • 35.
    cProfile: basics
    def count_freq(stream):
        res = {}
        for i in iter(lambda: stream.read(1), ''):
            try:
                res[i] += 1
            except KeyError:
                res[i] = 1
        return res


    def build_tree(stream):
        queue = [Node(freq=v, symb=k) for k, v in count_freq(stream).items()]

        while len(queue) > 1:
            queue.sort(key=lambda k: k.freq)
            first = queue.pop(0)
            second = queue.pop(0)
            queue.append(
                Node(freq=(first.freq + second.freq), left=first, right=second)
            )

        return queue[0]
  • 36.
  • 37.
    cProfile: basics
    Romans-MacBook-Air:07-cprofile malor$ python -m cProfile -s cumtime huffman.py ~/Downloads/kharkivpy-debugging.key
             24868775 function calls in 14.059 seconds

       Ordered by: cumulative time

         ncalls  tottime  percall  cumtime  percall filename:lineno(function)
              1    0.008    0.008   14.059   14.059 huffman.py:1(<module>)
              1    0.001    0.001   14.051   14.051 huffman.py:33(build_tree)
              1    5.029    5.029   14.035   14.035 huffman.py:23(count_freq)
       12417038    3.863    0.000    9.006    0.000 huffman.py:25(<lambda>)
       12417038    5.143    0.000    5.143    0.000 {method 'read' of 'file' objects}
            255    0.009    0.000    0.014    0.000 {method 'sort' of 'list' objects}
          32895    0.005    0.000    0.005    0.000 huffman.py:36(<lambda>)
            511    0.001    0.000    0.001    0.000 huffman.py:7(__init__)
            510    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
              1    0.000    0.000    0.000    0.000 functools.py:53(total_ordering)
              1    0.000    0.000    0.000    0.000 {open}
            256    0.000    0.000    0.000    0.000 {len}
            255    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
              1    0.000    0.000    0.000    0.000 {dir}
              1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
              3    0.000    0.000    0.000    0.000 {setattr}
              3    0.000    0.000    0.000    0.000 {getattr}
              1    0.000    0.000    0.000    0.000 {max}
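The profile attributes nearly all of the 14 seconds to `count_freq`, and within it to roughly 12.4 million `read(1)` calls made through the lambda. One way to act on that finding, sketched here and not taken from the talk, is to read the stream in large chunks and let `collections.Counter` do the counting. The 64 KiB chunk size is an untuned assumption, and the stream is assumed to be opened in binary mode.

```python
import io
from collections import Counter


def count_freq(stream, chunk_size=64 * 1024):
    """Count symbol frequencies, reading the stream in large chunks."""
    res = Counter()
    # iter() with a sentinel stops at the empty bytes object returned at EOF.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        # On Python 3 iterating over bytes yields integers (byte values).
        res.update(chunk)
    return res


freq = count_freq(io.BytesIO(b"abracadabra"))
print(freq.most_common(2))
```

The number of `read()` calls drops from one per byte to one per chunk; whether that removes the bottleneck in practice should be confirmed by profiling again.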
  • 38.
  • 39.
    cProfile: context matters
    import cProfile as profiler
    import gc, pstats, time


    def profile(fn):
        def wrapper(*args, **kw):
            elapsed, stat_loader, result = _profile("out.prof", fn, *args, **kw)
            stats = stat_loader()
            stats.sort_stats('cumulative')
            stats.print_stats()
            return result
        return wrapper


    def _profile(filename, fn, *args, **kw):
        load_stats = lambda: pstats.Stats(filename)
        gc.collect()

        began = time.time()
        profiler.runctx('result = fn(*args, **kw)', globals(), locals(),
                        filename=filename)
        ended = time.time()

        return ended - began, load_stats, locals()['result']
  • 40.
    cProfile: context matters
    from werkzeug.contrib.profiler import ProfilerMiddleware

    app = ProfilerMiddleware(app)
  • 41.
    cProfile: context matters
    PATH: '/6e0f43cd74db46f5b95f2142fe0c9431/flavors/detail'
             2732 function calls (2602 primitive calls) in 1.294 seconds

       Ordered by: cumulative time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000    1.287    1.287 /usr/lib/python2.7/dist-packages/nova/api/compute_req_id.py:38(__call__)
          2/1    0.008    0.004    1.287    1.287 /usr/lib/python2.7/dist-packages/webob/request.py:1300(send)
          2/1    0.000    0.000    1.287    1.287 /usr/lib/python2.7/dist-packages/webob/request.py:1262(call_application)
            1    0.000    0.000    1.287    1.287 /usr/lib/python2.7/dist-packages/nova/api/openstack/__init__.py:121(__call__)
            1    0.000    0.000    1.271    1.271 /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:686(__call__)
            1    0.000    0.000    1.270    1.270 /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:829(_validate_token)
            1    0.000    0.000    1.270    1.270 /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:1669(get)
            1    0.000    0.000    1.270    1.270 /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:1726(_cache_get)
  • 42.
    cProfile: summary
    • easy CPU profiling of Python code with low overhead
    • text/binary representation of profiling results (the latter can be used for merging results and/or visualisation done by external tools)
    • can’t attach to a running process
    • can’t profile Python interpreter-level code (PyEval_EvalFrameEx, etc)
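The remark about the binary representation can be made concrete with `pstats`, which loads dumps written by `cProfile` and merges several of them into one report. The sketch below is illustrative only: the `work` function and the file names are made up for the example.

```python
import cProfile
import os
import pstats
import tempfile


def work(n):
    # Hypothetical workload, present only so there is something to profile.
    return sum(i * i for i in range(n))


tmpdir = tempfile.mkdtemp()
paths = []
for idx, n in enumerate((1000, 2000)):
    path = os.path.join(tmpdir, "run%d.prof" % idx)
    profiler = cProfile.Profile()
    profiler.runcall(work, n)
    profiler.dump_stats(path)   # binary representation of one run
    paths.append(path)

stats = pstats.Stats(paths[0])
stats.add(paths[1])             # merge the second run into the first
stats.sort_stats("cumulative").print_stats(5)
```

The same `.prof` files are what external visualisation tools consume.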
  • 43.
  • 44.
    objgraph: basics
    In [1]: import objgraph

    In [2]: objgraph.show_most_common_types()
    function                   4530
    dict                       2483
    tuple                      1428
    wrapper_descriptor         1260
    weakref                    981
    list                       911
    builtin_function_or_method 897
    method_descriptor          705
    getset_descriptor          531
    type                       473
  • 45.
    objgraph: basics
    In [3]: objgraph.show_growth()
    function                       4530     +4530
    dict                           2412     +2412
    tuple                          1353     +1353
    wrapper_descriptor             1272     +1272
    weakref                         985      +985
    list                            904      +904
    builtin_function_or_method      897      +897
    method_descriptor               706      +706
    getset_descriptor               535      +535
    type                            473      +473

    In [4]: objgraph.show_growth()
    weakref      986        +1
    list         905        +1
    tuple       1354        +1
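The idea behind these two calls can be approximated with the standard library: count the objects tracked by the garbage collector per type, then diff two snapshots. The sketch below is a rough stand-in and not objgraph's implementation; `type_counts` and the `Leaky` class are hypothetical names introduced for the illustration.

```python
import gc
from collections import Counter


def type_counts():
    """Count GC-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Leaky(object):
    # Hypothetical class whose instances we deliberately keep alive.
    pass


before = type_counts()
keep = [Leaky() for _ in range(100)]
after = type_counts()

# Counter subtraction keeps only the types whose count went up.
growth = after - before
print(growth.most_common(3))
```

Types that keep showing up in `growth` between two points where memory should be stable are candidates for a leak.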
  • 46.
    objgraph: graphs
    >>> x = []
    >>> y = [x, [x], {'x': x}]
    >>> objgraph.show_refs([y], filename='sample-graph.png')
  • 47.
  • 48.
    strace: tracing syscalls
    rpodolyaka@rpodolyaka-pc:~$ strace -e network python sa.py
    . . .
    socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 5
    setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    setsockopt(5, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
    connect(5, {sa_family=AF_INET6, sin6_port=htons(5432), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)
    getsockopt(5, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
    getsockname(5, {sa_family=AF_INET6, sin6_port=htons(36894), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
    sendto(5, "\0\0\0\10\4\322\26/", 8, MSG_NOSIGNAL, NULL, 0) = 8
    recvfrom(5, "S", 16384, 0, NULL, NULL) = 1
    . . .
  • 49.
    strace: tracing syscalls
    root@node-13:~# strace -p 1508 -s 4096 -tt
    . . .
    16:53:29.532770 epoll_wait(7, {}, 1023, 0) = 0
    16:53:29.532832 epoll_wait(7, {}, 1023, 0) = 0
    16:53:29.532892 epoll_wait(7, {}, 1023, 0) = 0
    16:53:29.532953 epoll_wait(7, {}, 1023, 0) = 0
    16:53:29.533022 epoll_wait(7, {{EPOLLIN, {u32=9, u64=39432335262744585}}}, 1023, 915) = 1
    16:53:29.596409 epoll_ctl(7, EPOLL_CTL_DEL, 9, {EPOLLRDNORM|EPOLLWRBAND|EPOLLMSG|0x28c45820, {u32=32644, u64=22396489217113988}}) = 0
    16:53:29.596494 accept(9, 0x7ffe1ef32b10, [16]) = -1 EAGAIN (Resource temporarily unavailable)
    16:53:29.596638 epoll_ctl(7, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=9, u64=39432335262744585}}) = 0
    16:53:29.596747 epoll_wait(7, {{EPOLLIN, {u32=9, u64=39432335262744585}}}, 1023, 851) = 1
    16:53:29.611852 epoll_ctl(7, EPOLL_CTL_DEL, 9, {EPOLLRDNORM|EPOLLWRBAND|EPOLLMSG|0x28c45820, {u32=32644, u64=22396489217113988}}) = 0
    16:53:29.611937 accept(9, 0x7ffe1ef32b10, [16]) = -1 EAGAIN (Resource temporarily unavailable)
    . . .
  • 50.
    strace: summary
    • allows tracing of an application's interactions with the `outside world`
    • points out possible performance problems (like excessive system calls, polling of events with too small a timeout, etc)
    • limited to tracing the system calls of one process and its forks
    • use cautiously in production environments, as it greatly affects performance
  • 51.
  • 52.
    gdb: prerequisites
    • Ubuntu/Debian:
      • sudo apt-get install gdb python-dbg
    • CentOS/RHEL/Fedora (separate debuginfo package repository):
      • sudo yum install gdb python-debuginfo
  • 53.
    gdb: basics
    • python-dbg is a CPython binary built with the ‘--with-pydebug -g’ options. It’s slow and verbose about memory management
    • you can debug regular CPython processes in production using the debug symbols shipped separately
    • gdb has Python bindings for writing scripts
    • CPython ships with a gdb script that analyses interpreter-level stack frames to produce app-level backtraces
  • 54.
    gdb: `hanging` app
    def allocate_ips(eng, host):
        while True:
            with eng.begin() as conn:
                row = conn.execute(
                    ip_addresses.select()
                                .where(ip_addresses.c.host.is_(None))
                ).fetchone()
                if row is None:
                    break

                id, address = row.id, row.address
                updated_rows = conn.execute(
                    ip_addresses.update()
                                .values(host=host)
                                .where(ip_addresses.c.id == id)
                                .where(ip_addresses.c.host.is_(None))
                )
                if not updated_rows:
                    continue


    t = threading.Thread(target=allocate_ips, args=(eng, 'host1'))
    t.start()
    t.join()
  • 55.
    gdb: `hanging` app
    rpodolyaka@rpodolyaka-pc:~$ strace -p 20267
    Process 20267 attached
    futex(0x7fea50000c10, FUTEX_WAIT_PRIVATE, 0, NULL

    rpodolyaka@rpodolyaka-pc:~$ gdb /usr/bin/python3.4 -p 20216
    (gdb) t a a frame

    Thread 2 (Thread 0x7f7702c83700 (LWP 20353)):
    #0  sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:101
    101     ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S: No such file or directory.

    Thread 1 (Thread 0x7f770a03b700 (LWP 20350)):
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    85      ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S: No such file or directory.
  • 56.
    gdb: `hanging` app
    (gdb) t a 2 py-bt

    Thread 2 (Thread 0x7f7702c83700 (LWP 20353)):
    Traceback (most recent call first):
      File "/usr/lib/python3.4/threading.py", line 294, in wait
        gotit = waiter.acquire(True, timeout)
      File "/home/rpodolyaka/venv3/lib/python3.4/site-packages/sqlalchemy/util/queue.py", line 157, in get
        self.not_empty.wait(remaining)
      File "/home/rpodolyaka/venv3/lib/python3.4/site-packages/sqlalchemy/pool.py", line 1039, in _do_get
        return self._pool.get(wait, self._timeout)
      File "/home/rpodolyaka/venv3/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 2037, in contextual_connect
        self._wrap_pool_connect(self.pool.connect, None),
      File "/home/rpodolyaka/venv3/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1906, in begin
        conn = self.contextual_connect(close_with_result=close_with_result)
      File "sa.py", line 31, in allocate_ips
        with eng.begin() as conn:
      File "/usr/lib/python3.4/threading.py", line 868, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
        self.run()
      File "/usr/lib/python3.4/threading.py", line 888, in _bootstrap
        self._bootstrap_inner()
  • 57.
    gdb: virtualenv pitfalls
    rpodolyaka@rpodolyaka-pc:~$ gdb -p 20656  # WARN: executable not passed!
    (gdb) py-bt
    Undefined command: "py-bt".  Try "help".
    (gdb) bt
    #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
    #1  0x00000000004cdff5 in PyThread_acquire_lock_timed ()
    #2  0x0000000000522039 in ?? ()
    #3  0x00000000004ee01a in PyEval_EvalFrameEx ()
    #4  0x00000000004ec9fc in PyEval_EvalCodeEx ()
    #5  0x00000000004f25a9 in PyEval_EvalFrameEx ()
    #6  0x00000000004ec9fc in PyEval_EvalCodeEx ()
    #7  0x00000000004f25a9 in PyEval_EvalFrameEx ()
    #8  0x00000000004ec9fc in PyEval_EvalCodeEx ()
    #9  0x0000000000581115 in ?? ()
    #10 0x00000000005ab019 in PyRun_FileExFlags ()
    #11 0x00000000005aa194 in PyRun_SimpleFileExFlags ()
    #12 0x00000000004cb4cb in Py_Main ()
    #13 0x00000000004ca8ef in main ()
  • 58.
    gdb: summary
    • allows debugging of multithreaded applications
    • allows attaching to a running process at any given moment
    • can be used for analysing core dumps (e.g. if we don’t want to stop a process, or if it died unexpectedly)
    • can be used for debugging C extensions, CFFI calls, etc
    • success depends on how CPython was built and whether you have installed debug symbols or not
    • used by pyringe to provide a pdb-like experience (https://github.com/google/pyringe)
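When the application can be modified in advance, a per-thread view similar to `py-bt` can also be produced from inside the process using only the standard library. The sketch below is an in-process complement and not a replacement for gdb: it cannot help with a process that is already stuck unless the dump is wired to a signal handler or similar trigger. The helper name `format_thread_stacks` is made up for the example.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return the Python-level stack of every running thread as text."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() maps thread ids to their topmost frames.
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (ident, names.get(ident, "unknown")))
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)


# Demo: a worker blocked on an event, much like the hanging thread above.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, name="worker")
worker.start()

dump = format_thread_stacks()

stop.set()
worker.join()
print(dump)
```

On Python 3, `faulthandler.dump_traceback(all_threads=True)` provides a similar dump without any helper code.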
  • 59.
  • 60.
  • 61.
  • 62.
    lsof: lsof -p $PID
    nova-api 5910 nova  mem   REG   252,0               141574    3586 /lib/x86_64-linux-gnu/libpthread-2.19.so
    nova-api 5910 nova  mem   REG   252,0               149120    3582 /lib/x86_64-linux-gnu/ld-2.19.so
    nova-api 5910 nova  mem   REG   252,0                26258   52555 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
    nova-api 5910 nova   0u   CHR   1,3                    0t0    1029 /dev/null
    nova-api 5910 nova   1u   CHR   136,13                 0t0      16 /dev/pts/13
    nova-api 5910 nova   2u   CHR   136,13                 0t0      16 /dev/pts/13
    nova-api 5910 nova   3w   REG   252,0             34967268  135756 /var/log/nova/nova-api.log
    nova-api 5910 nova   4u   unix  0xffff880850b92a00     0t0  260406 socket
    nova-api 5910 nova   5r   FIFO  0,8                    0t0  260407 pipe
    nova-api 5910 nova   6w   FIFO  0,8                    0t0  260407 pipe
    nova-api 5910 nova   7u   IPv4  260408                 0t0     TCP node-13.domain.tld:8773 (LISTEN)
    nova-api 5910 nova   8r   CHR   1,9                    0t0    1034 /dev/urandom
    nova-api 5910 nova   9u   IPv4  260409                 0t0     TCP node-13.domain.tld:8774 (LISTEN)
    nova-api 5910 nova  10u   IPv4  260420                 0t0     TCP *:8775 (LISTEN)
    nova-api 5910 nova  15u   0000  0,9                      0    7380 anon_inode
  • 63.
  • 64.
    netstat: netstat -nlap
    tcp   8   0   192.168.0.16:52819   192.168.0.11:5673   ESTABLISHED   5975/python
    tcp   0   0   192.168.0.16:36901   192.168.0.11:5673   ESTABLISHED   1513/python
    tcp   0   0   0.0.0.0:22           0.0.0.0:*           LISTEN        3042/sshd
    tcp   0   0   0.0.0.0:4567         0.0.0.0:*           LISTEN        13888/mysqld
    tcp   0   0   0.0.0.0:25           0.0.0.0:*           LISTEN        7433/master
    tcp   0   0   0.0.0.0:3260         0.0.0.0:*           LISTEN        19704/tgtd
    tcp   0   0   192.168.0.16:35357   0.0.0.0:*           LISTEN        5546/python
  • 65.
  • 66.
  • 67.
    perf_events: perf trace
    254.663 ( 0.001 ms): sshd/22802 clock_gettime(which_clock: 7, tp: 0x7ffd0e807970 ) = 0
    254.666 ( 0.003 ms): sshd/22802 read(fd: 14</dev/ptmx>, buf: 0x7ffd0e8038b0, count: 16384 ) = 4095
    254.672 ( 0.243 ms): chrome/11973 epoll_wait(epfd: 16, events: 0x6a6a1b73480, maxevents: 32, timeout: 4294967295) = 1
    254.678 ( 0.003 ms): chrome/11973 read(fd: 24<socket:[147806]>, buf: 0x6a6a2d5b018, count: 4096 ) = 32
    254.685 ( 0.003 ms): chrome/11973 write(fd: 11<pipe:[147797]>, buf: 0x7f940dfa55e7, count: 1 ) = 1
    254.688 ( 0.001 ms): chrome/11973 read(fd: 24<socket:[147806]>, buf: 0x6a6a2d5b018, count: 4096 ) = -1 EAGAIN Resource temporarily unavailable
    254.691 ( 0.001 ms): chrome/11973 epoll_wait(epfd: 16, events: 0x6a6a1b73480, maxevents: 32 ) = 0
    254.693 ( 0.001 ms): chrome/11973 epoll_wait(epfd: 16, events: 0x6a6a1b73480, maxevents: 32 ) = 0
  • 68.
    perf_events: perf stat
    Performance counter stats for 'python sa.py':

         125.242831 task-clock (msec)         #    0.004 CPUs utilized
                945 context-switches          #    0.008 M/sec
                 14 cpu-migrations            #    0.112 K/sec
              6,996 page-faults               #    0.056 M/sec
        408,133,256 cycles                    #    3.259 GHz
        213,117,410 stalled-cycles-frontend   #   52.22% frontend cycles idle
    <not supported> stalled-cycles-backend
        432,245,331 instructions              #    1.06  insns per cycle
                                              #    0.49  stalled cycles per insn
         91,417,607 branches                  #  729.923 M/sec
          3,937,108 branch-misses             #    4.31% of all branches

       30.130596204 seconds time elapsed
  • 69.