This is a presentation that Eric Gazoni (CEO of Adimian) gave at the first edition of FOSDEMx, at the University of Brussels, on May 3rd, 2018.
The intent is to scratch the surface of what it takes to do CPU performance optimisation in Python, and give students a few first tools to get started.
8. CPU
Run code more efficiently
⬗ Reduce processing time (reporting, calculation)
⬗ Reduce response time (web pages)
⬗ Reduce energy consumption (and hosting costs)
9. ONLY ONE AT A TIME
⬗ Pick one category
⬗ Hack
⬗ Review
⬗ Rinse, repeat
Optimizing multiple domains at once = unpredictable results
11. TARGETS
Define clear targets or get lost in the performance maze
⬗ “This page must load below 200ms”
⬗ “One iteration of this loop must execute below 10ms”
⬗ “This must run on a controller with 8KB memory”
12. METRICS
⬗ You know if you improve or make things worse
◇ You can definitely make things worse !
⬗ You know if you reached your targets
13. 3 RULES OF OPTIMIZATION
⬗ Benchmark
⬗ Benchmark
⬗ Benchmark
“Gut feeling” vs Reality
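One way to replace gut feeling with a measurement is the standard library `timeit` module. The sketch below is only an illustration: the two functions being compared (`concat_with_loop`, `concat_with_join`) and the `benchmark` helper are hypothetical and do not come from the talk.

```python
import timeit


def concat_with_loop(words):
    """Build a string by repeated concatenation."""
    result = ""
    for word in words:
        result += word
    return result


def concat_with_join(words):
    """Build the same string with a single str.join call."""
    return "".join(words)


WORDS = ["spam"] * 1000


def benchmark(func, repeat=5, number=200):
    """Return the best (lowest) total time in seconds over several runs."""
    timer = timeit.Timer(lambda: func(WORDS))
    return min(timer.repeat(repeat=repeat, number=number))


# Measure both candidates under the same conditions before deciding.
for candidate in (concat_with_loop, concat_with_join):
    print(f"{candidate.__name__}: {benchmark(candidate):.4f}s")
```

Taking the minimum of several repeats reduces the influence of other activity on the machine; the absolute numbers will differ from one system to another.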
15. IT’S A JUNGLE OUT THERE
User land
⬗ Your program
⬗ Implementation of the interpreter (py2/py3/pypy)
⬗ Implementation of the language and standard library the interpreter is written in (C99/C11/…)
16. IT’S A JUNGLE OUT THERE
Operating system
⬗ Implementation of the OS kernel (linux/windows/unix/…)
⬗ Filesystem layout (ext4/NTFS/BTRFS/...)
⬗ Implementation of the hardware drivers (proprietary Nvidia drivers)
17. IT’S A JUNGLE OUT THERE
Hardware
⬗ CPU architecture (x86/ARM/…)
⬗ CPU extensions (SSE/MMX/…)
⬗ Memory / hard drive technology (spinning/flash/…)
⬗ Temperature (GPU/CPU/RAM/…)
⬗ Network card (Optical/Copper)
18. SAFETY NETS
⬗ Version control: rewind, pinpoint exactly what you did
⬗ Code coverage: make sure you didn’t break something
23. CAPTURING PROFILE
⬗ Profilers will capture all calls during program execution
⬗ Only capture what you need (reduce noise)
⬗ Stats (or aggregated calls) can be dumped in pstats binary format
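A minimal sketch of capturing only what is needed, assuming the standard library `cProfile` module (the C implementation of the profiler interface). The functions `setup` and `expensive` and the output file name are hypothetical placeholders.

```python
import cProfile
import os
import tempfile


def setup():
    """Hypothetical preparation work we do NOT want in the profile."""
    return list(range(1000))


def expensive(n):
    """Hypothetical hot spot we want to measure."""
    return sum(i * i for i in range(n))


setup()  # runs outside the profiler, so it adds no noise

profiler = cProfile.Profile()
profiler.enable()            # start capturing calls
result = expensive(100_000)
profiler.disable()           # stop capturing as soon as the hot spot is done

# Dump the aggregated stats in the pstats binary format.
stats_path = os.path.join(tempfile.gettempdir(), "output.pstats")
profiler.dump_stats(stats_path)
```

Wrapping only the call of interest between `enable()` and `disable()` keeps start-up and teardown code out of the captured stats.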
24. PROFILING THE WHOLE PROGRAM
⬗ Will capture a lot of noise
⬗ Not invasive (can be run on any Python script)
$ python -m profile -o output.pstats myscript.py
25. NOTE ON PROFILERS
Running code with a profiler is similar to driving with the parking brake on!
Don’t forget to disable it when you are done!
29. ANALYSIS OF THE PROFILE
1. Dump stats into a file
2. Load the file into gprof2dot
3. Use dot (from the graphviz package) to generate a png/svg representation
https://github.com/jrfonseca/gprof2dot
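Before rendering a graph, the same stats file can also be inspected as text. The sketch below is not the gprof2dot workflow from the slide; it is a complementary view that assumes the standard library `pstats` module, with a hypothetical `work` function and file name.

```python
import cProfile
import io
import os
import pstats
import tempfile


def work():
    """Hypothetical workload to profile."""
    return sorted(str(i) for i in range(50_000))


# Step 1: dump stats into a file.
stats_path = os.path.join(tempfile.gettempdir(), "analysis_demo.pstats")
profiler = cProfile.Profile()
profiler.runcall(work)
profiler.dump_stats(stats_path)

# Load the file and show the 5 entries with the highest cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(stats_path, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time tends to surface the call chains worth looking at first, which is the same information the graph presents visually.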
33. pytest-profiling
⬗ Useful to run against your unit-tests
⬗ Integrated generation of pstats + svg output
https://github.com/manahl/pytest-plugins/tree/master/pytest-profiling
$ py.test test_cracking.py --profile-svg
35. LOW HANGING FRUITS
⬗ Less intrusive
⬗ Low impact on maintenance
⬗ Usually bring the most significant improvements
E.g. reducing the number of calls, removing nested loops
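As an illustration of removing a nested loop, here is a minimal sketch with two hypothetical functions that return the items of `left` that also appear in `right`. It assumes the items are hashable.

```python
def common_items_nested(left, right):
    """Nested loops: scans `right` again for every item of `left`."""
    result = []
    for a in left:
        for b in right:
            if a == b:
                result.append(a)
                break
    return result


def common_items_set(left, right):
    """Single loop: builds a set once, then does constant-time lookups."""
    right_set = set(right)
    return [a for a in left if a in right_set]


left = list(range(0, 2000, 2))
right = list(range(0, 2000, 3))

# Both versions must agree before the faster one replaces the slower one.
assert common_items_nested(left, right) == common_items_set(left, right)
```

The change touches one function and keeps its interface, which is what makes this kind of fix cheap to maintain.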
36. EXAMPLE: PASSWORD BRUTE-FORCING
⬗ CPU intensive
⬗ Straightforward
This is very bad cryptography, for demonstration purposes only.
Don’t do this at home!
37. VOCABULARY
Hash: function that always turns a given input into the same output
Brute-force: trying candidate inputs in the hope of finding the one used initially, by comparing each result against a known output
Salt: additional data added to the input to increase its size
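The three terms can be tied together in a toy sketch. The code from the talk is not part of this transcript, so everything below is an assumption: the choice of SHA-256, the helper names `hash_password` and `brute_force`, the salt value and the three-letter password.

```python
import hashlib
import itertools
import string


def hash_password(password, salt):
    """Toy hash: SHA-256 of salt + password (not safe for real use)."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


def brute_force(target_hash, salt, length, alphabet=string.ascii_lowercase):
    """Try every candidate of the given length until the hash matches."""
    for letters in itertools.product(alphabet, repeat=length):
        candidate = "".join(letters)
        if hash_password(candidate, salt) == target_hash:
            return candidate
    return None  # no candidate of this length produced the known output


SALT = "s3"
known_output = hash_password("cat", SALT)
print(brute_force(known_output, SALT, length=3))  # prints "cat"
```

With 26 letters and 3 positions there are only 17,576 candidates; the search space grows exponentially with the length, which is what makes the problem CPU intensive.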
44. FINDING INVARIANTS
⬗ If A calls B
⬗ And B does not use any input from A’s scope
⬗ Then B does not vary as a function of A
B could be called outside of A without affecting its output
B is invariant
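A minimal sketch of the rule, with hypothetical function names: encoding the salt (B) does not use anything from the loop over words (A), so it can be computed once outside the loop without changing the output.

```python
import hashlib


def digests_slow(words, salt):
    """The salt is re-encoded on every iteration although it never changes."""
    result = []
    for word in words:
        salt_bytes = salt.encode("utf-8")  # invariant: does not depend on `word`
        digest = hashlib.sha256(salt_bytes + word.encode("utf-8")).hexdigest()
        result.append(digest)
    return result


def digests_fast(words, salt):
    """Same output, with the invariant computed once, outside the loop."""
    salt_bytes = salt.encode("utf-8")
    result = []
    for word in words:
        digest = hashlib.sha256(salt_bytes + word.encode("utf-8")).hexdigest()
        result.append(digest)
    return result


sample = ["alpha", "beta", "gamma"]
assert digests_slow(sample, "s3") == digests_fast(sample, "s3")
```

The saving per iteration is small here; the technique pays off when the invariant call is expensive or the loop runs millions of times.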
53. “[...] an embarrassingly parallel [...] problem [...] is one where little or no effort is needed to separate the problem into a number of parallel tasks.”
Wikipedia
54. PARALLEL & SEQUENTIAL PROBLEMS
Parallel: if output from B does not depend on output from A
Sequential: if output from B depends on output from A
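The distinction can be sketched with two hypothetical functions. Threads are used here only to keep the example short and portable; for CPU-bound pure-Python work the global interpreter lock limits what threads can gain, and `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface with separate processes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(text):
    """Hash one piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def parallel_digests(texts, workers=4):
    """Parallel problem: each digest depends only on its own input."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, texts))  # results keep the input order


def chained_digest(text, rounds):
    """Sequential problem: each round needs the output of the previous one."""
    value = text
    for _ in range(rounds):
        value = digest(value)
    return value


print(parallel_digests(["a", "b", "c"]))
print(chained_digest("a", rounds=3))
```

`parallel_digests` can be split across workers in any way because no task waits on another; `chained_digest` cannot, since round N cannot start before round N-1 has finished.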
62. BETTER SPECS
CPU speed depends on:
⬗ Pipeline architecture
⬗ Clock speed
⬗ L2 cache
Non-parallel problems only need faster CPU clocks
63. PARALLEL + MORE CPUs = WIN
For parallel problems:
⬗ Add CPUs
⬗ Add more computers with more CPUs
◇ Need to think about networking, queues, failover, …
http://www.celeryproject.org/
65. UNDERSTANDING VECTORS
The iterative sum
⬗ Row after row
⬗ Each line can be different
The vectorized sum
⬗ Data is typed
⬗ Homogeneous dataset
⬗ Optimized operations on rows and columns
66. NUMPY
⬗ Centered around ndarray
⬗ Homogeneous type (if possible)
⬗ Non-sparse arrays (shape = rows * columns)
⬗ Close to C / Fortran API
⬗ Efficient numerical operations
⬗ Good integration with Cython
http://www.numpy.org/
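The iterative and the vectorized sum can be compared side by side. This is a minimal sketch assuming NumPy is installed; the function names and the data are hypothetical.

```python
import numpy as np


def iterative_sum(rows):
    """Row after row: the interpreter handles each value individually."""
    total = 0
    for value in rows:
        total += value
    return total


def vectorized_sum(arr):
    """One call on a typed, homogeneous array; the loop runs in compiled code."""
    return int(arr.sum())


values = list(range(1_000_000))
array = np.arange(1_000_000, dtype=np.int64)  # every element has the same type

assert iterative_sum(values) == vectorized_sum(array)
```

Because the `ndarray` stores a single element type, the sum does not need to check the type of each value, which is where most of the time saving typically comes from.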
67. PANDAS
⬗ Heavily based on NumPy
⬗ Series, DataFrame, Index
⬗ Batteries included:
◇ Integrations for reading/writing different formats
◇ Date/datetime/timezone handling
⬗ More user-friendly than NumPy
https://pandas.pydata.org/
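A short sketch of the "batteries included" side, assuming pandas is installed. The CSV content and column names are made-up sample data, not measurements from the talk.

```python
import io

import pandas as pd

# Hypothetical timing log, inlined so the example is self-contained.
CSV_DATA = io.StringIO(
    "timestamp,duration_ms\n"
    "2018-05-03 10:00:00,120\n"
    "2018-05-03 10:00:01,80\n"
    "2018-05-03 10:00:02,250\n"
)

# Reading a format and parsing datetimes is a single call.
frame = pd.read_csv(CSV_DATA, parse_dates=["timestamp"])

# Column-wise operations are vectorized, like in NumPy.
slow = frame[frame["duration_ms"] > 200]
mean_duration = frame["duration_ms"].mean()

print(slow)
print(mean_duration)
```

`frame["duration_ms"]` is a Series and `frame` is a DataFrame; the comparison and the mean operate on the whole column at once rather than row by row.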
73. WHY NOT JUST WRITE C ?
⬗ Write C code
⬗ Compile C code
⬗ Use CFFI or ctypes to load and call code
⬗ In “C land”
◇ Untangle PyObject yourself
◇ No exception mechanism
74. CYTHON
⬗ Compiles Python code to C
⬗ Automatically links and wraps the code so it can be imported
⬗ Seamless transition between “C” and “Python” contexts
◇ Exceptions
◇ print()
◇ PyObject untangling
81. WHAT IS JIT OPTIMIZATION
The CPython compiler optimizes bytecode based on guesses about how the code will run
What if the compiler could optimize for how the code actually runs?
Just-in-time optimization monitors how the code is running and applies optimizations on the fly
82. PYPY
⬗ Alternative Python implementation
◇ 100% compatible with Python 2.7 & 3.5
◇ not 100% compatible with (some) C libraries
⬗ Automatically rewrites internal logic for performance
⬗ Needs lots of data to make better decisions
http://pypy.org/
83. Create 5 million “messages”, count them and check the last one
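The code shown on this slide is not part of the transcript. The sketch below is one possible reading of the description, with a hypothetical message shape and function names; it runs unchanged on CPython and PyPy, so the two can be compared.

```python
import time


def build_messages(count):
    """Create `count` small message dicts (hypothetical message shape)."""
    return [{"id": i, "body": "message %d" % i} for i in range(count)]


def run(count):
    """Create the messages, count them, check the last one, and time it all."""
    start = time.perf_counter()
    messages = build_messages(count)
    total = len(messages)
    last = messages[-1]
    elapsed = time.perf_counter() - start
    return total, last, elapsed


# A smaller count keeps memory use modest; raise it to 5_000_000
# to match the slide.
total, last, elapsed = run(100_000)
print(f"{total} messages, last id {last['id']}, {elapsed:.3f}s")
```

Running the same file with both interpreters and comparing the elapsed times gives a first impression of the JIT effect on pure-Python code.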
86. JIT PROs & CONs
Pros:
⬗ Works on existing codebase
⬗ Ridiculously fast
⬗ Support for NumPy (not yet for pandas)
Cons:
⬗ No support for pandas
⬗ Another interpreter
⬗ Works best with pure-Python types
⬗ Needs “warm-up”
87. YOU CAN’T HAVE IT ALL
Optimization is always a trade-off with maintainability
91. Credits
Special thanks to all the people who made and released these
awesome resources for free:
⬗ Presentation template by SlidesCarnival
⬗ Photographs by Unsplash