FOSDEM 2013, Bruxelles
Victor Stinner
<victor.stinner@gmail.com>
Distributed under CC BY-SA license: http://creativecommons.org/licenses/by-sa/3.0/
Two projects to optimize Python
CPython bytecode is inefficient
AST optimizer
Register-based bytecode
Agenda
Part I
CPython bytecode is inefficient
Python is very dynamic, so it cannot be easily optimized
The CPython peephole optimizer only supports basic optimizations, like replacing 1+1 with 2
CPython bytecode is inefficient
CPython is inefficient
Given a simple function:
def func():
    x = 33
    return x
I get (4 instructions):
LOAD_CONST 1 (33)
STORE_FAST 0 (x)
LOAD_FAST 0 (x)
RETURN_VALUE
I expected (2 instructions):
LOAD_CONST 1 (33)
RETURN_VALUE
Or even (1 instruction):
RETURN_CONST 1 (33)
Inefficient bytecode
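The listing above can be reproduced with the standard dis module. This is only a sketch for checking your own interpreter: the exact instructions and their count depend on the CPython version, so no particular output is claimed here.

```python
import dis


def func():
    x = 33
    return x


# Collect the opcode names CPython generated for func().
# The list differs between CPython versions.
opnames = [ins.opname for ins in dis.get_instructions(func)]
print(len(opnames), "instructions:", opnames)
```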
Parse the source code
Build an Abstract Syntax Tree (AST)
Emit Bytecode
Peephole optimizer
Evaluate bytecode
How Python works
Parse the source code
Build an Abstract Syntax Tree (AST)
→ astoptimizer
Emit Bytecode
Peephole optimizer
Evaluate bytecode
→ registervm
Let's optimize!
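The pipeline above can be walked through by hand with built-in functions. A minimal sketch: the file name "<example>" and the variable names are illustrative only, and an AST optimizer such as astoptimizer would sit between the ast.parse() and compile() steps.

```python
import ast

source = "def func():\n    x = 33\n    return x\n"

# Steps 1-2: parse the source code and build the AST.
tree = ast.parse(source)

# (An AST optimizer would rewrite `tree` here.)

# Steps 3-4: emit bytecode; CPython applies its own built-in
# optimizations during this call.
code = compile(tree, "<example>", "exec")

# Step 5: evaluate the bytecode.
namespace = {}
exec(code, namespace)
result = namespace["func"]()
```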
Part II
AST optimizer
AST is high-level and contains a lot of information
Rewrite AST to get faster code
Disable dynamic features of Python to allow more optimizations
Unpythonic optimizations are disabled by default
AST optimizer
Call builtin functions and methods:
len("abc") → 3
(32).bit_length() → 6
math.log(32) / math.log(2) → 5.0
Evaluate str % args and print(arg1, arg2, ...)
"x=%s" % 5 → "x=5"
print(2.3) → print("2.3")
AST optimizations (1)
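A rewrite such as len("abc") → 3 can be sketched with ast.NodeTransformer. This is not astoptimizer's actual implementation: the FoldLen class and the optimize() helper are hypothetical names, and the sketch assumes that len is the builtin (which is exactly the kind of dynamic feature the optimizer has to rule out). It needs Python 3.9+ for ast.unparse().

```python
import ast


class FoldLen(ast.NodeTransformer):
    """Replace len(<string literal>) with its value (assumes builtin len)."""

    def visit_Call(self, node):
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name)
                and node.func.id == "len"
                and len(node.args) == 1
                and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            folded = ast.Constant(value=len(node.args[0].value))
            return ast.copy_location(folded, node)
        return node


def optimize(source):
    """Hypothetical helper: parse, rewrite, and return the new source."""
    tree = FoldLen().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

For example, optimize('n = len("abc")') gives back the source text n = 3, while len(x) is left untouched because x is not a literal.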
Simplify expressions (2 instructions => 1):
not(x in y) → x not in y
Optimize loops (Python 2 only):
while True: ... → while 1: ...
for x in range(10): ... → for x in xrange(10): ...
In Python 2, True requires a (slow) global lookup, whereas the number 1 is a constant
AST optimizations (2)
Replace list (built at runtime) with tuple (constant):
for x in [1, 2, 3]: ... → for x in (1, 2, 3): ...
Replace list with set (Python 3 only):
if x in [1, 2, 3]: ... → if x in {1, 2, 3}: ...
In Python 3, {1,2,3} is converted to a constant frozenset (if used in a test)
AST optimizations (3)
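Whether your interpreter stores the set literal as a constant frozenset can be checked by looking at the constants of the compiled function. A small sketch, with is_small as an illustrative name; the result may vary with the CPython version, so it is printed rather than assumed.

```python
def is_small(x):
    # Membership test against a set literal.
    return x in {1, 2, 3}


# If the compiler turned the literal into a constant, a frozenset
# shows up among the function's constants.
consts = is_small.__code__.co_consts
has_frozenset = any(isinstance(c, frozenset) for c in consts)
print("constant frozenset:", has_frozenset)
```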
Evaluate operators:
"abcdef"[:3] → "abc"
def f(): return 2 if 4 < 5 else 3 → def f(): return 2
Remove dead code:
if 0: ... → pass
AST optimizations (4)
"if DEBUG" and "if os.name == 'nt'" have a cost at runtime
Tests can be removed at compile time:
cfg.add_constant('DEBUG', False)
cfg.add_constant('os.name', os.name)
Pythonic preprocessor: no need to modify your code, and the code still works without the preprocessor
Used as a preprocessor
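The idea of removing a test at compile time can be sketched as follows. This is not astoptimizer's code or API: ConstantIf and apply_constants() are hypothetical names, only plain "if NAME:" tests are handled, and the sketch assumes the name is never rebound in the program. It needs Python 3.9+ for ast.unparse().

```python
import ast


class ConstantIf(ast.NodeTransformer):
    """Resolve `if NAME:` at compile time for names declared constant."""

    def __init__(self, constants):
        self.constants = constants  # e.g. {"DEBUG": False}

    def visit_If(self, node):
        self.generic_visit(node)
        test = node.test
        if isinstance(test, ast.Name) and test.id in self.constants:
            # Keep only the branch that would run.
            branch = node.body if self.constants[test.id] else node.orelse
            # An empty branch becomes `pass` so the block stays valid.
            return branch or ast.Pass()
        return node


def apply_constants(source, constants):
    """Hypothetical helper: parse, rewrite, and return the new source."""
    tree = ConstantIf(constants).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

With {"DEBUG": False}, the source "if DEBUG: print('debug')" followed by "x = 1" is reduced to "pass" followed by "x = 1"; the original file is untouched and still runs without the preprocessor.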
Constant folding: experimental support (buggy)
Unroll (short) loops
Function inlining (is it possible?)
astoptimizer TODO list
Part III
Register-based bytecode
Rewrite instructions to use registers instead of the stack
Use static single assignment form (SSA)
Build the control flow graph
Apply different optimizations
Register allocator
Emit bytecode
registervm
def func():
    x = 33
    return x + 1
LOAD_CONST 1 (33)    # stack: [33]
STORE_FAST 0 (x)     # stack: []
LOAD_FAST 0 (x)      # stack: [33]
LOAD_CONST 2 (1)     # stack: [33, 1]
BINARY_ADD           # stack: [34]
RETURN_VALUE         # stack: []
(6 instructions)
Stack-based bytecode
def func():
    x = 33
    return x + 1
LOAD_CONST_REG 'x', 33 (const#1)
LOAD_CONST_REG R0, 1 (const#2)
BINARY_ADD_REG R0, 'x', R0
RETURN_VALUE_REG R0
(4 instructions)
Register bytecode
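The difference between the two listings can be made concrete with two toy interpreters. This is only an illustration, not registervm's implementation: run_stack() and run_register() are hypothetical, instructions are plain tuples, and only the opcodes used in the two listings are supported.

```python
def run_stack(program, consts):
    """Toy stack machine: every operand goes through the stack."""
    stack, local_vars = [], {}
    for op, arg in program:
        if op == "LOAD_CONST":
            stack.append(consts[arg])
        elif op == "STORE_FAST":
            local_vars[arg] = stack.pop()
        elif op == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "RETURN_VALUE":
            return stack.pop()


def run_register(program, consts):
    """Toy register machine: instructions name their operands directly."""
    regs = {}
    for op, *args in program:
        if op == "LOAD_CONST_REG":
            dest, index = args
            regs[dest] = consts[index]
        elif op == "BINARY_ADD_REG":
            dest, left, right = args
            regs[dest] = regs[left] + regs[right]
        elif op == "RETURN_VALUE_REG":
            return regs[args[0]]


consts = (None, 33, 1)

stack_program = [
    ("LOAD_CONST", 1), ("STORE_FAST", "x"), ("LOAD_FAST", "x"),
    ("LOAD_CONST", 2), ("BINARY_ADD", None), ("RETURN_VALUE", None),
]
register_program = [
    ("LOAD_CONST_REG", "x", 1), ("LOAD_CONST_REG", "R0", 2),
    ("BINARY_ADD_REG", "R0", "x", "R0"), ("RETURN_VALUE_REG", "R0"),
]

stack_result = run_stack(stack_program, consts)
register_result = run_register(register_program, consts)
```

Both programs compute the same value; the register version simply needs fewer dispatches because it never shuffles values on and off a stack.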
Using registers allows more optimizations
Move constant loads and global loads (slow) out of loops:
return [str(item) for item in data]
Constant folding:
x=1; y=x; return y
→ y=1; return y
Remove duplicate load/store instructions: constants, names, globals, etc.
registervm optim (1)
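Moving a global load out of a loop can be imitated by hand in plain Python, which shows what the optimizer is trying to do automatically. A sketch with illustrative function names; the rewrite is only equivalent if str is not rebound while the loop runs.

```python
def convert_plain(data):
    # `str` is looked up again on every iteration.
    return [str(item) for item in data]


def convert_hoisted(data):
    # Look `str` up once, before the loop, and reuse the local name.
    to_str = str
    return [to_str(item) for item in data]
```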
Stack-based bytecode:
return (len("a"), len("a"))
LOAD_GLOBAL 'len' (name#0)
LOAD_CONST 'a' (const#1)
CALL_FUNCTION (1 positional)
LOAD_GLOBAL 'len' (name#0)
LOAD_CONST 'a' (const#1)
CALL_FUNCTION (1 positional)
BUILD_TUPLE 2
RETURN_VALUE
Merge duplicate loads
Register-based bytecode:
return (len("a"), len("a"))
LOAD_GLOBAL_REG R0, 'len' (name#0)
LOAD_CONST_REG R1, 'a' (const#1)
CALL_FUNCTION_REG R2, R0, 1, R1
CALL_FUNCTION_REG R0, R0, 1, R1
CLEAR_REG R1
BUILD_TUPLE_REG R2, 2, R2, R0
RETURN_VALUE_REG R2
Merge duplicate loads
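Merging duplicate loads can be sketched as a single pass over a list of register instructions. This is a toy under stated assumptions, not registervm's algorithm: merge_duplicate_loads() is hypothetical, each instruction is an (opcode, destination, operands) tuple, every register is assigned once (SSA), and loads are assumed to have no side effects and to return the same value each time. Register clearing (CLEAR_REG) and register allocation are left out.

```python
def merge_duplicate_loads(instructions):
    """Drop repeated LOAD_* instructions and reuse the first register."""
    seen = {}    # (opcode, operands) -> register that already holds it
    alias = {}   # register of a removed load -> surviving register
    result = []
    for op, dest, operands in instructions:
        if op.startswith("LOAD_"):
            key = (op, operands)
            if key in seen:
                # Same load as before: remember the alias, emit nothing.
                alias[dest] = seen[key]
                continue
            seen[key] = dest
        else:
            # Redirect uses of removed registers to the surviving one.
            operands = tuple(alias.get(reg, reg) for reg in operands)
        result.append((op, dest, operands))
    return result


program = [
    ("LOAD_GLOBAL_REG", "R0", ("len",)),
    ("LOAD_CONST_REG", "R1", ("a",)),
    ("CALL_FUNCTION_REG", "R2", ("R0", "R1")),
    ("LOAD_GLOBAL_REG", "R3", ("len",)),
    ("LOAD_CONST_REG", "R4", ("a",)),
    ("CALL_FUNCTION_REG", "R5", ("R3", "R4")),
    ("BUILD_TUPLE_REG", "R6", ("R2", "R5")),
    ("RETURN_VALUE_REG", None, ("R6",)),
]
merged = merge_duplicate_loads(program)
```

The two calls are kept, since a call may have side effects; only the second LOAD_GLOBAL_REG and LOAD_CONST_REG disappear.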
Remove unreachable instructions (dead code)
Remove useless jumps (relative jump + 0)
registervm optim (2)
BuiltinMethodLookup:
fewer instructions: 390 => 22
24 ms => 1 ms (24x faster)
NormalInstanceAttribute:
fewer instructions: 381 => 81
40 ms => 21 ms (1.9x faster)
StringPredicates:
fewer instructions: 303 => 92
42 ms => 24 ms (1.8x faster)
Pybench results
Pybench is a microbenchmark
Don't expect such a speedup in your applications
registervm is still experimental and emits invalid code
Pybench results
PyPy and its amazing JIT
Pymothoa, Numba: JIT (LLVM)
WPython: "Wordcode-based" bytecode
Hotpy 2
Shedskin, Pythran, Nuitka: compile to C++
Other projects
Questions?
https://bitbucket.org/haypo/astoptimizer
http://hg.python.org/sandbox/registervm
Contact:
victor.stinner@gmail.com
Thanks to David Malcolm for the LibreOffice template
http://dmalcolm.livejournal.com/

Faster Python, FOSDEM
