SlideShare a Scribd company logo
1 of 29
Download to read offline
FOSDEM 2013, Bruxelles
Victor Stinner
<victor.stinner@gmail.com>
Distributed under CC BY-SA license: http://creativecommons.org/licenses/by-sa/3.0/
Two projects to
optimize Python
CPython bytecode is inefficient
AST optimizer
Register-based bytecode
Agenda
Part I
CPython bytecode
is inefficient
Python is very dynamic, cannot be easily
optimized
CPython peephole optimizer only supports
basic optimizations like replacing 1+1 with 2
CPython bytecode is inefficient
CPython is inefficient
def func():
x = 33
return x
Inefficient bytecode
Given a simple function:
LOAD_CONST 1 (33)
STORE_FAST 0 (x)
LOAD_FAST 0 (x)
RETURN_VALUE
LOAD_CONST 1 (33)
RETURN_VALUE
RETURN_CONST 1 (33)
Inefficient bytecode
I get:
(4 instructions)
I expected:
(2 instructions)
Or even:
(1 instruction)
Parse the source code
Build an Abstract Syntax Tree (AST)
Emit Bytecode
Peephole optimizer
Evaluate bytecode
How Python works
Parse the source code
Build an Abstract Syntax Tree (AST)
→ astoptimizer
Emit Bytecode
Peephole optimizer
Evaluate bytecode
→ registervm
Let's optimize!
Part II
AST optimizer
AST is high-level and contains a lot of
information
Rewrite AST to get faster code
Disable dynamic features of Python to allow
more optimizations
Unpythonic optimizations are disabled by
default
AST optimizer
Call builtin functions and methods:
len("abc") → 3
(32).bit_length() → 6
math.log(32) / math.log(2) → 5.0
Evaluate str % args and print(arg1, arg2, ...)
"x=%s" % 5 → "x=5"
print(2.3) → print("2.3")
AST optimizations (1)
Simplify expressions (2 instructions => 1):
not(x in y) → x not in y
Optimize loops (Python 2 only):
while True: ... → while 1: ...
for x in range(10): ...
→ for x in xrange(10): ...
In Python 2, True requires a (slow) global
lookup, the number 1 is a constant
AST optimizations (2)
Replace list (build at runtime) with tuple
(constant):
for x in [1, 2, 3]: ...
→ for x in (1, 2, 3): ...
Replace list with set (Python 3 only):
if x in [1, 2, 3]: ...
→ if x in {1, 2, 3}: ...
In Python 3, {1,2,3} is converted to a
constant frozenset (if used in a test)
AST optimizations (3)
Evaluate operators:
"abcdef"[:3] → "abc"
def f(): return 2 if 4 < 5 else 3
→ def f(): return 2
Remove dead code:
if 0: ...
→ pass
AST optimizations (4)
"if DEBUG" and "if os.name == 'nt'"
have a cost at runtime
Tests can be removed at compile time:
cfg.add_constant('DEBUG', False)
cfg.add_constant('os.name',
os.name)
Pythonic preprocessor: no need to modify your
code, code works without the preprocessor
Used as a preprocessor
Constant folding: experimental support
(buggy)
Unroll (short) loops
Function inlining (is it possible?)
astoptimizer TODO list
Part III
Register-based
bytecode
Rewrite instructions to use registers instead of
the stack
Use single assignment form (SSA)
Build the control flow graph
Apply different optimizations
Register allocator
Emit bytecode
registervm
def func():
x = 33
return x + 1
LOAD_CONST 1 (33) # stack: [33]
STORE_FAST 0 (x) # stack: []
LOAD_FAST 0 (x) # stack: [33]
LOAD_CONST 2 (1) # stack: [33, 1]
BINARY_ADD # stack: [34]
RETURN_VALUE # stack: []
(6 instructions)
Stack-based bytecode
def func():
x = 33
return x + 1
LOAD_CONST_REG 'x', 33 (const#1)
LOAD_CONST_REG R0, 1 (const#2)
BINARY_ADD_REG R0, 'x', R0
RETURN_VALUE_REG R0
(4 instructions)
Register bytecode
Using registers allows more optimizations
Move constants loads and globals loads (slow)
out of loops:
return [str(item) for item in data]
Constant folding:
x=1; y=x; return y
→ y=1; return y
Remove duplicate load/store instructions:
constants, names, globals, etc.
registervm optim (1)
Stack-based bytecode :
return (len("a"), len("a"))
LOAD_GLOBAL 'len' (name#0)
LOAD_CONST 'a' (const#1)
CALL_FUNCTION (1 positional)
LOAD_GLOBAL 'len' (name#0)
LOAD_CONST 'a' (const#1)
CALL_FUNCTION (1 positional)
BUILD_TUPLE 2
RETURN_VALUE
Merge duplicate loads
Register-based bytecode :
return (len("a"), len("a"))
LOAD_GLOBAL_REG R0, 'len' (name#0)
LOAD_CONST_REG R1, 'a' (const#1)
CALL_FUNCTION_REG R2, R0, 1, R1
CALL_FUNCTION_REG R0, R0, 1, R1
CLEAR_REG R1
BUILD_TUPLE_REG R2, 2, R2, R0
RETURN_VALUE_REG R2
Merge duplicate loads
Remove unreachable instructions (dead code)
Remove useless jumps (relative jump + 0)
registervm optim (2)
BuiltinMethodLookup:
fewer instructions: 390 => 22
24 ms => 1 ms (24x faster)
NormalInstanceAttribute:
fewer instructions: 381 => 81
40 ms => 21 ms (1.9x faster)
StringPredicates:
fewer instructions: 303 => 92
42 ms => 24 ms (1.8x faster)
Pybench results
Pybench is a microbenchmark
Don't expect such speedup on your
applications
registervm is still experimental and emits
invalid code
Pybench results
PyPy and its amazing JIT
Pymothoa, Numba: JIT (LLVM)
WPython: "Wordcode-based" bytecode
Hotpy 2
Shedskin, Pythran, Nuitka: compile to C++
Other projects
Questions?
https://bitbucket.org/haypo/astoptimizer
http://hg.python.org/sandbox/registervm
Distributed under CC BY-SA license: http://creativecommons.org/licenses/by-sa/3.0/
Contact:
victor.stinner@gmail.com
Thanks to David Malcom
for the LibreOffice template
http://dmalcolm.livejournal.com/

More Related Content

What's hot

Bytes in the Machine: Inside the CPython interpreter
Bytes in the Machine: Inside the CPython interpreterBytes in the Machine: Inside the CPython interpreter
Bytes in the Machine: Inside the CPython interpreterakaptur
 
"A 1,500 line (!!) switch statement powers your Python!" - Allison Kaptur, !!...
"A 1,500 line (!!) switch statement powers your Python!" - Allison Kaptur, !!..."A 1,500 line (!!) switch statement powers your Python!" - Allison Kaptur, !!...
"A 1,500 line (!!) switch statement powers your Python!" - Allison Kaptur, !!...akaptur
 
Exploring slides
Exploring slidesExploring slides
Exploring slidesakaptur
 
«Отладка в Python 3.6: Быстрее, Выше, Сильнее» Елизавета Шашкова, JetBrains
«Отладка в Python 3.6: Быстрее, Выше, Сильнее» Елизавета Шашкова, JetBrains«Отладка в Python 3.6: Быстрее, Выше, Сильнее» Елизавета Шашкова, JetBrains
«Отладка в Python 3.6: Быстрее, Выше, Сильнее» Елизавета Шашкова, JetBrainsit-people
 
When RV Meets CEP (RV 2016 Tutorial)
When RV Meets CEP (RV 2016 Tutorial)When RV Meets CEP (RV 2016 Tutorial)
When RV Meets CEP (RV 2016 Tutorial)Sylvain Hallé
 
.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel ProgrammingAlex Moore
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014PyData
 
BeepBeep 3: A declarative event stream query engine (EDOC 2015)
BeepBeep 3: A declarative event stream query engine (EDOC 2015)BeepBeep 3: A declarative event stream query engine (EDOC 2015)
BeepBeep 3: A declarative event stream query engine (EDOC 2015)Sylvain Hallé
 
Activity Recognition Through Complex Event Processing: First Findings
Activity Recognition Through Complex Event Processing: First Findings Activity Recognition Through Complex Event Processing: First Findings
Activity Recognition Through Complex Event Processing: First Findings Sylvain Hallé
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005Jules Krdenas
 
D vs OWKN Language at LLnagoya
D vs OWKN Language at LLnagoyaD vs OWKN Language at LLnagoya
D vs OWKN Language at LLnagoyaN Masahiro
 
All I know about rsc.io/c2go
All I know about rsc.io/c2goAll I know about rsc.io/c2go
All I know about rsc.io/c2goMoriyoshi Koizumi
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
Hacker Thursdays: An introduction to binary exploitation
Hacker Thursdays: An introduction to binary exploitationHacker Thursdays: An introduction to binary exploitation
Hacker Thursdays: An introduction to binary exploitationOWASP Hacker Thursday
 
Are we ready to Go?
Are we ready to Go?Are we ready to Go?
Are we ready to Go?Adam Dudczak
 
GeoGebra JavaScript CheatSheet
GeoGebra JavaScript CheatSheetGeoGebra JavaScript CheatSheet
GeoGebra JavaScript CheatSheetJose Perez
 
bpftrace - Tracing Summit 2018
bpftrace - Tracing Summit 2018bpftrace - Tracing Summit 2018
bpftrace - Tracing Summit 2018AlastairRobertson9
 
Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & howlucenerevolution
 

What's hot (20)

Bytes in the Machine: Inside the CPython interpreter
Bytes in the Machine: Inside the CPython interpreterBytes in the Machine: Inside the CPython interpreter
Bytes in the Machine: Inside the CPython interpreter
 
"A 1,500 line (!!) switch statement powers your Python!" - Allison Kaptur, !!...
"A 1,500 line (!!) switch statement powers your Python!" - Allison Kaptur, !!..."A 1,500 line (!!) switch statement powers your Python!" - Allison Kaptur, !!...
"A 1,500 line (!!) switch statement powers your Python!" - Allison Kaptur, !!...
 
Exploring slides
Exploring slidesExploring slides
Exploring slides
 
«Отладка в Python 3.6: Быстрее, Выше, Сильнее» Елизавета Шашкова, JetBrains
«Отладка в Python 3.6: Быстрее, Выше, Сильнее» Елизавета Шашкова, JetBrains«Отладка в Python 3.6: Быстрее, Выше, Сильнее» Елизавета Шашкова, JetBrains
«Отладка в Python 3.6: Быстрее, Выше, Сильнее» Елизавета Шашкова, JetBrains
 
When RV Meets CEP (RV 2016 Tutorial)
When RV Meets CEP (RV 2016 Tutorial)When RV Meets CEP (RV 2016 Tutorial)
When RV Meets CEP (RV 2016 Tutorial)
 
.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
 
BeepBeep 3: A declarative event stream query engine (EDOC 2015)
BeepBeep 3: A declarative event stream query engine (EDOC 2015)BeepBeep 3: A declarative event stream query engine (EDOC 2015)
BeepBeep 3: A declarative event stream query engine (EDOC 2015)
 
Activity Recognition Through Complex Event Processing: First Findings
Activity Recognition Through Complex Event Processing: First Findings Activity Recognition Through Complex Event Processing: First Findings
Activity Recognition Through Complex Event Processing: First Findings
 
A peek on numerical programming in perl and python e christopher dyken 2005
A peek on numerical programming in perl and python  e christopher dyken  2005A peek on numerical programming in perl and python  e christopher dyken  2005
A peek on numerical programming in perl and python e christopher dyken 2005
 
D vs OWKN Language at LLnagoya
D vs OWKN Language at LLnagoyaD vs OWKN Language at LLnagoya
D vs OWKN Language at LLnagoya
 
All I know about rsc.io/c2go
All I know about rsc.io/c2goAll I know about rsc.io/c2go
All I know about rsc.io/c2go
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
Hacker Thursdays: An introduction to binary exploitation
Hacker Thursdays: An introduction to binary exploitationHacker Thursdays: An introduction to binary exploitation
Hacker Thursdays: An introduction to binary exploitation
 
Are we ready to Go?
Are we ready to Go?Are we ready to Go?
Are we ready to Go?
 
GeoGebra JavaScript CheatSheet
GeoGebra JavaScript CheatSheetGeoGebra JavaScript CheatSheet
GeoGebra JavaScript CheatSheet
 
Data Structures - Lecture 6 [queues]
Data Structures - Lecture 6 [queues]Data Structures - Lecture 6 [queues]
Data Structures - Lecture 6 [queues]
 
bpftrace - Tracing Summit 2018
bpftrace - Tracing Summit 2018bpftrace - Tracing Summit 2018
bpftrace - Tracing Summit 2018
 
Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & how
 
Queue oop
Queue   oopQueue   oop
Queue oop
 

Viewers also liked

Dive into Python Class
Dive into Python ClassDive into Python Class
Dive into Python ClassJim Yeh
 
A deep dive into PEP-3156 and the new asyncio module
A deep dive into PEP-3156 and the new asyncio moduleA deep dive into PEP-3156 and the new asyncio module
A deep dive into PEP-3156 and the new asyncio moduleSaúl Ibarra Corretgé
 
Comandos para ubuntu 400 que debes conocer
Comandos para ubuntu 400 que debes conocerComandos para ubuntu 400 que debes conocer
Comandos para ubuntu 400 que debes conocerGeek Advisor Freddy
 
Practical continuous quality gates for development process
Practical continuous quality gates for development processPractical continuous quality gates for development process
Practical continuous quality gates for development processAndrii Soldatenko
 
The Awesome Python Class Part-4
The Awesome Python Class Part-4The Awesome Python Class Part-4
The Awesome Python Class Part-4Binay Kumar Ray
 
Async Tasks with Django Channels
Async Tasks with Django ChannelsAsync Tasks with Django Channels
Async Tasks with Django ChannelsAlbert O'Connor
 
Async programming and python
Async programming and pythonAsync programming and python
Async programming and pythonChetan Giridhar
 
What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?Andrii Soldatenko
 
Python as number crunching code glue
Python as number crunching code gluePython as number crunching code glue
Python as number crunching code glueJiahao Chen
 
Building social network with Neo4j and Python
Building social network with Neo4j and PythonBuilding social network with Neo4j and Python
Building social network with Neo4j and PythonAndrii Soldatenko
 
Async Web Frameworks in Python
Async Web Frameworks in PythonAsync Web Frameworks in Python
Async Web Frameworks in PythonRyan Johnson
 
SylkServer: State of the art RTC application server
SylkServer: State of the art RTC application serverSylkServer: State of the art RTC application server
SylkServer: State of the art RTC application serverSaúl Ibarra Corretgé
 
Escalabilidad horizontal desde las trincheras
Escalabilidad horizontal desde las trincherasEscalabilidad horizontal desde las trincheras
Escalabilidad horizontal desde las trincherasSaúl Ibarra Corretgé
 

Viewers also liked (20)

Python on Rails 2014
Python on Rails 2014Python on Rails 2014
Python on Rails 2014
 
Dive into Python Class
Dive into Python ClassDive into Python Class
Dive into Python Class
 
Python class
Python classPython class
Python class
 
The future of async i/o in Python
The future of async i/o in PythonThe future of async i/o in Python
The future of async i/o in Python
 
A deep dive into PEP-3156 and the new asyncio module
A deep dive into PEP-3156 and the new asyncio moduleA deep dive into PEP-3156 and the new asyncio module
A deep dive into PEP-3156 and the new asyncio module
 
Python, do you even async?
Python, do you even async?Python, do you even async?
Python, do you even async?
 
Comandos para ubuntu 400 que debes conocer
Comandos para ubuntu 400 que debes conocerComandos para ubuntu 400 que debes conocer
Comandos para ubuntu 400 que debes conocer
 
Python master class 3
Python master class 3Python master class 3
Python master class 3
 
Python Async IO Horizon
Python Async IO HorizonPython Async IO Horizon
Python Async IO Horizon
 
Practical continuous quality gates for development process
Practical continuous quality gates for development processPractical continuous quality gates for development process
Practical continuous quality gates for development process
 
The Awesome Python Class Part-4
The Awesome Python Class Part-4The Awesome Python Class Part-4
The Awesome Python Class Part-4
 
Async Tasks with Django Channels
Async Tasks with Django ChannelsAsync Tasks with Django Channels
Async Tasks with Django Channels
 
Async programming and python
Async programming and pythonAsync programming and python
Async programming and python
 
Regexp
RegexpRegexp
Regexp
 
What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?
 
Python as number crunching code glue
Python as number crunching code gluePython as number crunching code glue
Python as number crunching code glue
 
Building social network with Neo4j and Python
Building social network with Neo4j and PythonBuilding social network with Neo4j and Python
Building social network with Neo4j and Python
 
Async Web Frameworks in Python
Async Web Frameworks in PythonAsync Web Frameworks in Python
Async Web Frameworks in Python
 
SylkServer: State of the art RTC application server
SylkServer: State of the art RTC application serverSylkServer: State of the art RTC application server
SylkServer: State of the art RTC application server
 
Escalabilidad horizontal desde las trincheras
Escalabilidad horizontal desde las trincherasEscalabilidad horizontal desde las trincheras
Escalabilidad horizontal desde las trincheras
 

Similar to Faster Python, FOSDEM

What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11Henry Schreiner
 
The Ring programming language version 1.5.3 book - Part 25 of 184
The Ring programming language version 1.5.3 book - Part 25 of 184The Ring programming language version 1.5.3 book - Part 25 of 184
The Ring programming language version 1.5.3 book - Part 25 of 184Mahmoud Samir Fayed
 
sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.pptssuserd64918
 
User defined functions
User defined functionsUser defined functions
User defined functionsshubham_jangid
 
Building Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual MachinesBuilding Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual MachinesGuido Chari
 
The Ring programming language version 1.8 book - Part 86 of 202
The Ring programming language version 1.8 book - Part 86 of 202The Ring programming language version 1.8 book - Part 86 of 202
The Ring programming language version 1.8 book - Part 86 of 202Mahmoud Samir Fayed
 
Advanced procedures in assembly language Full chapter ppt
Advanced procedures in assembly language Full chapter pptAdvanced procedures in assembly language Full chapter ppt
Advanced procedures in assembly language Full chapter pptMuhammad Sikandar Mustafa
 
Chapter Eight(3)
Chapter Eight(3)Chapter Eight(3)
Chapter Eight(3)bolovv
 
Introducción a Elixir
Introducción a ElixirIntroducción a Elixir
Introducción a ElixirSvet Ivantchev
 
20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugsComputer Science Club
 
The Ring programming language version 1.7 book - Part 30 of 196
The Ring programming language version 1.7 book - Part 30 of 196The Ring programming language version 1.7 book - Part 30 of 196
The Ring programming language version 1.7 book - Part 30 of 196Mahmoud Samir Fayed
 
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...South Tyrol Free Software Conference
 
개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)
개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)
개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)changehee lee
 

Similar to Faster Python, FOSDEM (20)

What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11
 
The Ring programming language version 1.5.3 book - Part 25 of 184
The Ring programming language version 1.5.3 book - Part 25 of 184The Ring programming language version 1.5.3 book - Part 25 of 184
The Ring programming language version 1.5.3 book - Part 25 of 184
 
sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.ppt
 
User defined functions
User defined functionsUser defined functions
User defined functions
 
Porting to Python 3
Porting to Python 3Porting to Python 3
Porting to Python 3
 
Building Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual MachinesBuilding Efficient and Highly Run-Time Adaptable Virtual Machines
Building Efficient and Highly Run-Time Adaptable Virtual Machines
 
The Ring programming language version 1.8 book - Part 86 of 202
The Ring programming language version 1.8 book - Part 86 of 202The Ring programming language version 1.8 book - Part 86 of 202
The Ring programming language version 1.8 book - Part 86 of 202
 
Intro
IntroIntro
Intro
 
Advanced procedures in assembly language Full chapter ppt
Advanced procedures in assembly language Full chapter pptAdvanced procedures in assembly language Full chapter ppt
Advanced procedures in assembly language Full chapter ppt
 
Java VS Python
Java VS PythonJava VS Python
Java VS Python
 
Chapter Eight(3)
Chapter Eight(3)Chapter Eight(3)
Chapter Eight(3)
 
Introducción a Elixir
Introducción a ElixirIntroducción a Elixir
Introducción a Elixir
 
python.ppt
python.pptpython.ppt
python.ppt
 
20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 
The Ring programming language version 1.7 book - Part 30 of 196
The Ring programming language version 1.7 book - Part 30 of 196The Ring programming language version 1.7 book - Part 30 of 196
The Ring programming language version 1.7 book - Part 30 of 196
 
scripting in Python
scripting in Pythonscripting in Python
scripting in Python
 
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
 
개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)
개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)
개발 과정 최적화 하기 내부툴로 더욱 강력한 개발하기 Stephen kennedy _(11시40분_103호)
 
Spark workshop
Spark workshopSpark workshop
Spark workshop
 

Faster Python, FOSDEM

  • 1. FOSDEM 2013, Bruxelles Victor Stinner <victor.stinner@gmail.com> Distributed under CC BY-SA license: http://creativecommons.org/licenses/by-sa/3.0/ Two projects to optimize Python
  • 2. CPython bytecode is inefficient AST optimizer Register-based bytecode Agenda
  • 4. Python is very dynamic, cannot be easily optimized CPython peephole optimizer only supports basic optimizations like replacing 1+1 with 2 CPython bytecode is inefficient CPython is inefficient
  • 5. def func(): x = 33 return x Inefficient bytecode Given a simple function:
  • 6. LOAD_CONST 1 (33) STORE_FAST 0 (x) LOAD_FAST 0 (x) RETURN_VALUE LOAD_CONST 1 (33) RETURN_VALUE RETURN_CONST 1 (33) Inefficient bytecode I get: (4 instructions) I expected: (2 instructions) Or even: (1 instruction)
  • 7. Parse the source code Build an Abstract Syntax Tree (AST) Emit Bytecode Peephole optimizer Evaluate bytecode How Python works
  • 8. Parse the source code Build an Abstract Syntax Tree (AST) → astoptimizer Emit Bytecode Peephole optimizer Evaluate bytecode → registervm Let's optimize!
  • 10. AST is high-level and contains a lot of information Rewrite AST to get faster code Disable dynamic features of Python to allow more optimizations Unpythonic optimizations are disabled by default AST optimizer
  • 11. Call builtin functions and methods: len("abc") → 3 (32).bit_length() → 6 math.log(32) / math.log(2) → 5.0 Evaluate str % args and print(arg1, arg2, ...) "x=%s" % 5 → "x=5" print(2.3) → print("2.3") AST optimizations (1)
  • 12. Simplify expressions (2 instructions => 1): not(x in y) → x not in y Optimize loops (Python 2 only): while True: ... → while 1: ... for x in range(10): ... → for x in xrange(10): ... In Python 2, True requires a (slow) global lookup, the number 1 is a constant AST optimizations (2)
  • 13. Replace list (build at runtime) with tuple (constant): for x in [1, 2, 3]: ... → for x in (1, 2, 3): ... Replace list with set (Python 3 only): if x in [1, 2, 3]: ... → if x in {1, 2, 3}: ... In Python 3, {1,2,3} is converted to a constant frozenset (if used in a test) AST optimizations (3)
  • 14. Evaluate operators: "abcdef"[:3] → "abc" def f(): return 2 if 4 < 5 else 3 → def f(): return 2 Remove dead code: if 0: ... → pass AST optimizations (4)
  • 15. "if DEBUG" and "if os.name == 'nt'" have a cost at runtime Tests can be removed at compile time: cfg.add_constant('DEBUG', False) cfg.add_constant('os.name', os.name) Pythonic preprocessor: no need to modify your code, code works without the preprocessor Used as a preprocessor
  • 16. Constant folding: experimental support (buggy) Unroll (short) loops Function inlining (is it possible?) astoptimizer TODO list
  • 18. Rewrite instructions to use registers instead of the stack Use single assignment form (SSA) Build the control flow graph Apply different optimizations Register allocator Emit bytecode registervm
  • 19. def func(): x = 33 return x + 1 LOAD_CONST 1 (33) # stack: [33] STORE_FAST 0 (x) # stack: [] LOAD_FAST 0 (x) # stack: [33] LOAD_CONST 2 (1) # stack: [33, 1] BINARY_ADD # stack: [34] RETURN_VALUE # stack: [] (6 instructions) Stack-based bytecode
  • 20. def func(): x = 33 return x + 1 LOAD_CONST_REG 'x', 33 (const#1) LOAD_CONST_REG R0, 1 (const#2) BINARY_ADD_REG R0, 'x', R0 RETURN_VALUE_REG R0 (4 instructions) Register bytecode
  • 21. Using registers allows more optimizations Move constants loads and globals loads (slow) out of loops: return [str(item) for item in data] Constant folding: x=1; y=x; return y → y=1; return y Remove duplicate load/store instructions: constants, names, globals, etc. registervm optim (1)
  • 22. Stack-based bytecode : return (len("a"), len("a")) LOAD_GLOBAL 'len' (name#0) LOAD_CONST 'a' (const#1) CALL_FUNCTION (1 positional) LOAD_GLOBAL 'len' (name#0) LOAD_CONST 'a' (const#1) CALL_FUNCTION (1 positional) BUILD_TUPLE 2 RETURN_VALUE Merge duplicate loads
  • 23. Register-based bytecode : return (len("a"), len("a")) LOAD_GLOBAL_REG R0, 'len' (name#0) LOAD_CONST_REG R1, 'a' (const#1) CALL_FUNCTION_REG R2, R0, 1, R1 CALL_FUNCTION_REG R0, R0, 1, R1 CLEAR_REG R1 BUILD_TUPLE_REG R2, 2, R2, R0 RETURN_VALUE_REG R2 Merge duplicate loads
  • 24. Remove unreachable instructions (dead code) Remove useless jumps (relative jump + 0) registervm optim (2)
  • 25. BuiltinMethodLookup: fewer instructions: 390 => 22 24 ms => 1 ms (24x faster) NormalInstanceAttribute: fewer instructions: 381 => 81 40 ms => 21 ms (1.9x faster) StringPredicates: fewer instructions: 303 => 92 42 ms => 24 ms (1.8x faster) Pybench results
  • 26. Pybench is a microbenchmark Don't expect such speedup on your applications registervm is still experimental and emits invalid code Pybench results
  • 27. PyPy and its amazing JIT Pymothoa, Numba: JIT (LLVM) WPython: "Wordcode-based" bytecode Hotpy 2 Shedskin, Pythran, Nuitka: compile to C++ Other projects
  • 28. Questions? https://bitbucket.org/haypo/astoptimizer http://hg.python.org/sandbox/registervm Distributed under CC BY-SA license: http://creativecommons.org/licenses/by-sa/3.0/ Contact: victor.stinner@gmail.com
  • 29. Thanks to David Malcom for the LibreOffice template http://dmalcolm.livejournal.com/