SlideShare a Scribd company logo
Good news,
everybody!
Understanding Guile 2.2 Performance
FOSDEM 2016
Andy Wingo
wingo@{igalia,pobox}.com
https://wingolog.org
Good news, everybody!
Guile is faster! Woo hoo!
Bad news, everybody!
“The expert Lisp programmer eventually
develops a good ’efficiency
model’.”—Peter Norvig, PAIP
Your efficiency model is out of date
Recap: 1.8
Simple
Approximate efficiency model:
Cost O(n) in number of reductions❧
Example:
Which is slower, (+ 1 (+ 2 3)) (+ 1 5) ?❧
No compilation, so macros cost at run-
time
Recap: 2.0
Compilation: macro use has no run-time
cost
Partial evaluation (peval)
Cost of (+ 1 (+ 2 3)) and 6 are same❧
Some contification
More on this later❧
No allocation when building
environments
Recap: 2.0
Cost O(n) in number of instructions
But instructions do not map nicely to
Scheme
❧
Inspect the n in O(n):
,optimize (effect of peval)❧
,disassemble (instructions, effect of
contification)
❧
A note on peval
> ,optimize (let lp ((n 5))
(if (zero? n)
n
(+ n (lp (1- n)))))
$6 = 15
Function inlining, loop unrolling
(recursive or iterative), constant folding,
constant propagation, beta reduction,
strength reduction
Essentially lexical in nature
Understanding peval is another talk
Guile 2.2
Many improvements of degree
Some improvements of kind
Understanding needed to re-develop
efficiency model
Improvements of kind
A lambda is not always a closure
Names don’t keep data alive
Unlimited recursion
Dramatically better loops
Lower footprint
Unboxed arithmetic
A lambda is not always a
closure
A lambda expression defines a function
That function may or may not exist at
run-time
A lambda is not always a
closure
Gone
Inlined
Contified
Code pointer
Closure
Lambda: Gone
peval can kill unreachable lambdas
> ,opt (let ((f (lambda ()
(launch-the-missiles!))))
42)
42
> ,opt (let ((launch? #f)
(f (lambda ()
(launch-the-missiles!))))
(if launch? (f) 'just-kidding))
just-kidding
Lambda: Inlined
peval can inline small or called-once
lambdas
> ,opt (let ((launch? #t)
(f (lambda ()
(launch-the-missiles!))))
(if launch? (f) 'just-kidding))
(launch-the-missiles!)
Lambda: Contified
Many of Guile 2.2’s optimizations can’t
be represented in Scheme
> (define (count-down n)
(define loop
(lambda (n out)
(let ((out (cons n out)))
(if (zero? n)
out
(loop (1- n) out)))))
(loop n '()))
> ,x count-down
Disassembly of #<procedure count-down (n)> at
[...]
L1:
10 (cons 2 1 2)
11 (br-if-u64-=-scm 0 1 #f 5) ;; -> L2
14 (sub/immediate 1 1 1)
15 (br -5) ;; -> L1
L2:
[...]
loop function was contified: incorporated
into body of count-down
Lambda: Contified
Inline : Copy :: Contify : Rewire
Contification always an optimization
Never causes code growth❧
Enables other optimizations❧
Can contify a set of functions if
All callers visible to compiler❧
Always called with same continuation❧
Reliable: Expect this optimization
Lambda: Code pointer
(define (thing)
(define (log what)
(format #t "Very important log message: ~an" what)
;; If `log' is too short, it will be inlined. Make it bigger.
(format #t "Did I ever tell you about my chickensn")
(format #t "I was going to name one Donkeyn")
(format #t "I always wanted a donkeyn")
(format #t "In the end we called her Raveonetten")
(format #t "Donkey is not a great name for a chickenn")
(newline) (newline) (newline) (newline) (newline))
(log "ohai")
(log "kittens")
(log "donkeys"))
Lambda: Code pointer
,x thing
Disassembly of #<procedure thing ()> at #x97d
[...]
Disassembly of log at #x97d754:
[...]
Two functions, we prevented inlining,
whew
Lambda: Code pointer
,x thing
Disassembly of #<procedure thing ()> at #x97d
[...]
12 (call-label 3 2 8) ;; lo
Call procedure at known offset (+8 in
this case)
Cheaper call
Precondition: All callers known
Lambda: Code pointer
,x thing
Disassembly of #<procedure thing ()> at #x97d
[...]
12 (call-label 3 2 8) ;; lo
No need for procedure-as-value
Guile currently has a uniform calling
convention
Callee-as-a-value passed as arg 0❧
Arg 0 provides access to free vars, if
any
❧
Lambda: Code pointer
If you don’t need the code pointer...
No free vars? Pass any value as arg 0
1 free var? Pass free variable as arg 0
2 free vars? Free vars in pair, pass that
pair as arg 0
3 or more? Free vars in vector
Mutually recursive set of procedures?
One free var representation for union of
free variables of all functions
Lambda: Closure
Not all callees known? Closure
Closure: an object containing a code
pointer and free vars
Though...
0 free variables? Use statically allocated
closure
Entry point of mutually recursive set of
functions, and all other functions are
well-known? Share closure
Lambda: It’s complicated
Names don’t keep data
alive
2.0: Named variables kept alive
In particular, procedure arguments and
the closure
(define (foo x)
;; Should I try to "free" x here?
;; (set! x #f)
(deep-recursive-call)
#f)
(foo (compute-big-vector))
Names don’t keep data
alive
2.2: Only live data is live
User-visible change: less retention...
...though, backtraces sometimes missing
arguments
Be (space-)safe out there
Unlimited recursion
Guile 2.0: Default stack size 64K values
Could raise or lower with
GUILE_STACK_SIZE
Little buffer at end for handling errors,
but quite flaky
Unlimited recursion
Guile 2.2: Stack starts at one page
Stack grows as needed
When stack shrinks, excess pages
returned to OS, at GC
See manual
Recurse away :)
Dramatically better loops
Compiler in Guile 2.2 can reason about
loops
Contification produces loops
Improvements of degree: CSE, DCE, etc
over loops
Improvements of kind: hoisting
Dramatically better loops
One entry? Hoist effect-free or always-
reachable expressions (LICM)
One entry and one exit? Hoisting of all
idempotent expressions (peeling)
(define (vector-fill! v x)
(let lp ((n 0))
(when (< n (vector-length v))
(vector-set! v n x)
(lp (1+ n)))))
Disassembly needed to see.
Footprint
Guile 2.0
3.38 MiB overhead per process❧
13.5 ms startup time❧
(Overhead: Dirty memory, 64 bit)
Guile 2.2
2.04 MiB overhead per process❧
7.5 ms startup time❧
ELF shareable static data allocation
Lazy stack growth (per-thread win too!)
Unboxed arithmetic
Guile 2.0
All floating-point numbers are heap-
allocated
❧
Guile 2.2
Sometimes we can use raw floating-
point arithmetic
❧
Sometimes 64-bit integers are
unboxed too
❧
Unboxed arithmetic
> (define (f32vector-double! v)
(let lp ((i 0))
(when (< i (bytevector-length v))
(let ((f32 (bytevector-ieee-single-native-ref v i)))
(bytevector-ieee-single-native-set! v i (* f32 2))
(lp (+ i 4))))))
Unboxed arithmetic
> (define v (make-f32vector #e1e6 1.0))
> ,time (f32vector-double! v)
Guile 2.0: 152ms, 71ms in GC
Guile 2.2: 15.2ms, 0ms in GC
10X improvement!
> ,x f32vector-double!
...
L1:
18 (bv-f32-ref 0 3 1)
19 (fadd 0 0 0)
20 (bv-f32-set! 3 1 0)
21 (uadd/immediate 1 1 4)
22 (br-if-u64-< 1 4 #f -4) ;; -> L1
Index and f32 values unboxed
Length computation hoisted
Strength reduction on the double
Loop inverted
Unboxed arithmetic
Details gnarly.
Why not:
> (define (f32vector-map! v f)
(let lp ((i 0))
(when (< i (f32vector-length v))
(let ((f32 (f32vector-ref v i)))
(f32vector-set! v i (f f32))
(lp (+ i 1))))))
Unboxed arithmetic
(when (< i (f32vector-length v))
(let ((f32 (f32vector-ref v i)))
(f32vector-set! v i (f f32))
(lp (+ i 1))))
Current compiler limitation: doesn’t
undersand f32vector-length, which
asserts bytevector length divisible by 4
Compiler can’t see through (f f32): has
to box f32
... unless f is inlined
Unboxed arithmetic
In practice: 10X speedups, if your
efficiency model is accurate
Odd consequence: type checks are back
(unless (= x (logand x #xffffffff))
(error "not a uint32"))
Allows Guile to unbox x as integer
Useful on function arguments
Unboxed arithmetic
-- LuaJIT
local uint32 = ffi.new('uint32[1]')
local function to_uint32(x)
uint32[0] = x
return uint32[0]
end
For floats, use f64vectors :)
Summary
Guile 2.0: Cost is O(n) in number of
instructions
Guile 2.2: Same, but to understand
performance
Mapping from Scheme to instructions
more complex
❧
,disassemble necessary to verify❧
Pay more attention to allocation❧
Or just come along for the ride and enjoy
the speedups :)

More Related Content

What's hot

Async await in C++
Async await in C++Async await in C++
Async await in C++
cppfrug
 
Kotlin boost yourproductivity
Kotlin boost yourproductivityKotlin boost yourproductivity
Kotlin boost yourproductivity
nklmish
 
The Cryptol Epilogue: Swift and Bulletproof VHDL
The Cryptol Epilogue: Swift and Bulletproof VHDLThe Cryptol Epilogue: Swift and Bulletproof VHDL
The Cryptol Epilogue: Swift and Bulletproof VHDL
Ulisses Costa
 
To Swift 2...and Beyond!
To Swift 2...and Beyond!To Swift 2...and Beyond!
To Swift 2...and Beyond!
Scott Gardner
 
Runtime Code Generation and Data Management for Heterogeneous Computing in Java
Runtime Code Generation and Data Management for Heterogeneous Computing in JavaRuntime Code Generation and Data Management for Heterogeneous Computing in Java
Runtime Code Generation and Data Management for Heterogeneous Computing in Java
Juan Fumero
 
Re-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for XtextRe-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for Xtext
Edward Willink
 
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
confluent
 
Gor Nishanov, C++ Coroutines – a negative overhead abstraction
Gor Nishanov,  C++ Coroutines – a negative overhead abstractionGor Nishanov,  C++ Coroutines – a negative overhead abstraction
Gor Nishanov, C++ Coroutines – a negative overhead abstraction
Sergey Platonov
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaX
Kernel TLV
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with Truffle
Stefan Marr
 
R and C++
R and C++R and C++
R and C++
Romain Francois
 
Highly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core SystemHighly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core System
James Gan
 
DWARF Data Representation
DWARF Data RepresentationDWARF Data Representation
DWARF Data Representation
Wang Hsiangkai
 
Cache aware hybrid sorter
Cache aware hybrid sorterCache aware hybrid sorter
Cache aware hybrid sorterManchor Ko
 
Fork and join framework
Fork and join frameworkFork and join framework
Fork and join framework
Minh Tran
 
Actor Concurrency
Actor ConcurrencyActor Concurrency
Actor Concurrency
Alex Miller
 
How to Think in RxJava Before Reacting
How to Think in RxJava Before ReactingHow to Think in RxJava Before Reacting
How to Think in RxJava Before Reacting
IndicThreads
 
A closure ekon16
A closure ekon16A closure ekon16
A closure ekon16
Max Kleiner
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3
Rob Skillington
 
Arduino C maXbox web of things slide show
Arduino C maXbox web of things slide showArduino C maXbox web of things slide show
Arduino C maXbox web of things slide show
Max Kleiner
 

What's hot (20)

Async await in C++
Async await in C++Async await in C++
Async await in C++
 
Kotlin boost yourproductivity
Kotlin boost yourproductivityKotlin boost yourproductivity
Kotlin boost yourproductivity
 
The Cryptol Epilogue: Swift and Bulletproof VHDL
The Cryptol Epilogue: Swift and Bulletproof VHDLThe Cryptol Epilogue: Swift and Bulletproof VHDL
The Cryptol Epilogue: Swift and Bulletproof VHDL
 
To Swift 2...and Beyond!
To Swift 2...and Beyond!To Swift 2...and Beyond!
To Swift 2...and Beyond!
 
Runtime Code Generation and Data Management for Heterogeneous Computing in Java
Runtime Code Generation and Data Management for Heterogeneous Computing in JavaRuntime Code Generation and Data Management for Heterogeneous Computing in Java
Runtime Code Generation and Data Management for Heterogeneous Computing in Java
 
Re-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for XtextRe-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for Xtext
 
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
Exactly Once Semantics Revisited (Jason Gustafson, Confluent) Kafka Summit NY...
 
Gor Nishanov, C++ Coroutines – a negative overhead abstraction
Gor Nishanov,  C++ Coroutines – a negative overhead abstractionGor Nishanov,  C++ Coroutines – a negative overhead abstraction
Gor Nishanov, C++ Coroutines – a negative overhead abstraction
 
grsecurity and PaX
grsecurity and PaXgrsecurity and PaX
grsecurity and PaX
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with Truffle
 
R and C++
R and C++R and C++
R and C++
 
Highly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core SystemHighly Scalable Java Programming for Multi-Core System
Highly Scalable Java Programming for Multi-Core System
 
DWARF Data Representation
DWARF Data RepresentationDWARF Data Representation
DWARF Data Representation
 
Cache aware hybrid sorter
Cache aware hybrid sorterCache aware hybrid sorter
Cache aware hybrid sorter
 
Fork and join framework
Fork and join frameworkFork and join framework
Fork and join framework
 
Actor Concurrency
Actor ConcurrencyActor Concurrency
Actor Concurrency
 
How to Think in RxJava Before Reacting
How to Think in RxJava Before ReactingHow to Think in RxJava Before Reacting
How to Think in RxJava Before Reacting
 
A closure ekon16
A closure ekon16A closure ekon16
A closure ekon16
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3
 
Arduino C maXbox web of things slide show
Arduino C maXbox web of things slide showArduino C maXbox web of things slide show
Arduino C maXbox web of things slide show
 

Similar to Good news, everybody! Guile 2.2 performance notes (FOSDEM 2016)

Pragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
Pragmatic Optimization in Modern Programming - Mastering Compiler OptimizationsPragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
Pragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
Marina Kolpakova
 
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
Anne Nicolas
 
Redis Lua Scripts
Redis Lua ScriptsRedis Lua Scripts
Redis Lua Scripts
Itamar Haber
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5PRADEEP
 
openMP loop parallelization
openMP loop parallelizationopenMP loop parallelization
openMP loop parallelization
Albert DeFusco
 
Javascript engine performance
Javascript engine performanceJavascript engine performance
Javascript engine performanceDuoyi Wu
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Marina Kolpakova
 
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander NasonovMultiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
eurobsdcon
 
Boosting Developer Productivity with Clang
Boosting Developer Productivity with ClangBoosting Developer Productivity with Clang
Boosting Developer Productivity with Clang
Samsung Open Source Group
 
Guile 3: Faster programs via just-in-time compilation (FOSDEM 2019)
Guile 3: Faster programs via just-in-time compilation (FOSDEM 2019)Guile 3: Faster programs via just-in-time compilation (FOSDEM 2019)
Guile 3: Faster programs via just-in-time compilation (FOSDEM 2019)
Igalia
 
sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.ppt
ssuserd64918
 
02 functions, variables, basic input and output of c++
02   functions, variables, basic input and output of c++02   functions, variables, basic input and output of c++
02 functions, variables, basic input and output of c++
Manzoor ALam
 
Giorgio zoppi cpp11concurrency
Giorgio zoppi cpp11concurrencyGiorgio zoppi cpp11concurrency
Giorgio zoppi cpp11concurrency
Giorgio Zoppi
 
A Spin-off: Firebird Checked by PVS-Studio
A Spin-off: Firebird Checked by PVS-StudioA Spin-off: Firebird Checked by PVS-Studio
A Spin-off: Firebird Checked by PVS-Studio
Andrey Karpov
 
掀起 Swift 的面紗
掀起 Swift 的面紗掀起 Swift 的面紗
掀起 Swift 的面紗
Pofat Tseng
 
00_Introduction to Java.ppt
00_Introduction to Java.ppt00_Introduction to Java.ppt
00_Introduction to Java.ppt
HongAnhNguyn285885
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
AMD Developer Central
 
Chapter 2 Part2 C
Chapter 2 Part2 CChapter 2 Part2 C
Chapter 2 Part2 Cececourse
 
Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010Dirkjan Bussink
 

Similar to Good news, everybody! Guile 2.2 performance notes (FOSDEM 2016) (20)

Pragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
Pragmatic Optimization in Modern Programming - Mastering Compiler OptimizationsPragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
Pragmatic Optimization in Modern Programming - Mastering Compiler Optimizations
 
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
 
Redis Lua Scripts
Redis Lua ScriptsRedis Lua Scripts
Redis Lua Scripts
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5
 
openMP loop parallelization
openMP loop parallelizationopenMP loop parallelization
openMP loop parallelization
 
Javascript engine performance
Javascript engine performanceJavascript engine performance
Javascript engine performance
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
 
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander NasonovMultiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
 
Boosting Developer Productivity with Clang
Boosting Developer Productivity with ClangBoosting Developer Productivity with Clang
Boosting Developer Productivity with Clang
 
Guile 3: Faster programs via just-in-time compilation (FOSDEM 2019)
Guile 3: Faster programs via just-in-time compilation (FOSDEM 2019)Guile 3: Faster programs via just-in-time compilation (FOSDEM 2019)
Guile 3: Faster programs via just-in-time compilation (FOSDEM 2019)
 
sonam Kumari python.ppt
sonam Kumari python.pptsonam Kumari python.ppt
sonam Kumari python.ppt
 
02 functions, variables, basic input and output of c++
02   functions, variables, basic input and output of c++02   functions, variables, basic input and output of c++
02 functions, variables, basic input and output of c++
 
Giorgio zoppi cpp11concurrency
Giorgio zoppi cpp11concurrencyGiorgio zoppi cpp11concurrency
Giorgio zoppi cpp11concurrency
 
A Spin-off: Firebird Checked by PVS-Studio
A Spin-off: Firebird Checked by PVS-StudioA Spin-off: Firebird Checked by PVS-Studio
A Spin-off: Firebird Checked by PVS-Studio
 
掀起 Swift 的面紗
掀起 Swift 的面紗掀起 Swift 的面紗
掀起 Swift 的面紗
 
Faster Python, FOSDEM
Faster Python, FOSDEMFaster Python, FOSDEM
Faster Python, FOSDEM
 
00_Introduction to Java.ppt
00_Introduction to Java.ppt00_Introduction to Java.ppt
00_Introduction to Java.ppt
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Chapter 2 Part2 C
Chapter 2 Part2 CChapter 2 Part2 C
Chapter 2 Part2 C
 
Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010
 

More from Igalia

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
Igalia
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
Igalia
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Igalia
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
Igalia
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to Maintenance
Igalia
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdf
Igalia
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
Igalia
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
Igalia
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Igalia
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
Igalia
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
Igalia
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
Igalia
 
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUsturnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
Igalia
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
Igalia
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Igalia
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
Igalia
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
Igalia
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
Igalia
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
Igalia
 

More from Igalia (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to Maintenance
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdf
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
 
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUsturnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
 

Recently uploaded

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 

Recently uploaded (20)

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

Good news, everybody! Guile 2.2 performance notes (FOSDEM 2016)

  • 1. Good news, everybody! Understanding Guile 2.2 Performance FOSDEM 2016 Andy Wingo wingo@{igalia,pobox}.com https://wingolog.org
  • 2. Good news, everybody! Guile is faster! Woo hoo!
  • 3. Bad news, everybody! “The expert Lisp programmer eventually develops a good ’efficiency model’.”—Peter Norvig, PAIP Your efficiency model is out of date
  • 4. Recap: 1.8 Simple Approximate efficiency model: Cost O(n) in number of reductions❧ Example: Which is slower, (+ 1 (+ 2 3)) (+ 1 5) ?❧ No compilation, so macros cost at run- time
  • 5. Recap: 2.0 Compilation: macro use has no run-time cost Partial evaluation (peval) Cost of (+ 1 (+ 2 3)) and 6 are same❧ Some contification More on this later❧ No allocation when building environments
  • 6. Recap: 2.0 Cost O(n) in number of instructions But instructions do not map nicely to Scheme ❧ Inspect the n in O(n): ,optimize (effect of peval)❧ ,disassemble (instructions, effect of contification) ❧
  • 7. A note on peval > ,optimize (let lp ((n 5)) (if (zero? n) n (+ n (lp (1- n))))) $6 = 15 Function inlining, loop unrolling (recursive or iterative), constant folding, constant propagation, beta reduction, strength reduction Essentially lexical in nature Understanding peval is another talk
  • 8. Guile 2.2 Many improvements of degree Some improvements of kind Understanding needed to re-develop efficiency model
  • 9. Improvements of kind A lambda is not always a closure Names don’t keep data alive Unlimited recursion Dramatically better loops Lower footprint Unboxed arithmetic
  • 10. A lambda is not always a closure A lambda expression defines a function That function may or may not exist at run-time
  • 11. A lambda is not always a closure Gone Inlined Contified Code pointer Closure
  • 12. Lambda: Gone peval can kill unreachable lambdas > ,opt (let ((f (lambda () (launch-the-missiles!)))) 42) 42 > ,opt (let ((launch? #f) (f (lambda () (launch-the-missiles!)))) (if launch? (f) 'just-kidding)) just-kidding
  • 13. Lambda: Inlined peval can inline small or called-once lambdas > ,opt (let ((launch? #t) (f (lambda () (launch-the-missiles!)))) (if launch? (f) 'just-kidding)) (launch-the-missiles!)
  • 14. Lambda: Contified Many of Guile 2.2’s optimizations can’t be represented in Scheme > (define (count-down n) (define loop (lambda (n out) (let ((out (cons n out))) (if (zero? n) out (loop (1- n) out))))) (loop n '()))
  • 15. > ,x count-down Disassembly of #<procedure count-down (n)> at [...] L1: 10 (cons 2 1 2) 11 (br-if-u64-=-scm 0 1 #f 5) ;; -> L2 14 (sub/immediate 1 1 1) 15 (br -5) ;; -> L1 L2: [...] loop function was contified: incorporated into body of count-down
  • 16. Lambda: Contified Inline : Copy :: Contify : Rewire Contification always an optimization Never causes code growth❧ Enables other optimizations❧ Can contify a set of functions if All callers visible to compiler❧ Always called with same continuation❧ Reliable: Expect this optimization
  • 17. Lambda: Code pointer (define (thing) (define (log what) (format #t "Very important log message: ~an" what) ;; If `log' is too short, it will be inlined. Make it bigger. (format #t "Did I ever tell you about my chickensn") (format #t "I was going to name one Donkeyn") (format #t "I always wanted a donkeyn") (format #t "In the end we called her Raveonetten") (format #t "Donkey is not a great name for a chickenn") (newline) (newline) (newline) (newline) (newline)) (log "ohai") (log "kittens") (log "donkeys"))
  • 18. Lambda: Code pointer ,x thing Disassembly of #<procedure thing ()> at #x97d [...] Disassembly of log at #x97d754: [...] Two functions, we prevented inlining, whew
  • 19. Lambda: Code pointer ,x thing Disassembly of #<procedure thing ()> at #x97d [...] 12 (call-label 3 2 8) ;; lo Call procedure at known offset (+8 in this case) Cheaper call Precondition: All callers known
  • 20. Lambda: Code pointer ,x thing Disassembly of #<procedure thing ()> at #x97d [...] 12 (call-label 3 2 8) ;; lo No need for procedure-as-value Guile currently has a uniform calling convention Callee-as-a-value passed as arg 0❧ Arg 0 provides access to free vars, if any ❧
  • 21. Lambda: Code pointer If you don’t need the code pointer... No free vars? Pass any value as arg 0 1 free var? Pass free variable as arg 0 2 free vars? Free vars in pair, pass that pair as arg 0 3 or more? Free vars in vector Mutually recursive set of procedures? One free var representation for union of free variables of all functions
  • 22. Lambda: Closure Not all callees known? Closure Closure: an object containing a code pointer and free vars Though... 0 free variables? Use statically allocated closure Entry point of mutually recursive set of functions, and all other functions are well-known? Share closure
  • 24. Names don’t keep data alive 2.0: Named variables kept alive In particular, procedure arguments and the closure (define (foo x) ;; Should I try to "free" x here? ;; (set! x #f) (deep-recursive-call) #f) (foo (compute-big-vector))
  • 25. Names don’t keep data alive 2.2: Only live data is live User-visible change: less retention... ...though, backtraces sometimes missing arguments Be (space-)safe out there
  • 26. Unlimited recursion Guile 2.0: Default stack size 64K values Could raise or lower with GUILE_STACK_SIZE Little buffer at end for handling errors, but quite flaky
  • 27. Unlimited recursion Guile 2.2: Stack starts at one page Stack grows as needed When stack shrinks, excess pages returned to OS, at GC See manual Recurse away :)
  • 28. Dramatically better loops Compiler in Guile 2.2 can reason about loops Contification produces loops Improvements of degree: CSE, DCE, etc over loops Improvements of kind: hoisting
  • 29. Dramatically better loops One entry? Hoist effect-free or always- reachable expressions (LICM) One entry and one exit? Hoisting of all idempotent expressions (peeling) (define (vector-fill! v x) (let lp ((n 0)) (when (< n (vector-length v)) (vector-set! v n x) (lp (1+ n))))) Disassembly needed to see.
  • 30. Footprint Guile 2.0 3.38 MiB overhead per process❧ 13.5 ms startup time❧ (Overhead: Dirty memory, 64 bit) Guile 2.2 2.04 MiB overhead per process❧ 7.5 ms startup time❧ ELF shareable static data allocation Lazy stack growth (per-thread win too!)
  • 31. Unboxed arithmetic Guile 2.0 All floating-point numbers are heap- allocated ❧ Guile 2.2 Sometimes we can use raw floating- point arithmetic ❧ Sometimes 64-bit integers are unboxed too ❧
  • 32. Unboxed arithmetic > (define (f32vector-double! v) (let lp ((i 0)) (when (< i (bytevector-length v)) (let ((f32 (bytevector-ieee-single-native-ref v i))) (bytevector-ieee-single-native-set! v i (* f32 2)) (lp (+ i 4))))))
  • 33. Unboxed arithmetic > (define v (make-f32vector #e1e6 1.0)) > ,time (f32vector-double! v) Guile 2.0: 152ms, 71ms in GC Guile 2.2: 15.2ms, 0ms in GC 10X improvement!
  • 34. > ,x f32vector-double! ... L1: 18 (bv-f32-ref 0 3 1) 19 (fadd 0 0 0) 20 (bv-f32-set! 3 1 0) 21 (uadd/immediate 1 1 4) 22 (br-if-u64-< 1 4 #f -4) ;; -> L1 Index and f32 values unboxed Length computation hoisted Strength reduction on the double Loop inverted
  • 35. Unboxed arithmetic Details gnarly. Why not: > (define (f32vector-map! v f) (let lp ((i 0)) (when (< i (f32vector-length v)) (let ((f32 (f32vector-ref v i))) (f32vector-set! v i (f f32)) (lp (+ i 1))))))
  • 36. Unboxed arithmetic (when (< i (f32vector-length v)) (let ((f32 (f32vector-ref v i))) (f32vector-set! v i (f f32)) (lp (+ i 1)))) Current compiler limitation: doesn’t undersand f32vector-length, which asserts bytevector length divisible by 4 Compiler can’t see through (f f32): has to box f32 ... unless f is inlined
  • 37. Unboxed arithmetic In practice: 10X speedups, if your efficiency model is accurate Odd consequence: type checks are back (unless (= x (logand x #xffffffff)) (error "not a uint32")) Allows Guile to unbox x as integer Useful on function arguments
  • 38. Unboxed arithmetic -- LuaJIT local uint32 = ffi.new('uint32[1]') local function to_uint32(x) uint32[0] = x return uint32[0] end For floats, use f64vectors :)
  • 39. Summary Guile 2.0: Cost is O(n) in number of instructions Guile 2.2: Same, but to understand performance Mapping from Scheme to instructions more complex ❧ ,disassemble necessary to verify❧ Pay more attention to allocation❧ Or just come along for the ride and enjoy the speedups :)