Naked Performance
(with Clojure)
Tommi Reiman 27.9.2019
ClojuTRE
Programmers waste enormous amounts of
time thinking about, or worrying about, the
speed of noncritical parts of their programs,
and these attempts at efficiency actually
have a strong negative impact when
debugging and maintenance are
considered. We should forget about small
efficiencies, say about 97% of the time:
premature optimization is the root of all
evil. Yet we should not pass up our
opportunities in that critical 3%.
On Performance Optimization
Donald Knuth
• The Language
• The Libraries
• The Frameworks
• (The Application)
The 3% in Clojure
TechEmpower Web Framework Benchmarks 2017
5x slower than Java :(
Best we can do?
TechEmpower Web Framework Benchmarks 2019
:)
direct-compilation
fastest java server
reitit-based stack
~600ns budget
1: The Little Things
Performant Clojure Code
• Reflection, Boxed Math, Dynamic Vars, …
• Measure and read the source
• Cost of Abstractions
• Cost of Immutability
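The first bullet can be made concrete with two compiler flags. This is a minimal sketch in plain Clojure, and the function names are hypothetical: `*warn-on-reflection*` reports interop calls the compiler cannot resolve, `*unchecked-math* :warn-on-boxed` reports arithmetic on boxed numbers, and primitive `^long` hints keep a loop unboxed.

```clojure
;; Report reflective interop at compile time
(set! *warn-on-reflection* true)
;; Report boxed arithmetic (this setting also turns on unchecked,
;; overflow-wrapping arithmetic for the code compiled after it)
(set! *unchecked-math* :warn-on-boxed)

;; `s` is an Object here, so .length is looked up via reflection at runtime;
;; the compiler prints a reflection warning for this form
(defn str-len-reflective [s]
  (.length s))

;; With a ^String hint the call compiles to a direct String.length() call
(defn str-len-hinted [^String s]
  (.length s))

;; ^long hints keep `n`, `i` and `acc` as primitive longs, so nothing is boxed
(defn sum-below ^long [^long n]
  (loop [i 0, acc 0]
    (if (< i n)
      (recur (inc i) (+ acc i))
      acc)))
```

Both string functions return the same result; the difference is only in how the call is dispatched, which is what the benchmarks in the following slides measure.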
(require '[criterium.core :as cc])
(import (java.util HashMap))

(defrecord Request [request-method uri])

(let [req1 (hash-map :request-method :get, :uri "/ping")
      req2 (array-map :request-method :get, :uri "/ping")
      req3 (HashMap. {:request-method :get, :uri "/ping"})
      req4 (->Request :get "/ping")]

  ;; 17ns (hash-map)
  (cc/quick-bench (:uri req1))

  ;; 7ns (array-map)
  (cc/quick-bench (:uri req2))

  ;; 5ns (mutable-map)
  (cc/quick-bench (.get req3 :uri))

  ;; 3ns (record)
  (cc/quick-bench (:uri req4)))
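The timings above come from criterium (`cc` is assumed to alias `criterium.core`). Part of the difference between the first two cases is the concrete map class: `hash-map` always builds a hash map, while `array-map` and small map literals build a linear-scan array map, and a record keeps its fields as Java fields. The sketch below checks those classes, which are stable, rather than timings, which depend on the machine.

```clojure
(defrecord Request [request-method uri])

;; always a PersistentHashMap, even for two entries
(def req1 (hash-map :request-method :get, :uri "/ping"))
;; a PersistentArrayMap: keys are found by a linear scan
(def req2 (array-map :request-method :get, :uri "/ping"))
;; a record: `uri` is a real field on the generated Request class
(def req4 (->Request :get "/ping"))

;; small map literals are array maps ...
(def small {:a 1})
;; ... and growing one past 8 entries promotes it to a hash map
(def big (into {} (map vector (range 9) (range 9))))
```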
;; 200ns
(cc/quick-bench
  (merge {} {}))

;; 400ns
(cc/quick-bench
  (merge {} {} {}))

;; 600ns
(cc/quick-bench
  (merge {} {} {} {}))

;; 800ns
(cc/quick-bench
  (merge {} {} {} {} {}))
;; => about 200ns for every additional map
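Since every extra argument adds roughly the same cost, one common alternative for a hot two-map case is a specialised merge. This is a hypothetical helper, not something from the talk: `clojure.core/merge` is variadic and conjoins each map in turn, while `merge2` below feeds the second map's entries straight to `assoc` with `reduce-kv`. Whether it actually wins on a given JVM is something to check with criterium first.

```clojure
(defn merge2
  "Hypothetical two-map merge with the same result as clojure.core/merge:
  entries of m2 win, and nil on either side is tolerated."
  [m1 m2]
  (cond
    (nil? m1) m2
    (nil? m2) m1
    ;; walk m2's entries once, assoc-ing each into m1 (keeps m1's metadata)
    :else (reduce-kv assoc m1 m2)))
```

For example, `(merge2 {:a 1 :b 2} {:b 3 :c 4})` gives `{:a 1, :b 3, :c 4}`, just like `merge`.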
2: Data & Compilers
(require '[malli.core :as m]
         '[malli.transform :as mt])

(def Event
  [:map
   [:id string?]
   [:tags [:set keyword?]]
   [:address
    [:map
     [:street string?]
     [:lonlat [:tuple double? double?]]]]])

;; run transformation compiler
(def json->Event
  (m/transformer Event mt/json-transformer))

;; 500ns (vs 95µs in spec-tools)
(json->Event some-json)
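The idea behind a transformation compiler is that the schema is walked once, up front, and the result is a tree of plain functions that does no schema dispatch at runtime. The toy below is not malli's implementation: `compile-decoder` and `decode-event` are hypothetical, only the leaf rules needed for `Event` are covered (strings to keywords for `keyword?`, numbers to doubles for `double?`), and the JSON input is assumed to be already parsed with keyword keys.

```clojure
(defn compile-decoder
  "Hypothetical toy compiler: walks the schema once and returns a function
  that coerces JSON-shaped data to match it."
  [schema]
  (if (vector? schema)
    (let [[tag & children] schema]
      (case tag
        ;; compile every entry's decoder now, apply only the present keys later
        :map (let [entries (mapv (fn [[k s]] [k (compile-decoder s)]) children)]
               (fn [m]
                 (reduce (fn [acc [k f]]
                           (if (contains? acc k) (update acc k f) acc))
                         m
                         entries)))
        ;; JSON has no sets: decode each element, collect into a set
        :set (let [f (compile-decoder (first children))]
               (fn [xs] (into #{} (map f) xs)))
        ;; positional decoders, one per tuple slot
        :tuple (let [fs (mapv compile-decoder children)]
                 (fn [xs] (mapv (fn [f x] (f x)) fs xs)))
        (throw (ex-info "unsupported schema" {:schema schema}))))
    ;; leaf predicates: pick the coercion once, at compile time
    (condp = schema
      keyword? keyword
      double? double
      identity)))

(def Event
  [:map
   [:id string?]
   [:tags [:set keyword?]]
   [:address
    [:map
     [:street string?]
     [:lonlat [:tuple double? double?]]]]])

;; compile once, call many times
(def decode-event (compile-decoder Event))
```

Calling `(decode-event {:id "e1", :tags ["clojure" "perf"], :address {:street "Tampere", :lonlat [61 24]}})` returns the tags as `#{:clojure :perf}` and the coordinates as `[61.0 24.0]`.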
(require '[reitit.core :as r]
         '[reitit.dev.pretty :as pretty])

(r/router
  [["/ping"]
   ["/:user-id/orders"]
   ["/bulk/:bulk-id"]
   ["/public/*path"]
   ["/:version/status"]]
  {:exception pretty/exception})
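This router is rejected because several of its paths can match the same URI, and `pretty/exception` formats that error. The core check can be sketched as below; `conflicting?` and `conflicts` are hypothetical helpers, and reitit's own conflict rules are more complete than this segment-by-segment comparison.

```clojure
(require '[clojure.string :as str])

(defn- segments [path]
  (vec (remove str/blank? (str/split path #"/"))))

(defn- param? [s] (or (str/starts-with? s ":") (str/starts-with? s "*")))
(defn- catch-all? [s] (str/starts-with? s "*"))

(defn conflicting?
  "Hypothetical check: could some concrete URI match both path templates?"
  [p1 p2]
  (loop [[a & more1 :as s1] (segments p1)
         [b & more2 :as s2] (segments p2)]
    (cond
      ;; both templates ended at the same depth
      (and (empty? s1) (empty? s2)) true
      ;; a catch-all swallows whatever remains on the other side
      (or (and a (catch-all? a)) (and b (catch-all? b))) true
      ;; different lengths without a catch-all
      (or (empty? s1) (empty? s2)) false
      ;; equal literals, or a parameter on either side, can match the same segment
      (or (= a b) (param? a) (param? b)) (recur more1 more2)
      :else false)))

(defn conflicts
  "All conflicting pairs, each pair reported once."
  [paths]
  (for [[i p1] (map-indexed vector paths)
        p2 (drop (inc i) paths)
        :when (conflicting? p1 p2)]
    [p1 p2]))
```

With the five paths above, `/ping` conflicts with nothing, while `/:user-id/orders`, `/bulk/:bulk-id`, `/public/*path` and `/:version/status` form four conflicting pairs.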
(require '[compojure.core :as c])

(defn ping-handler [_]
  {:status 200
   :headers {"Content-Type" "text/plain"}
   :body "pong"})

(def app
  (c/context "/api" []
    (c/context "/status" []
      (c/GET "/ping" [] ping-handler))))

(app {:request-method :get, :uri "/api/status/ping"})
;{:status 200
; :headers {"Content-Type" "text/plain"}
; :body "pong"}
Compojure, per nested level: /api 2.2µs, /status 2.4µs, /ping 2.2µs; 7.2µs in total
(require '[reitit.ring :as ring])

(def app
  (ring/ring-handler
    (ring/router
      ["/api"
       ["/status"
        ["/ping" {:get ping-handler}]]])))

(app {:request-method :get, :uri "/api/status/ping"})
;{:status 200
; :headers {"Content-Type" "text/plain"}
; :body "pong"}
Reitit: /api/status/ping 126ns
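Compojure pays for every `context` level on each request, while a data-driven router can flatten the nested route tree before the first request arrives. A minimal sketch of that idea for static paths only (`flatten-routes` and `compile-app` are hypothetical names, and this is not reitit's code): the nested vector is expanded once into a `uri -> {method handler}` table, so handling a request costs one `get-in`.

```clojure
(defn ping-handler [_]
  {:status 200
   :headers {"Content-Type" "text/plain"}
   :body "pong"})

(defn flatten-routes
  "Hypothetical helper: expands nested [path data? & children] vectors
  into [full-path data] pairs."
  ([routes] (flatten-routes "" routes))
  ([prefix [path & more]]
   (let [[data children] (if (map? (first more))
                           [(first more) (rest more)]
                           [nil more])
         full-path (str prefix path)]
     (cond-> []
       data (conj [full-path data])
       :always (into (mapcat #(flatten-routes full-path %)) children)))))

(defn compile-app
  "Hypothetical: builds the lookup table once and returns a ring-style handler."
  [routes]
  (let [table (into {} (flatten-routes routes))]
    (fn [{:keys [request-method uri] :as request}]
      ;; one nested map lookup per request; nil when nothing matches
      (when-let [handler (get-in table [uri request-method])]
        (handler request)))))

(def app
  (compile-app
    ["/api"
     ["/status"
      ["/ping" {:get ping-handler}]]]))
```

Calling `(app {:request-method :get, :uri "/api/status/ping"})` returns the same `"pong"` response, and unknown paths or methods return `nil`. Path parameters need more than a hash lookup, which is where the routing algorithms on the next slides come in.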
3: Right Algorithms
• Single Static Path
• Lookup
• Trie
• Mixed
• Quarantine
• Linear (Brute Force)
Routing Algorithms
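One way to read this list is as a selection rule over the route table. The function below is a simplified, hypothetical version of such a rule (the exact conditions reitit uses may differ): a single static route, all static routes, all wildcard routes, a mix, or a table with conflicts each get their own algorithm. With both static and parameterised paths it lands on `:mixed-router`, which is also what `r/router-name` reports for the generated router in the following slides.

```clojure
(defn wild?
  "True when the path template has a :param or *catch-all segment."
  [path]
  (boolean (re-find #"/[:*]" path)))

(defn choose-router
  "Hypothetical, simplified selection rule between the listed algorithms."
  [paths conflicting?]
  (let [n-wild (count (filter wild? paths))]
    (cond
      ;; conflicting routes are kept apart and matched linearly
      conflicting? :quarantine-router
      (and (= 1 (count paths)) (zero? n-wild)) :single-static-path-router
      ;; only static paths: one hash lookup per request
      (zero? n-wild) :lookup-router
      ;; every path needs pattern matching
      (= n-wild (count paths)) :trie-router
      ;; lookup for the static paths, trie for the rest
      :else :mixed-router)))
```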
Let’s go Crazy
• Create a router with 1000 generated routes, each with 10-20
fragments, generated from clojure.core function names + arguments
• Add a health check route /api/ping as the last route
• Benchmark health check + longest generated path
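The setup in this list can be approximated in a few lines. This is a hypothetical reconstruction, since the talk's generator is not shown: fragment names come from public `clojure.core` vars, parameter names from their `:arglists`, roughly one fragment in five is a `:param` (an assumed ratio), and a fixed `Random` seed keeps the table reproducible.

```clojure
(import '(java.util Random))

(def ^:private word #"[a-z][a-z-]*")

;; sorted, so that a seeded Random picks the same names on every run
(def fn-names
  (->> (ns-publics 'clojure.core)
       keys
       (map name)
       (filter #(re-matches word %))
       sort
       vec))

(def arg-names
  (->> (vals (ns-publics 'clojure.core))
       (keep (comp :arglists meta))
       flatten                 ; arglists -> individual parameter symbols
       (filter symbol?)        ; drops destructuring maps and other forms
       (map name)
       (filter #(re-matches word %))
       distinct
       sort
       vec))

(defn gen-path
  "One route template with 10-20 fragments; about 1 in 5 is a path parameter."
  [^Random rnd]
  (let [pick (fn [v] (nth v (.nextInt rnd (count v))))
        n (+ 10 (.nextInt rnd 11))]
    (apply str (repeatedly n #(if (< (.nextDouble rnd) 0.2)
                                (str "/:" (pick arg-names))
                                (str "/" (pick fn-names)))))))

;; 1000 distinct generated routes, with the health check added last
(def routes
  (let [rnd (Random. 42)]
    (-> (into [] (comp (distinct) (take 1000)) (repeatedly #(gen-path rnd)))
        (conj "/api/ping"))))
```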
(-> router (r/routes) (count))
; => 1001

(-> router (r/router-name))
; => :mixed-router

(r/match-by-path router "/api/ping")
;#Match{:template "/api/ping",
;       :data {:name :user/ping},
;       :result nil,
;       :path-params {},
;       :path "/api/ping"}

(r/match-by-path
  router
  (str "/defonce/unchecked-char/rand/unchecked-float/assoc/:val/"
       "remove-all-methods/with-loading-context/boolean-array/alter/:args/char-name-string/"
       "unchecked-negate-int/ex-info/aset/:val/extend-type/:specs/unchecked-multiply-int/"
       "next/chars/prefer-method/:dispatch-val-y/shuffle"))
;#Match{:template "/defonce/unchecked-char/rand/…",
;       :data {:name :779},
;       :result nil,
;       :path-params {:val ":val"
;                     :args ":args"
;                     :specs ":specs"
;                     :dispatch-val-y ":dispatch-val-y"},
;       :path "/defonce/unchecked-char/rand/…"}
health-check: lookup-router (13ns)
longest-path: trie-router (460ns) ~2ϡ
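The two numbers come from different parts of the mixed router: the health check is a static path, while the long generated path has parameters and goes through the trie. A simplified segment trie shows one plausible reason for the gap: matching does a map lookup per URI segment, so its cost grows with path length. `insert-route` and `match-path` are hypothetical, and the matcher does no backtracking, so it assumes conflict-free routes.

```clojure
(require '[clojure.string :as str])

(defn- segments [path]
  (remove str/blank? (str/split path #"/")))

(defn insert-route
  "Adds a path template to a nested-map trie. Literal segments are string
  keys; a :param segment becomes a [::param :name] key."
  [trie path data]
  (let [ks (mapv (fn [s]
                   (if (str/starts-with? s ":")
                     [::param (keyword (subs s 1))]
                     s))
                 (segments path))]
    (assoc-in trie (conj ks ::data) data)))

(defn- param-child [node]
  (first (filter (fn [[k _]] (and (vector? k) (= ::param (first k)))) node)))

(defn match-path
  "Walks one trie level per URI segment, preferring literal children
  over parameters. Returns nil when nothing matches."
  [trie path]
  (loop [node trie, segs (segments path), params {}]
    (if-let [[s & more] (seq segs)]
      (if-let [child (get node s)]
        (recur child more params)
        (when-let [[[_ k] child] (param-child node)]
          (recur child more (assoc params k s))))
      (when-let [data (::data node)]
        {:data data, :path-params params}))))
```

For example, after inserting `/users/:id/orders`, matching `/users/42/orders` returns `{:data :orders, :path-params {:id "42"}}`.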
4: Embrace Java
• Undertow/Java
• XNIO, ByteBuffers
• Threads
• java.util.concurrent
• Zero-Copy Requests
• 3.9M text/plain req/sec
(require '[pohjavirta.server :as server])

(defn handler [_]
  {:status 200,
   :headers {"Content-Type" "text/plain"}
   :body "Hello, World!"})

;; create and start the server
(-> handler server/create server/start)
$ wrk -t16 -c16 -d2s http://127.0.0.1:8080
Running 2s test @ http://127.0.0.1:8080
16 threads and 16 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 125.96us 37.69us 1.61ms 94.95%
Req/Sec 7.82k 530.24 8.55k 83.63%
261483 requests in 2.10s, 32.42MB read
Requests/sec: 124548.75
Transfer/sec: 15.44MB
Pohjavirta
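The ByteBuffers bullet is about not repeating work per request. One small, hypothetical illustration (not pohjavirta's code): encode a constant response body once into a read-only `ByteBuffer`, then hand each request a `duplicate()`, which shares the same bytes but has its own position and limit, so nothing is re-encoded or copied.

```clojure
(import '(java.nio ByteBuffer)
        '(java.nio.charset StandardCharsets))

;; Encode the constant body once, when the namespace loads
(def ^ByteBuffer hello-body
  (-> (.getBytes "Hello, World!" StandardCharsets/UTF_8)
      (ByteBuffer/wrap)
      (.asReadOnlyBuffer)))

(defn body-buffer
  "A fresh read-only view over the shared bytes: independent position and
  limit, no copying."
  ^ByteBuffer []
  (.duplicate hello-body))
```

Reading one view to the end leaves the others untouched, so concurrent requests cannot interfere with each other's read position.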
• Writing JSON
• data.json (Clojure)
• Cheshire (Clojure+Java)
• Jsonista (Java + Clojure api)
• Jackson (Java)
Jsonista
(require '[jsonista.core :as j])
(j/write-value-as-string {"hello" 1})
;; => "{\"hello\":1}"
(j/read-value *1)
;; => {"hello" 1}
~zero-overhead encoding
• Fast Mapping of RDB Data
• Transformation Compiler
• Async, JDBC & next.jdbc
• 450k req/sec on TFB
• 55k req/sec on MacBook
Porsas
(require '[porsas.async :as pa])
(require '[promesa.core :as p])

(def pool
  (pa/pool
    {:uri "postgresql://localhost:5432/hello_world"
     :user "benchmarkdbuser"
     :password "benchmarkdbpass"
     :size 16}))

(-> (pa/query-one
      pool
      ["SELECT name from WORLD where id=$1" 1])
    (p/chain :name prn))
; #<Promise[~]>
; prints "kikka"
~zero-overhead mapping
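The fast mapping relies on the same compile-once idea: work that depends only on the shape of a result set should not be repeated for every row. The sketch below illustrates that idea without a database; `compile-row-fn` and `row->world` are hypothetical and not porsas API. Column labels are turned into keywords once, and each row is then built with transient `assoc!` calls.

```clojure
(defn compile-row-fn
  "Hypothetical: resolves column keywords once per result-set shape and
  returns a function that builds a map from one row of values."
  [column-labels]
  (let [ks (mapv keyword column-labels)]
    (fn [row]
      (persistent!
        ;; reduce-kv over a vector passes (acc index value)
        (reduce-kv (fn [m i k] (assoc! m k (nth row i)))
                   (transient {})
                   ks)))))

(def row->world (compile-row-fn ["id" "name"]))
```

Here `(row->world [1 "kikka"])` returns `{:id 1, :name "kikka"}`, and mapping it over many rows never calls `keyword` again.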
What’s Next?
Welcome to the Future!
• GraalVM Profile-Guided Optimizations (PGO)
• Project Loom: Fibers and Continuations for the Java Virtual Machine
• Phoenix Framework
The Libraries
Non-blocking HTTP Server
Data-driven Router
Java-fast JSON Formatter
Java-fast Sync & Async SQL (and next.jdbc)
Fast Interceptors for Clojure/Script
Data-Driven models & validation
Beautiful error messages
Check these out
• Zach Tellman - Predictably Fast Clojure
• Tom Crayford - Performance and Lies
• Jonas Östlund - Faster Computations with Generative Expressions
• Thomas Würthinger - Maximizing Performance with GraalVM
Naked Performance
• We can create fast libraries and web apps with Clojure
1. Mind the Little Things (for the 3% of code)
2. Data and Compilers (also better design & DX)
3. Right Algorithms
4. Embrace Java (and the GraalVM)
• Always Measure, http://clojure-goes-fast.com/
Thanks.
Tommi Reiman
tommi@metosin.fi
@ikitommi
https://www.github.com/metosin
talvi.io
