
Speed bumps ahead


Things in Clojure that can slow down your program and ways to deal with them. Slides from ClojureX 2018 talk.

Published in: Software

  1. 12/4/2018 Speed bumps ahead http://localhost:3002/clojurex-2018/?print-pdf
  2. WHENCE?
      NLP Infrastructure Technical Lead @ Grammarly
      Clojure, Common Lisp, Java
      Services that improve writing of 30 million users (15 million daily)
  3. WHY SHOULD YOU CARE ABOUT PERFORMANCE?
      premature optimization is the root of all evil. - Donald Knuth
  4. WHY SHOULD YOU CARE ABOUT PERFORMANCE?
      "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - Donald Knuth
  5. PERFORMANCE OPTIMIZATION FALLACIES
      • "Hardware is cheap, programmers are expensive." A.K.A. "Just throw more machines into it."
  6. PERFORMANCE OPTIMIZATION FALLACIES
       3 c5.9xlarge EC2 instances:  $3,351 monthly
      10 c5.9xlarge EC2 instances: $11,170 monthly
      Is it worth spending 1 person-month to optimize from 10 to 3? Probably.
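The arithmetic behind that "probably" can be checked at the REPL. This is a minimal sketch that uses only the two monthly prices quoted on the slide; the var names are made up for illustration, and the cost of a person-month is left out because the slides do not state one.

```clojure
;; Monthly prices quoted on the slide (USD).
(def cost-10-instances 11170)
(def cost-3-instances 3351)

;; What going from 10 instances to 3 saves.
(def monthly-savings (- cost-10-instances cost-3-instances))
(def yearly-savings (* 12 monthly-savings))

(println "Saved per month:" monthly-savings) ; prints 7819
(println "Saved per year: " yearly-savings)  ; prints 93828
```

Whether that pays for the optimization work depends on what a person-month costs in your organization.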
  7. PERFORMANCE OPTIMIZATION FALLACIES
      • "Hardware is cheap, programmers are expensive." A.K.A. "Just throw more machines into it."
      • "Docker/Kubernetes/microservices/cloud/whatever allows you to scale horizontally."
  8. PERFORMANCE OPTIMIZATION FALLACIES
      There's no such thing as effortless horizontal scaling. At each next order of magnitude you get new headaches:
      • More infrastructure (balancers, service discovery, queues, …)
      • Configuration management
      • Observability
      • Deployment story
      • Debugging story
      • Complexity of setting up testing environments
      • Whole bunch of second-order effects
      • Mental tax
      You hire more devops/platform engineers/SREs to deal with this.
  9. PERFORMANCE OPTIMIZATION FALLACIES
      • "Hardware is cheap, programmers are expensive." A.K.A. "Just throw more machines into it."
      • "Docker/Kubernetes/microservices/cloud/whatever allow us to scale horizontally."
  10. WHY SHOULD YOU CARE ABOUT PERFORMANCE?
      The ability to distinguish between those 97% and 3% is crucial in building effective software. That ability requires:
      • Knowledge
      • Tools
      • Experience
      Experience comes from practice.
  11. WHAT CLOJURE HAS TO DO WITH ANY OF THIS?
  12. CLOJURE IS FAST
      • Dynamically compiled language
      • World-class JVM JIT for free
      • Data structures with performance in mind
      • Conservative polymorphism features
      • Ability to drop down to Java where necessary
  13. CLOJURE IS VERSATILE
      The REPL is the best software design tool you can get. Applies to performance work too.
      Hundreds of people work on creating tools for measuring and improving performance on the JVM. Easily usable from Clojure.
  14. WAYS TO MEASURE HOW FAST/SLOW SOMETHING IS
      1. "Feels slow"
      2. Wrist stopwatch
      3. (time ...)
      4. (time (dotimes [_ 10000] ...))
      5. Criterium
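The middle steps of that ladder can be sketched with nothing but clojure.core. The `work` function and the `mean-nanos` helper below are hypothetical names invented for this illustration; the helper is a crude stand-in and not how Criterium works, which additionally handles statistics and GC noise.

```clojure
;; Hypothetical workload, used only for illustration.
(defn work [] (reduce + (range 1000)))

;; (time ...) measures a single cold run: JIT warm-up and GC noise are included.
(time (work))

;; Repeating the call amortizes some of the noise, but only a total is reported.
(time (dotimes [_ 10000] (work)))

;; Hand-rolled middle ground: warm up first, then report the mean per call.
(defn mean-nanos
  "Calls f `warmup` times untimed, then `n` times timed.
  Returns the mean wall-clock nanoseconds per call."
  [f warmup n]
  (dotimes [_ warmup] (f))
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (/ (double (- (System/nanoTime) start)) n)))

(println "mean ns per call:" (mean-nanos work 10000 10000))
```

For anything that will inform a decision, a benchmarking library such as Criterium is the safer choice.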
  15. REFLECTION
  16. REFLECTION
          (require '[criterium.core :as crit])
          (def s "This gotta be good")
          (crit/quick-bench (.substring s 5 18))
          ;; Execution time mean : 2.760464 µs
          (crit/quick-bench (.substring ^String s 5 18))
          ;; Execution time mean : 14.897897 ns
      185x speedup! But why?
  17. REFLECTION
      Reflection is Java's introspection mechanism for resolving and calling the program's building blocks (classes, fields, methods) at runtime.
      In the same spirit as Clojure's resolve, ns-publics, apply.
      The common explanation is "reflection is slow".
  18. REFLECTION
      We can use Java Reflection directly from Clojure.
          (import 'java.lang.reflect.Method)
          (def m (.getDeclaredMethod String "substring"
                                     (into-array Class [Integer/TYPE Integer/TYPE])))
          ;; returns java.lang.reflect.Method object
          (crit/quick-bench
           (.invoke ^Method m s (object-array [(Integer. 5) (Integer. 18)])))
          ;; Execution time mean : 107.801748 ns
      Turns out the reflective call itself is not that slow. Maybe it's the resolution of the method?
          (crit/quick-bench
           (let [^Method m (.getDeclaredMethod String "substring"
                                               (into-array Class [Integer/TYPE Integer/TYPE]))]
             (.invoke m s (object-array [(Integer. 5) (Integer. 18)]))))
          ;; Execution time mean : 648.579085 ns
  19. REFLECTION
      What's really going on when Clojure performs a reflective call?
      One way is to dig into clojure.lang.Compiler (9k SLOC).
      Another way is to use the clj-java-decompiler library.
  20. CLJ-JAVA-DECOMPILER
      https://github.com/clojure-goes-fast/clj-java-decompiler
      Non-reflective call:
          (require '[clj-java-decompiler.core :refer [decompile]])
          (decompile (.substring ^String s 5 18))
      And you'll get:
          Var const__0 = RT.var("slides", "s");
          // ...
          ((String)const__0.getRawRoot()).substring(RT.intCast(5L), RT.intCast(18L));
  21. CLJ-JAVA-DECOMPILER
      Reflective call:
          (decompile (.substring s 5 18))
          Var const__0 = RT.var("slides", "s");
          Object const__1 = 5L;
          Object const__2 = 18L;
          // ...
          Reflector.invokeInstanceMethod(const__0.getRawRoot(), "substring",
                                         new Object[] { const__1, const__2 });
  22. INSIDE CLOJURE/LANG/REFLECTOR.JAVA
          // Simplified excerpt: exception handling and argument coercion are omitted.
          static Object invokeInstanceMethod(Object target, String methodName, Object[] args) {
              Class c = target.getClass();
              List<Method> methods = getMethods(c, args.length, methodName, false);
              return invokeMatchingMethod(methodName, methods, target, args);
          }
          static List<Method> getMethods(Class c, int arity, String name, boolean getStatics) {
              ArrayList<Method> methods = new ArrayList<>();
              for (Method m : c.getMethods())      // every public method of the class
                  if (name.equals(m.getName()))
                      methods.add(m);
              return methods;
          }
          static Object invokeMatchingMethod(String methodName, List<Method> methods,
                                             Object target, Object[] args) {
              Method foundm = null;
              for (Method m : methods) {
                  Class[] params = m.getParameterTypes();
                  if (isCongruent(params, args))   // do the argument types fit?
                      foundm = m;
              }
              return foundm.invoke(target, args);
          }
  23. REFLECTION
      On a reflective call, Clojure looks through all methods of the class linearly, at runtime.
      No wonder reflective calls are so slow!
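To get a feel for how much there is to scan, the same kind of lookup can be imitated from Clojure. This is a rough sketch, not the code Clojure runs: `candidate-methods` is a hypothetical helper written for this illustration, and it matches on name and parameter count only.

```clojure
(import 'java.lang.reflect.Method)

(defn candidate-methods
  "Linear scan over every public method of `cls`, keeping those whose
  name and parameter count match."
  [^Class cls ^String method-name arity]
  (filter (fn [^Method m]
            (and (= method-name (.getName m))
                 (= arity (.getParameterCount m))))
          (.getMethods cls)))

;; How many methods have to be walked for a single reflective call on a String.
(println "public methods on String:" (count (.getMethods String)))

;; The candidates that survive the scan for (.substring s 5 18).
(doseq [m (candidate-methods String "substring" 2)]
  (println (str m)))
```

A type hint lets the compiler do this resolution once, at compile time, instead of on every call.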
  24. WAYS TO COMBAT REFLECTION
      • Enable *warn-on-reflection*
      • Use type hints
          (set! *warn-on-reflection* true)
          (.substring s 5 18)
          ;; Reflection warning, .../slides.clj:114:12 - call to
          ;; method substring can't be resolved (target class is unknown).
      And occasionally check with clj-java-decompiler.
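Type hints can go in a few different places. A small sketch of the three most common ones follows; the function names are hypothetical and exist only for this example.

```clojure
;; Hint on the argument: interop calls on `s` in the body resolve at compile time.
(defn tail [^String s]
  (.substring s 5))

;; Hint on the return value (placed before the argument vector):
;; callers can use String methods on the result without hinting again.
(defn greeting ^String [who]
  (str "Hello, " who))

(defn greeting-length [who]
  (.length (greeting who)))

;; Hint at the call site, for values whose type the compiler cannot see.
(defn upper [x]
  (.toUpperCase ^String x))

(println (tail "This gotta be good")) ; prints gotta be good
(println (greeting-length "Bob"))     ; prints 10
(println (upper "fast"))              ; prints FAST
```

With *warn-on-reflection* enabled, none of these definitions should produce a warning.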
  25. SHOULD REFLECTION BE WEEDED OUT EVERYWHERE?
      There's nothing wrong with having a zero-reflection policy.
      But a few stray reflection calls won't hurt if they aren't called often.
      You should profile to know for sure.
  26. CLJ-ASYNC-PROFILER
      The most convenient profiler-as-a-library for Clojure.
      https://github.com/clojure-goes-fast/clj-async-profiler
          (require '[clj-async-profiler.core :as prof])
          (prof/profile (crit/quick-bench (.substring s 5 18)))
  27. [Flame graph of the profiled reflective benchmark. Visible frames include clojure/lang/Reflector.invokeInstanceMethod, clojure/lang/Reflector.getMethods, java/lang/Class.getMethods and java/lang/Class.privateGetPublicMethods, on top of Criterium and REPL frames.]
  28. [Flame graph, continued: nREPL and java/util/concurrent/ThreadPoolExecutor frames at the base of the stack.]
  29. [Flame graph. Visible frames include boot/user$run, boot/user$parse_edn (clojure/edn$read_string, clojure/lang/EdnReader) and boot/user$parse_json (cheshire/core$parse_string, cheshire/parse$parse).]
  30. [Flame graph, continued: nREPL and java/util/concurrent/ThreadPoolExecutor frames at the base of the stack.]
  31. CLJ-ASYNC-PROFILER
      Profiler that is controllable from your code.
      Instant feedback without leaving the REPL.
      Flamegraphs are a great representation. Intuitive and portable.
  32. BOXING
  33. BOXING
  34. BOXING
      Boxing means wrapping primitive types into objects.
          (let [nums (vec (range 1e6))]
            (crit/quick-bench (reduce + nums)))
          ;; Execution time mean : 18.384708 ms
          (let [^longs nums (into-array Long/TYPE (range 1e6))]
            (crit/quick-bench
             (areduce nums i acc 0 (+ acc (aget nums i)))))
          ;; Execution time mean : 971.487253 µs
      19x difference, not bad!
  35. BOXING
          (decompile
           (let [^longs nums (into-array Long/TYPE (range 1e6))]
             (areduce nums i acc 0 (+ acc (aget nums i)))))
          final Object nums = core$into_array.invokeStatic(
              (Object)Long.TYPE, core$range.invokeStatic(100000));
          final int lng = ((long[])nums).length;
          long i = 0L;
          long acc = 0L;
          while (i < lng) {
              final long n = RT.intCast(i) + 1;
              acc = Numbers.add(acc, ((long[])nums)[RT.intCast(i)]);
              i = n;
          }
          return Numbers.num(acc);
  36. WAYS TO COMBAT BOXING
      • Profile to ensure that boxing is really a problem.
      • Arrays instead of lists and vectors.
      • Primitive type hints and casts.
      • (set! *unchecked-math* :warn-on-boxed)
  37. :WARN-ON-BOXED
          (set! *unchecked-math* :warn-on-boxed)
          (let [init (fn [] 1)]
            (loop [i (init), res (init)]
              (if (< i 10)
                (recur (inc i) (* res i))
                res)))
          ;; Boxed math warning, ../slides.clj:25:9 -
          ;; call: clojure.lang.Numbers.lt(Object,long).
          ;; Boxed math warning, ../slides.clj:26:14 -
          ;; call: clojure.lang.Numbers.unchecked_inc(Object).
          ;; Boxed math warning, ../slides.clj:26:22 -
          ;; call: clojure.lang.Numbers.unchecked_multiply(Object,Object).
  38. :WARN-ON-BOXED
          (set! *unchecked-math* :warn-on-boxed)
          (let [init (fn [] 1)]
            (loop [i (long (init)), res (long (init))]
              (if (< i 10)
                (recur (inc i) (* res i))
                res)))
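Primitive hints and casts can also sit on function boundaries, so that a loop never sees a boxed value. A minimal sketch with hypothetical function names; the second function repeats the cast from the slide above in a self-contained form.

```clojure
;; ^long on the parameter and on the return value keeps both as primitives.
(defn sum-of-squares ^long [^long n]
  (loop [i 0, acc 0]              ; integer literals start the loop with primitive longs
    (if (< i n)
      (recur (inc i) (+ acc (* i i)))
      acc)))

;; `init` returns Object as far as the compiler knows, so each value is
;; cast to long once, at the loop boundary.
(defn product-below-10 [init]
  (loop [i (long (init)), res (long (init))]
    (if (< i 10)
      (recur (inc i) (* res i))
      res)))

(println (sum-of-squares 10))              ; prints 285
(println (product-below-10 (fn [] 1)))     ; prints 362880
```

Checking the result with :warn-on-boxed or a decompiler is still worthwhile, since a single unhinted value can bring boxing back.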
  39. WAYS TO COMBAT BOXING
      • Profile to ensure that boxing is really a problem.
      • Arrays instead of lists and vectors.
      • Primitive type hints.
      • (set! *unchecked-math* :warn-on-boxed)
      • clj-java-decompiler
  40. CLJ-JAVA-DECOMPILER
          (decompile
           (let [init (fn [] 1)]
             (loop [i (init), res (init)]
               (if (< i 10)
                 (recur (inc i) (* res i))
                 res))))
          Object init = new slides$fn__17198$init__17199();
          Object i = ((IFn)init).invoke();
          Object res = ((IFn)init).invoke();
          while (Numbers.lt(i, 10L)) {
              final Object i2 = Numbers.inc(i);
              res = Numbers.multiply(res, i);
              i = i2;
          }
          return res;
  41. CLJ-JAVA-DECOMPILER
          (decompile
           (let [init (fn [] 1)]
             (loop [i (long (init)), res (long (init))]
               (if (< i 10)
                 (recur (inc i) (* res i))
                 res))))
          Object init = new slides$fn__17198$init__17199();
          long i = ((IFn)init).invoke();
          long res = ((IFn)init).invoke();
          while (Numbers.lt(i, 10L)) {
              final Object i2 = Numbers.inc(i);
              res = Numbers.multiply(res, i);
              i = i2;
          }
          return res;
  42. WAYS TO COMBAT BOXING
      • Profile to ensure that boxing is really a problem.
      • Arrays instead of lists and vectors.
      • Primitive type hints.
      • (set! *unchecked-math* :warn-on-boxed)
      • clj-java-decompiler
      • Write number crunching in Java.
  43. WRITE JAVA IN STYLE
      Compile Java code without leaving or restarting your REPL.
      Use new classes immediately in your Clojure code.
      You still have access to all Clojure development tools.
      https://github.com/ztellman/virgil
  44. INSUFFICIENT MEMORY
  45. INSUFFICIENT MEMORY
      If you don't specify -Xmx, the JVM will start with the default maximum heap size. Usually, it's 1/4 of available RAM.
          $ java -XX:+PrintFlagsFinal -version | grep MaxHeapSize
          uintx MaxHeapSize := 2353004544
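The same limit can be read from inside a running REPL through java.lang.Runtime. A minimal sketch; `heap-limits` is a hypothetical helper name, and the figures it prints depend on the machine and JVM flags.

```clojure
(defn heap-limits
  "Returns the current JVM heap figures in megabytes."
  []
  (let [rt (Runtime/getRuntime)
        mb #(quot % (* 1024 1024))]
    {:max-mb   (mb (.maxMemory rt))      ; upper bound the heap may grow to
     :total-mb (mb (.totalMemory rt))    ; heap currently reserved by the JVM
     :free-mb  (mb (.freeMemory rt))}))  ; unused part of the reserved heap

(println (heap-limits))
```

If :max-mb is much smaller than expected, the process is probably running on the default instead of an explicit -Xmx.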
  46. INSUFFICIENT MEMORY
      If there's not enough free memory, GC might get too busy and slow you down.
          $ clj -J-Xmx2g
          user=> (time (dotimes [_ 10] (reduce + (vec (repeat 5e7 1)))))
          ;; Elapsed time: 17084.184121 msecs
          user=> (def live-set (byte-array 1.3e9))
          user=> (time (dotimes [_ 10] (reduce + (vec (repeat 5e7 1)))))
          ;; Elapsed time: 125614.873082 msecs
      Same code, 7x slower without any apparent reason.
  47. WAYS TO DETECT MEMORY SHORTAGE
      • (In development) VisualVM
  48. VISUALVM
      Normal: [VisualVM screenshot]
  49. VISUALVM
      GC is overworked: [VisualVM screenshot]
  50. WAYS TO DETECT MEMORY SHORTAGE
      • (In development) VisualVM
      • (In production) VisualVM over JMX, jstat, …
      • clj-memory-meter to understand what occupies memory.
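The numbers that VisualVM and jstat display are also available in-process through the standard java.lang.management beans. This is a sketch under that assumption, with hypothetical helper names; it is not something the talk prescribes.

```clojure
(import '(java.lang.management ManagementFactory GarbageCollectorMXBean))

(defn gc-stats
  "Cumulative collection count and time (ms) per collector since JVM start."
  []
  (for [^GarbageCollectorMXBean gc (ManagementFactory/getGarbageCollectorMXBeans)]
    {:collector (.getName gc)
     :count     (.getCollectionCount gc)
     :time-ms   (.getCollectionTime gc)}))

(defn gc-time-during
  "Runs thunk f and returns the GC time in ms that accumulated while it ran."
  [f]
  (let [total  #(reduce + (map :time-ms (gc-stats)))
        before (total)]
    (f)
    (- (total) before)))

(doseq [s (gc-stats)] (println s))
(println "GC ms during workload:"
         (gc-time-during #(reduce + (vec (repeat 1000000 1)))))
```

A workload whose GC time is a large share of its wall-clock time is a hint that the heap is too small for the live set.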
  51. CLJ-MEMORY-METER
      Reports how much heap an object consumes.
      https://github.com/clojure-goes-fast/clj-memory-meter
          (require '[clj-memory-meter.core :as mm])
          (mm/measure "Hello, world!")
          ;; "72 B"
          (mm/measure (reduce #(assoc %1 %2 (str %2)) {} (range 100)))
          ;; "9.6 KB"
          (mm/measure (vec (repeat 5e7 1)))
          ;; "258.4 MB"
  52. IMMUTABILITY
  53. IMMUTABILITY
      We love immutability, but sometimes, it is unnecessary.
          (crit/quick-bench
           (let [obj (Object.)]
             (loop [i 0, res []]
               (if (< i 1e6)
                 (recur (inc i) (conj res obj))
                 res))))
          ;; Execution time mean : 31.455536 ms
  54. WAYS TO COMBAT IMMUTABILITY
      • Profiler
      • Transients
          (crit/quick-bench
           (let [obj (Object.)]
             (loop [i 0, res (transient [])]
               (if (< i 1e6)
                 (recur (inc i) (conj! res obj))
                 (persistent! res)))))
          ;; Execution time mean : 14.115719 ms
      2.2x speedup.
  55. WAYS TO COMBAT IMMUTABILITY
      • Profiler
      • Transients
      • Mutable Java collections
          (import 'java.util.ArrayList)
          (crit/quick-bench
           (let [obj (Object.)
                 res (ArrayList.)]
             (loop [i 0]
               (when (< i 1e6)
                 (.add res obj)
                 (recur (inc i))))
             res))
          ;; Execution time mean : 6.344132 ms
      5x speedup.
  56. CAVEAT EMPTOR
      If you need the resulting collection to be a Clojure structure, transients are more efficient than Java classes.
          (import 'java.util.ArrayList)
          (crit/quick-bench
           (let [obj (Object.)
                 res (ArrayList.)]
             (loop [i 0]
               (when (< i 1e6)
                 (.add res obj)
                 (recur (inc i))))
             (vec res)))
          ;; Execution time mean : 19.435359 ms
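The transient pattern keeps the mutation private to one function, so callers still receive an ordinary persistent vector. A small sketch with hypothetical function names; no timings are claimed here, only that the three variants produce the same value.

```clojure
(defn collect-persistent
  "Builds a vector of n copies of x with plain conj."
  [n x]
  (loop [i 0, res []]
    (if (< i n)
      (recur (inc i) (conj res x))
      res)))

(defn collect-transient
  "Same result, but the intermediate vector is a transient that never escapes."
  [n x]
  (loop [i 0, res (transient [])]
    (if (< i n)
      (recur (inc i) (conj! res x))
      (persistent! res))))

;; `into` uses a transient internally when the target collection supports it,
;; so it is often the simplest way to get the same effect.
(defn collect-into [n x]
  (into [] (repeat n x)))

(println (= (collect-persistent 1000 :x)
            (collect-transient 1000 :x)
            (collect-into 1000 :x)))       ; prints true
```

Note that conj! returns the transient to keep using; relying on the original reference being updated in place is a bug.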
  57. LAZINESS
  58. LAZINESS
      • Increases allocation pressure -> more work for GC
      • Worse memory locality
      • Harder to debug and profile
  59. LAZINESS
      Everyone did this at least once in their career:
          (time (dotimes [_ 1e6] (map inc (range 1e6))))
          ;; Elapsed time: 30.931708 msecs
      Wow, Clojure is fast! /s
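Why that benchmark measures almost nothing can be shown with a call counter. A minimal sketch; `calls` and `counted-inc` are hypothetical names introduced for this illustration.

```clojure
(def calls (atom 0))

(defn counted-inc
  "Like inc, but records every invocation in the `calls` atom."
  [x]
  (swap! calls inc)
  (inc x))

;; Building the lazy seq does no work yet.
(def lazy-result (map counted-inc (range 10)))
(println "calls after map:  " @calls)   ; prints 0

;; doall walks the whole seq and forces every element.
(def forced (doall lazy-result))
(println "calls after doall:" @calls)   ; prints 10
```

A timing loop around (map ...) alone therefore times the creation of an unrealized seq; wrapping the expression in doall or dorun makes the work happen inside the measurement.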
  60. LAZINESS
          (defn burn-cpu []
            (let [start (System/nanoTime)]
              (loop [res 0]
                (if (< (- (System/nanoTime) start) 1e9)
                  (recur (inc res))
                  res))))
          (prof/profile
           (let [nums (map (fn [_] (burn-cpu)) (range 10))]
             (reduce + nums)))
  61. LAZINESS
      [Flame graph of the profiled code. Visible frames include clojure/core$reduce, clojure/core/protocols$seq_reduce, clojure/lang/LazySeq.seq, clojure/lang/LazySeq.sval, clojure/core$map$fn__5587 and slides$burn_cpu.]
  62. [Flame graph, continued: nREPL and java/util/concurrent/ThreadPoolExecutor frames at the base of the stack.]
  63. WAYS TO COMBAT LAZINESS
      • doall, mapv, filterv
      • Transducers
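The options above can be compared side by side on a toy pipeline. A minimal sketch using only clojure.core; the var names are hypothetical.

```clojure
;; Lazy version: each step returns a lazy seq that is realized during reduce.
(def lazy-total
  (reduce + (filter even? (map inc (range 10)))))

;; Eager vectors via mapv/filterv: no laziness, but one intermediate vector per step.
(def eager-total
  (reduce + (filterv even? (mapv inc (range 10)))))

;; Transducer: both steps run in a single pass with no intermediate collection.
(def xf (comp (map inc) (filter even?)))
(def transduced-total (transduce xf + 0 (range 10)))
(def transduced-vec (into [] xf (range 10)))

(println lazy-total eager-total transduced-total) ; prints 30 30 30
(println transduced-vec)                          ; prints [2 4 6 8 10]
```

All three produce the same values; they differ in when the work happens and in how much is allocated along the way.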
  64. TOP SPEED BUMPS SO FAR
      • Reflection, Boxing, Insufficient memory, Immutability, Laziness
  65. TOP SPEED BUMPS SO FAR
      • Reflection, Boxing, Insufficient memory, Immutability, Laziness
      • Redundant allocations, Coarsely-synchronized data structures, Context switching overhead, …
  66. TOP SPEED BUMPS SO FAR
      • Reflection, Boxing, Insufficient memory, Immutability, Laziness
      • Redundant allocations, Coarsely-synchronized data structures, Context switching overhead, …
      • GC pauses, Megamorphic callsites, Heap fragmentation, …
  67. TOP SPEED BUMPS SO FAR
      • Reflection, Boxing, Insufficient memory, Immutability, Laziness
      • Redundant allocations, Coarsely-synchronized data structures, Context switching overhead, …
      • GC pauses, Megamorphic callsites, Heap fragmentation, …
      • Cache incoherence, TLB misses (page walks), Branch misprediction, NUMA foreign access, …
  68. TOP SPEED BUMPS SO FAR
      • Reflection, Boxing, Insufficient memory, Immutability, Laziness
      • Redundant allocations, Coarsely-synchronized data structures, Context switching overhead, …
      • GC pauses, Megamorphic callsites, Heap fragmentation, …
      • Cache incoherence, TLB misses (page walks), Branch misprediction, NUMA foreign access, …
      • Magnetic disturbances, CPU overheating, …
  69. PERFORMANCE IS HARD
      • Abstractions are constantly leaking.
      • The more you learn, the less you know.
      • Your assumptions are constantly getting invalidated.
  70. PERFORMANCE IS FUN AND USEFUL
      • You learn things behind those leaky abstractions.
      • You get a more holistic view of the system.
      • You save money and the environment.
  71. PERFORMANCE PROBLEMS ARE NOT UNIQUE TO CLOJURE
      But we are in a great position to solve them.
      • There is plenty of prior art, especially for the JVM.
      • Tools, blog posts, experiments, reports.
      • The REPL allows us to use all of this much more easily.
  72. TOOLS
      • Criterium: https://github.com/hugoduncan/criterium
      • clj-java-decompiler: https://github.com/clojure-goes-fast/clj-java-decompiler
      • clj-async-profiler: https://github.com/clojure-goes-fast/clj-async-profiler
      • clj-memory-meter: https://github.com/clojure-goes-fast/clj-memory-meter
      • virgil: https://github.com/ztellman/virgil
      • VisualVM: https://visualvm.github.io
      • JMH: http://clojure-goes-fast.com/blog/using-jmh-with-clojure-part1/
  73. RESOURCES
      • http://clojure-goes-fast.com
      • https://groups.google.com/forum/#!forum/mechanical-sympathy
      • Aleksey Shipilëv's blog: https://shipilev.net/
      • Nitsan Wakart's blog: http://psy-lob-saw.blogspot.com/
  74. INSTEAD OF A CONCLUSION
      First, make it work. Then, make it right. Then, make it fast.
      But please, don't stop at the first.
  75. (into [] (map answer) questions)
