
Profiling and optimizing Go programs

From this presentation you will learn:
- about most of the tools in Go's arsenal that are meant for performance optimization;
- how and when to use these tools, with a look at how they work internally;
- how applicable the Linux perf utility is to optimizing Go programs.
In addition, there is a short crash course in which we optimize several small Go programs step by step using the tools listed above.

Published in: Technology


  1. 5/12/2016 Profiling and optimizing Go programs http://localhost:3999/gomeetup.slide#1 1/84 Profiling and optimizing Go programs 14 July 2016 Marko Kevac Software Engineer, Badoo
  2. Introduction
  3. What is profiling and optimization?
  4. Profiling on Linux
  5. Profiling on OS X: profiling was fixed in El Capitan. Previous versions need a binary patch: godoc.org/rsc.io/pprof_mac_fix (https://godoc.org/rsc.io/pprof_mac_fix)
  6. CPU: github.com/gperftools/gperftools (https://github.com/gperftools/gperftools)
  7. CPU: pprof is a sampling profiler. Profilers in Go can be started in different ways, but each of them breaks down into a collection phase and a visualization phase. Example.
  8. Example
        package perftest

        import (
            "regexp"
            "strings"
            "testing"
        )

        var haystack = `Lorem ipsum dolor sit amet ... auctor ... elit ...`

        func BenchmarkSubstring(b *testing.B) {
            for i := 0; i < b.N; i++ {
                strings.Contains(haystack, "auctor")
            }
        }

        func BenchmarkRegex(b *testing.B) {
            for i := 0; i < b.N; i++ {
                regexp.MatchString("auctor", haystack)
            }
        }
  9. Benchmark
        $ go test -bench=.
        testing: warning: no tests to run
        BenchmarkSubstring-8    10000000     194 ns/op
        BenchmarkRegex-8          200000    7516 ns/op
        PASS
        ok  github.com/mkevac/perftest00  3.789s
  10. Profiling
        $ GOGC=off go test -bench=BenchmarkRegex -cpuprofile cpu.out
        testing: warning: no tests to run
        BenchmarkRegex-8          200000    6773 ns/op
        PASS
        ok  github.com/mkevac/perftest00  1.491s
      GOGC=off turns off the garbage collector. Turning off the GC can be beneficial for short programs. When started with -cpuprofile, go test leaves the test binary in the working directory.
  11. Visualization
      Linux
        $ go tool pprof perftest00.test cpu.out
        (pprof) web
      OS X
        $ open https://www.xquartz.org
        $ ssh -Y server
        $ go tool pprof perftest00.test cpu.out
        (pprof) web
      Other
        $ go tool pprof -svg ./perftest00.test ./cpu.out > cpu.svg
        $ scp ...
        $ open cpu.svg
  12. Visualization
  14. Visualization
  15. Fix
        package perftest

        import (
            "regexp"
            "strings"
            "testing"
        )

        var haystack = `Lorem ipsum dolor sit amet ... auctor ... elit ...`
        var pattern = regexp.MustCompile("auctor")

        func BenchmarkSubstring(b *testing.B) {
            for i := 0; i < b.N; i++ {
                strings.Contains(haystack, "auctor")
            }
        }

        func BenchmarkRegex(b *testing.B) {
            for i := 0; i < b.N; i++ {
                pattern.MatchString(haystack)
            }
        }
  16. Benchmark
        $ go test -bench=.
        testing: warning: no tests to run
        BenchmarkSubstring-8    10000000     170 ns/op
        BenchmarkRegex-8         5000000     297 ns/op
        PASS
        ok  github.com/mkevac/perftest01  3.685s
      What about the call graph?
  17. Visualization: we don't see compilation at all.
  18. Ways to start the CPU profiler
      1. go test -cpuprofile=cpu.out
      2. pprof.StartCPUProfile() and pprof.StopCPUProfile(), or Dave Cheney's great package github.com/pkg/profile (https://github.com/pkg/profile)
      3. import _ "net/http/pprof"
      Example
  19. Example
        package main

        import (
            "net/http"
            _ "net/http/pprof"
        )

        func cpuhogger() {
            var acc uint64
            for {
                acc += 1
                if acc&1 == 0 {
                    acc <<= 1
                }
            }
        }

        func main() {
            go http.ListenAndServe("0.0.0.0:8080", nil)
            cpuhogger()
        }
  20. Visualization
        $ go tool pprof http://localhost:8080/debug/pprof/profile?seconds=5
        (pprof) web
        (pprof) top
        4.99s of 4.99s total (  100%)
              flat  flat%   sum%        cum   cum%
             4.99s   100%   100%      4.99s   100%  main.cpuhogger
                 0     0%   100%      4.99s   100%  runtime.goexit
                 0     0%   100%      4.99s   100%  runtime.main
        (pprof) list cpuhogger
        Total: 4.99s
        No source information for main.cpuhogger
      No disassembly? No source code? We need the binary.
  21. Visualization
        $ go tool pprof pproftest http://localhost:8080/debug/pprof/profile?seconds=5
        (pprof) list cpuhogger
        Total: 4.97s
        ROUTINE ======================== main.cpuhogger in /home/marko/goprojects/src/github.com/mkevac/pproft
             4.97s      4.97s (flat, cum)   100% of Total
                 .          .      6:)
                 .          .      7:
                 .          .      8:func cpuhogger() {
                 .          .      9:    var acc uint64
                 .          .     10:    for {
             2.29s      2.29s     11:        acc += 1
             1.14s      1.14s     12:        if acc&1 == 0 {
             1.54s      1.54s     13:            acc <<= 1
                 .          .     14:        }
                 .          .     15:    }
                 .          .     16:}
                 .          .     17:
                 .          .     18:func main() {
  22. Visualization
        (pprof) disasm cpuhogger
        Total: 4.97s
        ROUTINE ======================== main.cpuhogger
             4.97s      4.97s (flat, cum)   100% of Total
                 .          .     401000: XORL AX, AX
             1.75s      1.75s     401002: INCQ AX
             1.14s      1.14s     401005: TESTQ $0x1, AX
                 .          .     40100b: JNE 0x401002
             1.54s      1.54s     40100d: SHLQ $0x1, AX
             540ms      540ms     401010: JMP 0x401002
                 .          .     401012: INT $0x3
      Why? Let's dig deeper.
  23. Why?
        $ curl http://localhost:8080/debug/pprof/profile?seconds=5 -o /tmp/cpu.log
        $ strings /tmp/cpu.log | grep cpuhogger
      What pprof needs:
      - /debug/pprof/symbol for acquiring symbols
      - the binary for disassembly
      - the binary and the source code for source listings
      Currently there is no way to specify the path to the source code (like the "dir" command in gdb) :-(
      The binary that you give to pprof and the binary that is running must be the same!
      Not deep enough?
  24. How does pprof work?
      1. Current desktop and server OSes implement preemptive scheduling (https://en.wikipedia.org/wiki/Preemption_(computing)), or preemptive multitasking (as opposed to cooperative multitasking).
      2. Hardware sends a signal to the OS, and the OS runs the scheduler, which can preempt the running process and put another process in its place.
      3. pprof works in a similar fashion.
      4. man setitimer (http://man7.org/linux/man-pages/man2/setitimer.2.html) and SIGPROF
      5. Go sets a handler for SIGPROF which gets and saves stack traces of the running goroutines/threads.
      6. A separate goroutine hands this data to the user.
      A bug in SIGPROF signal delivery (http://research.swtch.com/macpprof) was the reason why profiling on OS X before El Capitan did not work.
  25. How does pprof work?
      Cons
      1. Signals are not cheap. Do not expect more than 500 signals per second. The default frequency in the Go runtime is 100 Hz.
      2. In non-standard builds (-buildmode=c-archive or -buildmode=c-shared) the profiler does not work by default.
      3. A user-space process does not have access to the kernel stack trace.
      Pros
      The Go runtime has full knowledge of its own internals.
  26. Linux system profilers
        package main

        import (
            "regexp"
            "strings"
        )

        var haystack = `Lorem ipsum dolor sit amet ... auctor ... elit ...`

        func UsingSubstring() bool {
            found := strings.Contains(haystack, "auctor")
            return found
        }

        func UsingRegex() bool {
            found, _ := regexp.MatchString("auctor", haystack)
            return found
        }

        func main() {
            go func() {
                for {
                    UsingSubstring()
                }
            }()
            for {
                UsingRegex()
            }
        }
  27. Systemtap: Systemtap script -> C code -> kernel module. The stap utility does all of this for you, including loading and unloading the kernel module.
  28. Systemtap
      Getting the probe list:
        $ stap -l 'process("systemtap").function("main.*")'
        process("systemtap").function("main.UsingRegex@main.go:16")
        process("systemtap").function("main.UsingSubstring@main.go:11")
        process("systemtap").function("main.init@main.go:32")
        process("systemtap").function("main.main.func1@main.go:22")
        process("systemtap").function("main.main@main.go:21")
      Getting the probe list with function arguments:
        $ stap -L 'process("systemtap").function("runtime.mallocgc")'
        process("systemtap").function("runtime.mallocgc@src/runtime/malloc.go:553") $shouldhelpgc:bool $noscan:bool $scanSize:uintptr $dataSize:uintptr $x:void* $s:struct runtime.mspan* runtime.g* $size:uintptr $typ:runtime._type* $needzero:bool $~r3:void*
      Systemtap does not understand where Go keeps the return value, so we have to get it manually:
        printf("%d\n", user_int64(register("rsp") + 8))
  29. Systemtap
        global etime
        global intervals

        probe $1.call {
            etime = gettimeofday_ns()
        }

        probe $1.return {
            intervals <<< (gettimeofday_ns() - etime) / 1000
        }

        probe end {
            printf("Duration min:%dus avg:%dus max:%dus count:%d\n",
                   @min(intervals), @avg(intervals), @max(intervals),
                   @count(intervals))
            printf("Duration (us):\n")
            print(@hist_log(intervals));
            printf("\n")
        }
  30. Systemtap
        $ sudo stap main.stap 'process("systemtap").function("main.UsingSubstring")'
        ^CDuration min:0us avg:1us max:586us count:1628362
        Duration (us):
        value |-------------------------------------------------- count
            0 |                                                        10
            1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1443040
            2 |@@@@@                                              173089
            4 |                                                     6982
            8 |                                                     4321
           16 |                                                      631
           32 |                                                      197
           64 |                                                       74
          128 |                                                       13
          256 |                                                        4
          512 |                                                        1
         1024 |                                                        0
         2048 |                                                        0
  31. Systemtap
        $ ./systemtap
        runtime: unexpected return pc for main.UsingSubstring called from 0x7fffffffe000
        fatal error: unknown caller pc

        runtime stack:
        runtime.throw(0x494e40, 0x11)
            /home/marko/go/src/runtime/panic.go:566 +0x8b
        runtime.gentraceback(0xffffffffffffffff, 0xc8200337a8, 0x0, 0xc820001d40, 0x0, 0x0, 0x7fffffff, 0x7fff
            /home/marko/go/src/runtime/traceback.go:311 +0x138c
        runtime.scanstack(0xc820001d40)
            /home/marko/go/src/runtime/mgcmark.go:755 +0x249
        runtime.scang(0xc820001d40)
            /home/marko/go/src/runtime/proc.go:836 +0x132
        runtime.markroot.func1()
            /home/marko/go/src/runtime/mgcmark.go:234 +0x55
        runtime.systemstack(0x4e4f00)
            /home/marko/go/src/runtime/asm_amd64.s:298 +0x79
        runtime.mstart()
            /home/marko/go/src/runtime/proc.go:1087
  32. Systemtap: the crash happens when Go's garbage collector walks its call trace. It is probably caused by the trampoline that systemtap puts into our code to handle its probes. goo.gl/N8XH3p (https://goo.gl/N8XH3p) No fix yet. But Go is not alone: there are problems with the uretprobes trampoline in C++ too (https://sourceware.org/bugzilla/show_bug.cgi?id=12275) (2010-)
  33. Systemtap
        package main

        import (
            "bytes"
            "fmt"
            "math/rand"
            "time"
        )

        func ToString(number int) string {
            return fmt.Sprintf("%d", number)
        }

        func main() {
            r := rand.New(rand.NewSource(time.Now().UnixNano()))
            var buf bytes.Buffer
            for i := 0; i < 1000; i++ {
                value := r.Int() % 1000
                value = value - 500
                buf.WriteString(ToString(value))
            }
        }
  34. Systemtap
        global intervals

        probe process("systemtap02").function("main.ToString").call {
            intervals <<< $number
        }

        probe end {
            printf("Variables min:%dus avg:%dus max:%dus count:%d\n",
                   @min(intervals), @avg(intervals), @max(intervals),
                   @count(intervals))
            printf("Variables:\n")
            print(@hist_log(intervals));
            printf("\n")
        }
  35. Systemtap
        Variables min:-499us avg:8us max:497us count:1000
        Variables:
        value |-------------------------------------------------- count
         -256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 249
         -128 |@@@@@@@@@@@@@@@@@@@@ 121
          -64 |@@@@@@@@@@ 60
          -32 |@@@@@@ 36
          -16 |@@ 12
           -8 |@ 8
           -4 | 5
           -2 | 3
           -1 | 2
            0 | 2
            1 | 2
            2 | 3
            4 |@ 7
            8 | 4
           16 |@@@ 20
           32 |@@@@@ 33
           64 |@@@@@@@ 44
          128 |@@@@@@@@@@@@@@@@@@ 110
          256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 279
  36. perf and perf_events
        $ sudo perf top -p $(pidof systemtap)
  38. perf and perf_events
  40. Brendan Gregg
      Flame Graphs: www.brendangregg.com/flamegraphs.html (http://www.brendangregg.com/flamegraphs.html)
      Systems Performance: Enterprise and the Cloud: goo.gl/556Hs2 (http://goo.gl/556Hs2)
        $ sudo perf record -F 99 -g -p $(pidof systemtap) -- sleep 10
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.149 MB perf.data (1719 samples) ]
        $ sudo perf script | ~/tmp/FlameGraph/stackcollapse-perf.pl > out.perf-folded
        $ ~/tmp/FlameGraph/flamegraph.pl out.perf-folded > perf-kernel.svg
  41. Brendan Gregg Flame Graphs: kernel stack traces!
  42. Memory
      What if we were in the C/C++ world? Valgrind! Massif!
        #include <stdlib.h>
        #include <unistd.h>
        #include <string.h>

        int main() {
            const size_t MB = 1024 * 1024;
            const unsigned count = 20;
            char **buf = calloc(count, sizeof(*buf));
            for (unsigned i = 0; i < count; i++) {
                buf[i] = calloc(1, MB);
                memset(buf[i], 0xFF, MB);
                sleep(1);
            }
            for (unsigned i = 0; i < count; i++) {
                free(buf[i]);
                sleep(1);
            }
            free(buf);
        }
  43. Valgrind and Massif [Massif ASCII graph of heap usage over time: usage climbs step by step to a peak of about 26.20 on the vertical axis, then falls back step by step; the time axis runs from 0 to 39.13 s]
  44. Valgrind and Massif: Valgrind redefines all memory allocation functions (malloc, calloc, new, free, etc.). Go does not use them. Go has its own memory allocator, which uses mmap or sbrk.
  45. Memory: Valgrind can catch mmap/sbrk, but there is no point. All other memory profiling tools work in the same fashion. In theory we could use perf/systemtap, or we can use Go's rich internal tools.
  46. Memory: Go can collect information about allocations at a given sampling rate (by default, once per 512 KiB allocated). pprof can visualize it. As with CPU profiling, we have three ways to collect the data. Let's use net/http/pprof this time.
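      The sampling rate mentioned above is exposed as runtime.MemProfileRate. A minimal sketch of collecting a heap profile directly from code, rather than through net/http/pprof, might look like this; the helper heapProfile and the variable sink are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// sink is package-level so that the allocations below escape to the heap.
var sink [][]byte

// heapProfile (hypothetical helper) allocates n slices of 1 KiB and returns
// the current heap profile in the format that go tool pprof reads.
func heapProfile(n int) ([]byte, error) {
	for i := 0; i < n; i++ {
		sink = append(sink, make([]byte, 1024))
	}
	runtime.GC() // heap profile statistics are as of the last completed GC
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// The default is 512 * 1024; a rate of 1 records every allocation.
	// It should be set as early in the program as possible.
	runtime.MemProfileRate = 1
	p, err := heapProfile(1000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("heap profile: %d bytes\n", len(p))
}
```

      Recording every allocation is expensive, so a rate of 1 is better suited to benchmarks and experiments than to long-running services.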
  47. Example
        package main

        import (
            "net/http"
            _ "net/http/pprof"
            "time"
        )

        // allocAndKeep keeps every allocated block reachable.
        func allocAndKeep() {
            var b [][]byte
            for {
                b = append(b, make([]byte, 1024))
                time.Sleep(time.Millisecond)
            }
        }

        // allocAndLeave drops its blocks after every 20 allocations.
        func allocAndLeave() {
            var b [][]byte
            for {
                b = append(b, make([]byte, 1024))
                if len(b) == 20 {
                    b = nil
                }
                time.Sleep(time.Millisecond)
            }
        }

        func main() {
            go allocAndKeep()
            go allocAndLeave()
            http.ListenAndServe("0.0.0.0:8080", nil)
        }
  48. go tool pprof
      - alloc_space: allocated bytes
      - alloc_objects: number of allocated objects
      - inuse_space: allocated bytes that are in use (live)
      - inuse_objects: number of allocated objects that are in use (live)
      We expect inuse to show only allocAndKeep() and alloc to show both functions.
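      The difference between the cumulative and the live view can also be seen without pprof, through runtime.MemStats: TotalAlloc only ever grows, much like alloc_space, while HeapAlloc drops when garbage is collected, loosely like inuse_space. The sketch below is an analogy rather than what pprof does internally; churn, totalAllocDelta, keep and scratch are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
)

var (
	keep    [][]byte // stays reachable: counted as allocated and as in use
	scratch []byte   // overwritten on each iteration: half the blocks become garbage
)

// churn allocates n blocks of size bytes and keeps every other one alive.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		scratch = make([]byte, size)
		if i%2 == 0 {
			keep = append(keep, scratch)
		}
	}
	scratch = nil
}

// totalAllocDelta reports how much the cumulative allocation counter grew
// while fn was running.
func totalAllocDelta(fn func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	delta := totalAllocDelta(func() { churn(20, 1<<20) })
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("allocated during churn: %d MiB\n", delta>>20)
	fmt.Printf("live heap after GC:     %d MiB\n", m.HeapAlloc>>20)
}
```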
  49. go tool pprof
        $ go tool pprof -inuse_space memtest http://localhost:8080/debug/pprof/heap
        Fetching profile from http://localhost:8080/debug/pprof/heap
        Saved profile in /home/marko/pprof/pprof.memtest.localhost:8080.inuse_objects.inuse_space.005.pb.gz
        Entering interactive mode (type "help" for commands)
        (pprof) top
        15.36MB of 15.36MB total (  100%)
        Dropped 2 nodes (cum <= 0.08MB)
              flat  flat%   sum%        cum   cum%
           15.36MB   100%   100%    15.36MB   100%  main.allocAndKeep
                 0     0%   100%    15.36MB   100%  runtime.goexit

        $ go tool pprof -alloc_space memtest http://localhost:8080/debug/pprof/heap
        Fetching profile from http://localhost:8080/debug/pprof/heap
        Saved profile in /home/marko/pprof/pprof.memtest.localhost:8080.alloc_objects.alloc_space.008.pb.gz
        Entering interactive mode (type "help" for commands)
        (pprof) top
        54.49MB of 54.49MB total (  100%)
        Dropped 8 nodes (cum <= 0.27MB)
              flat  flat%   sum%        cum   cum%
           27.97MB 51.33% 51.33%    29.47MB 54.08%  main.allocAndKeep
           23.52MB 43.17% 94.49%    25.02MB 45.92%  main.allocAndLeave
               3MB  5.51%   100%        3MB  5.51%  time.Sleep
                 0     0%   100%    54.49MB   100%  runtime.goexit
  50. Sleep?
      The result is as predicted. But what about Sleep?
        (pprof) list time.Sleep
        Total: 54.49MB
        ROUTINE ======================== time.Sleep in /home/marko/go/src/runtime/time.go
               3MB        3MB (flat, cum)  5.51% of Total
                 .          .     48:func timeSleep(ns int64) {
                 .          .     49:    if ns <= 0 {
                 .          .     50:        return
                 .          .     51:    }
                 .          .     52:
               3MB        3MB     53:    t := new(timer)
                 .          .     54:    t.when = nanotime() + ns
                 .          .     55:    t.f = goroutineReady
                 .          .     56:    t.arg = getg()
                 .          .     57:    lock(&timers.lock)
                 .          .     58:    addtimerLocked(t)
  51. Implicit allocations
        package printtest

        import (
            "bytes"
            "fmt"
            "testing"
        )

        func BenchmarkPrint(b *testing.B) {
            var buf bytes.Buffer
            var s string = "test string"
            for i := 0; i < b.N; i++ {
                buf.Reset()
                fmt.Fprintf(&buf, "string is: %s", s)
            }
        }
      Benchmark?
  52. Benchmark
        $ go test -bench=. -benchmem
        testing: warning: no tests to run
        BenchmarkPrint-8    10000000    128 ns/op    16 B/op    1 allocs/op
        PASS
        ok  github.com/mkevac/converttest  1.420s
  53. Profiling
        $ go test -bench=. -memprofile=mem.out -memprofilerate=1
      -memprofilerate sets the profiling rate; 1 means that every allocation is recorded.
        $ go tool pprof -alloc_space converttest.test mem.out
        (pprof) top
        15.41MB of 15.48MB total (99.59%)
        Dropped 73 nodes (cum <= 0.08MB)
              flat  flat%   sum%        cum   cum%
           15.41MB 99.59% 99.59%    15.43MB 99.67%  github.com/mkevac/converttest.BenchmarkPrint
                 0     0% 99.59%    15.47MB 99.93%  runtime.goexit
                 0     0% 99.59%    15.42MB 99.66%  testing.(*B).launch
                 0     0% 99.59%    15.43MB 99.67%  testing.(*B).runN
  54. Profiling
        (pprof) list BenchmarkPrint
        Total: 15.48MB
        ROUTINE ======================== github.com/mkevac/converttest.BenchmarkPrint in /home/marko/goproject
           15.41MB    15.43MB (flat, cum) 99.67% of Total
                 .          .      9:func BenchmarkPrint(b *testing.B) {
                 .          .     10:    var buf bytes.Buffer
                 .          .     11:    var s string = "test string"
                 .          .     12:    for i := 0; i < b.N; i++ {
                 .          .     13:        buf.Reset()
           15.41MB    15.43MB     14:        fmt.Fprintf(&buf, "string is: %s", s)
                 .          .     15:    }
                 .          .     16:}
  55. Profiling
        (pprof) list fmt.Fprintf
        Total: 15.48MB
        ROUTINE ======================== fmt.Fprintf in /home/marko/go/src/fmt/print.go
                 0    12.02kB (flat, cum) 0.076% of Total
                 .          .    175:// These routines end in 'f' and take a format string.
                 .          .    176:
                 .          .    177:// Fprintf formats according to a format specifier and writes to w.
                 .          .    178:// It returns the number of bytes written and any write error encountered.
                 .          .    179:func Fprintf(w io.Writer, format string, a ...interface{}) (n int, err error)
                 .    11.55kB    180:    p := newPrinter()
                 .       480B    181:    p.doPrintf(format, a)
                 .          .    182:    n, err = w.Write(p.buf)
                 .          .    183:    p.free()
                 .          .    184:    return
                 .          .    185:}
                 .          .    186:
  56. Disassembly
                 .          .     466edb: CALL bytes.(*Buffer).Reset(SB)
                 .          .     466ee0: LEAQ 0x98b6b(IP), AX
                 .          .     466ee7: MOVQ AX, 0x70(SP)
                 .          .     466eec: MOVQ $0xb, 0x78(SP)
                 .          .     466ef5: MOVQ $0x0, 0x60(SP)
                 .          .     466efe: MOVQ $0x0, 0x68(SP)
                 .          .     466f07: LEAQ 0x70d92(IP), AX
                 .          .     466f0e: MOVQ AX, 0(SP)
                 .          .     466f12: LEAQ 0x70(SP), AX
                 .          .     466f17: MOVQ AX, 0x8(SP)
                 .          .     466f1c: MOVQ $0x0, 0x10(SP)
           15.41MB    15.41MB     466f25: CALL runtime.convT2E(SB)
                 .          .     466f2a: MOVQ 0x18(SP), AX
                 .          .     466f2f: MOVQ 0x20(SP), CX
                 .          .     466f34: MOVQ AX, 0x60(SP)
                 .          .     466f39: MOVQ CX, 0x68(SP)
                 .          .     466f3e: LEAQ 0x10b35b(IP), AX
                 .          .     466f45: MOVQ AX, 0(SP)
                 .          .     466f49: MOVQ 0x58(SP), AX
                 .          .     466f4e: MOVQ AX, 0x8(SP)
                 .          .     466f53: LEAQ 0x99046(IP), CX
                 .          .     466f5a: MOVQ CX, 0x10(SP)
  57. Disassembly (continued)
                 .          .     466f5f: MOVQ $0xd, 0x18(SP)
                 .          .     466f68: LEAQ 0x60(SP), CX
                 .          .     466f6d: MOVQ CX, 0x20(SP)
                 .          .     466f72: MOVQ $0x1, 0x28(SP)
                 .          .     466f7b: MOVQ $0x1, 0x30(SP)
                 .    12.02kB     466f84: CALL fmt.Fprintf(SB)
  58. Fprintf
        func Fprintf(w io.Writer, format string, a ...interface{}) (n int, err error)
      interface{} looks like the same thing as void*... but it is not.
  59. Go internal types: string, chan, func, slice, interface, etc.
  60. Empty interface
        var s string = "marko"
        var a interface{} = &s
      No allocation.
        var s string = "marko"
        var a interface{} = s
      A 16-byte allocation.
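      The two cases above can be measured with testing.AllocsPerRun. This is a sketch under assumptions: the names allocsPointer, allocsValue and sink are illustrative, and the result for the value case depends on the Go version and on escape analysis, so no expected output is given for it.

```go
package main

import (
	"fmt"
	"testing"
)

var (
	s    = "marko"
	sink interface{} // package-level sink, so the stored value escapes
)

// allocsPointer measures storing a pointer to the string in an interface.
func allocsPointer() float64 {
	return testing.AllocsPerRun(1000, func() { sink = &s })
}

// allocsValue measures storing the string itself in an interface, which
// may require copying the string header to the heap.
func allocsValue() float64 {
	return testing.AllocsPerRun(1000, func() { sink = s })
}

func main() {
	fmt.Println("pointer in interface, allocs per run:", allocsPointer())
	fmt.Println("value in interface, allocs per run:  ", allocsValue())
}
```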
  61. Empty interface
  62. Fix
        package main

        import (
            "bytes"
            "testing"
        )

        func BenchmarkPrint(b *testing.B) {
            var buf bytes.Buffer
            var s string = "test string"
            for i := 0; i < b.N; i++ {
                buf.Reset()
                buf.WriteString("string is: ")
                buf.WriteString(s)
            }
        }
      Benchmark?
  63. Benchmark
        $ go test -bench=BenchmarkPrint -benchmem
        testing: warning: no tests to run
        BenchmarkPrint-8    50000000    27.5 ns/op    0 B/op    0 allocs/op
        PASS
        ok  github.com/mkevac/converttest01  1.413s
      0 allocations and more than 4x faster (27.5 ns/op versus 128 ns/op).
  64. Implicit allocation
      A string and a char * are pretty much the same thing in C. But not in Go.
        package main

        import (
            "fmt"
        )

        func main() {
            var array = []byte{'m', 'a', 'r', 'k', 'o'}
            if string(array) == "marko" {
                fmt.Println("equal")
            }
        }
  65. Implicit allocation: always check your assumptions. The Go runtime, the Go compiler and the Go tools get better every day. An optimization you read about in 2010 may no longer be needed, or may even be harmful.
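      One way to check the assumption about the string(array) comparison is to count allocations directly. A minimal sketch using testing.AllocsPerRun follows; the function name compareAllocs is hypothetical.

```go
package main

import (
	"fmt"
	"testing"
)

var array = []byte{'m', 'a', 'r', 'k', 'o'}

// compareAllocs reports the average number of heap allocations made by a
// single comparison of string(array) with a constant.
func compareAllocs() float64 {
	equal := 0
	return testing.AllocsPerRun(1000, func() {
		if string(array) == "marko" {
			equal++
		}
	})
}

func main() {
	fmt.Println("allocs per comparison:", compareAllocs())
}
```

      The converted string here never outlives the comparison, which gives the compiler room to avoid a heap copy; measuring is still the only reliable way to know what a given Go version does.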
  66. Example (again)
        package main

        import (
            "bytes"
            "testing"
        )

        var s string

        func BenchmarkConvert(b *testing.B) {
            var buf bytes.Buffer
            var array = []byte{'m', 'a', 'r', 'k', 'o', 0}
            for i := 0; i < b.N; i++ {
                buf.Reset()
                s = string(array)
                buf.WriteString(s)
            }
        }
  67. Benchmark
        $ go test -bench=. -benchmem
        testing: warning: no tests to run
        BenchmarkConvert-8    30000000    42.1 ns/op    8 B/op    1 allocs/op
  68. Fix
        // Needs "reflect" and "unsafe" in the import list.

        // BytesToString reinterprets b as a string without copying it.
        func BytesToString(b []byte) string {
            bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
            sh := reflect.StringHeader{bh.Data, bh.Len}
            return *(*string)(unsafe.Pointer(&sh))
        }

        func BenchmarkNoConvert(b *testing.B) {
            var buf bytes.Buffer
            var array = []byte{'m', 'a', 'r', 'k', 'o', 0}
            for i := 0; i < b.N; i++ {
                buf.Reset()
                s = BytesToString(array)
                buf.WriteString(s)
            }
        }
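      The conversion above predates the helpers added to package unsafe in Go 1.20. On a recent toolchain the same zero-copy idea can be sketched with unsafe.String and unsafe.SliceData; this is an alternative to the slide's code, not the author's version, and it carries the same caveat that the byte slice must not be modified while the string is in use.

```go
package main

import (
	"fmt"
	"unsafe"
)

// BytesToString reinterprets b as a string without copying.
// The caller must not modify b afterwards. Requires Go 1.20 or newer.
func BytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	array := []byte{'m', 'a', 'r', 'k', 'o'}
	fmt.Println(BytesToString(array) == "marko") // true
}
```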
  69. Benchmark
        $ go test -bench=. -benchmem
        testing: warning: no tests to run
        BenchmarkConvert-8       30000000    44.5 ns/op    8 B/op    1 allocs/op
        BenchmarkNoConvert-8    100000000    19.2 ns/op    0 B/op    0 allocs/op
        PASS
        ok  github.com/mkevac/bytetostring  3.332s
  70. Tracing: the Go runtime can log almost everything it does: scheduling, channel operations, locks, thread creation, ... The full list is in runtime/trace.go. For visualization, go tool trace uses the same JS package that Chrome uses for page-loading visualization. Example.
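      Besides the HTTP endpoint used later in the talk, a trace can be recorded from code with the runtime/trace package. A minimal sketch, assuming a hypothetical helper traceWork that wraps an arbitrary function:

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/trace"
	"sync"
)

// traceWork (hypothetical helper) records an execution trace while fn runs
// and returns the raw trace data.
func traceWork(fn func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	fn()
	trace.Stop() // returns only after all trace data has been written
	return buf.Bytes(), nil
}

func main() {
	data, err := traceWork(func() {
		var wg sync.WaitGroup
		for i := 0; i < 4; i++ {
			wg.Add(1)
			go func() { defer wg.Done() }()
		}
		wg.Wait()
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("trace: %d bytes\n", len(data))
}
```

      Written to a file instead of a buffer, the data can be opened with go tool trace.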
  71. debugcharts: github.com/mkevac/debugcharts (http://github.com/mkevac/debugcharts) calls runtime.ReadMemStats() once a second.
  72. Example
        package main

        import (
            "net/http"
            _ "net/http/pprof"
            "time"

            _ "github.com/mkevac/debugcharts"
        )

        func CPUHogger() {
            var acc uint64
            t := time.Tick(2 * time.Second)
            for {
                select {
                case <-t:
                    time.Sleep(50 * time.Millisecond)
                default:
                    acc++
                }
            }
        }

        func main() {
            go CPUHogger()
            go CPUHogger()
            http.ListenAndServe("0.0.0.0:8181", nil)
        }
  73. Tracing
        $ curl http://localhost:8181/debug/pprof/trace?seconds=10 -o trace.out
      Sometimes all you can visualize is 1-3 seconds.
        $ go tool trace -http "0.0.0.0:8080" ./tracetest trace.out
  74. Tracing
  75. Tracing
  76. Tracing
  77. proc stop and proc start
  78. runtime.ReadMemStats()
        // ReadMemStats populates m with memory allocator statistics.
        func ReadMemStats(m *MemStats) {
            stopTheWorld("read mem stats")

            systemstack(func() {
                readmemstats_m(m)
            })

            startTheWorld()
        }
      Production? No!
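      Because the call stops the world, it is worth measuring before polling it on a busy service. A rough sketch follows; the helper sampleMemStats is hypothetical, and the 100 ms interval is chosen only to keep the demonstration short.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sampleMemStats reads the allocator statistics and reports how long the
// call itself took.
func sampleMemStats() (runtime.MemStats, time.Duration) {
	var m runtime.MemStats
	start := time.Now()
	runtime.ReadMemStats(&m)
	return m, time.Since(start)
}

func main() {
	for i := 0; i < 3; i++ {
		m, took := sampleMemStats()
		fmt.Printf("HeapAlloc=%d bytes NumGC=%d call took %v\n",
			m.HeapAlloc, m.NumGC, took)
		time.Sleep(100 * time.Millisecond)
	}
}
```

      The measured duration covers the whole call, and it tends to matter more as the number of threads and goroutines grows.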
  79. Conclusion: there is so much more.
  80. Conclusion
      - CPU profiler
      - Memory profiler
      - All allocations tracing
      - Escape analysis
      - Lock/contention profiler
      - Scheduler tracing
      - Tracing
      - GC tracing
      - Real-time memory statistics
      - System profilers like perf and systemtap
      But no tool will replace a deep understanding of how your program works from start to finish.
  81. I hope that today's crash course was helpful.
  82. Stay curious
  83. Thank you. Marko Kevac, Software Engineer, Badoo. marko@kevac.org (mailto:marko@kevac.org) @mkevac (http://twitter.com/mkevac)
