
Anton Povarov's talk "Go in Badoo" from the Golang Meetup


A talk from the Golang meetup

Published in: Technology


  1. 1. GO IN BADOO Go meetup April 2015 Anton Povarov antoxa@corp.badoo.com Marko Kevac m.kevac@corp.badoo.com
  2. 2. BADOO BACKEND IN 2013 • PHP • C/C++ (~25 home-made daemons) • Python
  3. 3. BADOO BACKEND IN 2014 • PHP • C/C++ (~25 home-made daemons) • Python • Go
  4. 4. INFRASTRUCTURE
  5. 5. INFRASTRUCTURE • Same config, same logging, same directory structure • Same protocol, including JSON to Protobuf conversion • Build and testing in TeamCity • QA team should not need to know this is a Go project. Go is not special in any way.
  6. 6. INFRASTRUCTURE • Logs go to syslog and eventually to Splunk • Metrics are collected with a home-grown system based on RRD (you will see some examples) • HTTP-based profiling is always on
  7. 7. INFRASTRUCTURE • As of now we do not use vendoring • We use go get for dependencies • Sometimes we just fork projects (as with rocksdb lib) • We use Make for building
  8. 8. BUMPED when two people meet somewhere in the world
  9. 9. BUMPED • Done in a week • By three people • Huge win for product
  10. 10. ~2800 req/sec
  11. 11. ~2800 req/sec ~54 GiB in memory
  12. 12. ~2800 req/sec ~54 GiB in memory ~900 M objects
  13. 13. ~2800 req/sec ~54 GiB in memory ~900 M objects 30 sec GC pause :-( ~ 200 ms avg response ;-(
  14. 14. DO NOT GIVE UP
  15. 15. DO NOT GIVE UP
      for {
          generateSomeLoad()
          go tool pprof -alloc_objects http://.../debug/pprof/heap
          go tool pprof -inuse_objects http://.../debug/pprof/heap
          think()
          go build -gcflags=-m foobar.go
          thinkAndFix()
      }
  16. 16. MEMORY PROFILING CRASH COURSE
  17. 17. MEMORY PROFILING CRASH COURSE
      Allocating on stack: • Fast • No GC pressure • Stack size is not fixed in Go • Not always possible
      Allocating on heap: • Slower • GC pressure • Always possible
  18. 18. MEMORY PROFILING CRASH COURSE
      import (
          "net/http"
          _ "net/http/pprof" // not shown on the slide; needed for the /debug/pprof/ URLs used later
          "runtime"
          "time"
      )
      type Person struct {
          Name string
          Age  uint
      }
      var People []*Person
      const PeopleCount = 10000000
      func allocateInitial() { ... }
      func allocateMore() { ... }
      func main() {
          runtime.GOMAXPROCS(runtime.NumCPU())
          allocateInitial()
          go allocateMore()
          http.ListenAndServe("localhost:8080", nil)
      }
  19. 19. MEMORY PROFILING CRASH COURSE
      func allocateMore() {
          for {
              for i := 0; i < PeopleCount; i++ {
                  People = append(People, &Person{"marko", 29})
              }
              People = People[0:PeopleCount]
              time.Sleep(10 * time.Second)
          }
      }
      func allocateInitial() {
          for i := 0; i < PeopleCount; i++ {
              People = append(People, &Person{"marko", 29})
          }
      }
  20. 20. ESCAPE ANALYSIS
      $ go build -gcflags=-m
      # github.com/mkevac/test002
      ./test.go:20: &Person literal escapes to heap
      ./test.go:27: &Person literal escapes to heap
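The stack-versus-heap distinction reported by `-gcflags=-m` can also be observed at run time. The sketch below uses `testing.AllocsPerRun` from the standard library to count heap allocations per call; the function names `storePointer`, `useValue` and `heapAllocs` are made up for this illustration and do not appear in the talk.

```go
package main

import (
	"fmt"
	"testing"
)

type Person struct {
	Name string
	Age  uint
}

// sink is package-level, so anything it points to must outlive the
// function that created it and therefore lives on the heap.
var sink *Person

// storePointer publishes a pointer; the Person literal escapes to the heap.
func storePointer() {
	sink = &Person{"marko", 29}
}

// useValue keeps the struct local, so it can stay on the stack.
func useValue() uint {
	p := Person{"marko", 29}
	return p.Age
}

// heapAllocs reports the average number of heap allocations per call of f.
func heapAllocs(f func()) float64 {
	return testing.AllocsPerRun(1000, f)
}

func main() {
	fmt.Println("storePointer allocs/op:", heapAllocs(storePointer))
	fmt.Println("useValue allocs/op:", heapAllocs(func() { _ = useValue() }))
}
```

Building the same file with `go build -gcflags=-m` should show the compiler's reasoning for the first function.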
  21. 21. ALLOCATIONS
      $ go tool pprof -inuse_objects ./test002 http://.../debug/pprof/heap
      (pprof) top10
      10617157 of 10618977 total ( 100%)
      Dropped 3 nodes (cum <= 53094)
            flat  flat%   sum%        cum   cum%
         9945392 93.66% 93.66%    9945392 93.66%  main.allocateInitial
          671765  6.33%   100%     671765  6.33%  main.allocateMore
               0     0%   100%    9945392 93.66%  main.main
               0     0%   100%   10617157   100%  runtime.goexit
               0     0%   100%    9945392 93.66%  runtime.main
  22. 22. ALLOCATIONS
      (pprof) list main.allocateInitial
      Total: 10618977
      ROUTINE ======================== main.allocateInitial in /Users/marko/goprojects/src/github.com/mkevac/test002/test.go
         9945392    9945392 (flat, cum) 93.66% of Total
               .          .     15:
               .          .     16:const PeopleCount = 10000000
               .          .     17:
               .          .     18:func allocateInitial() {
               .          .     19:    for i := 0; i < PeopleCount; i++ {
         9945392    9945392     20:        People = append(People, &Person{"marko", 29})
               .          .     21:    }
               .          .     22:}
               .          .     23:
               .          .     24:func allocateMore() {
               .          .     25:    for {
  23. 23. (pprof) weblist main.allocateInitial
  24. 24. $ go tool pprof -alloc_objects ./test002 http://.../debug/pprof/heap
      (pprof) top10
      191993610 of 191995430 total ( 100%)
      Dropped 3 nodes (cum <= 959977)
            flat  flat%   sum%        cum   cum%
       182048182 94.82% 94.82%  182048182 94.82%  main.allocateMore
         9945428  5.18%   100%    9945428  5.18%  main.allocateInitial
               0     0%   100%    9945428  5.18%  main.main
               0     0%   100%  191993610   100%  runtime.goexit
               0     0%   100%    9945428  5.18%  runtime.main
  25. 25. http://localhost:8080/debug/pprof/heap?debug=1 # NextGC = 1619939888 # PauseNs = [144830 87026 98881 2162680 2990228 3759763 6233690 11810930 18442986 34012539 47019926 72834183 114591578 178384506 315007729 480245709 568020053 575784519 517883227 518861595 604910252 514458210 542329937 560007420 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] # NumGC = 24 GC pause: ~500 ms
  26. 26. $ GODEBUG=gctrace=1 ./test002
      gc16(1): 1+16+433413+1 us, 396 -> 793 MB, 15766167 (15766492-325) objects, 8 goroutines, 61633/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
      read more: http://goo.gl/jRIyGg
  27. 27. gc16(1):                                // 16th GC (1 thread was doing it)
      1+16+433413+1 us,                       // GC prepare + sweep + mark + finalise
      396 -> 793 MB,                          // heap grew from X to Y since last GC
      15766167 (15766492-325) objects,        // 15766167 objects in heap (incl. garbage)
                                              // 15766492 allocs - 325 frees
      8 goroutines,
      61633/1/0 sweeps,                       // 61633 total spans, 1 in bg, 0 in pause
      0(0) handoff, 0(0) steal, 0/0/0 yields  // some scheduling stats :)
      read more: http://goo.gl/jRIyGg
  28. 28. $ wrk --latency -d 60s 'http://localhost:8080/debug/pprof/'
      Running 1m test @ http://localhost:8080/debug/pprof/
        2 threads and 10 connections
        Thread Stats   Avg       Stdev      Max    +/- Stdev
          Latency    26.37ms   100.32ms  751.54ms    93.65%
          Req/Sec     7.56k      2.47k     9.56k     89.05%
        Latency Distribution
           50%  687.00us
           75%    1.02ms
           90%   11.59ms
           99%  550.23ms
        831369 requests in 1.00m, 507.43MB read
      Requests/sec: 13848.63
      Transfer/sec: 8.45MB
  29. 29. DEBUGCHARTS • Memory Allocated • GC Pauses • Shows data for last 24h https://github.com/mkevac/debugcharts
  30. 30. import _ "github.com/mkevac/debugcharts" GC pause: 500 ms
  31. 31. MEMORY PROFILING CRASH COURSE (FIXED)
      type Person struct {
          Name [6]byte // was string
          Age  uint
      }
      var People []Person // was []*Person
      const PeopleCount = 10000000
      func allocateMore() {
          for {
              for i := 0; i < PeopleCount; i++ {
                  People = append(People, Person{[6]byte{'m', 'a', 'r', 'k', 'o', 0}, 29})
              }
              People = People[0:PeopleCount]
              time.Sleep(10 * time.Second)
          }
      }
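Writing `[6]byte{'m', 'a', 'r', 'k', 'o', 0}` by hand gets tedious once names come from real input. One possible pair of conversion helpers is sketched below; `toName` and `nameString` are hypothetical names, and silently truncating names longer than six bytes is a simplification that a real service would need to handle explicitly.

```go
package main

import (
	"bytes"
	"fmt"
)

// Person is the pointer-free layout from the slide above.
type Person struct {
	Name [6]byte
	Age  uint
}

// toName copies up to 6 bytes of s into a fixed-size array.
// Longer names are truncated; shorter ones are zero-padded.
func toName(s string) [6]byte {
	var b [6]byte
	copy(b[:], s)
	return b
}

// nameString converts the array back to a string, stopping at the first zero byte.
func nameString(b [6]byte) string {
	if i := bytes.IndexByte(b[:], 0); i >= 0 {
		return string(b[:i])
	}
	return string(b[:])
}

func main() {
	p := Person{Name: toName("marko"), Age: 29}
	fmt.Println(nameString(p.Name), p.Age) // prints "marko 29"
}
```

The trade-off is the usual one for this optimisation: a fixed upper bound on the field length in exchange for a struct the GC never has to look inside.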
  32. 32. $ go tool pprof -inuse_objects ./test003 http://.../debug/pprof/heap
      (pprof) top10
      1820 of 1824 total (99.78%)
      Dropped 8 nodes (cum <= 9)
            flat  flat%   sum%        cum   cum%
            1820 99.78% 99.78%       1820 99.78%  mcommoninit
               0     0% 99.78%       1820 99.78%  runtime.rt0_go
               0     0% 99.78%       1820 99.78%  runtime.schedinit
  33. 33. $ go tool pprof -alloc_objects ./test003 http://.../debug/pprof/heap
      (pprof) top10
      1856 of 1858 total (99.89%)
      Dropped 4 nodes (cum <= 9)
            flat  flat%   sum%        cum   cum%
            1820 97.95% 97.95%       1820 97.95%  mcommoninit
              36  1.94% 99.89%         36  1.94%  main.allocateInitial
               0     0% 99.89%         36  1.94%  main.main
               0     0% 99.89%         38  2.05%  runtime.goexit
               0     0% 99.89%         38  2.05%  runtime.main
               0     0% 99.89%       1820 97.95%  runtime.rt0_go
               0     0% 99.89%       1820 97.95%  runtime.schedinit
  34. 34. $ wrk --latency -d 60s 'http://localhost:8080/debug/pprof/'
      Running 1m test @ http://localhost:8080/debug/pprof/
        2 threads and 10 connections
        Thread Stats   Avg       Stdev      Max    +/- Stdev
          Latency     1.52ms    11.08ms  197.07ms    98.20%
          Req/Sec    21.82k      3.00k    26.49k     86.63%
        Latency Distribution
           50%  198.00us
           75%  235.00us
           90%  309.00us
           99%   53.22ms
        2600893 requests in 1.00m, 1.55GB read
      Requests/sec: 43318.49
      Transfer/sec: 26.48MB
  35. 35. MORE INFO "Debugging performance issues in Go programs", Dmitry Vyukov, May 10, 2014: http://goo.gl/jRIyGg
  36. 36. BUMPED GC vs Developer
  37. 37. GO IS NOT C • Using pointers everywhere is the C way • Go is different; we learned that the hard way • It is often preferable to copy a little data and avoid pointers • We had to thoroughly inspect our code and remove pointers everywhere we could
  38. 38. $ go tool pprof --alloc_objects ./heaptest /tmp/…/mem.pprof
      Adjusting heap profiles for 1-in-4096 sampling rate
      Welcome to pprof! For help, type 'help'.
      (pprof) top
      Total: 29161720 objects
      22648298  77.7%  77.7% 22648298  77.7% newselect
       6513152  22.3% 100.0%  6513152  22.3% main.main
           256   0.0% 100.0%      256   0.0% runtime.mallocinit
            14   0.0% 100.0%       14   0.0% allocg
             0   0.0% 100.0%      270   0.0% _rt0_go
             0   0.0% 100.0% 22648298  77.7% main.loop
             0   0.0% 100.0%       14   0.0% mcommoninit
      NOT AN ALLOCATION
  39. 39. PROTOBUF Very GC unfriendly library
  40. 40. $ cat test.proto
      package test;
      message TestMessage {
          required uint32 user_id = 1;
          optional string name = 2;
          optional uint32 age = 3;
      }
      $ cat test.pb.go
      package test
      type TestMessage struct {
          UserId           *uint32
          Name             *string
          Age              *uint32
          XXX_unrecognized []byte
      }
  41. 41. GOGOPROTOBUF https://github.com/gogo/protobuf
  42. 42. $ cat test.proto
      package test;
      import "github.com/gogo/protobuf/gogoproto/gogo.proto";
      option (gogoproto.goproto_unrecognized_all) = false;  // removes XXX_unrecognized (which is sometimes a bad thing)
      message TestMessage {
          required uint32 user_id = 1 [(gogoproto.nullable) = false];
          optional string name = 2 [(gogoproto.nullable) = false];
          optional uint32 age = 3 [(gogoproto.nullable) = false];
      }
      (gogoproto.nullable) = false removes field pointers (changes what optional means)
  43. 43. type TestMessage struct {
          UserId uint32
          Name   string
          Age    uint32
      }
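The remark that removing field pointers "changes what optional means" can be made concrete without any protobuf library. The two structs below are hand-written stand-ins for the generated shapes, not real generated code, and all names are hypothetical.

```go
package main

import "fmt"

// PointerMessage mimics the default generated layout: a nil pointer means "not set".
type PointerMessage struct {
	Age *uint32
}

// ValueMessage mimics the nullable=false layout: there is no "not set" state.
type ValueMessage struct {
	Age uint32
}

func describePointer(m PointerMessage) string {
	if m.Age == nil {
		return "age not set"
	}
	return fmt.Sprintf("age = %d", *m.Age)
}

func describeValue(m ValueMessage) string {
	return fmt.Sprintf("age = %d", m.Age)
}

func main() {
	zero := uint32(0)
	fmt.Println(describePointer(PointerMessage{}))           // age not set
	fmt.Println(describePointer(PointerMessage{Age: &zero})) // age = 0
	fmt.Println(describeValue(ValueMessage{}))               // age = 0, unset and zero look identical
}
```

If the application needs to tell "absent" from "zero", that information has to be carried some other way once the pointers are gone.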
  44. 44. PROTOBUF / MARSHALING
  45. 45. PROTOBUF the default way
      func SaveCoord(c Coord) error {
          key := GetKey(c)
          value, _ := proto.Marshal(&c) // conversion to proto.Message (an interface) = escapes to heap = allocation
          return db.Put(key, value)     // value []byte - allocated on heap
      }
  46. 46. GOGOPROTOBUF
      $ cat test.proto
      package test;
      import "github.com/gogo/protobuf/gogoproto/gogo.proto";
      option (gogoproto.unsafe_marshaler_all) = true;
      option (gogoproto.unsafe_unmarshaler_all) = true;
      option (gogoproto.sizer_all) = true;
      generates extra methods to speed up Marshal/Unmarshal + enables memory optimisation tricks
      but loses required field checks (to be fixed)
  47. 47. $ cat test.pb.go
      func (m *Coord) MarshalTo(data []byte) (n int, err error)
      func (m *Coord) Size() (n int)
      func SaveCoord(c Coord) error {
          key := GetKey(c)
          data := make([]byte, c.Size()) // does not escape, allocated on stack
          n, _ := c.MarshalTo(data)      // writes to stack buffer
          return db.Put(key, data[:n])   // don't forget to subslice :)
      }
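The `Size` / `MarshalTo` / subslice pattern can be tried without generating any protobuf code. In the sketch below, `Coord` and its two methods are hand-written stand-ins that only mimic the method signatures on the slide; the varint encoding is not the protobuf wire format, `Size` returns an upper bound rather than an exact length, and `encode` is a hypothetical helper. Unlike the slide, the error from `MarshalTo` is checked.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Coord is a hypothetical message type used only for this illustration.
type Coord struct {
	Lat, Lon uint32
}

// Size returns an upper bound on the encoded length (two 32-bit varints).
func (m *Coord) Size() int {
	return 2 * binary.MaxVarintLen32
}

// MarshalTo encodes m into data and returns the number of bytes written.
func (m *Coord) MarshalTo(data []byte) (int, error) {
	if len(data) < m.Size() {
		return 0, fmt.Errorf("buffer too small: %d < %d", len(data), m.Size())
	}
	n := binary.PutUvarint(data, uint64(m.Lat))
	n += binary.PutUvarint(data[n:], uint64(m.Lon))
	return n, nil
}

// encode allocates one buffer, marshals into it, and returns only the
// bytes actually written: the "don't forget to subslice" step.
func encode(c Coord) ([]byte, error) {
	data := make([]byte, c.Size())
	n, err := c.MarshalTo(data)
	if err != nil {
		return nil, err
	}
	return data[:n], nil
}

func main() {
	b, err := encode(Coord{Lat: 1, Lon: 300})
	fmt.Println(b, err) // prints "[1 172 2] <nil>"
}
```

Whether the buffer ends up on the stack depends on the compiler version and on what the caller does with the slice, so it is worth confirming with `go build -gcflags=-m` rather than assuming.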
  48. 48. CGO Example 1
  49. 49. $ cat test.c
      int get_error(char **error) {
          *error = "error";
          return 0;
      }
      $ cat test.go
      […skipped…]
      func main() {
          var errStr *C.char
          C.get_error(&errStr)
          s := C.GoString(errStr)
          fmt.Println(s)
      }
      $ go build -gcflags=-m
      # github.com/mkevac/test001
      […skipped…]
      ./test.go:13: moved to heap: errStr
      ./test.go:14: &errStr escapes to heap
      ./test.go:16: main ... argument does not escape
      go 1.3.0
  50. 50. $ cat test.c
      int get_error(char **error) {
          *error = "error";
          return 0;
      }
      $ cat test.go
      […skipped…]
      func main() {
          var errStr *C.char
          C.get_error(&errStr)
          s := C.GoString(errStr)
          fmt.Println(s)
      }
      $ go build -gcflags=-m
      # github.com/mkevac/test001
      ./test.go:14: main &errStr does not escape
      ./test.go:16: main ... argument does not escape
      go 1.4.2
  51. 51. CGO Example 2
  52. 52. db.Put() can cause data to escape!
      func SaveCoord(c Coord) error {
          key := GetKey(c)
          value := make([]byte, c.Size())
          n, _ := c.MarshalTo(value)
          return db.Put(key, value[:n])
      }
  53. 53. CGO Example 3
  54. 54. Return struct from function instead of passing pointer to struct.
  55. 55. ~2800 req/sec ~54 GiB in memory ~900 M objects 30 sec GC pause :-( ~ 200 ms avg response ;-( before optimisations
  56. 56. ~2800 req/sec ~54 GiB in memory ~900 M objects GC pause: 30s -> 3s // maps :-( avg response: 200ms -> 2ms // :-) after optimisations
  57. 57. MAPS
  58. 58. MAPS
      type Person struct {
          Name [6]byte // was string
          Age  uint
      }
      var People []Person // was []*Person
      There are no pointers in the struct, so the GC treats the slice as a single object and does not "look inside" = fast GC.
      So we expect the same for map[k]v when k/v contain no pointers.
  59. 59. MAPS
      var m = make(map[int]int)
      func main() {
          runtime.GOMAXPROCS(runtime.NumCPU())
          for i := 0; i < 10000000; i++ {
              m[i] = i
          }
          for {
              runtime.GC()
              time.Sleep(5 * time.Second)
          }
      }
      GC Pause: 500 ms
  60. 60. commit 85e7bee19f9f26dfca414b1e9054e429c448b14f
      Author: Dmitry Vyukov <dvyukov@google.com>
      Date: Mon Jan 26 21:04:41 2015 +0300
      runtime: do not scan maps when k/v do not contain pointers
      Currently we scan maps even if k/v does not contain pointers. This is required because overflow buckets are hanging off the main table. This change introduces a separate array that contains pointers to all overflow buckets and keeps them alive. Buckets themselves are marked as containing no pointers and are not scanned by GC (if k/v does not contain pointers).
      This brings maps in line with slices and chans -- GC does not scan their contents if elements do not contain pointers.
      Currently scanning of a map[int]int with 2e8 entries (~8GB heap) takes ~8 seconds. With this change scanning takes negligible time.
      Update #9477.
      Change-Id: Id8a04066a53d2f743474cad406afb9f30f00eaae
      Reviewed-on: https://go-review.googlesource.com/3288
      Reviewed-by: Keith Randall <khr@golang.org>
      expected in 1.5
      https://github.com/golang/go/issues/9477
  61. 61. GOTIP gc #11 @11.995s 0%: 0+0+0+0+3 ms clock, 0+0+0+0+25 ms cpu, 304->304->304 MB, 8 P (forced)
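To check how a given Go version handles a large pointer-free map, the experiment from the MAPS slides can be reduced to a function that fills the map and one that times a forced collection. This is a rough sketch: `fillMap` and `timeForcedGC` are hypothetical names, and the measured value is the wall-clock time of a whole `runtime.GC()` call, which is not the same thing as the stop-the-world pause reported by `gctrace`.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// m is package-level so the map stays reachable across collections.
var m map[int]int

// fillMap builds a map with n pointer-free entries.
func fillMap(n int) {
	m = make(map[int]int, n)
	for i := 0; i < n; i++ {
		m[i] = i
	}
}

// timeForcedGC returns how long one forced, blocking collection takes.
func timeForcedGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	fillMap(10000000) // same entry count as on the slide
	fmt.Println("forced GC took", timeForcedGC())
}
```

Running it with `GODEBUG=gctrace=1` alongside gives the pause breakdown for comparison.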
  62. 62. GC SUMMARY & LINKS • If you want to write low latency apps, you have to fight GC :-( • Debugging performance issues: http://goo.gl/jRIyGg • Go Escape Analysis Flaws: http://goo.gl/U1wkvy • Go execution tracer: http://goo.gl/KHLBQN • Interface type conversion: http://goo.gl/oJDYPa • GC debugcharts: https://github.com/mkevac/debugcharts • GC visualisation (davecheney): http://goo.gl/ubz5DL
  63. 63. Thank you. Anton Povarov antoxa@corp.badoo.com Marko Kevac m.kevac@corp.badoo.com
