GO IN BADOO
Go meetup April 2015
Anton Povarov
Marko Kevac
BADOO BACKEND IN 2013
• PHP
• C/C++ (~25 home-made daemons)
• Python
BADOO BACKEND IN 2014
• PHP
• C/C++ (~25 home-made daemons)
• Python
• Go
INFRASTRUCTURE
• Same config, same logging, same directory structure
• Same protocol, including JSON to Protobuf conversion
• Build and testing in TeamCity
• QA team should not know this is a Go project
Go is not special in any way
INFRASTRUCTURE
• Logs go to syslog and eventually to Splunk
• Metrics are collected with a home-grown system based on RRD (you will see some examples)
• HTTP-based profiling is always on
INFRASTRUCTURE
• As of now we do not use vendoring
• We use go get for dependencies
• Sometimes we just fork projects (as with the rocksdb lib)
• We use Make for building
BUMPED
When two people meet somewhere in the world
• Done in a week
• By three people
• Huge win for the product
~2800 req/sec ~54 GiB in memory ~800 M objects
30 sec GC pause :-(
~200 ms avg response ;-(
DO NOT GIVE UP
for {
	generateSomeLoad()
	go tool pprof -alloc_objects http://.../debug/pprof/heap
	go tool pprof -inuse_objects http://.../debug/pprof/heap
	think()
	go build -gcflags=-m foobar.go
	thinkAndFix()
}
MEMORY PROFILING CRASH COURSE
Allocating on stack
• Fast
• No GC pressure
• Stack size is not fixed in Go
• Not always possible
Allocating on heap
• Slower
• GC pressure
• Always possible
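The stack versus heap distinction above can be observed directly. The sketch below is not from the talk: it uses testing.AllocsPerRun from the standard library to count heap allocations for a pointer that stays local and for one that escapes into a global. The type and function names are illustrative, and the exact results depend on the compiler's escape analysis.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

var sink *point // package-level variable: anything stored here outlives the call

// sumLocal takes the address of a local value, but the pointer never leaves
// the function, so the compiler can keep the value on the stack.
//
//go:noinline
func sumLocal(a, b int) int {
	p := &point{a, b}
	return p.x + p.y
}

// leak stores the pointer in a global, so the value has to live on the heap.
//
//go:noinline
func leak(a, b int) {
	sink = &point{a, b}
}

func main() {
	local := testing.AllocsPerRun(1000, func() { sumLocal(1, 2) })
	escaped := testing.AllocsPerRun(1000, func() { leak(1, 2) })
	fmt.Println("allocs per call, pointer stays local:", local)
	fmt.Println("allocs per call, pointer escapes:    ", escaped)
}
```

Building the same file with go build -gcflags=-m prints the compiler's reasoning for each of the two literals.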
type Person struct {
	Name string
	Age  uint
}

var People []*Person

const PeopleCount = 10000000

func allocateInitial() { ...
func allocateMore() { ...

func main() {
	runtime.GOMAXPROCS(runtime.NumCPU())
	allocateInitial()
	go allocateMore()
	http.ListenAndServe("localhost:8080", nil)
}
func allocateMore() {
	for {
		for i := 0; i < PeopleCount; i++ {
			People = append(People, &Person{"marko", 29})
		}
		// keep only the first PeopleCount entries; the rest becomes garbage
		People = People[0:PeopleCount]
		time.Sleep(10 * time.Second)
	}
}

func allocateInitial() {
	for i := 0; i < PeopleCount; i++ {
		People = append(People, &Person{"marko", 29})
	}
}
ESCAPE ANALYSIS
$ go build -gcflags=-m
./test.go:20: &Person literal escapes to heap
./test.go:27: &Person literal escapes to heap
ALLOCATIONS
$ go tool pprof --inuse_objects ./test002 http://.../debug/pprof/heap
(pprof) top10
10617157 of 10618977 total ( 100%)
Dropped 3 nodes (cum <= 53094)
      flat  flat%   sum%        cum   cum%
   9945392 93.66% 93.66%    9945392 93.66%  main.allocateInitial
    671765  6.33%   100%     671765  6.33%  main.allocateMore
         0     0%   100%    9945392 93.66%  main.main
         0     0%   100%   10617157   100%  runtime.goexit
         0     0%   100%    9945392 93.66%  runtime.main
(pprof) list main.allocateInitial
Total: 10618977
ROUTINE ======================== main.allocateInitial in /Users/marko/goprojects/src/
   9945392    9945392 (flat, cum) 93.66% of Total
         .          .     15:
         .          .     16:const PeopleCount = 10000000
         .          .     17:
         .          .     18:func allocateInitial() {
         .          .     19:	for i := 0; i < PeopleCount; i++ {
   9945392    9945392     20:		People = append(People, &Person{"marko", 29})
         .          .     21:	}
         .          .     22:}
         .          .     23:
         .          .     24:func allocateMore() {
         .          .     25:	for {
(pprof) weblist main.allocateInitial
$ go tool pprof --alloc_objects ./test002 http://.../debug/pprof/heap
(pprof) top10
191993610 of 191995430 total ( 100%)
Dropped 3 nodes (cum <= 959977)
      flat  flat%   sum%        cum   cum%
 182048182 94.82% 94.82%  182048182 94.82%  main.allocateMore
   9945428  5.18%   100%    9945428  5.18%  main.allocateInitial
         0     0%   100%    9945428  5.18%  main.main
         0     0%   100%  191993610   100%  runtime.goexit
         0     0%   100%    9945428  5.18%  runtime.main
http://localhost:8080/debug/pprof/heap?debug=1
# NextGC = 1619939888
# PauseNs = [144830 87026 98881 2162680 2990228 3759763 6233690 11810930 18442986
34012539 47019926 72834183 114591578 178384506 315007729 480245709 568020053
575784519 517883227 518861595 604910252 514458210 542329937 560007420 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# NumGC = 24
GC pause: ~500 ms
$ GODEBUG=gctrace=1 ./test002
gc16(1): 1+16+433413+1 us, 396 -> 793 MB, 15766167
(15766492-325) objects, 8 goroutines, 61633/1/0 sweeps, 0(0)
handoff, 0(0) steal, 0/0/0 yields
read more:
gc16(1): // 16th GC (1 thread was doing it)
1+16+433413+1 us, // GC prepare + sweep + mark + finalise
396 -> 793 MB, // heap grew from X to Y since last GC
15766167 (15766492-325) objects,
// 15766167 objects in heap (incl. garbage)
// 15766492 allocs - 325 frees
8 goroutines,
61633/1/0 sweeps, // 61633 total spans, 1 in bg, 0 in pause
0(0) handoff, 0(0) steal, 0/0/0 yields
// some scheduling stats :)
read more:
$ wrk --latency -d 60s 'http://localhost:8080/debug/pprof/'
Running 1m test @ http://localhost:8080/debug/pprof/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev      Max    +/- Stdev
    Latency    26.37ms  100.32ms  751.54ms   93.65%
    Req/Sec     7.56k     2.47k     9.56k    89.05%
  Latency Distribution
     50%  687.00us
     75%    1.02ms
     90%   11.59ms
     99%  550.23ms
  831369 requests in 1.00m, 507.43MB read
Requests/sec: 13848.63
Transfer/sec: 8.45MB
DEBUGCHARTS
• Memory Allocated
• GC Pauses
• Shows data for the last 24h
import _ ""
GC pause: 500 ms
MEMORY PROFILING CRASH COURSE (FIXED)
type Person struct {
	Name [6]byte // was string
	Age  uint
}

var People []Person // was []*Person

const PeopleCount = 10000000

func allocateMore() {
	for {
		for i := 0; i < PeopleCount; i++ {
			People = append(People,
				Person{[6]byte{'m', 'a', 'r', 'k', 'o', 0}, 29})
		}
		People = People[0:PeopleCount]
		time.Sleep(10 * time.Second)
	}
}
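To see why the fixed version is so much cheaper for the GC, the sketch below counts live heap objects for both layouts using runtime.MemStats.HeapObjects. It is an illustration rather than the authors' benchmark: the type names, the helper functions and the reduced element count are assumptions made to keep it quick.

```go
package main

import (
	"fmt"
	"runtime"
)

type personPtr struct { // layout from the original version
	Name string
	Age  uint
}

type personVal struct { // pointer-free layout from the fixed version
	Name [6]byte
	Age  uint
}

const n = 100000 // far smaller than the 10M on the slides

// heapObjects returns the number of live heap objects after a forced GC.
func heapObjects() int64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return int64(m.HeapObjects)
}

// growth reports how many heap objects stay alive after build has run.
func growth(build func() interface{}) int64 {
	before := heapObjects()
	data := build()
	after := heapObjects()
	runtime.KeepAlive(data)
	return after - before
}

func buildPointers() interface{} {
	s := make([]*personPtr, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, &personPtr{"marko", 29}) // one heap object per element
	}
	return s
}

func buildValues() interface{} {
	s := make([]personVal, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, personVal{[6]byte{'m', 'a', 'r', 'k', 'o', 0}, 29})
	}
	return s // a single backing array, no per-element objects
}

func main() {
	fmt.Println("extra heap objects, []*Person style:", growth(buildPointers))
	fmt.Println("extra heap objects, []Person style: ", growth(buildValues))
}
```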
$ go tool pprof --inuse_objects ./test003 http://.../debug/pprof/heap
(pprof) top10
1820 of 1824 total (99.78%)
Dropped 8 nodes (cum <= 9)
      flat  flat%   sum%        cum   cum%
      1820 99.78% 99.78%       1820 99.78%  mcommoninit
         0     0% 99.78%       1820 99.78%  runtime.rt0_go
         0     0% 99.78%       1820 99.78%  runtime.schedinit
$ go tool pprof --alloc_objects ./test003 http://.../debug/pprof/heap
(pprof) top10
1856 of 1858 total (99.89%)
Dropped 4 nodes (cum <= 9)
      flat  flat%   sum%        cum   cum%
      1820 97.95% 97.95%       1820 97.95%  mcommoninit
        36  1.94% 99.89%         36  1.94%  main.allocateInitial
         0     0% 99.89%         36  1.94%  main.main
         0     0% 99.89%         38  2.05%  runtime.goexit
         0     0% 99.89%         38  2.05%  runtime.main
         0     0% 99.89%       1820 97.95%  runtime.rt0_go
         0     0% 99.89%       1820 97.95%  runtime.schedinit
$ wrk --latency -d 60s 'http://localhost:8080/debug/pprof/'
Running 1m test @ http://localhost:8080/debug/pprof/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev      Max    +/- Stdev
    Latency     1.52ms   11.08ms  197.07ms   98.20%
    Req/Sec    21.82k     3.00k    26.49k    86.63%
  Latency Distribution
     50%  198.00us
     75%  235.00us
     90%  309.00us
     99%   53.22ms
  2600893 requests in 1.00m, 1.55GB read
Requests/sec: 43318.49
Transfer/sec: 26.48MB
Debugging performance issues in Go programs
Dmitry Vyukov on May 10, 2014
GO IS NOT C
• Often it is preferable to copy a little bit, but avoid using pointers
• We had to thoroughly inspect our code and remove pointers everywhere we could
NOT AN ALLOCATION
$ go tool pprof --alloc_objects ./heaptest /tmp/…/mem.pprof
Adjusting heap profiles for 1-in-4096 sampling rate
Welcome to pprof! For help, type 'help'.
(pprof) top
Total: 29161720 objects
22648298  77.7%  77.7% 22648298  77.7% newselect
 6513152  22.3% 100.0%  6513152  22.3% main.main
     256   0.0% 100.0%      256   0.0% runtime.mallocinit
      14   0.0% 100.0%       14   0.0% allocg
       0   0.0% 100.0%      270   0.0% _rt0_go
       0   0.0% 100.0% 22648298  77.7% main.loop
       0   0.0% 100.0%       14   0.0% mcommoninit
Very GC unfriendly library
$ cat test.proto
package test;

message TestMessage {
    required uint32 user_id = 1;
    optional string name = 2;
    optional uint32 age = 3;
}

$ cat test.pb.go
package test

type TestMessage struct {
	UserId           *uint32
	Name             *string
	Age              *uint32
	XXX_unrecognized []byte
}
$ cat test.proto
package test;
import "";
option (gogoproto.goproto_unrecognized_all) = false;

message TestMessage {
    required uint32 user_id = 1 [(gogoproto.nullable) = false];
    optional string name = 2 [(gogoproto.nullable) = false];
    optional uint32 age = 3 [(gogoproto.nullable) = false];
}
goproto_unrecognized_all = false: removes XXX_unrecognized (which is sometimes a bad thing)
nullable = false: removes field pointers (changes what optional means)
type TestMessage struct {
	UserId uint32
	Name   string
	Age    uint32
}
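The remark that removing field pointers "changes what optional means" can be made concrete with plain Go structs. The two types below are hand-written stand-ins for the two generated shapes, not generated code: with a pointer field an unset value is nil, while with a value field it cannot be told apart from an explicit zero.

```go
package main

import "fmt"

// withPointers mimics the default generated shape: optional fields are pointers.
type withPointers struct {
	Age *uint32
}

// withValues mimics the nullable=false shape: plain values, nothing extra
// for the GC to follow.
type withValues struct {
	Age uint32
}

// describePtr can tell "not set" apart from an explicit zero.
func describePtr(m withPointers) string {
	if m.Age == nil {
		return "age not set"
	}
	return fmt.Sprintf("age = %d", *m.Age)
}

// describeVal cannot: an unset field and an explicit 0 look the same.
func describeVal(m withValues) string {
	return fmt.Sprintf("age = %d", m.Age)
}

func main() {
	zero := uint32(0)
	fmt.Println(describePtr(withPointers{}))
	fmt.Println(describePtr(withPointers{Age: &zero}))
	fmt.Println(describeVal(withValues{}))
	fmt.Println(describeVal(withValues{Age: 0}))
}
```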
The default way
func SaveCoord(c Coord) error {
	key := GetKey(c)
	// &c is converted to proto.Message (an interface) = escapes to heap = allocation
	value, _ := proto.Marshal(&c)
	// value []byte - allocated on heap
	return db.Put(key, value)
}
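The "interface conversion = escape = allocation" chain above can be reproduced without protobuf. In this sketch the message interface and the Coord type are hypothetical stand-ins for proto.Message and a generated struct. Passing &c to a function that calls a method through the interface typically forces c onto the heap, while a direct method call does not; the exact behaviour depends on the compiler version.

```go
package main

import (
	"fmt"
	"testing"
)

// Coord is a hypothetical message type.
type Coord struct{ Lat, Lon int64 }

func (c *Coord) Size() int { return 16 }

// message stands in for an interface such as proto.Message.
type message interface {
	Size() int
}

// sizeOf calls the method through the interface, so escape analysis
// cannot see what happens to the value behind it.
//
//go:noinline
func sizeOf(m message) int { return m.Size() }

// viaInterface converts &c to the interface type; c is expected to be
// moved to the heap.
//
//go:noinline
func viaInterface(c Coord) int { return sizeOf(&c) }

// direct calls the concrete method; c can stay on the stack.
//
//go:noinline
func direct(c Coord) int { return c.Size() }

func main() {
	c := Coord{Lat: 55, Lon: 37}
	boxed := testing.AllocsPerRun(1000, func() { viaInterface(c) })
	plain := testing.AllocsPerRun(1000, func() { direct(c) })
	fmt.Println("allocs per call via interface:", boxed)
	fmt.Println("allocs per call, direct:      ", plain)
}
```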
$ cat test.proto
package test;
import "";
option (gogoproto.unsafe_marshaler_all) = true;
option (gogoproto.unsafe_unmarshaler_all) = true;
option (gogoproto.sizer_all) = true;
• generates extra methods to speed up Marshal/Unmarshal
• enables memory optimisation tricks
• but loses required field checks (to be fixed)
$ cat test.pb.go
func (m *Coord) MarshalTo(data []byte) (n int, err error)
func (m *Coord) Size() (n int)
func SaveCoord(c Coord) error {
	key := GetKey(c)
	data := make([]byte, c.Size()) // does not escape, allocated on stack
	n, _ := c.MarshalTo(data)      // writes to stack buffer
	return db.Put(key, data[:n])   // don't forget to subslice :)
}
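The Size/MarshalTo pattern can be tried out without generated code. The sketch below hand-writes both methods for a hypothetical two-field Coord using encoding/binary; it is not gogoproto output. It marshals into a fixed-size array, which is the conservative way to give the compiler a chance to keep the buffer on the stack, and then decodes from the subslice.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Coord is a hypothetical message with two fixed-width fields.
type Coord struct{ Lat, Lon uint32 }

const coordSize = 8

// Size reports how many bytes MarshalTo needs.
func (c *Coord) Size() int { return coordSize }

// MarshalTo writes the encoded form into a caller-provided buffer and
// returns the number of bytes written.
func (c *Coord) MarshalTo(buf []byte) (int, error) {
	if len(buf) < coordSize {
		return 0, fmt.Errorf("buffer too small: %d < %d", len(buf), coordSize)
	}
	binary.LittleEndian.PutUint32(buf[0:4], c.Lat)
	binary.LittleEndian.PutUint32(buf[4:8], c.Lon)
	return coordSize, nil
}

// roundTrip encodes c into a fixed-size local buffer and decodes it again.
func roundTrip(c Coord) (Coord, error) {
	var buf [16]byte // constant size; no make() with a runtime length
	n, err := c.MarshalTo(buf[:])
	if err != nil {
		return Coord{}, err
	}
	data := buf[:n] // subslice: only the first n bytes are valid
	return Coord{
		Lat: binary.LittleEndian.Uint32(data[0:4]),
		Lon: binary.LittleEndian.Uint32(data[4:8]),
	}, nil
}

func main() {
	out, err := roundTrip(Coord{Lat: 55, Lon: 37})
	fmt.Println(out, err)
}
```

Whether the buffer really stays on the stack still depends on what the consumer does with it, which is exactly the point of Example 2 in the slides.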
Example 1
$ cat test.c
int get_error(char **error) {
    *error = "error";
    return 0;
}
$ cat test.go
func main() {
	var errStr *C.char
	C.get_error(&errStr)
	s := C.GoString(errStr)
	...
}
$ go build -gcflags=-m
./test.go:13: moved to heap: errStr
./test.go:14: &errStr escapes to heap
./test.go:16: main ... argument does not escape
go 1.3.0
$ cat test.c
int get_error(char **error) {
    *error = "error";
    return 0;
}
$ cat test.go
func main() {
	var errStr *C.char
	C.get_error(&errStr)
	s := C.GoString(errStr)
	...
}
$ go build -gcflags=-m
./test.go:14: main &errStr does not escape
./test.go:16: main ... argument does not escape
go 1.4.2
Example 2
db.Put() can cause data to escape!
func SaveCoord(c Coord) error {
	key := GetKey(c)
	data := make([]byte, c.Size())
	n, _ := c.MarshalTo(data)
	return db.Put(key, data[:n])
}
Example 3
Return struct from function instead of passing pointer to struct.
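One situation where this advice pays off is a call through an interface: the compiler usually cannot prove that a pointer argument stays local, so the struct behind it is heap-allocated, whereas a struct returned by value can stay on the stack. The sketch below is an illustration with hypothetical names (source, Fill, Get), not code from the talk, and the allocation counts depend on the compiler version.

```go
package main

import (
	"fmt"
	"testing"
)

type Coord struct{ Lat, Lon int64 }

// source is a hypothetical interface offering both API styles.
type source interface {
	Fill(c *Coord) // caller passes a pointer to be filled in
	Get() Coord    // callee returns the struct by value
}

type fixed struct{}

func (fixed) Fill(c *Coord) { c.Lat, c.Lon = 55, 37 }
func (fixed) Get() Coord    { return Coord{55, 37} }

// src is a package-level interface value, so calls on it are dynamic and
// escape analysis has to assume pointer arguments may escape.
var src source = fixed{}

//go:noinline
func viaPointer() int64 {
	var c Coord
	src.Fill(&c) // &c is assumed to escape, so c is expected to be heap-allocated
	return c.Lat + c.Lon
}

//go:noinline
func viaValue() int64 {
	c := src.Get() // the result is copied into a local; no pointer is handed out
	return c.Lat + c.Lon
}

func main() {
	ptr := testing.AllocsPerRun(1000, func() { viaPointer() })
	val := testing.AllocsPerRun(1000, func() { viaValue() })
	fmt.Println("allocs per call, pointer parameter:", ptr)
	fmt.Println("allocs per call, returned struct:  ", val)
}
```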
Before optimisations:
~2800 req/sec ~54 GiB in memory ~800 M objects
30 sec GC pause :-(
~200 ms avg response ;-(
After optimisations:
~2800 req/sec ~54 GiB in memory ~800 M objects
GC pause: 30s -> 3s // maps :-(
avg response: 200ms -> 2ms // :-)
var m = make(map[int]int)

func main() {
	for i := 0; i < 10000000; i++ {
		m[i] = i
	}
	for {
		time.Sleep(5 * time.Second)
	}
}
GC pause: 500 ms
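A rough way to repeat the map experiment on a current toolchain is to time a forced collection with and without a large map alive. This is only a sketch: the helper names are illustrative, the map is smaller than on the slide, and the wall-clock duration of runtime.GC() is not the same thing as the stop-the-world pause reported by gctrace.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// timeForcedGC runs one blocking collection and reports how long it took.
func timeForcedGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

// buildMap fills a pointer-free map with n entries.
func buildMap(n int) map[int]int {
	m := make(map[int]int, n)
	for i := 0; i < n; i++ {
		m[i] = i
	}
	return m
}

func main() {
	base := timeForcedGC()
	m := buildMap(1000000)
	withMap := timeForcedGC()
	fmt.Println("forced GC, empty heap:     ", base)
	fmt.Println("forced GC, 1M-entry map:   ", withMap)
	runtime.KeepAlive(m) // keep the map reachable during the second collection
}
```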
commit 85e7bee19f9f26dfca414b1e9054e429c448b14f
Author: Dmitry Vyukov <>
Date: Mon Jan 26 21:04:41 2015 +0300

    runtime: do not scan maps when k/v do not contain pointers

    Currently we scan maps even if k/v does not contain pointers.
    This is required because overflow buckets are hanging off the main table.
    This change introduces a separate array that contains pointers to all
    overflow buckets and keeps them alive. Buckets themselves are marked
    as containing no pointers and are not scanned by GC (if k/v does not
    contain pointers).

    This brings maps in line with slices and chans -- GC does not scan
    their contents if elements do not contain pointers.

    Currently scanning of a map[int]int with 2e8 entries (~8GB heap)
    takes ~8 seconds. With this change scanning takes negligible time.

    Update #9477.

    Change-Id: Id8a04066a53d2f743474cad406afb9f30f00eaae
    Reviewed-by: Keith Randall <>
expected in 1.5
gc #11 @11.995s 0%: 0+0+0+0+3 ms clock, 0+0+0+0+25 ms
cpu, 304->304->304 MB, 8 P (forced)
• If you want to write low latency apps, you have to fight GC :-(
• Debugging performance issues:
• Go Escape Analysis Flaws:
• Go execution tracer:
• Interface type conversion:
• GC debugcharts:
• GC visualisation (davecheney):
Thank you.
Anton Povarov
Marko Kevac

  • 1. GO IN BADOO Go meetup April 2015 Anton Povarov Marko Kevac
  • 2. BADOO BACKEND IN 2013 • PHP • C/C++ (~25 home-made daemons) • Python
  • 3. BADOO BACKEND IN 2014 • PHP • C/C++ (~25 home-made daemons) • Python • Go
  • 5. INFRASTRUCTURE • Same config, same logging, same directory structure • Same protocol, including JSON to Protobuf conversion • Build and testing in TeamCity • QA team should not know this is a Go project Go is not special in any way
  • 6. INFRASTRUCTURE • Logs go to syslog and eventually to Splunk • Metrics are collected with home-grown system based on RRD (you will see some examples) • HTTP based profiling is always on
  • 7. INFRASTRUCTURE • As of now we do not use vendoring • We use go get for dependencies • Sometimes we just fork projects (as with rocksdb lib) • We use Make for building
  • 8. BUMPED when two people meet somewhere in the world
  • 9. BUMPED • Done in a week • By three people • Huge win for product
  • 12. ~2800 req/sec ~54 GiB in memory ~800 M objects
  • 13. ~2800 req/sec ~54 GiB in memory ~800 M objects 30 sec GC pause :-( ~ 200 ms avg response ;-(
  • 15. DO NOT GIVE UP for { generateSomeLoad() go tool pprof -alloc_objects http://.../debug/pprof/heap go tool pprof -inuse_objects http://.../debug/pprof/heap think() go build -gcflags=-m foobar.go thinkAndFix() }
  • 17. MEMORY PROFILING CRASH COURSE Allocating on stack • Fast • No GC pressure • Stack size is not fixed in Go • Not always possible Allocating on heap • Slower • GC pressure • Always possible
  • 18. type Person struct { Name string Age uint } var People []*Person const PeopleCount = 10000000 func allocateInitial() { ... func allocateMore() { ... func main() { runtime.GOMAXPROCS(runtime.NumCPU()) allocateInitial() go allocateMore() http.ListenAndServe("localhost:8080", nil) } MEMORY PROFILING CRASH COURSE
  • 19. func allocateMore() { for { for i := 0; i < PeopleCount; i++ { People = append(People, &Person{"marko", 29}) } People = People[0:PeopleCount] time.Sleep(10 * time.Second) } } func allocateInitial() { for i := 0; i < PeopleCount; i++ { People = append(People, &Person{"marko", 29}) } } MEMORY PROFILING CRASH COURSE
  • 20. ESCAPE ANALYSIS $ go build -gcflags=-m # ./test.go:20: &Person literal escapes to heap ./test.go:27: &Person literal escapes to heap
  • 21. $ go tool pprof --inuse_objects ./test002 http://.../debug/pprof/heap (pprof) top10 10617157 of 10618977 total ( 100%) Dropped 3 nodes (cum <= 53094) flat flat% sum% cum cum% 9945392 93.66% 93.66% 9945392 93.66% main.allocateInitial 671765 6.33% 100% 671765 6.33% main.allocateMore 0 0% 100% 9945392 93.66% main.main 0 0% 100% 10617157 100% runtime.goexit 0 0% 100% 9945392 93.66% runtime.main ALLOCATIONS
  • 22. (pprof) list main.allocateInitial Total: 10618977 ROUTINE ======================== main.allocateInitial in /Users/marko/goprojects/src/ 9945392 9945392 (flat, cum) 93.66% of Total . . 15: . . 16:const PeopleCount = 10000000 . . 17: . . 18:func allocateInitial() { . . 19: for i := 0; i < PeopleCount; i++ { 9945392 9945392 20: People = append(People, &Person{"marko", 29}) . . 21: } . . 22:} . . 23: . . 24:func allocateMore() { . . 25: for { ALLOCATIONS
  • 24. $ go tool pprof --alloc_objects ./test002 http://.../debug/pprof/heap (pprof) top10 191993610 of 191995430 total ( 100%) Dropped 3 nodes (cum <= 959977) flat flat% sum% cum cum% 182048182 94.82% 94.82% 182048182 94.82% main.allocateMore 9945428 5.18% 100% 9945428 5.18% main.allocateInitial 0 0% 100% 9945428 5.18% main.main 0 0% 100% 191993610 100% runtime.goexit 0 0% 100% 9945428 5.18% runtime.main
  • 25. http://localhost:8080/debug/pprof/heap?debug=1 # NextGC = 1619939888 # PauseNs = [144830 87026 98881 2162680 2990228 3759763 6233690 11810930 18442986 34012539 47019926 72834183 114591578 178384506 315007729 480245709 568020053 575784519 517883227 518861595 604910252 514458210 542329937 560007420 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] # NumGC = 24 GC pause: ~500 ms
  • 26. $ GODEBUG=gctrace=1 ./test002 gc16(1): 1+16+433413+1 us, 396 -> 793 MB, 15766167 (15766492-325) objects, 8 goroutines, 61633/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields read more:
  • 27. gc16(1): // 16th GC (1 thread was doing it) 1+16+433413+1 us, // GC prepare + sweep + mark + finalise 396 -> 793 MB, // heap grew from X to Y since last GC 15766167 (15766492-325) objects, // 15766167 objects in heap (incl. garbage) // 15766492 allocs - 325 frees 8 goroutines, 61633/1/0 sweeps, // 61633 total spans, 1 in bg, 0 in pause 0(0) handoff, 0(0) steal, 0/0/0 yields // some scheduling stats :) read more:
  • 28. $ wrk --latency -d 60s 'http://localhost:8080/debug/pprof/' Running 1m test @ http://localhost:8080/debug/pprof/ 2 threads and 10 connections Thread Stats Avg Stdev Max +/- Stdev Latency 26.37ms 100.32ms 751.54ms 93.65% Req/Sec 7.56k 2.47k 9.56k 89.05% Latency Distribution 50% 687.00us 75% 1.02ms 90% 11.59ms 99% 550.23ms 831369 requests in 1.00m, 507.43MB read Requests/sec: 13848.63 Transfer/sec: 8.45MB
  • 29. DEBUGCHARTS • Memory Allocated • GC Pauses • Shows data for last 24h
  • 31. MEMORY PROFILING CRASH COURSE (FIXED) type Person struct { Name [6]byte // was string Age uint } var People []Person // was []*Person const PeopleCount = 10000000 func allocateMore() { for { for i := 0; i < PeopleCount; i++ { People = append(People, Person{[6]byte{'m', 'a', 'r', 'k', 'o', 0}, 29}) } People = People[0:PeopleCount] time.Sleep(10 * time.Second) } }
  • 32. $ go tool pprof -inuse_objects ./test003 http://.../debug/pprof/heap (pprof) top10 1820 of 1824 total (99.78%) Dropped 8 nodes (cum <= 9) flat flat% sum% cum cum% 1820 99.78% 99.78% 1820 99.78% mcommoninit 0 0% 99.78% 1820 99.78% runtime.rt0_go 0 0% 99.78% 1820 99.78% runtime.schedinit
  • 33. $ go tool pprof -alloc_objects ./test003 http://.../debug/pprof/heap (pprof) top10 1856 of 1858 total (99.89%) Dropped 4 nodes (cum <= 9) flat flat% sum% cum cum% 1820 97.95% 97.95% 1820 97.95% mcommoninit 36 1.94% 99.89% 36 1.94% main.allocateInitial 0 0% 99.89% 36 1.94% main.main 0 0% 99.89% 38 2.05% runtime.goexit 0 0% 99.89% 38 2.05% runtime.main 0 0% 99.89% 1820 97.95% runtime.rt0_go 0 0% 99.89% 1820 97.95% runtime.schedinit
  • 34. $ wrk --latency -d 60s 'http://localhost:8080/debug/pprof/' Running 1m test @ http://localhost:8080/debug/pprof/ 2 threads and 10 connections Thread Stats Avg Stdev Max +/- Stdev Latency 1.52ms 11.08ms 197.07ms 98.20% Req/Sec 21.82k 3.00k 26.49k 86.63% Latency Distribution 50% 198.00us 75% 235.00us 90% 309.00us 99% 53.22ms 2600893 requests in 1.00m, 1.55GB read Requests/sec: 43318.49 Transfer/sec: 26.48MB
  • 35. Debugging performance issues in Go programs, Dmitry Vyukov, May 10, 2014
  • 37. GO IS NOT C • Often it is preferable to copy a little bit, but avoid using pointers • We had to thoroughly inspect our code and remove pointers everywhere we could
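To make the "copy a little, avoid pointers" advice concrete, here is a minimal sketch that counts heap objects created when filling a slice of pointers versus a slice of values. The `Person` type follows the earlier slides; `personCount`, `mallocsDuring`, `fillPointers` and `fillValues` are illustrative names, and the exact counts will vary slightly between Go versions.

```go
package main

import (
	"fmt"
	"runtime"
)

type Person struct {
	Name string
	Age  uint
}

const personCount = 100000

// Package-level sinks so the slices stay reachable after the fill.
var (
	byPointer []*Person
	byValue   []Person
)

// mallocsDuring reports how many heap objects were allocated while f ran.
func mallocsDuring(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs
}

func fillPointers() {
	byPointer = make([]*Person, 0, personCount)
	for i := 0; i < personCount; i++ {
		// Each element is a separate heap object the GC has to track.
		byPointer = append(byPointer, &Person{"marko", 29})
	}
}

func fillValues() {
	byValue = make([]Person, 0, personCount)
	for i := 0; i < personCount; i++ {
		// Elements are stored inline in the slice's backing array.
		byValue = append(byValue, Person{"marko", 29})
	}
}

func main() {
	fmt.Println("pointer slice mallocs:", mallocsDuring(fillPointers))
	fmt.Println("value slice mallocs:  ", mallocsDuring(fillValues))
}
```

Note that `Person` still holds a `string`, which contains a pointer, so the GC still scans the value slice; the fixed example in the deck also replaces the string with a `[6]byte` to remove that.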
  • 38. $ go tool pprof --alloc_objects ./heaptest /tmp/…/mem.pprof Adjusting heap profiles for 1-in-4096 sampling rate Welcome to pprof! For help, type 'help'. (pprof) top Total: 29161720 objects 22648298 77.7% 77.7% 22648298 77.7% newselect 6513152 22.3% 100.0% 6513152 22.3% main.main 256 0.0% 100.0% 256 0.0% runtime.mallocinit 14 0.0% 100.0% 14 0.0% allocg 0 0.0% 100.0% 270 0.0% _rt0_go 0 0.0% 100.0% 22648298 77.7% main.loop 0 0.0% 100.0% 14 0.0% mcommoninit NOT AN ALLOCATION
  • 40. $ cat test.proto package test; message TestMessage { required uint32 user_id = 1; optional string name = 2; optional uint32 age = 3; } $ cat test.pb.go package test type TestMessage struct { UserId *uint32 Name *string Age *uint32 XXX_unrecognized []byte }
  • 42. $ cat test.proto package test; import ""; option (gogoproto.goproto_unrecognized_all) = false; message TestMessage { required uint32 user_id = 1 [(gogoproto.nullable) = false]; optional string name = 2 [(gogoproto.nullable) = false]; optional uint32 age = 3 [(gogoproto.nullable) = false]; } Notes: goproto_unrecognized_all = false removes XXX_unrecognized (which is sometimes a bad thing); nullable = false removes field pointers (and changes what optional means)
  • 43. type TestMessage struct { UserId uint32 Name string Age uint32 }
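What "changes what optional means" looks like in practice can be sketched with two hand-written structs that mirror the shapes of the generated code. `WithPointers`, `WithValues` and `ageKnown` are hypothetical names used only for this illustration; nothing here is generated by protobuf or gogoprotobuf.

```go
package main

import "fmt"

// WithPointers mirrors the default generated shape: optional fields are
// pointers, so nil means "not set".
type WithPointers struct {
	UserId *uint32
	Age    *uint32
}

// WithValues mirrors the nullable=false shape: plain values and no
// per-field allocation, but "not set" and "set to zero" look the same.
type WithValues struct {
	UserId uint32
	Age    uint32
}

// ageKnown can only be answered for the pointer-based shape.
func ageKnown(m WithPointers) bool { return m.Age != nil }

func main() {
	zero := uint32(0)
	unset := WithPointers{}
	setToZero := WithPointers{Age: &zero}
	fmt.Println(ageKnown(unset), ageKnown(setToZero)) // false true

	a := WithValues{}
	b := WithValues{Age: 0}
	fmt.Println(a == b) // true: the two cases are indistinguishable
}
```

If a field's zero value is meaningful on its own (age 0, empty name), the value-based shape needs some other way to signal presence.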
  • 45. PROTOBUF the default way: func SaveCoord(c Coord) error { key := GetKey(c) value, _ := proto.Marshal(&c) return db.Put(key, value) } Notes: conversion of &c to proto.Message (an interface) = escapes to heap = allocation; value []byte is allocated on heap
  • 46. GOGOPROTOBUF $ cat test.proto package test; import ""; option (gogoproto.unsafe_marshaler_all) = true; option (gogoproto.unsafe_unmarshaler_all) = true; option (gogoproto.sizer_all) = true; Notes: generates extra methods to speed up Marshal/Unmarshal and enables memory optimisation tricks, but loses required field checks (to be fixed)
  • 47. $ cat test.pb.go func (m *Coord) MarshalTo(data []byte) (n int, err error) func (m *Coord) Size() (n int) func SaveCoord(c Coord) error { key := GetKey(c) data := make([]byte, c.Size()) n, _ := c.MarshalTo(data) return db.Put(key, data[:n]) } Notes: data does not escape and is allocated on stack; MarshalTo writes to the stack buffer; don’t forget to subslice (data[:n]) :)
  • 49. go 1.3.0: $ cat test.c int get_error(char **error) { *error = "error"; return 0; } $ cat test.go […skipped…] func main() { var errStr *C.char C.get_error(&errStr) s := C.GoString(errStr) fmt.Println(s) } $ go build -gcflags=-m # […skipped…] ./test.go:13: moved to heap: errStr ./test.go:14: &errStr escapes to heap ./test.go:16: main ... argument does not escape
  • 50. go 1.4.2: $ cat test.c int get_error(char **error) { *error = "error"; return 0; } $ cat test.go […skipped…] func main() { var errStr *C.char C.get_error(&errStr) s := C.GoString(errStr) fmt.Println(s) } $ go build -gcflags=-m # ./test.go:14: main &errStr does not escape ./test.go:16: main ... argument does not escape
  • 52. db.Put() can cause data to escape! func SaveCoord(c Coord) error { key := GetKey(c) data := make([]byte, c.Size()) n, _ := c.MarshalTo(data) return db.Put(key, data[:n]) }
  • 55. Return struct from function instead of passing pointer to struct.
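A small sketch of the two API shapes this slide contrasts. The `Coord` type and the functions `parseInto` and `parse` are hypothetical. Whether the pointer variant actually causes a heap allocation depends on what the callee does with the pointer, so the output of `go build -gcflags=-m` remains the thing to check.

```go
package main

import "fmt"

// Coord is a hypothetical small struct used only for this illustration.
type Coord struct {
	Lat, Lon float64
}

// parseInto fills a caller-provided struct through a pointer. If the callee
// (or anything it calls) retains the pointer, the caller's variable must
// live on the heap.
func parseInto(c *Coord, lat, lon float64) {
	c.Lat, c.Lon = lat, lon
}

// parse returns the struct by value. The result is copied to the caller,
// so no pointer to it exists that could escape.
func parse(lat, lon float64) Coord {
	return Coord{Lat: lat, Lon: lon}
}

func main() {
	var a Coord
	parseInto(&a, 55.75, 37.61)

	b := parse(55.75, 37.61)

	fmt.Println(a == b) // true: same data, different calling convention
}
```

For small structs the copy is cheap, which matches the earlier "copy a little bit" advice.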
  • 56. ~2800 req/sec ~54 GiB in memory ~800 M objects 30 sec GC pause :-( ~ 200 ms avg response ;-( before optimisations
  • 57. ~2800 req/sec ~54 GiB in memory ~800 M objects GC pause: 30s -> 3s // maps :-( avg response: 200ms -> 2ms // :-) after optimisations
  • 58. MAPS
  • 59. MAPS var m = make(map[int]int) func main() { runtime.GOMAXPROCS(runtime.NumCPU()) for i := 0; i < 10000000; i++ { m[i] = i } for { runtime.GC() time.Sleep(5 * time.Second) } } GC Pause: 500 ms
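The slide's experiment can be reproduced in a form that terminates and prints a number. This is a minimal sketch: `entries`, `fill` and `timeForcedGC` are illustrative names, the map is ten times smaller than on the slide to keep the run short, and the measured value is the wall-clock time of the whole `runtime.GC()` call rather than only the stop-the-world pause. On Go versions that include the change quoted on the next slide, the result should be small.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// entries is smaller than the slide's 10M so the example finishes quickly.
const entries = 1000000

var m = make(map[int]int, entries)

// fill populates the package-level map with pointer-free keys and values.
func fill() {
	for i := 0; i < entries; i++ {
		m[i] = i
	}
}

// timeForcedGC runs one full collection and reports how long the call took.
func timeForcedGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	fill()
	fmt.Println("forced GC with map[int]int:", timeForcedGC())
}
```

Running it with `GODEBUG=gctrace=1` prints the runtime's own breakdown next to this measurement.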
  • 60. commit 85e7bee19f9f26dfca414b1e9054e429c448b14f Author: Dmitry Vyukov Date: Mon Jan 26 21:04:41 2015 +0300 runtime: do not scan maps when k/v do not contain pointers. Currently we scan maps even if k/v does not contain pointers. This is required because overflow buckets are hanging off the main table. This change introduces a separate array that contains pointers to all overflow buckets and keeps them alive. Buckets themselves are marked as containing no pointers and are not scanned by GC (if k/v does not contain pointers). This brings maps in line with slices and chans -- GC does not scan their contents if elements do not contain pointers. Currently scanning of a map[int]int with 2e8 entries (~8GB heap) takes ~8 seconds. With this change scanning takes negligible time. Update #9477. Change-Id: Id8a04066a53d2f743474cad406afb9f30f00eaae Reviewed-by: Keith Randall (expected in Go 1.5)
  • 61. GOTIP gc #11 @11.995s 0%: 0+0+0+0+3 ms clock, 0+0+0+0+25 ms cpu, 304->304->304 MB, 8 P (forced)
  • 62. GC SUMMARY & LINKS • If you want to write low latency apps, you have to fight GC :-( • Debugging performance issues • Go Escape Analysis Flaws • Go execution tracer • Interface type conversion • GC debugcharts • GC visualisation (davecheney)