Debugging Ruby (Aman Gupta)

Debugging Ruby with MongoDB
Aman Gupta
@tmm1

debugging ruby?

• i use ruby
• my ruby processes use a lot of ram
• i want to fix this

let’s build a debugger
• step 1: collect data
  • list of all ruby objects in memory
• step 2: analyze data
  • group by type
  • group by file/line

version 1: collect data
• simple patch to ruby VM (300 lines of C)
• http://gist.github.com/73674
• simple text-based output format
0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range

version 1: analyze data
$ wc -l /tmp/ruby.heap
1571529 /tmp/ruby.heap

$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
236840 memcached/memcached.rb:316

$ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5
10948 ARRAY
20355 OBJECT
30744 DATA
64952 HASH
123290 STRING
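For comparison, the same two questions can be asked in plain Ruby instead of awk. This is a minimal sketch, assuming the version 1 line layout shown above (`<address> @ <file>:<line> is <TYPE> ...`, so the third whitespace-separated field is the location and the fifth is the type); `top_locations` and `types_at` are hypothetical helper names. Unlike `sort -g | tail`, it lists the largest counts first.

```ruby
# Sketch: the awk | sort | uniq -c pipelines above, redone in Ruby.
# Assumes version 1 heap lines shaped like
#   "0x154750 @ -e:1 is OBJECT of type: T"
# so split fields are [address, "@", location, "is", TYPE, ...].

# Count objects per file:line and return the n busiest locations.
def top_locations(lines, n = 1)
  counts = Hash.new(0)
  lines.each do |line|
    fields = line.split
    counts[fields[2]] += 1 if fields.size >= 5
  end
  counts.sort_by { |loc, count| [-count, loc] }.first(n)
end

# Count object types allocated at one location (awk's $5).
def types_at(lines, location, n = 5)
  counts = Hash.new(0)
  lines.each do |line|
    fields = line.split
    counts[fields[4]] += 1 if fields.size >= 5 && fields[2] == location
  end
  counts.sort_by { |type, count| [-count, type] }.first(n)
end

# Usage against the dump from the slide:
#   lines = File.foreach("/tmp/ruby.heap")
#   top_locations(lines)
#   types_at(lines, "memcached/memcached.rb:316")
```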

version 1
• it works!
• but...
  • must patch and rebuild ruby binary
  • no information about references between objects
  • limited analysis via shell scripting

version 2 goals
• better data format
  • simple: one line of text per object
  • expressive: include all details about object contents and references
  • easy to use: easy to generate from C code & easy to consume from various scripting languages

version 2 is memprof
• no patches to ruby necessary
  • gem install memprof
  • require 'memprof'
  • Memprof.dump_all("/tmp/app.json")
• C extension for MRI ruby VM
  http://github.com/ice799/memprof
• uses libyajl to dump out all ruby objects as json

strings
Memprof.dump{
  "hello" + "world"
}

{
  "_id": "0x19c610",
  "file": "-e",
  "line": 1,

  "type": "string",
  "class": "0x1ba7f0",
  "class_name": "String",

  "length": 10,
  "data": "helloworld"
}

• _id: memory address of object
• file, line: file and line where the string was created
• class: address of the class object "String"
• length, data: length and contents of this string instance
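Because every object in the dump carries an `_id` address, fields such as `class` can be resolved by lookup. A minimal Ruby sketch, assuming the dump file holds one JSON object per line (the "one line of text per object" goal) and using the standard `json` library; `load_heap` and `class_name_of` are hypothetical names.

```ruby
require 'json'

# Sketch: index a memprof-style dump by "_id" (memory address),
# assuming one JSON object per line.
def load_heap(io)
  heap = {}
  io.each_line do |line|
    next if line.strip.empty?
    obj = JSON.parse(line)
    heap[obj["_id"]] = obj
  end
  heap
end

# Follow an object's "class" address to the class object's "name",
# falling back to the denormalized "class_name" field when the class
# object is not in this dump.
def class_name_of(heap, obj)
  klass = heap[obj["class"]]
  klass ? klass["name"] : obj["class_name"]
end

# Usage (path from the memprof slide):
#   heap = File.open("/tmp/app.json") { |f| load_heap(f) }
#   class_name_of(heap, heap["0x19c610"])
```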

arrays
Memprof.dump{
  [
    1,
    :b,
    2.2,
    "d"
  ]
}

{
  "_id": "0x19c5c0",
  "class": "0x1b0d18",
  "class_name": "Array",

  "length": 4,
  "data": [
    1,
    ":b",
    "0x19c750",
    "0x19c598"
  ]
}

• integers and symbols are stored in the array itself
• floats and strings are separate ruby objects, listed by address
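The distinction between inline values and references can be made mechanically. A small Ruby sketch, assuming (as in the dump above) that references to other heap objects are rendered as hex address strings, while integers stay JSON numbers and symbols appear as strings such as `":b"`; the constant and method names are hypothetical.

```ruby
# Hex address strings such as "0x19c750" are treated as references.
ADDRESS = /\A0x\h+\z/

# Split an array dump's "data" into references and inline values.
def split_array_data(dump)
  refs, inline = dump.fetch("data").partition do |value|
    value.is_a?(String) && ADDRESS.match?(value)
  end
  { refs: refs, inline: inline }
end

parts = split_array_data({ "length" => 4, "data" => [1, ":b", "0x19c750", "0x19c598"] })
# parts[:refs]   => ["0x19c750", "0x19c598"]
# parts[:inline] => [1, ":b"]
```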

hashes
Memprof.dump{
  {
    :a => 1,
    "b" => 2.2
  }
}

{
  "_id": "0x19c598",
  "type": "hash",
  "class": "0x1af170",
  "class_name": "Hash",

  "default": null,

  "length": 2,
  "data": [
    [ ":a", 1 ],
    [ "0xc728", "0xc750" ]
  ]
}

• "default": null means no default proc
• hash entries are stored as key/value pairs
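Hash dumps can be read the same way. This Ruby sketch assumes the `data` layout shown above (a list of `[key, value]` pairs) and hex address strings for references; `hash_entries` and `hash_refs` are illustrative names, not part of memprof.

```ruby
# Hex address strings are treated as references to other heap objects.
HASH_REF = /\A0x\h+\z/

# Rebuild the dumped entries as a Ruby Hash of dumped keys to dumped values.
def hash_entries(dump)
  dump.fetch("data").to_h
end

# List every key or value that points at another heap object.
def hash_refs(dump)
  dump.fetch("data").flatten.grep(HASH_REF)
end

dump = { "type" => "hash", "length" => 2,
         "data" => [[":a", 1], ["0xc728", "0xc750"]] }
hash_entries(dump)  # => {":a" => 1, "0xc728" => "0xc750"}
hash_refs(dump)     # => ["0xc728", "0xc750"]
```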

classes
Memprof.dump{
  class Hello
    @@var=1
    Const=2
    def world() end
  end
}

{
  "_id": "0x19c408",
  "type": "class",
  "name": "Hello",
  "super": "0x1bfa48",
  "super_name": "Object",

  "ivars": {
    "@@var": 1,
    "Const": 2
  },
  "methods": {
    "world": "0x19c318"
  }
}

• super: superclass object reference
• ivars: class variables and constants are stored in the instance variable table
• methods: references to method objects
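Because `super` is itself an address, the superclass chain can be walked by repeated lookup. A Ruby sketch, assuming the dump has been parsed into a Hash from `_id` to object; `superclass_chain` is a hypothetical name, and the two-class heap in the usage is trimmed, with Object's `super` set to `nil` only to end the chain.

```ruby
require 'set'

# Follow "super" addresses from a class dump up to the root.
# The visited set guards against a malformed dump that loops.
def superclass_chain(heap, klass)
  chain = []
  seen = Set.new
  while klass && seen.add?(klass["_id"])
    chain << klass["name"]
    klass = heap[klass["super"]]
  end
  chain
end

heap = {
  "0x19c408" => { "_id" => "0x19c408", "type" => "class", "name" => "Hello", "super" => "0x1bfa48" },
  "0x1bfa48" => { "_id" => "0x1bfa48", "type" => "class", "name" => "Object", "super" => nil },
}
superclass_chain(heap, heap["0x19c408"])  # => ["Hello", "Object"]
```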

version 2: memprof.com
a web-based heap visualizer and leak analyzer

built on...

$ mongoimport -d memprof -c rails --file /tmp/app.json
$ mongo memprof

let’s run some queries.

how many objects?
> db.rails.count()
809816

• ruby scripts create a lot of objects
• usually not a problem, but...
• MRI has a naïve stop-the-world mark/sweep GC
• fewer objects = faster GC = better performance

what types of objects?
> db.rails.distinct('type')

['array',
 'bignum',
 'class',
 'float',
 'hash',
 'module',
 'node',
 'object',
 'regexp',
 'string',
 ...]

mongodb: distinct
• distinct('type')
  list of types of objects
• distinct('file')
  list of source files
• distinct('class_name')
  list of instance class names
• optionally filter first
  • distinct('name', {type:"class"})
    names of all defined classes
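Off-line, `distinct` amounts to filter, project, de-duplicate. A Ruby sketch over parsed dump objects (Hashes with string keys, as `JSON.parse` returns them); the `distinct` helper and its sample data are hypothetical, and unlike MongoDB it keeps first-seen order and drops missing values.

```ruby
# distinct(field, filter): keep objects matching every filter pair,
# project one field, drop missing values, de-duplicate.
def distinct(objs, field, filter = {})
  objs.select { |o| filter.all? { |k, v| o[k] == v } }
      .map { |o| o[field] }
      .compact
      .uniq
end

objs = [
  { "type" => "string", "file" => "-e" },
  { "type" => "class",  "name" => "Hello" },
  { "type" => "string", "file" => "app.rb" },
]
distinct(objs, "type")                         # => ["string", "class"]
distinct(objs, "name", { "type" => "class" })  # => ["Hello"]
```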

improve performance with indexes

> db.rails.ensureIndex({'type':1})

> db.rails.ensureIndex(
    {'file':1},
    {background:true}
  )

mongodb: ensureIndex
• add an index on a field (if it doesn't exist yet)
• improve performance of queries against common fields: type, class_name, super, file
• can index embedded field names
  • ensureIndex({'methods.add':1})
  • find({'methods.add':{$exists:true}})
    find classes that define the method add
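The `$exists` query has a direct Ruby counterpart once class dumps are parsed. A sketch, assuming class objects carry a `methods` Hash as in the class dump earlier; `classes_defining` and the sample `Set` entry are hypothetical.

```ruby
# Ruby counterpart of find({'methods.add':{$exists:true}}): pick class
# dumps whose "methods" table has an entry for the given name.
def classes_defining(objs, method_name)
  objs.select { |o| o["methods"].is_a?(Hash) && o["methods"].key?(method_name) }
      .map { |o| o["name"] }
end

objs = [
  { "type" => "class", "name" => "Set",   "methods" => { "add" => "0x1" } },
  { "type" => "class", "name" => "Hello", "methods" => { "world" => "0x19c318" } },
  { "type" => "string" },
]
classes_defining(objs, "add")  # => ["Set"]
```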

how many objs per type?
> db.rails.group({
    initial: {count:0},
    key: {type:true},            // group on type
    cond: {},
    reduce: function(obj, out) {
      out.count++                // increment count for each obj
    }
  }).sort(function(a,b){         // sort results
    return a.count - b.count
  })

[
  ...,
  {type: 'array', count: 7621},
  {type: 'string', count: 69139},
  {type: 'node', count: 365285}
]
lots of nodes

• nodes represent ruby code
• stored like any other ruby object
• makes ruby completely dynamic

mongodb: group
• cond: query to filter objects before grouping
• key: field(s) to group on
• initial: initial values for each group's results
• reduce: aggregation function
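To make the four parameters concrete, here is a Ruby sketch of what `group` computes. It is an in-memory model, not MongoDB's implementation: `cond` and `key` are lambdas rather than query documents, and each group starts from a shallow copy of `initial` merged with its key fields.

```ruby
# In-memory model of group(): filter with cond, bucket by key, start each
# bucket from initial, fold each object in with reduce.
def group(objs, cond:, key:, initial:, reduce:)
  results = {}
  objs.each do |obj|
    next unless cond.call(obj)
    k = key.call(obj)                        # e.g. {"type" => "node"}
    out = (results[k] ||= k.merge(initial))  # shallow copy per group
    reduce.call(obj, out)
  end
  results.values
end

objs = [{ "type" => "node" }, { "type" => "string" }, { "type" => "node" }]
counts = group(
  objs,
  cond:    ->(_obj) { true },
  key:     ->(obj) { { "type" => obj["type"] } },
  initial: { "count" => 0 },
  reduce:  ->(_obj, out) { out["count"] += 1 }
).sort_by { |r| r["count"] }
# counts == [{"type" => "string", "count" => 1}, {"type" => "node", "count" => 2}]
```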

mongodb: group
• by type or class
  • key: {type:1}
  • key: {class_name:1}
• by file and line
  • key: {file:1, line:1}
• in a specific file
  • cond: {file:"app.rb"}, key: {file:1, line:1}
• length of strings in a specific file
  • cond: {file:"app.rb", type:'string'}, key: {length:1}
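Two of these groupings expressed in Ruby, assuming parsed dump objects with `file`, `line`, `type` and `length` fields; `count_by` and the sample objects are hypothetical. The filter hash plays the role of `cond`, and the block plays the role of `key`.

```ruby
# Count objects matching every pair in cond, grouped by the block's key.
def count_by(objs, cond, &key)
  objs.select { |o| cond.all? { |k, v| o[k] == v } }
      .each_with_object(Hash.new(0)) { |o, counts| counts[key.call(o)] += 1 }
end

objs = [
  { "file" => "app.rb", "line" => 3, "type" => "string", "length" => 5 },
  { "file" => "app.rb", "line" => 3, "type" => "array",  "length" => 2 },
  { "file" => "app.rb", "line" => 9, "type" => "string", "length" => 5 },
  { "file" => "lib.rb", "line" => 1, "type" => "string", "length" => 7 },
]
# file and line, within app.rb
count_by(objs, { "file" => "app.rb" }) { |o| [o["file"], o["line"]] }
# => {["app.rb", 3] => 2, ["app.rb", 9] => 1}

# length of strings in app.rb
count_by(objs, { "file" => "app.rb", "type" => "string" }) { |o| o["length"] }
# => {5 => 2}
```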

what subclasses String?
> db.rails.find(
    {super_name:"String"},
    {name:1}                 // select only name field
  )

{name: "ActiveSupport::SafeBuffer"}
{name: "ActiveSupport::StringInquirer"}
{name: "SQLite3::Blob"}
{name: "ActiveModel::Name"}
{name: "Arel::Attribute::Expressions"}
{name: "ActiveSupport::JSON::Variable"}
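The same query over already-parsed objects is a select plus a projection. A Ruby sketch with a hypothetical helper and made-up sample objects; MongoDB also returns `_id` unless the projection excludes it, whereas this sketch keeps only `name`, matching the output shown.

```ruby
# find({super_name: ...}, {name: 1}) over parsed dump objects.
def subclasses_of(objs, super_name)
  objs.select { |o| o["super_name"] == super_name }
      .map { |o| { "name" => o["name"] } }
end

objs = [
  { "type" => "class", "name" => "ActiveSupport::SafeBuffer", "super_name" => "String" },
  { "type" => "class", "name" => "Hello", "super_name" => "Object" },
]
subclasses_of(objs, "String")  # => [{"name" => "ActiveSupport::SafeBuffer"}]
```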

mongodb: find
• find({type:'string'})
  all strings
• find({type:{$ne:'string'}})
  everything except strings
• find({type:'string'}, {data:1})
  only select string's data field

the largest objects?
> db.rails.find(
{type:
{$in:['string','array','hash']}
},
{type:1,length:1}
).sort({length:-1}).limit(3)

{type: "string", length: 2308}

mongodb: sort, limit/skip
• sort({length:-1,file:1})
  sort by length desc, file asc
• limit(10)
  first 10 results
• skip(10).limit(10)
  second 10 results
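In Ruby, sort/skip/limit map onto `sort_by`, `drop` and `first`. A sketch, assuming every object has a numeric `length` and a string `file` (Ruby raises when comparing `nil`, whereas MongoDB orders missing values first); `sorted_page` is a hypothetical name.

```ruby
# sort({length:-1, file:1}).skip(skip).limit(limit) over parsed objects.
def sorted_page(objs, skip: 0, limit: 10)
  objs.sort_by { |o| [-o["length"], o["file"]] }.drop(skip).first(limit)
end

objs = [
  { "length" => 3,  "file" => "b.rb" },
  { "length" => 10, "file" => "a.rb" },
  { "length" => 3,  "file" => "a.rb" },
]
sorted_page(objs, limit: 2)           # lengths 10 (a.rb), 3 (a.rb)
sorted_page(objs, skip: 2, limit: 2)  # length 3 (b.rb)
```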

when were objs created?
• useful to look at objects over time
• each obj has a timestamp of when it was created
• find minimum time, call it start_time
• create buckets for every minute of execution since start
• place objects into buckets

> db.rails.mapReduce(function(){
    var secs = this.time - start_time;
    var mins_since_start = Math.floor(secs / 60);
    emit(mins_since_start, 1);
  }, function(key, vals){
    for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
    return sum;
  }, {
    // start_time = min(time)
    scope: { start_time: db.rails.find().sort({time:1}).limit(1)[0].time }
  })
{result:"tmp.mr_1272615772_3"}
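The same bucketing can be done in plain Ruby on parsed objects. A sketch, assuming each object carries a numeric `time` in seconds, as the JavaScript version implies; `objects_per_minute` and the sample timestamps are hypothetical. Dividing elapsed seconds by 60 and flooring is what turns them into a minute index.

```ruby
# Count objects per minute since the earliest "time" in the dump.
def objects_per_minute(objs)
  start_time = objs.map { |o| o["time"] }.min       # start_time = min(time)
  objs.each_with_object(Hash.new(0)) do |o, buckets|
    minute = ((o["time"] - start_time) / 60).floor  # minutes since start
    buckets[minute] += 1
  end
end

objs = [{ "time" => 100 }, { "time" => 130 }, { "time" => 165 }, { "time" => 590 }]
objects_per_minute(objs)  # => {0 => 2, 1 => 1, 8 => 1}
```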

mongodb: mapReduce
• arguments
  • map: function that emits one or more key/value pairs for each object (available as this)
  • reduce: function that returns the aggregate result, given a key and its list of values
  • scope: global variables to set for both functions
• results
  • stored in a temporary collection (tmp.mr_1272615772_3)
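A minimal in-memory model of the map/reduce contract, written in Ruby to show how emitted pairs flow into reduce. It is a sketch, not MongoDB's engine: here `reduce` runs once per key, while MongoDB may run it several times on partial results, so a reduce whose output can be fed back in (such as summing counts) is the safe shape. All names are hypothetical.

```ruby
# map emits [key, value] pairs per object, values are collected per key,
# and reduce folds each key's list into one result.
def map_reduce(objs, map, reduce)
  emitted = Hash.new { |h, k| h[k] = [] }
  emit = ->(key, value) { emitted[key] << value }
  objs.each { |obj| map.call(obj, emit) }
  emitted.map { |key, values| { "_id" => key, "value" => reduce.call(key, values) } }
end

count_map    = ->(obj, emit) { emit.call(obj["type"], 1) }
count_reduce = ->(_key, vals) { vals.sum }

map_reduce([{ "type" => "node" }, { "type" => "node" }, { "type" => "string" }],
           count_map, count_reduce)
# => [{"_id" => "node", "value" => 2}, {"_id" => "string", "value" => 1}]
```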

> db.tmp.mr_1272615772_3.count()
12
script was running for 12 minutes

> db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
{_id: 8, value: 41231}
41k objects created 8 minutes after start

references to this object?
ary = ["a","b","c"]
ary references "a"
"b" referenced by ary

• ruby makes it easy to “leak” references
• an object will stay around until all references to it are gone
• more objects = longer GC = bad performance
• must find references to fix leaks

• db.rails_refs.insert({
    _id:"0xary", refs:["0xa","0xb","0xc"]
  })
  create references lookup table
• db.rails_refs.ensureIndex({refs:1})
  add 'multikey' index to refs array
• db.rails_refs.find({refs:"0xa"})
  efficiently look up all objs holding a ref to 0xa
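Building the lookup table and answering "who references 0xa?" can also be sketched in Ruby, with an inverted Hash standing in for the multikey index. This assumes references appear as hex address strings inside an object's `data` array (nested pairs included); `ivars`/`methods` references on classes are ignored for brevity, and all names and addresses are hypothetical placeholders like the slide's `0xary`.

```ruby
# Hex address strings are treated as references.
REF_ADDRESS = /\A0x\h+\z/

# Refs table: object id => addresses found in its "data" array.
def refs_table(objs)
  objs.each_with_object({}) do |obj, table|
    data = obj["data"]
    values = data.is_a?(Array) ? data.flatten : []
    table[obj["_id"]] = values.grep(REF_ADDRESS)
  end
end

# Inverted index: address => ids of objects holding a reference to it.
def referrers_index(table)
  index = Hash.new { |h, k| h[k] = [] }
  table.each { |id, refs| refs.each { |ref| index[ref] << id } }
  index
end

objs = [
  { "_id" => "0xary", "type" => "array", "data" => ["0xa", "0xb", "0xc"] },
  { "_id" => "0xh",   "type" => "hash",  "data" => [[":k", "0xa"]] },
]
index = referrers_index(refs_table(objs))
index.fetch("0xa", [])  # => ["0xary", "0xh"]
```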

mongodb: multikeys
• indexes on array values create a 'multikey' index
• classic example: nested array of tags
  • find({tags:"ruby"})
    find objs where obj.tags includes "ruby"

memprof.com
a web-based heap visualizer and leak analyzer

plugging a leak in rails3
• in dev mode, rails3 is leaking 10 MB per request

let’s use memprof to find it!

# in environment.rb
require `gem which memprof/signal`.strip

plugging a leak in rails3

send the app some requests so it leaks
$ ab -c 1 -n 30 http://localhost:3000/

tell memprof to dump out the entire heap to json
$ memprof --pid <pid> --name <dump name> --key <api key>

2519 classes
30 copies of TestController

mongo query for all TestController classes

details for one copy of TestController

find references to object

holding references to all controllers

“leak” is on line 178

• In development mode, Rails reloads all your application code on every request
• ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization
• But it ends up holding a reference to every single reloaded version of those controllers
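The shape of this leak can be reproduced in a few lines of plain Ruby. This is a hypothetical illustration, not the actual `ActionView::Partials::PartialRenderer` code: a cache keyed by the controller class object keeps every reloaded copy reachable, and keying by the class name is shown only as one possible way to avoid that.

```ruby
# Hypothetical cache keyed by the controller *class object*.
class ClassKeyedCache
  def initialize
    @entries = {}
  end

  # Placeholder partial names stand in for whatever is really cached.
  def partials_for(controller_class)
    @entries[controller_class] ||= ["_header", "_footer"]
  end

  def size
    @entries.size
  end
end

# Simulated dev-mode reload: each request defines a brand-new class
# object that merely reports the same name.
def reload_controller
  Class.new do
    def self.name
      "TestController"
    end
  end
end

leaky = ClassKeyedCache.new
30.times { leaky.partials_for(reload_controller) }
leaky.size  # => 30: every stale class stays reachable from the cache

# Keying by name lets old class objects become garbage.
class NameKeyedCache < ClassKeyedCache
  def partials_for(controller_class)
    super(controller_class.name)
  end
end

fixed = NameKeyedCache.new
30.times { fixed.partials_for(reload_controller) }
fixed.size  # => 1
```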

Questions?
Aman Gupta
@tmm1
