debugging ruby?
• i use ruby
• my ruby processes use a lot of ram
• i want to fix this
let’s build a debugger
• step 1: collect data
• list of all ruby objects in memory
• step 2: analyze data
• group by type
• group by file/line
version 1: collect data
• simple patch to ruby VM (300 lines of C)
• http://gist.github.com/73674
• simple text based output format
0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range
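A minimal sketch of the "step 2: analyze data" idea applied to the text output above: parse each line, then group by type and by file/line. The line shape in the regex is inferred from the sample lines only, and the helper names (parse_v1_line, count_by) are made up for illustration.

```ruby
# Sample lines copied from the version 1 output above.
SAMPLE_V1_DUMP = <<~DUMP
  0x154750 @ -e:1 is OBJECT of type: T
  0x15476c @ -e:1 is HASH which has data
  0x154788 @ -e:1 is ARRAY of len: 0
  0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
  0x1547dc @ -e:1 is STRING len: 1 and val: T
  0x154814 @ -e:1 is CLASS named: T inherits from Object
  0x154a98 @ -e:1 is STRING len: 2 and val: hi
  0x154b40 @ -e:1 is OBJECT of type: Range
DUMP

# Assumed line shape: "<address> @ <file>:<line> is <TYPE> <details>"
V1_LINE = /\A(0x\h+) @ (\S+):(\d+) is ([A-Z]+)/

# Turn one line of text into a small hash, or nil if it does not match.
def parse_v1_line(text)
  m = V1_LINE.match(text)
  return nil unless m
  { address: m[1], file: m[2], line: m[3].to_i, type: m[4] }
end

# Count objects per group; the block picks the group for each object.
def count_by(dump)
  counts = Hash.new(0)
  dump.each_line do |text|
    obj = parse_v1_line(text.chomp)
    counts[yield(obj)] += 1 if obj
  end
  counts
end

by_type = count_by(SAMPLE_V1_DUMP) { |o| o[:type] }
by_site = count_by(SAMPLE_V1_DUMP) { |o| "#{o[:file]}:#{o[:line]}" }
puts by_type.inspect
puts by_site.inspect
```

Regex parsing like this is brittle (file names with spaces would break it), which is part of the motivation for a better data format.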
version 1
• it works!
• but...
• must patch and rebuild ruby binary
• no information about references between objects
• limited analysis via shell scripting
version 2 goals
• better data format
• simple: one line of text per object
• expressive: include all details about object contents and references
• easy to use: easy to generate from C code & easy to consume from various scripting languages
version 2 is memprof
• no patches to ruby necessary
• gem install memprof
• require 'memprof'
• Memprof.dump_all("/tmp/app.json")
• C extension for MRI ruby VM: http://github.com/ice799/memprof
• uses libyajl to dump out all ruby objects as json
strings
Memprof.dump{ "hello" + "world" }
{
"_id": "0x19c610",
"file": "-e",
"line": 1,
"type": "string",
"class": "0x1ba7f0",
"class_name": "String",
"length": 10,
"data": "helloworld"
}
• _id: memory address of object
• file, line: file and line where string was created
• class: address of the class object
• class_name: "String"
• length, data: length and contents of this string instance
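A rough sketch of consuming such a dump from Ruby without memprof itself. It assumes the dump file holds one JSON object per line (in keeping with the "one line of text per object" goal); the second sample record and the helper names are invented for illustration.

```ruby
require 'json'
require 'stringio'

# Stand-in for a dump file such as /tmp/app.json.
# Assumption: one JSON object per line. The first record is the string
# shown above; the second is a made-up class record for contrast.
SAMPLE_JSON_DUMP = <<~DUMP
  {"_id":"0x19c610","file":"-e","line":1,"type":"string","class":"0x1ba7f0","class_name":"String","length":10,"data":"helloworld"}
  {"_id":"0x1ba7f0","type":"class","name":"String"}
DUMP

# Yield each dumped object as a Hash; returns an Enumerator without a block.
def each_dumped_object(io)
  return enum_for(:each_dumped_object, io) unless block_given?
  io.each_line do |text|
    text = text.strip
    next if text.empty?
    yield JSON.parse(text)
  end
end

# All strings created at a given file and line.
def strings_created_at(io, file, line)
  each_dumped_object(io).select do |obj|
    obj['type'] == 'string' && obj['file'] == file && obj['line'] == line
  end
end

found = strings_created_at(StringIO.new(SAMPLE_JSON_DUMP), '-e', 1)
found.each { |obj| puts "#{obj['_id']} len=#{obj['length']} data=#{obj['data']}" }
```

With a real dump, File.open could replace the StringIO, provided the one-object-per-line assumption holds.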
built on...
$ mongoimport -d memprof -c rails --file /tmp/app.json
$ mongo memprof
let’s run some queries.
how many objects?
> db.rails.count()
809816
• ruby scripts create a lot of objects
• usually not a problem, but...
• MRI has a naïve stop-the-world mark/sweep GC
• fewer objects = faster GC = better performance
what types of objects?
> db.rails.distinct('type')
['array',
'bignum',
'class',
'float',
'hash',
'module',
'node',
'object',
'regexp',
'string',
...]
mongodb: distinct
• distinct('type')
list of types of objects
• distinct('file')
list of source files
• distinct('class_name')
list of instance class names
• optionally filter first
• distinct('name', {type:"class"})
names of all defined classes
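To make the semantics concrete, here is a rough in-memory model of distinct in Ruby: filter first, then collect the unique values of one field. It is only an approximation of what MongoDB does (for example, it ignores array-valued fields), and the sample records are invented.

```ruby
# Made-up records in the shape of the dump; not real data.
DISTINCT_SAMPLE = [
  { 'type' => 'string', 'class_name' => 'String', 'file' => 'app.rb' },
  { 'type' => 'string', 'class_name' => 'String', 'file' => 'lib.rb' },
  { 'type' => 'class',  'name' => 'User' },
  { 'type' => 'class',  'name' => 'Post' },
  { 'type' => 'array',  'class_name' => 'Array', 'file' => 'app.rb' }
].freeze

# Unique values of `field` among objects matching every pair in `filter`.
def distinct(objects, field, filter = {})
  objects
    .select { |obj| filter.all? { |key, value| obj[key] == value } }
    .map { |obj| obj[field] }
    .compact # objects without the field contribute nothing
    .uniq
end

puts distinct(DISTINCT_SAMPLE, 'type').inspect
puts distinct(DISTINCT_SAMPLE, 'file').inspect
puts distinct(DISTINCT_SAMPLE, 'name', 'type' => 'class').inspect
```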
mongodb: ensureIndex
• add an index on a field (if it doesn’t exist yet)
• improve performance of queries against common fields: type, class_name, super, file
• can index embedded field names
• ensureIndex('methods.add')
• find({'methods.add':{$exists:true}})
find classes that define the method add
how many objs per type?
> db.rails.group({
initial: {count:0},
key: {type:true}, // group on type
cond: {},
reduce: function(obj, out) {
out.count++ // increment count for each obj
}
}).sort(function(a,b){
return a.count - b.count // sort results
})
how many objs per type?
[
...,
{type: 'array', count: 7621},
{type: 'string', count: 69139},
{type: 'node', count: 365285}
]
lots of nodes
• nodes represent ruby code
• stored like any other ruby object
• makes ruby completely dynamic
mongodb: group
• cond: query to filter objects before
grouping
• key: field(s) to group on
• initial: initial values for each group’s
results
• reduce: aggregation function
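The four arguments can be modelled in a few lines of Ruby. This is a sketch of the idea, not MongoDB's implementation; the sample records and the group helper are invented, and cond only supports simple equality here.

```ruby
# Made-up records in the shape of the dump; not real data.
GROUP_SAMPLE = [
  { 'type' => 'string', 'file' => 'app.rb', 'line' => 3, 'length' => 2 },
  { 'type' => 'string', 'file' => 'app.rb', 'line' => 3, 'length' => 5 },
  { 'type' => 'string', 'file' => 'app.rb', 'line' => 7, 'length' => 2 },
  { 'type' => 'array',  'file' => 'app.rb', 'line' => 7 },
  { 'type' => 'string', 'file' => 'lib.rb', 'line' => 1, 'length' => 2 }
].freeze

# cond:    equality filter applied before grouping
# key:     list of fields to group on
# initial: starting values for each group's result
# reduce:  called as reduce.(obj, out) for every object in a group
def group(objects, key:, initial:, reduce:, cond: {})
  groups = {}
  objects.each do |obj|
    next unless cond.all? { |field, value| obj[field] == value }
    group_key = key.map { |field| [field, obj[field]] }.to_h
    out = (groups[group_key] ||= group_key.merge(initial))
    reduce.call(obj, out)
  end
  groups.values
end

COUNT_OBJECTS = ->(_obj, out) { out['count'] += 1 }

per_type = group(GROUP_SAMPLE, key: ['type'], initial: { 'count' => 0 },
                               reduce: COUNT_OBJECTS)
puts per_type.sort_by { |row| row['count'] }.inspect

per_line = group(GROUP_SAMPLE, cond: { 'file' => 'app.rb' },
                               key: %w[file line], initial: { 'count' => 0 },
                               reduce: COUNT_OBJECTS)
puts per_line.inspect
```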
mongodb: group
• by type or class
  • key: {type:1}
  • key: {class_name:1}
• by file & line
  • key: {file:1, line:1}
• by type in a specific file
  • cond: {file: "app.rb"}, key: {file:1, line:1}
• by length of strings in a specific file
  • cond: {file:"app.rb",type:'string'}, key: {length:1}
when were objs created?
• useful to look at objects over time
• each obj has a timestamp of when it was created
• find minimum time, call it start_time
• create buckets for every minute of execution since start
• place objects into buckets
when were objs created?
> db.rails.mapReduce(function(){
var secs = this.time - start_time;
var mins_since_start = Math.floor(secs / 60);
emit(mins_since_start, 1);
}, function(key, vals){
for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
return sum;
}, {
scope: { start_time: db.rails.find().sort({time:1}).limit(1)[0].time } // start_time = min(time)
})
{result:"tmp.mr_1272615772_3"}
mongodb: mapReduce
• arguments
• map: function that emits one or more key/value pairs for each object (available as this)
• reduce: function to return aggregate result, given key and list of values
• scope: global variables to set for funcs
• results
• stored in a temporary collection
(tmp.mr_1272615772_3)
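The same map/reduce shape, sketched in plain Ruby: the map step emits (minute, 1) for each object and the reduce step sums the values per key. It assumes the time field is in seconds; the timestamps and the helper name are invented for illustration.

```ruby
# Made-up creation timestamps, assumed to be in seconds.
TIMED_SAMPLE = [1000, 1010, 1059, 1060, 1125, 1480].map { |t| { 'time' => t } }.freeze

# Returns { minutes_since_start => number_of_objects_created }.
def objects_per_minute(objects)
  start_time = objects.map { |obj| obj['time'] }.min # plays the role of scope

  # map: emit [minutes_since_start, 1] for every object
  emitted = objects.map do |obj|
    [((obj['time'] - start_time) / 60).floor, 1]
  end

  # reduce: sum the emitted values for each key
  emitted
    .group_by { |minute, _| minute }
    .map { |minute, pairs| [minute, pairs.sum { |_, value| value }] }
    .to_h
end

buckets = objects_per_minute(TIMED_SAMPLE)
puts buckets.inspect
busiest_minute, created = buckets.max_by { |_, count| count }
puts "#{created} objects created #{busiest_minute} minutes after start"
```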
when were objs created?
> db.tmp.mr_1272615772_3.count()
12
script was running for 12 minutes
> db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
{_id: 8, value: 41231}
41k objects created 8 minutes after start
references to this object?
ary = ["a","b","c"]
• ary references "a"
• "b" is referenced by ary
• ruby makes it easy to “leak” references
• an object will stay around until all
references to it are gone
• more objects = longer GC = bad
performance
• must find references to fix leaks
references to this object?
• db.rails_refs.insert({
_id:"0xary", refs:["0xa","0xb","0xc"]
})
create references lookup table
• db.rails_refs.ensureIndex({refs:1})
add ‘multikey’ index to refs array
• db.rails_refs.find({refs:"0xa"})
efficiently lookup all objs holding a ref to 0xa
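What the multikey index buys can be sketched as an inverted index in Ruby: for each referenced address, the list of objects holding a reference to it. The "0xhash" entry is an invented second holder, and the helper names are hypothetical.

```ruby
# Lookup table in the shape of rails_refs: holder => addresses it references.
# "0xary" comes from the example above; "0xhash" is made up.
REFS_TABLE = {
  '0xary'  => %w[0xa 0xb 0xc],
  '0xhash' => %w[0xa]
}.freeze

# Invert the table: referenced address => list of holders.
def build_referrer_index(refs_table)
  index = Hash.new { |hash, key| hash[key] = [] }
  refs_table.each do |holder, refs|
    refs.each { |target| index[target] << holder }
  end
  index
end

# Who holds a reference to this address? Empty list if nobody does.
def referrers_of(index, address)
  index.fetch(address, [])
end

index = build_referrer_index(REFS_TABLE)
puts referrers_of(index, '0xa').inspect
puts referrers_of(index, '0xb').inspect
```

Building the index costs one pass over all references; after that each lookup is a single hash access instead of a scan over every object.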
mongodb: multikeys
• indexes on array values create a ‘multikey’
index
• classic example: nested array of tags
• find({tags: "ruby"})
find objs where obj.tags includes "ruby"
plugging a leak in rails3
• in dev mode, rails3 is leaking 10mb per request
let’s use memprof to find it!
# in environment.rb
require `gem which memprof/signal`.strip
plugging a leak in rails3
send the app some requests so it leaks
$ ab -c 1 -n 30 http://localhost:3000/
tell memprof to dump out the entire heap to json
$ memprof --pid <pid> --name <dump name> --key <api key>
find references to object
"leak" is on line 178, holding references to all controllers
• In development mode, Rails reloads all your
application code on every request
• ActionView::Partials::PartialRenderer is caching
partials used by each controller as an optimization
• But it ends up holding a reference to every single reloaded version of those controllers
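The shape of this leak can be reproduced in a few lines. This is not the Rails code: reload_controller and handle_request are hypothetical stand-ins, with Class.new playing the part of dev-mode reloading and a plain Hash playing the part of the partial cache.

```ruby
# Stand-in for dev-mode reloading: every call builds a brand new class
# object, even though the name stays the same.
def reload_controller
  Class.new do
    def self.name
      'PostsController'
    end
  end
end

# Stand-in for the partial cache: keyed by the controller class itself,
# so each reloaded version becomes a new key that is never removed.
def handle_request(cache)
  controller = reload_controller
  cache[controller] ||= "compiled partial for #{controller.name}"
  controller
end

# Simulate a number of requests and return the cache afterwards.
def simulate(requests)
  cache = {}
  requests.times { handle_request(cache) }
  cache
end

cache = simulate(30)
puts "cached controller versions: #{cache.size}"
```

Because the cache is keyed by class object rather than by name, the old versions stay reachable and the GC can never collect them.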