Debugging Ruby
 with MongoDB
    Aman Gupta
     @tmm1
debugging ruby?

• i use ruby
• my ruby processes
  use a lot of ram
• i want to fix this
let’s build a debugger
• step 1: collect data
 • list of all ruby
    objects in memory


• step 2: analyze data
 • group by type
 • group by file/line
version 1: collect data
 • simple patch to ruby VM (300 lines of C)
  • http://gist.github.com/73674
 • simple text based output format
0x154750   @   -e:1   is   OBJECT of type: T
0x15476c   @   -e:1   is   HASH which has data
0x154788   @   -e:1   is   ARRAY of len: 0
0x1547c0   @   -e:1   is   STRING (SHARED) len: 2 and val: hi
0x1547dc   @   -e:1   is   STRING len: 1 and val: T
0x154814   @   -e:1   is   CLASS named: T inherits from Object
0x154a98   @   -e:1   is   STRING len: 2 and val: hi
0x154b40   @   -e:1   is   OBJECT of type: Range
version 1: analyze data
$ wc -l /tmp/ruby.heap

 1571529 /tmp/ruby.heap

$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort |
uniq -c | sort -g | tail -1

 236840 memcached/memcached.rb:316

$ grep "memcached.rb:316" /tmp/ruby.heap |
awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5

     10948   ARRAY
     20355   OBJECT
     30744   DATA
     64952   HASH
    123290   STRING
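
The same grouping can be sketched in Ruby instead of awk. This is only an illustration: `SAMPLE_HEAP` and `count_by_column` are hypothetical names, and the sample lines are a shortened variation of the version 1 output format shown earlier, not real heap data.

```ruby
# Sample lines in the version 1 text format: ADDRESS @ FILE:LINE is TYPE ...
SAMPLE_HEAP = <<~HEAP
  0x154750 @ -e:1 is OBJECT of type: T
  0x15476c @ -e:1 is HASH which has data
  0x154788 @ -e:2 is ARRAY of len: 0
  0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
  0x1547dc @ -e:2 is STRING len: 1 and val: T
HEAP

# Count lines per whitespace-separated column, roughly like
# awk '{ print $N }' | sort | uniq -c | sort -g
def count_by_column(lines, column)
  counts = Hash.new(0)
  lines.each do |line|
    fields = line.split
    next if fields.length < column
    counts[fields[column - 1]] += 1
  end
  counts.sort_by { |_value, count| count }
end

by_location = count_by_column(SAMPLE_HEAP.lines, 3)  # $3 is file:line
by_type     = count_by_column(SAMPLE_HEAP.lines, 5)  # $5 is the object type

p by_location.last  # => ["-e:1", 3]
p by_type.last      # => ["STRING", 2]
```

Like the shell pipeline, this only sees what the text format exposes; it cannot say which object holds a reference to which.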
version 1
• it works!
• but...
 • must patch and rebuild ruby binary
 • no information about references between
    objects
 • limited analysis via shell scripting
version 2 goals
• better data format
 • simple: one line of text per object
 • expressive: include all details about
    object contents and references
 • easy to use: easy to generate from C
    code & easy to consume from various
    scripting languages
JSON!
version 2 is memprof
• no patches to ruby necessary
 • gem install memprof
 • require 'memprof'
 • Memprof.dump_all("/tmp/app.json")
• C extension for MRI ruby VM
  http://github.com/ice799/memprof
 • uses libyajl to dump out all ruby objects
    as json
strings
Memprof.dump{
  "hello" + "world"
}

{
    "_id": "0x19c610",        memory address of object
    "file": "-e",             file and line where string
    "line": 1,                was created
    "type": "string",
    "class": "0x1ba7f0",      address of the class object
    "class_name": "String",   “String”

    "length": 10,             length and contents
    "data": "helloworld"      of this string instance
}
arrays
Memprof.dump{
  [
    1,
    :b,
    2.2,
    "d"
  ]
}

{
    "_id": "0x19c5c0",
    "class": "0x1b0d18",
    "class_name": "Array",

    "length": 4,
    "data": [
      1,                     integers and symbols are
      ":b",                  stored in the array itself
      "0x19c750",            floats and strings are
      "0x19c598"             separate ruby objects
    ]
}
hashes
Memprof.dump{
  {
    :a => 1,
    "b" => 2.2
  }
}

{
    "_id": "0x19c598",
    "type": "hash",
    "class": "0x1af170",
    "class_name": "Hash",

    "default": null,           no default proc
    "length": 2,
    "data": [
      [ ":a", 1 ],             hash entries as key/value
      [ "0xc728", "0xc750" ]   pairs
    ]
}
classes
Memprof.dump{
  class Hello
    @@var=1
    Const=2
    def world() end
  end
}

{
    "_id": "0x19c408",
    "type": "class",
    "name": "Hello",
    "super": "0x1bfa48",       superclass object reference
    "super_name": "Object",

    "ivars": {                 class variables and constants
       "@@var":   1,           are stored in the instance
       "Const":   2            variable table
    },
    "methods":    {
       "world":   "0x19c318"   references to method objects
    }
}
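
A rough sketch of consuming this format from a script, assuming the dump is one JSON document per line (as the version 2 goals suggest). The sample records are copied from the slides; `load_records`, `outbound_refs` and the rule "any string that looks like 0x... is a reference" are illustrative assumptions, not part of memprof.

```ruby
require "json"

# Sample records copied from the slides, one JSON document per line.
SAMPLE_DUMP = <<~JSON
  {"_id":"0x19c610","file":"-e","line":1,"type":"string","class":"0x1ba7f0","class_name":"String","length":10,"data":"helloworld"}
  {"_id":"0x19c598","type":"hash","class":"0x1af170","class_name":"Hash","default":null,"length":2,"data":[[":a",1],["0xc728","0xc750"]]}
  {"_id":"0x19c408","type":"class","name":"Hello","super":"0x1bfa48","super_name":"Object","ivars":{"@@var":1,"Const":2},"methods":{"world":"0x19c318"}}
JSON

# Assumption: references are serialized as lowercase hex address strings.
ADDRESS = /\A0x[0-9a-f]+\z/

# Parse line-delimited JSON into an array of hashes, skipping blank lines.
def load_records(text)
  text.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
end

# Collect every address-looking string reachable from a record's fields,
# except its own _id. The contents of a string record are skipped so that
# text which happens to look like an address is not counted.
def outbound_refs(record)
  found = []
  walk = lambda do |value|
    case value
    when String then found << value if value.match?(ADDRESS)
    when Array  then value.each(&walk)
    when Hash   then value.each_value(&walk)
    end
  end
  record.each do |field, value|
    next if field == "_id"
    next if field == "data" && record["type"] == "string"
    walk.call(value)
  end
  found.uniq
end

load_records(SAMPLE_DUMP).each do |record|
  puts "#{record["_id"]} -> #{outbound_refs(record).join(", ")}"
end
```

Immediates such as `1` and `":b"` never match the address pattern, which mirrors the slides: integers and symbols live inside the container, everything else is a separate object.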
version 2: analyze data
version 2: memprof.com
   a web-based heap visualizer and leak analyzer
built on...


    $ mongoimport -d memprof -c rails --file /tmp/app.json
    $ mongo memprof

                 let’s run some queries.
how many objects?
> db.rails.count()
809816

 • ruby scripts create a lot of objects
 • usually not a problem, but...
  • MRI has a naïve stop-the-world mark/
     sweep GC
  • fewer objects = faster GC = better
     performance
what types of objects?
> db.rails.distinct('type')

['array',
 'bignum',
 'class',
 'float',
 'hash',
 'module',
 'node',
 'object',
 'regexp',
 'string',
 ...]
mongodb: distinct
•   distinct('type')
    list of types of objects
•   distinct('file')
    list of source files
•   distinct('class_name')
    list of instance class names
• optionally filter first
    •   distinct('name', {type:"class"})
        names of all defined classes
improve performance
             with indexes

> db.rails.ensureIndex({'type':1})

> db.rails.ensureIndex(
    {'file':1},
    {background:true}
)
mongodb: ensureIndex
• add an index on a field (if it doesn’t exist yet)
• improve performance of queries against
  common fields: type, class_name, super, file
• can index embedded field names
  •   ensureIndex('methods.add')

  •   find({'methods.add':{$exists:true}})
      find classes that define the method add
how many objs per type?
> db.rails.group({
   initial: {count:0},
   key: {type:true},              group on type
   cond: {},
   reduce: function(obj, out) {   increment count
     out.count++                  for each obj
   }
}).sort(function(a,b){
   return a.count - b.count       sort results
})
how many objs per type?
 [
     ...,
     {type: 'array', count: 7621},
     {type: 'string', count: 69139},
     {type: 'node', count: 365285}
 ]
                                        lots of nodes

 • nodes represent ruby code
  • stored like any other ruby object
  • makes ruby completely dynamic
mongodb: group
• cond: query to filter objects before
  grouping
• key: field(s) to group on
• initial: initial values for each group’s
  results
• reduce: aggregation function
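
To make the four arguments concrete, here is a minimal in-memory sketch of the same idea in Ruby. It is not how MongoDB implements group; the method name `group`, the `GROUP_SAMPLE` records and the equality-only `cond` matching are assumptions made for illustration.

```ruby
# Hypothetical sample records; only the fields used below are present.
GROUP_SAMPLE = [
  { "type" => "string", "file" => "app.rb" },
  { "type" => "string", "file" => "lib.rb" },
  { "type" => "node",   "file" => "app.rb" },
  { "type" => "string", "file" => "app.rb" },
]

# Filter with cond, bucket by the key fields, start each bucket from
# initial, and fold every matching record into its bucket with reduce.
def group(records, key:, initial:, cond: {}, &reduce)
  buckets = {}
  records.each do |record|
    next unless cond.all? { |field, wanted| record[field] == wanted }
    group_key = key.map { |field| [field, record[field]] }.to_h
    out = (buckets[group_key] ||= group_key.merge(initial))
    reduce.call(record, out)
  end
  buckets.values
end

counts = group(GROUP_SAMPLE, key: ["type"], initial: { "count" => 0 }) do |_obj, out|
  out["count"] += 1
end

p counts.sort_by { |row| row["count"] }
```

Passing `cond: { "file" => "app.rb" }` restricts the count to one file before any grouping happens, which is the role cond plays in the shell query.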
mongodb: group
• by type or class
 •   key: {type:1}
 •   key: {class_name:1}

• by file & line
 •   key: {file:1, line:1}

• by type in a specific file
 •   cond: {file: "app.rb"},
     key: {file:1, line:1}

• by length of strings in a specific file
 •   cond: {file:"app.rb",type:'string'},
     key: {length:1}
what subclasses String?
> db.rails.find(
  {super_name:"String"},
  {name:1}             select only name field
)

{name:   "ActiveSupport::SafeBuffer"}
{name:   "ActiveSupport::StringInquirer"}
{name:   "SQLite3::Blob"}
{name:   "ActiveModel::Name"}
{name:   "Arel::Attribute::Expressions"}
{name:   "ActiveSupport::JSON::Variable"}
mongodb: find

•   find({type:'string'})
    all strings
•   find({type:{$ne:'string'}})
    everything except strings
•   find({type:'string'}, {data:1})
    only select string’s data field
the largest objects?
> db.rails.find(
  {type:
     {$in:['string','array','hash']}
  },
  {type:1,length:1}
).sort({length:-1}).limit(3)

{type: "string", length: 2308}
{type: "string", length: 1454}
{type: "string", length: 1238}
mongodb: sort, limit/skip
•   sort({length:-1,file:1})
    sort by length desc, file asc
•   limit(10)
    first 10 results
•   skip(10).limit(10)
    second 10 results
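
For comparison, the same filter, sort and paging steps can be sketched over an in-memory array in Ruby. `SAMPLE_OBJECTS` and `largest` are hypothetical; the three string lengths are the ones from the query output above, and the other records are made up.

```ruby
SAMPLE_OBJECTS = [
  { "type" => "string", "length" => 2308 },
  { "type" => "node" },
  { "type" => "array",  "length" => 12 },
  { "type" => "string", "length" => 1454 },
  { "type" => "hash",   "length" => 3 },
  { "type" => "string", "length" => 1238 },
]

# Rough equivalent of
# find({type: {$in: types}}).sort({length: -1}).skip(n).limit(m)
def largest(records, types:, skip: 0, limit: 10)
  records
    .select { |r| types.include?(r["type"]) }
    .sort_by { |r| -r.fetch("length", 0) }
    .drop(skip)
    .first(limit)
end

p largest(SAMPLE_OBJECTS, types: %w[string array hash], limit: 3)
p largest(SAMPLE_OBJECTS, types: %w[string array hash], skip: 3, limit: 3)
```

The second call is the "next page" of results: skip the first three, then take up to three more.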
when were objs created?
• useful to look at objects over time
 • each obj has a timestamp of when it was
    created
 • find minimum time, call it
    start_time
 • create buckets for every
    minute of execution since
    start
 • place objects into buckets
when were objs created?
> db.rails.mapReduce(function(){
    var secs = this.time - start_time;
    var mins_since_start = Math.floor(secs / 60);
    emit(mins_since_start, 1);
  }, function(key, vals){
    for(var i=0,sum=0; i<vals.length;
        sum += vals[i++]);
    return sum;
  }, {
    scope: { start_time: db.rails.find()
      .sort({time:1}).limit(1)[0].time }
  }               start_time = min(time)
)
{result:"tmp.mr_1272615772_3"}
mongodb: mapReduce
• arguments
 • map: function that emits one or more
    key/value pairs for each object (bound to this)
 • reduce: function to return aggregate
    result, given key and list of values
 • scope: global variables to set for funcs
• results
 • stored in a temporary collection
    (tmp.mr_1272615772_3)
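
The map, reduce and scope roles can be sketched in plain Ruby. This is a simplified single-pass model, not MongoDB's implementation; `map_reduce` and `SAMPLE_TIMES` are hypothetical, and the times are assumed to be in seconds.

```ruby
# Sample creation times in seconds (illustrative values only).
SAMPLE_TIMES = [1000, 1005, 1059, 1060, 1125, 1130, 1500]

# map turns each record into [key, value] pairs, the values are collected
# per key, then reduce folds each key's values into one result.
def map_reduce(records, map:, reduce:)
  emitted = Hash.new { |hash, key| hash[key] = [] }
  records.each do |record|
    map.call(record).each { |key, value| emitted[key] << value }
  end
  emitted.map { |key, values| [key, reduce.call(key, values)] }.to_h
end

start_time = SAMPLE_TIMES.min  # plays the role of the scope variable

per_minute = map_reduce(
  SAMPLE_TIMES,
  # integer division turns elapsed seconds into whole minutes since start
  map:    ->(time) { [[(time - start_time) / 60, 1]] },
  reduce: ->(_minute, ones) { ones.sum }
)

p per_minute  # => {0=>3, 1=>1, 2=>2, 8=>1}
```

Each key in the result is one minute bucket, so the number of keys is the number of minutes in which at least one object was created.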
when were objs created?
> db.tmp.mr_1272615772_3.count()
12
                  script was running for 12 minutes


> db.tmp.mr_1272615772_3.find()
    .sort({value:-1}).limit(1)
{_id: 8, value: 41231}
           41k objects created 8 minutes after start
references to this object?
ary = ["a","b","c"]
                                     ary references “a”
                                 “b” referenced by ary

 • ruby makes it easy to “leak” references
  • an object will stay around until all
     references to it are gone
   • more objects = longer GC = bad
     performance
 • must find references to fix leaks
references to this object?
•   db.rails_refs.insert({
       _id:"0xary", refs:["0xa","0xb","0xc"]
    })
    create references lookup table
•   db.rails_refs.ensureIndex({refs:1})
    add ‘multikey’ index to refs array
•   db.rails_refs.find({refs:"0xa"})
    efficiently lookup all objs holding a ref to 0xa
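
The lookup that the multikey index makes fast is a reverse map from each referenced object to its holders. A small Ruby sketch of that idea, with `SAMPLE_REFS` and `build_referrers` as hypothetical names and placeholder ids in the style of the slide:

```ruby
# Forward table in the shape of the rails_refs documents:
# holder id => ids it references.
SAMPLE_REFS = {
  "0xary"  => ["0xa", "0xb", "0xc"],
  "0xhash" => ["0xa"],
  "0xa"    => [],
}

# Build target => [holders], i.e. "who is keeping this object alive?"
def build_referrers(refs_by_holder)
  referrers = Hash.new { |hash, key| hash[key] = [] }
  refs_by_holder.each do |holder, refs|
    refs.uniq.each { |target| referrers[target] << holder }
  end
  referrers
end

referrers = build_referrers(SAMPLE_REFS)
p referrers["0xa"]  # => ["0xary", "0xhash"]
```

Walking this map repeatedly from a leaked object toward its holders is one way to trace a chain of references back to whatever is pinning it.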
mongodb: multikeys

• indexes on array values create a ‘multikey’
  index
• classic example: nested array of tags
  •   find({tags: "ruby"})
      find objs where obj.tags includes “ruby”
version 2: memprof.com
   a web-based heap visualizer and leak analyzer
plugging a leak in rails3
• in dev mode, rails3 is leaking 10mb per request

let’s use memprof to find it!

  # in environment.rb
  require `gem which memprof/signal`.strip
plugging a leak in rails3
 send the app some requests so it leaks
 $ ab -c 1 -n 30 http://localhost:3000/

 tell memprof to dump out the entire heap to json
 $ memprof --pid <pid> --name <dump name> --key <api key>
2519 classes
30 copies of TestController

mongo query for all TestController classes

details for one copy of TestController
find references to object

“leak” is on line 178

holding references to all controllers
• In development mode, Rails reloads all your
  application code on every request
• ActionView::Partials::PartialRenderer is caching
  partials used by each controller as an optimization
• But... it ends up holding a reference to every single
  reloaded version of those controllers
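
The shape of this leak can be reproduced in a few lines of plain Ruby. This is a simplified model, not the actual Rails or ActionView code: `PARTIAL_CACHE`, `reload_controller` and `handle_request` are hypothetical stand-ins for a cache keyed by controller class and for development-mode reloading.

```ruby
# Hypothetical stand-in for a renderer-level cache keyed by controller class.
PARTIAL_CACHE = {}

# Simulate a development-mode reload: each call builds a brand new class
# object that reports the same name as the previous one.
def reload_controller
  Class.new do
    def self.name
      "TestController"
    end
  end
end

def handle_request
  controller = reload_controller
  # The cache key is the class object itself, so the cache pins it in memory.
  PARTIAL_CACHE[controller] ||= "compiled partial"
end

30.times { handle_request }
GC.start

puts PARTIAL_CACHE.keys.map(&:name).uniq.inspect  # one name...
puts PARTIAL_CACHE.size                           # ...but 30 distinct class objects
```

Because every reloaded class is a different object, the cache never gets a hit and never lets go, which matches the 30 copies of TestController seen after 30 requests.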
Questions?
 Aman Gupta
  @tmm1

Debugging Ruby (Aman Gupta)

  • 1.
    Debugging Ruby withMongoDB Aman Gupta @tmm1
  • 2.
  • 3.
    debugging ruby? • iuse ruby • my ruby processes use a lot of ram
  • 4.
    debugging ruby? • iuse ruby • my ruby processes use a lot of ram • i want to fix this
  • 5.
    let’s build adebugger • step 1: collect data • list of all ruby objects in memory
  • 6.
    let’s build adebugger • step 1: collect data • list of all ruby objects in memory • step 2: analyze data • group by type • group by file/line
  • 7.
    version 1: collectdata • simple patch to ruby VM (300 lines of C) • http://gist.github.com/73674 • simple text based output format 0x154750 @ -e:1 is OBJECT of type: T 0x15476c @ -e:1 is HASH which has data 0x154788 @ -e:1 is ARRAY of len: 0 0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi 0x1547dc @ -e:1 is STRING len: 1 and val: T 0x154814 @ -e:1 is CLASS named: T inherits from Object 0x154a98 @ -e:1 is STRING len: 2 and val: hi 0x154b40 @ -e:1 is OBJECT of type: Range
  • 8.
    version 1: analyzedata $ wc -l /tmp/ruby.heap  1571529 /tmp/ruby.heap
  • 9.
    version 1: analyzedata $ wc -l /tmp/ruby.heap  1571529 /tmp/ruby.heap $ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1  236840 memcached/memcached.rb:316
  • 10.
    version 1: analyzedata $ wc -l /tmp/ruby.heap  1571529 /tmp/ruby.heap $ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1  236840 memcached/memcached.rb:316 $ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5    10948 ARRAY    20355 OBJECT    30744 DATA   64952 HASH   123290 STRING
  • 11.
  • 12.
  • 13.
    version 1 • itworks! • but...
  • 14.
    version 1 • itworks! • but... • must patch and rebuild ruby binary
  • 15.
    version 1 • itworks! • but... • must patch and rebuild ruby binary • no information about references between objects
  • 16.
    version 1 • itworks! • but... • must patch and rebuild ruby binary • no information about references between objects • limited analysis via shell scripting
  • 17.
  • 18.
    version 2 goals •better data format
  • 19.
    version 2 goals •better data format • simple: one line of text per object
  • 20.
    version 2 goals •better data format • simple: one line of text per object • expressive: include all details about object contents and references
  • 21.
    version 2 goals •better data format • simple: one line of text per object • expressive: include all details about object contents and references • easy to use: easy to generate from C code & easy to consume from various scripting languages
  • 22.
  • 23.
    version 2 ismemprof
  • 24.
    version 2 ismemprof • no patches to ruby necessary • gem install memprof • require ‘memprof’ • Memprof.dump_all(“/tmp/app.json”)
  • 25.
    version 2 ismemprof • no patches to ruby necessary • gem install memprof • require ‘memprof’ • Memprof.dump_all(“/tmp/app.json”) • C extension for MRI ruby VM http://github.com/ice799/memprof • uses libyajl to dump out all ruby objects as json
  • 26.
    Memprof.dump{ strings } “hello” + “world”
  • 27.
    Memprof.dump{ strings } “hello” + “world” { "_id": "0x19c610", memory address of object "file": "-e", "line": 1, "type": "string", "class": "0x1ba7f0", "class_name": "String", "length": 10, "data": "helloworld" }
  • 28.
    Memprof.dump{ strings } “hello” + “world” { "_id": "0x19c610", memory address of object "file": "-e", file and line where string "line": 1, was created "type": "string", "class": "0x1ba7f0", "class_name": "String", "length": 10, "data": "helloworld" }
  • 29.
    Memprof.dump{ strings } “hello” + “world” { "_id": "0x19c610", memory address of object "file": "-e", file and line where string "line": 1, was created "type": "string", "class": "0x1ba7f0", address of the class object "class_name": "String", “String” "length": 10, "data": "helloworld" }
  • 30.
    Memprof.dump{ strings } “hello” + “world” { "_id": "0x19c610", memory address of object "file": "-e", file and line where string "line": 1, was created "type": "string", "class": "0x1ba7f0", address of the class object "class_name": "String", “String” "length": 10, length and contents "data": "helloworld" of this string instance }
  • 31.
    arrays Memprof.dump{ [ 1, :b, 2.2, “d” ] }
  • 32.
    arrays Memprof.dump{ [ 1, :b, { "_id": "0x19c5c0", 2.2, “d” "class": "0x1b0d18", ] "class_name": "Array", } "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }
  • 33.
    arrays Memprof.dump{ [ 1, :b, { "_id": "0x19c5c0", 2.2, “d” "class": "0x1b0d18", ] "class_name": "Array", } "length": 4, "data": [ 1, integers and symbols are ":b", stored in the array itself "0x19c750", "0x19c598" ] }
  • 34.
    arrays Memprof.dump{ [ 1, :b, { "_id": "0x19c5c0", 2.2, “d” "class": "0x1b0d18", ] "class_name": "Array", } "length": 4, "data": [ 1, integers and symbols are ":b", stored in the array itself "0x19c750", floats and strings are "0x19c598" separate ruby objects ] }
  • 35.
    hashes Memprof.dump{ { :a => 1, “b” => 2.2 } }
  • 36.
    hashes Memprof.dump{ { :a => 1, “b” => 2.2 { } "_id": "0x19c598", } "type": "hash", "class": "0x1af170", "class_name": "Hash", "default": null, "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] }
  • 37.
    hashes Memprof.dump{ { :a => 1, “b” => 2.2 { } "_id": "0x19c598", } "type": "hash", "class": "0x1af170", "class_name": "Hash", "default": null, "length": 2, "data": [ [ ":a", 1 ], hash entries as key/value [ "0xc728", "0xc750" ] pairs ] }
  • 38.
    hashes Memprof.dump{ { :a => 1, “b” => 2.2 { } "_id": "0x19c598", } "type": "hash", "class": "0x1af170", "class_name": "Hash", "default": null, no default proc "length": 2, "data": [ [ ":a", 1 ], hash entries as key/value [ "0xc728", "0xc750" ] pairs ] }
  • 39.
    classes Memprof.dump{ class Hello @@var=1 Const=2 def world() end end }
  • 40.
    classes Memprof.dump{ class Hello @@var=1 Const=2 { def world() end "_id": "0x19c408", end "type": "class", } "name": "Hello", "super": "0x1bfa48", "super_name": "Object", "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } }
  • 41.
    classes Memprof.dump{ class Hello @@var=1 Const=2 { def world() end "_id": "0x19c408", end "type": "class", } "name": "Hello", "super": "0x1bfa48", superclass object reference "super_name": "Object", "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } }
  • 42.
    classes Memprof.dump{ class Hello @@var=1 Const=2 { def world() end "_id": "0x19c408", end "type": "class", } "name": "Hello", "super": "0x1bfa48", superclass object reference "super_name": "Object", "ivars": { class variables and constants "@@var": 1, are stored in the instance "Const": 2 }, variable table "methods": { "world": "0x19c318" } }
  • 43.
    classes Memprof.dump{ class Hello @@var=1 Const=2 { def world() end "_id": "0x19c408", end "type": "class", } "name": "Hello", "super": "0x1bfa48", superclass object reference "super_name": "Object", "ivars": { class variables and constants "@@var": 1, are stored in the instance "Const": 2 }, variable table "methods": { "world": "0x19c318" references to method objects } }
  • 44.
  • 45.
    version 2: memprof.com a web-based heap visualizer and leak analyzer
  • 46.
  • 47.
    built on... $ mongoimport -d memprof -c rails --file /tmp/app.json $ mongo memprof
  • 48.
    built on... $ mongoimport -d memprof -c rails --file /tmp/app.json $ mongo memprof let’s run some queries.
  • 49.
    how many objects? >db.rails.count() 809816 • ruby scripts create a lot of objects • usually not a problem, but... • MRI has a naïve stop-the-world mark/ sweep GC • fewer objects = faster GC = better performance
  • 50.
    what types ofobjects? > db.rails.distinct(‘type’) [‘array’, ‘bignum’, ‘class’, ‘float’, ‘hash’, ‘module’, ‘node’, ‘object’, ‘regexp’, ‘string’, ...]
  • 51.
  • 52.
    mongodb: distinct
    • distinct('type')
      list of types of objects
    • distinct('file')
      list of source files
    • distinct('class_name')
      list of instance class names
    • optionally filter first
      • distinct('name', {type:"class"})
        names of all defined classes
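The behaviour of distinct with an optional filter can be sketched in plain JavaScript, without a database. This is only an illustration: the `heap` array and the `distinct` helper below are hypothetical stand-ins, not memprof or MongoDB code.

```javascript
// Hypothetical in-memory stand-in for a few documents from the heap dump.
const heap = [
  { _id: "0x1", type: "string", file: "app.rb" },
  { _id: "0x2", type: "class", name: "Hello" },
  { _id: "0x3", type: "string", file: "lib.rb" },
  { _id: "0x4", type: "class", name: "World" },
];

// Collect the unique values of `field`, looking only at documents
// whose fields equal every key/value pair in `query`.
function distinct(docs, field, query = {}) {
  const matches = (doc) =>
    Object.entries(query).every(([k, v]) => doc[k] === v);
  const seen = new Set();
  for (const doc of docs) {
    if (matches(doc) && doc[field] !== undefined) seen.add(doc[field]);
  }
  return [...seen];
}

console.log(distinct(heap, "type"));                    // [ 'string', 'class' ]
console.log(distinct(heap, "name", { type: "class" })); // [ 'Hello', 'World' ]
```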
  • 56.
    improve performance with indexes
    > db.rails.ensureIndex({'type':1})
    > db.rails.ensureIndex({'file':1}, {background:true})
  • 58.
    mongodb: ensureIndex
    • add an index on a field (if it doesn’t exist yet)
    • improve performance of queries against common fields: type, class_name, super, file
    • can index embedded field names
      • ensureIndex('methods.add')
      • find({'methods.add':{$exists:true}})
        find classes that define the method add
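What a dotted field name such as 'methods.add' selects can be shown with a small plain-JavaScript sketch. The `classes` array, its addresses other than "0x19c318", and the `getPath` helper are made up for illustration, and the check is a simplification of $exists.

```javascript
// Hypothetical class documents; only the "world" address comes from the dump above.
const classes = [
  { name: "Hello", methods: { world: "0x19c318" } },
  { name: "Calc", methods: { add: "0x1a0000", sub: "0x1a0010" } },
  { name: "Empty" },
];

// Walk a dotted path such as "methods.add"; yields undefined if any step is missing.
const getPath = (doc, path) =>
  path.split(".").reduce((cur, k) => (cur == null ? undefined : cur[k]), doc);

// Rough counterpart of find({'methods.add': {$exists: true}}).
const definesAdd = classes.filter((c) => getPath(c, "methods.add") !== undefined);

console.log(definesAdd.map((c) => c.name)); // [ 'Calc' ]
```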
  • 60.
    how many objs per type?
    > db.rails.group({
        initial: {count:0},
        key: {type:true},              // group on type
        cond: {},
        reduce: function(obj, out) {
          out.count++                  // increment count for each obj
        }
      }).sort(function(a,b){
        return a.count - b.count       // sort results
      })
  • 63.
    how many objs per type?
    [ ...,
      {type: 'array', count: 7621},
      {type: 'string', count: 69139},
      {type: 'node', count: 365285} ]
    lots of nodes
    • nodes represent ruby code
    • stored like any other ruby object
    • makes ruby completely dynamic
  • 66.
  • 67.
    mongodb: group
    • cond: query to filter objects before grouping
    • key: field(s) to group on
    • initial: initial values for each group’s results
    • reduce: aggregation function
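How the four group arguments fit together can be sketched with an in-memory version in plain JavaScript. `groupDocs` and the `sample` data are hypothetical, and `cond` is reduced to simple equality matching; this is not how MongoDB implements group.

```javascript
// In-memory sketch of group(); `groupDocs` is a hypothetical helper, not a MongoDB API.
function groupDocs(docs, { key, cond = {}, initial, reduce }) {
  const fields = Object.keys(key);
  const groups = new Map();
  for (const doc of docs) {
    // cond: skip documents that do not match the filter
    if (!Object.entries(cond).every(([k, v]) => doc[k] === v)) continue;
    // key: documents with equal values in these fields share a group
    const id = JSON.stringify(fields.map((f) => doc[f]));
    if (!groups.has(id)) {
      const out = { ...initial };            // initial: starting result of a new group
      for (const f of fields) out[f] = doc[f];
      groups.set(id, out);
    }
    reduce(doc, groups.get(id));             // reduce: fold the document into the result
  }
  return [...groups.values()];
}

// Made-up sample documents.
const sample = [
  { type: "string", file: "app.rb" },
  { type: "node", file: "app.rb" },
  { type: "string", file: "lib.rb" },
  { type: "node", file: "app.rb" },
  { type: "node", file: "lib.rb" },
];

const perType = groupDocs(sample, {
  key: { type: true },
  initial: { count: 0 },
  reduce: (obj, out) => { out.count++; },
}).sort((a, b) => a.count - b.count);

console.log(perType); // string: 2, then node: 3
```

Passing `cond: { file: "app.rb" }` to the same call would count only the objects allocated in that file.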
  • 71.
  • 72.
    mongodb: group
    • by type or class
      • key: {type:1}
      • key: {class_name:1}
    • by file & line
      • key: {file:1, line:1}
    • by type in a specific file
      • cond: {file:"app.rb"}, key: {file:1, line:1}
    • by length of strings in a specific file
      • cond: {file:"app.rb", type:'string'}, key: {length:1}
  • 76.
    what subclasses String?
    > db.rails.find(
        {super_name:"String"},
        {name:1}                       // select only name field
      )
    {name: "ActiveSupport::SafeBuffer"}
    {name: "ActiveSupport::StringInquirer"}
    {name: "SQLite3::Blob"}
    {name: "ActiveModel::Name"}
    {name: "Arel::Attribute::Expressions"}
    {name: "ActiveSupport::JSON::Variable"}
  • 78.
  • 79.
    mongodb: find
    • find({type:'string'})
      all strings
    • find({type:{$ne:'string'}})
      everything except strings
    • find({type:'string'}, {data:1})
      only select string’s data field
  • 82.
    the largest objects?
    > db.rails.find(
        {type: {$in:['string','array','hash']}},
        {type:1,length:1}
      ).sort({length:-1}).limit(3)
    {type: "string", length: 2308}
    {type: "string", length: 1454}
    {type: "string", length: 1238}
  • 83.
  • 84.
    mongodb: sort, limit/skip
    • sort({length:-1,file:1})
      sort by length desc, file asc
    • limit(10)
      first 10 results
    • skip(10).limit(10)
      second 10 results
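The ordering and paging rules above can be mirrored on a plain array. This is a rough sketch with made-up data; `page` is a hypothetical helper, not part of the mongo shell.

```javascript
// Made-up string objects with a length and a source file.
const strings = [
  { length: 5, file: "b.rb" },
  { length: 9, file: "a.rb" },
  { length: 5, file: "a.rb" },
  { length: 2, file: "c.rb" },
];

// Counterpart of sort({length:-1, file:1}): length descending, then file ascending.
const sorted = [...strings].sort(
  (x, y) => y.length - x.length || x.file.localeCompare(y.file)
);

// Counterpart of skip(n).limit(m): a window of m results starting at offset n.
const page = (docs, skip, limit) => docs.slice(skip, skip + limit);

console.log(page(sorted, 0, 2)); // lengths 9 (a.rb) and 5 (a.rb)
console.log(page(sorted, 2, 2)); // lengths 5 (b.rb) and 2 (c.rb)
```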
  • 87.
    when were objs created?
    • useful to look at objects over time
    • each obj has a timestamp of when it was created
    • find minimum time, call it start_time
    • create buckets for every minute of execution since start
    • place objects into buckets
  • 91.
    when were objs created?
    > db.rails.mapReduce(function(){
        var secs = this.time - start_time;
        var mins_since_start = Math.floor(secs / 60);
        emit(mins_since_start, 1);
      }, function(key, vals){
        for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
        return sum;
      }, {
        scope: { start_time: db.rails.find().sort({time:1}).limit(1)[0].time }   // start_time = min(time)
      })
    {result:"tmp.mr_1272615772_3"}
  • 92.
    mongodb: mapReduce
    • arguments
      • map: function that emits one or more key/value pairs for each object (available as this)
      • reduce: function to return aggregate result, given key and list of values
      • scope: global variables to set for funcs
    • results
      • stored in a temporary collection (tmp.mr_1272615772_3)
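The map/reduce flow for per-minute buckets can be sketched in memory with plain JavaScript. Everything here is an assumption for illustration: the `mapReduce` helper is hypothetical, timestamps are taken to be in seconds, and `emit` is passed in as an argument, whereas MongoDB provides it as a global.

```javascript
// `mapReduce` is a hypothetical in-memory helper, not the MongoDB command.
function mapReduce(docs, map, reduce) {
  const emitted = new Map();                 // key -> list of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the document, as in MongoDB
  return [...emitted].map(([key, vals]) => ({ _id: key, value: reduce(key, vals) }));
}

// Assumed creation timestamps, in seconds.
const objs = [{ time: 1000 }, { time: 1030 }, { time: 1065 }, { time: 1185 }];
const startTime = Math.min(...objs.map((o) => o.time)); // start_time = min(time)

const perMinute = mapReduce(
  objs,
  function (emit) {
    // whole minutes elapsed since the first object was created
    emit(Math.floor((this.time - startTime) / 60), 1);
  },
  (key, vals) => vals.reduce((sum, v) => sum + v, 0)
);

console.log(perMinute);
// [ { _id: 0, value: 2 }, { _id: 1, value: 1 }, { _id: 3, value: 1 } ]
```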
  • 94.
    when were objs created?
    > db.tmp.mr_1272615772_3.count()
    12
    script was running for 12 minutes
    > db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
    {_id: 8, value: 41231}
    41k objects created 8 minutes after start
  • 96.
    references to this object?
    ary = ["a","b","c"]
    ary references "a"
    "b" referenced by ary
    • ruby makes it easy to “leak” references
    • an object will stay around until all references to it are gone
    • more objects = longer GC = bad performance
    • must find references to fix leaks
  • 97.
    references to this object?
    • db.rails_refs.insert({ _id:"0xary", refs:["0xa","0xb","0xc"] })
      create references lookup table
    • db.rails_refs.ensureIndex({refs:1})
      add ‘multikey’ index to refs array
    • db.rails_refs.find({refs:"0xa"})
      efficiently look up all objs holding a ref to 0xa
  • 100.
    mongodb: multikeys
    • indexes on array values create a ‘multikey’ index
    • classic example: nested array of tags
      • find({tags: "ruby"})
        find objs where obj.tags includes "ruby"
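The idea behind a multikey index, one index entry per array element, can be sketched in plain JavaScript as a reverse lookup table. The `refsTable` data and `buildMultikeyIndex` helper are hypothetical and say nothing about MongoDB's actual index structure.

```javascript
// Hypothetical reference table in the same shape as rails_refs.
const refsTable = [
  { _id: "0xary", refs: ["0xa", "0xb", "0xc"] },
  { _id: "0xhash", refs: ["0xa"] },
  { _id: "0xobj", refs: ["0xd"] },
];

// One index entry per array element: referenced address -> list of holders.
function buildMultikeyIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const ref of doc.refs) {
      if (!index.has(ref)) index.set(ref, []);
      index.get(ref).push(doc._id);
    }
  }
  return index;
}

const refIndex = buildMultikeyIndex(refsTable);

// Counterpart of find({refs: "0xa"}): which objects hold a reference to 0xa?
console.log(refIndex.get("0xa")); // [ '0xary', '0xhash' ]
```

With the index in place, finding the holders of an address is a single lookup instead of a scan over every refs array.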
  • 101.
    version 2: memprof.com a web-based heap visualizer and leak analyzer
  • 111.
    plugging a leak in rails3
    • in dev mode, rails3 is leaking 10mb per request
    let’s use memprof to find it!
    # in environment.rb
    require `gem which memprof/signal`.strip
  • 113.
    plugging a leak in rails3
    send the app some requests so it leaks
    $ ab -c 1 -n 30 http://localhost:3000/
    tell memprof to dump out the entire heap to json
    $ memprof --pid <pid> --name <dump name> --key <api key>
  • 117.
  • 118.
    2519 classes
    30 copies of TestController
    mongo query for all TestController classes
    details for one copy of TestController
  • 123.
  • 124.
  • 125.
    find references to object
    holding references to all controllers
    “leak” is on line 178
  • 127.
    • In development mode, Rails reloads all your application code on every request
    • ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization
    • But... it ends up holding a reference to every single reloaded version of those controllers
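The shape of this leak, a cache keyed by class objects that are rebuilt on every request, can be shown with an analogy in JavaScript. This is not Rails code: `partialCache`, `reloadController`, and `handleRequest` are hypothetical names chosen for illustration.

```javascript
// Analogy only: a long-lived cache keyed by class objects.
const partialCache = new Map();

// Stand-in for a development-mode reload: each call builds a brand new class object.
function reloadController() {
  return class TestController {};
}

function handleRequest() {
  const controller = reloadController();
  // Caching per controller class keeps every reloaded class reachable,
  // so the garbage collector can never free the old versions.
  if (!partialCache.has(controller)) partialCache.set(controller, {});
  return controller;
}

// 30 requests, as in the ab run above.
for (let i = 0; i < 30; i++) handleRequest();

console.log(partialCache.size); // 30 retained "copies" of TestController
```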
  • 131.