Debugging Ruby (Aman Gupta)
Usage Rights

© All Rights Reserved

Presentation Transcript

    • Debugging Ruby with MongoDB
      Aman Gupta (@tmm1)
    • debugging ruby?
      • i use ruby
      • my ruby processes use a lot of ram
      • i want to fix this
    • let's build a debugger
      • step 1: collect data
        • list of all ruby objects in memory
      • step 2: analyze data
        • group by type
        • group by file/line
    • version 1: collect data
      • simple patch to ruby VM (300 lines of C)
        • http://gist.github.com/73674
      • simple text-based output format:
          0x154750 @ -e:1 is OBJECT of type: T
          0x15476c @ -e:1 is HASH which has data
          0x154788 @ -e:1 is ARRAY of len: 0
          0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
          0x1547dc @ -e:1 is STRING len: 1 and val: T
          0x154814 @ -e:1 is CLASS named: T inherits from Object
          0x154a98 @ -e:1 is STRING len: 2 and val: hi
          0x154b40 @ -e:1 is OBJECT of type: Range
    • version 1: analyze data
        $ wc -l /tmp/ruby.heap
         1571529 /tmp/ruby.heap

        $ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
         236840 memcached/memcached.rb:316

        $ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5
          10948 ARRAY
          20355 OBJECT
          30744 DATA
          64952 HASH
         123290 STRING
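The same two questions (which file:line allocates the most objects, and of what types) can also be answered in Ruby instead of a shell pipeline. A minimal sketch, assuming every heap line follows the "<address> @ <file>:<line> is <TYPE> ..." layout of the sample output above; the method names are hypothetical, and Array#tally needs Ruby 2.7 or later.

```ruby
# Hypothetical Ruby port of the awk | sort | uniq -c pipelines above.
# Assumes each heap line looks like "<address> @ <file>:<line> is <TYPE> ...",
# so whitespace-separated field 3 is "file:line" and field 5 is the TYPE.

# Count values and return the n most common [value, count] pairs, largest first
# (the shell version prints them smallest first via sort -g | tail).
def top_counts(values, n)
  values.tally.sort_by { |value, count| [-count, value] }.first(n)
end

# Equivalent of: awk '{ print $3 }'
def locations(heap_lines)
  heap_lines.filter_map { |line| line.split[2] }
end

# Equivalent of: grep "<location>" | awk '{ print $5 }',
# but matching the location field exactly instead of as a substring.
def types_at(heap_lines, location)
  heap_lines.map(&:split)
            .select { |fields| fields[2] == location }
            .map { |fields| fields[4] }
end

# A few lines copied from the sample output above.
SAMPLE = [
  "0x154750 @ -e:1 is OBJECT of type: T",
  "0x15476c @ -e:1 is HASH which has data",
  "0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi",
  "0x154a98 @ -e:1 is STRING len: 2 and val: hi"
]

p top_counts(locations(SAMPLE), 1)         # => [["-e:1", 4]]
p top_counts(types_at(SAMPLE, "-e:1"), 5)  # => [["STRING", 2], ["HASH", 1], ["OBJECT", 1]]
```

Against a real dump, File.readlines("/tmp/ruby.heap", chomp: true) would supply the lines.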
    • version 1
      • it works!
      • but...
        • must patch and rebuild ruby binary
        • no information about references between objects
        • limited analysis via shell scripting
    • version 2 goals
      • better data format
        • simple: one line of text per object
        • expressive: include all details about object contents and references
        • easy to use: easy to generate from C code & easy to consume from various scripting languages
    • JSON!
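The "one line of text per object" goal maps naturally onto newline-delimited JSON: each record is generated on its own and can be consumed line by line. A minimal sketch with Ruby's standard json library; the two records reuse fields from the string and array examples later in this talk, and the helper names are hypothetical.

```ruby
require "json"

# Serialize each object record as one line of JSON (newline-delimited JSON).
def dump_lines(records)
  records.map { |record| JSON.generate(record) }.join("\n") + "\n"
end

# Parse newline-delimited JSON back into an array of hashes, skipping blank lines.
def load_lines(text)
  text.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
end

records = [
  { "_id" => "0x19c610", "type" => "string", "class_name" => "String",
    "length" => 10, "data" => "helloworld" },
  { "_id" => "0x19c5c0", "class_name" => "Array", "length" => 4,
    "data" => [1, ":b", "0x19c750", "0x19c598"] }
]

text = dump_lines(records)
puts text.lines.size              # => 2 (one line per object)
puts load_lines(text) == records  # => true
```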
    • version 2 is memprof
      • no patches to ruby necessary
        • gem install memprof
        • require 'memprof'
        • Memprof.dump_all("/tmp/app.json")
      • C extension for MRI ruby VM
        • http://github.com/ice799/memprof
      • uses libyajl to dump out all ruby objects as json
    • strings
        Memprof.dump{
          "hello" + "world"
        }

        {
          "_id": "0x19c610",
          "file": "-e",
          "line": 1,
          "type": "string",
          "class": "0x1ba7f0",
          "class_name": "String",
          "length": 10,
          "data": "helloworld"
        }

      • "_id": memory address of object
      • "file", "line": file and line where string was created
      • "class": address of the class object "String"
      • "length", "data": length and contents of this string instance
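memprof is a C extension for the MRI VM of its era. On MRI 2.1 and later, the bundled objspace extension can produce a comparable per-object JSON record without a gem. This is a swapped-in, built-in alternative, not memprof itself, and its field names differ (for example "address" instead of "_id", "STRING" as the type, "value" for the contents). A minimal sketch:

```ruby
require "objspace"
require "json"

# Record allocation sites so the dump can include "file" and "line",
# similar to memprof's "file"/"line" fields.
ObjectSpace.trace_object_allocations_start
str = "hello" + "world"

# ObjectSpace.dump returns a JSON string describing a single object.
record = JSON.parse(ObjectSpace.dump(str))
ObjectSpace.trace_object_allocations_stop

puts record["type"]     # => STRING
puts record["value"]    # => helloworld
puts record["address"]  # memory address, playing the role of memprof's "_id"
```

ObjectSpace.dump_all writes every object in the heap in the same one-record-per-line style.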
    • arrays
        Memprof.dump{
          [ 1, :b, 2.2, "d" ]
        }

        {
          "_id": "0x19c5c0",
          "class": "0x1b0d18",
          "class_name": "Array",
          "length": 4,
          "data": [
            1,
            ":b",
            "0x19c750",
            "0x19c598"
          ]
        }

      • integers and symbols are stored in the array itself
      • floats and strings are separate ruby objects
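Because floats, strings and other heap objects appear in "data" only as addresses, reading an array back means looking those addresses up elsewhere in the dump. A minimal sketch, assuming the dump has been loaded into a Hash keyed by "_id" and treating any string shaped like a hex address ("0x...") as a reference; the resolve helper and the two element records are hypothetical, made up to complete the example.

```ruby
# Strings that look like hex addresses are references to other dumped objects.
ADDRESS = /\A0x\h+\z/

# Hypothetical helper: replace address strings in a dumped "data" array with the
# records they point to. Integers and symbol strings (":b") are already inline.
def resolve(values, objects_by_id)
  values.map do |value|
    if value.is_a?(String) && value.match?(ADDRESS)
      objects_by_id.fetch(value, value)  # leave unknown addresses as-is
    else
      value
    end
  end
end

array_record = { "_id" => "0x19c5c0", "class_name" => "Array", "length" => 4,
                 "data" => [1, ":b", "0x19c750", "0x19c598"] }

# Hypothetical records for the float and string elements.
objects_by_id = {
  "0x19c750" => { "_id" => "0x19c750", "type" => "float", "data" => 2.2 },
  "0x19c598" => { "_id" => "0x19c598", "type" => "string", "data" => "d" }
}

resolved = resolve(array_record["data"], objects_by_id)
p resolved.map { |v| v.is_a?(Hash) ? v["data"] : v }  # => [1, ":b", 2.2, "d"]
```

The same helper applies to hash dumps by resolving each key/value pair in "data" separately.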
    • hashes
        Memprof.dump{
          { :a => 1, "b" => 2.2 }
        }

        {
          "_id": "0x19c598",
          "type": "hash",
          "class": "0x1af170",
          "class_name": "Hash",
          "default": null,
          "length": 2,
          "data": [
            [ ":a", 1 ],
            [ "0xc728", "0xc750" ]
          ]
        }

      • "default": null means no default proc
      • "data": hash entries as key/value pairs
    • classes
        Memprof.dump{
          class Hello
            @@var=1
            Const=2
            def world() end
          end
        }

        {
          "_id": "0x19c408",
          "type": "class",
          "name": "Hello",
          "super": "0x1bfa48",
          "super_name": "Object",
          "ivars": {
            "@@var": 1,
            "Const": 2
          },
          "methods": {
            "world": "0x19c318"
          }
        }

      • "super": superclass object reference
      • "ivars": class variables and constants are stored in the instance variable table
      • "methods": references to method objects
    • version 2: analyze data
    • version 2: memprof.com a web-based heap visualizer and leak analyzer
    • built on...
        $ mongoimport -d memprof -c rails --file /tmp/app.json
        $ mongo memprof
      let's run some queries.
    • how many objects?
        > db.rails.count()
        809816
      • ruby scripts create a lot of objects
      • usually not a problem, but...
        • MRI has a naïve stop-the-world mark/sweep GC
        • fewer objects = faster GC = better performance
    • what types of objects?
        > db.rails.distinct('type')
        ['array', 'bignum', 'class', 'float', 'hash', 'module', 'node', 'object', 'regexp', 'string', ...]
    • mongodb: distinct
      • distinct('type') → list of types of objects
      • distinct('file') → list of source files
      • distinct('class_name') → list of instance class names
      • optionally filter first
        • distinct('name', {type:"class"}) → names of all defined classes
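Outside the mongo shell, distinct amounts to "collect one field, drop duplicates", optionally after filtering. A minimal in-memory sketch, assuming the JSON dump has already been parsed into an array of Ruby hashes; the distinct helper and the four-object sample are hypothetical, and only plain equality filters are supported.

```ruby
# Hypothetical in-memory equivalent of db.rails.distinct(field, filter).
# `filter` is a Hash of field => required value (simple equality only).
# Objects missing the field are skipped, as MongoDB's distinct does.
def distinct(objects, field, filter = {})
  objects
    .select { |obj| filter.all? { |key, value| obj[key] == value } }
    .map { |obj| obj[field] }
    .compact
    .uniq
end

dump = [
  { "type" => "string", "file" => "app.rb", "class_name" => "String" },
  { "type" => "class",  "file" => "app.rb", "name" => "Hello" },
  { "type" => "class",  "file" => "lib.rb", "name" => "World" },
  { "type" => "string", "file" => "lib.rb", "class_name" => "String" }
]

p distinct(dump, "type")                         # => ["string", "class"]
p distinct(dump, "name", { "type" => "class" })  # => ["Hello", "World"]
```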
    • improve performance with indexes
        > db.rails.ensureIndex({'type':1})
        > db.rails.ensureIndex({'file':1}, {background:true})
    • mongodb: ensureIndex
      • add an index on a field (if it doesn't exist yet)
      • improve performance of queries against common fields: type, class_name, super, file
      • can index embedded field names
        • ensureIndex({'methods.add':1})
        • find({'methods.add':{$exists:true}}) → find classes that define the method add
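An index is, in spirit, a lookup table from field value to matching documents, so a query no longer has to scan every object. A rough in-memory analogue (MongoDB actually uses B-tree indexes; a Hash is only the simplest stand-in), plus the methods.add existence query; the helper names and sample objects are hypothetical.

```ruby
# Hypothetical analogue of ensureIndex({type: 1}): field value => matching objects.
def build_index(objects, field)
  objects.group_by { |obj| obj[field] }
end

# Analogue of find({'methods.add': {$exists: true}}): objects whose embedded
# "methods" document has the given key, i.e. classes that define that method.
def defining_method(objects, method_name)
  objects.select { |obj| obj["methods"].is_a?(Hash) && obj["methods"].key?(method_name) }
end

dump = [
  { "_id" => "0x1", "type" => "class", "name" => "Calc",  "methods" => { "add" => "0x10" } },
  { "_id" => "0x2", "type" => "class", "name" => "Hello", "methods" => { "world" => "0x11" } },
  { "_id" => "0x3", "type" => "string", "data" => "add" }
]

by_type = build_index(dump, "type")
p by_type["class"].map { |obj| obj["name"] }              # => ["Calc", "Hello"]
p defining_method(dump, "add").map { |obj| obj["name"] }  # => ["Calc"]
```

Building the index costs one pass over the dump; afterwards each lookup by type is a single Hash access.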
    • how many objs per type?
        > db.rails.group({
            initial: {count:0},
            key: {type:true},              // group on type
            cond: {},
            reduce: function(obj, out) {
              out.count++                  // increment count for each obj
            }
          }).sort(function(a,b){
            return a.count - b.count       // sort results
          })
    • how many objs per type?
        [
          ...,
          {type: 'array', count: 7621},
          {type: 'string', count: 69139},
          {type: 'node', count: 365285}
        ]
      • lots of nodes
        • nodes represent ruby code
        • stored like any other ruby object
        • makes ruby completely dynamic
    • mongodb: group
      • cond: query to filter objects before grouping
      • key: field(s) to group on
      • initial: initial values for each group's results
      • reduce: aggregation function
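The four group arguments compose like a filtered fold per key. A minimal Ruby sketch of those semantics (cond as plain equality, key as a list of field names, initial shallow-copied per group, reduce updating the group's accumulator), mirroring the shell call above rather than the server's implementation; the group helper and the sample dump are hypothetical.

```ruby
# Hypothetical in-memory version of db.collection.group().
#   cond:    Hash of field => value that an object must match (equality only)
#   key:     Array of field names to group on
#   initial: starting accumulator values, shallow-copied for each new group
#   block:   reduce(obj, out), updates the group's accumulator in place
def group(objects, key:, cond: {}, initial: {}, &reduce)
  groups = {}
  objects.each do |obj|
    next unless cond.all? { |field, value| obj[field] == value }
    group_key = key.to_h { |field| [field, obj[field]] }
    out = (groups[group_key] ||= group_key.merge(initial))
    reduce.call(obj, out)
  end
  groups.values
end

dump = [
  { "type" => "string", "file" => "app.rb", "line" => 1 },
  { "type" => "node",   "file" => "app.rb", "line" => 2 },
  { "type" => "node",   "file" => "lib.rb", "line" => 7 },
  { "type" => "node",   "file" => "app.rb", "line" => 2 }
]

# how many objs per type? (sorted by count, like the shell example)
per_type = group(dump, key: ["type"], initial: { "count" => 0 }) { |_obj, out| out["count"] += 1 }
p per_type.sort_by { |g| g["count"] }.map { |g| [g["type"], g["count"]] }
# => [["string", 1], ["node", 3]]

# objects per line, restricted to one file
per_line = group(dump, cond: { "file" => "app.rb" }, key: ["file", "line"],
                       initial: { "count" => 0 }) { |_obj, out| out["count"] += 1 }
p per_line.map { |g| [g["line"], g["count"]] }
# => [[1, 1], [2, 2]]
```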
    • mongodb: group
      • by type or class
        • key: {type:1}
        • key: {class_name:1}
      • by file & line
        • key: {file:1, line:1}
      • by line, within a specific file
        • cond: {file:'app.rb'}, key: {file:1, line:1}
      • length of strings in a specific file
        • cond: {file:'app.rb', type:'string'}, key: {length:1}
    • what subclasses String?
        > db.rails.find(
            {super_name:"String"},
            {name:1}                       // select only name field
          )
        {name: "ActiveSupport::SafeBuffer"}
        {name: "ActiveSupport::StringInquirer"}
        {name: "SQLite3::Blob"}
        {name: "ActiveModel::Name"}
        {name: "Arel::Attribute::Expressions"}
        {name: "ActiveSupport::JSON::Variable"}
    • mongodb: find
      • find({type:'string'}) → all strings
      • find({type:{$ne:'string'}}) → everything except strings
      • find({type:'string'}, {data:1}) → only select string's data field
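A query document is a set of per-field conditions, and the optional second argument is a projection. A minimal in-memory sketch supporting only plain equality plus the $ne and $in operators used in these slides; the find_objects helper and sample data are hypothetical, and real MongoDB queries support far more.

```ruby
# Hypothetical in-memory subset of find(query, projection).
def matches?(obj, query)
  query.all? do |field, cond|
    value = obj[field]
    if cond.is_a?(Hash) && cond.key?("$ne")
      value != cond["$ne"]
    elsif cond.is_a?(Hash) && cond.key?("$in")
      cond["$in"].include?(value)
    else
      value == cond
    end
  end
end

def find_objects(objects, query = {}, projection = nil)
  found = objects.select { |obj| matches?(obj, query) }
  return found unless projection
  # Like MongoDB, keep _id unless the projection explicitly excludes it.
  fields = projection.select { |_, v| v == 1 }.keys
  fields |= ["_id"] unless projection["_id"] == 0
  found.map { |obj| obj.slice(*fields) }
end

dump = [
  { "_id" => "0x1", "type" => "string", "data" => "hi" },
  { "_id" => "0x2", "type" => "class", "name" => "ActiveSupport::SafeBuffer", "super_name" => "String" },
  { "_id" => "0x3", "type" => "array", "length" => 0 }
]

p find_objects(dump, { "type" => { "$ne" => "string" } }).map { |o| o["_id"] }             # => ["0x2", "0x3"]
p find_objects(dump, { "super_name" => "String" }, { "name" => 1 }).map { |o| o["name"] }  # => ["ActiveSupport::SafeBuffer"]
```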
    • the largest objects?
        > db.rails.find(
            {type: {$in:['string','array','hash']}},
            {type:1,length:1}
          ).sort({length:-1}).limit(3)
        {type: "string", length: 2308}
        {type: "string", length: 1454}
        {type: "string", length: 1238}
    • mongodb: sort, limit/skip
      • sort({length:-1,file:1}) → sort by length desc, file asc
      • limit(10) → first 10 results
      • skip(10).limit(10) → second 10 results
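A compound sort such as {length:-1, file:1} compares on the first field and falls back to the next only on ties; skip and limit then slice the ordered result. A minimal in-memory sketch; the helper names are hypothetical, and unlike MongoDB it assumes every object has comparable values for the sort fields.

```ruby
# Compare two objects by an ordered list of [field, direction] pairs,
# where direction is 1 (ascending) or -1 (descending).
def compare(a, b, spec)
  spec.each do |field, direction|
    cmp = (a[field] <=> b[field]) * direction
    return cmp unless cmp.zero?
  end
  0
end

# Hypothetical in-memory version of .sort(spec).skip(n).limit(m).
def sort_skip_limit(objects, spec, skip: 0, limit: nil)
  sorted = objects.sort { |a, b| compare(a, b, spec) }.drop(skip)
  limit ? sorted.first(limit) : sorted
end

dump = [
  { "file" => "b.rb", "length" => 10 },
  { "file" => "a.rb", "length" => 10 },
  { "file" => "c.rb", "length" => 2308 },
  { "file" => "d.rb", "length" => 5 }
]
spec = [["length", -1], ["file", 1]]  # like sort({length:-1, file:1})

p sort_skip_limit(dump, spec, limit: 2).map { |o| o["file"] }           # => ["c.rb", "a.rb"]
p sort_skip_limit(dump, spec, skip: 2, limit: 2).map { |o| o["file"] }  # => ["b.rb", "d.rb"]
```

The spec is an array of pairs so that field order is explicit, matching the ordered sort document.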
    • when were objs created?
      • useful to look at objects over time
      • each obj has a timestamp of when it was created
      • find minimum time, call it start_time
      • create buckets for every minute of execution since start
      • place objects into buckets
    • when were objs created?
        > db.rails.mapReduce(
            function(){
              var secs = this.time - start_time;
              var mins_since_start = Math.floor(secs / 60);
              emit(mins_since_start, 1);
            },
            function(key, vals){
              for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
              return sum;
            },
            {
              scope: {
                // start_time = min(time)
                start_time: db.rails.find().sort({time:1}).limit(1)[0].time
              }
            }
          )
        {result:"tmp.mr_1272615772_3"}
    • mongodb: mapReduce
      • arguments
        • map: function that emits one or more key/value pairs, given each object as this
        • reduce: function to return aggregate result, given key and list of values
        • scope: global variables to set for funcs
      • results
        • stored in a temporary collection (tmp.mr_1272615772_3)
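The same per-minute histogram can be computed in plain Ruby, which makes the bucketing arithmetic explicit: an object's bucket is the whole number of minutes since the earliest object, i.e. (time - start_time) / 60 rounded down. A minimal sketch, assuming "time" is in seconds (the slides call the difference secs); the helper and the sample timestamps are hypothetical.

```ruby
# Hypothetical in-memory version of the map/reduce above.
# Assumes each object has a numeric "time" in seconds.
def objects_per_minute(objects)
  start_time = objects.map { |obj| obj["time"] }.min  # start_time = min(time)

  # map: emit [minutes since start, 1] for each object
  emitted = objects.map { |obj| [((obj["time"] - start_time) / 60).floor, 1] }

  # reduce: sum the emitted values for each key
  emitted.group_by(&:first).transform_values { |pairs| pairs.sum(&:last) }
end

# Hypothetical timestamps: offsets in seconds from an arbitrary start.
dump = [0, 30, 59, 60, 61, 125, 200].map { |offset| { "time" => 1_000 + offset } }

counts = objects_per_minute(dump)
p counts.to_a                       # => [[0, 3], [1, 2], [2, 1], [3, 1]]
p counts.size                       # => 4 (minutes with at least one allocation)
p counts.max_by { |_minute, n| n }  # => [0, 3]
```

counts.size plays the role of the count() query on the result collection, and max_by the role of sort({value:-1}).limit(1).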
    • when were objs created?
        > db.tmp.mr_1272615772_3.count()
        12
      • script was running for 12 minutes
        > db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
        {_id: 8, value: 41231}
      • 41k objects created 8 minutes after start
    • references to this object?
        ary = ["a","b","c"]
      • ary references "a"
      • "b" is referenced by ary
      • ruby makes it easy to "leak" references
      • an object will stay around until all references to it are gone
      • more objects = longer GC = bad performance
      • must find references to fix leaks
    • references to this object?
      • db.rails_refs.insert({ _id:"0xary", refs:["0xa","0xb","0xc"] }) → create references lookup table
      • db.rails_refs.ensureIndex({refs:1}) → add 'multikey' index to refs array
      • db.rails_refs.find({refs:"0xa"}) → efficiently look up all objs holding a ref to 0xa
    • mongodb: multikeys
      • indexes on array values create a 'multikey' index
      • classic example: nested array of tags
        • find({tags: "ruby"}) → find objs where obj.tags includes "ruby"
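The rails_refs collection plus its multikey index is effectively a reverse index: list the addresses each object points to, then invert that so each address maps to its holders. A minimal in-memory sketch, assuming (as a simplification) that any hex-address-shaped string inside an object's "data", "ivars" or "methods" is a reference; it ignores fields such as "super", and the helper names and toy objects (echoing the ary example above) are hypothetical.

```ruby
ADDRESS = /\A0x\h+\z/

# Collect every address-looking string nested anywhere inside a value.
def addresses_in(value)
  case value
  when String then value.match?(ADDRESS) ? [value] : []
  when Array  then value.flat_map { |v| addresses_in(v) }
  when Hash   then value.values.flat_map { |v| addresses_in(v) }
  else []
  end
end

# Build { holder_id => [referenced ids] }, like the rails_refs documents.
def refs_table(objects)
  objects.to_h do |obj|
    refs = %w[data ivars methods].flat_map { |f| addresses_in(obj[f]) }.uniq
    [obj["_id"], refs]
  end
end

# Invert it: { referenced id => [holder ids] }, what find({refs: id}) answers.
def referrers(table)
  index = Hash.new { |h, k| h[k] = [] }
  table.each { |holder, refs| refs.each { |ref| index[ref] << holder } }
  index
end

objects = [
  { "_id" => "0xary",  "data" => ["0xa", "0xb", "0xc"] },
  { "_id" => "0xhash", "data" => [[":k", "0xa"]] },
  { "_id" => "0xa", "data" => "a" },
  { "_id" => "0xb", "data" => "b" },
  { "_id" => "0xc", "data" => "c" }
]

holders = referrers(refs_table(objects))
p holders.fetch("0xa")  # => ["0xary", "0xhash"]
p holders.fetch("0xb")  # => ["0xary"]
```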
    • version 2: memprof.com a web-based heap visualizer and leak analyzer
    • plugging a leak in rails3
      • in dev mode, rails3 is leaking 10 MB per request
      • let's use memprof to find it!
          # in environment.rb
          require `gem which memprof/signal`.strip
    • plugging a leak in rails3
      • send the app some requests so it leaks
          $ ab -c 1 -n 30 http://localhost:3000/
      • tell memprof to dump out the entire heap to json
          $ memprof --pid <pid> --name <dump name> --key <api key>
    • 2519 classes
      • 30 copies of TestController
      • mongo query for all TestController classes
      • details for one copy of TestController
    • find references to object
      • "leak" is on line 178
      • holding references to all controllers
    • In development mode, Rails reloads all your application code on every request
      • ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization
      • But... it ends up holding a reference to every single reloaded version of those controllers
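The failure mode generalizes beyond Rails: any long-lived cache keyed by class objects keeps every reloaded copy of those classes reachable. A hypothetical, self-contained Ruby sketch of the pattern, not the actual ActionView code or its actual fix: "reloading" is simulated by creating a fresh anonymous class on each request, and a second cache keyed by class name is shown for contrast, so a reload overwrites the old entry instead of adding one.

```ruby
# Hypothetical illustration of the leak pattern; not ActionView's code.
PARTIAL_CACHE_BY_CLASS = {}  # grows: holds a reference to every class object
PARTIAL_CACHE_BY_NAME  = {}  # bounded: one entry per controller name

def simulate_request(name)
  # Development-mode reloading creates a brand-new class object each time.
  controller = Class.new
  controller.define_singleton_method(:name) { name }

  PARTIAL_CACHE_BY_CLASS[controller] ||= ["#{name}/_partial"]
  PARTIAL_CACHE_BY_NAME[controller.name] = ["#{name}/_partial"]
end

30.times { simulate_request("TestController") }

p PARTIAL_CACHE_BY_CLASS.size  # => 30 (all 30 copies stay reachable)
p PARTIAL_CACHE_BY_NAME.size   # => 1
```

Keying by name means the cache no longer references any class object, so old copies become collectable; whether that trade-off suits a real framework cache depends on how it handles reloading.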
    • Questions? Aman Gupta @tmm1