Debugging Ruby (Aman Gupta)

  1. Debugging Ruby with MongoDB Aman Gupta @tmm1
  2. debugging ruby? • i use ruby
  3. debugging ruby? • i use ruby • my ruby processes use a lot of ram
  4. debugging ruby? • i use ruby • my ruby processes use a lot of ram • i want to fix this
  5. let’s build a debugger • step 1: collect data • list of all ruby objects in memory
  6. let’s build a debugger • step 1: collect data • list of all ruby objects in memory • step 2: analyze data • group by type • group by file/line
  7. version 1: collect data • simple patch to ruby VM (300 lines of C) • http://gist.github.com/73674 • simple text based output format 0x154750 @ -e:1 is OBJECT of type: T 0x15476c @ -e:1 is HASH which has data 0x154788 @ -e:1 is ARRAY of len: 0 0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi 0x1547dc @ -e:1 is STRING len: 1 and val: T 0x154814 @ -e:1 is CLASS named: T inherits from Object 0x154a98 @ -e:1 is STRING len: 2 and val: hi 0x154b40 @ -e:1 is OBJECT of type: Range
  8. version 1: analyze data $ wc -l /tmp/ruby.heap  1571529 /tmp/ruby.heap
  9. version 1: analyze data $ wc -l /tmp/ruby.heap  1571529 /tmp/ruby.heap $ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1  236840 memcached/memcached.rb:316
  10. version 1: analyze data $ wc -l /tmp/ruby.heap  1571529 /tmp/ruby.heap $ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1  236840 memcached/memcached.rb:316 $ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5    10948 ARRAY    20355 OBJECT    30744 DATA   64952 HASH   123290 STRING
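The same grouping can be sketched in plain Ruby instead of awk. This is a minimal illustration, assuming the version 1 text format shown earlier (address, `@`, file:line, `is`, type); the `count_by_field` helper is a hypothetical name, not part of the original patch.

```ruby
# Sample lines in the version 1 text format (copied from the slides).
HEAP_LINES = [
  "0x154750 @ -e:1 is OBJECT of type: T",
  "0x15476c @ -e:1 is HASH which has data",
  "0x154788 @ -e:1 is ARRAY of len: 0",
  "0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi",
  "0x1547dc @ -e:1 is STRING len: 1 and val: T",
].freeze

# Count lines by one whitespace-separated field (0-based index) and
# return [value, count] pairs sorted by ascending count, like
# `awk '{ print $N }' | sort | uniq -c | sort -g`.
def count_by_field(lines, index)
  counts = Hash.new(0)
  lines.each { |line| counts[line.split[index]] += 1 }
  counts.sort_by { |_value, count| count }
end

by_location = count_by_field(HEAP_LINES, 2) # awk's $3: file:line
by_type     = count_by_field(HEAP_LINES, 4) # awk's $5: object type

puts "busiest location: #{by_location.last.inspect}"
puts "most common type: #{by_type.last.inspect}"
```

The last pair of each result plays the role of `tail -1` in the shell pipeline.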
  11. version 1
  12. version 1 • it works!
  13. version 1 • it works! • but...
  14. version 1 • it works! • but... • must patch and rebuild ruby binary
  15. version 1 • it works! • but... • must patch and rebuild ruby binary • no information about references between objects
  16. version 1 • it works! • but... • must patch and rebuild ruby binary • no information about references between objects • limited analysis via shell scripting
  17. version 2 goals
  18. version 2 goals • better data format
  19. version 2 goals • better data format • simple: one line of text per object
  20. version 2 goals • better data format • simple: one line of text per object • expressive: include all details about object contents and references
  21. version 2 goals • better data format • simple: one line of text per object • expressive: include all details about object contents and references • easy to use: easy to generate from C code & easy to consume from various scripting languages
  22. JSON!
  23. version 2 is memprof
  24. version 2 is memprof • no patches to ruby necessary • gem install memprof • require 'memprof' • Memprof.dump_all("/tmp/app.json")
  25. version 2 is memprof • no patches to ruby necessary • gem install memprof • require 'memprof' • Memprof.dump_all("/tmp/app.json") • C extension for MRI ruby VM: http://github.com/ice799/memprof • uses libyajl to dump out all ruby objects as json
  26. strings • Memprof.dump{ "hello" + "world" }
  27. strings • Memprof.dump{ "hello" + "world" } • { "_id": "0x19c610", "file": "-e", "line": 1, "type": "string", "class": "0x1ba7f0", "class_name": "String", "length": 10, "data": "helloworld" } • _id: memory address of object
  28. strings • Memprof.dump{ "hello" + "world" } • { "_id": "0x19c610", "file": "-e", "line": 1, "type": "string", "class": "0x1ba7f0", "class_name": "String", "length": 10, "data": "helloworld" } • _id: memory address of object • file, line: file and line where string was created
  29. strings • Memprof.dump{ "hello" + "world" } • { "_id": "0x19c610", "file": "-e", "line": 1, "type": "string", "class": "0x1ba7f0", "class_name": "String", "length": 10, "data": "helloworld" } • _id: memory address of object • file, line: file and line where string was created • class: address of the class object • class_name: "String"
  30. strings • Memprof.dump{ "hello" + "world" } • { "_id": "0x19c610", "file": "-e", "line": 1, "type": "string", "class": "0x1ba7f0", "class_name": "String", "length": 10, "data": "helloworld" } • _id: memory address of object • file, line: file and line where string was created • class: address of the class object • class_name: "String" • length, data: length and contents of this string instance
  31. arrays • Memprof.dump{ [ 1, :b, 2.2, "d" ] }
  32. arrays • Memprof.dump{ [ 1, :b, 2.2, "d" ] } • { "_id": "0x19c5c0", "class": "0x1b0d18", "class_name": "Array", "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }
  33. arrays • Memprof.dump{ [ 1, :b, 2.2, "d" ] } • { "_id": "0x19c5c0", "class": "0x1b0d18", "class_name": "Array", "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] } • 1, ":b": integers and symbols are stored in the array itself
  34. arrays • Memprof.dump{ [ 1, :b, 2.2, "d" ] } • { "_id": "0x19c5c0", "class": "0x1b0d18", "class_name": "Array", "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] } • 1, ":b": integers and symbols are stored in the array itself • "0x19c750", "0x19c598": floats and strings are separate ruby objects
  35. hashes • Memprof.dump{ { :a => 1, "b" => 2.2 } }
  36. hashes • Memprof.dump{ { :a => 1, "b" => 2.2 } } • { "_id": "0x19c598", "type": "hash", "class": "0x1af170", "class_name": "Hash", "default": null, "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] }
  37. hashes • Memprof.dump{ { :a => 1, "b" => 2.2 } } • { "_id": "0x19c598", "type": "hash", "class": "0x1af170", "class_name": "Hash", "default": null, "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] } • data: hash entries as key/value pairs
  38. hashes • Memprof.dump{ { :a => 1, "b" => 2.2 } } • { "_id": "0x19c598", "type": "hash", "class": "0x1af170", "class_name": "Hash", "default": null, "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] } • default: null means no default proc • data: hash entries as key/value pairs
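To make the "stored inline vs. separate object" distinction concrete, here is a small Ruby sketch that pulls the outgoing references out of a dumped record. It assumes, as the array and hash dumps suggest, that a reference shows up in `data` as a string that looks like a hex address; the `referenced_addresses` helper and the `ADDRESS` pattern are hypothetical and not part of memprof.

```ruby
require "json"

# Records copied from the array and hash slides.
ARRAY_RECORD = JSON.parse('{ "_id": "0x19c5c0", "class": "0x1b0d18", "class_name": "Array", "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }')
HASH_RECORD  = JSON.parse('{ "_id": "0x19c598", "type": "hash", "class": "0x1af170", "class_name": "Hash", "default": null, "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] }')

# Assumption: a reference is any string that looks like a hex address.
ADDRESS = /\A0x[0-9a-f]+\z/

# Return the addresses of the objects a container record points at.
# Integers and symbols (":b") are immediate values and are skipped.
def referenced_addresses(record)
  Array(record["data"]).flatten.select { |v| v.is_a?(String) && v =~ ADDRESS }
end

p referenced_addresses(ARRAY_RECORD)
p referenced_addresses(HASH_RECORD)
```

This heuristic is only meant for container records; a string record whose contents happen to look like an address would be misread.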
  39. classes • Memprof.dump{ class Hello; @@var=1; Const=2; def world() end; end }
  40. classes • Memprof.dump{ class Hello; @@var=1; Const=2; def world() end; end } • { "_id": "0x19c408", "type": "class", "name": "Hello", "super": "0x1bfa48", "super_name": "Object", "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } }
  41. classes • Memprof.dump{ class Hello; @@var=1; Const=2; def world() end; end } • { "_id": "0x19c408", "type": "class", "name": "Hello", "super": "0x1bfa48", "super_name": "Object", "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } } • super: superclass object reference
  42. classes • Memprof.dump{ class Hello; @@var=1; Const=2; def world() end; end } • { "_id": "0x19c408", "type": "class", "name": "Hello", "super": "0x1bfa48", "super_name": "Object", "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } } • super: superclass object reference • ivars: class variables and constants are stored in the instance variable table
  43. classes • Memprof.dump{ class Hello; @@var=1; Const=2; def world() end; end } • { "_id": "0x19c408", "type": "class", "name": "Hello", "super": "0x1bfa48", "super_name": "Object", "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } } • super: superclass object reference • ivars: class variables and constants are stored in the instance variable table • methods: references to method objects
  44. version 2: analyze data
  45. version 2: memprof.com a web-based heap visualizer and leak analyzer
  46. built on...
  47. built on... $ mongoimport -d memprof -c rails --file /tmp/app.json $ mongo memprof
  48. built on... $ mongoimport -d memprof -c rails --file /tmp/app.json $ mongo memprof let’s run some queries.
  49. how many objects? > db.rails.count() 809816 • ruby scripts create a lot of objects • usually not a problem, but... • MRI has a naïve stop-the-world mark/sweep GC • fewer objects = faster GC = better performance
  50. what types of objects? > db.rails.distinct('type') ['array', 'bignum', 'class', 'float', 'hash', 'module', 'node', 'object', 'regexp', 'string', ...]
  51. mongodb: distinct
  52. mongodb: distinct • distinct('type'): list of types of objects
  53. mongodb: distinct • distinct('type'): list of types of objects • distinct('file'): list of source files
  54. mongodb: distinct • distinct('type'): list of types of objects • distinct('file'): list of source files • distinct('class_name'): list of instance class names
  55. mongodb: distinct • distinct('type'): list of types of objects • distinct('file'): list of source files • distinct('class_name'): list of instance class names • optionally filter first • distinct('name', {type:"class"}): names of all defined classes
  56. improve performance with indexes > db.rails.ensureIndex({'type':1})
  57. improve performance with indexes > db.rails.ensureIndex({'type':1}) > db.rails.ensureIndex( {'file':1}, {background:true} )
  58. mongodb: ensureIndex • add an index on a field (if it doesn’t exist yet) • improve performance of queries against common fields: type, class_name, super, file
  59. mongodb: ensureIndex • add an index on a field (if it doesn’t exist yet) • improve performance of queries against common fields: type, class_name, super, file • can index embedded field names • ensureIndex('methods.add') • find({'methods.add':{$exists:true}}): find classes that define the method add
  60. how many objs per type? > db.rails.group({ initial: {count:0}, key: {type:true}, cond: {}, reduce: function(obj, out) { out.count++ } }).sort(function(a,b){ return a.count - b.count }) • key: group on type
  61. how many objs per type? > db.rails.group({ initial: {count:0}, key: {type:true}, cond: {}, reduce: function(obj, out) { out.count++ } }).sort(function(a,b){ return a.count - b.count }) • key: group on type • reduce: increment count for each obj
  62. how many objs per type? > db.rails.group({ initial: {count:0}, key: {type:true}, cond: {}, reduce: function(obj, out) { out.count++ } }).sort(function(a,b){ return a.count - b.count }) • key: group on type • reduce: increment count for each obj • sort: sort results
  63. how many objs per type? [ ..., {type: 'array', count: 7621}, {type: 'string', count: 69139}, {type: 'node', count: 365285} ]
  64. how many objs per type? [ ..., {type: 'array', count: 7621}, {type: 'string', count: 69139}, {type: 'node', count: 365285} ] • lots of nodes
  65. how many objs per type? [ ..., {type: 'array', count: 7621}, {type: 'string', count: 69139}, {type: 'node', count: 365285} ] • lots of nodes • nodes represent ruby code • stored like any other ruby object • makes ruby completely dynamic
  66. mongodb: group
  67. mongodb: group • cond: query to filter objects before grouping
  68. mongodb: group • cond: query to filter objects before grouping • key: field(s) to group on
  69. mongodb: group • cond: query to filter objects before grouping • key: field(s) to group on • initial: initial values for each group’s results
  70. mongodb: group • cond: query to filter objects before grouping • key: field(s) to group on • initial: initial values for each group’s results • reduce: aggregation function
  71. mongodb: group
  72. mongodb: group • by type or class • key: {type:1} • key: {class_name:1}
  73. mongodb: group • by type or class • key: {type:1} • key: {class_name:1} • by file & line • key: {file:1, line:1}
  74. mongodb: group • by type or class • key: {type:1} • key: {class_name:1} • by file & line • key: {file:1, line:1} • by type in a specific file • cond: {file: "app.rb"}, key: {file:1, line:1}
  75. mongodb: group • by type or class • key: {type:1} • key: {class_name:1} • by file & line • key: {file:1, line:1} • by type in a specific file • cond: {file: "app.rb"}, key: {file:1, line:1} • by length of strings in a specific file • cond: {file:"app.rb",type:'string'}, key: {length:1}
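The cond/key/initial/reduce recipe can also be sketched in plain Ruby, which may help when no MongoDB instance is at hand. This is a rough equivalent under stated assumptions: the dump is read as one JSON object per line, the sample records are made up, and `group_count` is a hypothetical helper that only supports counting (initial `{count:0}`, reduce `count++`).

```ruby
require "json"

# Made-up records using field names from the dump format in the slides.
DUMP = <<~LINES
  {"_id": "0x1", "type": "string", "file": "app.rb", "line": 3, "length": 5}
  {"_id": "0x2", "type": "string", "file": "app.rb", "line": 3, "length": 9}
  {"_id": "0x3", "type": "array",  "file": "app.rb", "line": 7, "length": 2}
  {"_id": "0x4", "type": "string", "file": "lib.rb", "line": 1, "length": 5}
LINES

OBJECTS = DUMP.each_line.map { |line| JSON.parse(line) }

# keys: fields to group on; cond: exact-match filter applied first.
# Returns [group, count] pairs sorted by ascending count.
def group_count(objects, keys, cond = {})
  counts = Hash.new(0)
  objects.each do |obj|
    next unless cond.all? { |field, value| obj[field] == value }
    counts[obj.values_at(*keys)] += 1
  end
  counts.sort_by { |_group, count| count }
end

p group_count(OBJECTS, ["type"])
p group_count(OBJECTS, ["file", "line"], { "file" => "app.rb" })
```

For a dump of several hundred thousand objects, streaming the file line by line instead of loading it into an array would be the more memory-friendly choice.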
  76. what subclasses String? > db.rails.find( {super_name:"String"}, {name:1} ) {name: "ActiveSupport::SafeBuffer"} {name: "ActiveSupport::StringInquirer"} {name: "SQLite3::Blob"} {name: "ActiveModel::Name"} {name: "Arel::Attribute::Expressions"} {name: "ActiveSupport::JSON::Variable"}
  77. what subclasses String? > db.rails.find( {super_name:"String"}, {name:1} ) {name: "ActiveSupport::SafeBuffer"} {name: "ActiveSupport::StringInquirer"} {name: "SQLite3::Blob"} {name: "ActiveModel::Name"} {name: "Arel::Attribute::Expressions"} {name: "ActiveSupport::JSON::Variable"} • {name:1}: select only name field
  78. mongodb: find
  79. mongodb: find • find({type:'string'}): all strings
  80. mongodb: find • find({type:'string'}): all strings • find({type:{$ne:'string'}}): everything except strings
  81. mongodb: find • find({type:'string'}): all strings • find({type:{$ne:'string'}}): everything except strings • find({type:'string'}, {data:1}): only select string’s data field
  82. the largest objects? > db.rails.find( {type: {$in:['string','array','hash']} }, {type:1,length:1} ).sort({length:-1}).limit(3) {type: "string", length: 2308} {type: "string", length: 1454} {type: "string", length: 1238}
  83. mongodb: sort, limit/skip
  84. mongodb: sort, limit/skip • sort({length:-1,file:1}) sort by length desc, file asc
  85. mongodb: sort, limit/skip • sort({length:-1,file:1}) sort by length desc, file asc • limit(10) first 10 results
  86. mongodb: sort, limit/skip • sort({length:-1,file:1}) sort by length desc, file asc • limit(10) first 10 results • skip(10).limit(10) second 10 results
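For readers less familiar with the mongo shell, the find / sort / skip / limit chain maps onto ordinary Ruby collection methods. The sketch below uses made-up records, and `largest` is a hypothetical helper; it is an illustration of the query semantics, not of how MongoDB executes them.

```ruby
# Made-up records; "class" entries have no length and are filtered out.
HEAP = [
  { "type" => "string", "length" => 2308 },
  { "type" => "class",  "name" => "Hello" },
  { "type" => "array",  "length" => 12 },
  { "type" => "string", "length" => 1454 },
  { "type" => "hash",   "length" => 40 },
  { "type" => "string", "length" => 1238 },
].freeze

# find({type: {$in: types}}, {type:1, length:1})
#   .sort({length:-1}).skip(skip).limit(limit)
def largest(objects, types, limit, skip = 0)
  objects.select { |obj| types.include?(obj["type"]) }
         .sort_by { |obj| -obj["length"] }
         .drop(skip)
         .first(limit)
         .map { |obj| { "type" => obj["type"], "length" => obj["length"] } }
end

top3  = largest(HEAP, %w[string array hash], 3)    # first page
next2 = largest(HEAP, %w[string array hash], 2, 3) # next page
p top3, next2
```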
  87. when were objs created? • useful to look at objects over time • each obj has a timestamp of when it was created
  88. when were objs created? • useful to look at objects over time • each obj has a timestamp of when it was created • find minimum time, call it start_time
  89. when were objs created? • useful to look at objects over time • each obj has a timestamp of when it was created • find minimum time, call it start_time • create buckets for every minute of execution since start
  90. when were objs created? • useful to look at objects over time • each obj has a timestamp of when it was created • find minimum time, call it start_time • create buckets for every minute of execution since start • place objects into buckets
  91. when were objs created? > db.rails.mapReduce(function(){ var secs = this.time - start_time; var mins_since_start = Math.floor(secs / 60); emit(mins_since_start, 1); }, function(key, vals){ for(var i=0,sum=0; i<vals.length; sum += vals[i++]); return sum; }, { scope: { start_time: db.rails.find().sort({time:1}).limit(1)[0].time } }) {result:"tmp.mr_1272615772_3"} • scope: start_time = min(time)
  92. mongodb: mapReduce • arguments • map: function that emits one or more key/value pairs for each object (available as this) • reduce: function to return aggregate result, given key and list of values • scope: global variables to set for funcs
  93. mongodb: mapReduce • arguments • map: function that emits one or more key/value pairs for each object (available as this) • reduce: function to return aggregate result, given key and list of values • scope: global variables to set for funcs • results • stored in a temporary collection (tmp.mr_1272615772_3)
  94. when were objs created? > db.tmp.mr_1272615772_3.count() 12 • script was running for 12 minutes
  95. when were objs created? > db.tmp.mr_1272615772_3.count() 12 • script was running for 12 minutes > db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1) {_id: 8, value: 41231} • 41k objects created 8 minutes after start
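The bucketing steps (find the minimum time, compute whole minutes elapsed, count per bucket) can be sketched in a few lines of Ruby. This assumes each object's `time` is a number of seconds, as the variable name `secs` in the query suggests; the sample timestamps and the `objects_per_minute` helper are made up for illustration.

```ruby
# Count objects per whole minute elapsed since the earliest timestamp.
# Assumption: timestamps are in seconds (Integer or Float).
def objects_per_minute(times)
  start_time = times.min
  buckets = Hash.new(0)
  # div(60) floors, so 0..59s land in bucket 0, 60..119s in bucket 1, etc.
  times.each { |time| buckets[(time - start_time).div(60)] += 1 }
  buckets
end

times      = [1000, 1010, 1059, 1060, 1125, 1600]
per_minute = objects_per_minute(times)
busiest    = per_minute.max_by { |_minute, count| count }

p per_minute
p busiest
```

Note that the bucket index comes from integer division by 60; taking the remainder instead would fold every minute into the same 0..59 range and lose the timeline.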
  96. references to this object? ary = ["a","b","c"] • ary references "a" • "b" is referenced by ary • ruby makes it easy to “leak” references • an object will stay around until all references to it are gone • more objects = longer GC = bad performance • must find references to fix leaks
  97. references to this object? • db.rails_refs.insert({ _id:"0xary", refs:["0xa","0xb","0xc"] }): create references lookup table
  98. references to this object? • db.rails_refs.insert({ _id:"0xary", refs:["0xa","0xb","0xc"] }): create references lookup table • db.rails_refs.ensureIndex({refs:1}): add ‘multikey’ index to refs array
  99. references to this object? • db.rails_refs.insert({ _id:"0xary", refs:["0xa","0xb","0xc"] }): create references lookup table • db.rails_refs.ensureIndex({refs:1}): add ‘multikey’ index to refs array • db.rails_refs.find({refs:"0xa"}): efficiently look up all objs holding a ref to 0xa
  100. mongodb: multikeys • indexes on array values create a ‘multikey’ index • classic example: nested array of tags • find({tags: "ruby"}): find objs where obj.tags includes "ruby"
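Conceptually, a multikey index on `refs` behaves like an inverted index from each referenced address back to the objects holding it. The Ruby sketch below builds such a lookup in memory; it illustrates the idea only and says nothing about how MongoDB stores its indexes. The placeholder addresses follow the slides, `"0xhash"` is an extra made-up holder, and `referrers_index` is a hypothetical helper.

```ruby
# holder address => addresses it references (same shape as rails_refs docs)
REFS = {
  "0xary"  => ["0xa", "0xb", "0xc"],
  "0xhash" => ["0xa"],
}.freeze

# Invert the table: referenced address => list of holders.
def referrers_index(refs)
  index = Hash.new { |hash, key| hash[key] = [] }
  refs.each do |holder, targets|
    targets.each { |target| index[target] << holder }
  end
  index
end

INDEX = referrers_index(REFS)

# Equivalent in spirit to db.rails_refs.find({refs: "0xa"})
p INDEX.fetch("0xa", [])
p INDEX.fetch("0xb", [])
```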
  101. version 2: memprof.com a web-based heap visualizer and leak analyzer
  102. memprof.com a web-based heap visualizer and leak analyzer
  111. plugging a leak in rails3 • in dev mode, rails3 is leaking 10mb per request
  112. plugging a leak in rails3 • in dev mode, rails3 is leaking 10mb per request let’s use memprof to find it! # in environment.rb require `gem which memprof/signal`.strip
  113. plugging a leak in rails3 send the app some requests so it leaks $ ab -c 1 -n 30 http://localhost:3000/
  114. plugging a leak in rails3 send the app some requests so it leaks $ ab -c 1 -n 30 http://localhost:3000/ tell memprof to dump out the entire heap to json $ memprof --pid <pid> --name <dump name> --key <api key>
  115. plugging a leak in rails3 send the app some requests so it leaks $ ab -c 1 -n 30 http://localhost:3000/ tell memprof to dump out the entire heap to json $ memprof --pid <pid> --name <dump name> --key <api key>
  116. 2519 classes
  117. 2519 classes 30 copies of TestController
  118. 2519 classes 30 copies of TestController
  119. 2519 classes 30 copies of TestController mongo query for all TestController classes
  120. 2519 classes 30 copies of TestController mongo query for all TestController classes details for one copy of TestController
  121. find references to object
  122. find references to object
  123. find references to object holding references to all controllers
  124. find references to object “leak” is on line 178 holding references to all controllers
  125. • In development mode, Rails reloads all your application code on every request
  126. • In development mode, Rails reloads all your application code on every request • ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization
  127. • In development mode, Rails reloads all your application code on every request • ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization • But... it ends up holding a reference to every single reloaded version of those controllers
  128. • In development mode, Rails reloads all your application code on every request • ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization • But... it ends up holding a reference to every single reloaded version of those controllers
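The shape of this leak can be reproduced in a few lines of plain Ruby. The sketch below is a hypothetical stand-in, not the actual ActionView code: `PARTIAL_CACHE`, `reload_test_controller`, and `render_partial` are invented names, and the cache is assumed to be keyed by the controller class object.

```ruby
# Hypothetical long-lived cache, standing in for a renderer-level cache.
PARTIAL_CACHE = {}

# Simulates development-mode reloading: every call builds a brand new
# class object that happens to share the same name.
def reload_test_controller
  Class.new do
    def self.name
      "TestController"
    end
  end
end

# Caches per controller class. Because each reload yields a new class
# object, every request adds a new key and the old classes stay reachable.
def render_partial(controller_class)
  PARTIAL_CACHE[controller_class] ||= "partial for #{controller_class.name}"
end

30.times { render_partial(reload_test_controller) }

puts "cached controller classes: #{PARTIAL_CACHE.size}"
puts "distinct names: #{PARTIAL_CACHE.keys.map(&:name).uniq.inspect}"
```

After 30 simulated requests the cache holds 30 distinct class objects that all report the same name, which matches the "30 copies of TestController" seen in the heap dump.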
  129. Questions? Aman Gupta @tmm1