RailswayCon 2010 - Dynamic Language VMs
  1. 1. <ul>Dynamic Language VMs </ul><ul>Ruby 1.9 </ul><ul>Lourens Naude, WildfireApp.com </ul>
  2. 2. <ul>Background </ul><ul><li>Independent Contractor </li></ul><ul><ul><li>Ruby / C / integrations
  3. 3. Well versed full stack
  4. 4. Architecture </li></ul></ul><ul><li>WildfireApp.com </li></ul><ul><ul><li>Social Marketing platform
  5. 5. Large whitelabel clients
  6. 6. Bursty traffic – Lady Gaga, EA, Gatorade etc. </li></ul></ul>
  7. 8. <ul>RUBY VM INTERNALS ? </ul>
  8. 9. <ul>A GOOD CRAFTSMAN KNOWS HIS TOOLS </ul>
  9. 10. <ul>A BAD CRAFTSMAN BLAMES HIS TOOLS </ul>
  10. 11. <ul>Typical public facing apps </ul><ul><li>Interaction patterns </li></ul><ul><ul><li>Request / response
  11. 12. Time
  12. 13. Event driven </li></ul></ul><ul><li>Overheads </li></ul><ul><ul><li>Data transfer (I/O)
  13. 14. Serialization / coercion (CPU)
  14. 15. VM – allocation, symbol tables etc. (CPU + mem)
  15. 16. Business requirements (CPU) </li></ul></ul>
  16. 17. <ul>Ruby daemon - strace </ul>Process 5856 detached
      % time  calls  syscall
      ------  -----  -------------
       89.69   5092  recvfrom
        5.35   5093  sendto
        2.49  26300  stat
        2.05  11004  clock_gettime
  17. 18. <ul>Ruby daemon - ltrace </ul>
      % time  calls   function
      ------  ------  --------
       95.78  635173  memcpy
        1.38   25862  malloc
        0.79   14984  free
        0.60   11403  strcmp
  18. 19. <ul>System Resources </ul><ul><li>Data latency </li></ul><ul><ul><li>CPU cache
  19. 20. Memory – local
  20. 21. Disk - local
  21. 22. Memory + disk - remote </li></ul></ul><ul><li>Record retrieval with ORM </li></ul><ul><ul><li>Fetch results (local/remote memory + disk)
  22. 23. Serialization + conversion (CPU)
  23. 24. Object instantiation (CPU + memory)
  24. 25. Optional memcached (local or remote memory) </li></ul></ul>
  25. 26. <ul>RUBY ? </ul>
  26. 27. <ul>Conversion – rows to hash </ul>Benchmark.bm do |b| b.report do 1000.times{ ActiveRecord::Base.connection.select_rows &quot;SELECT * FROM users&quot; } end end user system total real 0.300000 0.040000 0.340000 ( 0.505095)
  27. 28. <ul>Conversion – rows to objects </ul>Benchmark.bm do |b| b.report do 1000.times{ ActiveRecord::Base.connection.select_all &quot;SELECT * FROM users&quot; } end end user system total real 0.510000 0.050000 0.560000 ( 0.719201)
  28. 29. <ul>Instantiation </ul>Benchmark.bm do |b| b.report do 100_000.times{ 'string'.dup } end end user system total real 0.040000 0.000000 0.040000 ( 0.043791)
  29. 30. <ul>Serialization – load + dump </ul>Benchmark.bm do |b| b.report do 100_000.times{ Marshal.load(Marshal.dump('ruby string')) } end end user system total real 1.660000 0.010000 1.670000 ( 1.699882)
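The two conversion benchmarks above need a database and ActiveRecord. The instantiation and serialization ones can be repeated with the standard library alone. A minimal sketch, assuming only Ruby's `benchmark` library; the names `ITERATIONS`, `dup_time`, `marshal_time` and `round_trip` are ours, and timings will differ per machine and Ruby version:

```ruby
require 'benchmark'

ITERATIONS = 100_000

# Instantiation: copy a short string.
dup_time = Benchmark.realtime { ITERATIONS.times { 'string'.dup } }

# Serialization: dump to a byte string, then load it back.
marshal_time = Benchmark.realtime do
  ITERATIONS.times { Marshal.load(Marshal.dump('ruby string')) }
end

# Sanity check that the round trip preserves the value.
round_trip = Marshal.load(Marshal.dump('ruby string'))

puts format('dup:     %.6f s', dup_time)
puts format('marshal: %.6f s', marshal_time)
```

Comparing the two figures gives a rough feel for how much more a dump + load costs than a plain object copy.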
  30. 31. <ul>Roadmap </ul><ul><li>VM Architecture </li></ul><ul><ul><li>Symbol table
  31. 32. Opcodes / instructions
  32. 33. Dispatch
  33. 34. Optimizations </li></ul></ul><ul><li>Ruby language </li></ul><ul><ul><li>Object model
  34. 35. Garbage Collection
  35. 36. Contexts and control flow
  36. 37. Concurrency </li></ul></ul>
  37. 38. <ul>VM ARCHITECTURE </ul>
  38. 40. <ul>Changes </ul><ul><li>Ruby 1.8 artifacts </li></ul><ul><ul><li>Parser && AST nodes
  39. 41. Object model
  40. 42. Garbage Collection
  41. 43. No immediate performance gains for String manipulation etc. </li></ul></ul><ul><li>Codegen phase </li></ul><ul><ul><li>Better optimization hooks
  42. 44. Faster runtime </li></ul></ul>
  43. 45. <ul>AST AND CODEGEN </ul>
  44. 47. <ul>Abstract Syntax Tree (AST) </ul><ul><li>Structure </li></ul><ul><ul><li>Grammar representation
  45. 48. Annotations attach semantics to nodes
  46. 49. Possible to refactor the tree – more nodes, less complexity </li></ul></ul><ul><li>Example nodes </li></ul><ul><ul><li>Literals, values and assignments
  47. 50. Method calls, arguments and return values
  48. 51. Jumps – if, else, iterators
  49. 52. Unconditional jumps – exceptions, retry etc. </li></ul></ul>
  50. 53. <ul>Code generation </ul><ul><li>How it works </li></ul><ul><ul><li>Converts the AST to compiled code segments
  51. 54. Reduces a tree to a linear and ordered instruction set
  52. 55. Fast execution – no tree walking + native code </li></ul></ul><ul><li>Workflow </li></ul><ul><ul><li>Preprocessing – AST refactoring (!YARV)
  53. 56. Codegen, nodes -> instruction sequences
  54. 57. Postprocessing – replace with optimal instruction sequences (peephole optimization)
  55. 58. Pre and postprocessing phases may be multiple passes </li></ul></ul>
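The result of the codegen phase can be inspected from Ruby itself. A small sketch, assuming an MRI / YARV interpreter (1.9 or later), where `RubyVM::InstructionSequence` is available; the exact listing varies between Ruby versions, so no output is shown here:

```ruby
# RubyVM::InstructionSequence is specific to MRI / YARV.
iseq = RubyVM::InstructionSequence.compile('a = 1; b = a + 2')

# disasm returns a human readable listing of the linear instruction sequence.
listing = iseq.disasm
puts listing
```

The listing is the "linear and ordered instruction set" the slide refers to: one instruction per line, ending in `leave`.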
  56. 59. <ul>LOOKUPS </ul>
  57. 61. <ul>Symbol / Hash tables </ul><ul><li>How it works </li></ul><ul><ul><li>Constant time access to int/char indexed values
  58. 62. Table defaults: 11 bins, 5 entries per bin
  59. 63. Bins++, sequential lookup inside bins
  60. 64. Lookup of methods, variables, encodings etc. </li></ul></ul><ul><li>Symbol </li></ul><ul><ul><li>Entity with both a String and Number representation
  61. 65. !(String || Symbol), points to a table entry
  62. 66. Developer identifies by name, VM by int
  63. 67. Immutable for performance – watch out for memory </li></ul></ul>
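The "developer identifies by name, VM by int" point can be observed from Ruby. A minimal sketch; the variable names are ours:

```ruby
# The same symbol always resolves to the same table entry, so identity holds.
same_symbol = :railsway.equal?(:railsway)
from_string = 'railsway'.to_sym.equal?(:railsway)

# Two equal strings are still two distinct objects.
a = 'railsway'.dup
b = 'railsway'.dup
same_string = a.equal?(b)

puts "symbols identical: #{same_symbol}, via to_sym: #{from_string}"
puts "strings identical: #{same_string}, strings equal: #{a == b}"
```

On Ruby 1.9 a symbol is never garbage collected once created, which is the memory caveat above: avoid calling `to_sym` on unbounded user input.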
  64. 68. <ul>VM INSTRUCTIONS </ul>
  65. 69. <ul>VM instructions / opcodes </ul><ul><li>Stateless functions </li></ul><ul><ul><li>80+ currently
  66. 70. Generated from definitions at interpreter compile time (existing ruby requirement for 1.9)
  67. 71. Instruction / opcode / operands notation </li></ul></ul><ul><li>Categories and examples </li></ul><ul><ul><li>variable: get or set local variable
  68. 72. class / module: definition
  69. 73. method / iterator: invoke method, call block
  70. 74. Optimization: redefines common +, <<, * contracts </li></ul></ul>
  71. 75. <ul>Managing opcode sequences </ul><ul><li>Stack Machine </li></ul><ul><ul><li>2 instruction types: push && pop
  72. 76. Move / copy values, top of stack -> elsewhere
  73. 77. SP: top of stack pointer, BP: bottom of stack pointer </li></ul></ul><ul><li>Example </li></ul><ul><ul><li>%w(a b c)
  74. 78. Put strings “a”, “b” and “c” on the stack
  75. 79. Fetch top 3 stack elements
  76. 80. Create an array from them </li></ul></ul>
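The `%w(a b c)` example can be modelled with a toy stack machine. This is a sketch, not YARV: the instruction names `:putstring` and `:newarray` are loosely borrowed from YARV, and `PROGRAM` and `run` are hypothetical names.

```ruby
# Each instruction is [opcode, operand].
PROGRAM = [
  [:putstring, 'a'],
  [:putstring, 'b'],
  [:putstring, 'c'],
  [:newarray, 3]
].freeze

def run(program)
  stack = []
  program.each do |opcode, operand|
    case opcode
    when :putstring
      stack.push(operand.dup)          # push a fresh string
    when :newarray
      stack.push(stack.pop(operand))   # pop the top N values, push one array
    else
      raise ArgumentError, "unknown opcode: #{opcode}"
    end
  end
  stack.pop                            # the result is left on top of the stack
end

result = run(PROGRAM)
p result
```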
  77. 81. <ul>Instruction sequence </ul><ul><li>Opcode collection </li></ul><ul><ul><li>Instruction dispatch can be a bottleneck
  78. 82. Optimizing simple instructions is very important
  79. 83. Likely a small subset of the typical web app's hot path </li></ul></ul><ul><li>Dispatch techniques </li></ul><ul><ul><li>Direct Threaded Dispatch : fastest jump to next opcode / instruction
  80. 84. Switch Dispatch : slower, but portable </li></ul></ul>
  81. 85. <ul>DISPATCH AND CACHE </ul>
  82. 86. <ul>Dispatch techniques </ul><ul><li>Direct Threaded Dispatch </li></ul><ul><ul><li>Represents an instruction by the address of the routine that implements it
  83. 87. Forth, Python 3
  84. 88. Not portable: GCC first class labels </li></ul></ul><ul><li>Switch Dispatch </li></ul><ul><ul><li>CPU branch mispredictions, depending on pipeline length
  85. 89. Up to 50% slower than Threaded dispatch
  86. 90. Portable </li></ul></ul>
  87. 91. <ul>VM Caches </ul><ul><li>Versioning </li></ul><ul><ul><li>State counter scopes caches to the current VM state
  88. 92. Lazy invalidation – just bump the version </li></ul></ul><ul><li>Expires on </li></ul><ul><ul><li>constant definition
  89. 93. constant removal
  90. 94. method definition
  91. 95. method removal
  92. 96. method cache changes (covered later) </li></ul></ul>
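Version based, lazy invalidation can be sketched in a few lines. This is a toy model under our own assumptions, not the VM's data structure; `VersionedCache`, `bump!` and `fetch` are illustrative names:

```ruby
# Every entry remembers the state version it was filled under.
class VersionedCache
  attr_reader :version

  def initialize
    @version = 0
    @entries = {}
  end

  # Lazy invalidation: nothing is deleted, stale entries just stop matching.
  def bump!
    @version += 1
  end

  def fetch(key)
    entry = @entries[key]
    return entry[:value] if entry && entry[:version] == @version

    value = yield                      # slow path, e.g. a full lookup
    @entries[key] = { version: @version, value: value }
    value
  end
end

cache = VersionedCache.new
lookups = 0
3.times { cache.fetch(:foo) { lookups += 1; :method_body } }
lookups_before_bump = lookups          # only the first call took the slow path

cache.bump!                            # e.g. a method or constant was defined
cache.fetch(:foo) { lookups += 1; :method_body }
```

Bumping the counter is O(1) regardless of how many entries exist; the cost is paid later, one miss per stale entry.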
  93. 97. <ul>OPTIMIZATIONS </ul>
  94. 98. <ul>Optimization Limitations </ul><ul><li>Static Analysis </li></ul><ul><ul><li>Examine source code without execution
  95. 99. Dynamic analysis – runtime introspection </li></ul></ul>Dynamic nature of Ruby <ul><ul><li>Literals are generally safe to consider for optimizations
  96. 100. Constants can be redefined
  97. 101. Open classes – variable method table
  98. 102. Object#method_missing
  99. 103. No explicit return types </li></ul></ul>
  100. 104. <ul>Common optimizations </ul><ul><li>Constant folding
  101. 105. Constant propagation
  102. 106. Dead code elimination
  103. 107. Subexpression elimination
  104. 108. Method in-lining
  105. 109. Cloning
  106. 110. Peephole Optimization
  107. 111. * not all implemented in YARV </li></ul>
  108. 112. <ul>Constant folding </ul><ul>1 + 2 # 3 <li>2 * 3 # 3 + 3
  109. 113. 2 * 1 # 2
  110. 114. 2 ** 2 # 2 *2
  111. 115. class Fixnum
  112. 116. def +(*args) # dynamic Ruby spec
  113. 117. end
  114. 118. end </li></ul>
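Constant folding itself is a small tree rewrite. A toy sketch over an array based AST (our own representation, not YARV's node types); `FOLDABLE` and `fold` are hypothetical names. It assumes integer arithmetic has not been redefined, which is exactly what the `class Fixnum` reopening above puts in doubt:

```ruby
# Nodes are [operator, left, right]; leaves are Integers (literals)
# or Symbols (variables).
FOLDABLE = [:+, :-, :*].freeze

def fold(node)
  return node unless node.is_a?(Array)

  op, left, right = node
  left = fold(left)
  right = fold(right)

  if FOLDABLE.include?(op) && left.is_a?(Integer) && right.is_a?(Integer)
    left.public_send(op, right)   # both operands known: compute at "compile time"
  else
    [op, left, right]             # keep the node, with folded children
  end
end

folded_literal = fold([:+, 1, 2])
folded_partial = fold([:*, [:+, 1, 2], :x])
p folded_literal
p folded_partial
```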
  115. 119. <ul>Code elimination </ul>loop { # loop { begin # begin # eval'ed code # eval'ed code break # break break # ensure ensure # end end # } }
  116. 120. <ul>Subexpression elimination </ul>x = x - (y * 2); z = z - (y * 2) # becomes: t = y * 2; x = x - t; z = z - t
  117. 121. <ul>Constant propagation </ul>def a b = 20 c(3 * b) end def a # def a b = 20 # c(60) c(3 * 20) # end end
  118. 122. <ul>In-lining </ul>def b 2 * 3 end def a # def a def a 2 + b # 2 + 2 * 3 2 + (2 * 3) end # end end
  119. 123. <ul>Cloning </ul>def a(b, c) b << c expire_cache end a('railsway', 'con') def a_railsway_con 'railsway' << 'con' expire_cache end
  120. 124. <ul>Peephole Optimization (before) </ul>x = true # 0008 getlocal x if x # 0010 branchunless 17 else # 0012 jump 14 end # 0014 putnil 0015 jump 18 0017 putnil 0018 leave
  121. 125. <ul>Peephole Optimization (after) </ul>x = true # 0008 getlocal x if x # 0010 branchunless 15 else # 0012 putnil end # 0013 leave 0014 pop 0015 putnil 0016 leave
  122. 126. <ul>OBJECTS </ul>
  123. 127. <ul>Object Requirements </ul><ul><li>Stateful
  124. 128. Identity </li></ul><ul><ul><li>Unique identifier to represent the object at runtime </li></ul></ul><ul><li>Methods </li></ul><ul><ul><li>Change or query object state
  125. 129. Command and Query pattern </li></ul></ul>
  126. 130. <ul>Object structure </ul><ul>typedef unsigned long VALUE; </ul>struct RBasic { VALUE flags; /* object flags */ VALUE klass; /* instance of ... */ };
  127. 131. <ul>Object structure (cont) </ul><ul><li>Casting </li></ul><ul><ul><li>VALUE is a pointer type that represents the address of a language structure
  128. 132. RBASIC(obj)->flags
  129. 133. ((struct RBasic *)obj)->flags </li></ul></ul>Flags <ul><ul><li>frozen
  130. 134. marked
  131. 135. tainted
  132. 136. embedded status </li></ul></ul>
  133. 137. <ul>Classes / modules structure </ul><ul>struct RClass { </ul>struct RBasic basic; /* object structure */ rb_classext_t *ptr; /* external class */ struct st_table *m_tbl; /* method table */ struct st_table *iv_index_tbl; /* ivars */ };
  134. 138. <ul>Class / module structure (cont) </ul><ul><li>Casting </li></ul><ul><ul><li>RCLASS(a_str)->ptr->super #=> Object
  135. 139. RCLASS(a_fixnum)->ptr->super #=> Integer </li></ul></ul>Attributes <ul><ul><li>Symbol tables for methods and ivars
  136. 140. Class / module distinction through flags </li></ul></ul>
  137. 141. <ul>Special objects </ul><ul><li>Immediates </li></ul><ul><ul><li>No runtime casting overheads – fits in VALUE
  138. 142. nil #=> 4
  139. 143. true #=> 2
  140. 144. false #=> 0
  141. 145. Symbols
  142. 146. Fixnums <= 30 bits
  143. 147. Floats and Bignum are complex objects – hence poor Floating Point benchmarks
  144. 148. RFLOAT(float_obj)->float_value #=> a double </li></ul></ul>
  145. 149. <ul>Object memory layout </ul><ul><li>Object#object_id (32 bit architecture) </li></ul><ul><ul><li>sizeof(VALUE) is 4 bytes
  146. 150. Objects, even, multiples of 4
  147. 151. Symbols, even, multiples of 8
  148. 152. Integers, odd
  149. 153. Immediates <= 4 </li></ul></ul>
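The "Integers, odd" rule can be checked from Ruby: a small integer `n` is tagged into the VALUE itself, and its `object_id` is `2n + 1`. A minimal sketch; the ids of `nil` and `true`, and of heap objects, differ between Ruby versions and builds, so only the integer and `false` cases are shown:

```ruby
# Small integers are immediates: the id is derived from the value.
small_ints = [0, 1, 5, 100].map { |n| [n, n.object_id] }
small_ints.each { |n, id| puts "#{n}.object_id => #{id}" }

false_id = false.object_id
puts "false.object_id => #{false_id}"
```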
  150. 154. <ul>Mutable Objects </ul><ul>struct RString { </ul>struct RBasic basic; union { struct { long len; char *ptr; union { long capa; VALUE shared; } aux; } heap;
  151. 155. <ul>Mutable Objects (cont) </ul><ul><li>String and Array </li></ul><ul><ul><li>require the ability to shrink / grow capacity
  152. 156. allocates slightly more data than required
  153. 157. Avoids malloc, realloc and memmove overhead
  154. 158. Short strings “str”
  155. 159. Short arrays %w(a r y) </li></ul></ul>
  156. 160. <ul>Shared Objects </ul><ul>str = 'railsway'; </ul>str2 = "#{str}con" # shared ref str3 = str << 'con' # copy + mod ary = %w(railsway con) ary2 = ary.dup # shared ref ary3 = ary2.delete_at(1) # copy + mod
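Buffer sharing happens inside the VM and is not visible from Ruby code; what can be checked is the observable behavior of the same calls. A minimal sketch with our own variable names (`removed` in place of `ary3`):

```ruby
str  = 'railsway'.dup          # dup keeps this mutable under frozen string literals
str2 = "#{str}con"             # a new String built from str
str3 = str << 'con'            # << mutates str in place and returns it

ary  = %w(railsway con)
ary2 = ary.dup                 # a copy: later changes do not affect ary
removed = ary2.delete_at(1)    # returns the removed element, not the array

p str3.equal?(str)             # same object
p ary, ary2, removed
```

Note that `str << 'con'` returns the receiver, so `str3` and `str` are one object, and that `delete_at` hands back the deleted element.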
  157. 161. <ul>Method Dispatch </ul><ul><li>Language constraints </li></ul><ul><ul><li>Loose typing
  158. 162. Open classes
  159. 163. Method calls can never be reduced to CALL(a_method)
  160. 164. Search overhead </li></ul></ul><ul><li>Language constraints
  161. 165. Dispatch sequence
  162. 166. Deref class pointer
  163. 167. Check methods table
  164. 168. Call method or delegate to superclass </li></ul>
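The dispatch sequence can be imitated in Ruby by walking the ancestor chain and checking each method table in turn. A sketch; `find_owner` is a hypothetical helper, and the real VM does this in C with a cache in front:

```ruby
# Return the first class or module in the chain whose own method table
# defines the method, or nil when nothing does.
def find_owner(klass, name)
  klass.ancestors.find do |mod|
    mod.instance_methods(false).include?(name) ||
      mod.private_instance_methods(false).include?(name)
  end
end

owner_upcase = find_owner(String, :upcase)
owner_id     = find_owner(String, :object_id)

puts "String#upcase is found in #{owner_upcase}"
puts "String#object_id is found in #{owner_id}"
```

The result should agree with what Ruby reports through `String.instance_method(name).owner`.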
  165. 170. <ul>call VS send </ul><ul><li>obj.__send__ :method </li></ul><ul><ul><li>We never call methods
  166. 171. Send query / command messages to objects
  167. 172. Methods return values – RPC style messaging </li></ul></ul><ul><li>Method cache </li></ul><ul><ul><li>Method cache == router
  168. 173. 95% hit rate when warm
  169. 174. Method redefinition, module inclusion etc. clears the method cache / “routing table”
  170. 175. Introduces significant overhead for subsequent method calls </li></ul></ul>
  171. 176. <ul>Method cache don'ts </ul><ul>class SomeController < AC::Base <li>def show
  172. 177. # busts method cache for the whole VM
  173. 178. @user.extend SomeBehavior
  174. 179. end
  175. 180. end </li></ul>
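One way to get the same behavior without `extend` at request time is to mix the module into a wrapper class once, at load time, and allocate a wrapper per request. A sketch, assuming the standard library's `SimpleDelegator`; `User`, `UserWithBehavior` and `greeting` are hypothetical stand-ins for the names on the slide. Later Ruby releases changed how method caches are invalidated, so the cost described above applies to the 1.9 era VM:

```ruby
require 'delegate'

module SomeBehavior
  def greeting
    "hello #{name}"                      # name is delegated to the wrapped object
  end
end

# Hypothetical stand-in for the @user record.
User = Struct.new(:name)

# The module is included once, when the class is defined.
class UserWithBehavior < SimpleDelegator
  include SomeBehavior
end

user = User.new('lourens')
decorated = UserWithBehavior.new(user)   # per request: one allocation, no extend
message = decorated.greeting
puts message
```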
  176. 184. <ul>Instance var changes </ul><ul><li>Optimizations </li></ul><ul><ul><li>First 3 ivars are embedded in the object
  177. 185. Avoids symbol table lookups </li></ul></ul><ul><li>ivar table </li></ul><ul><ul><li>Table is per class, not per object
  178. 186. Ivar table is shared by all instances of the same class
  179. 187. Saves on memory footprint of a table per instance </li></ul></ul>
  180. 188. <ul>GARBAGE COLLECTION </ul>
  181. 189. <ul>Process memory layout </ul><ul><li>Code segment </li></ul><ul><ul><li>Executable code
  182. 190. Read only </li></ul></ul><ul><li>Stack segment </li></ul><ul><ul><li>Stack storage
  183. 191. Addressed with stack pointers </li></ul></ul><ul><li>Heap Memory available for program / developer use </li></ul>
  184. 192. <ul>Malloc </ul><ul><li>Usable / free space </li></ul><ul><ul><li>Managed by a free list
  185. 193. Linear search overhead to find free chunks </li></ul></ul><ul><li>Better layout </li></ul><ul><ul><li>Index free chunks by size intervals </li></ul></ul>
  186. 196. <ul>GC terminology </ul><ul><li>Root set </li></ul><ul><ul><li>Directly accessible without pointer scanning
  187. 197. C stack, global vars, global constants etc. </li></ul></ul><ul><li>Unreachable hooks </li></ul><ul><ul><li>Variable assignment to nil
  188. 198. method return etc. </li></ul></ul><ul><li>Conservative VM hands out raw pointers to objects </li></ul>
  189. 199. <ul>GC strategies </ul><ul><li>Stop the World </li></ul><ul><ul><li>Minimal allocation overhead
  190. 200. Hands out objects while heap space is available
  191. 201. Halts execution to reclaim memory
  192. 202. Very disruptive in the hot path </li></ul></ul><ul><li>Incremental </li></ul><ul><ul><li>Collection activity during allocation
  193. 203. Smoother, but with some minor overhead
  194. 204. Suitable for hard realtime environments </li></ul></ul>
  195. 205. <ul>Scripting GC </ul><ul><li>Mark and Sweep </li></ul><ul><ul><li>Identifies live objects
  196. 206. Assumes remainder is for collection
  197. 207. Concerned with unreachable objects </li></ul></ul><ul><li>Stop and Copy </li></ul><ul><ul><li>2 heap spaces (double memory overhead)
  198. 208. 1 active, 1 inactive
  199. 209. Copies reachable chunks to the new active area
  200. 210. Concerned with live objects </li></ul></ul>
  201. 211. <ul>Common GC Issues </ul><ul><li>Conservative GC </li></ul><ul><ul><li>Memory fragmentation
  202. 212. Dangling pointers
  203. 213. Memory leaks from circular garbage </li></ul></ul><ul><li>Allocation </li></ul><ul><ul><li>Bursty allocation
  204. 214. Knowledge of pointer layout and chunks required </li></ul></ul>
  205. 215. <ul>Ruby heap layout </ul><ul><li>Multiple heaps </li></ul><ul><ul><li>Referenced through heap list
  206. 216. Composed of multiple slots
  207. 217. Freed when empty ...
  208. 218. if all slots are tagged as free
  209. 219. A Rails app allocates 4 to 6 heaps on startup </li></ul></ul>
  210. 223. <ul>Slot layouts </ul><ul><li>Per heap </li></ul><ul><ul><li>Each slot references a single object
  211. 224. Defaults to 10 000 slots for the first heap
  212. 225. Threshold of 4096 free slots per heap
  213. 226. Free list points to the next free slot </li></ul></ul><ul><li>Heap growth </li></ul><ul><ul><li>Next allocated heap has 1.8 times the capacity of the last one
  214. 227. That's why memory consumption's so high ... </li></ul></ul>
  215. 228. <ul>Heap growth – small app </ul><ul>>> 8 * 1.8 <li>=> 14.4
  216. 229. >> 8 * 1.8 * 1.8
  217. 230. => 25.92
  218. 231. >> 8 * 1.8 * 1.8 * 1.8
  219. 232. => 46.656
  220. 233. >> 8 * 1.8 * 1.8 * 1.8 * 1.8
  221. 234. => 83.9808 </li></ul>
  222. 235. <ul>Heap growth – mid to large app </ul><ul>=> 83.9808 <li>>> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
  223. 236. => 151.16544
  224. 237. >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
  225. 238. => 272.097792
  226. 239. >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
  227. 240. => 489.7760256 </li></ul>
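The same series can be generated in one expression. A minimal sketch; `GROWTH_FACTOR`, `FIRST_HEAP` and `sizes` are our names, with the factor and the starting figure of 8 taken from the slides:

```ruby
GROWTH_FACTOR = 1.8
FIRST_HEAP = 8          # starting figure used on the slides

# Size after n growth steps is FIRST_HEAP * 1.8 ** n.
sizes = (0..7).map { |n| FIRST_HEAP * GROWTH_FACTOR**n }

sizes.each_with_index do |size, n|
  puts format('after %d growth steps: %.4f', n, size)
end
```

Seven growth steps multiply the starting figure by roughly 61, which is the geometric growth the slides are pointing at.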
  228. 241. <ul>Slot structure </ul><ul>typedef struct RVALUE { <li>union {
  229. 242. struct {
  230. 243. VALUE flags; /* 0 when free */
  231. 244. struct RVALUE *next;
  232. 245. }free;
  233. 246. struct RObject object;
  234. 247. struct RFloat flonum;
  235. 248. ... </li></ul>
  236. 249. <ul>Pointer layout </ul><ul><li>Self describing </li></ul><ul><ul><li>Program data area and heap
  237. 250. RVALUE union can accommodate any ruby object
  238. 251. Frames, variable structures etc. well defined also
  239. 252. 40 bytes (64 bit arch) represents a slot
  240. 253. Free list points to the next free slot </li></ul></ul>
  241. 254. <ul>Ruby heap VS OS heap </ul><ul><li>Ruby heap </li></ul><ul><ul><li>20 bytes (32 bit arch) represent a slot
  242. 255. slot points to OS data, on the OS / system heap </li></ul></ul><ul><li>OS heap </li></ul><ul><ul><li>Thus a 20 byte slot can reference a 2MB chunk on the system heap </li></ul></ul>
  243. 257. <ul>CRuby: Mark and Sweep </ul><ul><li>Conservative </li></ul><ul><ul><li>Cannot determine with certainty if a value references an object – assume it's in use </li></ul></ul><ul><li>Two phase implementation </li></ul><ul><ul><li>Mark phase: identifies and flags reachable objects from the current program context
  244. 258. Sweep phase: iterates through the object space and …
  245. 259. free all objects not marked
  246. 260. unmark marked objects </li></ul></ul>
  247. 261. <ul>Concerns </ul><ul><li>Performance </li></ul><ul><ul><li>Runtime pauses
  248. 262. Work proportional to heap size
  249. 263. Prone to memory fragmentation (no compaction)
  250. 264. Recursive </li></ul></ul><ul><li>Triggers </li></ul><ul><ul><li>8m malloc calls triggers GC
  251. 265. Every 8MB allocated triggers GC
  252. 266. Not enough heap reserve </li></ul></ul>
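Collections can be counted from inside the process. A minimal sketch using `GC.count` and `GC.start`; how many runs the allocation burst triggers on its own depends on the Ruby version and its tuning, so only the forced collection is relied on:

```ruby
before = GC.count                      # collections so far in this process

200_000.times { 'short lived'.dup }    # bursty allocation of garbage
GC.start                               # force a collection

after = GC.count
puts "GC runs during this snippet: #{after - before}"
```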
  253. 267. <ul>GC in action </ul><ul># 4 objs, 1 Array, 3 Strings <li>ary1 = %w(a b c)
  254. 268. ary2 = %w(d e f)
  255. 269. # both ary1 and ary2 are reachable
  256. 270. ary1 = nil
  257. 271. # ary1 and its contents are unreachable </li></ul>
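The example above can be replayed on a toy heap to show the two phases. This is a sketch under our own assumptions, not CRuby's collector: slots are hash entries, references are lists of slot names, and the mark phase uses an explicit worklist rather than recursion. `heap`, `roots`, `mark` and `sweep` are illustrative names.

```ruby
# Toy heap: slot name => names of the slots it references.
heap = {
  ary1: [:a, :b, :c], a: [], b: [], c: [],
  ary2: [:d, :e, :f], d: [], e: [], f: []
}
roots = [:ary2]   # ary1 was set to nil, so only ary2 is still a root

# Mark phase: flag everything reachable from the roots.
def mark(heap, roots)
  marked = {}
  pending = roots.dup
  until pending.empty?
    id = pending.pop
    next if marked[id]

    marked[id] = true
    pending.concat(heap.fetch(id))
  end
  marked
end

# Sweep phase: walk the whole heap and free every unmarked slot.
def sweep(heap, marked)
  freed = heap.keys.reject { |id| marked[id] }
  freed.each { |id| heap.delete(id) }
  freed
end

freed = sweep(heap, mark(heap, roots))
p freed
p heap.keys
```

The sweep visits every slot, live or dead, which is why the work is proportional to heap size.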
  258. 275. <ul>Generational GC </ul><ul><li>Observations </li></ul><ul><ul><li>Vast majority of objects are short lived – 80%+
  259. 276. Expensive to account for long lived objects
  260. 277. Partition by age and frequently collect short lived ones </li></ul></ul><ul><li>How it works </li></ul><ul><ul><li>Restrict GC to the most recently modified slots
  261. 278. These “sub heaps” are referred to as generations
  262. 279. Perform a full GC only when the youngest generation
  263. 280. fails to meet memory requirements </li></ul></ul>
  264. 281. <ul>CONCURRENCY </ul>
  265. 282. <ul>Threading </ul><ul><li>Changes </li></ul><ul><ul><li>Native OS Threads
  266. 283. Ruby Thread == pthread
  267. 284. Multiple cores ftw! </li></ul></ul><ul><li>… but </li></ul><ul><ul><li>Syscalls schedule, synchronize and create
  268. 285. Much more expensive to spawn and switch than green threads
  269. 286. Global VM Lock (GVL) </li></ul></ul>
  270. 287. <ul>Global VM Lock (GVL) </ul><ul><li>How it works </li></ul><ul><ul><li>Thread that owns the GVL is allowed to execute
  271. 288. Blocking operations should release the GVL
  272. 289. Automatically released when scheduled
  273. 290. C extensions : authors need not concern themselves with synchronization </li></ul></ul>
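That blocking operations release the GVL can be seen with a timing experiment. A minimal sketch, assuming `sleep` as the blocking operation; exact timings depend on the machine:

```ruby
require 'benchmark'

elapsed = Benchmark.realtime do
  threads = 4.times.map do
    Thread.new { sleep 0.2 }   # sleep blocks, so the thread gives up the GVL
  end
  threads.each(&:join)
end

puts format('4 threads x 0.2s sleep took %.2fs', elapsed)
```

If the lock were held while sleeping, the four sleeps would run back to back and take about 0.8 seconds; overlapping, they take roughly 0.2.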
  274. 291. <ul>Blocking VM operations </ul><ul><li>I/O </li></ul><ul><ul><li>blocking reads and writes
  275. 292. DNS resolution or connects
  276. 293. Often has huge handshake overheads </li></ul></ul><ul><li>Computations, processes and locks </li></ul><ul><ul><li>Expensive Bignum ops blocked 1.8 interpreters
  277. 294. Process.waitpid
  278. 295. File locks </li></ul></ul>
  279. 296. <ul>Releasing the GVL </ul><ul><li>Stable API </li></ul><ul><ul><li>Blocking function: slow system call / computation
  280. 297. Unblock function: called on Thread interrupt </li></ul></ul><ul><li>Pitfalls
  281. 298. Cannot access VALUEs (objects) in blocking functions
  282. 299. No integration with Ruby's exception / error handler </li></ul>
  283. 300. <ul>Lightweight Concurrency </ul><ul><li>Fibers </li></ul><ul><ul><li>Coroutines – 4k stack size
  284. 301. Very fast user space context switches
  285. 302. Cooperative scheduling required
  286. 303. Fiber.yield pauses the activation record, which keeps context across multiple calls </li></ul></ul><ul><li>Use cases </li></ul><ul><ul><li>Generators
  287. 304. Blocking I/O - Neverblock </li></ul></ul>
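The generator use case fits in a few lines. A minimal sketch of a Fibonacci generator; `fib` and `first_five` are our names:

```ruby
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a      # pause here; the locals a and b survive until resume
    a, b = b, a + b
  end
end

# Each resume runs the fiber up to its next Fiber.yield.
first_five = Array.new(5) { fib.resume }
p first_five
```

Scheduling is cooperative: the fiber only runs when `resume` is called and only stops where it calls `Fiber.yield`.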
  288. 305. <ul>In the pipeline </ul><ul><li>MVM: Multiple Virtual Machines </li></ul><ul><ul><li>Shared process state
  289. 306. Sandboxed per VM application state
  290. 307. Distribute VMs across available cores
  291. 308. Message passing for inter VM communication
  292. 309. Most Ruby deployments aren't thread safe
  293. 310. MVM is well suited for this </li></ul></ul>
  294. 311. <ul>QUESTIONS ? </ul>