RailswayCon 2010 - Dynamic Language VMs

Published in: Technology

Transcript

  • 1. Dynamic Language VMs
      • Ruby 1.9
      • Lourens Naude, WildfireApp.com
  • 2. Background
      • Independent Contractor
          • Ruby / C / integrations
          • Well versed full stack
          • Architecture
      • WildfireApp.com
          • Social Marketing platform
          • Large whitelabel clients
          • Bursty traffic: Lady Gaga, EA, Gatorade etc.
  • 8. RUBY VM INTERNALS?
  • 9. A GOOD CRAFTSMAN KNOWS HIS TOOLS
  • 10. A BAD CRAFTSMAN BLAMES HIS TOOLS
  • 11. Typical public facing apps
      • Interaction patterns
          • Request / response
          • Time
          • Event driven
      • Overheads
          • Data transfer (I/O)
          • Serialization / coercion (CPU)
          • VM: allocation, symbol tables etc. (CPU + mem)
          • Business requirements (CPU)
  • 17. Ruby daemon - strace
      Process 5856 detached
      % time    calls  syscall
      ------  -------  -------------
       89.69     5092  recvfrom
        5.35     5093  sendto
        2.49    26300  stat
        2.05    11004  clock_gettime
  • 18. Ruby daemon - ltrace
      % time     calls  function
      ------  --------  --------
       95.78    635173  memcpy
        1.38     25862  malloc
        0.79     14984  free
        0.60     11403  strcmp
  • 19. System Resources
      • Data latency
          • CPU cache
          • Memory: local
          • Disk: local
          • Memory + disk: remote
      • Record retrieval with ORM
          • Fetch results (local/remote memory + disk)
          • Serialization + conversion (CPU)
          • Object instantiation (CPU + memory)
          • Optional memcached (local or remote memory)
  • 26. RUBY?
  • 27. Conversion: rows to hash
      require 'benchmark'

      Benchmark.bm do |b|
        b.report do
          1000.times { ActiveRecord::Base.connection.select_rows "SELECT * FROM users" }
        end
      end

          user     system      total        real
      0.300000   0.040000   0.340000 (  0.505095)
  • 28. Conversion: rows to objects
      require 'benchmark'

      Benchmark.bm do |b|
        b.report do
          1000.times { ActiveRecord::Base.connection.select_all "SELECT * FROM users" }
        end
      end

          user     system      total        real
      0.510000   0.050000   0.560000 (  0.719201)
  • 29. Instantiation
      require 'benchmark'

      Benchmark.bm do |b|
        b.report do
          100_000.times { 'string'.dup }
        end
      end

          user     system      total        real
      0.040000   0.000000   0.040000 (  0.043791)
  • 30. Serialization: load + dump
      require 'benchmark'

      Benchmark.bm do |b|
        b.report do
          100_000.times { Marshal.load(Marshal.dump('ruby string')) }
        end
      end

          user     system      total        real
      1.660000   0.010000   1.670000 (  1.699882)
  • 31. Roadmap
      • VM Architecture
          • Symbol table
          • Opcodes / instructions
          • Dispatch
          • Optimizations
      • Ruby language
          • Object model
          • Garbage Collection
          • Contexts and control flow
          • Concurrency
  • 38. VM ARCHITECTURE
  • 40. Changes
      • Ruby 1.8 artifacts
          • Parser && AST nodes
          • Object model
          • Garbage Collection
          • No immediate performance gains for String manipulation etc.
      • Codegen phase
          • Better optimization hooks
          • Faster runtime
  • 45. AST AND CODEGEN
  • 47. Abstract Syntax Tree (AST)
      • Structure
          • Grammar representation
          • Annotations attach semantics to nodes
          • Possible to refactor the tree (more nodes, less complexity)
      • Example nodes
          • Literals, values and assignments
          • Method calls, arguments and return values
          • Jumps: if, else, iterators
          • Unconditional jumps: exceptions, retry etc.
  • 53. Code generation
      • How it works
          • Converts the AST to compiled code segments
          • Reduces a tree to a linear and ordered instruction set
          • Fast execution: no tree walking + native code
      • Workflow
          • Preprocessing: AST refactoring (not YARV)
          • Codegen: nodes -> instruction sequences
          • Postprocessing: replace with optimal instruction sequences (peephole optimization)
          • Pre and postprocessing phases may be multiple passes
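The step from source to a linear instruction sequence can be observed from Ruby itself. This is a minimal sketch, assuming MRI 1.9 or later, where `RubyVM::InstructionSequence` is available; the source string is made up for illustration, and the instruction names and offsets printed differ between Ruby versions.

```ruby
# Compile a source string to a YARV instruction sequence.
# RubyVM::InstructionSequence is specific to MRI (CRuby).
source = "x = 1 + 2; puts x"
iseq   = RubyVM::InstructionSequence.compile(source)

# disasm returns a human-readable listing of the linear instruction sequence.
DISASM = iseq.disasm
puts DISASM
```

The listing is flat: there is no tree left to walk, only instructions with operands, ending in a `leave`.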
  • 59. LOOKUPS
  • 61. Symbol / Hash tables
      • How it works
          • Constant time access to int/char indexed values
          • Table defaults: 11 bins, 5 entries per bin
          • Bins++, sequential lookup inside bins
          • Lookup of methods, variables, encodings etc.
      • Symbol
          • Entity with both a String and Number representation
          • !(String || Symbol), points to a table entry
          • Developer identifies by name, VM by int
          • Immutable for performance, but watch out for memory
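The "points to a table entry" property of symbols is visible from Ruby. A small sketch, assuming nothing beyond core Ruby; the symbol name `:user` is arbitrary.

```ruby
# A Symbol is interned: every occurrence of :user is the same object.
same_symbol = :user.equal?(:user)            # identity, not just equality
from_string = "user".to_sym.equal?(:user)    # a String maps onto the same entry

# Two equal Strings, by contrast, are distinct objects.
a = String.new("user")
b = String.new("user")
distinct_strings = a == b && !a.equal?(b)

puts same_symbol, from_string, distinct_strings
```

All three lines print `true`. The developer writes the name, while the VM can compare a single identifier instead of comparing characters.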
  • 68. VM INSTRUCTIONS
  • 69. VM instructions / opcodes
      • Stateless functions
          • 80+ currently
          • Generated from definitions at interpreter compile time (existing ruby requirement for 1.9)
          • Instruction / opcode / operands notation
      • Categories and examples
          • variable: get or set local variable
          • class / module: definition
          • method / iterator: invoke method, call block
          • Optimization: redefines common +, <<, * contracts
  • 75. Managing opcode sequences
      • Stack Machine
          • 2 instruction types: push && pop
          • Move / copy values, top of stack -> elsewhere
          • SP: top of stack pointer, BP: bottom of stack pointer
      • Example
          • %w(a b c)
          • Put strings "a", "b" and "c" on the stack
          • Fetch top 3 stack elements
          • Create an array from them
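The `%w(a b c)` example above can be sketched as a toy stack machine. The opcode names loosely mimic YARV's, but `run_toy_vm` and the instruction encoding are hypothetical and exist only for illustration.

```ruby
# Toy stack machine: each instruction is an [opcode, operand] pair.
def run_toy_vm(instructions)
  stack = []
  instructions.each do |opcode, operand|
    case opcode
    when :putstring
      stack.push(operand)              # push a literal onto the stack
    when :newarray
      elements = stack.pop(operand)    # fetch the top N elements, in order
      stack.push(elements)             # push the array built from them
    else
      raise ArgumentError, "unknown opcode #{opcode}"
    end
  end
  stack.pop                            # the result is left on top of the stack
end

program = [[:putstring, "a"], [:putstring, "b"], [:putstring, "c"], [:newarray, 3]]
p run_toy_vm(program)   # prints ["a", "b", "c"]
```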
  • 81. Instruction sequence
      • Opcode collection
          • Instruction dispatch can be a bottleneck
          • Optimizing simple instructions is very important
          • Likely a small subset of the typical web app's hot path
      • Dispatch techniques
          • Direct Threaded Dispatch: fastest jump to next opcode / instruction
          • Switch Dispatch: slower, but portable
  • 85. DISPATCH AND CACHE
  • 86. Dispatch techniques
      • Direct Threaded Dispatch
          • Represents an instruction by the address of the routine that implements it
          • Forth, Python 3
          • Not portable: GCC first class labels
      • Switch Dispatch
          • CPU branch mispredictions, depending on pipeline length
          • Up to 50% slower than Threaded dispatch
          • Portable
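The difference between the two dispatch styles can be approximated in Ruby, with the caveat that this is only an analogy: real threaded dispatch relies on C-level label addresses, which Ruby code cannot express. `TOY_HANDLERS`, `run_switch` and `run_threaded` are hypothetical names.

```ruby
# Routines implementing a two-instruction toy VM.
TOY_HANDLERS = {
  push: ->(stack, arg) { stack.push(arg) },
  add:  ->(stack, _)   { stack.push(stack.pop + stack.pop) }
}

# Switch-style dispatch: decode the opcode on every executed instruction.
def run_switch(program)
  stack = []
  program.each do |op, arg|
    case op
    when :push then stack.push(arg)
    when :add  then stack.push(stack.pop + stack.pop)
    end
  end
  stack.pop
end

# Threaded-style dispatch: replace each opcode by the routine that implements
# it once, up front, so the execution loop only calls pre-resolved routines.
def run_threaded(program)
  threaded = program.map { |op, arg| [TOY_HANDLERS.fetch(op), arg] }
  stack = []
  threaded.each { |handler, arg| handler.call(stack, arg) }
  stack.pop
end

program = [[:push, 1], [:push, 2], [:add, nil]]
p run_switch(program), run_threaded(program)
```

Both variants compute the same result; they differ only in where the opcode-to-routine decision is made.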
  • 91. VM Caches
      • Versioning
          • State counter scopes caches to the current VM state
          • Lazy invalidation: just bump the version
      • Expires on
          • constant definition
          • constant removal
          • method definition
          • method removal
          • method cache changes (covered later)
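Version-scoped caching with lazy invalidation can be sketched as follows. `VersionedCache` is a hypothetical class written for illustration and is not how the VM implements its caches internally.

```ruby
# Entries are valid only for the state version they were stored under.
# Invalidating everything costs a single counter increment.
class VersionedCache
  def initialize
    @version = 0
    @entries = {}
  end

  # Would be called on events such as method or constant (re)definition.
  def invalidate!
    @version += 1
  end

  def fetch(key)
    version, value = @entries[key]
    return value if version == @version   # hit: stamped with the current state
    value = yield                         # miss or stale entry: recompute lazily
    @entries[key] = [@version, value]
    value
  end
end

cache   = VersionedCache.new
lookups = 0
cache.fetch(:foo) { lookups += 1; :slow_result }
cache.fetch(:foo) { lookups += 1; :slow_result }   # served from the cache
cache.invalidate!
cache.fetch(:foo) { lookups += 1; :slow_result }   # stale, recomputed
p lookups   # prints 2
```

Stale entries are never removed eagerly; they are simply ignored the next time they are read.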
  • 97. OPTIMIZATIONS
  • 98. Optimization Limitations
      • Static Analysis
          • Examine source code without execution
          • Dynamic analysis: runtime introspection
      • Dynamic nature of Ruby
          • Literals are generally safe to consider for optimizations
          • Constants can be redefined
          • Open classes: variable method table
          • Object#method_missing
          • No explicit return types
  • 104. Common optimizations
      • Constant folding
      • Constant propagation
      • Dead code elimination
      • Subexpression elimination
      • Method in-lining
      • Cloning
      • Peephole Optimization
      Note: not all of these are implemented in YARV
  • 112. Constant folding
      1 + 2    # 3
      2 * 3    # 3 + 3
      2 * 1    # 2
      2 ** 2   # 2 * 2

      class Fixnum
        def +(*args) # dynamic Ruby spec
        end
      end
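A constant folder over a tiny expression tree might look like the sketch below. The `[:op, left, right]` node shape and the name `fold_constants` are assumptions made for illustration; YARV's own AST is different.

```ruby
# Operators the toy folder is willing to evaluate ahead of time.
FOLDABLE_OPS = [:+, :*].freeze

# Folding is only valid while Integer#+ and Integer#* keep their built-in
# meaning, which an open-class language cannot guarantee statically.
def fold_constants(node)
  return node unless node.is_a?(Array)
  op, left, right = node
  left  = fold_constants(left)
  right = fold_constants(right)
  if FOLDABLE_OPS.include?(op) && left.is_a?(Integer) && right.is_a?(Integer)
    left.public_send(op, right)   # both operands are literals: compute now
  else
    [op, left, right]             # anything involving a variable stays a node
  end
end

p fold_constants([:+, 1, 2])              # prints 3
p fold_constants([:*, [:+, 1, 2], :x])    # prints [:*, 3, :x]
```

The `class Fixnum` snippet on the slide is the reason such a pass needs a guard: once `+` is redefined, the folded value may no longer match what the program would compute.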
  • 119. Code elimination
      loop {                  # loop {
        begin                 #   begin
          # eval'ed code      #     eval'ed code
          break               #     break
          break               #   ensure
        ensure                #   end
        end                   # }
      }
  • 120. Subexpression elimination
      x = x - (y * 2)
      z = z - (y * 2)

      t = y * 2
      x = x - t
      z = z - t
  • 121. Constant propagation
      def a
        b = 20
        c(3 * b)
      end

      def a            # def a
        b = 20         #   c(60)
        c(3 * 20)      # end
      end
  • 122. In-lining
      def b
        2 * 3
      end

      def a        # def a            def a
        2 + b      #   2 + 2 * 3        2 + (2 * 3)
      end          # end              end
  • 123. Cloning
      def a(b, c)
        b << c
        expire_cache
      end

      a('railsway', 'con')

      def a_railsway_con
        'railsway' << 'con'
        expire_cache
      end
  • 124. Peephole Optimization (before)
      x = true   # 0008 getlocal x
      if x       # 0010 branchunless 17
      else       # 0012 jump 14
      end        # 0014 putnil
                 # 0015 jump 18
                 # 0017 putnil
                 # 0018 leave
  • 125. Peephole Optimization (after)
      x = true   # 0008 getlocal x
      if x       # 0010 branchunless 15
      else       # 0012 putnil
      end        # 0013 leave
                 # 0014 pop
                 # 0015 putnil
                 # 0016 leave
  • 126. OBJECTS
  • 127. Object Requirements
      • Stateful
      • Identity
          • Unique identifier to represent the object at runtime
      • Methods
          • Change or query object state
          • Command and Query pattern
  • 130. Object structure
      typedef unsigned long VALUE;

      struct RBasic {
          VALUE flags;   /* object flags */
          VALUE klass;   /* instance of ... */
      };
  • 131. Object structure (cont)
      • Casting
          • Pointer type that represents addresses of language structures
          • RBASIC(obj)->flags
          • ((struct RBasic *)obj)->flags
      • Flags
          • frozen
          • marked
          • tainted
          • embedded status
  • 137. Classes / modules structure
      struct RClass {
          struct RBasic basic;             /* object structure */
          rb_classext_t *ptr;              /* external class */
          struct st_table *m_tbl;          /* method table */
          struct st_table *iv_index_tbl;   /* ivars */
      };
  • 138. Class / module structure (cont)
      • Casting
          • RCLASS(a_str)->ptr->super #=> Object
          • RCLASS(a_fixnum)->ptr->super #=> Integer
      • Attributes
          • Symbol tables for methods and ivars
          • Class / module distinction through flags
  • 141. Special objects
      • Immediates
          • No runtime casting overheads: fits in VALUE
          • nil #=> 4
          • true #=> 2
          • false #=> 0
          • Symbols
          • Fixnums <= 30 bits
          • Floats and Bignum are complex objects, hence poor Floating Point benchmarks
          • RFLOAT(float_obj)->float_value #=> a double
  • 149. Object memory layout
      • Object#object_id (32 bit architecture)
          • sizeof(VALUE) is 4 bytes
          • Objects: even, multiples of 4
          • Symbols: even, multiples of 8
          • Integers: odd
          • Immediates <= 4
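The "Integers, odd" rule can be checked directly. A short sketch, assuming MRI, where a small Integer is encoded in the VALUE itself with the low bit set. The ids of `nil` and `true` differ between Ruby versions and architectures, so only Integers are shown here.

```ruby
# On MRI a small Integer n has object_id 2n + 1, which is always odd.
[0, 1, 2, 42].each do |n|
  puts "#{n}.object_id = #{n.object_id}"
end
```

No heap slot is involved: the id is derived from the value alone.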
  • 154. Mutable Objects
      struct RString {
          struct RBasic basic;
          union {
              struct {
                  long len;
                  char *ptr;
                  union {
                      long capa;
                      VALUE shared;
                  } aux;
              } heap;
              ...
  • 155. Mutable Objects (cont)
      • String and Array
          • Require the ability to shrink / grow capacity
          • Allocate slightly more data than required
          • Avoids malloc, realloc and memmove overhead
          • Short strings: "str"
          • Short arrays: %w(a r y)
  • 160. Shared Objects
      str = 'railsway'
      str2 = "#{str}con"          # shared ref
      str3 = str << 'con'         # copy + mod

      ary = %w(railsway con)
      ary2 = ary.dup              # shared ref
      ary3 = ary2.delete_at(1)    # copy + mod
  • 161. Method Dispatch
      • Language constraints
          • Loose typing
          • Open classes
          • Method calls can never be reduced to CALL(a_method)
          • Search overhead
      • Dispatch sequence
          • Deref class pointer
          • Check methods table
          • Call method or delegate to superclass
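The dispatch sequence can be imitated with Ruby's reflection API. This is a sketch only: `find_method_owner`, `ToyBase` and `ToyChild` are hypothetical names, and the real lookup also consults the singleton class and the method cache.

```ruby
# Walk the ancestor chain: check each class's own method table, then fall
# through to the next superclass or included module.
def find_method_owner(obj, name)
  obj.class.ancestors.find do |klass|
    klass.instance_methods(false).include?(name) ||
      klass.private_instance_methods(false).include?(name)
  end
end

class ToyBase
  def greet; "hi"; end
end

class ToyChild < ToyBase
  def shout; greet.upcase; end
end

child = ToyChild.new
p find_method_owner(child, :shout)   # prints ToyChild
p find_method_owner(child, :greet)   # prints ToyBase
p find_method_owner(child, :nope)    # prints nil
```

The search cost grows with the depth of the ancestor chain, which is the overhead a method cache is meant to hide.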
  • 170. call VS send
      • obj.__send__ :method
          • We never call methods
          • Send query / command messages to objects
          • Methods return values: RPC style messaging
      • Method cache
          • Method cache == router
          • 95% hit rate when warm
          • Method redefinition, module inclusion etc. clears the method cache / "routing table"
          • Introduces significant overhead for subsequent method calls
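The "send, not call" view is easy to demonstrate with core Ruby; the receiver string and the message names below are arbitrary examples.

```ruby
# A method call is a message send: both lines deliver :upcase to the receiver.
direct  = "railsway".upcase
message = "railsway".__send__(:upcase)
p direct == message   # prints true

# The message name can be chosen at runtime, which is why a call site
# cannot be reduced to a fixed CALL(a_method).
%w(upcase reverse length).each do |name|
  p "railsway".__send__(name)
end
```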
  • 176. Method cache don'ts
      class SomeController < AC::Base
        def show
          # busts method cache for the whole VM
          @user.extend SomeBehavior
        end
      end
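One possible way to avoid `extend` at request time is to wrap the object instead. This alternative is not from the slides; it is a sketch using the standard library's `SimpleDelegator`, with `ToyUser` and `UserWithBehavior` as hypothetical stand-ins for the slide's `@user` and `SomeBehavior`.

```ruby
require 'delegate'

# Stand-in for the model behind @user.
ToyUser = Struct.new(:name)

# A decorator adds per-request behavior by wrapping the object. Creating an
# instance does not modify any class or singleton method table.
class UserWithBehavior < SimpleDelegator
  def display_name
    "#{name} (guest)"   # `name` is delegated to the wrapped ToyUser
  end
end

user = UserWithBehavior.new(ToyUser.new("lourens"))
puts user.display_name   # prints "lourens (guest)"
```

The decorator class is defined once at load time, so nothing is redefined in the hot path.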
  • 184. Instance var changes
      • Optimizations
          • First 3 ivars are embedded on the object
          • Avoids symbol table lookups
      • ivar table
          • Table is per class, not per object
          • Ivar table is shared by all instances of the same class
          • Saves the memory footprint of a table per instance
  • 188. GARBAGE COLLECTION
  • 189. Process memory layout
      • Code segment
          • Executable code
          • Read only
      • Stack segment
          • Stack storage
          • Addressed with stack pointers
      • Heap
          • Memory available for program / developer use
  • 192. Malloc
      • Usable / free space
          • Managed by a free list
          • Linear search overhead to find free chunks
      • Better layout
          • Index free chunks by size intervals
  • 196. GC terminology
      • Root set
          • Directly accessible without pointer scanning
          • C stack, global vars, global constants etc.
      • Unreachable hooks
          • Variable assignment to nil
          • Method return etc.
      • Conservative
          • VM hands out raw pointers to objects
  • 199. GC strategies
      • Stop the World
          • Minimal allocation overhead
          • Hands out objects while heap space is available
          • Halts execution to reclaim memory
          • Very disruptive in the hot path
      • Incremental
          • Collection activity during allocation
          • Smoother, but with some minor overhead
          • Suitable for hard realtime environments
  • 205. Scripting GC
      • Mark and Sweep
          • Identifies live objects
          • Assumes remainder is for collection
          • Concerned with unreachable objects
      • Stop and Copy
          • 2 heap spaces (double memory overhead)
          • 1 active, 1 inactive
          • Copies reachable chunks to the new active area
          • Concerned with live objects
  • 211. Common GC Issues
      • Conservative GC
          • Memory fragmentation
          • Dangling pointers
          • Memory leaks from circular garbage
      • Allocation
          • Bursty allocation
          • Knowledge of pointer layout and chunks required
  • 215. Ruby heap layout
      • Multiple heaps
          • Referenced through heap list
          • Composed of multiple slots
          • Freed when empty, but only if all slots are tagged as being free
          • A Rails app allocates 4 to 6 heaps on startup
  • 223. Slot layouts
      • Per heap
          • Each slot references a single object
          • Defaults to 10 000 slots for the first heap
          • Threshold of 4096 free slots per heap
          • Free list points to the next free slot
      • Heap growth
          • Next allocated heap has 1.8 times the capacity of the last one
          • That's why memory consumption is so high ...
  • 228. Heap growth: small app
      >> 8 * 1.8
      => 14.4
      >> 8 * 1.8 * 1.8
      => 25.92
      >> 8 * 1.8 * 1.8 * 1.8
      => 46.656
      >> 8 * 1.8 * 1.8 * 1.8 * 1.8
      => 83.9808
  • 235. Heap growth: mid to large app
      => 83.9808
      >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
      => 151.16544
      >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
      => 272.097792
      >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
      => 489.7760256
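The irb session above is a geometric progression and can be generated in one go. `heap_sizes` is a hypothetical helper; the starting value 8 and the factor 1.8 are taken from the slides, and the unit of the starting value is not stated there.

```ruby
GROWTH_FACTOR = 1.8

# Size after each growth step: initial * 1.8**n for n = 1..steps.
def heap_sizes(initial, steps)
  (1..steps).map { |n| (initial * GROWTH_FACTOR**n).round(4) }
end

heap_sizes(8, 7).each_with_index do |size, i|
  puts "after #{i + 1} growth step(s): #{size}"
end
```

Seven steps are enough to grow the starting value roughly 61-fold, since 1.8**7 is about 61.2.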
  • 241. Slot structure
      typedef struct RVALUE {
          union {
              struct {
                  VALUE flags;           /* 0 when free */
                  struct RVALUE *next;
              } free;
              struct RObject object;
              struct RFloat  flonum;
              ...
  • 249. Pointer layout
      • Self describing
          • Program data area and heap
          • RVALUE union can accommodate any ruby object
          • Frames, variable structures etc. are well defined also
          • 40 bytes (64 bit arch) represent a slot
          • Free list points to the next free slot
  • 254. Ruby heap VS OS heap
      • Ruby heap
          • 20 bytes represent a slot
          • Slot points to OS data, on the OS / system heap
      • OS heap
          • Thus a 20 byte slot can reference a 2MB chunk on the system heap
  • 257. CRuby: Mark and Sweep
      • Conservative
          • Cannot determine with certainty if a value references an object, so assume it's in use
      • Two phase implementation
          • Mark phase: identifies and flags reachable objects from the current program context
          • Sweep phase: iterates through the object space and …
              • frees all objects not marked
              • unmarks marked objects
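The two phases can be sketched on a toy object graph. `ToyObject`, `toy_mark` and `toy_sweep` are hypothetical names; CRuby's collector works on heap slots in C and, per the slides, marks recursively, whereas this sketch uses an explicit worklist.

```ruby
# Toy heap object: a mark bit plus references to other objects.
class ToyObject
  attr_reader :name, :refs
  attr_accessor :marked

  def initialize(name)
    @name, @refs, @marked = name, [], false
  end
end

# Mark phase: flag everything reachable from the root set.
def toy_mark(roots)
  worklist = roots.dup
  until worklist.empty?
    obj = worklist.pop
    next if obj.marked
    obj.marked = true
    worklist.concat(obj.refs)
  end
end

# Sweep phase: separate unmarked objects (to be freed) from survivors,
# and unmark the survivors for the next cycle.
def toy_sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |obj| obj.marked = false }
  [live, dead]
end

a, b, c, d = %w(a b c d).map { |n| ToyObject.new(n) }
a.refs << b   # a -> b
c.refs << d   # c -> d, but nothing reachable references c
d.refs << c   # d -> c forms a cycle that is unreachable from the roots
toy_mark([a])
live, dead = toy_sweep([a, b, c, d])
p live.map(&:name), dead.map(&:name)
```

With `a` as the only root, `a` and `b` survive while the unreachable cycle `c` and `d` is collected.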
  • 261. Concerns
      • Performance
          • Runtime pauses
          • Work proportional to heap size
          • Prone to memory fragmentation (no compaction)
          • Recursive
      • Triggers
          • 8m malloc calls trigger GC
          • Every 8MB allocated triggers GC
          • Not enough heap reserve
  • 267. GC in action
      # 4 objs, 1 Array, 3 Strings
      ary1 = %w(a b c)
      ary2 = %w(d e f)
      # both ary1 and ary2 are reachable
      ary1 = nil
      # ary1 and its contents are unreachable
  • 275. Generational GC
      • Observations
          • Vast majority of objects are short lived: 80%+
          • Expensive to account for long lived objects
          • Partition by age and frequently collect short lived ones
      • How it works
          • Restrict GC to the most recently modified slots
          • These "sub heaps" are referred to as generations
          • Perform a full GC only when the youngest generation fails to meet memory requirements
  • 281. CONCURRENCY
  • 282. Threading
      • Changes
          • Native OS Threads
          • Ruby Thread == pthread
          • Multiple cores ftw!
      • … but
          • Syscalls schedule, synchronize and create
          • Much more expensive to spawn and switch than green threads
          • Global VM Lock (GVL)
  • 287. Global VM Lock (GVL)
      • How it works
          • Thread that owns the GVL is allowed to execute
          • Blocking operations should release the GVL
          • Automatically released when scheduled
          • C extensions: authors need not concern themselves with synchronization
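The effect of releasing the GVL around blocking operations can be observed with `sleep`, which blocks without holding the lock. A sketch, assuming MRI 1.9 or later; `timed` is a hypothetical helper and the measured figure depends on the machine.

```ruby
# Returns the wall-clock seconds the given block took to run.
def timed
  start = Time.now
  yield
  Time.now - start
end

# Four threads each block in sleep. Because a blocking operation releases the
# GVL, the sleeps overlap instead of running back to back.
seconds = timed do
  4.times.map { Thread.new { sleep 0.2 } }.each(&:join)
end
puts format("4 threads x 0.2s sleep took %.2fs", seconds)
```

The total should be close to 0.2 seconds, not 0.8. CPU-bound Ruby code in the same threads would not overlap this way, since only one thread can hold the GVL at a time.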
  • 291. Blocking VM operations
      • I/O
          • Blocking reads and writes
          • DNS resolution or connects
          • Often has huge handshake overheads
      • Computations, processes and locks
          • Expensive Bignum ops blocked 1.8 interpreters
          • Process.waitpid
          • File locks
  • 296. Releasing the GVL
      • Stable API
          • Blocking function: slow system call / computation
          • Unblock function: called on Thread interrupt
      • Pitfalls
          • Cannot access VALUEs (objects) in blocking functions
          • No integration with Ruby's exception / error handler
  • 300. Lightweight Concurrency
      • Fibers
          • Coroutines: 4k stack size
          • Very fast user space context switches
          • Cooperative scheduling required
          • Fiber.yield pauses the activation record, which keeps context across multiple calls
      • Use cases
          • Generators
          • Blocking I/O: Neverblock
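The generator use case can be sketched with a Fibonacci sequence. `fibonacci_fiber` is a hypothetical helper; `Fiber.new`, `Fiber.yield` and `Fiber#resume` are core Ruby from 1.9 onwards.

```ruby
# Each resume runs the block until the next Fiber.yield. The local state
# (a, b) lives in the paused activation record between calls.
def fibonacci_fiber
  Fiber.new do
    a, b = 0, 1
    loop do
      Fiber.yield a
      a, b = b, a + b
    end
  end
end

fib = fibonacci_fiber
p 6.times.map { fib.resume }   # prints [0, 1, 1, 2, 3, 5]
```

Scheduling is cooperative: the fiber only gives up control at `Fiber.yield`, and nothing runs until the caller resumes it.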
  • 305. In the pipeline
      • MVM: Multiple Virtual Machines
          • Shared process state
          • Sandboxed per VM application state
          • Distribute VMs across available cores
          • Message passing for inter VM communication
          • Most Ruby deployments aren't thread safe
          • MVM is well suited for this
  • 311. QUESTIONS?
