RailswayCon 2010 - Dynamic Language VMs
RailswayCon 2010 - Dynamic Language VMs Presentation Transcript

  • 1.
      Dynamic Language VMs
      Ruby 1.9
      Lourens Naude, WildfireApp.com
  • 2.
      Background
    • Independent Contractor
      • Ruby / C / integrations
      • Well versed full stack
      • Architecture
    • WildfireApp.com
      • Social Marketing platform
      • Large whitelabel clients
      • Bursty traffic – Lady Gaga, EA, Gatorade etc.
  • 7.  
  • 8.
      RUBY VM INTERNALS ?
  • 9.
      A GOOD CRAFTSMAN KNOWS HIS TOOLS
  • 10.
      A BAD CRAFTSMAN BLAMES HIS TOOLS
  • 11.
      Typical public facing apps
    • Interaction patterns
      • Request / response
      • Time
      • Event driven
    • Overheads
      • Data transfer (I/O)
      • Serialization / coercion (CPU)
      • VM – allocation, symbol tables etc. (CPU + mem)
      • Business requirements (CPU)
  • 17.
      Ruby daemon - strace
    Process 5856 detached
    % time   calls  syscall
    ------ ------- -------------
     89.69    5092  recvfrom
      5.35    5093  sendto
      2.49   26300  stat
      2.05   11004  clock_gettime
  • 18.
      Ruby daemon - ltrace
    % time    calls  function
    ------ -------- --------
     95.78   635173  memcpy
      1.38    25862  malloc
      0.79    14984  free
      0.60    11403  strcmp
  • 19.
      System Resources
    • Data latency
      • CPU cache
      • Memory – local
      • Disk - local
      • Memory + disk - remote
    • Record retrieval with ORM
      • Fetch results (local/remote memory + disk)
      • Serialization + conversion (CPU)
      • Object instantiation (CPU + memory)
      • Optional memcached (local or remote memory)
  • 26.
      RUBY ?
  • 27.
      Conversion – rows to hash
    Benchmark.bm do |b|
      b.report do
        1000.times{ ActiveRecord::Base.connection.select_rows "SELECT * FROM users" }
      end
    end

          user     system      total        real
      0.300000   0.040000   0.340000 (  0.505095)
  • 28.
      Conversion – rows to objects
    Benchmark.bm do |b|
      b.report do
        1000.times{ ActiveRecord::Base.connection.select_all "SELECT * FROM users" }
      end
    end

          user     system      total        real
      0.510000   0.050000   0.560000 (  0.719201)
  • 29.
      Instantiation
    Benchmark.bm do |b|
      b.report do
        100_000.times{ 'string'.dup }
      end
    end

          user     system      total        real
      0.040000   0.000000   0.040000 (  0.043791)
  • 30.
      Serialization – load + dump
    Benchmark.bm do |b|
      b.report do
        100_000.times{ Marshal.load(Marshal.dump('ruby string')) }
      end
    end

          user     system      total        real
      1.660000   0.010000   1.670000 (  1.699882)
  • 31.
      Roadmap
    • VM Architecture
      • Symbol table
      • Opcodes / instructions
      • Dispatch
      • Optimizations
    • Ruby language
      • Object model
      • Garbage Collection
      • Contexts and control flow
      • Concurrency
  • 38.
      VM ARCHITECTURE
  • 39.  
  • 40.
      Changes
    • Ruby 1.8 artifacts
      • Parser && AST nodes
      • Object model
      • Garbage Collection
      • No immediate performance gains for String manipulation etc.
    • Codegen phase
      • Better optimization hooks
      • Faster runtime
  • 45.
      AST AND CODEGEN
  • 46.  
  • 47.
      Abstract Syntax Tree (AST)
    • Structure
      • Grammar representation
      • Annotations attach semantics to nodes
      • Possible to refactor the tree – more nodes, less complexity
    • Example nodes
      • Literals, values and assignments
      • Method calls, arguments and return values
      • Jumps – if, else, iterators
      • Unconditional jumps – exceptions, retry etc.
  • 53.
      Code generation
    • How it works
      • Converts the AST to compiled code segments
      • Reduces a tree to a linear and ordered instruction set
      • Fast execution – no tree walking + native code
    • Workflow
      • Preprocessing – AST refactoring (!YARV)
      • Codegen, nodes -> instruction sequences
      • Postprocessing – replace with optimal instruction sequences (peephole optimization)
      • Pre and postprocessing phases may be multiple passes
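The "nodes -> instruction sequences" step can be observed from Ruby itself. The sketch below assumes CRuby 1.9 or later, where the compiler is exposed as `RubyVM::InstructionSequence`. The helper names `yarv_listing` and `yarv_eval` are made up for this example, and the exact instruction names and offsets printed differ between Ruby versions, so no output is shown.

```ruby
# Compile a source string to a YARV instruction sequence and return the
# linear, human-readable listing (one instruction per line).
def yarv_listing(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

# Compile a source string and execute the resulting instruction sequence.
def yarv_eval(source)
  RubyVM::InstructionSequence.compile(source).eval
end

puts yarv_listing("x = 1 + 2; [x, x * 2]")
p yarv_eval("x = 1 + 2; [x, x * 2]")
```

Reading the listing next to the source is a quick way to see how a tree-shaped expression is flattened into an ordered sequence.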
  • 59.
      LOOKUPS
  • 60.  
  • 61.
      Symbol / Hash tables
    • How it works
      • Constant time access to int/char indexed values
      • Table defaults: 11 bins, 5 entries per bin
      • Bins++, sequential lookup inside bins
      • Lookup of methods, variables, encodings etc.
    • Symbol
      • Entity with both a String and Number representation
      • !(String || Symbol), points to a table entry
      • Developer identifies by name, VM by int
      • Immutable for performance – watch out for memory
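A minimal sketch of the bin-based lookup described above, written in Ruby for readability. `BinTable` is a hypothetical class, not the VM's actual table implementation; it borrows only the 11-bin default from the slide and leaves out bin growth ("Bins++").

```ruby
# Toy chained hash table: a fixed array of bins, with a sequential scan
# inside each bin. Illustrative only; the real table lives in C.
class BinTable
  def initialize(bins = 11)
    @bins = Array.new(bins) { [] }
  end

  # Insert or overwrite the value stored under key.
  def []=(key, value)
    bin = bin_for(key)
    entry = bin.find { |k, _| k == key }
    if entry
      entry[1] = value
    else
      bin << [key, value]
    end
  end

  # Return the value stored under key, or nil when absent.
  def [](key)
    entry = bin_for(key).find { |k, _| k == key }
    entry && entry[1]
  end

  private

  # Pick the bin from the key's hash; the scan inside the bin is linear.
  def bin_for(key)
    @bins[key.hash % @bins.size]
  end
end

table = BinTable.new
table[:to_s] = "method entry"
p table[:to_s]
p :to_s.equal?("to_s".to_sym)   # the same name always maps to the same Symbol
```

Symbols make good keys here because comparing two of them is an identity check rather than a character-by-character string comparison.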
  • 68.
      VM INSTRUCTIONS
  • 69.
      VM instructions / opcodes
    • Stateless functions
      • 80+ currently
      • Generated from definitions when the interpreter is compiled (building 1.9 requires an existing ruby)
      • Instruction / opcode / operands notation
    • Categories and examples
      • variable: get or set local variable
      • class / module: definition
      • method / iterator: invoke method, call block
      • Optimization: redefines common +, <<, * contracts
  • 75.
      Managing opcode sequences
    • Stack Machine
      • 2 instruction types: push && pop
      • Move / copy values, top of stack -> elsewhere
      • SP: top of stack pointer, BP: bottom of stack pointer
    • Example
      • %w(a b c)
      • Put strings “a”, “b” and “c” on the stack
      • Fetch top 3 stack elements
      • Create an array from them
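The `%w(a b c)` example can be traced with a toy stack machine. This is a sketch under stated assumptions: the opcode names `:putstring` and `:newarray` are borrowed from YARV for flavour, but `run_stack_program` is a hypothetical interpreter and much simpler than the real one.

```ruby
# Execute a list of [opcode, operand] pairs against a value stack and
# return whatever is left on top.
def run_stack_program(program)
  stack = []
  program.each do |opcode, operand|
    case opcode
    when :putstring then stack.push(operand.dup)          # push a fresh String
    when :newarray  then stack.push(stack.pop(operand))   # pop N values, push them as one Array
    else raise ArgumentError, "unknown opcode #{opcode}"
    end
  end
  stack.pop
end

ABC_PROGRAM = [
  [:putstring, "a"],
  [:putstring, "b"],
  [:putstring, "c"],
  [:newarray, 3]
]

p run_stack_program(ABC_PROGRAM)   # => ["a", "b", "c"]
```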
  • 81.
      Instruction sequence
    • Opcode collection
      • Instruction dispatch can be a bottleneck
      • Optimizing simple instructions is very important
      • Likely a small subset of the typical web app's hot path
    • Dispatch techniques
      • Direct Threaded Dispatch : fastest jump to next opcode / instruction
      • Switch Dispatch : slower, but portable
  • 85.
      DISPATCH AND CACHE
  • 86.
      Dispatch techniques
    • Direct Threaded Dispatch
      • Represents an instruction by the address of the routine that implements it
      • Forth, Python 3
      • Not portable: relies on GCC's first-class labels (labels as values)
    • Switch Dispatch
      • CPU branch mispredictions, depending on pipeline length
      • Up to 50% slower than Threaded dispatch
      • Portable
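Ruby cannot express computed gotos, so the following is only an analogy of the two techniques, not how the VM is written. `run_switch` decodes every opcode through one central `case`. `thread_program` resolves each opcode once, up front, to the routine that implements it, so the run loop simply calls the next routine. All names here are hypothetical.

```ruby
# Switch dispatch: each iteration goes through the same central branch.
def run_switch(program)
  stack = []
  program.each do |opcode, operand|
    case opcode
    when :push then stack.push(operand)
    when :add  then stack.push(stack.pop + stack.pop)
    when :mul  then stack.push(stack.pop * stack.pop)
    else raise ArgumentError, "unknown opcode #{opcode}"
    end
  end
  stack.pop
end

# Routines that implement each instruction.
ROUTINES = {
  push: ->(stack, operand) { stack.push(operand) },
  add:  ->(stack, _)       { stack.push(stack.pop + stack.pop) },
  mul:  ->(stack, _)       { stack.push(stack.pop * stack.pop) }
}

# Threaded-style analogue: replace each opcode by its routine ahead of time.
def thread_program(program)
  program.map { |opcode, operand| [ROUTINES.fetch(opcode), operand] }
end

# The run loop no longer decodes anything; it calls routine after routine.
def run_threaded(threaded)
  stack = []
  threaded.each { |routine, operand| routine.call(stack, operand) }
  stack.pop
end

DISPATCH_PROGRAM = [[:push, 2], [:push, 3], [:add, nil], [:push, 4], [:mul, nil]]

p run_switch(DISPATCH_PROGRAM)                     # => 20
p run_threaded(thread_program(DISPATCH_PROGRAM))   # => 20
```

In C the second form removes the shared branch that the CPU has to predict; in Ruby the sketch only shows the structural difference, not the speedup.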
  • 91.
      VM Caches
    • Versioning
      • State counter scopes caches to the current VM state
      • Lazy invalidation – just bump the version
    • Expires on
      • constant definition
      • constant removal
      • method definition
      • method removal
      • method cache changes (covered later)
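The versioning idea can be sketched as a cache whose entries remember the state counter they were stored under. `VersionedCache` is a hypothetical class written for this example and is not taken from the VM source.

```ruby
# Cache scoped to a state counter. Invalidation is lazy: bumping the
# counter makes every existing entry stale without touching it.
class VersionedCache
  def initialize
    @version = 0
    @entries = {}
  end

  # Would be triggered by constant / method definition or removal.
  def invalidate!
    @version += 1
  end

  # Return the cached value for key when it was stored under the current
  # version; otherwise run the block, store its result and return it.
  def fetch(key)
    version, value = @entries[key]
    return value if version == @version
    value = yield
    @entries[key] = [@version, value]
    value
  end
end

cache = VersionedCache.new
cache.fetch(:lookup) { "slow path" }     # miss: block runs
cache.fetch(:lookup) { "never called" }  # hit: block skipped
cache.invalidate!
cache.fetch(:lookup) { "slow path again" }
```

The trade-off is that one bump expires everything at once, which keeps invalidation cheap but makes the next round of lookups pay the full cost again.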
  • 97.
      OPTIMIZATIONS
  • 98.
      Optimization Limitations
    • Static Analysis
      • Examine source code without execution
      • Dynamic analysis – runtime introspection
    • Dynamic nature of Ruby
      • Literals are generally safe to consider for optimizations
      • Constants can be redefined
      • Open classes – variable method table
      • Object#method_missing
      • No explicit return types
  • 104.
      Common optimizations
    • Constant folding
    • Constant propagation
    • Dead code elimination
    • Subexpression elimination
    • Method in-lining
    • Cloning
    • Peephole Optimization
    • * not all implemented in YARV
  • 112.
      Constant folding
      1 + 2   # 3
      2 * 3   # 3 + 3
      2 * 1   # 2
      2 ** 2  # 2 * 2

      class Fixnum
        def +(*args) # dynamic Ruby spec
        end
      end
  • 119.
      Code elimination
    # before
    loop {
      begin
        # eval'ed code
        break
        break
      ensure
      end
    }

    # after: the unreachable second break is eliminated
    # loop {
    #   begin
    #     # eval'ed code
    #     break
    #   ensure
    #   end
    # }
  • 120.
      Subexpression elimination
    # before
    x = x - (y * 2)
    z = z - (y * 2)

    # after: the shared subexpression is computed once
    t = y * 2
    x = x - t
    z = z - t
  • 121.
      Constant propagation
    # before
    def a
      b = 20
      c(3 * b)
    end

    # after propagation    # after folding
    def a                  # def a
      b = 20               #   c(60)
      c(3 * 20)            # end
    end
  • 122.
      In-lining
    def b
      2 * 3
    end

    # before       # b in-lined
    def a          # def a
      2 + b        #   2 + 2 * 3
    end            # end

    # after
    def a
      2 + (2 * 3)
    end
  • 123.
      Cloning
    def a(b, c)
      b << c
      expire_cache
    end

    a('railsway', 'con')

    # cloned for this specific call
    def a_railsway_con
      'railsway' << 'con'
      expire_cache
    end
  • 124.
      Peephole Optimization (before)
    x = true   # 0008 getlocal x
    if x       # 0010 branchunless 17
    else       # 0012 jump 14
    end        # 0014 putnil
               # 0015 jump 18
               # 0017 putnil
               # 0018 leave
  • 125.
      Peephole Optimization (after)
    x = true   # 0008 getlocal x
    if x       # 0010 branchunless 15
    else       # 0012 putnil
    end        # 0013 leave
               # 0014 pop
               # 0015 putnil
               # 0016 leave
  • 126.
      OBJECTS
  • 127.
      Object Requirements
    • Stateful
    • Identity
      • Unique identifier to represent the object at runtime
    • Methods
      • Change or query object state
      • Command and Query pattern
  • 130.
      Object structure
      typedef unsigned long VALUE;

      struct RBasic {
          VALUE flags;  /* object flags */
          VALUE klass;  /* instance of ... */
      };
  • 131.
      Object structure (cont)
    • Casting
      • Pointer type that represents addresses of language structures
      • RBASIC(obj)->flags
      • ((struct RBasic *)obj)->flags
    • Flags
      • frozen
      • marked
      • tainted
      • embedded status
  • 137.
      Classes / modules structure
      struct RClass {
          struct RBasic basic;            /* object structure */
          rb_classext_t *ptr;             /* external class */
          struct st_table *m_tbl;         /* method table */
          struct st_table *iv_index_tbl;  /* ivars */
      };
  • 138.
      Class / module structure (cont)
    • Casting
      • RCLASS(a_str)->ptr->super #=> Object
      • RCLASS(a_fixnum)->ptr->super #=> Integer
    • Attributes
      • Symbol tables for methods and ivars
      • Class / module distinction through flags
  • 141.
      Special objects
    • Immediates
      • No runtime casting overheads – fits in VALUE
      • nil #=> 4
      • true #=> 2
      • false #=> 0
      • Symbols
      • Fixnums <= 30 bits
      • Floats and Bignum are complex objects – hence poor Floating Point benchmarks
      • RFLOAT(float_obj)->float_value #=> a double
  • 149.
      Object memory layout
    • Object#object_id (32 bit architecture)
      • sizeof(VALUE) is 4 bytes
      • Objects, even, multiples of 4
      • Symbols, even, multiples of 8
      • Integers, odd
      • Immediates <= 4
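The "Integers, odd" rule can be checked from Ruby. The sketch assumes CRuby, where a small Integer n is stored directly in the VALUE as 2n + 1 and `object_id` exposes that encoding. The helper name `fixnum_object_id` is made up for this example. The ids of nil, true and false differ between Ruby versions and architectures, so they are left out.

```ruby
# Expected object_id of a small Integer under CRuby's tagged encoding.
def fixnum_object_id(n)
  2 * n + 1
end

[0, 1, 2, 42, -1].each do |n|
  puts "#{n}.object_id = #{n.object_id} (2n + 1 = #{fixnum_object_id(n)})"
end
```

Because the value lives in the reference itself, no heap slot is allocated and no pointer is followed to read it.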
  • 154.
      Mutable Objects
      struct RString {
          struct RBasic basic;
          union {
              struct {
                  long len;
                  char *ptr;
                  union {
                      long capa;
                      VALUE shared;
                  } aux;
              } heap;
  • 155.
      Mutable Objects (cont)
    • String and Array
      • require the ability to shrink / grow capacity
      • allocates slightly more data than required
      • Avoids malloc, realloc and memmove overhead
      • Short strings “str”
      • Short arrays %w(a r y)
  • 160.
      Shared Objects
      str = 'railsway'
      str2 = "#{str}con"         # shared ref
      str3 = str << 'con'        # copy + mod
      ary = %w(railsway con)
      ary2 = ary.dup             # shared ref
      ary3 = ary2.delete_at(1)   # copy + mod
  • 161.
      Method Dispatch
    • Language constraints
      • Loose typing
      • Open classes
      • Method calls can never be reduced to CALL(a_method)
      • Search overhead
    • Dispatch sequence
      • Deref class pointer
      • Check methods table
      • Call method or delegate to superclass
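The dispatch sequence can be approximated in plain Ruby by walking the ancestor chain and checking each class or module's own method table. `find_method_owner` is a hypothetical helper for illustration; it ignores singleton methods and `method_missing`, and the VM performs this walk in C with a cache in front of it.

```ruby
# Return the first class/module in obj's ancestor chain whose own method
# table defines name, or nil when nothing in the chain does.
def find_method_owner(obj, name)
  obj.class.ancestors.find do |mod|
    mod.instance_methods(false).include?(name) ||
      mod.private_instance_methods(false).include?(name)
  end
end

module Greeter
  def greet
    "hi"
  end
end

class Base
  include Greeter
end

class Child < Base
end

p find_method_owner(Child.new, :greet)    # => Greeter
p find_method_owner(Child.new, :missing)  # => nil
```

For `Child.new.greet` the walk visits Child, then Base, then Greeter before it finds an entry, which is the search overhead the slide refers to.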
  • 169.  
  • 170.
      call VS send
    • obj.__send__ :method
      • We never call methods
      • Send query / command messages to objects
      • Methods return values – RPC style messaging
    • Method cache
      • Method cache == router
      • 95% hit rate when warm
      • Method redefinition, module inclusion etc. clears the method cache / “routing table”
      • Introduces significant overhead for subsequent method calls
  • 176.
      Method cache don'ts
      class SomeController < AC::Base
        def show
          # busts method cache for the whole VM
          @user.extend SomeBehavior
        end
      end
  • 181.  
  • 182.  
  • 183.  
  • 184.
      Instance var changes
    • Optimizations
      • First 3 ivars are embedded on the object
      • Avoids symbol table lookups
    • ivar table
      • Table is per class, not per object
      • Ivar table is shared by all instances of the same class
      • Saves on memory footprint of a table per instance
  • 188.
      GARBAGE COLLECTION
  • 189.
      Process memory layout
    • Code segment
      • Executable code
      • Read only
    • Stack segment
      • Stack storage
      • Addressed with stack pointers
    • Heap
      • Memory available for program / developer use
  • 192.
      Malloc
    • Usable / free space
      • Managed by a free list
      • Linear search overhead to find free chunks
    • Better layout
      • Index free chunks by size intervals
  • 194.  
  • 195.  
  • 196.
      GC terminology
    • Root set
      • Directly accessible without pointer scanning
      • C stack, global vars, global constants etc.
    • Unreachable hooks
      • Variable assignment to nil
      • method return etc.
    • Conservative
      • VM hands out raw pointers to objects
  • 199.
      GC strategies
    • Stop the World
      • Minimal allocation overhead
      • Hands out objects while heap space is available
      • Halts execution to reclaim memory
      • Very disruptive in the hot path
    • Incremental
      • Collection activity during allocation
      • Smoother, but with some minor overhead
      • Suitable for hard realtime environments
  • 205.
      Scripting GC
    • Mark and Sweep
      • Identifies live objects
      • Assumes remainder is for collection
      • Concerned with unreachable objects
    • Stop and Copy
      • 2 heap spaces (double memory overhead)
      • 1 active, 1 inactive
      • Copies reachable chunks to the new active area
      • Concerned with live objects
  • 211.
      Common GC Issues
    • Conservative GC
      • Memory fragmentation
      • Dangling pointers
      • Memory leaks from circular garbage
    • Allocation
      • Bursty allocation
      • Knowledge of pointer layout and chunks required
  • 215.
      Ruby heap layout
    • Multiple heaps
      • Referenced through heap list
      • Composed of multiple slots
      • Freed when empty, i.e. if all slots are tagged as free
      • A Rails app allocates 4 to 6 heaps on startup
  • 220.  
  • 221.  
  • 222.  
  • 223.
      Slot layouts
    • Per heap
      • Each slot references a single object
      • Defaults to 10 000 slots for the first heap
      • Threshold of 4096 free slots per heap
      • Free list points to the next free slot
    • Heap growth
      • Next allocated heap has 1.8 times the capacity of the last one
      • That's why memory consumption is so high ...
  • 228.
      Heap growth – small app
      >> 8 * 1.8
      => 14.4
      >> 8 * 1.8 * 1.8
      => 25.92
      >> 8 * 1.8 * 1.8 * 1.8
      => 46.656
      >> 8 * 1.8 * 1.8 * 1.8 * 1.8
      => 83.9808
  • 235.
      Heap growth – mid to large app
      => 83.9808
      >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
      => 151.16544
      >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
      => 272.097792
      >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
      => 489.7760256
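The irb sessions above repeat one geometric progression. A small sketch that reproduces it follows; `heap_sizes` and `GROWTH_FACTOR` are names invented for this example, and the starting value of 8 is simply the one used on the slides.

```ruby
GROWTH_FACTOR = 1.8

# Size of each successive heap: the n-th one is first * 1.8**n.
def heap_sizes(first, count)
  (0...count).map { |n| first * GROWTH_FACTOR**n }
end

heap_sizes(8, 8).each_with_index do |size, n|
  puts format("heap %d: %.4f", n + 1, size)
end
```

Seven growth steps multiply the first heap by roughly 61, which is why a mid to large app ends up with a much bigger footprint than its live data suggests.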
  • 241.
      Slot structure
      typedef struct RVALUE {
          union {
              struct {
                  VALUE flags; /* 0 when free */
                  struct RVALUE *next;
              } free;
              struct RObject object;
              struct RFloat flonum;
              ...
  • 249.
      Pointer layout
    • Self describing
      • Program data area and heap
      • RVALUE union can accommodate any ruby object
      • Frames, variable structures etc. well defined also
      • 40 bytes (64 bit arch) represent a slot
      • Free list points to the next free slot
  • 254.
      Ruby heap VS OS heap
    • Ruby heap
      • 20 bytes represent a slot
      • Slot points to OS data, on the OS / system heap
    • OS heap
      • Thus a 20 byte slot can reference a 2MB chunk on the system heap
  • 256.  
  • 257.
      CRuby: Mark and Sweep
    • Conservative
      • Cannot determine with certainty if a value references an object – assume it's in use
    • Two phase implementation
      • Mark phase: identifies and flags reachable objects from the current program context
      • Sweep phase: iterates through the object space and …
        • frees all objects not marked
        • unmarks marked objects
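The two phases can be sketched over a toy object graph. This is an illustration only: `Node`, `mark` and `sweep` are hypothetical names, the "heap" is a plain Array, and the mark phase uses an explicit worklist rather than recursion, which is a choice made for this sketch and not a description of CRuby's collector.

```ruby
# Toy heap object: a name, outgoing references and a mark bit.
class Node
  attr_reader :name, :refs
  attr_accessor :marked

  def initialize(name)
    @name = name
    @refs = []
    @marked = false
  end
end

# Mark phase: flag everything reachable from the root set.
def mark(roots)
  worklist = roots.dup
  until worklist.empty?
    node = worklist.pop
    next if node.marked
    node.marked = true
    worklist.concat(node.refs)
  end
end

# Sweep phase: walk the whole heap, separate unmarked nodes from marked
# ones, and unmark the survivors for the next cycle.
def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |node| node.marked = false }
  [live, dead]
end

a, b, c, d = %w(a b c d).map { |name| Node.new(name) }
a.refs << b
c.refs << d
d.refs << c   # c and d only reference each other

heap = [a, b, c, d]
mark([a])
live, dead = sweep(heap)
p live.map(&:name)   # => ["a", "b"]
p dead.map(&:name)   # => ["c", "d"]
```

Because reachability is computed from the roots, the c/d cycle is collected even though each node is still referenced by the other.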
  • 261.
      Concerns
    • Performance
      • Runtime pauses
      • Work proportional to heap size
      • Prone to memory fragmentation (no compaction)
      • Recursive
    • Triggers
      • 8m malloc calls trigger GC
      • Every 8MB allocated triggers GC
      • Not enough heap reserve
  • 267.
      GC in action
      # 4 objs, 1 Array, 3 Strings
      ary1 = %w(a b c)
      ary2 = %w(d e f)
      # both ary1 and ary2 are reachable
      ary1 = nil
      # ary1 and its contents are unreachable
  • 272.  
  • 273.  
  • 274.  
  • 275.
      Generational GC
    • Observations
      • Vast majority of objects are short lived – 80%+
      • Expensive to account for long lived objects
      • Partition by age and frequently collect short lived ones
    • How it works
      • Restrict GC to the most recently modified slots
      • These “sub heaps” are referred to as generations
      • Perform a full GC only when the youngest generation fails to meet memory requirements
  • 281.
      CONCURRENCY
  • 282.
      Threading
    • Changes
      • Native OS Threads
      • Ruby Thread == pthread
      • Multiple cores ftw!
    • … but
      • Syscalls schedule, synchronize and create
      • Much more expensive to spawn and switch than green threads
      • Global VM Lock (GVL)
  • 287.
      Global VM Lock (GVL)
    • How it works
      • Thread that owns the GVL is allowed to execute
      • Blocking operations should release the GVL
      • Automatically released when scheduled
      • C extensions: authors need not concern themselves with synchronization
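One way to see a blocking operation release the GVL is to let several threads block at once. In the sketch below each thread calls `sleep`, which gives up the lock while waiting, so the total wall-clock time stays close to a single sleep rather than the sum. `run_sleepers` is a name invented for this example, and the timings depend on the machine.

```ruby
require 'benchmark'

# Start count threads that each block for the given number of seconds,
# wait for all of them, and return the elapsed wall-clock time.
def run_sleepers(count, seconds)
  Benchmark.realtime do
    count.times.map { Thread.new { sleep(seconds) } }.each(&:join)
  end
end

elapsed = run_sleepers(4, 0.2)
puts format("4 threads x 0.2s sleep took %.2fs", elapsed)
```

A CPU-bound block in place of `sleep` would not overlap in the same way, since only the thread that owns the GVL executes Ruby code.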
  • 291.
      Blocking VM operations
    • I/O
      • blocking reads and writes
      • DNS resolution or connects
      • Often has huge handshake overheads
    • Computations, processes and locks
      • Expensive Bignum ops blocked 1.8 interpreters
      • Process.waitpid
      • File locks
  • 296.
      Releasing the GVL
    • Stable API
      • Blocking function: slow system call / computation
      • Unblock function: called on Thread interrupt
    • Pitfalls
      • Cannot access VALUEs (objects) in blocking functions
      • No integration with Ruby's exception / error handler
  • 300.
      Lightweight Concurrency
    • Fibers
      • Coroutines – 4k stack size
      • Very fast user space context switches
      • Cooperative scheduling required
      • Fiber.yield pauses the activation record, which keeps context across multiple calls
    • Use cases
      • Generators
      • Blocking I/O - Neverblock
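The generator use case can be sketched with a Fiber that yields one value per `resume`. `fibonacci_fiber` is a hypothetical helper for this example; `Fiber.new`, `Fiber.yield` and `Fiber#resume` are the core Ruby 1.9 API.

```ruby
# Generator built on a Fiber: each resume runs the block until the next
# Fiber.yield, and the local variables survive between calls.
def fibonacci_fiber
  Fiber.new do
    a, b = 0, 1
    loop do
      Fiber.yield a
      a, b = b, a + b
    end
  end
end

fib = fibonacci_fiber
p 6.times.map { fib.resume }   # => [0, 1, 1, 2, 3, 5]
```

Nothing runs between resumes, which is the cooperative scheduling mentioned above: the caller decides when the fiber continues.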
  • 305.
      In the pipeline
    • MVM: Multiple Virtual Machines
      • Shared process state
      • Sandboxed per VM application state
      • Distribute VMs across available cores
      • Message passing for inter VM communication
      • Most Ruby deployments aren't thread safe; MVM is well suited for this
  • 311.
      QUESTIONS ?