RailswayCon 2010 - Dynamic Language VMs
Transcript

  • 1.
      Dynamic Language VMs
      Ruby 1.9
      Lourens Naude, WildfireApp.com
  • 2.
      Background
    • Independent Contractor
      • Ruby / C / integrations
      • Well versed full stack
      • Architecture
    • WildfireApp.com
      • Social Marketing platform
      • Large whitelabel clients
      • Bursty traffic – Lady Gaga, EA, Gatorade etc.
  • 7.  
  • 8.
      RUBY VM INTERNALS ?
  • 9.
      A GOOD CRAFTSMAN KNOWS HIS TOOLS
  • 10.
      A BAD CRAFTSMAN BLAMES HIS TOOLS
  • 11.
      Typical public facing apps
    • Interaction patterns
    • Overheads
      • Data transfer (I/O)
      • Serialization / coercion (CPU)
      • VM – allocation, symbol tables etc. (CPU + mem)
      • Business requirements (CPU)
  • 17.
      Ruby daemon - strace
    Process 5856 detached
    % time   calls  syscall
    ------ ------- -------------
     89.69    5092  recvfrom
      5.35    5093  sendto
      2.49   26300  stat
      2.05   11004  clock_gettime
  • 18.
      Ruby daemon - ltrace
    % time    calls  function
    ------ -------- --------
     95.78   635173  memcpy
      1.38    25862  malloc
      0.79    14984  free
      0.60    11403  strcmp
  • 19.
      System Resources
    • Data latency
      • CPU cache
      • Memory – local
      • Disk - local
      • Memory + disk - remote
    • Record retrieval with ORM
      • Fetch results (local/remote memory + disk)
      • Serialization + conversion (CPU)
      • Object instantiation (CPU + memory)
      • Optional memcached (local or remote memory)
  • 26.
      RUBY ?
  • 27.
      Conversion – rows to hash
    Benchmark.bm do |b|
      b.report do
        1000.times { ActiveRecord::Base.connection.select_rows "SELECT * FROM users" }
      end
    end

          user     system      total        real
      0.300000   0.040000   0.340000 (  0.505095)
  • 28.
      Conversion – rows to objects
    Benchmark.bm do |b|
      b.report do
        1000.times { ActiveRecord::Base.connection.select_all "SELECT * FROM users" }
      end
    end

          user     system      total        real
      0.510000   0.050000   0.560000 (  0.719201)
  • 29.
      Instantiation
    Benchmark.bm do |b|
      b.report do
        100_000.times { 'string'.dup }
      end
    end

          user     system      total        real
      0.040000   0.000000   0.040000 (  0.043791)
  • 30.
      Serialization – load + dump
    Benchmark.bm do |b|
      b.report do
        100_000.times { Marshal.load(Marshal.dump('ruby string')) }
      end
    end

          user     system      total        real
      1.660000   0.010000   1.670000 (  1.699882)
  • 31.
      Roadmap
    • VM Architecture
    • Ruby language
      • Object model
      • Garbage Collection
      • Contexts and control flow
      • Concurrency
  • 38.
      VM ARCHITECTURE
  • 39.  
  • 40.
      Changes
    • Ruby 1.8 artifacts
      • Parser && AST nodes
      • Object model
      • Garbage Collection
      • No immediate performance gains for String manipulation etc.
    • Codegen phase
      • Better optimization hooks
      • Faster runtime
  • 45.
      AST AND CODEGEN
  • 46.  
  • 47.
      Abstract Syntax Tree (AST)
    • Structure
      • Grammar representation
      • Annotations attach semantics to nodes
      • Possible to refactor the tree – more nodes, less complexity
    • Example nodes
      • Literals, values and assignments
      • Method calls, arguments and return values
      • Jumps – if, else, iterators
      • Unconditional jumps – exceptions, retry etc.
  • 53.
      Code generation
    • How it works
      • Converts the AST to compiled code segments
      • Reduces a tree to a linear and ordered instruction set
      • Fast execution – no tree walking + native code
    • Workflow
      • Preprocessing – AST refactoring (!YARV)
      • Codegen, nodes -> instruction sequences
      • Postprocessing – replace with optimal instruction sequences (peephole optimization)
      • Pre and postprocessing phases may be multiple passes
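
The output of the codegen phase can be inspected from Ruby itself. The sketch below assumes CRuby 1.9 or later, where the compiler is exposed as `RubyVM::InstructionSequence`; the exact instruction names, offsets and layout of the listing vary between interpreter versions, so no particular output is shown.

```ruby
# Compile a small snippet to an instruction sequence and dump it.
# Assumes CRuby; other Ruby implementations do not provide RubyVM.
iseq = RubyVM::InstructionSequence.compile("x = 1 + 2")

# disasm returns a human-readable listing of the linear instruction set
# the tree was reduced to.
listing = iseq.disasm
puts listing
```

Comparing the listing for a few variations of the snippet is a quick way to see which instructions a given construct maps to.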
  • 59.
      LOOKUPS
  • 60.  
  • 61.
      Symbol / Hash tables
    • How it works
      • Constant time access to int/char indexed values
      • Table defaults: 11 bins, 5 entries per bin
      • Bins++, sequential lookup inside bins
      • Lookup of methods, variables, encodings etc.
    • Symbol
      • Entity with both a String and Number representation
      • !(String || Symbol), points to a table entry
      • Developer identifies by name, VM by int
      • Immutable for performance – watch out for memory
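
A short sketch of the "one table entry per name" property, using only core methods. The variable names are illustrative; the point is that two symbols with the same name are the same object, while two strings with the same content are not.

```ruby
sym_a = :railsway
sym_b = "railsway".to_sym          # looks up the existing table entry
str_a = String.new("railsway")
str_b = String.new("railsway")     # a second, separate String object

same_symbol = sym_a.equal?(sym_b)  # identity comparison, not content
same_string = str_a.equal?(str_b)

puts "symbols identical: #{same_symbol}"
puts "strings identical: #{same_string}"
puts "name seen by the developer: #{sym_a.to_s}"
```

This is also why the memory warning matters: converting arbitrary input with `to_sym` adds entries to the table, one per distinct name.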
  • 68.
      VM INSTRUCTIONS
  • 69.
      VM instructions / opcodes
    • Stateless functions
      • 80+ currently
      • Generated from definitions at interpreter compile time (existing ruby requirement for 1.9)
      • Instruction / opcode / operands notation
    • Categories and examples
      • variable: get or set local variable
      • class / module: definition
      • method / iterator: invoke method, call block
      • Optimization: redefines common +, <<, * contracts
  • 75.
      Managing opcode sequences
    • Stack Machine
      • 2 instruction types: push && pop
      • Move / copy values, top of stack -> elsewhere
      • SP: top of stack pointer, BP: bottom of stack pointer
    • Example
      • %w(a b c)
      • Put strings “a”, “b” and “c” on the stack
      • Fetch top 3 stack elements
      • Create an array from them
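
The `%w(a b c)` example can be sketched as a toy stack machine. The opcode names below are borrowed from YARV's `putstring` and `newarray`, but the `run` method and the `[opcode, operand]` program format are assumptions made for illustration, not the interpreter's real loop.

```ruby
# Toy stack machine: each instruction is an [opcode, operand] pair.
def run(instructions)
  stack = []
  instructions.each do |opcode, operand|
    case opcode
    when :putstring
      stack.push(operand)             # put a string on the stack
    when :newarray
      elements = stack.pop(operand)   # fetch the top N elements, in push order
      stack.push(elements)            # create an array from them
    else
      raise ArgumentError, "unknown opcode #{opcode}"
    end
  end
  stack.pop
end

program = [
  [:putstring, "a"],
  [:putstring, "b"],
  [:putstring, "c"],
  [:newarray, 3]
]
result = run(program)
p result
```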
  • 81.
      Instruction sequence
    • Opcode collection
      • Instruction dispatch can be a bottleneck
      • Optimizing simple instructions is very important
      • Likely a small subset of the typical web app's hot path
    • Dispatch techniques
      • Direct Threaded Dispatch : fastest jump to next opcode / instruction
      • Switch Dispatch : slower, but portable
  • 85.
      DISPATCH AND CACHE
  • 86.
      Dispatch techniques
    • Direct Threaded Dispatch
      • Represents an instruction by the address of the routine that implements it
      • Forth, Python 3
      • Not portable: GCC first class labels
    • Switch Dispatch
      • CPU branch mispredictions, depending on pipeline length
      • Up to 50% slower than Threaded dispatch
      • Portable
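
The structural difference between the two techniques can be sketched in Ruby, with the caveat that this is only an analogy: Ruby has no computed goto, so nothing here reproduces the branch-prediction behaviour described above. The opcodes `:add` and `:mul`, the `HANDLERS` table and all method names are hypothetical.

```ruby
# Switch dispatch: one central branch decodes every opcode.
def run_switch(program)
  acc = 0
  program.each do |opcode, operand|
    case opcode
    when :add then acc += operand
    when :mul then acc *= operand
    end
  end
  acc
end

# Threaded-style dispatch: each instruction is represented by the routine
# that implements it, so there is no central decode step at run time.
HANDLERS = {
  add: ->(acc, operand) { acc + operand },
  mul: ->(acc, operand) { acc * operand }
}

def thread(program)
  program.map { |opcode, operand| [HANDLERS.fetch(opcode), operand] }
end

def run_threaded(threaded)
  threaded.inject(0) { |acc, (handler, operand)| handler.call(acc, operand) }
end

program = [[:add, 2], [:mul, 5], [:add, 1]]
switch_result   = run_switch(program)
threaded_result = run_threaded(thread(program))
```

The translation step in `thread` happens once, ahead of execution, which is the part of the idea that carries over from the C implementation.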
  • 91.
      VM Caches
    • Versioning
      • State counter scopes caches to the current VM state
      • Lazy invalidation – just bump the version
    • Expires on
      • constant definition
      • constant removal
      • method definition
      • method removal
      • method cache changes (covered later)
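
Version-stamped caching with lazy invalidation can be sketched as follows. `VersionedCache` and its methods are hypothetical names for illustration; the real VM keeps its state counter and inline caches in C.

```ruby
# Each entry remembers the state version it was stored under.
class VersionedCache
  def initialize
    @state_version = 0
    @entries = {}
  end

  # Lazy invalidation: bump the counter, delete nothing.
  def invalidate!
    @state_version += 1
  end

  def fetch(key)
    version, value = @entries[key]
    return value if version == @state_version   # entry is still current
    value = yield                                # stale or missing: redo the lookup
    @entries[key] = [@state_version, value]
    value
  end
end

cache = VersionedCache.new
lookups = 0
cache.fetch(:to_s) { lookups += 1; :slow_lookup }   # miss
cache.fetch(:to_s) { lookups += 1; :slow_lookup }   # hit
cache.invalidate!                                   # e.g. a method was defined
cache.fetch(:to_s) { lookups += 1; :slow_lookup }   # miss again
puts "slow lookups performed: #{lookups}"
```

Invalidation is constant time regardless of how many entries exist, at the price of every entry missing once after each bump.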
  • 97.
      OPTIMIZATIONS
  • 98.
      Optimization Limitations
    • Static Analysis
      • Examine source code without execution
      • Dynamic analysis – runtime introspection
    • Dynamic nature of Ruby
      • Literals are generally safe to consider for optimizations
      • Constants can be redefined
      • Open classes – variable method table
      • Object#method_missing
      • No explicit return types
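
Two of these constraints are easy to demonstrate. In the sketch below, `Price` and `Ghost` are made-up classes; the first shows a method table changing underneath an existing object, the second shows a call that no static analysis of the class body would find.

```ruby
class Price
  def initialize(cents)
    @cents = cents
  end

  def to_s
    "#{@cents} cents"
  end
end

price  = Price.new(500)
before = price.to_s

# Open classes: reopening Price replaces to_s for objects that already exist.
class Price
  def to_s
    format("$%.2f", @cents / 100.0)
  end
end

after = price.to_s

# method_missing: find_by_name is never defined, yet the call succeeds.
class Ghost
  def method_missing(name, *args)
    name.to_s.start_with?("find_by_") ? "dynamic: #{name}" : super
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("find_by_") || super
  end
end

ghost_result = Ghost.new.find_by_name
```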
  • 104.
      Common optimizations
  • 112.
      Constant folding
  • 119.
      Code elimination
    loop {                # loop {
      begin               #   begin
        # eval'ed code    #     # eval'ed code
        break             #     break
        break             #   ensure
      ensure              #   end
      end                 # }
    }
  • 120.
      Subexpression elimination
    x = x - (y * 2)
    z = z - (y * 2)

    t = y * 2
    x = x - t
    z = z - t
  • 121.
      Constant propagation
    def a
      b = 20
      c(3 * b)
    end

    def a            # def a
      b = 20         #   c(60)
      c(3 * 20)      # end
    end
  • 122.
      In-lining
    def b
      2 * 3
    end

    def a      # def a          def a
      2 + b    #   2 + 2 * 3      2 + (2 * 3)
    end        # end            end
  • 123.
      Cloning
    def a(b, c)
      b << c
      expire_cache
    end

    a('railsway', 'con')

    def a_railsway_con
      'railsway' << 'con'
      expire_cache
    end
  • 124.
      Peephole Optimization (before)
    x = true   # 0008 getlocal x
    if x       # 0010 branchunless 17
    else       # 0012 jump 14
    end        # 0014 putnil
               # 0015 jump 18
               # 0017 putnil
               # 0018 leave
  • 125.
      Peephole Optimization (after)
    x = true   # 0008 getlocal x
    if x       # 0010 branchunless 15
    else       # 0012 putnil
    end        # 0013 leave
               # 0014 pop
               # 0015 putnil
               # 0016 leave
  • 126.
      OBJECTS
  • 127.
      Object Requirements
      • Unique identifier to represent the object at runtime
    • Methods
      • Change or query object state
      • Command and Query pattern
  • 130.
      Object structure
    typedef unsigned long VALUE;

    struct RBasic {
        VALUE flags;  /* object flags */
        VALUE klass;  /* instance of ... */
    };
  • 131.
      Object structure (cont)
    • Casting
      • Pointer type that represents addresses to language structures
      • RBASIC(obj)->flags
      • ((struct RBasic *)obj)->flags
    • Flags
  • 137.
      Classes / modules structure
    struct RClass {
        struct RBasic basic;            /* object structure */
        rb_classext_t *ptr;             /* external class */
        struct st_table *m_tbl;         /* method table */
        struct st_table *iv_index_tbl;  /* ivars */
    };
  • 138.
      Class / module structure (cont)
    • Casting
      • RCLASS(a_str)->ptr->super #=> Object
      • RCLASS(a_fixnum)->ptr->super #=> Integer
    • Attributes
      • Symbol tables for methods and ivars
      • Class / module distinction through flags
  • 141.
      Special objects
    • Immediates
      • No runtime casting overheads – fits in VALUE
      • nil #=> 4
      • true #=> 2
      • false #=> 0
      • Symbols
      • Fixnums <= 30 bits
      • Floats and Bignum are complex objects – hence poor Floating Point benchmarks
      • RFLOAT(float_obj)->float_value #=> a double
  • 149.
      Object memory layout
    • Object#object_id (32 bit architecture)
      • sizeof(VALUE) is 4 bytes
      • Objects, even, multiples of 4
      • Symbols, even, multiples of 8
      • Integers, odd
      • Immediates <= 4
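
The "integers are odd" rule can be checked from Ruby. This sketch assumes CRuby, where a small integer is stored directly in the VALUE as 2n + 1 and `object_id` reports that tagged value. Ids for `nil`, `true` and heap objects depend on the interpreter version and word size, so they are printed but not relied on here.

```ruby
# Small integers: object_id encodes the value itself as 2n + 1.
pairs = [0, 1, 5, 42].map { |n| [n, n.object_id] }
pairs.each { |n, id| puts "#{n} -> #{id}" }

all_odd = pairs.all? { |_, id| id.odd? }

# Printed for comparison only; these differ between builds.
puts "nil  -> #{nil.object_id}"
puts "true -> #{true.object_id}"
```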
  • 154.
      Mutable Objects
    struct RString {
        struct RBasic basic;
        union {
            struct {
                long len;
                char *ptr;
                union {
                    long capa;
                    VALUE shared;
                } aux;
            } heap;
            char ary[RSTRING_EMBED_LEN_MAX + 1];
        } as;
    };
  • 155.
      Mutable Objects (cont)
    • String and Array
      • Require the ability to shrink / grow capacity
      • Allocate slightly more data than required
      • Avoid malloc, realloc and memmove overhead
      • Short strings “str”
      • Short arrays %w(a r y)
  • 160.
      Shared Objects
    str  = 'railsway'
    str2 = "#{str}con"         # shared ref
    str3 = str << 'con'        # copy + mod
    ary  = %w(railsway con)
    ary2 = ary.dup             # shared ref
    ary3 = ary2.delete_at(1)   # copy + mod
  • 161.
      Method Dispatch
    • Language constraints
      • Loose typing
      • Open classes
      • Method calls can never be reduced to CALL(a_method)
      • Search overhead
    • Dispatch sequence
      • Deref class pointer
      • Check methods table
      • Call method or delegate to superclass
  • 169.  
  • 170.
      call VS send
    • obj.__send__ :method
      • We never call methods
      • Send query / command messages to objects
      • Methods return values – RPC style messaging
    • Method cache
      • Method cache == router
      • 95% hit rate when warm
      • Method redefinition, module inclusion etc. clears the method cache / “routing table”
      • Introduces significant overhead for subsequent method calls
  • 176.
      Method cache don'ts
    class SomeController < AC::Base
      def show
        # busts method cache for the whole VM
        @user.extend SomeBehavior
      end
    end
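
One way to avoid calling `extend` in the request path is to mix the behavior into a wrapper class once, at load time, and wrap the object per request. This is a sketch of that alternative, not something the slides prescribe; `SomeBehavior#display_name`, `User` and `UserWithBehavior` are hypothetical, and `SimpleDelegator` comes from Ruby's standard library.

```ruby
require "delegate"

module SomeBehavior
  def display_name
    "#{name} (guest)"
  end
end

User = Struct.new(:name)

# The module is included exactly once, when this class is defined.
class UserWithBehavior < SimpleDelegator
  include SomeBehavior
end

# Per request: allocate a wrapper instead of changing the object's
# singleton class with @user.extend(SomeBehavior).
user      = User.new("lourens")
decorated = UserWithBehavior.new(user)
puts decorated.display_name
```

The trade-off is one extra allocation and a delegation hop per call, in exchange for leaving the method tables untouched at runtime.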
  • 181.  
  • 182.  
  • 183.  
  • 184.
      Instance var changes
    • Optimizations
      • First 3 ivars are embedded on the object
      • Avoids symbol table lookups
    • ivar table
      • Table is per class, not per object
      • Ivar table is shared by all instances of the same class
      • Saves on memory footprint of a table per instance
  • 188.
      GARBAGE COLLECTION
  • 189.
      Process memory layout
    • Code segment
      • Executable code
      • Read only
    • Stack segment
      • Stack storage
      • Addressed with stack pointers
    • Heap
      • Memory available for program / developer use
  • 192.
      Malloc
    • Usable / free space
      • Managed by a free list
      • Linear search overhead to find free chunks
    • Better layout
      • Index free chunks by size intervals
  • 194.  
  • 195.  
  • 196.
      GC terminology
    • Root set
      • Directly accessible without pointer scanning
      • C stack, global vars, global constants etc.
    • Unreachable hooks
      • Variable assignment to nil
      • Method return etc.
    • Conservative
      • VM hands out raw pointers to objects
  • 199.
      GC strategies
    • Stop the World
      • Minimal allocation overhead
      • Hands out objects while heap space is available
      • Halts execution to reclaim memory
      • Very disruptive in the hot path
    • Incremental
      • Collection activity during allocation
      • Smoother, but with some minor overhead
      • Suitable for hard realtime environments
  • 205.
      Scripting GC
    • Mark and Sweep
      • Identifies live objects
      • Assumes remainder is for collection
      • Concerned with unreachable objects
    • Stop and Copy
      • 2 heap spaces (double memory overhead)
      • 1 active, 1 inactive
      • Copies reachable chunks to the new active area
      • Concerned with live objects
  • 211.
      Common GC Issues
    • Conservative GC
      • Memory fragmentation
      • Dangling pointers
      • Memory leaks from circular garbage
    • Allocation
      • Bursty allocation
      • Knowledge of pointer layout and chunks required
  • 215.
      Ruby heap layout
    • Multiple heaps
      • Referenced through heap list
      • Composed of multiple slots
      • Freed when empty, i.e. if all slots are tagged as free
      • A Rails app allocates 4 to 6 heaps on startup
  • 220.  
  • 221.  
  • 222.  
  • 223.
      Slot layouts
    • Per heap
      • Each slot references a single object
      • Defaults to 10 000 slots for the first heap
      • Threshold of 4096 free slots per heap
      • Free list points to the next free slot
    • Heap growth
      • Each newly allocated heap has 1.8 times the capacity of the previous one
      • That's why memory consumption is so high ...
  • 228.
      Heap growth – small app
  • 235.
      Heap growth – mid to large app
    >> 8 * 1.8 * 1.8 * 1.8 * 1.8
    => 83.9808
    >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
    => 151.16544
    >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
    => 272.097792
    >> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
    => 489.7760256
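
The repeated multiplication above is geometric growth by the 1.8 factor. A small sketch that reproduces the same series, where `heap_sizes` and `GROWTH_FACTOR` are illustrative names and the starting value of 8 is taken from the slide without assuming its unit:

```ruby
GROWTH_FACTOR = 1.8

# Size of each successive heap, starting from `initial`.
def heap_sizes(initial, steps)
  (0..steps).map { |n| initial * GROWTH_FACTOR**n }
end

sizes = heap_sizes(8, 7)
sizes.each_with_index do |size, n|
  puts format("after %d growth steps: %.4f", n, size)
end
```

Because each step multiplies the previous size, the last few heaps dominate the total, which is the point the slide is making about memory consumption.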
  • 241.
      Slot structure
  • 249.
      Pointer layout
    • Self describing
      • Program data area and heap
      • RVALUE union can accommodate any ruby object
      • Frames, variable structures etc. well defined also
      • 40 bytes (64 bit arch) represents a slot
      • Free list points to the next free slot
  • 254.
      Ruby heap VS OS heap
    • Ruby heap
      • 20 bytes represents a slot
      • Slot points to OS data, on the OS / system heap
    • OS heap
      • Thus a 20 byte slot can reference a 2MB chunk on the system heap
  • 256.  
  • 257.
      CRuby: Mark and Sweep
    • Conservative
      • Cannot determine with certainty if a value references an object – assume it's in use
    • Two phase implementation
      • Mark phase: identifies and flags reachable objects from the current program context
      • Sweep phase: iterates through the object space and …
        • frees all objects not marked
        • unmarks marked objects
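
The two phases can be sketched on a toy object graph. This is a simplified model in plain Ruby, with a hypothetical `Node` struct standing in for heap slots and an explicit `roots` array standing in for the root set; it is precise rather than conservative, so it does not model the "assume it's in use" behaviour.

```ruby
# Each node has a mark bit and a list of outgoing references.
Node = Struct.new(:name, :refs, :marked)

# Mark phase: recursive walk from a root, flagging everything reachable.
def mark(node)
  return if node.marked
  node.marked = true
  node.refs.each { |child| mark(child) }
end

# Sweep phase: walk the whole heap, split live from dead,
# and unmark the survivors for the next cycle.
def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |node| node.marked = false }
  [live, dead]
end

a = Node.new("a", [], false)
b = Node.new("b", [a], false)
c = Node.new("c", [], false)   # nothing references c
d = Node.new("d", [], false)
d.refs << d                    # references itself, but no root reaches it

heap  = [a, b, c, d]
roots = [b]

roots.each { |root| mark(root) }
live, dead = sweep(heap)
puts "live: #{live.map(&:name).join(', ')}"
puts "dead: #{dead.map(&:name).join(', ')}"
```

Note that the self-referencing node is collected: reachability from the roots, not reference counts, decides what survives.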
  • 261.
      Concerns
    • Performance
      • Runtime pauses
      • Work proportional to heap size
      • Prone to memory fragmentation (no compaction)
      • Recursive
    • Triggers
      • 8m malloc calls trigger GC
      • Every 8MB allocated triggers GC
      • Not enough heap reserve
  • 267.
      GC in action
    # 4 objs, 1 Array, 3 Strings
    ary1 = %w(a b c)
    ary2 = %w(d e f)
    # both ary1 and ary2 are reachable
    ary1 = nil
    # ary1 and its contents are unreachable
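
A collection can be forced and counted from Ruby, which is handy for experimenting with the example above. The sketch assumes CRuby, where `GC.count` reports the number of collections so far. It deliberately does not try to prove that the specific array was freed: with a conservative collector, a stale reference left on the C stack can keep an object alive for another cycle.

```ruby
runs_before = GC.count

ary1 = %w(a b c)
ary1 = nil          # the array and its three strings are now unreachable

GC.start            # request a full collection
runs_after = GC.count

puts "GC runs: #{runs_before} -> #{runs_after}"
```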
  • 272.  
  • 273.  
  • 274.  
  • 275.
      Generational GC
    • Observations
      • Vast majority of objects are short lived – 80%+
      • Expensive to account for long lived objects
      • Partition by age and frequently collect short lived ones
    • How it works
      • Restrict GC to the most recently modified slots
      • These “sub heaps” are referred to as generations
      • Perform a full GC only when the youngest generation fails to meet memory requirements
  • 281.
      CONCURRENCY
  • 282.
      Threading
    • Changes
      • Native OS Threads
      • Ruby Thread == pthread
      • Multiple cores ftw!
    • … but
      • Syscalls schedule, synchronize and create
      • Much more expensive to spawn and switch than green threads
      • Global VM Lock (GVL)
  • 287.
      Global VM Lock (GVL)
    • How it works
      • Thread that owns the GVL is allowed to execute
      • Blocking operations should release the GVL
      • Automatically released when scheduled
      • C extensions: authors need not concern themselves with synchronization
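
The effect of releasing the GVL during a blocking operation can be observed with `sleep`, which blocks without holding the lock. A rough sketch, assuming CRuby 1.9 or later; the thread count and the 0.2 second duration are arbitrary, and timings will vary with machine load.

```ruby
started = Time.now

threads = 4.times.map do
  Thread.new { sleep 0.2 }   # blocking call: the GVL is released while waiting
end
threads.each(&:join)

elapsed = Time.now - started
puts format("4 threads x 0.2s sleep took %.2fs", elapsed)
```

If the lock were held for the duration of each sleep, the four threads would take roughly 0.8 seconds in total rather than finishing close to 0.2 seconds. CPU-bound Ruby code would not show the same overlap, since only one thread executes Ruby code at a time.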
  • 291.
      Blocking VM operations
    • I/O
      • blocking reads and writes
      • DNS resolution or connects
      • Often has huge handshake overheads
    • Computations, processes and locks
      • Expensive Bignum ops blocked 1.8 interpreters
      • Process.waitpid
      • File locks
  • 296.
      Releasing the GVL
    • Stable API
      • Blocking function: slow system call / computation
      • Unblock function: called on Thread interrupt
    • Pitfalls
      • Cannot access VALUEs (objects) in blocking functions
      • No integration with Ruby's exception / error handler
  • 300.
      Lightweight Concurrency
    • Fibers
      • Coroutines – 4k stack size
      • Very fast user space context switches
      • Cooperative scheduling required
      • Fiber.yield pauses the activation record, which keeps context across multiple calls
    • Use cases
      • Generators
      • Blocking I/O - Neverblock
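
The generator use case can be sketched with a Fibonacci sequence. `Fiber` is part of core Ruby from 1.9 onwards; the variable names are illustrative. Each `resume` runs the fiber until the next `Fiber.yield`, and the locals `a` and `b` keep their values in between.

```ruby
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a        # pause here and hand `a` back to the caller
    a, b = b, a + b      # continues from this line on the next resume
  end
end

# The caller decides when the fiber runs: cooperative scheduling.
first_eight = Array.new(8) { fib.resume }
p first_eight
```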
  • 305.
      In the pipeline
    • MVM: Multiple Virtual Machines
      • Shared process state
      • Sandboxed per VM application state
      • Distribute VMs across available cores
      • Message passing for inter VM communication
      • Most Ruby deployments aren't thread safe
      • MVM is well suited for this
  • 311.
      QUESTIONS ?