
Symbol GC

RubyKaigi 2014

Published in: Technology


  1. 1. Symbol GC #rubykaigi 2014 Narihiro Nakamura - @nari3
  2. 2. Self introduction
  3. 3. Self introduction ✔ Nari, @nari3, authorNari ✔ A CRuby committer. ✔ I work at NaCl. ✔ “Nakamura” is the most powerful clan in Ruby World.
  4. 4. Author http://tatsu-zine.com/books/gcbook
  5. 5. An unmotivated rubyist.
  6. 6. Today's topic
       obj = Object.new
       100_000.times do |i|
         obj.respond_to?("sym#{i}".to_sym)
       end
       GC.start
       puts "symbol : #{Symbol.all_symbols.size}"

       $ ruby-2.1.2 a.rb
       symbol : 102416
       $ ruby-trunk
       symbol : 2833
  7. 7. What is Symbol?
  8. 8. Symbol ✔ A symbol is a primitive data type whose instances have a unique human-readable form. ✔ Symbols can be used as identifiers.
  9. 9. :symbol
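The two properties above can be checked with a minimal Ruby sketch. The variable names are illustrative only: equal names always give the same Symbol object, while equal Strings are separate objects.

```ruby
# Two symbols with the same name are the very same object.
a = :symbol
b = "symbol".to_sym
same_symbol = a.equal?(b)      # identity comparison

# Two strings with the same content are separate objects
# (String.new avoids any sharing between string literals).
s1 = String.new("symbol")
s2 = String.new("symbol")
same_string = s1.equal?(s2)

# Symbols work as identifiers, for example as hash keys or method names.
config  = { mode: :fast }
upcased = "ruby".public_send(:upcase)
```

Because a symbol is unique per name, comparing two symbols is an identity check rather than a character-by-character comparison.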
  10. 10. A pitfall of Symbol
  11. 11. A pitfall of Symbol ✔ No symbols are garbage collected. ✔ Many beginners don't know this fact. ✔ Even good rubyists make this mistake. ✔ Prone to vulnerability ✔ User input → symbol ✔ Puts pressure on memory
  12. 12. Simple cases ✖ if user.respond_to?(params[:method].to_sym) Is this method callable? NG: params[:method] is user input. ✖ params[params[:attr].to_sym] Get a value of a hash via a symbol key. NG: params[:attr] is user input.
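One way to avoid both NG cases is to never convert user input to a symbol at all. The sketch below is only an illustration: `ALLOWED_METHODS`, `User`, `callable?` and `fetch_attr` are hypothetical names, not part of Rails or of the talk.

```ruby
# Hypothetical allowlist of method names a request may ask about.
ALLOWED_METHODS = %w[name email].freeze

User = Struct.new(:name, :email)

# Compare strings against the allowlist first; the user-supplied string
# is never converted to a symbol by this code.
def callable?(user, requested)
  ALLOWED_METHODS.include?(requested) && user.respond_to?(requested)
end

# Look up a value with the string key directly instead of calling to_sym.
def fetch_attr(params, requested)
  params[requested]
end

user   = User.new("nari", "nari@example.com")
params = { "attr" => "value" }
```

With this shape, an attacker who sends thousands of distinct method names only produces short-lived strings.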
  13. 13. Rails DoS Vulnerability CVE-2012-3424 HTTP Request: GET …. WWW-Authenticate: Digest realm="..", nonce="...", algorithm=MD5, qop="auth", foo=”xxx”, .. → Parse to a hash, calling to_sym on each key: digest = { :realm => “..”, :nonce => “..”, . . ., :foo => “..” }
  14. 14. We want Symbol GC ✔ This has been requested for a long time. ✔ Sasada-san has an idea. ✔ I will implement this idea.
  15. 15. Are symbols in other programming languages garbage collectable?
  16. 16. Programming languages that support symbols ✔ Too Many Parentheses Languages ✔ Erlang ✔ Smalltalk ✔ Scala
  17. 17. Symbol GC support
        Language                  Symbol GC
        Erlang                    ✖
        Gauche                    ✖
        Clojure                   ○
        EmacsLisp                 ✖
        VisualWorks (Smalltalk)   ○
        Scala                     ○
  18. 18. Implementation-dependent? ✔ Not unified. ✔ Symbol GC is undocumented in programming language specifications. ✔ Implementation = Specification?
  19. 19. EmacsLisp ✔ Function: unintern ✔ (unintern 'foo) ✔ Declares that a symbol is no longer needed. ✔ It's like manual memory management.
  20. 20. Scala Java main.scala 01: val a = 'sym Symbol Table String “sym”
  21. 21. Scala Java main.scala 01: val a = 'sym 02: a = null String “sym” Symbol Table Weak Reference GC START
  22. 22. Details of CRuby's Symbol
  23. 23. “sym”.to_sym [Diagram] Ruby: String “sym” → freeze → Frozen String “sym”. C: global_symbols { sym_id(hash): ・・・, last_id(long): 1000 → 1001 }
  24. 24. “sym”.to_sym [Diagram] Ruby: String “sym” → freeze → Frozen String “sym”; ID2SYM(ID) → SYMBOL (VALUE) :sym. C: global_symbols { sym_id(hash): “sym” → 1001, last_id(long): 1001 }; ID: 1001
  25. 25. ID ✔ ID: Used at the C level. ✔ Stored in a method table or a variable table. ✔ A unique number that corresponds to a symbol. ✔ Created by rb_intern(“foo”) of the C API. ✔ :sym == :sym → 1001 == 1001
  26. 26. SYMBOL(VALUE) ✔ SYMBOL(VALUE): Used at the Ruby level. ✔ The raw value of :sym or ”sym”.to_sym ✔ Uncollectable.
  27. 27. Why can't garbage symbols be collected?
  28. 28. For example, a C extension stores an ID in a static variable C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 SYMBOL (VALUE) :foo Ruby's C extension static public ID id; SYM2ID(:foo) 1001
  29. 29. If :foo is collected, ID in sym_id will be deleted. C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 SYMBOL (VALUE) :foo Ruby's C extension static public ID id; 1001 GC START
  30. 30. Then “foo”.to_sym is called. :foo == :foo but different ID C Ruby global_symbols sym_id(hash) “foo” 1002 ・ ・・ last_id(long) 1001 SYMBOL (VALUE) :foo Ruby's C extension static public ID id; 1001 1002 Different SYM2ID(:foo) != id
  31. 31. Why garbage symbols can't be collected ✔ Problem: IDs remain on the C side. ✔ We can't detect and manage all IDs held by C extensions. ✔ Same symbol but different ID ✔ Collecting the symbol would create an inconsistent ID.
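The inconsistency can be reproduced with a toy model. This is only a sketch: `ToySymbolTable` is a hypothetical stand-in for `global_symbols`, not CRuby's real implementation, and the starting counter of 1000 simply mirrors the numbers on the slides.

```ruby
# Toy model only: this is not CRuby's real symbol table.
class ToySymbolTable
  def initialize
    @sym_id  = {}    # name => ID, like sym_id(hash) on the slides
    @last_id = 1000  # like last_id(long)
  end

  # Return the ID for a name, numbering a new one if needed.
  def intern(name)
    @sym_id[name] ||= (@last_id += 1)
  end

  # Pretend a naive GC collected the symbol and dropped its table entry.
  def collect(name)
    @sym_id.delete(name)
  end
end

table     = ToySymbolTable.new
cached_id = table.intern("foo")  # a C extension keeps this in a static variable
table.collect("foo")             # hypothetical naive symbol GC
fresh_id  = table.intern("foo")  # "foo".to_sym is called again
```

The cached value and the fresh value now differ even though both stand for the name "foo", which is the mismatch shown as `SYM2ID(:foo) != id`.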
  32. 32. In Ruby world RIP. A symbol is dead... Photo by MIKI Yoshihito, https://www.flickr.com/photos/mujitra/7571022490
  33. 33. In C world WRRRYYYYY!!! I'm still alive....! ID Photo by Zufallsfaktor, https://www.flickr.com/photos/zufallsfaktor/5911338959
  34. 34. How do you create Symbol GC?
  35. 35. Idea
  36. 36. Separate symbols into two types: Immortal Symbol (C World) and Mortal Symbol (Ruby World)
  37. 37. Immortal Symbol ✔ These symbols have a corresponding ID ✔ e.g. method name, variable name, constant name, etc... ✔ Used mainly at the C level ✔ Uncollectable ✔ A symbol stays alive once it has been assigned an ID ✔ There is no transition to Mortal Symbol.
  38. 38. def foo; end C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 Frozen String “foo”
  39. 39. Store an ID to the method table C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 Frozen String “foo” Method table 1001 def foo; end
  40. 40. ID2SYM(ID) → VALUE C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 “foo” ID: 1001 ID2SYM(ID) Immortal Symbol (VALUE) :foo Frozen String Method table 1001 def foo; end
  41. 41. Mortal Symbol ✔ These symbols don't have an ID ✔ “sym”.to_sym → Mortal Symbol ✔ Used mainly at the Ruby level ✔ Collectable ✔ Unreachable symbols are collected. ✔ Can transition to Immortal Symbol.
  42. 42. “bar”.to_sym C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 “foo” ID: 1001 ID2SYM(ID) Immortal Symbol (VALUE) :foo Frozen String Mortal Symbol (VALUE) :bar Frozen String “bar” “bar” :bar
  43. 43. Split into uncollectable and collectable objects C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 “foo” ID: 1001 Uncollectable ID2SYM(ID) Immortal Symbol (VALUE) :foo Frozen String Mortal Symbol (VALUE) :bar Collectable Frozen String “bar” “bar” :bar
  44. 44. :bar will be collected C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 “foo” ID: 1001 ID2SYM(ID) Immortal Symbol (VALUE) :foo Frozen String Mortal Symbol (VALUE) :bar Frozen String “bar” “bar” :bar
  45. 45. If an Immortal Symbol of the same name already exists
  46. 46. def foo; end C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 “foo” ID: 1001 ID2SYM(ID) Immortal Symbol (VALUE) :foo Frozen String
  47. 47. “foo”.to_sym C Ruby global_symbols sym_id(hash) “foo” 1001 ・ ・・ last_id(long) 1001 “foo” ID:1001 ID2SYM(ID) Immortal Symbol (VALUE) :foo Frozen String Mortal Symbol (VALUE) :foo Check Use this one
  48. 48. From Mortal Symbol to Immortal Symbol
  49. 49. define_method(“foo”.to_sym){} C Ruby Mortal Symbol (VALUE) :foo global_symbols sym_id(hash) “foo” :foo
  50. 50. define_method(“foo”.to_sym){} C Ruby Mortal → Immortal Symbol (VALUE) :foo global_symbols sym_id(hash) Method table 0x2c8d0 def foo; end SYM2ID(VALUE) 0x2c8d0 Pin down Uncollectable Address = ID “foo” :foo
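The mortal-to-immortal transition can be sketched as a toy model. Everything here is an assumption made for illustration: `ToySymbols`, `ToySym` and the explicit list of reachable names stand in for CRuby's internal table and its marking phase.

```ruby
# Toy model only; names and structure are assumptions for illustration.
class ToySymbols
  ToySym = Struct.new(:name, :pinned)

  def initialize
    @table = {}  # name => ToySym
  end

  # "name".to_sym: return the existing symbol or create a mortal one.
  def to_sym(name)
    @table[name] ||= ToySym.new(name, false)
  end

  # SYM2ID: asking for an ID pins the symbol (mortal -> immortal).
  # The object's identity stands in for "address = ID".
  def sym2id(sym)
    sym.pinned = true
    sym.object_id
  end

  # GC: drop symbols that are neither pinned nor named as reachable.
  def gc(reachable_names)
    @table.delete_if { |name, sym| !sym.pinned && !reachable_names.include?(name) }
  end

  def names
    @table.keys
  end
end

symbols = ToySymbols.new
symbols.to_sym("bar")              # mortal, nothing refers to it
foo = symbols.to_sym("foo")
symbols.sym2id(foo)                # e.g. define_method needs an ID
symbols.gc([])                     # nothing is reachable any more
remaining = symbols.names
```

Only "foo" survives the collection, because requesting its ID pinned it, while "bar" never needed an ID and is dropped.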
  51. 51. CAUTION
  52. 52. A new pitfall is coming!
  53. 53. Immortal Symbol ✔ Not all symbols are garbage collected. ✔ Immortal symbols are not garbage collected. ✔ A Mortal Symbol becomes an Immortal Symbol when it is assigned an ID. ✔ This can still lead to vulnerabilities!
  54. 54. A new pitfall ✔ Immortal Symbols can increase unintentionally. ✔ For instance: Get a name from a symbol ✔ rb_id2str(SYM2ID(sym)) ✔ Mortal → Immortal ✔ Please use rb_sym2str() ✔ Please watch out for careless use of SYM2ID().
  55. 55. Please keep monitoring ✔ Check Symbol.all_symbols.size ✔ Please report a bug to ruby-core or the library author if the number of symbols keeps increasing. ✔ It's a transition period now. ✔ It will get better gradually.
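A minimal way to monitor this from Ruby is to compare `Symbol.all_symbols.size` before and after a piece of code. The helper name `symbol_growth` is hypothetical, and the exact numbers depend on the interpreter version, so treat the result as a rough indicator only.

```ruby
# Hypothetical helper: how many more symbols exist after running the block
# and a full GC, compared with the count taken just before the block.
def symbol_growth
  GC.start
  before = Symbol.all_symbols.size
  yield
  GC.start
  Symbol.all_symbols.size - before
end

klass = Class.new

# Symbols used as method names need an ID, so they are expected to stay.
pinned_growth = symbol_growth do
  1_000.times { |i| klass.send(:define_method, "monitor_demo_#{i}".to_sym) {} }
end

# Symbols that are only created and dropped are expected to be collectable
# on a Ruby with Symbol GC; the exact result varies between runs.
transient_growth = symbol_growth do
  1_000.times { |i| "monitor_tmp_#{i}".to_sym }
end

puts "pinned: #{pinned_growth}, transient: #{transient_growth}"
```

A growth figure that keeps rising across repeated requests or test runs is the signal worth reporting.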
  56. 56. Details of implementation (for CRuby Hackers)
  57. 57. Static Symbol, Dynamic Symbol ✔ Static Symbol = Immediate value ✔ Immortal ✔ Dynamic Symbol = RVALUE ✔ Mortal or Immortal ✔ Changes to an Immortal Symbol when it needs an ID. ✔ Similar to Float and FLONUM
  58. 58. Details of RSymbol struct
        struct RSymbol {
            struct RBasic basic;
            VALUE fstr;   /* Frozen String “foo” */
            ID type;      /* ID_LOCAL 0b00000, ID_INSTANCE 0b00010, ID_GLOBAL 0b00110, ID_ATTRSET 0b01000, ... */
        };
  59. 59. ID Structure
        0bxxx.....xxx 000    High-order 61 bits = Counter, low-order 3 bits = ID type
        0bxxx.....xx 000 1   Low-order 1 bit = Static Symbol Flag
  60. 60. Fast ID recognition ✔ Low-order 1 bit = 1 → Static Symbol ✔ Dynamic Symbol ID = RVALUE address ✔ Low-order 1 bit = 0 ✔ Only the lowest bit needs to be checked.
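The bit layout can be tried out with a toy encoding in Ruby. This is a sketch under assumptions: the type constants reuse the bit patterns from the RSymbol slide, the counter is placed above the three type bits and the flag bit, and `static_id`, `static_id?` and `id_type` are hypothetical helpers rather than CRuby macros.

```ruby
# Toy encoding that follows the slide's layout; not CRuby's actual macros.
ID_LOCAL    = 0b0000
ID_INSTANCE = 0b0010
ID_GLOBAL   = 0b0110
ID_ATTRSET  = 0b1000

TYPE_MASK   = 0b1110  # three type bits just above the flag bit
STATIC_FLAG = 0b0001  # lowest bit set => static symbol

# Build a static-symbol ID from a counter and one of the type constants.
def static_id(counter, type_bits)
  (counter << 4) | type_bits | STATIC_FLAG
end

# The check looks only at the lowest bit.
def static_id?(id)
  (id & STATIC_FLAG) == 1
end

# Extract the type bits of an ID.
def id_type(id)
  id & TYPE_MASK
end

id         = static_id(1001, ID_GLOBAL)
dynamic_id = 0x2c8d0  # an aligned address, so its lowest bit is 0
```

Since object addresses are aligned, a dynamic symbol's address never has the lowest bit set, which is why a single bit test is enough to tell the two kinds apart.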
  61. 61. Conclusion
  62. 62. Conclusion ✔ Most symbols will be garbage collected. ✔ But some symbols won't be garbage collected. ✔ “sym”.to_sym → OK ✔ define_method(“sym”.to_sym){} → NG
  63. 63. Acknowledgments ✔ Sasada-san ✔ Taught me the idea of Symbol GC. ✔ Refined the code of Symbol GC. ✔ Nakada-san, Tsujimoto-san, U.Nakamura-san, etc... ✔ Fixed many bugs. ✔ NaCl members
  64. 64. Thank you!
