Symbol GC 
#rubykaigi 2014 
Narihiro Nakamura - @nari3
  1. Symbol GC. #rubykaigi 2014. Narihiro Nakamura - @nari3
  2. Self introduction
  3. Self introduction ✔ Nari, @nari3, authorNari ✔ A CRuby committer. ✔ I work at NaCl. ✔ “Nakamura” is the most powerful clan in the Ruby world.
  4. Author http://tatsu-zine.com/books/gcbook
  5. An unmotivated Rubyist.
  6. Today's topic
       obj = Object.new
       100_000.times do |i|
         obj.respond_to?("sym#{i}".to_sym)
       end
       GC.start
       puts "symbol : #{Symbol.all_symbols.size}"
     $ ruby-2.1.2 a.rb
     symbol : 102416
     $ ruby-trunk
     symbol : 2833
  7. What is a Symbol?
  8. Symbol ✔ A symbol is a primitive data type whose instances have a unique human-readable form. ✔ Symbols can be used as identifiers.
  9. :symbol
  10. A pitfall of Symbol
  11. A pitfall of Symbol ✔ No symbols are garbage collected. ✔ Many beginners don't know this. ✔ Even good Rubyists make this mistake. ✔ Prone to vulnerabilities ✔ User input → symbol ✔ Puts pressure on memory
  12. Simple cases ✖ if user.respond_to?(params[:method].to_sym) Is this method callable? NG: params[:method] is user input. ✖ params[params[:attr].to_sym] Gets a hash value via a symbol key. NG: params[:attr] is user input.
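One way to avoid the pattern on the "Simple cases" slide is to compare the user-supplied String against a fixed allowlist and never call to_sym on it. The following is a minimal sketch, not code from the talk; `ALLOWED_METHODS`, `User`, and `safe_call` are hypothetical names introduced only for illustration.

```ruby
# Hypothetical allowlist of method names a request may invoke.
ALLOWED_METHODS = %w[name email].freeze

# Stand-in for a model object; not from the talk.
User = Struct.new(:name, :email)

# Returns the result of calling the requested method, or nil when the
# name is not on the allowlist. The user-supplied string is compared as
# a String and is never passed to to_sym, so unknown names create no
# new symbols.
def safe_call(user, requested)
  return nil unless ALLOWED_METHODS.include?(requested)
  user.public_send(requested)
end

user = User.new("nari", "nari@example.com")
p safe_call(user, "name")
p safe_call(user, "attacker_supplied_name")
```

The allowlisted names already exist as method names, so dispatching to them does not grow the symbol table.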
  13. Rails DoS Vulnerability: CVE-2012-3424. HTTP request: GET …, with the header WWW-Authenticate: Digest realm="..", nonce="...", algorithm=MD5, qop="auth", foo="xxx", .. The header is parsed to a hash and to_sym is called on every key: digest = { :realm => "..", :nonce => "..", :foo => "..", ... }
  14. We want Symbol GC ✔ This has been requested for a long time. ✔ Sasada-san has an idea. ✔ I will implement this idea.
  15. Are symbols in other programming languages garbage collectable?
  16. Programming languages that support symbols ✔ Too Many Parentheses Languages ✔ Erlang ✔ Smalltalk ✔ Scala
  17. Symbol GC support (language: Symbol GC) Erlang: ✖, Gauche: ✖, Clojure: ○, EmacsLisp: ✖, VisualWorks (Smalltalk): ○, Scala: ○
  18. Implementation dependency? ✔ Not unified. ✔ Symbol GC is undocumented in programming language specifications. ✔ Implementation = Specification?
  19. EmacsLisp ✔ Function: unintern ✔ (unintern 'foo) ✔ Declares a symbol unnecessary. ✔ It's like manual memory management.
  20. Scala: main.scala contains var a = 'sym [Diagram: on the Java side, the symbol table refers to the String "sym".]
  21. Scala: main.scala contains var a = 'sym, then a = null [Diagram: the symbol table refers to the String "sym" only through a weak reference; GC START.]
  22. Details of CRuby's Symbol
  23. "sym".to_sym [Diagram: on the C side, global_symbols holds sym_id (hash) and last_id (long) = 1000; the Ruby String "sym" is frozen and the frozen String is registered with ID 1001.]
  24. "sym".to_sym [Diagram: sym_id now maps the frozen String "sym" to 1001; ID 1001 is converted by ID2SYM(ID) to the SYMBOL (VALUE) :sym on the Ruby side.]
  25. ID ✔ ID: used at the C level. ✔ IDs are stored in a method table or a variable table. ✔ A unique number that corresponds to a symbol. ✔ Created by rb_intern("foo") in the C API. ✔ :sym == :sym → 1001 == 1001
  26. SYMBOL(VALUE) ✔ SYMBOL(VALUE): used at the Ruby level. ✔ The raw data of :sym or "sym".to_sym ✔ Uncollectable.
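The uniqueness described on the ID slide (":sym == :sym → 1001 == 1001") can be observed from the Ruby level. This small sketch only shows that equal symbols are one and the same object; note that object_id is a Ruby-level identifier and is used here as a stand-in, it is not the C-level ID from the slides.

```ruby
a = :sym
b = "sym".to_sym

# Two symbols with the same name are the same object.
puts a.equal?(b)                    # => true
puts a.object_id == b.object_id     # => true

# Two strings with the same content are equal but distinct objects.
s1 = String.new("sym")
s2 = String.new("sym")
puts s1 == s2                       # => true
puts s1.equal?(s2)                  # => false
```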
  27. Why can't garbage symbols be collected?
  28. For example, an ID is stored in the static area of a C extension. [Diagram: sym_id maps "foo" to 1001 and last_id = 1001; the SYMBOL (VALUE) :foo lives on the Ruby side; a Ruby C extension declares static public ID id; and stores SYM2ID(:foo) = 1001 in it.]
  29. If :foo is collected, the ID in sym_id will be deleted. [Diagram: GC START; the C extension's static ID still holds 1001.]
  30. Then "foo".to_sym is called. :foo == :foo, but the ID is different. [Diagram: sym_id now maps "foo" to 1002 while the C extension still holds 1001, so SYM2ID(:foo) != id.]
  31. Why can't garbage symbols be collected? ✔ Problem: IDs remain on the C side. ✔ We can't detect and manage all IDs in C extensions. ✔ Same symbol but different ID ✔ Collecting the symbol would create an inconsistent ID.
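The inconsistency on the three slides above can be replayed with a toy model. This is a sketch under stated assumptions, not CRuby's implementation: `ToySymbolTable`, `intern`, and `naive_collect` are hypothetical names, and the table is reduced to the `sym_id` hash and `last_id` counter shown in the diagrams.

```ruby
# Toy model of the symbol table on the slides; not CRuby's actual code.
class ToySymbolTable
  def initialize
    @sym_id = {}      # name => ID, like sym_id (hash)
    @last_id = 1000   # like last_id (long)
  end

  # Returns the existing ID for name, or assigns the next number.
  def intern(name)
    @sym_id[name] ||= (@last_id += 1)
  end

  # What a naive Symbol GC would do: forget the entry for name.
  def naive_collect(name)
    @sym_id.delete(name)
  end
end

table = ToySymbolTable.new
cached_id = table.intern("foo")   # a C extension keeps this in a static variable
table.naive_collect("foo")        # :foo looks unreachable from Ruby and is collected
new_id = table.intern("foo")      # "foo".to_sym is called again

puts "cached: #{cached_id}, new: #{new_id}"   # => cached: 1001, new: 1002
puts "consistent? #{cached_id == new_id}"     # => consistent? false
```

The stale value in the extension cannot be found or updated by the collector, which is why the naive approach is unsafe.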
  32. In the Ruby world: RIP. A symbol is dead... Photo by MIKI Yoshihito, https://www.flickr.com/photos/mujitra/7571022490
  33. In the C world: WRRRYYYYY!!! I'm still alive....! (ID) Photo by Zufallsfaktor, https://www.flickr.com/photos/zufallsfaktor/5911338959
  34. How do you create Symbol GC?
  35. Idea
  36. Separate symbols into two types: Immortal Symbol (C world) and Mortal Symbol (Ruby world)
  37. Immortal Symbol ✔ These symbols have a corresponding ID. ✔ e.g. method names, variable names, constant names, etc. ✔ Used mainly at the C level ✔ Uncollectable ✔ A symbol stays alive once an ID has been assigned. ✔ There is no transition to Mortal Symbol.
  38. def foo; end [Diagram: on the C side, sym_id (hash) maps "foo" to 1001 and last_id (long) = 1001; the name is held as the frozen String "foo".]
  39. Store the ID in the method table [Diagram: as on the previous slide, plus a method table entry 1001 for def foo; end.]
  40. ID2SYM(ID) → VALUE [Diagram: ID 1001 is converted by ID2SYM(ID) to the immortal symbol (VALUE) :foo on the Ruby side; the method table holds 1001 for def foo; end.]
  41. Mortal Symbol ✔ These symbols don't have an ID. ✔ "sym".to_sym → Mortal Symbol ✔ Used mainly at the Ruby level ✔ Collectable ✔ Unreachable symbols are collected. ✔ A mortal symbol can become an Immortal Symbol.
  42. "bar".to_sym [Diagram: next to the immortal symbol :foo (ID: 1001), a mortal symbol (VALUE) :bar is created with its frozen String "bar"; the table maps "bar" to :bar.]
  43. Split into uncollectable and collectable objects [Diagram: as on the previous slide, with the immortal symbol :foo labeled uncollectable and the mortal symbol :bar labeled collectable.]
  44. :bar will be collected [Diagram: same table; the mortal symbol :bar and its frozen String "bar" are the parts that get collected.]
  45. If you already have an Immortal Symbol of the same name
  46. def foo; end [Diagram: sym_id maps "foo" to 1001; ID 1001 corresponds through ID2SYM(ID) to the immortal symbol (VALUE) :foo.]
  47. "foo".to_sym [Diagram: the table is checked first; the existing immortal symbol :foo is used instead of creating a mortal symbol :foo.]
  48. From Mortal Symbol to Immortal Symbol
  49. define_method("foo".to_sym){} [Diagram: a mortal symbol (VALUE) :foo exists on the Ruby side; sym_id (hash) maps "foo" to :foo.]
  50. define_method("foo".to_sym){} [Diagram: SYM2ID(VALUE) returns 0x2c8d0, the symbol's address, which serves as its ID (address = ID); the method table stores 0x2c8d0 for def foo; end; the symbol is pinned down and changes from mortal to immortal (uncollectable).]
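The mortal/immortal split can be summarized with another toy model. It is an illustration of the idea only, under loud assumptions: `ToySymbol`, `ToyTable`, and `sweep` are hypothetical names, reachability is passed in by hand rather than discovered by a collector, and `object_id` stands in for the RVALUE address that the slides use as the ID.

```ruby
# Toy model of the mortal/immortal split; names and structure are illustrative.
class ToySymbol
  attr_reader :name
  attr_accessor :immortal

  def initialize(name)
    @name = name
    @immortal = false   # every dynamically created symbol starts out mortal
  end
end

class ToyTable
  def initialize
    @symbols = {}   # name => ToySymbol
  end

  # Like "name".to_sym: reuse an existing symbol, else create a mortal one.
  def to_sym(name)
    @symbols[name] ||= ToySymbol.new(name)
  end

  # Like SYM2ID: handing out an ID pins the symbol forever.
  def sym2id(sym)
    sym.immortal = true
    sym.object_id   # stand-in for "address = ID"
  end

  # Like a GC sweep: drop mortal symbols that are not in `reachable`.
  def sweep(reachable)
    @symbols.delete_if { |_, s| !s.immortal && !reachable.include?(s) }
  end

  def names
    @symbols.keys
  end
end

table = ToyTable.new
table.to_sym("bar")            # mortal, never given an ID
foo = table.to_sym("foo")
id = table.sym2id(foo)         # e.g. define_method("foo".to_sym) {}
table.sweep([])                # nothing is referenced from Ruby any more

p table.names                                  # => ["foo"]
p table.sym2id(table.to_sym("foo")) == id      # => true
```

Because a pinned symbol is never removed, an ID that was handed to the C side stays valid, which avoids the inconsistency of the naive approach.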
  51. CAUTION
  52. A new pitfall is coming!
  53. Immortal Symbol ✔ Not all symbols are garbage collected. ✔ Immortal symbols are not garbage collected. ✔ A mortal symbol becomes immortal when an ID is assigned to it. ✔ This can still lead to vulnerabilities!
  54. A new pitfall ✔ Immortal symbols can increase unintentionally. ✔ For instance: getting a name from a symbol ✔ rb_id2str(SYM2ID(sym)) ✔ Mortal → Immortal ✔ Please use rb_sym2str() instead. ✔ Please watch out for careless use of SYM2ID().
  55. Please keep monitoring ✔ Check Symbol.all_symbols.size ✔ Please report a bug to ruby-core or to the library author if the number of symbols keeps increasing. ✔ It's a transition period now. ✔ It will get better gradually.
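The monitoring advice can be tried with a short script. This is a rough sketch, assuming a Ruby build that includes Symbol GC; `count_symbols`, `holder`, and the `gc_demo_` prefixes are hypothetical names chosen for the demo. The exact number of surviving "mortal" symbols depends on the collector, so no expected output is given.

```ruby
# Counts the live symbols whose names start with the given prefix.
def count_symbols(prefix)
  Symbol.all_symbols.count { |s| s.to_s.start_with?(prefix) }
end

N = 1_000
holder = Class.new   # keeps the defined methods (and their names) alive

# Case 1: symbols that are created and immediately dropped.
N.times { |i| "gc_demo_mortal_#{i}".to_sym }

# Case 2: symbols used as method names, which require an ID.
N.times { |i| holder.send(:define_method, "gc_demo_pinned_#{i}".to_sym) {} }

GC.start
puts "total  : #{Symbol.all_symbols.size}"
puts "mortal : #{count_symbols('gc_demo_mortal_')}"
puts "pinned : #{count_symbols('gc_demo_pinned_')}"
```

If the "pinned" style of count keeps growing in a long-running process, some code path is probably turning user-controlled strings into immortal symbols.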
  56. Details of implementation (for CRuby hackers)
  57. Static Symbol, Dynamic Symbol ✔ Static Symbol = immediate value ✔ Immortal ✔ Dynamic Symbol = RVALUE ✔ Mortal or immortal ✔ Changes to an immortal symbol when it needs an ID. ✔ Similar to Float and FLONUM
  58. Details of the RSymbol struct: struct RSymbol { struct RBasic basic; VALUE fstr; ID type; }; fstr is the frozen String "foo"; type is one of ID_LOCAL 0b00000, ID_INSTANCE 0b00010, ID_GLOBAL 0b00110, ID_ATTRSET 0b01000, ...
  59. ID structure. 0bxxx.....xxx 000: high-order 61 bits = counter, low-order 3 bits = ID type. 0bxxx.....xx 000 1: low-order 1 bit = static symbol flag.
  60. Fast ID recognition ✔ Low-order 1 bit = 1 → Static Symbol ✔ Dynamic Symbol ID = RVALUE address ✔ Low-order 1 bit = 0 ✔ Only the lowest bit needs to be checked.
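The bit layout on the last two slides can be modeled with plain Integers. This is a sketch of the idea rather than CRuby's definitions: the type tag values are copied from the RSymbol slide, while `TYPE_MASK`, `COUNTER_SHIFT`, `make_static_id`, `static_id?`, and `id_type` are assumptions read off the second diagram (counter, then 3 type bits, then the flag bit).

```ruby
# Lowest bit set means "static symbol", as on the slide.
STATIC_SYM_FLAG = 0b1

# Type tags as listed on the RSymbol slide.
ID_LOCAL    = 0b00000
ID_INSTANCE = 0b00010
ID_GLOBAL   = 0b00110
ID_ATTRSET  = 0b01000

TYPE_MASK     = 0b01110   # assumption: the 3 bits directly above the flag
COUNTER_SHIFT = 4         # assumption: the counter sits above type and flag

# Builds a static symbol ID from a counter value and a type tag.
def make_static_id(counter, type)
  (counter << COUNTER_SHIFT) | type | STATIC_SYM_FLAG
end

# The fast check: only the lowest bit is inspected.
def static_id?(id)
  (id & STATIC_SYM_FLAG) == 1
end

# Extracts the type tag of a static ID.
def id_type(id)
  id & TYPE_MASK
end

static  = make_static_id(1001, ID_INSTANCE)
dynamic = 0x2c8d0   # address-like value from the define_method slide; lowest bit is 0

puts static_id?(static)                   # => true
puts static_id?(dynamic)                  # => false
puts id_type(static) == ID_INSTANCE       # => true
puts (static >> COUNTER_SHIFT) == 1001    # => true
```

Object addresses are aligned, so their lowest bit is always 0, which is what lets an address double as a dynamic symbol's ID without colliding with static IDs.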
  61. Conclusion
  62. Conclusion ✔ Most symbols will be garbage collected. ✔ But some symbols won't be garbage collected. ✔ "sym".to_sym → OK ✔ define_method("sym".to_sym){} → NG
  63. Acknowledgments ✔ Sasada-san ✔ Taught me the idea of Symbol GC. ✔ Refined the code of Symbol GC. ✔ Nakada-san, Tsujimoto-san, U.Nakamura-san, etc. ✔ Fixed many bugs. ✔ NaCl members
  64. Thank you!