New regular expression features in Ruby 2.0.0

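In Ruby 2.0.0 the regular expression engine was changed to Onigmo and gained new features, but little information was available, so this talk presents what I was able to find out.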

  1. Regexp.new(2.0.0)
     New regular expression features in Ruby 2.0.0
     西山和広 (Kazuhiro Nishiyama), 日本Rubyの会
     Powered by Rabbit 2.0.8
  2. Onigmo
     Onigmo (Oniguruma-mod)
     The NEWS file for Ruby 2.0.0 says only the following:
         Merge Onigmo
         https://github.com/k-takata/Onigmo
     No further details are given.
  3. New feature (1) \K
     Examples without \K:
         "foobar".sub(/(?<=foo)bar/, "") #=> "foo"
         "foobar".sub(/(?<=fo*)bar/, "")
         # SyntaxError: invalid pattern in look-behind: /(?<=fo*)bar/
     Examples with \K:
         "foobar".sub(/foo\Kbar/, "") #=> "foo"
         "foobar".sub(/fo*\Kbar/, "") #=> "foo"
  4. New feature (1) \K
     Use case: operate on the first non-blank text of each line.
     Example with \K:
         gsub(/^ *\K(\d+)/) { $1.to_i+1 }
     Example without \K:
         gsub(/^( *)(\d+)/) { "#{$1}#{$2.to_i+1}" }
  5. New feature (2) \R
     Linebreak
     Unicode:
         (?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
     Not Unicode:
         (?>\x0D\x0A|[\x0A-\x0D])
  6. New feature (3) \X
     eXtended grapheme cluster
     Unicode:
         (?>\P{M}\p{M}*)
     Not Unicode:
         (?m:.)
  7. Extended grapheme cluster
     Example:
         "\u{304B 3099}"[/\X/].size #=> 2
     U+304B HIRAGANA LETTER KA
     U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
     See [UAX #29] (Unicode Standard Annex #29) for more detail.
  8. New feature (4)
     Conditional expression:
         (?(cond)yes)
         (?(cond)yes|no)
     Example:
         " :f o o "[/:(["])?(?(1)[\w\s]+\1|\w+)/]
         #=> ":f"
         ":\"f o o\""[/:(["])?(?(1)[\w\s]+\1|\w+)/]
         #=> ":\"f o o\""
  9. (?adu)
     Character set option (character range option)
         d: Default (compatible with Ruby 1.9.3)
         a: ASCII
         u: Unicode
     See doc/RE in Onigmo for more detail.
  10. (?adu)
     Examples:
         "\u{3042}"[/\w/] #=> nil
         "\u{3042}"[/(?a)\w/] #=> nil
         "\u{3042}"[/(?d)\w/] #=> nil
         "\u{3042}"[/(?u)\w/] #=> "あ"
         /a\b/ =~ "a\u{3042}" #=> nil
         /(?a)a\b/ =~ "a\u{3042}" #=> 0
         /(?d)a\b/ =~ "a\u{3042}" #=> nil
         /(?u)a\b/ =~ "a\u{3042}" #=> nil
  11. (?adu)
     There is no (?-a), (?-d), or (?-u), unlike (?-i), (?-m), and (?-x).
  12. Character Property
     Support for Unicode blocks.
     Example:
         /\p{InHiragana}/ =~ "\u3042" #=> 0
         /\p{InCJKUnifiedIdeographs}/ =~ "\u3042" #=> nil
     See tool/enc-unicode.rb in Onigmo for more detail.
  13. /\z/
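
The \K examples on slides 3 and 4 can be tried as a small script. This is only a sketch: the method names strip_after_prefix and bump_leading_numbers are made up for illustration and do not come from the slides.

```ruby
# \K drops everything matched so far from the reported match,
# so only the text after \K is replaced.
# (Method names here are hypothetical, chosen for this sketch.)
def strip_after_prefix(text)
  # Unlike (?<=fo*), the part before \K may have variable length.
  text.sub(/fo*\Kbar/, "")
end

# Increment the first number on each line while keeping its indentation.
def bump_leading_numbers(text)
  # The block receives only the digits, because \K excludes the leading spaces.
  text.gsub(/^ *\K\d+/) { |digits| (digits.to_i + 1).to_s }
end

p strip_after_prefix("foobar")                      # => "foo"
p bump_leading_numbers("  1 apple\n 10 pears\n")    # => "  2 apple\n 11 pears\n"
```

Compared with the capture-group version on slide 4, the indentation never has to be copied back into the replacement.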
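
For \R and \X (slides 5 to 7), the following sketch shows one way they might be used: splitting text on any line break and splitting a string into grapheme clusters. The helper names split_lines and grapheme_clusters are assumptions made for this example.

```ruby
# \R matches any line break and treats "\r\n" as a single unit,
# so mixed line endings do not produce empty fields.
# (Helper names are hypothetical, chosen for this sketch.)
def split_lines(text)
  text.split(/\R/)
end

# \X matches one extended grapheme cluster:
# a base character followed by its combining marks.
def grapheme_clusters(text)
  text.scan(/\X/)
end

p split_lines("one\r\ntwo\nthree\rfour")
# => ["one", "two", "three", "four"]

# U+304B + U+3099 (KA plus combining voiced sound mark), then U+304D (KI)
clusters = grapheme_clusters("\u304B\u3099\u304D")
p clusters.map(&:size)
# => [2, 1]
```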
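
The conditional expression on slide 8 picks a branch depending on whether group 1 took part in the match. The sketch below uses the same pattern. The constant SYMBOL_LITERAL and the method symbol_literal are names invented for this example, and the quoted input is an assumed illustration of the "yes" branch.

```ruby
# If group 1 (an opening double quote) matched, require [\w\s]+ followed by
# the same quote (\1); otherwise accept only \w+.
# (Constant and method names are hypothetical.)
SYMBOL_LITERAL = /:(["])?(?(1)[\w\s]+\1|\w+)/

def symbol_literal(text)
  text[SYMBOL_LITERAL]
end

# No opening quote: the "no" branch (\w+) stops at the first space.
p symbol_literal(' :f o o ')     # => ":f"

# Opening quote present: the "yes" branch runs up to the closing quote.
p symbol_literal(' :"f o o" ')   # => ":\"f o o\""
```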
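
The (?adu) options and the Unicode block properties (slides 9 to 12) can be checked side by side. This sketch reruns the slide examples against U+3042; the constant and variable names are chosen here for illustration only.

```ruby
HIRAGANA_A = "\u3042" # U+3042 HIRAGANA LETTER A (hypothetical constant name)

# Default mode: \w is ASCII-only, so it does not match a hiragana letter.
default_word = HIRAGANA_A[/\w/]
# (?u) makes \w follow Unicode rules.
unicode_word = HIRAGANA_A[/(?u)\w/]

# \b follows the encoding's rules by default, so there is no word boundary
# between "a" and a hiragana letter. With (?a), \b is ASCII-only and matches.
default_boundary = /a\b/ =~ "a#{HIRAGANA_A}"
ascii_boundary   = /(?a)a\b/ =~ "a#{HIRAGANA_A}"

# Unicode block properties use the In prefix.
in_hiragana = /\p{InHiragana}/ =~ HIRAGANA_A
in_cjk      = /\p{InCJKUnifiedIdeographs}/ =~ HIRAGANA_A

p default_word                  # => nil
p unicode_word == HIRAGANA_A    # => true
p default_boundary              # => nil
p ascii_boundary                # => 0
p in_hiragana                   # => 0
p in_cjk                        # => nil
```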
