How to check valid email? Find using regex(p?)
Not only in Ruby
Brought to you by Piotr Wasiak
Regular Expression
A regular expression is a character sequence that defines a search pattern.
The purpose is:
● validate a string against the pattern
● get parts of the content (e.g. find or find-and-replace in text editors)
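Both purposes can be sketched in a few lines of Ruby. The pattern, the constant name ZIP_CODE and the sample text are assumptions made up for illustration; they are not from the talk.

```ruby
# Hypothetical pattern: two digits, a dash, three digits.
ZIP_CODE = /\A\d{2}-\d{3}\z/

# 1. Validate a string against the pattern.
valid   = ZIP_CODE.match?('00-950')
invalid = ZIP_CODE.match?('00950')

# 2. Get (or replace) parts of the content.
text   = 'Order 123 shipped, order 456 pending'
ids    = text.scan(/\d+/)        # every run of digits
masked = text.gsub(/\d+/, '#')   # find and replace

puts valid, invalid, ids.inspect, masked
```

The anchors \A and \z make the validation match the whole string rather than any substring.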
REGEX history
● The concept of regular languages arose in the 1950s
● Different syntaxes (1980+):
○ POSIX (Basic or Extended Regular Expressions)
○ Perl (influenced/imported to other languages as PCRE 1997, PCRE2 2015)
Basics
Find regex
In a replacement we can use the whole matched phrase or its groups.
Groups are numbered by the position of their opening bracket, and the group number is limited to 1 - 9.
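A minimal Ruby sketch of group references in a replacement. The sample strings and variable names are hypothetical; in Ruby replacement strings, \1 to \9 refer to numbered groups and \0 to the whole match.

```ruby
# Groups are numbered left to right by their opening bracket.
date    = '2024-01-15'
swapped = date.sub(/(\d{4})-(\d{2})-(\d{2})/, '\3.\2.\1')

# \0 stands for the whole matched phrase.
wrapped = 'hello world'.gsub(/\w+/, '<\0>')

# With nested groups the outer bracket opens first, so it is group 1.
nested = 'ab'.match(/((a)(b))/)

puts swapped, wrapped, nested[1], nested[2], nested[3]
```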
REGEX as a finite state machine
Statement validation: /(?<name>ADAM|PIOTR)\s?[=><]{1,2}\s*"(?:PIENIĄDZ|KUKU)"/g
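The statement pattern can be tried in Ruby as a rough sketch. It assumes the backslashes before `s` were lost when the slide was exported, so `\s` is used here. Ruby has no /g flag (match and scan cover that), and the constant name STATEMENT and the sample inputs are made up for illustration.

```ruby
STATEMENT = /(?<name>ADAM|PIOTR)\s?[=><]{1,2}\s*"(?:PIENIĄDZ|KUKU)"/

accepted = STATEMENT.match('ADAM >= "KUKU"')
compact  = STATEMENT.match('PIOTR="PIENIĄDZ"')
rejected = STATEMENT.match('EWA = "KUKU"')   # name is not in the alternation

puts accepted[:name], compact[:name], rejected.inspect
```

Each part of the pattern corresponds to a state the input has to pass through: the name, an optional space, one or two operator characters, optional whitespace, and a quoted value.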
Valid email (1/3)
The best-known Rails gem solution:
Valid email (2/3)
Email validation:
/(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/g
Valid email (3/3)
Simplify valid email
original_regexp =
  %r{(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f!#-\x5b\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?\.)+[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[[:alnum:]]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f!-Z\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])}
# alphanumeric = /[[:alnum:]]/.source
alnum_with_hyphen = /[a-z0-9-]/.source
ip_number_type = /25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/.source
username_without_backslash_prepended_set = /[\x01-\x08\x0b\x0c\x0e-\x1f!#-\x5b\]-\x7f]/.source
domain_port_unescaped_set = /[\x01-\x08\x0b\x0c\x0e-\x1f!-Z\]-\x7f]/.source
domain_port_escaped_chars_set = /[\x01-\x09\x0b\x0c\x0e-\x7f]/.source
non_ending_chars = %r{[a-z0-9!#$%&'*+/=?^_`{|}~-]+}.source
final_with_variables =
  /(?:#{non_ending_chars}(?:\.#{non_ending_chars})*|"(?:#{username_without_backslash_prepended_set}|\\#{domain_port_escaped_chars_set})*")@(?:(?:[[:alnum:]](?:#{alnum_with_hyphen}*[[:alnum:]])?\.)+[[:alnum:]](?:#{alnum_with_hyphen}*[[:alnum:]])?|\[(?:(?:#{ip_number_type})\.){3}(?:#{ip_number_type}|#{alnum_with_hyphen}*[[:alnum:]]:(?:#{domain_port_unescaped_set}|\\#{domain_port_escaped_chars_set})+)\])/
Do not overuse regular expressions (1/2)
Ruby's simple string methods are faster and more meaningful:
● .start_with? / .end_with?
● .include?('some substring')
● .chomp
● .strip
● .lines
● .split(' ') # without regexp
● .tr('from chars', '1-9')
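The string methods listed above can be sketched as follows. All sample values and variable names are hypothetical and chosen only to show each method doing a job that is often given to a regex.

```ruby
filename = 'report.csv'
is_csv   = filename.end_with?('.csv')          # instead of /\.csv\z/
is_rep   = filename.start_with?('report')      # instead of /\Areport/

has_at   = 'user@example.com'.include?('@')    # instead of /@/
trimmed  = "  hello \n".strip                  # instead of gsub(/\A\s+|\s+\z/, '')
no_eol   = "line\n".chomp                      # drops the trailing newline only
rows     = "a\nb\n".lines                      # keeps the line endings
words    = 'a b c'.split(' ')                  # plain string separator
replaced = 'hello'.tr('el', 'ip')              # character-by-character mapping

puts is_csv, is_rep, has_at, trimmed, no_eol, rows.inspect, words.inspect, replaced
```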
Do not overuse regular expressions (2/2)
Libraries and gems for common concepts:
● URI(url)
+ .host / .path / .query / .fragment
● File
+ .dirname(path) / .basename(path) / .extname(path)
● Nokogiri::HTML(
URI.open('https://nokogiri.org/')
)
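A short sketch of the standard-library calls named above, using only URI and File so that it runs without extra gems. The URL and the file path are made-up examples.

```ruby
require 'uri'

# Hypothetical URL, decomposed without any regex.
uri = URI('https://example.com/docs/index.html?lang=en#top')
url_parts = [uri.host, uri.path, uri.query, uri.fragment]

# Hypothetical path; these File methods work on the string only,
# the file does not need to exist.
log_path   = '/var/log/app/server.log'
path_parts = [File.dirname(log_path), File.basename(log_path), File.extname(log_path)]

puts url_parts.inspect, path_parts.inspect
```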
Do not use REGEX as a language parser
Programming languages depend on syntax nodes and trees rather than on flat text.
There will always be problems with exceptions and different coding styles.
In Ruby we need to use Ripper or other tools to decompose Ruby code into pieces.
Markup languages can be parsed more easily and more securely with gems such as Nokogiri, Ox and Oj.
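The difference can be sketched with Ripper from the Ruby standard library. The sample source and variable names are hypothetical. A naive regex also reports method names that appear only in a comment or a string, while the token stream from Ripper.lex tells them apart.

```ruby
require 'ripper'

source = <<~RUBY
  # def commented_out; end
  def real_method
    "def inside_string"
  end
RUBY

# Naive regex: matches every "def <word>", wherever it appears.
naive = source.scan(/def (\w+)/).flatten

# Ripper.lex returns entries shaped like [[line, col], type, token, ...].
tokens = Ripper.lex(source).reject { |entry| entry[1] == :on_sp }

# Keep identifiers that directly follow a real `def` keyword token.
method_names = tokens.each_cons(2)
                     .select { |a, b| a[1] == :on_kw && a[2] == 'def' && b[1] == :on_ident }
                     .map { |_a, b| b[2] }

puts naive.inspect, method_names.inspect
```

This only handles plain `def name` definitions; it is meant to show the idea, not to be a complete method finder.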
Clear Regex
● extract common parts out of alternations
● put the more likely words first in an alternation
● use comments and whitespace
● name capturing groups, and also use non-capturing groups
● split the pattern into smaller logical pieces
● lint code with ruby -w for warnings
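Several of these points can be combined in one small Ruby sketch. The patterns, constant names and inputs below are assumptions for illustration. The /x flag lets the pattern carry whitespace and comments, named groups document what is captured, and `(?:...)` groups without capturing.

```ruby
# Comments, whitespace and named groups via the /x (extended) flag.
ISO_DATE = /
  \A
  (?<year>\d{4})  -   # four-digit year
  (?<month>\d{2}) -   # two-digit month
  (?<day>\d{2})       # two-digit day
  \z
/x

# Common suffix extracted from the alternation, in a non-capturing group.
WEEKDAY = /\A(?:Mon|Tues|Wednes|Thurs|Fri)day\z/

date = ISO_DATE.match('2024-01-15')
puts date[:year], date[:month], date[:day]
puts WEEKDAY.match?('Tuesday'), WEEKDAY.match?('Sunday')
```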
Tools / Websites
● https://www.debuggex.com/
visualized graphs with a cheat sheet
● There is no working visualization plugin for Visual Studio Code
● https://regexr.com/
nice editor with colorized code, explanation on hover and a cheat sheet
● https://rubular.com/ to check that a regex works in Ruby
● https://ruby-doc.org/core-2.7.0/Regexp.html Ruby docs
● https://www.regular-expressions.info/ruby.html
good resource about regular expressions: how to practice them, pitfalls
● https://support.google.com/analytics/answer/1034324?hl=pl
regex is also useful in Google Analytics
Thanks for listening
What’s your question?