How to check valid email? Find using RegEx(p?)
Not only in Ruby
Brought to DRUG by Piotr Wasiak, 20.02.2023
Agenda
1. RegEx overview
2. Recommendations
3. Ruby quirks / amenities
4. Tools / Resources
5. Advanced RE(2)
6. Ruby 3.2 RE changes
Who am I?
Piotr Wasiak
Ruby, Rails developer
Current PRUG organiser
Interests:
● climbing, hiking, squash
● contract bridge, chess
● ruby, programming, crypto
Regular Expression
A regular expression is a sequence of characters that defines a search pattern.
Its purpose is to:
● validate a string against the pattern
● extract parts of the content (e.g. find, or find and replace, in text editors)
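Both purposes in a few lines of Ruby. This is a minimal sketch; the date format and sample strings are illustrative assumptions, not taken from the slides.

```ruby
# Validate: does the WHOLE string look like a DD.MM.YYYY date?
date_format = /\A\d{2}\.\d{2}\.\d{4}\z/
date_format.match?("20.02.2023") # => true
date_format.match?("2023-02-20") # => false

# Extract: pull named parts out of a longer text.
found = /(?<day>\d{2})\.(?<month>\d{2})\.(?<year>\d{4})/.match("DRUG meetup on 20.02.2023")
found[:year] # => "2023"

# Find and replace: reorder the parts using named backreferences.
iso = "20.02.2023".sub(/(?<d>\d{2})\.(?<m>\d{2})\.(?<y>\d{4})/, '\k<y>-\k<m>-\k<d>')
iso # => "2023-02-20"
```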
RegEx history
● The concept of regular languages arose in the 1950s
● Different syntaxes (1980+):
○ POSIX (Basic and Extended Regular Expressions)
○ Perl (adopted by other languages through PCRE in 1997 and PCRE2 in 2015)
RegEx as a state machine
Statement validation: /(?<name>ADAM|PIOTR)\s?[=><]{1,2}\s*"(?:PIENIĄDZ|KUKU)"/g
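The statement pattern above uses regex101-style notation; Ruby has no /g flag. A sketch of the same pattern in Ruby, where `match` returns the first hit and `String#scan` plays the role of /g (the sample strings are made up for illustration):

```ruby
# The slide's pattern, Ruby syntax (no trailing /g).
statement = /(?<name>ADAM|PIOTR)\s?[=><]{1,2}\s*"(?:PIENIĄDZ|KUKU)"/

first = statement.match('PIOTR >= "KUKU"')
first[:name] # => "PIOTR"

# scan finds every match; with named groups it returns the named captures.
text = 'ADAM = "PIENIĄDZ"; PIOTR<"KUKU"; EWA = "KUKU"'
names = text.scan(statement) # => [["ADAM"], ["PIOTR"]]
```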
Basics
Find RegEx
In a replacement we can use the whole matched phrase or individual groups.
Groups are numbered by the position of their opening bracket, and numbered references are limited to 1-9.
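The numbering rule in Ruby terms: a sketch using `MatchData#[]` and `String#sub` replacement strings (the sample strings are hypothetical). In a Ruby replacement string, `\0` is the whole match, `\1`..`\9` are groups, and `\10` is read as group 1 followed by a literal `0`.

```ruby
# Groups are numbered by the position of their OPENING bracket:
# 1 = ((\w+)@(\w+)), 2 = (\w+) before "@", 3 = (\w+) after "@"
m = /((\w+)@(\w+))/.match("contact: piotr@example")
m[1] # => "piotr@example"
m[2] # => "piotr"
m[3] # => "example"

# In the replacement string: \0 is the whole match, \1..\9 are groups.
swapped = "piotr@example".sub(/(\w+)@(\w+)/, '\2 at \1 (was \0)')
swapped # => "example at piotr (was piotr@example)"

# Only one digit is read, so \10 means "group 1, then the character 0".
tenth = "abcdefghij".sub(/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)/, '\10')
tenth # => "a0"
```

Named groups with `\k<name>` sidestep the single-digit limit.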
Valid email (1/3)
Solution from a popular Rails gem:
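The gem's pattern itself was shown as an image and is not recoverable here. For comparison, a sketch of two options that need no gem: the ready-made constant `URI::MailTo::EMAIL_REGEXP` from Ruby's standard `uri` library, and a deliberately loose hand-written check (`LOOSE_EMAIL` is a hypothetical name).

```ruby
require "uri"

# Ruby's standard library ships a practical e-mail pattern.
URI::MailTo::EMAIL_REGEXP.match?("piotr@example.com") # => true
URI::MailTo::EMAIL_REGEXP.match?("piotr@")            # => false

# A deliberately loose check: something@something, one "@", no whitespace.
LOOSE_EMAIL = /\A[^@\s]+@[^@\s]+\z/
LOOSE_EMAIL.match?("piotr@example.com")  # => true
LOOSE_EMAIL.match?("piotr @example.com") # => false
```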
Valid email (2/3)
Email validation:
/(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/g
Valid email (3/3)
2. Recommendations
Simplify valid email
original_regexp = %r{(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f!#-\x5b\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?\.)+[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[[:alnum:]]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f!-Z\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])}
alnum_with_hypen = /[a-z0-9-]/.source # posix alternative /[-[:alnum:]]/
ip_number_type = /25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/.source
common_parts = /[\x01-\x08\x0b\x0c\x0e-\x1f\]-\x7f]/.source
username_without_backslash_prepended_set = /[#{common_parts}!#-\x5b]/.source
domain_port_unescaped_set = /[#{common_parts}!-Z]/.source
domain_port_escaped_chars_set = /[#{common_parts}\x0e-\x7f]/.source
non_ending_chars = %r{[a-z0-9!#$%&'*+/=?^_`{|}~-]+}.source
final_with_variables = /(?:#{non_ending_chars}(?:\.#{non_ending_chars})*|"(?:#{username_without_backslash_prepended_set}|\\#{domain_port_escaped_chars_set})*")@(?:(?:[[:alnum:]](?:#{alnum_with_hypen}*[[:alnum:]])?\.)+[[:alnum:]](?:#{alnum_with_hypen}*[[:alnum:]])?|\[(?:(?:#{ip_number_type})\.){3}(?:#{ip_number_type}|#{alnum_with_hypen}*[[:alnum:]]:(?:#{domain_port_unescaped_set}|\\#{domain_port_escaped_chars_set})+)\])/
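The same "build from pieces" idea on a smaller scale: the slide's `ip_number_type` piece reused to build an IPv4 matcher. This is a sketch; `ipv4`, `broken` and `fixed` are illustrative names. It also shows why a piece containing `|` must be wrapped in `(?:...)` before it is interpolated.

```ruby
# The slide's ip_number_type piece: one number from 0 to 255.
octet = /25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/.source

ipv4 = /\A(?:(?:#{octet})\.){3}(?:#{octet})\z/
ipv4.match?("192.168.0.1") # => true
ipv4.match?("256.1.1.1")   # => false

# Without the group, \A binds only to the first alternative and \z to the last.
broken = /\A#{octet}\z/
broken.match?("255 and more") # => true (oops)
fixed = /\A(?:#{octet})\z/
fixed.match?("255 and more")  # => false
```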
Simplify valid email (more ruby version)
original_regexp = %r{(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f!#-\x5b\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?\.)+[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[[:alnum:]]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f!-Z\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])}
alnum_with_hypen = /[a-z0-9-]/.source # posix alternative /[-[:alnum:]]/
ip_number_type = /25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/.source
ascii_wo_tabs_cr_nl = /[[:ascii:]&&[^\x09-\x0a\x0d]]/.source
domain_port_escaped_chars_set = /[#{ascii_wo_tabs_cr_nl}\x09\x20"]/.source
domain_port_unescaped_set = /[#{ascii_wo_tabs_cr_nl}&&[^\x20]]/.source
username = /[#{domain_port_unescaped_set}&&[^"]]/.source
non_ending_chars = %r{[a-z0-9!#$%&'*+/=?^_`{|}~-]+}.source
final_with_variables = /(?:#{non_ending_chars}(?:\.#{non_ending_chars})*|"(?:#{username}|\\#{domain_port_escaped_chars_set})*")@(?:(?:[[:alnum:]](?:#{alnum_with_hypen}*[[:alnum:]])?\.)+[[:alnum:]](?:#{alnum_with_hypen}*[[:alnum:]])?|\[(?:(?:#{ip_number_type})\.){3}(?:#{ip_number_type}|#{alnum_with_hypen}*[[:alnum:]]:(?:#{domain_port_unescaped_set}|\\#{domain_port_escaped_chars_set})+)\])/
/x comments mode
original_regexp = %r{ # there is no heredoc for regexp
  (?: # strings with some special chars, but not ending with .
    [a-z0-9!#$%&'*+/=?^_`{|}~-]+
    (?:
      \.[a-z0-9!#$%&'*+/=?^_`{|}~-]+
    )*
  |
    "
    (?: # special chars in quotes
      [\x01-\x08\x0b\x0c\x0e-\x1f!#-\x5b\]-\x7f]
    |
      \\ # prepended with a backslash, here escaped
      [\x01-\x09\x0b\x0c\x0e-\x7f] # more special chars
    )*
    " # closing quote
  )
  @ # the most crucial at sign
  (?: # domain regexp
    (?: # at least one subdomain, each finished with .
      [[:alnum:]]
      (?:
        [a-z0-9-]* # a subdomain can have many alphanumeric chars or - inside
        [[:alnum:]] # a subdomain has to finish with an alphanumeric char
      )?
      \. # dot separator
    )+
    [[:alnum:]] # the domain has to start with an alphanumeric char
    (?:
      [a-z0-9-]* # the domain can have many alphanumeric chars or - inside
      [[:alnum:]] # the domain has to finish with an alphanumeric char
    )?
  | # or a direct IP address: 3 numbers with . suffix, plus some special use cases
    \[ # enclosed in square brackets
    (?:
      (?: # numbers are quite complex in RegEx
        25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]? # 0-255
      )\. # . suffix
    ){3} # 3 times
    (?:
      25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]? # 0-255
    | # or some special use cases
      [a-z0-9-]* # alnums, may also start with -
      [[:alnum:]] # finishing without -
      :
      (?:
        [\x01-\x08\x0b\x0c\x0e-\x1f!-Z\]-\x7f] # many chars
      |
        \\ # more ASCII chars prefixed with a backslash
        [\x01-\x09\x0b\x0c\x0e-\x7f]
      )+
    )
    \] # closing square bracket
  )
}x # switch to treat spaces/new lines and `# ` suffix as comments
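One consequence of /x worth keeping in mind: unescaped whitespace in the pattern is ignored too, so a literal space must be escaped (or written as `\s` when any whitespace will do). A small sketch with illustrative strings:

```ruby
# In /x mode, unescaped whitespace and "# ..." comments are ignored,
# so this pattern is really just "ab":
spaced = /a b # a comment/x
spaced.match?("ab")  # => true
spaced.match?("a b") # => false

# An escaped space is a literal space again.
/a\ b/x.match?("a b") # => true
```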
Do not overuse regular expressions (1/2)
Ruby's simple String methods are often faster and more meaningful:
● .start_with? / .end_with?
● .include?('some substring')
● .chomp
● .strip
● .lines
● .split(' ') # without a regexp
● .tr(' !?', '1-9')
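A sketch of the listed methods doing jobs that are often written as regexes; the sample strings are made up, and the regex equivalents in comments are only for comparison.

```ruby
line = "  Hello, DRUG!\n"

line.chomp                      # => "  Hello, DRUG!"  (instead of sub(/\n\z/, ""))
line.strip.start_with?("Hello") # => true  (instead of =~ /\A\s*Hello/)
line.include?("DRUG")           # => true  (instead of =~ /DRUG/)
"a b  c".split(" ")             # => ["a", "b", "c"]  (awk-style whitespace split)
"a!b?c".tr("!?", "12")          # => "a1b2c"
```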
Do not overuse regular expressions (2/2)
Libraries and gems for common concepts:
● URI(url)
+ .host / .path / .query / .fragment
● File.dirname(path) / File.basename(path) / File.extname(path), or Pathname(path)
● Nokogiri::HTML(URI.open('https://nokogiri.org/')) # needs require 'open-uri'
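A sketch of the standard-library helpers instead of hand-written URL and path regexes. The URL and file path are illustrative placeholders; no network access is needed.

```ruby
require "uri"
require "pathname"

uri = URI("https://example.com/docs/regexp.html?version=3.2#timeout")
uri.host     # => "example.com"
uri.path     # => "/docs/regexp.html"
uri.query    # => "version=3.2"
uri.fragment # => "timeout"

path = "/home/piotr/slides/regex.pdf"
File.dirname(path)           # => "/home/piotr/slides"
File.basename(path, ".pdf")  # => "regex"
File.extname(path)           # => ".pdf"
Pathname(path).basename.to_s # => "regex.pdf"
```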
Do not use RegEx as a language parser
Programming languages are structured as trees of syntax nodes, not as flat text.
A regex will always trip over some exception or a different coding style.
In Ruby, use Ripper or other tools to decompose Ruby code into pieces.
Markup and data formats are easier and safer to parse with gems such as Nokogiri, Ox or Oj.
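A small illustration of why: finding comments in Ruby code. A naive regex cannot tell a `#` inside a string from a real comment, while `Ripper.lex` (standard library) tokenizes the code. The sample code string is hypothetical.

```ruby
require "ripper"

code = 'puts "a # not a comment" # a real comment'

# A naive regex treats the first "#" as the start of a comment:
naive = code[/#.*/] # => "# not a comment\" # a real comment"

# Ripper.lex yields [position, token_type, token, state] entries.
comments = Ripper.lex(code)
  .select { |_position, type, _token| type == :on_comment }
  .map { |_position, _type, token| token }
comments # => ["# a real comment"]
```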
Clear RegEx
● extract common parts out of alternations
● put the alternatives most likely to appear first
● use comments and whitespace with the /x modifier
● name captured groups, and use non-capturing groups where no capture is needed
● split the pattern into smaller logical pieces
● lint the code with ruby -w to see warnings
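Several of these tips applied to one hypothetical pattern: the common part extracted out of the alternation, a named group for the value we need, and /x comments.

```ruby
# Before: every alternative repeats the key, and the capture is unnamed.
messy = /(color=red|color=green|colour=red|colour=green)/

# After: common part extracted, named group, /x comments.
clear = /
  colou?r             # common part: "color" or "colour"
  =
  (?<value>red|green) # named group for the part we actually need
/x

match = clear.match("colour=green")
match[:value] # => "green"
```

For the last tip, running `ruby -w` on a file containing something like `/[aa]/` should print a "character class has duplicated range" warning.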
3. Ruby quirks / flavor
mix? Interpolation of RegEx
Regexp::MULTILINE (m)
Regexp::IGNORECASE (i)
Regexp::EXTENDED (x)
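A sketch of how the m/i/x flags travel with an interpolated Regexp: `Regexp#to_s` wraps the source in `(?flags-flags:...)`, and that is what gets embedded. The same flags exist as Integer constants that combine with `|`. Sample strings are illustrative.

```ruby
word = /ruby/i
word.to_s # => "(?i-mx:ruby)"

# Interpolation embeds the part WITH its own flags: i applies only to "ruby".
combined = /I love #{word}!/
combined.match?("I love RUBY!") # => true
combined.match?("I LOVE ruby!") # => false

# The flags as constants, combined with bitwise OR.
flexible = Regexp.new("i love ruby", Regexp::IGNORECASE | Regexp::EXTENDED)
flexible.match?("ILOVERUBY") # => true (x mode ignored the spaces)
```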
Joke
Scrabble: what is the longest word you can build from Ruby's RegEx option letters?
I M N O X
Quirks in Ruby RegEx engine (1/3)
- In most other flavors, "dot matches line breaks" mode is turned on with the s flag; Ruby uses the m flag for it.
- In Ruby, ^ and $ always match at the start and end of every line.
To anchor at the beginning of the string, use \A.
For the very end of the string, use \z (or \Z, which also allows a final line break).
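Why this matters in practice, especially for validation: a sketch with an illustrative multi-line input that slips past a `^...$` check but not past `\A...\z`.

```ruby
input = "evil payload\nvalid_name"

# ^ and $ match at every line, so the second line alone satisfies the check:
/^[a-z_]+$/.match?(input)   # => true
# \A and \z anchor to the whole string:
/\A[a-z_]+\z/.match?(input) # => false

# Ruby's m flag makes . match newlines (the s flag in PCRE or JavaScript):
"a\nb".match?(/a.b/)  # => false
"a\nb".match?(/a.b/m) # => true

# \Z allows one trailing newline, \z does not:
"abc\n".match?(/abc\Z/) # => true
"abc\n".match?(/abc\z/) # => false
```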
Quirks in Ruby RegEx engine (2/3)
Ruby does not allow the following inside a look-behind:
● look-ahead
● negative look-behind
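The slide's own example was an image. As a related, easy-to-check sketch: top-level look-behinds work when their width is fixed, and a variable-width look-behind is rejected with a `RegexpError` when the pattern is compiled (the exact message may differ between versions).

```ruby
# Fixed-width positive look-behind: digits preceded by "$".
"price: $42"[/(?<=\$)\d+/] # => "42"

# Fixed-width negative look-behind: a digit NOT preceded by "$".
"x1 $2"[/(?<!\$)\d/] # => "1"

# Variable-width look-behind is rejected at compile time.
begin
  Regexp.new('(?<=a+)b')
rescue RegexpError => e
  warn e.message # something like "invalid pattern in look-behind: ..."
end
```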
Quirks in Ruby RegEx engine (3/3)
Character class operators
- Intersection […&&[…]]
- Subtraction […&&[^…]]
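Both operators in action; subtraction is simply an intersection with a negated class. The sample strings are illustrative.

```ruby
# Intersection: characters in a-z AND in a-f.
hex_letter = /[a-z&&[a-f]]/
"xyz cafe".scan(hex_letter).join # => "cafe"

# Subtraction: a-z minus the vowels, i.e. consonants.
consonant = /[a-z&&[^aeiou]]/
"regexp".scan(consonant).join # => "rgxp"
```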
Ruby amenities (1/3)
Ruby amenities (2/3)
Ruby amenities (3/3)
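The amenities slides were images, so their exact content is unknown. As a hedged stand-in, a sketch of a few Ruby conveniences that are commonly mentioned in this context (not necessarily the ones the slides showed):

```ruby
email = "piotr@example.com"

# A regexp LITERAL with named groups on the LEFT of =~ creates local variables.
if /(?<user>[^@\s]+)@(?<host>[^@\s]+)/ =~ email
  puts "#{user} at #{host}" # prints "piotr at example.com"
end

# String#[] takes a regexp plus a group index or group name.
email[/@(.+)\z/, 1]              # => "example.com"
email[/(?<user>[^@]+)@/, "user"] # => "piotr"

# match? answers yes/no without setting $~ and the other match globals.
email.match?(/@/) # => true
```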
4. Tools / Resources
Tools / Websites
● regex101.com/: the nicest editor, with explanations on hover, a cheat sheet and performance analysis
● www.debuggex.com/: visualizes patterns as graphs, with a cheat sheet
● Visualization plugins for Visual Studio Code
● rubocop and rubocop-performance have some rules for regexes
● rubular.com/: check whether a RegEx works in Ruby 2.5 (others use 2.1)
● rubyapi.org/3.1/o/regexp: good Ruby docs
5. Advanced RE(2)
Backtracking problem
/\d-\d+$/g
Catastrophic backtracking case: /a?ⁿaⁿ/ =~ aⁿ
(n optional a's followed by n required a's, matched against n a's; for n = 3: /a?a?a?aaa/ =~ "aaa")
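The /a?ⁿaⁿ/ case is the classic example from Russ Cox's article on regex matching. A sketch that builds it for growing n and times each match; the timings are illustrative only and depend heavily on the Ruby version (3.2+ memoization may keep them flat).

```ruby
require "benchmark"

# Build /a?a?...a?aa...a/ with n optional and n required "a"s.
def pathological_regexp(n)
  Regexp.new("a?" * n + "a" * n)
end

[5, 10, 15, 18].each do |n|
  regexp = pathological_regexp(n)
  input = "a" * n
  matched = nil
  seconds = Benchmark.realtime { matched = regexp.match?(input) }
  puts format("n=%2d matched=%-5s %.5fs", n, matched, seconds)
end
```

The match only succeeds when every `a?` matches nothing, so a backtracking engine may explore up to about 2ⁿ combinations before getting there.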
PCRE-like solutions
“Most modern engines are regex-directed because this is the only way to implement useful features such as lazy quantifiers and backreferences; and atomic grouping and possessive quantifiers that give extra control to backtracking.”
Back to Finite Automata (DFA / NFA)
/abb*a/
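A hand-built DFA for a full match of /abb*a/ (i.e. /\Aabb*a\z/), written as a transition table. This is an illustrative sketch, not how Onigmo works internally (Ruby's engine backtracks); the state numbers and names are assumptions.

```ruby
# States: 0 = start, 1 = saw "a", 2 = saw "ab", "abb", ..., 3 = accept.
TRANSITIONS = {
  [0, "a"] => 1,
  [1, "b"] => 2,
  [2, "b"] => 2, # b* loops on the same state
  [2, "a"] => 3
}.freeze
ACCEPTING = [3].freeze

def dfa_match?(input)
  state = 0
  input.each_char do |char|
    state = TRANSITIONS[[state, char]]
    return false if state.nil? # no transition: dead state, reject
  end
  ACCEPTING.include?(state)
end

%w[aba abbba aa abab].map { |s| [s, dfa_match?(s)] }
# => [["aba", true], ["abbba", true], ["aa", false], ["abab", false]]
```

Each input character costs exactly one table lookup, which is why DFA-based engines run in linear time.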
RegEx to Deterministic Finite Automaton
What RegEx is it?
RegEx to Deterministic Finite Automaton
/(100?)*1/ matches: [1010101, 1, 10101, 1001001]
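A sketch of one possible DFA for a full match of /(100?)*1/. The states were derived by hand for this illustration (they are not taken from the slides): the language (10|100)*1 can be rewritten as 1(01|001)*, which gives four states. The code cross-checks the DFA against Ruby's own engine on every short binary string.

```ruby
DFA_100 = {
  start:         { "1" => :one },                        # must begin with "1"
  one:           { "0" => :one_zero },                   # accepting: the final "1"
  one_zero:      { "0" => :one_zero_zero, "1" => :one }, # saw "10"
  one_zero_zero: { "1" => :one }                         # saw "100"; a third "0" is rejected
}.freeze

def dfa_100_accepts?(input)
  state = :start
  input.each_char do |char|
    state = DFA_100.fetch(state)[char]
    return false if state.nil? # no transition: reject
  end
  state == :one
end

samples = %w[1010101 1 10101 1001001]
samples.all? { |s| dfa_100_accepts?(s) } # => true

# Cross-check against Ruby's engine on every binary string of length 1..8.
binary_strings = (1..8).flat_map { |len| %w[0 1].repeated_permutation(len).map(&:join) }
agree = binary_strings.all? { |s| dfa_100_accepts?(s) == s.match?(/\A(?:100?)*1\z/) }
agree # => true
```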
RE2
PCRE2
6. Ruby 3.2 RE changes
ReDoS improvements (1/2)
Regexp improvements against ReDoS
It is known that Regexp matching may take unexpectedly long. If your code attempts to match a possibly inefficient Regexp against an untrusted input, an attacker may exploit it for efficient Denial of Service.
ReDoS improvements (2/2)
Improved Regexp matching algorithm using a memoization technique
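Besides memoization, Ruby 3.2 added a match timeout, and Ruby 3.3 added `Regexp.linear_time?` to check whether a pattern qualifies for the linear-time matcher. A sketch guarded with `respond_to?` so it also loads on older Rubies; the patterns and timeout values are illustrative.

```ruby
# Ruby 3.2+: a global timeout (in seconds) plus a per-Regexp override.
if Regexp.respond_to?(:timeout=)
  Regexp.timeout = 1.0
  strict = Regexp.new('\A(?:a|b)+\z', timeout: 0.5)
  strict.timeout # => 0.5

  begin
    strict.match?("ab" * 1_000)
  rescue Regexp::TimeoutError
    warn "gave up after #{strict.timeout}s"
  end
end

# Ruby 3.3+: can this pattern use the linear-time (memoized) matcher?
if Regexp.respond_to?(:linear_time?)
  Regexp.linear_time?(/a+b/)   # => true
  Regexp.linear_time?(/(a)\1/) # => false (backreferences are not covered)
end
```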
Sources
● devopedia.org/regex-engines
● patshaughnessy.net/2012/4/3/ (...) rubys-regular-expression-algorithm
● github.com/google/re2/wiki/Syntax
● Hyperscan, a high-performance regex engine (an alternative to RE2)
● wiki/Determinizacja_automatu_skonczonego (Polish Wikipedia: determinization of a finite automaton)
● regular-expressions.info/refrepeat.html
● rexegg.com/regex-optimizations.html
● bugs.ruby-lang.org/issues/19104 (selective memoization)
Thanks for listening
What’s your question?
