SciLifeLab Coffee & Code, Sept 25th 2020.
An introduction to regular expressions at the SciLifeLab / NGI Sweden "Coffee 'n code" talk. Aimed at people who sort-of-know what regexes are, but find them a bit terrifying.
Watch the talk on YouTube: https://youtu.be/2Yp6kvdUMxM
26. GREEDY MATCHING
/reg.*ex\w*/
(no newlines this time)
A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that define a search pattern.
A greedy quantifier
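A minimal Python sketch of what greedy matching does to the sample sentence on this slide. It assumes the slide's pattern is reg.*ex\w* (with \w meaning "word character"), and the lazy variant reg.*?ex\w* is added here only for contrast:

```python
import re

# The sample sentence from the slide, as one string with no newlines.
text = (
    "A regular expression (shortened as regex or regexp; also referred to "
    "as rational expression) is a sequence of characters that define a search pattern."
)

# Greedy: .* grabs as much as it can, then backtracks to the LAST "ex".
greedy = re.search(r"reg.*ex\w*", text).group()

# Lazy (ungreedy): .*? grabs as little as it can, so it stops at the FIRST "ex".
lazy = re.search(r"reg.*?ex\w*", text).group()

print(greedy)
print(lazy)
```

The greedy match runs from "regular" all the way to "rational expression", while the lazy one stops at "regular expression".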
29. REGEX MODIFIERS
g Global - don’t return after first match
m Multiline - ^ and $ match start and end of each line (not the whole string)
i Insensitive - case insensitive
U Ungreedy - Make all quantifiers ungreedy
s Single line - Make the dot match newlines too
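As a rough illustration, here is how these modifiers map onto Python's re module. The sample text is made up for this sketch. Python has no g or U flag: re.findall plays the role of "global", and ungreedy matching is written per quantifier with a trailing ?.

```python
import re

# Made-up two-line sample text for this sketch.
text = "Regex one\nregex two"

# g (global): re.search stops at the first match, re.findall returns them all.
first_only = re.search(r"regex", text, re.IGNORECASE).group()
all_matches = re.findall(r"regex", text, re.IGNORECASE)

# i (insensitive): without the flag only the lowercase "regex" matches.
case_sensitive = re.findall(r"regex", text)

# m (multiline): ^ matches at the start of every line, not just the string.
string_start = re.findall(r"^regex", text, re.IGNORECASE)
line_starts = re.findall(r"^regex", text, re.IGNORECASE | re.MULTILINE)

# s (single line): with re.DOTALL the dot also matches the newline.
without_s = re.search(r"one.regex", text)
with_s = re.search(r"one.regex", text, re.DOTALL)

# U (ungreedy): no flag in Python, so add ? to the quantifier instead.
greedy = re.search(r"r.*e", "regex one").group()
ungreedy = re.search(r"r.*?e", "regex one").group()
```

Other engines expose the same ideas differently, for example as /pattern/gim suffixes or inline (?i) groups.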
43. SPECIAL CHARACTER CONTEXTS
/a?/ Zero or one a characters
/a*?/ Zero or more a characters, ungreedy
/(?!a)/ Negative lookahead group for a character
/^a/ An a at the start of the string
/[^a]/ Any character except an a
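The same two characters, ? and ^, mean different things depending on where they sit. A small Python sketch of the five cases above, using made-up test strings:

```python
import re

# a? : zero or one "a" (here between "b" and "t")
optional_a = [bool(re.fullmatch(r"ba?t", s)) for s in ("bt", "bat", "baat")]

# a*? : zero or more "a", ungreedy, so it settles for the empty match
lazy_a = re.search(r"a*?", "aaa").group()
greedy_a = re.search(r"a*", "aaa").group()

# (?!a) : negative lookahead, match a "b" only when NOT followed by "a"
b_not_before_a = [m.start() for m in re.finditer(r"b(?!a)", "ba bc")]

# ^a : an "a" at the start of the string
a_at_start = re.findall(r"^a", "aba")

# [^a] : inside square brackets ^ negates, so any character except "a"
not_a = re.findall(r"[^a]", "aba")
```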
44. REGEX ENGINE DIFFERENCES
Basic regular expressions (BRE): grep, sed, vi
Extended regular expressions (ERE): awk, egrep, =~
Perl-compatible regular expressions (PCRE): Many code languages
https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y (thanks to Andreas Kähäri)
53. KNOW WHEN NOT TO USE A REGEX
"a" in "abc"
10000000 loops, best of 5: 20.2 nsec per loop
import re
pattern = re.compile("a")
pattern.search("abc")
2000000 loops, best of 5: 136 nsec per loop
~ 7x faster
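To reproduce this kind of comparison yourself, a sketch using the standard timeit module could look like the following. The loop count is an arbitrary choice for this example, and the absolute numbers will differ between machines and Python versions:

```python
import re
import timeit

pattern = re.compile("a")
loops = 200_000  # arbitrary loop count for this sketch

# Plain substring test, no regex involved.
in_total = timeit.timeit('"a" in "abc"', number=loops)

# The same question asked through a pre-compiled regex.
re_total = timeit.timeit(
    'pattern.search("abc")', globals={"pattern": pattern}, number=loops
)

print(f"in operator: {in_total / loops * 1e9:.1f} nsec per loop")
print(f"re.search:   {re_total / loops * 1e9:.1f} nsec per loop")
```

Both approaches answer the same question here, so the simpler one is usually the better choice when no pattern is actually needed.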
54. IF SPEED DOES MATTER…
…ignore most of the previous tips
• Match more than you need to
• Smush everything into one regex
• Fail as fast as possible
• Avoid large character classes
• Be as specific as possible with quantifiers
• Use possessive quantifiers
• For long strings with lots of no-matches, unroll your loops
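As one illustration of the "smush everything into one regex" tip, the sketch below folds three hypothetical log-level patterns into a single alternation. The keywords and function names are invented for this example, and whether the combined form is actually faster depends on the engine and the data, so it is worth timing both on real input:

```python
import re

# Hypothetical patterns, invented for this sketch.
SEPARATE = [
    re.compile(r"\bERROR\b"),
    re.compile(r"\bFATAL\b"),
    re.compile(r"\bPANIC\b"),
]

# The same three keywords as one alternation: one pass over the string
# instead of up to three.
COMBINED = re.compile(r"\b(?:ERROR|FATAL|PANIC)\b")


def matches_separate(line):
    """Check each pattern in turn until one matches."""
    return any(p.search(line) for p in SEPARATE)


def matches_combined(line):
    """Check all keywords with a single search."""
    return COMBINED.search(line) is not None


lines = ["2020-09-25 ERROR disk full", "all good", "kernel PANIC", "TERRORS"]
results = [(matches_separate(l), matches_combined(l)) for l in lines]
```

Both functions should always agree; only the amount of work per line changes.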