Preface
• REGEX is short for Regular Expression.
• A string pattern is the rule by which a string is composed.
• A meta character is a character that carries a special meaning instead of its literal one.
• grep is a utility that evaluates regular expressions.
  - egrep and fgrep are specialized versions of grep (on current systems, grep -E and grep -F are the preferred spellings).
• sed is a stream editor.
• awk is a language tool for working with pattern expressions.
What is a String Pattern?
• The rule that a composed string follows
  - E-mail address
    - An @ character appears in the middle
    - The right side of @ consists of dots and ASCII letters
    - The left side of @ is the account name
  - Web URL
    - Starts with http://
    - The host name is followed by a URI, named like a directory structure
    - When CGI is used, a ? may also appear
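The rules listed above can be written down as patterns. The following is only a rough sketch using `grep -E`: the two expressions and the helper name `is_match` are assumptions made for illustration, and they are far looser than real e-mail or URL validation.

```shell
# Simplified patterns (illustrative assumptions, not complete validators).
email_re='^[[:alnum:]._-]+@[[:alnum:].-]+$'        # account, @, then letters/digits/dots
url_re='^http://[[:alnum:].-]+(/[^?]*)?(\?.*)?$'   # http://host, optional /path, optional ?query

# is_match STRING REGEX : succeeds when the whole STRING matches the ERE.
is_match() {
    printf '%s\n' "$1" | grep -Eq -e "$2"
}

for s in 'sunyzero@email.com' 'not-an-email'; do
    if is_match "$s" "$email_re"; then echo "e-mail : $s"; else echo "no     : $s"; fi
done
for s in 'http://example.com/dir/page.cgi?x=1' 'ftp://example.com/'; do
    if is_match "$s" "$url_re"; then echo "URL    : $s"; else echo "no     : $s"; fi
done
```

Both patterns are anchored with ^ and $ so that the whole string has to follow the rule, not just a part of it.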
Regular Expression : Examples 
a.cdef? 
[a-zA-Z]+ 
.*boy 
(caret|dalar) 
(.*/)[^/]* 
^Do.*\?$
http://([a-zA-Z0-9.-])/.* 
http://.*\?(.*)
Let's interpret these after we have learned REGEX!
POSIX regex: meta char.
Category    Meta char.  Meaning
Character   .           Any single character.
Repetition  ?           The preceding pattern appears 0 or 1 time. (ERE)
            +           The preceding pattern repeats 1 or more times. (ERE)
            *           The preceding pattern repeats 0 or more times.
            {...}       (interval) Specifies the repetition count directly, e.g.
                        {3} : exactly 3 times, {,7} : at most 7 times, {2,5} : 2 to 5 times
                        ({,7} is a GNU extension; POSIX writes it as {0,7})
Position    ^           The beginning of the line.
            $           The end of the line.
Group       [...]       One character out of the set listed inside.
            [^...]      One character that is not in the set listed inside (the complement).
Others      \           (escape) Removes the special meaning of a meta character.
            |           (alternation) OR operation. (ERE)
            ( )         Parentheses group a pattern and enable back-references.
* POSIX RE - IEEE Std 1003.1 (international standard)
* ERE - Extended Regular Expression
applying pattern
• dot/period : . - any single character
  - c.b : cab, cbb, ccb, cdb, c1b, c2b, etc.
  - a..b : axyb, a12b, ax0b, a#-b, etc.
  - a.........b : not written this way (an interval expression such as a.{9}b is used instead).
applying pattern (con't)
• ?, +, *, {m,n} - iteration, interval
  - X?ML : XML or ML
  - can+ : can, cann, cannn, cannnn, ...
  - can* : ca, can, cann, cannn, ...
  - http.* : http://, httpd, https, http1234
    - Any characters can follow "http"
  - abc{2,5} : abcc, abccc, abcccc, abccccc
    - Interval expressions are not supported by some utilities and RE matching engines.
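The quantifier examples above can be checked with `grep -E` (ERE). This is a minimal sketch: the word list and the variable names are made up for illustration, and `-x` (match the whole line) is used so that partial matches do not count.

```shell
# Sample words, one per line (illustrative data, not from the slides).
words='ca
can
cannn
ML
XML
abc
abcc
abcccccc'

# -E selects ERE; -x requires the pattern to match the entire line.
plus=$(printf '%s\n' "$words" | grep -Ex 'can+')          # one or more n
star=$(printf '%s\n' "$words" | grep -Ex 'can*')          # zero or more n
opt=$(printf '%s\n' "$words" | grep -Ex 'X?ML')           # optional X
interval=$(printf '%s\n' "$words" | grep -Ex 'abc{2,5}')  # two to five c's

# Unquoted $(echo ...) joins the matched lines with spaces for display.
printf 'can+     : %s\n' "$(echo $plus)"
printf 'can*     : %s\n' "$(echo $star)"
printf 'X?ML     : %s\n' "$(echo $opt)"
printf 'abc{2,5} : %s\n' "$(echo $interval)"
```

Only `abcc` satisfies `abc{2,5}` here: `abc` has a single c and `abcccccc` has six.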
applying pattern (con't)
• ^, $ - position
  - ^ftp : lines that start with "ftp"
  - ^$ : empty lines (no characters between the start and the end of the line)
  - <BR>$ : lines that end with <BR>
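As a small illustration of the position anchors, the sketch below counts matching lines with `grep -c`. The sample text and variable names are assumptions made for this example.

```shell
# Illustrative input: five lines, one of them empty.
text='ftp://example.org/pub
see ftp for details

line one<BR>
<BR> in the middle'

starts=$(printf '%s\n' "$text" | grep -c '^ftp')    # lines beginning with "ftp"
empty=$(printf '%s\n' "$text" | grep -c '^$')       # completely empty lines
ends=$(printf '%s\n' "$text" | grep -c '<BR>$')     # lines ending with "<BR>"

echo "starts=$starts empty=$empty ends=$ends"
```

Each count is 1: "see ftp for details" contains ftp but does not start with it, and "<BR> in the middle" contains <BR> but does not end with it.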
applying pattern (con't)
• [ ], [^ ] - character class
  - [abcd] : a, b, c, d
  - [0-9] : 0, 1, 2, ... , 9
  - [a-zA-Z0-9] : letters and digits
  - [^0-9] : anything except [0-9]
  - How do you put ^ itself in a group?
    - Place ^ anywhere except immediately after [.
    - Or escape it... (this works in PCRE; in a POSIX bracket expression, as used by grep and sed, a backslash is an ordinary character)
  - Some tools support interval expressions only when given an option.
    - e.g. awk
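The position of ^ inside the brackets decides its meaning. A short sketch, assuming a grep that supports `-o` (GNU and BSD grep do; it is not a POSIX option); the sample string and variable names are made up:

```shell
s='a^b=2^10'

# ^ is not the first character inside [...], so it is a literal caret.
literal=$(printf '%s\n' "$s" | grep -Eo '[0-9^]+')

# ^ directly after [ negates the set: everything that is not a digit.
negated=$(printf '%s\n' "$s" | grep -Eo '[^0-9]+')

printf 'digits or caret:\n%s\n' "$literal"
printf 'non-digits:\n%s\n' "$negated"
```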
greedy matching
• What is greedy matching?
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo "$var2" | egrep -o "<.+>"
<b>real</b>It's gonna <i>change everything</i>
  - A pattern tries to match as much as it can.
  - After a greedy match, narrow the range of the result set step by step until the expression is exact.
• What is non-greedy matching?
  - The result of modifying a greedy expression so that it matches as little as possible.
non-greedy matching (con't)
• Modifying the expression for non-greedy matching
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo "$var2" | egrep -o "<.+>"
<b>real</b>It's gonna <i>change everything</i>
$ echo "$var2" | egrep -o "<[^<>]+>"
<b>
</b>
<i>
</i>
back-reference
• A pattern that reuses a matched result (back-reference)
  - The part matched by a pattern enclosed in "( )" is reused in the form "\#"
    (# is a number, assigned in order); number 0 is the entire match
$ egrep "^(.+):x:[0-9]+:[0-9]+:.*:/home/\1:" /etc/passwd
sunyzero:x:500:500:Steven Kim:/home/sunyzero:/bin/bash
linuxer:x:502:502::/home/linuxer:/bin/bash
$ egrep -v "^(.+):x:[0-9]+:[0-9]+:.*:/home/\1:" /etc/passwd
... (omitted; picture the output yourself) ...
  - -v : invert the match (print the lines that do not match)
  - --color : highlight the matched (non-empty) strings
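To try the back-reference without depending on the local /etc/passwd, the same pattern can be run against a small sample. This is a sketch under assumptions: the first two sample lines come from the slide, the other two are invented, and back-references in ERE are an extension (supported by GNU and BSD grep) rather than part of POSIX ERE.

```shell
# Sample passwd-style data; the last two lines are invented for contrast.
passwd_sample='sunyzero:x:500:500:Steven Kim:/home/sunyzero:/bin/bash
linuxer:x:502:502::/home/linuxer:/bin/bash
root:x:0:0:root:/root:/bin/bash
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin'

# \1 must repeat whatever the first ( ) group matched, i.e. the account name.
pattern='^(.+):x:[0-9]+:[0-9]+:.*:/home/\1:'

own_home=$(printf '%s\n' "$passwd_sample" | grep -E -e "$pattern")
others=$(printf '%s\n' "$passwd_sample" | grep -Ev -e "$pattern")

echo "home directory named after the account:"
printf '%s\n' "$own_home" | cut -d: -f1
echo "everything else (-v):"
printf '%s\n' "$others" | cut -d: -f1
```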
back-reference (con't)
• Back-reference application : extracting the parts enclosed in tags
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo "$var2" | egrep -o "<([a-zA-Z0-9]+)>.*</\1>"
<b>real</b>
<i>change everything</i>
$ echo "$var2" | egrep --color "<([a-zA-Z0-9]+)>.*</\1>"
... omitted ...
Tip! - sed (stream ed)
• substitution (sed)
  - Same as vim's substitution command
  - -E : interpret the pattern as ERE (without it sed uses BRE, where a bare + is a literal character)
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo "$var2" | sed -E -e "s/<[^<>]+>/ /g"
It's gonna be  real It's gonna  change everything  I feel
$ echo "$var2" | sed -E -e "s,<[^<>]+>, ,g"
    - vim's substitution command simply incorporates sed's functionality!
      = If you know sed, you know vim too... UNIX has many features that are related like this.
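Whether + works as a repetition operator in sed depends on the regex flavor. The sketch below, with variable names chosen for illustration, strips the tags once with ERE (`-E`) and once with the equivalent BRE interval `\{1,\}`, then squeezes the doubled spaces with `tr -s`.

```shell
var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"

# ERE: + means "one or more".
ere=$(printf '%s\n' "$var2" | sed -E 's/<[^<>]+>/ /g')

# BRE (sed's default): the same repetition written as an interval.
bre=$(printf '%s\n' "$var2" | sed 's/<[^<>]\{1,\}>/ /g')

# Each tag becomes one space, so spaces double up next to existing ones;
# tr -s squeezes runs of spaces into a single space.
squeezed=$(printf '%s\n' "$ere" | tr -s ' ')

printf '%s\n' "$ere" "$bre" "$squeezed"
```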
Tip! - awk
• awk can implement all of the features above as well.
$ var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"
$ echo "$var2" | awk '{ gsub(/[ ]*<[^<>]+>[ ]*/, " "); print }'
It's gonna be real It's gonna change everything I feel
alternation
• ( ) is also used for alternation
  - "( )" is used to enclose alternations as well as pattern groups.
$ echo "cat is not dog" | egrep -o "(cat|dog)"
cat
dog
$ echo "My Childhood~~~ bye bye" | egrep -o "(child|boy)?hood"
hood
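The second example prints only "hood" because "Child" starts with an uppercase C, so the optional group matches nothing. Two possible ways to pick up the whole word are sketched here (variable names are illustrative; `-o` is a GNU/BSD grep option):

```shell
line="My Childhood~~~ bye bye"

# Option 1: ignore case for the whole pattern with -i.
ignore_case=$(printf '%s\n' "$line" | grep -Eoi '(child|boy)?hood')

# Option 2: allow both cases only for the first letter, using a bracket.
bracket=$(printf '%s\n' "$line" | grep -Eo '([Cc]hild|boy)?hood')

echo "$ignore_case"
echo "$bracket"
```

Both commands print Childhood.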
predefined character class
Class         Description
[[:alnum:]]   Letters and digits
[[:alpha:]]   Letters (upper and lower case)
[[:blank:]]   Space and tab (\t)
[[:cntrl:]]   Control characters
[[:digit:]]   Digits
[[:xdigit:]]  Hexadecimal digits, i.e. 0-9a-fA-F
[[:upper:]]   Uppercase letters
[[:lower:]]   Lowercase letters
[[:space:]]   Whitespace, including tab (\t), CR (\r) and newline (\n)
[[:print:]]   Printable characters
[[:graph:]]   Printable characters excluding the space
[[:punct:]]   Printable special (punctuation) characters
predefined character class (con't)
• Classes can be combined inside [...]
$ var5="sunyzero@email.com:010-8500-80**:Sun-young Kim:AB-0105R"
$ echo "$var5" | egrep -o "^[[:alpha:]@]+"
sunyzero@email
$ echo "$var5" | egrep -o "[[:upper:][:digit:]-]{8}"
010-8500
AB-0105R
  - The first result was cut off at sunyzero@email. How would you get the whole address?
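One possible answer to the question above: the match stops at the dot because "." is not in the bracket, so adding it (inside [...] a dot is literal) extends the match to the colon. Matching "everything up to the first colon" is another option. Both are sketches with illustrative variable names.

```shell
var5="sunyzero@email.com:010-8500-80**:Sun-young Kim:AB-0105R"

# Add the dot to the set; inside a bracket expression it is a plain character.
with_dot=$(printf '%s\n' "$var5" | grep -Eo '^[[:alpha:]@.]+')

# Alternative: take every character that is not the field separator.
first_field=$(printf '%s\n' "$var5" | grep -Eo '^[^:]+')

echo "$with_dot"
echo "$first_field"
```

Both commands print sunyzero@email.com.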
boundary - ERE
• Used to search on word boundaries (in grep, \b and \B are GNU extensions)
\b   Matches only where there is a word boundary.
\B   Matches only where there is no word boundary.
$ var3="abc? <def> 123hijklm"
$ echo "$var3" | egrep -o "[a-j]+"
abc
def
hij
$ echo "$var3" | egrep --color "\B[a-j]+\B"
abc? <def> 123hijklm
$ echo "$var3" | egrep --color "\b[a-j]+\b"
abc? <def> 123hijklm
  - With --color the whole line is printed and only the matches are highlighted: b, e and hij for \B[a-j]+\B; abc and def for \b[a-j]+\b.
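Since color does not survive in plain text, the matches can be listed with `-o` instead. A minimal sketch, assuming GNU grep (for \b, \B and `-o`); the variable names are illustrative.

```shell
var3="abc? <def> 123hijklm"

# \b ... \b : the run of [a-j] letters must be a complete word.
whole_words=$(printf '%s\n' "$var3" | grep -Eo '\b[a-j]+\b')

# \B ... \B : the run must have word characters on both sides.
inside_words=$(printf '%s\n' "$var3" | grep -Eo '\B[a-j]+\B')

printf 'with \\b:\n%s\n' "$whole_words"
printf 'with \\B:\n%s\n' "$inside_words"
```

"hij" is not a whole word because it sits between 3 and k, which are word characters, so it shows up only in the \B result.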
REGEX and PCRE
• POSIX REGEX
  - Used for simple pattern matching.
  - Performance degrades as patterns grow more complex.
  - Always learn POSIX REGEX first, because it is the standard!
• PCRE (Perl Compatible Regular Expressions)
  - Extended regular expressions derived from Perl
  - Very fast, with extended expressions...
  - Supported by C, C++, and most other languages (provided as an additional library)
  - For practical work, PCRE is the better choice.
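One example of a PCRE extension is the lazy quantifier `+?`, which expresses non-greedy matching directly. The sketch below assumes a GNU grep built with PCRE support (`-P`); because that is not available everywhere, it checks for it first. Variable names and the fallback message are illustrative.

```shell
var2="It's gonna be <b>real</b>It's gonna <i>change everything</i> I feel"

# POSIX ERE: emulate non-greedy matching with a negated character class.
posix=$(printf '%s\n' "$var2" | grep -Eo '<[^<>]+>')

# PCRE: +? matches as few characters as possible (requires grep -P).
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    lazy=$(printf '%s\n' "$var2" | grep -oP '<.+?>')
else
    lazy="(grep -P not available)"
fi

printf 'ERE :\n%s\n' "$posix"
printf 'PCRE:\n%s\n' "$lazy"
```

Where `-P` is available, both commands list the same four tags.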
