Regular expressions are one of those tools that every developer should have in their toolbox. You can do your job without them, but knowing when and how to use them will make you a much more efficient and marketable developer. You'll learn how regular expressions can be used for validating user input, parsing text, and refactoring code. We'll also cover various tools that can help you write and share expressions.
3. TABLE OF CONTENTS
1. Introduction
2. Literal Characters
3. First Look at How a Regex Engine Works
4. Character Classes or Character Sets
5. The Dot Matches (Almost) Any Character
6. Start of String and End of String Anchors
7. Word Boundaries
8. Alternation with the Vertical Bar or Pipe Symbol
9. Optional Items
10. Repetition with Star and Plus
11. Use Round Brackets for Grouping
4. INTRODUCTION
A regular expression (regex or regexp) is a special text string that describes a search pattern; it can also be used to validate data.
A regular expression “engine” is a piece of software that can process regular expressions.
Many software applications and programming languages support regular expressions.
5. .NET 1.0–4.5
Java 4–8
Perl 5.8–5.18
PCRE C library = Perl Compatible Regular Expressions
PHP
Delphi
R
JavaScript
VBScript
XRegExp
Python
Ruby
Tcl ARE
POSIX BRE
• similar to the one used by the traditional UNIX grep command
• most metacharacters require a backslash to give the metacharacter its special meaning: «a\{1,2\}» matches „a” or „aa”
POSIX ERE
• similar to the one used by the UNIX egrep command
• Quantifiers ?, +, {n}, {n,m} and {n,}
• Backreferences
• Alternation
GNU BRE
GNU ERE
Oracle
XML
XPath
EditPad Pro
6. INTRODUCTION (CONT.)
Advantages:
Reduces development time for the programmer
Fast execution
Ex: «reg(ular expressions?|ex(p|es)?)» matches
regular expression
regular expressions
regex
regexp
regexes
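The example pattern above can be checked quickly. This is a minimal sketch in Python (the slides do not prescribe a language, so Python's re module and the names PATTERN, candidates and results are assumptions made for illustration):

```python
import re

# Pattern from the slide; fullmatch() requires the whole string to match.
PATTERN = re.compile(r"reg(ular expressions?|ex(p|es)?)")

candidates = ["regular expression", "regular expressions",
              "regex", "regexp", "regexes", "regular"]

# Map each candidate to True/False depending on whether it matches entirely.
results = {text: PATTERN.fullmatch(text) is not None for text in candidates}

for text, ok in results.items():
    print(f"{text!r}: {'match' if ok else 'no match'}")
```

The bare word "regular" is rejected because the first alternative requires " expression" after "regular".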
7. INTRODUCTION (CONT.)
Pattern matching is an essential problem
Many applications need to "parse" an input
1) URLs
http://first.dk/index.php?id=141&view=details
protocol, host, path, query string (list of key-value pairs)
2) Log files
13/02/2010 66.249.65.107 get /support.html
20/02/2010 42.116.32.64 post /search.html
3) XML
<article>
<title>Three Models for the...</title>
<author>Noam Chomsky</author>
<year>1956</year>
</article>
8. LITERAL CHARACTERS & SPECIAL CHARACTERS
Literals
A single literal character, e.g. «a», matches the first “a” in “Jack is a boy”
Several literal characters
Apply «cat» to “He captured a catfish for his cat.”
1. The engine tries to match the first token «c» with “H” in the string and fails
2. There are no other permutations of the regex to try, so it moves on to the next character in the string
3. At the 4th character «c» matches “c”, and the engine moves on to the next token
4. The match fails at the 6th character («t» against “p”); the engine concludes that the attempt starting at the 4th character was unsuccessful and continues from the 5th character
Non-Printable Characters
«\t» for a tab character (ASCII 0x09)
«\r» for carriage return (0x0D)
«\n» for line feed (0x0A)
…
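The walk-through of «cat» can be reproduced with a short sketch. Python's re module is assumed here, and the variable names are only illustrative:

```python
import re

text = "He captured a catfish for his cat."

# search() scans left to right and returns the first (leftmost) match.
first = re.search(r"cat", text)

# finditer() keeps scanning after each match.
starts = [m.start() for m in re.finditer(r"cat", text)]

print(first.start(), first.group())
print(starts)
```

The "cap" in "captured" (0-based index 3) is the failed attempt described above; the first real match starts at index 14, inside "catfish".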
9. LITERAL CHARACTERS & SPECIAL CHARACTERS (CONT.)
Certain characters are reserved for special use
Meaning                          Char   Name
Beginning of string              ^      caret
End of string                    $      dollar sign
Any character except newline     .      dot
Match 0 or more                  *      star
Match 1 or more                  +      plus
Match 0 or 1                     ?      question mark
Alternative                      |      pipe symbol
Grouping; "store"                ( )    parentheses
Special (escape)                 \      backslash
Start of a character class       [      opening square bracket
10. LITERAL CHARACTERS & SPECIAL CHARACTERS (CONT.)
To use any of these characters as a literal in a regex, escape it with a backslash
«1\+1=2» matches the literal text “1+1=2”
If you forget to escape a special character, the regex takes on another meaning
«1+1=2» matches „111=2” in “123+111=234”
«+1=2» is an ERROR (there is nothing before the plus to repeat)
NOTE!
Most regular expression flavors treat the brace «{» as a literal character, unless it is part of a repetition operator like «M{1,3}».
An exception to this rule is java.util.regex
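A small sketch of the escaping rules above, assuming Python's re module (other flavors may report the leading plus differently):

```python
import re

# Escaped plus: matches the literal text "1+1=2".
literal = re.search(r"1\+1=2", "1+1=2")

# Unescaped plus: "one or more 1, then 1=2", so it finds "111=2".
unescaped = re.search(r"1+1=2", "123+111=234")

# A leading plus has nothing to repeat; Python's re module rejects it.
try:
    re.compile(r"+1=2")
    leading_plus_error = None
except re.error as exc:
    leading_plus_error = str(exc)

# re.escape() adds the necessary backslashes for you.
escaped = re.escape("1+1=2")
escaped_match = re.fullmatch(escaped, "1+1=2")
```

When the literal text comes from user input, re.escape is usually safer than escaping by hand.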
11. LITERAL CHARACTERS & SPECIAL CHARACTERS (CONT.)
\Q...\E escape sequence
E.g. «\Q*\d+*\E» matches the literal text „*\d+*”.
o Special Characters and Programming Languages
The compiler turns an escaped backslash in the source code into a single backslash in the string that is passed on to the regex library
The regex «1\+1=2» is written as "1\\+1=2"
The regex «c:\\temp» is written as "c:\\\\temp"
Source code → compiler → regex library
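The double layer of escaping can be seen in a minimal Python sketch. Raw string literals (the r"..." prefix) are a Python feature that skips the compiler-level layer; the variable names are illustrative only:

```python
import re

# In a normal string literal the compiler consumes one level of backslashes,
# so both spellings produce the same regex: 1\+1=2
same_regex = ("1\\+1=2" == r"1\+1=2")

# The regex c:\\temp (which matches the text c:\temp) needs four backslashes
# in a normal literal, or two in a raw string.
normal_literal = "c:\\\\temp"
raw_literal = r"c:\\temp"

path = "c:\\temp"  # the 7-character text c:\temp
match = re.fullmatch(normal_literal, path)
```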
12. FIRST LOOK AT HOW A REGEX ENGINE WORKS INTERNALLY
Two kinds of regular expression engines:
text-directed (DFA)
regex-directed (NFA)
awk, egrep, flex, lex and MySQL are text-directed
A few versions of these tools are regex-directed
The regex-directed engine is more powerful: features such as lazy quantifiers and backreferences need it.
14. CHARACTER CLASSES OR CHARACTER SETS
Match only one out of several characters
«gr[ae]y» => „gray” or „grey” (both the American and the British spelling)
Use a hyphen inside a character class to specify a range of characters
[0-9a-fA-F]
Useful Applications
Find a word, even if it is misspelled => «sep[ae]r[ae]te»
Find an identifier => «[A-Za-z_][A-Za-z_0-9]*»
Find a C-style hexadecimal number => «0[xX][A-Fa-f0-9]+»
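The three applications above can be tried with a short sketch, assuming Python's re module; the sample strings are made up for illustration:

```python
import re

misspelled = re.compile(r"sep[ae]r[ae]te")
identifier = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
hex_number = re.compile(r"0[xX][A-Fa-f0-9]+")

# "sepereta" is rejected: the pattern still requires the final "te".
spellings = misspelled.findall("separate seperate sepereta")

# Identifiers may start with a letter or underscore, never with a digit.
names = identifier.findall("total_1 = _tmp + x2")

# "0xZZ" is rejected: at least one hex digit must follow the prefix.
hexes = hex_number.findall("mask = 0xFF | 0X1a | 0xZZ")
```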
15. NEGATED CHARACTER CLASSES
[^ …]
Matches any character that is not in the character class.
«q[^u]»: “a q followed by a character that is not a u”. => matches „q ” (the q and the space) in “Iraq is a country”
A negated character class still must match a character: «q[^u]» does not match the q at the end of “Iraq”.
«[^abc]» applied to “fdgha” matches „f”, „d”, „g” and „h”
Unlike the dot, negated character classes also match line break characters.
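A minimal sketch of the points above, assuming Python's re module (variable names are illustrative):

```python
import re

# "q " is matched: the q plus the space that follows it.
in_sentence = re.search(r"q[^u]", "Iraq is a country")

# No character follows the q, so the negated class has nothing to match.
at_end = re.search(r"q[^u]", "Iraq")

# Each character outside the set a, b, c is a separate match.
not_abc = re.findall(r"[^abc]", "fdgha")

# Unlike the dot, a negated class also matches a line break.
dot_on_newline = re.search(r".", "\n")
negated_on_newline = re.search(r"[^abc]", "\n")
```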
16. METACHARACTERS INSIDE CHARACTER CLASSES
The only metacharacters inside a character class are:
The closing bracket (]) => «[\]]»
The backslash (\) => «[\\]»
The caret (^) => «[\^]»
The hyphen (-) => «[\-]»
«[+*]» = «[\+\*]» --> escaping the others only reduces readability
Other solution:
Place them in a position where they do not take on their special meaning.
Closing bracket right after the opening bracket: «[]x]»
Caret anywhere except right after the opening bracket: «[x^]»
Hyphen anywhere except in the middle: «[-x]»
17. METACHARACTERS INSIDE CHARACTER CLASSES
Non-printable characters can be used in character classes just like outside of character classes.
E.g. «[$\u20AC]»: dollar or euro sign
Perl and PCRE also support the \Q...\E sequence inside character classes
E.g. «[\Q[-]\E]» matches „[”, „-” or „]”.
POSIX regular expressions treat the backslash as a literal character inside character classes.
You can’t use backslashes to escape metacharacters there
So just put them in the correct position
18. SHORTHAND CHARACTER CLASSES
Can be used both inside and outside square brackets
Ex: 1 + 2 = 3
«\s\d» = whitespace followed by a digit: matches „ 2”
«[\s\d]» = whitespace or digit: matches each digit and each space in “1 + 2 = 3”
Class   Meaning
\w      Word character, [a-zA-Z0-9_]
\d      Digit character, [0-9]
\s      Whitespace character, [ \n\r\f\t]
\W      Non-word character, [^a-zA-Z0-9_] = [^\w]
\D      Non-digit character, [^0-9] = [^\d]
\S      Non-whitespace character, [^ \n\r\f\t] = [^\s]
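The difference between the sequence and the class can be sketched as follows, assuming Python's re module. Note that for str patterns Python's shorthands are Unicode-aware, so they are slightly wider than the ASCII ranges in the table:

```python
import re

text = "1 + 2 = 3"

# Sequence: one whitespace character followed by one digit.
space_then_digit = re.search(r"\s\d", text)

# Character class: a single character that is whitespace OR a digit.
space_or_digit = re.findall(r"[\s\d]", text)
```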
19. NEGATED SHORTHAND CHARACTER CLASSES
«[\D\S]» is not the same as «[^\d\s]».
«[^\d\s]» = any character that is neither a digit nor whitespace: ¬(a ∪ b)
In “123 5]” it matches only „]”
«[\D\S]» = any character that is either not a digit or not whitespace: ¬a ∪ ¬b
In “123 5]” it matches every character
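The two classes can be compared on the slide's sample string. A minimal sketch, assuming Python's re module:

```python
import re

text = "123 5]"

# Neither a digit nor whitespace: only the bracket qualifies.
neither = re.findall(r"[^\d\s]", text)

# Not a digit OR not whitespace: every character satisfies one of the two.
either = re.findall(r"[\D\S]", text)
```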
20. REPEATING CHARACTER CLASSES
Use the «?», «*» or «+» operators
«[0-9]+» matches „833337”, „222”, …
To repeat the matched character rather than the class, we need “backreferences”
«([0-9])\1+»
matches „222”
matches „3333” in “833337”
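A short sketch of repeating the class versus repeating the matched character, assuming Python's re module (names are illustrative):

```python
import re

# The class is repeated: any mix of digits matches.
any_digits = re.fullmatch(r"[0-9]+", "833337")

# The backreference \1 repeats the digit that the group captured.
same_digit = re.compile(r"([0-9])\1+")
run_in_222 = same_digit.search("222")
run_in_833337 = same_digit.search("833337")
```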
21. LOOKING INSIDE THE REGEX ENGINE
The order of the characters inside a character class does not matter
Ex: «gr[ae]y» applied to “Is his hair grey or gray?”
1. The first 12 attempts fail to match «g»
2. „g” is matched at the 13th character
3. The «r» token in the regex matches “r” in the text
4. The «a» option fails to match “e”
5. The engine tries the other permutations of the regex pattern, and «e» matches “e”
6. The last regex token «y» matches “y” in the text
7. The leftmost match is returned: „grey”
23. THE DOT MATCHES (ALMOST) ANY CHARACTER
The most commonly misused metacharacter.
By default the dot does not match a newline character (the first regex tools were line-based), so it is equivalent to:
«[^\n]» (UNIX regex flavors)
«[^\r\n]» (Windows regex flavors)
In Perl, the mode where the dot also matches newlines is called "single-line mode"
In the .NET framework: Regex.Match("string", "regex", RegexOptions.Singleline)
JavaScript (before the «s» flag added in ES2018) and VBScript have no option to make the dot match line break characters; use «[\s\S]» instead
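A minimal sketch of the three behaviours above. Python is assumed here; its counterpart of single-line mode is the re.DOTALL flag:

```python
import re

text = "first\nsecond"

# Default: the dot refuses to match the line break.
default_dot = re.search(r"first.second", text)

# "Single-line mode" in Python is the DOTALL flag.
dotall = re.search(r"first.second", text, re.DOTALL)

# Flag-free workaround: a class that matches whitespace or non-whitespace.
portable = re.search(r"first[\s\S]second", text)
```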
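24. USE THE DOT SPARINGLY
The dot is a very powerful regex metacharacter
It allows you to be lazy. Ex: matching a date in mm/dd/yy format
Solutions:
• «\d\d.\d\d.\d\d» also matches „02512703”
• «\d\d[- /.]\d\d[- /.]\d\d» still matches „99/99/99”
• «(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d» checks the ranges (with a four-digit year), but still matches „09/31/2079”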
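The three date patterns can be compared side by side. This is a sketch assuming Python's re module; the names too_loose, better and strict are made up for illustration:

```python
import re

too_loose = re.compile(r"\d\d.\d\d.\d\d")
better = re.compile(r"\d\d[- /.]\d\d[- /.]\d\d")
strict = re.compile(
    r"(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d"
)

# The dot accepts digits as separators, so a plain number passes.
loose_hit = too_loose.fullmatch("02512703")

# Explicit separators reject the number but accept impossible values.
better_miss = better.fullmatch("02512703")
better_hit = better.fullmatch("99/99/99")

# Range checks reject month 99, yet September 31st still slips through.
strict_miss = strict.fullmatch("99/99/1999")
strict_hit = strict.fullmatch("09/31/2079")
```

Checking whether a date really exists is usually better left to a date library than to an even longer regex.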
25. USE NEGATED CHARACTER SETS INSTEAD OF THE DOT
The star is greedy
Ex: extract the quoted strings from: "string one" and "string two"
Regexp: «".*"»
matches „"string one" and "string two"” (a single match that is too long)
Regexp: «"[^"\r\n]*"»
matches „"string one"” and then „"string two"”
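A short sketch of the quoted-string example, assuming Python's re module:

```python
import re

text = '"string one" and "string two"'

# Greedy dot-star runs to the last quote on the line: one oversized match.
greedy = re.findall(r'".*"', text)

# The negated class cannot cross a quote, so each string is matched alone.
negated = re.findall(r'"[^"\r\n]*"', text)
```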
27. START OF STRING AND END OF STRING ANCHORS
Literals and character classes match a character
Anchors do not match any character at all. Instead, they match a position
Caret «^»: «^a» matches „a” in “abc”
Dollar sign «$»: «c$» matches „c” in “abc”
Useful Application
For validating user input, using anchors is very important
if ($input =~ m/\d+/) also accepts “qsdf4ghjk” => «^\d+$» rejects “qsdf4ghjk” and accepts “44467”
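The validation example above is written in Perl; the same idea as a minimal Python sketch (the helper name is_integer is hypothetical):

```python
import re


def is_integer(user_input: str) -> bool:
    """Return True only if the input consists of digits from start to end."""
    # Note: in Python, $ also matches just before a trailing newline;
    # re.fullmatch(r"\d+", user_input) is stricter if that matters.
    return re.search(r"^\d+$", user_input) is not None


# Without anchors the regex is satisfied by the single digit in the middle.
unanchored = re.search(r"\d+", "qsdf4ghjk")
```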
28. USING ^ AND $ AS START OF LINE AND END OF LINE ANCHORS
If you have a string consisting of multiple lines, e.g. “first line\nsecond line”, you may want ^ and $ to match at the start and end of each line
Tools such as EditPad Pro (which work with entire files) treat ^ and $ this way
In programming languages it has to be switched on
Perl: "multi-line mode"
m/^regex$/m
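The Perl /m modifier has a direct counterpart in Python, the re.MULTILINE flag. A minimal sketch under that assumption:

```python
import re

text = "first line\nsecond line"

# Default: ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)

# Multi-line mode: ^ also matches right after every line break.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# The same applies to $ and the end of each line.
ends_default = re.findall(r"\w+$", text)
ends_multiline = re.findall(r"\w+$", text, re.MULTILINE)
```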
29. PERMANENT START OF STRING AND END OF STRING ANCHORS
«\A»: only ever matches at the start of the string (the whole file)
«\Z»: only ever matches at the end of the string (the whole file)
Anchors match at a position, rather than matching a character
Anchors can result in a zero-length match.
Since the match does not include any characters, nothing is deleted in a replacement
In VB.NET
Dim Quoted As String = Regex.Replace(Original, "^", "> ", RegexOptions.Multiline)
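The VB.NET quoting trick can be sketched in Python as well (an assumption made for illustration; variable names are made up). It also shows that \A ignores multi-line mode:

```python
import re

original = "line one\nline two"

# "^" matches the empty position at the start of every line in multi-line
# mode, so the replacement text is inserted and nothing is removed.
quoted = re.sub(r"^", "> ", original, flags=re.MULTILINE)

# \A is never affected by multi-line mode: only the very start matches.
only_first = re.sub(r"\A", "> ", original, flags=re.MULTILINE)
```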
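31. WORD BOUNDARIES
The metacharacter «\b» is an anchor like «^» and «$»
Its match is zero-length.
Simply put: «\b» allows you to perform a “whole words only” search
«\b4\b» matches a “4” that stands on its own, but not the 4s in “44” or “a4”
Positions that qualify as word boundaries:
Before the first and after the last character in the string, if that character is a word character
Between a word character and a non-word character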
32. LOOKING INSIDE THE REGEX ENGINE
Ex: «\bis\b» applied to the string “This island is beautiful”.
«\b» matches at the position before “T”
The engine moves on to the next token: the literal «i»
It does not advance to the next character in the string, because the previous regex token was zero-length; «i» does not match “T”.
«\b» cannot match at the position between the “T” and the “h”.
….
POSIX does not support word boundaries at all.
34. ALTERNATION WITH THE VERTICAL BAR OR PIPE SYMBOL
Similar to character classes, which match one character out of several, alternation matches one regular expression out of several
Remember that the regex engine is eager
It stops searching as soon as it finds a valid match.
RE: «Get|GetValue|Set|SetValue»
Str: “SetValue”
1. The first token «G» is tried against the first character “S” and fails
2. The next alternative («GetValue») is tried and fails as well
3. The next token «S» matches “S” in the string, and the match continues up to «t»
4. Because the engine is eager, „Set” is returned
What are the solutions?
35. ALTERNATION WITH THE VERTICAL BAR OR PIPE SYMBOL (CONT.)
Solutions are:
Changing the order of the options
«GetValue|Get|SetValue|Set»
Using the greediness of the question mark «?»
«Get(Value)?|Set(Value)?»
Using «\b»
«\b(Get|GetValue|Set|SetValue)\b»
The POSIX standard mandates that the longest match be returned, regardless of whether the regex engine uses an NFA or a DFA algorithm.
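The eager behaviour and the three fixes can be compared in a short sketch. Python's re module is assumed; it is a regex-directed engine, so it shows the “Set” problem:

```python
import re

subject = "SetValue"

# Eager engine: the first alternative that works wins, so only "Set".
naive = re.search(r"Get|GetValue|Set|SetValue", subject).group()

# Fix 1: put the longer alternatives first.
reordered = re.search(r"GetValue|Get|SetValue|Set", subject).group()

# Fix 2: the greedy question mark tries to include "Value" first.
optional = re.search(r"Get(Value)?|Set(Value)?", subject).group()

# Fix 3: word boundaries force the engine to backtrack into "SetValue".
bounded = re.search(r"\b(Get|GetValue|Set|SetValue)\b", subject).group()
```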
37. OPTIONAL ITEMS
«?» makes the preceding token in the regular expression optional.
You can make several tokens optional by grouping them together using round brackets
«Feb(ruary)? 23(rd)?»
matches „February 23rd”, „February 23”, „Feb 23rd” and „Feb 23”.
Important Regex Concept: Greediness
The engine always tries to match the optional part first. Only if this causes the entire regular expression to fail will it try ignoring the part the question mark applies to.
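A minimal sketch of the optional groups and their greediness, assuming Python's re module; "Febr 23" is an extra negative example added for illustration:

```python
import re

pattern = re.compile(r"Feb(ruary)? 23(rd)?")

candidates = ["February 23rd", "February 23", "Feb 23rd", "Feb 23", "Febr 23"]

# Keep the candidates that the pattern matches from start to end.
accepted = [text for text in candidates if pattern.fullmatch(text)]

# Greediness: the optional "rd" is taken whenever it is present.
greedy = pattern.search("Feb 23rd, 2010").group()
```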
38. LOOKING INSIDE THE REGEX ENGINE
Ex: «colou?r», Str: “The colonel likes the color green”.
1. At the 5th character «c» matches “c”, and «o», «l», «o» match the following characters
2. The engine checks whether «u» matches “n” and fails
The question mark makes this failure acceptable.
3. The next token «r» fails to match “n” as well.
4. The engine starts again, trying to match «c» to the first “o” in “colonel”.
5. ….
40. REPETITION WITH STAR AND PLUS
Matching a valid HTML tag
«<[A-Za-z][A-Za-z0-9]*>»
«<[A-Za-z0-9]+>» would also match the invalid „<1>”
Quantifier   Meaning
*            0 or more
+            1 or more
?            0 or 1
{3}          Exactly 3
{3,}         3 or more
{3,5}        3, 4 or 5
Add a «?» to a quantifier to make it ungreedy.
41. WATCH OUT FOR THE GREEDINESS!
Ex: matching an HTML tag
This is a <EM>first</EM> test
«<.+>»
1. The first token in the regex is «<»; it matches the first „<” in the string.
2. The next token is the dot, which matches any character except newlines
3. The dot is repeated by the plus. The plus is greedy, so the dot keeps matching up to the end of the string
4. The dot fails when the engine has reached the void after the end of the string.
5. The engine continues with the next token «>», which cannot match there
6. The engine remembers that the plus has repeated the dot more often than is required, so it backtracks
7. The match for «.+» is reduced to „EM>first</EM> tes” and the next token in the regex is still «>»
8. It continues backtracking until «>» matches, and returns that first valid match (eager): „<EM>first</EM>”
42. LAZINESS INSTEAD OF GREEDINESS
Lazy quantifiers are sometimes also called “ungreedy”
This is a <EM>first</EM> test
«<.+?>»
1. «<» matches the first „<” in the string
2. The next token is the dot, this time repeated by a lazy plus
This tells the regex engine to repeat the dot as few times as possible (minimum = 1)
The dot matches “E”
3. «>» is tried against “M” and fails
• But this time, the backtracking forces the lazy plus to expand rather than reduce its reach
4. The engine returns „<EM>”; continuing the search after it returns „</EM>”
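The lazy plus can be compared with its greedy counterpart on the slide's sample text. A minimal sketch, assuming Python's re module:

```python
import re

text = "This is a <EM>first</EM> test"

# Greedy plus: the dot runs as far as it can before backtracking to a ">".
greedy = re.search(r"<.+>", text).group()

# Lazy plus: the dot expands one character at a time until ">" matches.
lazy = re.findall(r"<.+?>", text)
```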
43. LOOKING INSIDE THE REGEX ENGINE
Regex: «<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>»
String: “Testing <B><I>bold italic</I></B> text”
1. The engine starts matching at the first „<”.
2. «[A-Z]» matches „B”; the engine advances to «[A-Z0-9]» and “>”.
3. This match fails. However, because of the star, that’s
perfectly fine.
4. Leaving the round brackets stores what was matched inside them: „B” is stored.
5. The regex advances to «[^>]» while the string remains at “>”. This fails too, and as in step 3
the star makes that acceptable.
6. «>» matches “>”.
7. The next token is a dot, repeated by a lazy star.
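The walkthrough can be followed with a short Python sketch. It assumes the backreference in the slide's pattern is `\1` (the backslash appears to have been lost in extraction) and uses Python's `re` module.

```python
import re

text = "Testing <B><I>bold italic</I></B> text"

# Group 1 captures the tag name; \1 requires the same name in the closing tag.
tag_pair = re.compile(r"<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>")

match = tag_pair.search(text)
whole_match = match.group(0)
tag_name = match.group(1)
print(tag_name, whole_match)
```

Because the stored name is „B”, the lazy star has to keep expanding past „</I>” until it reaches „</B>”.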
44. AN ALTERNATIVE TO LAZINESS
An alternative to making the plus lazy, which avoids
backtracking:
use a greedy plus and a negated character class.
String: “<EM>first</EM>”
Regex: «<[^>]+>»
Backtracking slows down the regex engine, so
you will save plenty of CPU cycles when using such a
regex.
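A minimal Python sketch of the negated-class approach (the `re` module is assumed as the flavor):

```python
import re

text = "This is a <EM>first</EM> test"

# The negated class can never run past a ">", so the greedy plus stops
# at the end of each tag without needing to backtrack over the rest of the text.
tags = re.findall(r"<[^>]+>", text)
print(tags)
```

The result is the same as with the lazy plus, but the engine does not have to try «>» after every single character.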
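45. REPEATING THE \Q...\E ESCAPE SEQUENCE
Regex: «\Q*\d+*\E+» applied to “*\d+**\d+*”
In Perl: matches „*\d+**” (only the last asterisk is repeated)
In Java: matches „*\d+**\d+*” (the whole sequence is repeated)
If you want Java to return the same match as Perl:
«\Q*\d+\E\*+»
If you want Perl to repeat the whole sequence like
Java does:
«(\Q*\d+*\E)+»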
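Python's `re` module has no \Q...\E, so the sketch below only imitates the two interpretations with `re.escape`. It assumes the slide's regex is «\Q*\d+*\E+» with its backslashes restored; the names `perl_like` and `java_like` are labels invented for this illustration, not features of either language.

```python
import re

subject = r"*\d+**\d+*"

# "Perl-like" reading: the trailing plus repeats only the last literal asterisk.
perl_like = re.escape(r"*\d+") + r"\*+"

# "Java-like" reading: the trailing plus repeats the whole quoted sequence.
java_like = "(?:" + re.escape(r"*\d+*") + ")+"

perl_match = re.search(perl_like, subject).group()
java_match = re.search(java_like, subject).group()
print(perl_match, java_match)
```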
47. USE ROUND BRACKETS FOR GROUPING
Grouping: round brackets group part of the regular expression
together so that a regex operator can be applied to it.
Creating a backreference:
reuses part of the regex match
slows down the regex engine
If the backreference is not needed, optimize «Set(Value)?» into «Set(?:Value)?».
How to use backreferences:
«([abc])+5\1$»: \1 holds the last of a, b, c that was matched, so the regex matches „abbc5c” but not “abc5abc”.
«<([a-z]*)>.*</\1>» matches „<div>hello</div>”.
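The grouping ideas on this slide can be tried out in Python. This sketch assumes the garbled backreference pattern reads «([abc])+5\1$» and that the tag pattern ends in `\1`; both are reconstructions, not confirmed by the source.

```python
import re

# A non-capturing group applies "?" to "Value" without storing anything.
set_value = re.compile(r"Set(?:Value)?")
stored_groups = set_value.groups  # number of capturing groups in the pattern

# A repeated capturing group keeps only its last iteration for \1.
last_iteration = re.compile(r"([abc])+5\1$")
hit = last_iteration.search("abbc5c")
miss = last_iteration.search("abc5abc")

# The closing tag must repeat whatever name group 1 captured.
div = re.search(r"<([a-z]*)>.*</\1>", "<div>hello</div>")
print(stored_groups, hit.group(1), miss, div.group(1))
```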
48. REPETITION AND BACKREFERENCES
Applying «([abc]+)» and «([abc])+» to the string “cab”:
«([abc]+)»: „cab” is stored in the backreference
«([abc])+»: „b” is stored in the backreference
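The difference between the two patterns shows up directly in what group 1 holds. A minimal sketch with Python's `re` module:

```python
import re

# Quantifier inside the group: the group captures the whole run.
inside = re.search(r"([abc]+)", "cab").group(1)

# Quantifier outside the group: each iteration overwrites the capture,
# so only the last character survives.
outside = re.search(r"([abc])+", "cab").group(1)
print(inside, outside)
```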
49. USE ROUND BRACKETS FOR GROUPING
(CONT.)
The same backreference can be reused more than once:
«([a-c])x\1x\1» matches „axaxa”, „bxbxb” and „cxcxc”.
A backreference cannot be used inside the group it refers to:
«([abc]\1)»
Round brackets cannot be used inside character
classes as metacharacters, and neither can backreferences:
«(a)[\1b]»
Useful example: checking for doubled words
«\b(\w+)\s+\1\b»
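A short Python sketch of the doubled-word check and of a backreference used twice. The sample sentence is made up for the illustration, and the patterns assume the slide's backslashes restored.

```python
import re

# \b(\w+)\s+\1\b : a word, whitespace, then the same word again.
doubled = re.compile(r"\b(\w+)\s+\1\b")
sample = "This is is a test of the the regex"
repeated_words = doubled.findall(sample)

# The same backreference can appear several times in one pattern.
triple = re.compile(r"([a-c])x\1x\1")
print(repeated_words, bool(triple.fullmatch("bxbxb")))
```

The leading «\b» keeps the "is" inside "This" from being treated as the first copy of a doubled word.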
51. EXAMPLE 1
Example 1. Beginning of line ( ^ )
grep "^Nov 10" messages.1
Example 2. End of the line ( $)
grep "terminating.$" messages
Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s
Nov 10 01:17:17 gs123 ntpd[2241]: synchronized to OCAL(0)
Nov 10 01:18:49 gs123 ntpd[2241]: synchronized to 15.1.13.13
Jul 12 17:01:09 cloneme kernel: Kernel log daemon terminating.
Oct 28 06:29:54 cloneme kernel: Kernel log daemon terminating
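The two grep commands can be imitated in Python for experimentation. The `grep` helper below is a hypothetical stand-in written for this sketch, not the real tool; the patterns and log lines are copied as they appear on the slide, including the unescaped dot in «terminating.$».

```python
import re

log_lines = [
    "Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s",
    "Nov 10 01:17:17 gs123 ntpd[2241]: synchronized to OCAL(0)",
    "Nov 10 01:18:49 gs123 ntpd[2241]: synchronized to 15.1.13.13",
    "Jul 12 17:01:09 cloneme kernel: Kernel log daemon terminating.",
    "Oct 28 06:29:54 cloneme kernel: Kernel log daemon terminating",
]


def grep(pattern, lines):
    """Return the lines in which the pattern matches somewhere."""
    return [line for line in lines if re.search(pattern, line)]


nov_lines = grep(r"^Nov 10", log_lines)
terminating_lines = grep(r"terminating.$", log_lines)
print(len(nov_lines), terminating_lines)
```

With the lines as listed, «terminating.$» needs one more character after the word before the end of the line, so only the entry that ends in a period is selected.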
52. EXAMPLE 2
Example 3. Quantifiers (*) (+) (?)
[hc]*at matches cchat, hcat, hhhat, at (0 or more)
[hc]+at matches ccchat, hcat, but not at (1 or more)
[hc]?at matches hat, cat, at (0 or 1)
Example 4. Escaping the special character (\)
grep "127\.0\.0\.1" /var/log/messages.4
Oct 28 06:31:10 btovm871 ntpd[2241]: Listening on interface lo,
127.0.0.1#123 Enabled
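The quantifier examples and the escaped dot can be verified with a small Python sketch. The helper `fully_matches` and the string "127a0b0c1" are invented for the illustration, and the escaped pattern assumes the backslashes that the example title implies.

```python
import re


def fully_matches(pattern, word):
    """Return True when the pattern matches the entire word."""
    return re.fullmatch(pattern, word) is not None


star_words = [w for w in ["cchat", "hcat", "hhhat", "at"] if fully_matches(r"[hc]*at", w)]
plus_words = [w for w in ["ccchat", "hcat", "at"] if fully_matches(r"[hc]+at", w)]
optional_words = [w for w in ["hat", "cat", "at", "hcat"] if fully_matches(r"[hc]?at", w)]

# An unescaped dot matches any character; an escaped dot matches only ".".
unescaped_hit = re.search(r"127.0.0.1", "127a0b0c1") is not None
escaped_hit = re.search(r"127\.0\.0\.1", "127a0b0c1") is not None
print(star_words, plus_words, optional_words, unescaped_hit, escaped_hit)
```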
53. EXAMPLE 3
Example 5. Excluding specific characters
a) «[^b]og»
Match text: hog
Match text: dog
Skip text: bog
b) Any character other than a, b or c: «[^abc]»
Sample strings:
abccc
adb
gh
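Negated character classes can be checked against the sample strings with a few lines of Python (the `re` module is assumed as the flavor):

```python
import re

# [^b]og : any character except "b", followed by "og".
og_matches = re.findall(r"[^b]og", "hog dog bog")

# [^abc] : any single character other than a, b or c.
not_abc = {s: re.findall(r"[^abc]", s) for s in ["abccc", "adb", "gh"]}
print(og_matches, not_abc)
```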
54. EXAMPLE 4
Example 6. Composite syntax
The log entry looks like this:
Sun Jun 4 22:08:39 2006 [pid 21611] [dcid] OK
LOGIN: Client "192.168.1.1"
^\w+\s\w+\s\d+ \S+ \d+ \[pid \d+\]\s\[(\w+)\] OK LOGIN:
Client "(\d+\.\d+\.\d+\.\d+)"$
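A Python sketch of the composite pattern follows. It assumes the slide's regex with its backslashes restored, the log entry joined onto one line, and straight double quotes around the address; those are reconstructions for the sake of a runnable example.

```python
import re

log_entry = 'Sun Jun 4 22:08:39 2006 [pid 21611] [dcid] OK LOGIN: Client "192.168.1.1"'

login = re.compile(
    r'^\w+\s\w+\s\d+ \S+ \d+ \[pid \d+\]\s\[(\w+)\] OK LOGIN: '
    r'Client "(\d+\.\d+\.\d+\.\d+)"$'
)

match = login.search(log_entry)
user, client_ip = match.groups()
print(user, client_ip)
```

The two pairs of round brackets pick out the user name and the client address; every other part of the line is matched but not captured.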
Editor's Notes
Portable Operating System Interface for uniX
When applying «cat» to “He captured a catfish for his cat.”, the engine will try to match the first
token in the regex «c» to the first character in the match “H”. This fails. There are no other possible
permutations of this regex, because it merely consists of a sequence of literal characters. So the regex engine
tries to match the «c» with the “e”. This fails too, as does matching the «c» with the space. Arriving at the 4th
character in the match, «c» matches „c”. The engine will then try to match the second token «a» to the 5th
character, „a”. This succeeds too. But then, «t» fails to match “p”. At that point, the engine knows the regex
cannot be matched starting at the 4th character in the match. So it will continue with the 5th: “a”. Again, «c»
fails to match here and the engine carries on. At the 15th character in the match, «c» again matches „c”. The
engine then proceeds to attempt to match the remainder of the regex at character 15 and finds that «a»
matches „a” and «t» matches „t”.
The entire regular expression could be matched starting at character 15. The engine is "eager" to report a
match. It will therefore report the first three letters of catfish as a valid match. The engine never proceeds
beyond this point to see if there are any “better” matches.
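The eager behaviour described in this note can be observed in Python. Note that Python counts positions from 0, so the 15th character is index 14.

```python
import re

text = "He captured a catfish for his cat."

# search() reports the leftmost match and stops there.
first = re.search(r"cat", text)
first_start = first.start()

# finditer() keeps going and shows where later matches would be.
all_starts = [m.start() for m in re.finditer(r"cat", text)]
print(first_start, all_starts)
```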
If you want to use any of these characters as a literal in a regex, you need to escape them with a backslash. If
you want to match „1+1=2”, the correct regex is «1\+1=2». Otherwise, the plus sign will have a special
meaning.
Note that «1+1=2», with the backslash omitted, is a valid regex. So you will not get an error message. But it
will not match “1+1=2”. It would match „111=2” in “123+111=234”, due to the special meaning of the plus
character.
If you forget to escape a special character where its use is not allowed, such as in «+1», then you will get an
error message.
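The three cases in this note can be reproduced with Python's `re` module. The helper `compiles` is made up for this sketch.

```python
import re

# Escaped plus: matches the literal text.
escaped = re.search(r"1\+1=2", "1+1=2")

# Unescaped plus: "one or more 1s, then 1=2".
unescaped_literal = re.search(r"1+1=2", "1+1=2")
unescaped_other = re.search(r"1+1=2", "123+111=234")


def compiles(pattern):
    """Return True when the pattern is accepted by the regex compiler."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(escaped.group(), unescaped_literal, unescaped_other.group(), compiles("+1"))
```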
on POSIX bracket
expressions for more information.
Older tools read a file line by line and applied the regex to each line, so a newline was never matched, while later, modern tools work with the whole file
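In Python's `re` module, which is assumed here as an example flavor, the dot still skips newlines by default, and a flag switches that off:

```python
import re

text = "a\nb"

# By default the dot does not match a newline.
default_match = re.search(r"a.b", text)

# With DOTALL the dot matches every character, newlines included.
dotall_match = re.search(r"a.b", text, re.DOTALL)
print(default_match, dotall_match.group() == text)
```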
In the date-matching example, we improved our regex by replacing the dot with a character class. Here, we
will do the same. Our original definition of a double-quoted string was faulty. We do not want any number of
any character between the quotes. We want any number of characters that are not double quotes or newlines
between the quotes. So the proper regex is «"[^"\r\n]*"».
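A small Python sketch contrasts the dot-based definition with the character-class one. The sample sentences are made up, and the class is assumed to read «[^"\r\n]».

```python
import re

text = 'Put a "string" between "double quotes"'

# Any number of any character: runs from the first quote to the last one.
dot_version = re.findall(r'".*"', text)

# Any number of characters that are not quotes or line breaks.
class_version = re.findall(r'"[^"\r\n]*"', text)

# A quoted string broken across lines is not matched by the class version.
across_lines = re.findall(r'"[^"\r\n]*"', 'say "open\nclose"')
print(dot_version, class_version, across_lines)
```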
Thus far, I have explained literal characters and character classes. In both cases, putting one in a regex will
cause the regex engine to try to match a single character.
Anchors are a different breed. They do not match any character at all. Instead, they match a position before,
after or between characters. They can be used to “anchor” the regex match at a certain position. The caret «^»
matches the position before the first character in the string. Applying «^a» to “abc” matches „a”. «^b» will
not match “abc” at all, because the «b» cannot be matched right after the start of the string, matched by «^».
See below for the inside view of the regex engine.
Similarly, «$» matches right after the last character in the string. «c$» matches „c” in “abc”, while «a$» does
not match at all.
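The four anchor examples from this note, written as a Python sketch (the `re` module is assumed as the flavor):

```python
import re

# ^ matches the position before the first character.
caret_a = re.search(r"^a", "abc")
caret_b = re.search(r"^b", "abc")

# $ matches the position after the last character.
dollar_c = re.search(r"c$", "abc")
dollar_a = re.search(r"a$", "abc")
print(caret_a.group(), caret_b, dollar_c.group(), dollar_a)
```

The anchors themselves consume no characters; only the literal next to them appears in the match.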