## What's hot

Textpad and Regular Expressions
Textpad and Regular Expressions
OCSI

Array and functions
Array and functions
Sun Technlogies

Common fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spaces
Alexander Decker

Andrei's Regex Clinic
Andrei's Regex Clinic
Andrei Zmievski

2.regular expressions
2.regular expressions
Praveen Gorantla

Java căn bản - Chapter9
Java căn bản - Chapter9
Vince Vo

Strong convergence of an algorithm about strongly quasi nonexpansive mappings
Strong convergence of an algorithm about strongly quasi nonexpansive mappings
Alexander Decker

Regular expressions
Regular expressions
Thomas Langston

Deriving the Y Combinator
Deriving the Y Combinator
Yuta Okazaki

Regex Basics
Regex Basics
Jeremy Coates

Python advanced 2. regular expression in python
Python advanced 2. regular expression in python
John(Qiang) Zhang

Bd32360363
Bd32360363
IJERA Editor

Python - Regular Expressions
Python - Regular Expressions
Mukesh Tekwani

Java DSLs with Xtext
Java DSLs with Xtext
Dr. Jan Köhnlein

Continuation calculus at Term Rewriting Seminar
Continuation calculus at Term Rewriting Seminar
bgeron

Cwkaa 2010
Cwkaa 2010
Sam Neaves

Master of Computer Application (MCA) – Semester 4 MC0080
Master of Computer Application (MCA) – Semester 4 MC0080
Aravind NC

Processing Regex Python
Processing Regex Python
primeteacher32

Quiz 1 solution
Quiz 1 solution
Gopi Saiteja

H-MLQ

### What's hot(20)

Textpad and Regular Expressions
Textpad and Regular Expressions

Array and functions
Array and functions

Common fixed point theorems of integral type in menger pm spaces
Common fixed point theorems of integral type in menger pm spaces

Andrei's Regex Clinic
Andrei's Regex Clinic

2.regular expressions
2.regular expressions

Java căn bản - Chapter9
Java căn bản - Chapter9

Strong convergence of an algorithm about strongly quasi nonexpansive mappings
Strong convergence of an algorithm about strongly quasi nonexpansive mappings

Regular expressions
Regular expressions

Deriving the Y Combinator
Deriving the Y Combinator

Regex Basics
Regex Basics

Python advanced 2. regular expression in python
Python advanced 2. regular expression in python

Bd32360363
Bd32360363

Python - Regular Expressions
Python - Regular Expressions

Java DSLs with Xtext
Java DSLs with Xtext

Continuation calculus at Term Rewriting Seminar
Continuation calculus at Term Rewriting Seminar

Cwkaa 2010
Cwkaa 2010

Master of Computer Application (MCA) – Semester 4 MC0080
Master of Computer Application (MCA) – Semester 4 MC0080

Processing Regex Python
Processing Regex Python

Quiz 1 solution
Quiz 1 solution

H-MLQ
H-MLQ

## Similar to regex-presentation_ed_goodwin

Eag 201110-hrugregexpresentation-111006104128-phpapp02
Eag 201110-hrugregexpresentation-111006104128-phpapp02
egoodwintx

Regular expressions
Regular expressions
Raj Gupta

Regular_Expressions.pptx
Regular_Expressions.pptx
DurgaNayak4

Regular expressions in Python
Regular expressions in Python
Sujith Kumar

Maxbox starter20
Maxbox starter20
Max Kleiner

Regular expressions in oracle
Regular expressions in oracle
Logan Palanisamy

Text Mining using Regular Expressions
Text Mining using Regular Expressions
Rupak Roy

Adv. python regular expression by Rj
Adv. python regular expression by Rj
Shree M.L.Kakadiya MCA mahila college, Amreli

arrays.pptx
arrays.pptx
SachinBhosale73

Regular Expressions Cheat Sheet
Regular Expressions Cheat Sheet
Akash Bisariya

11. using regular expressions with oracle database
11. using regular expressions with oracle database
Amrit Kaur

Intoduction to php strings
Regular Expressions in PHP, MySQL by programmerblog.net
Regular Expressions in PHP, MySQL by programmerblog.net
Programmer Blog

Module 3 - Regular Expressions, Dictionaries.pdf
Module 3 - Regular Expressions, Dictionaries.pdf
GaneshRaghu4

PHP Web Programming
PHP Web Programming
Muthuselvam RS

Underscore.js
Underscore.js
timourian

Unit 1-array,lists and hashes
Unit 1-array,lists and hashes
sana mateen

Regular expressionfunction
Regular expressionfunction

SQL for pattern matching (Oracle 12c)
SQL for pattern matching (Oracle 12c)
Logan Palanisamy

3 Data Structure in R
3 Data Structure in R
Dr Nisha Arora

### Similar to regex-presentation_ed_goodwin(20)

Eag 201110-hrugregexpresentation-111006104128-phpapp02
Eag 201110-hrugregexpresentation-111006104128-phpapp02

Regular expressions
Regular expressions

Regular_Expressions.pptx
Regular_Expressions.pptx

Regular expressions in Python
Regular expressions in Python

Maxbox starter20
Maxbox starter20

Regular expressions in oracle
Regular expressions in oracle

Text Mining using Regular Expressions
Text Mining using Regular Expressions

Adv. python regular expression by Rj
Adv. python regular expression by Rj

arrays.pptx
arrays.pptx

Regular Expressions Cheat Sheet
Regular Expressions Cheat Sheet

11. using regular expressions with oracle database
11. using regular expressions with oracle database

Intoduction to php strings
Intoduction to php strings

Regular Expressions in PHP, MySQL by programmerblog.net
Regular Expressions in PHP, MySQL by programmerblog.net

Module 3 - Regular Expressions, Dictionaries.pdf
Module 3 - Regular Expressions, Dictionaries.pdf

PHP Web Programming
PHP Web Programming

Underscore.js
Underscore.js

Unit 1-array,lists and hashes
Unit 1-array,lists and hashes

Regular expressionfunction
Regular expressionfunction

SQL for pattern matching (Oracle 12c)
SQL for pattern matching (Oracle 12c)

3 Data Structure in R
3 Data Structure in R

## More from schamber

Poster
Poster
schamber

Poster
Poster
schamber

Chamberlain PhD Thesis
Chamberlain PhD Thesis
schamber

Phylogenetics in R
Phylogenetics in R
schamber

Web data from R
Web data from R
schamber

R Introduction
R Introduction
schamber

### More from schamber(6)

Poster
Poster

Poster
Poster

Chamberlain PhD Thesis
Chamberlain PhD Thesis

Phylogenetics in R
Phylogenetics in R

Web data from R
Web data from R

R Introduction
R Introduction

Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
shanihomely

Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
DianaGray10

EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
Jimmy Lai

Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Nicolás Lopéz

Types of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technology
ldtexsolbl

Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
ankush9927

Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer

Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx
BrainSell Technologies

Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx

Step-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From Scratch
softsuave

Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris

Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success
David Wilson

Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
Safe Software

Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Kunal Gupta

Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase
shyamraj55

leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
alexjohnson7307

Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire

The Impact of the Internet of Things (IoT) on Smart Homes and Cities
The Impact of the Internet of Things (IoT) on Smart Homes and Cities
Arpan Buwa

Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
Baishakhi Ray

(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
Priyanka Aash

Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...

Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making
Connector Corner: Leveraging Snowflake Integration for Smarter Decision Making

EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase

Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024

Types of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technology

Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10

Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster

Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx

Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx

Step-By-Step Process to Develop a Mobile App From Scratch
Step-By-Step Process to Develop a Mobile App From Scratch

Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...

Mastering OnlyFans Clone App Development: Key Strategies for Success
Mastering OnlyFans Clone App Development: Key Strategies for Success

Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data

Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx
Dublin_mulesoft_meetup_Mulesoft_Salesforce_Integration (1).pptx

Integrating Kafka with MuleSoft 4 and usecase
Integrating Kafka with MuleSoft 4 and usecase

leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...

Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...

The Impact of the Internet of Things (IoT) on Smart Homes and Cities
The Impact of the Internet of Things (IoT) on Smart Homes and Cities

Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development

(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf

### regex-presentation_ed_goodwin

• 1. Regular Expressions in R Houston R Users Group 10.05.2011 Ed Goodwin twitter: @egoodwintx
• 2. What is a Regular Expression? Regexes are an extremely ﬂexible tool for ﬁnding and replacing text. They can easily be applied globally across a document, dataset, or speciﬁcally to individual strings.
• 3. Example Data LastName, FirstName, Address, Phone Baker, Tom, 123 Unit St., 555-452-1324 Smith, Matt, 456 Tardis St., 555-326-4567 Tennant, David, 567 Torchwood Ave., 555-563-8974 Regular Expression to Convert “St.” to “Street” gsub(“St.”, “Street”, data[i]) *Note the double-slash “” to escape the ‘.’
• 4. Beneﬁts of Regex • Flexible (can be applied globally or speciﬁcally across data) • Terse (very powerful scripting template) • Portable (sort of) across languages • Rich history
• 5. Disadvantages of regex • Non-intuitive • Easy to make errors (unintended consequences) • Difﬁcult to robustly debug • Various ﬂavors may cause portability issues.
• 6. Why do this in R? • Easier to locate all code in one place • (Relatively) Robust regex tools • May be the only tool available • Familiarity
• 7. Other alternatives? • Perl • Python • Java • Ruby • Others (grep, sed, awk, bash, csh, ksh, etc.)
• 8. Components of a Regular Expression • Characters • Metacharacters • Character classes
• 9. The R regex functions Function Purpose breaks apart strings at predeﬁned points strsplit() returns a vector of indices where a grep() pattern is matched returns a logical vector (TRUE/FALSE) grepl() for each element of the data replaces one pattern with another at sub() ﬁrst matching location replaces one pattern with another at gsub() every matching location returns an integer vector giving the starting position of regexpr() the ﬁrst match, along with a match.length attribute giving the length of the matched text. returns an integer vector giving the starting position of gregexpr() the all matches, along with a match.length attribute giving the length of the matched text. Note: all functions are in the base package
• 10. Metacharacter Symbols Modiﬁer Meaning ^ anchors expression to beginning of target \$ anchors expression to end of target . matches any single character except newline | separates alternative patterns [] accepts any of the enclosed characters [^] accepts any characters but the ones enclosed in brackets () groups patterns together for assignment or constraint * matches zero or more occurrences of preceding entity ? matches zero or one occurrences of preceding entity + matches one or more occurrences of preceding entity {n} matches exactly n occurrences of preceding entity {n,} matches at least n occurrences of preceding entity {n,m} matches n to m occurrences of preceding entity interpret succeeding character as literal Source: “Data Manipulation with R”. Spector, Phil. Springer, 2008. page 92.
• 11. Examples [A-Za-z]+ matches one or more alphabetic characters .* matches zero or more of any character up to the newline .*.* matches zero or more characters followed by a literal .* (July? ) Accept ‘Jul’ or ‘July’ but not ‘Julyy’. Note the space. (abc|123) Match “abc” or “123” [abc|123] Match a, b, c, 1, 2 or 3.The ‘|’ is extraneous. Matches lines starting with “From:” or “Subject:” or ^(From|Subject|Date): “Date:”
• 12. Let’s work through some examples... Data LastName, FirstName, Address, Phone Baker, Tom, 123 Unit St., 555-452-1324 Smith, Matt, 456 Tardis St., 555-326-4567 Tennant, David, 567 Torchwood Ave., 555-563-8974 1. Locate all phone numbers. 2. Locate all addresses. 3. Locate all addresses ending in ‘Street’ (including abbreviations). 4. Read in full names, reverse the order and remove the comma.
• 13. So how would you write the regular expression to match a calendar date in format “mm/dd/yyyy” or “mm.dd.yyyy”?
• 14. Regex to identify date format? What’s wrong with “[0-9]{2}(.|/)[0-9]{2}(.|/)[0-9]{4}” ? Or with “[1-12](.|/)[1-31](.|/)[0001-9999]” ?
• 15. Dates are not an easy problem because they are not a simple text pattern Best bet is to validate the textual pattern (mm.dd.yyyy) and then pass to a separate function to validate the date (leap years, odd days in month, etc.) “^(1[0-2]|0[1-9])(.|/)(3[0-1]|[1-2][0-9]|0[1-9])(.|/) ([0-9]{4})\$”
• 16. Supported ﬂavors of regex in R • POSIX 1003.2 • Perl Perl is the more robust of the two. POSIX has a few idiosyncracies handling ‘’ that may trip you up.
• 17. Usage Patterns • Data validation • String replace on dataset • String identify in dataset (subset of data) • Pattern arithmetic (how prevalent is string in data?) • Error prevention/detection
• 18. The R regex functions Function Purpose breaks apart strings at predeﬁned points strsplit() returns a vector of indices where a grep() pattern is matched returns a logical vector (TRUE/FALSE) grepl() for each element of the data replaces one pattern with another at sub() ﬁrst matching location replaces one pattern with another at gsub() every matching location returns an integer vector giving the starting position of regexpr() the ﬁrst match, along with a match.length attribute giving the length of the matched text. returns an integer vector giving the starting position of gregexpr() the all matches, along with a match.length attribute giving the length of the matched text. Note: all functions are in the base package
• 19. strsplit( ) Definition: strsplit(x, split, fixed=FALSE, perl=FALSE, useBytes=FALSE) Example: str <- “This is some dummy data to parse x785y8099” strsplit(str, “[ xy]”, perl=TRUE) Result: [[1]] [1] "This" "is" "some" "dumm" "" "data" "to" "parse" "" [10] "785" "8099"
• 20. grep( ) Definition: grep(pattern, x, ignore.case=FALSE, perl=FALSE, value=FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE) Example: str <- “This is some dummy data to parse x785y8099” grep(“[a-z][0-9]{3}[a-z][0-9]{4}”, str, perl=TRUE, value=TRUE) Result: [1] "This is some dummy data to parse x785y8099"
• 21. grepl( ) Definition: grepl(pattern, x, ignore.case=FALSE, perl=FALSE, value=FALSE,fixed = FALSE, useBytes = FALSE, invert = FALSE) Example: str <- “This is some dummy data to parse x785y8099” grepl(“[a-z][0-9]{3}[a-z][0-9]{4}”, str, perl=TRUE) Result: [1] TRUE
• 22. sub( ) Definition: sub(pattern, replacement, x, ignore.case=FALSE, perl=FALSE, fixed=FALSE, useBytes=FALSE) Example: str <- “This is some dummy data to parse x785y8099” sub("dummy(.* )([a-z][0-9]{3}).([0-9]{4})", "awesome12H3", str, perl=TRUE) Result: [1] "This is some awesome data to parse x785H8099"
• 23. gsub( ) Definition: gsub(pattern, replacement, x, ignore.case=FALSE, perl=FALSE,fixed=FALSE, useBytes=FALSE) Example: str <- “This is some dummy data to parse x785y8099 you dummy” gsub(“dummy”, “awesome”, perl=TRUE) Result: [1] "This is some awesome data to parse x785y8099 you awesome"
• 24. regexpr( ) Definition: regexpr(pattern, text, ignore.case=FALSE, perl=FALSE, fixed = FALSE, useBytes = FALSE) Example: duckgoose <- "Duck, duck, duck, goose, duck, duck, goose, duck, duck" regexpr("duck", duckgoose, ignore.case=TRUE, perl=TRUE) Result: [1] 1 attr(,"match.length") [1] 4
• 25. gregexpr( ) Definition: gregexpr(pattern, text, ignore.case=FALSE, perl=FALSE, fixed=FALSE, useBytes=FALSE) Example: duckgoose <- "Duck, duck, duck, goose, duck, duck, goose, duck, duck" regexpr("duck", duckgoose, ignore.case=TRUE, perl=TRUE) Result: [[1]] [1] 1 7 13 26 32 45 51 attr(,"match.length") [1] 4 4 4 4 4 4 4
• 26. Problem Solving & Debugging • Remember that regexes are greedy by default. They will try to grab the largest matching string possible unless constrained. • Dummy data - small datasets • Unit testing - testthis, etc. • Build up regex complexity incrementally
• 27. Best Practices for Regex in R • Store regex string as variable to pass to function • Try to make regex expression as exact as possible (avoid lazy matching) • Pick one type of regex syntax and stick with it (POSIX or Perl) • Document all regexes in code with liberal comments • use cat() to verify regex string • Test, test, and test some more
• 28. Regex Workﬂow • Deﬁne initial data pattern • Deﬁne desired data pattern • Deﬁne transformation steps • Incremental iteration to desired regex • Testing & QA
• 29. Regex Resources • http://regexpal.com/ - online regex tester • Data Manipulation with R. Spector, Phil. Springer, 2008. • Regular Expression Cheat Sheet. http:// www.addedbytes.com/cheat-sheets/regular-expressions- cheat-sheet/ • Regular Expressions Cookbook. Goyvaerts, Jan and Levithan, Steven. O’Reilly, 2009. • Mastering Regular Expressions. Friedl, Jeffrey E.F. O’Reilly, 2006. • Twitter: @RegexTip - regex tips and tricks
Current LanguageEnglish
Español
Portugues
Français
Deutsche
© 2024 SlideShare from Scribd