Introduction to
Pattern search and
Replace
Regular expressions
A regular expression is a pattern that describes a set of strings; it is an effective tool for finding and replacing text.
Regular expressions in R:
grep, grepl, regexpr, gregexpr, sub, gsub
- grep, grepl, regexpr and gregexpr search for matches to the argument
pattern within each element of a character vector.
- sub replaces the first match in each element; gsub replaces all matches.
Rupak Roy
Regular expressions: grep(pattern, x)
grep(pattern, x)
- Searches for the pattern in each element of a character vector x
- Returns the indices of the elements that contain a match.
> grep("[au]", c("Harry Potter", "Game of Thrones", "Lord of Rings"))
[1] 1 2
A character class such as [au] is a list of characters enclosed between [ and ];
it matches any single character in that list, here a or u.
The result 1 2 gives the indices of the matching elements:
1, 2 = "Harry Potter", "Game of Thrones"
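> grep("[Harry potter]", c("Harry Potter", "Game of Thrones", "Lord of
Rings"))
[1] 1 2 3
# [Harry potter] is also a character class (any one of H, a, r, y, space, p, o, t, e),
# not the phrase "Harry potter", so all three elements match.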
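As a small sketch of how the indices returned by grep can be used, the block below repeats the [au] search and also shows the value and ignore.case arguments of grep. The vector name titles and the other variable names are chosen here for illustration only.

```r
# Illustrative vector; the name `titles` is an assumption, not from the slides.
titles <- c("Harry Potter", "Game of Thrones", "Lord of Rings")

# Default: indices of the elements that contain a or u
idx <- grep("[au]", titles)

# value = TRUE returns the matching strings instead of their indices
hits <- grep("[au]", titles, value = TRUE)

# Matching is case-sensitive unless ignore.case = TRUE is given
ci <- grep("harry", titles, ignore.case = TRUE)

# No match at all gives an empty integer vector, not NA
none <- grep("z", titles)

print(idx)
print(hits)
```

Indexing with the result, as in titles[idx], gives the same strings as value = TRUE.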
Regular expressions: grep(pattern, x)
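> grep("[^Harry Potter]", c("Harry Potter", "Game of Thrones", "Lord of
Rings"))
# ^ as the first character inside [ ] negates the class: it matches any
# character NOT in the list
[1] 2 3
# Matching is case-sensitive: with the class [^Harry potter] (lowercase p), the "P"
# in "Harry Potter" would also match and the result would be 1 2 3.
> grep("[letters]", c("Harry Potter", "1234", "Lord of Rings"))
[1] 1 3
# [letters] is the set of literal characters l, e, t, r, s, not "any letter";
# use [[:alpha:]] to match any letter.
> grep("[[:lower:]]", c("harry potter", "1234", "LORD of RINGS"))
[1] 1 3
# POSIX classes such as [:lower:] must be written inside a bracket expression,
# i.e. [[:lower:]]. Element 3 matches because "of" is lowercase.
> grep("[[:punct:]]", c("harry;; potter$", "abc123", "Lordof"))
[1] 1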
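The named POSIX classes can be sketched on a small vector as follows. The double-bracket form [[:class:]] is used throughout, and the vector x and its fourth element are made up for this illustration.

```r
# Illustrative input; the fourth element is an assumption added for the example.
x <- c("harry potter", "1234", "LORD of RINGS", "semi;colon")

lower <- grep("[[:lower:]]", x)   # elements containing a lowercase letter
digit <- grep("[[:digit:]]", x)   # elements containing a digit
punct <- grep("[[:punct:]]", x)   # elements containing punctuation

# A negated class anchored with ^ and $: elements with no lowercase letter at all
no_lower <- grep("^[^[:lower:]]+$", x)

print(lower)
print(digit)
print(punct)
print(no_lower)
```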
Regular expressions: grep(pattern, x)
# A period (.) matches any single character
> grep("t.e", c("Harry Potter", "Game of Thrones", "Lord of the rings"))
[1] 1 3    # t.e matches "tte" in "Potter" and "the"
> grep("L..d", c("Harry Potter", "Game of Thrones", "Lord of the rings"))
[1] 3
> name <- c("a.txt", "pqr", "p.txt")
> grep(".txt", name)    # here . acts as a metacharacter: any character followed by txt
[1] 1 3
> grep(".", c("abc", "de", "f.e"))
[1] 1 2 3    # because . means any character
> grep("\\.", c("abc", "de", "f.g"))
[1] 3
# To match a literal period, escape it with a backslash. Inside an R string the
# backslash itself must be escaped with another backslash, hence "\\.".
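The difference between the metacharacter . and a literal period can be sketched as below, with three common ways to match the literal: a doubled backslash, a one-character class, and fixed = TRUE. The file names, including "atxt", are invented for this illustration.

```r
# Illustrative file names; "atxt" is added to show what an unescaped dot lets through.
files <- c("a.txt", "atxt", "pqr", "p.txt")

any_char <- grep(".txt", files)                # . matches any character, so "atxt" matches too
escaped  <- grep("\\.txt", files)              # escaped dot: literal "."
bracket  <- grep("[.]txt", files)              # inside [ ] the dot is literal
fixed    <- grep(".txt", files, fixed = TRUE)  # treat the whole pattern as plain text

print(any_char)
print(escaped)
```

Anchoring the pattern with $, as in "\\.txt$", would restrict the match to names that end in .txt.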
Regular expressions: grepl(pattern, x)
grepl(pattern, x)
- Similar to grep, but returns a logical vector with one value per element
> grepl("[au]", c("Harry Potter", "Game of Thrones", "Lord of Rings"))
[1]  TRUE  TRUE FALSE
> grepl("[b]", c("Harry Potter", "Game of Thrones", "Lord of Rings"))
[1] FALSE FALSE FALSE
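Because grepl returns one logical value per element, it fits naturally into subsetting and counting. A minimal sketch, with variable names chosen for illustration:

```r
# Illustrative vector; names are assumptions, not from the slides.
titles <- c("Harry Potter", "Game of Thrones", "Lord of Rings")

has_au <- grepl("[au]", titles)   # TRUE TRUE FALSE

matching     <- titles[has_au]    # keep the elements that match
not_matching <- titles[!has_au]   # keep the elements that do not
n_matching   <- sum(has_au)       # TRUE counts as 1, so this counts matches

print(matching)
print(n_matching)
```

Unlike grep, the result always has the same length as the input, which makes it safe to use as a filter column in a data frame.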
Regular expressions: regexpr(pattern,x)
regexpr(pattern, x)
- Finds the character position of the first match of pattern within each element of x; -1 means no match.
> regexpr("#", c("Harry#Potter", "#Game of thrones", "Lord of the rings"))
[1]  6  1 -1
> regexpr("(Harry+)", c("Harry Potter Harry", "Game of thrones"))
[1]  1 -1    # only the first instance of Harry is reported
# position of the first literal "." in each string
> regexpr("\\.", c("abc", "de", "f.g"))
[1] -1 -1  2
# position of the first punctuation character
> regexpr("[[:punct:]]", c("harry;;Potter$", ">=<", "1234", "lof"))
[1]  6  1 -1 -1
# The printed result also carries attributes such as match.length, omitted here.
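The value returned by regexpr holds more than positions. As a sketch, the block below reads the match.length attribute and uses regmatches to extract the matched text; the variable names are illustrative.

```r
x <- c("Harry#Potter", "#Game of thrones", "Lord of the rings")

m   <- regexpr("#", x)
pos <- as.integer(m)              # start of the first match, -1 if none
len <- attr(m, "match.length")    # length of each match, -1 if none

# regmatches() pulls out the matched text; with regexpr() it returns
# one string per element that matched.
first_word <- regmatches(x, regexpr("[[:alpha:]]+", x))

print(pos)
print(first_word)
```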
Regular expressions: gregexpr(pattern,x)
- Finds the character positions of all matches of pattern within each element; the result is a list with one entry per element
> gregexpr("#", c("#Hary#Potter", "GameofThones", "Lordofthe#Rings"))
[[1]] 1 6
[[2]] -1
[[3]] 10
> gregexpr("Harry+", c("Harry Potter Harry ", "GameofThones", "Lordofthe#Rings"))
[[1]] 1 14
[[2]] -1
[[3]] -1
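Since gregexpr returns a list, a common next step is to flatten or count its entries. The sketch below collects the positions and counts the matches per element using regmatches and lengths; the variable names are illustrative.

```r
x <- c("#Hary#Potter", "GameofThones", "Lordofthe#Rings")

m <- gregexpr("#", x)

# One integer vector of start positions per element (-1 where nothing matched)
positions <- lapply(m, as.integer)

# regmatches() returns the matched text per element; an element with no
# match gives character(0), so lengths() counts the matches.
counts <- lengths(regmatches(x, m))

print(positions)
print(counts)
```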
Regular expressions: sub
sub replaces a pattern match with another string, but only the first match
in each string element.
> sub(regular expression, replacement text, x)
> sub("(th+)", "e", c("the mountain the", " the hill hill", "the city without
pollution is the peaceful is the peaceful city", "the the"), perl = TRUE)
[1] "ee mountain the" " ee hill hill" "ee city without pollution is the
peaceful is the peaceful city" "ee the"
The first match of th in each element is replaced by e.
> sub("(th+)", "\\1e", c("the mountain the", " the hill hill", "the city
without pollution is the peaceful is the peaceful city", "the the"),
perl = TRUE)
# \\1 refers back to the text matched by the first parenthesised group,
# so "th" is replaced by "the"
[1] "thee mountain the" " thee hill hill" "thee city without pollution is the
peaceful is the peaceful city" "thee the"    # only the first instance
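The role of the backreference \\1 in the replacement text can be seen more easily on a shorter vector. This is a sketch with a made-up two-element input.

```r
# Shortened, illustrative input.
x <- c("the mountain the", "the the")

# Plain replacement: the first "th" becomes "e"
plain <- sub("(th+)", "e", x, perl = TRUE)

# \\1 re-inserts whatever the first group matched, then appends "e"
backref <- sub("(th+)", "\\1e", x, perl = TRUE)

# The same group can be wrapped instead of extended
wrapped <- sub("(th+)", "<\\1>", x, perl = TRUE)

print(plain)
print(backref)
print(wrapped)
```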
Regular expressions: gsub
gsub also replaces pattern matches with another string, but unlike sub
it replaces all the matches in each string element.
> gsub("(Th+)", "e", c("The mountain The", " The hill hill", "The city without
pollution is the peaceful city", "the the"), perl = TRUE)
[1] "ee mountain ee" " ee hill hill" "ee city without pollution is the
peaceful city" "the the"
# Matching is case-sensitive, so the lowercase "the" occurrences are left unchanged.
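A side-by-side sketch of sub and gsub on the same input makes the first-match versus all-matches difference explicit. The inputs and variable names are illustrative; ignore.case is a standard argument of both functions.

```r
# Illustrative input.
x <- c("the mountain the", "the the")

first_only <- sub("th", "TH", x)    # only the first match per element
all_hits   <- gsub("th", "TH", x)   # every match per element

# ignore.case = TRUE makes "Th" and "th" both match
any_case <- gsub("th", "e", "The the", ignore.case = TRUE)

# Backreferences work in gsub as well: swap two words
swapped <- gsub("(\\w+) (\\w+)", "\\2 \\1", "Harry Potter")

print(first_only)
print(all_hits)
```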
Regular expressions: EXAMPLE
> reviews <- read.csv("…", stringsAsFactors = FALSE)
> reviews <- data.frame(reviews = reviews$review_title)
> names(reviews)
> dim(reviews)
# check which review titles contain "star"
# to understand the rating
> p <- reviews[grep(" *star", reviews$reviews), "reviews"]
# replace "star" with the word "rating"
> sub("(star)", "rating", p, perl = TRUE)
# position of "star"
> regexpr("star", p)
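The same steps can be tried without the CSV file. The sketch below substitutes a small, entirely made-up data frame for the file read above; the review titles are hypothetical and only serve to make the example runnable.

```r
# Hypothetical stand-in for the CSV data; these titles are invented.
reviews <- data.frame(
  reviews = c("5 star product", "Not worth it", "Gave it 3 stars, works fine"),
  stringsAsFactors = FALSE
)

# Keep the titles that mention "star"
p <- reviews[grep(" *star", reviews$reviews), "reviews"]

# Replace the first "star" in each title with "rating"
renamed <- sub("(star)", "rating", p, perl = TRUE)

# Character position of the first "star" in each title
pos <- as.integer(regexpr("star", p))

print(p)
print(renamed)
print(pos)
```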

Text Mining using Regular Expressions
