Detailed pattern search with regular expressions in R using grep, grepl, regexpr and gregexpr, plus replacement with sub and gsub, and much more.
Thanks for your time. If you enjoyed this short slide, there are many more topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
2. Regular expressions
A regular expression is an effective tool for finding and replacing text.
Regular expression functions in R:
grep, grepl, regexpr, gregexpr, sub, gsub
- grep, grepl, regexpr and gregexpr search for matches to the argument
pattern within each element of a character vector.
- sub replaces only the first match in each element; gsub replaces all matches.
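As a rough side-by-side sketch of the six functions listed above, the snippet below runs each of them on one small vector so the different return types are visible. The vector name `titles` and the patterns are illustrative assumptions, not part of the original slides.

```r
# Illustrative input (assumed for this sketch)
titles <- c("Harry Potter", "Game of Thrones", "Lord of Rings")

idx   <- grep("of", titles)      # integer indices of the matching elements
hit   <- grepl("of", titles)     # one TRUE/FALSE per element
first <- regexpr("o", titles)    # position of the first "o" in each element
all_o <- gregexpr("o", titles)   # list with all "o" positions per element
one   <- sub("o", "0", titles)   # replaces only the first "o" per element
every <- gsub("o", "0", titles)  # replaces every "o"

print(idx)
print(hit)
print(as.integer(first))
print(lapply(all_o, as.integer))
print(one)
print(every)
```

The search functions differ mainly in what they return (indices, logicals, positions), while sub and gsub return the modified strings.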
Rupak Roy
3. Regular expressions: grep(pattern, x)
grep(pattern, x)
- Searches for a specified pattern in a character vector x
- Returns the positions (indices) of the elements of x that contain a match.
> grep("[au]", c("Harry Potter", "Game of Thrones", "Lord of Rings"))
[1] 1 2
A character class such as [au] is a list of characters enclosed between [ and ].
It matches any single character in that list, so this pattern looks for a or u.
The result 1 2 gives the positions of the matching elements in x:
1, 2 = "Harry Potter", "Game of Thrones"
> grep("[Harry potter]", c("Harry Potter", "Game of Thrones", "Lord of Rings"))
[1] 1 2 3
# [Harry potter] is also a character class: any one of H, a, r, y, space, p, o, t, e
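grep also accepts a few optional arguments that are not shown on the slide. The sketch below, using the same titles, illustrates `value`, `ignore.case` and `invert`; the variable names are assumptions made for this example.

```r
titles <- c("Harry Potter", "Game of Thrones", "Lord of Rings")

# value = TRUE returns the matching strings instead of their indices
matched <- grep("[au]", titles, value = TRUE)

# ignore.case = TRUE makes the match case-insensitive
lord_idx <- grep("lord", titles, ignore.case = TRUE)

# invert = TRUE returns the elements that do NOT match
unmatched <- grep("[au]", titles, invert = TRUE, value = TRUE)

print(matched)
print(lord_idx)
print(unmatched)
```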
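4. Regular expressions: grep(pattern, x)
> grep("[^Harry potter]", c("Harry Potter", "Game of Thrones", "Lord of Rings"))
# ^ as the first symbol inside [ ]: it matches any character not in the list,
# basically a NOT condition
[1] 1 2 3
# matching is case-sensitive: the uppercase P in "Harry Potter" is not in the
# list, so element 1 matches as well
> grep("[letters]", c("Harry Potter", "1234", "Lord of Rings"))
[1] 1 3
# [letters] is the character class l, e, t, r, s, not R's built-in letters vector
> grep("[[:lower:]]", c("harry potter", "1234", "LORD of RINGS"))
[1] 1 3
# POSIX classes need double brackets; the "of" in element 3 is lowercase
> grep("[[:punct:]]", c("harry;; potter$", "abc123", "Lordof"))
[1] 1
# with single brackets, "[:punct:]" is read as the characters : p u n c t
# and returns 1 2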
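The difference between single and double brackets around a POSIX class is easy to miss, so here is a small sketch that contrasts them. It assumes base R's default regex engine (perl = FALSE), where a single-bracket form is treated as an ordinary character list; the variable names are illustrative.

```r
x <- c("harry;; potter$", "abc123", "Lordof")

# Single brackets: an ordinary character class containing : p u n c t
literal <- grep("[:punct:]", x)

# Double brackets: the POSIX punctuation class
posix <- grep("[[:punct:]]", x)

# Other POSIX classes follow the same double-bracket form
digits <- grep("[[:digit:]]", x)
upper  <- grep("[[:upper:]]", x)

print(literal)
print(posix)
print(digits)
print(upper)
```

With perl = TRUE the single-bracket form is rejected instead, which is one reason to prefer the double-bracket spelling everywhere.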
5. Regular expressions: grep(pattern, x)
# a period represents any single character
> grep("t.e", c("Harry Potter", "Game of Thrones", "Lord of the rings"))
[1] 1 3
# t_e is found in "Potter" (tte) and in "the"
> grep("L..d", c("Harry Potter", "Game of Thrones", "Lord of the rings"))
[1] 3
> name <- c("a.txt", "pqr", "p.txt")   # here . acts as a metacharacter
> grep(".txt", name)                   # . means any character
[1] 1 3
> grep(".", c("abc", "de", "f.e"))
[1] 1 2 3
# all elements match because . means any character
> grep("\\.", c("abc", "de", "f.g"))
[1] 3
# to match a literal period, escape it with a backslash; inside an R string
# the backslash itself must be escaped with its own backslash, hence "\\."
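There are several ways to match a literal period, and the sketch below compares them on a vector of file names. The extra element "atxt" and the variable names are assumptions added to make the difference visible.

```r
files <- c("a.txt", "pqr", "p.txt", "atxt")

any_char <- grep(".txt", files)                # . matches any character, so "atxt" matches too
escaped  <- grep("\\.txt", files)              # escaped period: literal "."
bracket  <- grep("[.]txt", files)              # inside [ ] the period is literal
literal  <- grep(".txt", files, fixed = TRUE)  # fixed = TRUE disables regex syntax entirely
anchored <- grep("\\.txt$", files)             # $ anchors the match to the end of the string

print(any_char)
print(escaped)
print(bracket)
print(literal)
print(anchored)
```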
6. Regular expressions: grepl(pattern, x)
grepl(pattern, x)
- Similar to grep, but it returns a logical value for each element
> grepl("[au]", c("Harry Potter", "Game of Thrones", "Lord of Rings"))
[1]  TRUE  TRUE FALSE
> grepl("[b]", c("Harry Potter", "Game of Thrones", "Lord of Rings"))
[1] FALSE FALSE FALSE
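Because grepl returns one logical value per element, it is convenient for subsetting and counting. A minimal sketch, with illustrative variable names:

```r
titles <- c("Harry Potter", "Game of Thrones", "Lord of Rings")

has_of <- grepl("of", titles)   # FALSE TRUE TRUE

with_of    <- titles[has_of]    # keep the elements that match
without_of <- titles[!has_of]   # keep the elements that do not match
n_matches  <- sum(has_of)       # TRUE counts as 1, so this counts the matches

print(with_of)
print(without_of)
print(n_matches)
```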
7. Regular expressions: regexpr(pattern, x)
regexpr(pattern, x)
- Finds the character position of the first instance of pattern within each
element of x; -1 means no match.
> regexpr("#", c("Harry#Potter", "#Game of thrones", "Lord of the rings"))
[1]  6  1 -1
# the match.length and other attributes that R also prints are omitted here
> regexpr("(Harry+)", c("Harry Potter Harry", "Game of thrones"))
[1]  1 -1
# only the 1st instance of Harry is reported
# position of the first instance of "." in the strings
> regexpr("\\.", c("abc", "de", "f.g"))
[1] -1 -1  2
# position of the first instance of punctuation
> regexpr("[[:punct:]]", c("harry;;Potter$", ">=<", "1234", "lof"))
[1]  6  1 -1 -1
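The value returned by regexpr also carries a match.length attribute, and it can be passed to regmatches to pull out the matched text. The sketch below assumes a pattern of "#" followed by letters, chosen only for illustration.

```r
x <- c("Harry#Potter", "#Game of thrones", "Lord of the rings")

m <- regexpr("#[A-Za-z]+", x)

starts  <- as.integer(m)            # start position of the first match, -1 if none
lengths <- attr(m, "match.length")  # length of each match, -1 if none
found   <- regmatches(x, m)         # matched substrings; non-matching elements are dropped

print(starts)
print(lengths)
print(found)
```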
8. Regular expressions: gregexpr(pattern, x)
- Finds the character positions of all instances of pattern within the text
and returns a list with one entry per string (-1 means no match)
> gregexpr("#", c("#Hary#Potter", "GameofThones", "Lordofthe#Rings"))
[[1]] 1 6
[[2]] -1
[[3]] 10
> gregexpr("Harry+", c("Harry Potter Harry ", "GameofThones", "Lordofthe#Rings"))
[[1]] 1 14
[[2]] -1
[[3]] -1
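Since gregexpr returns a list, it is usually processed with lapply or vapply. The following sketch counts the matches per string and extracts every capitalised word with regmatches; the second pattern and the variable names are assumptions for illustration.

```r
x <- c("#Hary#Potter", "GameofThones", "Lordofthe#Rings")

m <- gregexpr("#", x)

# Start positions per string, with the extra attributes stripped
starts <- lapply(m, as.integer)

# Number of matches per string (entries of -1 mean "no match")
counts <- vapply(m, function(p) sum(p > 0), integer(1))

# All runs of an uppercase letter followed by lowercase letters
words <- regmatches(x, gregexpr("[A-Z][a-z]+", x))

print(starts)
print(counts)
print(words)
```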
9. Regular expressions: sub
sub replaces a matched string with another string, but it only
replaces the first match in each string element.
sub(regular expression, replacement text, x)
> sub("(th+)", "e", c("the mountain the", " the hill hill", "the city without
pollution is the peaceful is the peaceful city", "the the"), perl = TRUE)
The first "th" in each element is replaced by "e":
[1] "ee mountain the" " ee hill hill" "ee city without pollution is the
peaceful is the peaceful city" "ee the"
> sub("(th+)", "\\1e", c("the mountain the", " the hill hill", "the city
without pollution is the peaceful is the peaceful city", "the the"),
perl = TRUE)
[1] "thee mountain the" " thee hill hill" "thee city without pollution is the
peaceful is the peaceful city" "thee the"
# \\1 refers back to the text matched by the group (th+), so "th" becomes
# "the"; only the first instance in each element is changed
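Backreferences such as \\1 can also reorder captured text, not just repeat it. A small sketch under the assumption of simple two-word inputs; the example strings and names are illustrative.

```r
x <- c("the mountain the", "the the")

plain   <- sub("(th)", "e", x)      # first "th" replaced by "e"
backref <- sub("(th)", "\\1e", x)   # \\1 re-inserts the captured "th", then adds "e"

# Two groups can be swapped by referring to them as \\2 and \\1
swapped <- sub("(\\w+) (\\w+)", "\\2, \\1", "Harry Potter")

print(plain)
print(backref)
print(swapped)
```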
10. Regular expressions: gsub
gsub also replaces a matched string with another string; however, unlike sub,
it replaces all the matches in each string element.
> gsub("(Th+)", "e", c("The mountain The", " The hill hill", "The city without
pollution is the peaceful city", "the the"), perl = TRUE)
[1] "ee mountain ee" " ee hill hill" "ee city without pollution is the
peaceful city" "the the"
# matching is case-sensitive, so the lowercase "the" is left unchanged
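To replace a word regardless of case, gsub can be combined with ignore.case = TRUE, and word boundaries keep the pattern from matching inside longer words such as "without". This is a sketch with an assumed replacement word; \\b is used with perl = TRUE as on the slides.

```r
x <- c("The mountain The",
       "The city without pollution is the peaceful city",
       "the the")

# Case-sensitive: only the capitalised "Th" is replaced
case_sensitive <- gsub("Th", "e", x)

# Case-insensitive whole-word replacement of "the" by "a"
whole_word <- gsub("\\bthe\\b", "a", x, ignore.case = TRUE, perl = TRUE)

print(case_sensitive)
print(whole_word)
```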
11. Regular expressions: EXAMPLE
> reviews <- read.csv("…", stringsAsFactors = FALSE)
> reviews <- data.frame(reviews = reviews$review_title, stringsAsFactors = FALSE)
> names(reviews)
> dim(reviews)
# checking which review titles contain "star"
# trying to understand the rating
> p <- reviews[grep(" *star", reviews$reviews), "reviews"]
# replace the first "star" in each title with the word "rating"
> sub("(star)", "rating", p, perl = TRUE)
# position of "star"
> regexpr("star", p)
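The example above depends on a CSV file that is not included, so here is a self-contained sketch of the same steps. The review titles are made-up sample data, assumed purely for illustration.

```r
# Hypothetical review titles standing in for reviews$review_title
reviews <- data.frame(
  reviews = c("5 star product", "Worst purchase", "Only 2 stars", "Great, four star!"),
  stringsAsFactors = FALSE
)

# Keep only the titles that mention "star"
p <- reviews[grep(" *star", reviews$reviews), "reviews"]

# Replace the first "star" in each title with "rating"
renamed <- sub("star", "rating", p)

# Character position of "star" in each title
pos <- as.integer(regexpr("star", p))

print(p)
print(renamed)
print(pos)
```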