Regular expressions are very commonly used to process and validate text data. Unfortunately, they suffer from various limitations. I will discuss the limitations and illustrate how using grammars instead can be an improvement. I will make the examples oncrete using parsing libraries in a couple of representative languages, although the ideas are language-independent.
I will emphasize ease of use, since one reason for the overuse of regular expressions is that they are so easy to pull out of one's toolbox.
Have you ever wondered how you would implement a parser for simple arithmetic expressions in C++? Faced with this problem, many C++ programmers would generate a parser using Flex and Bison, or perhaps Boost Spirit. In this talk, I introduced some alternative tools, Ragel and Lemon, showing how they can be used to build a calculator that handles simple arithmetic expressions.
The Rust compiler's borrow checker is critical for ensuring safe Rust code. Even more critical, however, is how the borrow checker provides useful, automated guidance on how to write safe code when the check fails. Early in your Rust journey it may feel like you are fighting the borrow checker. Come to this talk to learn how you can transition from fighting the borrow checker to using its guidance to write safer and more powerful code at any experience level. Walk away not only understanding the what and the how of the borrow checker - but why it works the way it does - and why it is so critical to both the technical functionality and philosophy of Rust.
Have you ever wondered how you would implement a parser for simple arithmetic expressions in C++? Faced with this problem, many C++ programmers would generate a parser using Flex and Bison, or perhaps Boost Spirit. In this talk, I introduced some alternative tools, Ragel and Lemon, showing how they can be used to build a calculator that handles simple arithmetic expressions.
The Rust compiler's borrow checker is critical for ensuring safe Rust code. Even more critical, however, is how the borrow checker provides useful, automated guidance on how to write safe code when the check fails. Early in your Rust journey it may feel like you are fighting the borrow checker. Come to this talk to learn how you can transition from fighting the borrow checker to using its guidance to write safer and more powerful code at any experience level. Walk away not only understanding the what and the how of the borrow checker - but why it works the way it does - and why it is so critical to both the technical functionality and philosophy of Rust.
This is the first set of slightly updated slides from a Perl programming course that I held some years ago for the QA team of a big international company.
I want to share it with everyone looking for intransitive Perl-knowledge.
The updates after 1st of June 2014 are made with the kind support of Chain Solutions (http://chainsolutions.net/)
A table of content for all presentations can be found at i-can.eu.
The source code for the examples and the presentations in ODP format are on https://github.com/kberov/PerlProgrammingCourse
Does what it says on the tin. An introduction to Modern Perl programming aimed at programmers who have little or no experience in Perl.
I gave this course at the London Perl Workshop in November 2013.
The Halifax Index tells Halifax, Nova Scotia's story - the strength of the economy, the health of the community, the sustainability of the environment and the progress of the Economic Strategy. It’s a new and smarter way to measure progress that reaches beyond traditional economic indicators like GDP and jobs. It provides deeper insight and the most comprehensive look at what’s happening in Halifax than anything currently available, and could be used as a model for other Canadian cities.
The 1st Halifax Index was presented at the Halifax State of the Economy Conference www.agreaterhalifax.com
This is the first set of slightly updated slides from a Perl programming course that I held some years ago for the QA team of a big international company.
I want to share it with everyone looking for intransitive Perl-knowledge.
The updates after 1st of June 2014 are made with the kind support of Chain Solutions (http://chainsolutions.net/)
A table of content for all presentations can be found at i-can.eu.
The source code for the examples and the presentations in ODP format are on https://github.com/kberov/PerlProgrammingCourse
Does what it says on the tin. An introduction to Modern Perl programming aimed at programmers who have little or no experience in Perl.
I gave this course at the London Perl Workshop in November 2013.
The Halifax Index tells Halifax, Nova Scotia's story - the strength of the economy, the health of the community, the sustainability of the environment and the progress of the Economic Strategy. It’s a new and smarter way to measure progress that reaches beyond traditional economic indicators like GDP and jobs. It provides deeper insight and the most comprehensive look at what’s happening in Halifax than anything currently available, and could be used as a model for other Canadian cities.
The 1st Halifax Index was presented at the Halifax State of the Economy Conference www.agreaterhalifax.com
Stites & Harbison attorneys, Chuck Waters, Matt Mashburn, Ron Stay, Richard Stephens and Allen Bradley, along with attorney William H. Dodson (William H. Dodson, II, LLC) present this seminar.
Topics covered are Title Insurance, Bank Owned Property, the Protecting Tenants at Foreclosure Act of 2009, Issues in Suits on Notes and Confirmation Actions and 1099 Reporting Requirements.
Documento elaborado por John Bertot y Elsa Estevez con motivo del Foro Argentino de Transformación Digital, organizado por CESSI y la United Nations University (UNU_EGOV). Buenos Aires, 7 de marzo de 2016.
Presentation of the Dolnośląska Ruby User Group, which took place on 20.02.2023.
Compared to the previous version, it contains 1 regular expression fix and Ruby 3.2 improvements.
Regex is used for Google Analytics and much more!
We explore Ruby's default regexp class, but go beyond (so it's for advanced developers):
- Explaining how backtracking and deterministic finite automaton work.
- Why get the RE2 library from Google? or regular expressions at all?
- What Ruby regex improvements ship with Ruby 3.2?
/Regex makes me want to (weep|give up|(╯°□°)╯︵ ┻━┻)\.?/ibrettflorio
REGEX! Love it or hate it, sometimes you actually need it. And when that time comes, there's no reason to be afraid or to ask help from that one weirdo on your team who actually loves regular expressions. (I'm that weirdo, fwiw.)
This session is geared towards beginning and intermediate regex users, as well as experienced programmers and developers who just don't really grok regex. We'll cover the following topics using practical examples that you might encounter in your own projects. (ie. No matching against "dog" and "cat".)
* What is regex? How's it work? A brief history.
* Syntax, special characters, character classes
* Grouping, capturing, and common gotchas
* Use cases for matching, validating, and replacing
* More advanced topics like backreferences and lookarounds
Similar to Stop overusing regular expressions! (20)
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
CyanicLab, an offshore custom software development company based in Sweden,India, Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence or AI tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT, and Bard organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
SOCRadar Research Team: Latest Activities of IntelBroker
Stop overusing regular expressions!
1. Stop overusing regular expressions!
Franklin Chen
http://franklinchen.com/
Pittsburgh Tech Fest 2013
June 1, 2013
2. Famous quote
In 1997, Jamie Zawinski famously wrote:
Some people, when confronted with a problem, think,
“I know, I’ll use regular expressions.”
Now they have two problems.
3. Purpose of this talk
Assumption: you already have experience using regexes
Goals:
Change how you think about and use regexes
Introduce you to advantages of using parser combinators
Show a smooth way to transition from regexes to parsers
Discuss practical tradeoffs
Show only tested, running code:
https://github.com/franklinchen/
talk-on-overusing-regular-expressions
Non-goals:
Will not discuss computer science theory
Will not discuss parser generators such as yacc and ANTLR
4. Example code in which languages?
Considerations:
This is a polyglot conference
Time limitations
Decision: focus primarily on two representative languages.
Ruby: dynamic typing
Perl, Python, JavaScript, Clojure, etc.
Scala: static typing:
Java, C++, C#, F#, ML, Haskell, etc.
GitHub repository also has similar-looking code for
Perl
JavaScript
5. An infamous regex for email
The reason for my talk!
A big Perl regex for email address based on RFC 822 grammar:
/(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 000-031]+
)+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:r
rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[]
?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]
t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@
31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[
](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:
(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^
(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:
|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[
?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 0
rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|
t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-
?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*
)*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-03
t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*]
6. Personal story: my email address fiasco
To track and prevent spam: FranklinChen+spam@cmu.edu
Some sites wrongly claimed invalid (because of +)
Other sites did allow registration
I caught spam
Unsubscribing failed!
Some wrong claimed invalid (!?!)
Some silently failed to unsubscribe
Had to set up spam filter
7. Problem: different regexes for email?
Examples: which one is better?
/A[^@]+@[^@]+z/
vs.
/A[a-zA-Z]+@([^@.]+.)+[^@.]+z/
8. Readability: first example
Use x for readability!
/A # match string begin
[^@]+ # local part: 1+ chars of not @
@ # @
[^@]+ # domain: 1+ chars of not @
z/x # match string end
matches
FranklinChen+spam
@
cmu.edu
Advice: please write regexes in this formatted style!
9. Readability: second example
/A # match string begin
[a-zA-Z]+ # local part: 1+ of Roman alphabetic
@ # @
( # 1+ of this group
[^@.]+ # 1+ of not (@ or dot)
. # dot
)+
[^@.]+ # 1+ of not (@ or dot)
z/x # match string end
does not match
FranklinChen+spam
@
cmu.edu
10. Don’t Do It Yourself: find libraries
Infamous regex revisited: was automatically generated into the Perl
module Mail::RFC822::Address based on the RFC 822 spec.
If you use Perl and need to validate email addresses:
# Install the library
$ cpanm Mail::RFC822::Address
Other Perl regexes: Regexp::Common
11. Regex match success vs. failure
Parentheses for captured grouping:
if ($address =~ /A # match string begin
([a-zA-Z]+) # $1: local part: 1+ Roman alphabetic
@ # @
( # $2: entire domain
(?: # 1+ of this non-capturing group
[^@.]+ # 1+ of not (@ or dot)
. # dot
)+
([^@.]+) # $3: 1+ of not (@ or dot)
)
z/x) { # match string end
print "local part: $1; domain: $2; top level: $3n";
}
else {
print "Regex match failed!n";
}
12. Regexes do not report errors usefully
Success:
$ perl/extract_parts.pl ’prez@whitehouse.gov’
local part: prez; domain: whitehouse.gov; top level: gov
But failure:
$ perl/extract_parts.pl ’FranklinChen+spam@cmu.edu’
Regex match failed!
We have a dilemma:
Bug in the regex?
The data really doesn’t match?
13. Better error reporting: how?
Would prefer something at least minimally like:
[1.13] failure: ‘@’ expected but ‘+’ found
FranklinChen+spam@cmu.edu
^
Features of parser libraries:
Extraction of line, column information of error
Automatically generated error explanation
Hooks for customizing error explanation
Hooks for error recovery
14. Moving from regexes to grammar parsers
Only discuss parser combinators, not generators
Use the second email regex as a starting point
Use modularization
15. Modularized regex: Ruby
Ruby allows interpolation of regexes into other regexes:
class EmailValidator::PrettyRegexMatcher
LOCAL_PART = /[a-zA-Z]+/x
DOMAIN_CHAR = /[^@.]/x
SUB_DOMAIN = /#{DOMAIN_CHAR}+/x
AT = /@/x
DOT = /./x
DOMAIN = /( #{SUB_DOMAIN} #{DOT} )+ #{SUB_DOMAIN}/x
EMAIL = /A #{LOCAL_PART} #{AT} #{DOMAIN} z/x
def match(s)
s =~ EMAIL
end
end
16. Modularized regex: Scala
object PrettyRegexMatcher {
val user = """[a-zA-Z]+"""
val domainChar = """[^@.]"""
val domainSegment = raw"""(?x) $domainChar+"""
val at = """@"""
val dot = """."""
val domain = raw"""(?x)
($domainSegment $dot)+ $domainSegment"""
val email = raw"""(?x) A $user $at $domain z"""
val emailRegex = email.r
def matches(s: String): Boolean =
(emailRegex findFirstMatchIn s).nonEmpty
end
Triple quotes are for raw strings (no backslash interpretation)
raw allows raw variable interpolation
.r is a method converting String to Regex
17. Email parser: Ruby
Ruby does not have a standard parser combinator library. One that
is popular is Parslet.
require ’parslet’
class EmailValidator::Parser < Parslet::Parser
rule(:local_part) { match[’a-zA-Z’].repeat(1) }
rule(:domain_char) { match[’^@.’] }
rule(:at) { str(’@’) }
rule(:dot) { str(’.’) }
rule(:sub_domain) { domain_char.repeat(1) }
rule(:domain) { (sub_domain >> dot).repeat(1) >>
sub_domain }
rule(:email) { local_part >> at >> domain }
end
18. Email parser: Scala
Scala comes with standard parser combinator library.
import scala.util.parsing.combinator.RegexParsers
object EmailParsers extends RegexParsers {
override def skipWhitespace = false
def localPart: Parser[String] = """[a-zA-Z]+""".r
def domainChar = """[^@.]""".r
def at = "@"
def dot = "."
def subDomain = rep1(domainChar)
def domain = rep1(subDomain ~ dot) ~ subDomain
def email = localPart ~ at ~ domain
}
Inheriting from RegexParsers allows the implicit conversions from
regexes into parsers.
19. Running and reporting errors: Ruby
parser = EmailValidator::Parser.new
begin
parser.email.parse(address)
puts "Successfully parsed #{address}"
rescue Parslet::ParseFailed => error
puts "Parse failed, and here’s why:"
puts error.cause.ascii_tree
end
$ validate_email ’FranklinChen+spam@cmu.edu’
Parse failed, and here’s why:
Failed to match sequence (LOCAL_PART AT DOMAIN) at
line 1 char 13.
‘- Expected "@", but got "+" at line 1 char 13.
20. Running and reporting errors: Scala
def runParser(address: String) {
import EmailParsers._
// parseAll is method inherited from RegexParsers
parseAll(email, address) match {
case Success(_, _) =>
println(s"Successfully parsed $address")
case failure: NoSuccess =>
println("Parse failed, and here’s why:")
println(failure)
}
}
Parse failed, and here’s why:
[1.13] failure: ‘@’ expected but ‘+’ found
FranklinChen+spam@cmu.edu
^
21. Infamous email regex revisited: conversion!
Mail::RFC822::Address Perl regex: actually manually
back-converted from a parser!
Original module used Perl parser library Parse::RecDescent
Regex author turned the grammar rules into a modularized
regex
Reason: speed
Perl modularized regex source code excerpt:
# ...
my $localpart = "$word(?:.$lwsp*$word)*";
my $sub_domain = "(?:$atom|$domain_literal)";
my $domain = "$sub_domain(?:.$lwsp*$sub_domain)*";
my $addr_spec = "$localpart@$lwsp*$domain";
# ...
22. Why validate email address anyway?
Software development: always look at the bigger picture
This guy advocates simply sending a user an activation email
Engineering tradeoff: the email sending and receiving
programs need to handle the email address anyway
23. Email example wrapup
It is possible to write a regex
But first try to use someone else’s tested regex
If you do write your own regex, modularize it
For error reporting, use a parser: convert from modularized
regex
Do you even need to solve the problem?
24. More complex parsing: toy JSON
{
"distance" : 5.6,
"address" : {
"street" : "0 Nowhere Road",
"neighbors" : ["X", "Y"],
"garage" : null
}
}
Trick question: could you use a regex to parse this?
25. Recursive regular expressions?
2007: Perl 5.10 introduced “recursive regular expressions” that can
parse context-free grammars that have rules embedding a reference
to themselves.
$regex = qr/
( # match an open paren (
( # followed by
[^()]+ # 1+ non-paren characters
| # OR
(??{$regex}) # the regex itself
)* # repeated 0+ times
) # followed by a close paren )
/x;
Not recommended!
But, gateway to context-free grammars
26. Toy JSON parser: Ruby
To save time:
No more Ruby code in this presentation
Continuing with Scala since most concise
Remember: concepts are language-independent!
Full code for a toy JSON parser is available in the Parslet web site
at https://github.com/kschiess/parslet/blob/master/
example/json.rb
27. Toy JSON parser: Scala
object ToyJSONParsers extends JavaTokenParsers {
def value: Parser[Any] = obj | arr |
stringLiteral | floatingPointNumber |
"null" | "true" | "false"
def obj = "{" ~ repsep(member, ",") ~ "}"
def arr = "[" ~ repsep(value, ",") ~ "]"
def member = stringLiteral ~ ":" ~ value
}
Inherit from JavaTokenParsers: reuse stringLiteral and
floatingPointNumber parsers
~ is overloaded operator to mean sequencing of parsers
value parser returns Any (equivalent to Java Object)
because we have not yet refined the parser with our own
domain model
28. Fancier toy JSON parser: domain modeling
Want to actually query the data upon parsing, to use.
{
"distance" : 5.6,
"address" : {
"street" : "0 Nowhere Road",
"neighbors" : ["X", "Y"],
"garage" : null
}
}
We may want to traverse the JSON to address and then to
the second neighbor, to check whether it is "Y"
Pseudocode after storing in domain model:
data.address.neighbors[1] == "Y"
30. Domain modeling: Scala
// sealed means: can’t extend outside the source file
sealed abstract class ToyJSON
case class JObject(map: Map[String, ToyJSON]) extends
ToyJSON
case class JArray(list: List[ToyJSON]) extends ToyJSON
case class JString(string: String) extends ToyJSON
case class JFloat(float: Float) extends ToyJSON
case object JNull extends ToyJSON
case class JBoolean(boolean: Boolean) extends ToyJSON
32. Running the fancy toy JSON parser
val j = """{
"distance" : 5.6,
"address" : {
"street" : "0 Nowhere Road",
"neighbors" : ["X", "Y"],
"garage" : null
}
}"""
parseAll(value, j) match {
case Success(result, _) =>
println(result)
case failure: NoSuccess =>
println("Parse failed, and here’s why:")
println(failure)
}
Output: what we proposed when data modeling!
33. JSON: reminder not to reinvent
Use a standard JSON parsing library!
Scala has one.
All languages have a standard JSON parsing library.
Shop around: alternate libraries have different tradeoffs.
Other standard formats: HTML, XML, CSV, etc.
34. JSON wrapup
Parsing can be simple
Domain modeling is trickier but still fit on one page
Use an existing JSON parsing library already!
35. Final regex example
These slides use the Python library Pygments for syntax
highlighting of all code (highly recommended)
Experienced bugs in the highlighting of regexes in Perl code
Found out why, in source code for class PerlLexer
#TODO: give this to a perl guy who knows how to parse perl
tokens = {
’balanced-regex’: [
(r’/(|[^]|[^/])*/[egimosx]*’, String.Regex, ’#p
(r’!(|[^]|[^!])*![egimosx]*’, String.Regex, ’#p
(r’(|[^])*[egimosx]*’, String.Regex, ’#pop’),
(r’{(|[^]|[^}])*}[egimosx]*’, String.Regex, ’#p
(r’<(|[^]|[^>])*>[egimosx]*’, String.Regex, ’#p
(r’[(|[^]|[^]])*][egimosx]*’, String.Regex,
(r’((|[^]|[^)])*)[egimosx]*’, String.Regex,
(r’@(|[^]|[^@])*@[egimosx]*’, String.Regex, ’#
(r’%(|[^]|[^%])*%[egimosx]*’, String.Regex, ’#
(r’$(|[^]|[^$])*$[egimosx]*’, String.Regex,
36. Conclusion
Regular expressions
No error reporting
Flat data
More general grammars
Composable structure (using combinator parsers)
Hierarchical, nested data
Avoid reinventing
Concepts are language-independent: find and use a parser
library for your language
All materials for this talk available at https://github.com/
franklinchen/talk-on-overusing-regular-expressions.
The hyperlinks on the slide PDFs are clickable.