1
Housekeeping
○ Please ensure your mic is muted during the presentation and demo
○ Please use the chat box to ask any questions you may have
○ We will have a live Q&A at the end of the session and take your
questions during this time
○ This session will be recorded
2
Bend it like Ballerina: Unveiling our Regex support
Sasindu Alahakoon
May 2023
3
In this presentation:
Introduction 05
Basics of Ballerina regular expressions 06
Advanced regular expression uses 12
Using Ballerina’s regular expressions 17
Demonstration 23
Future developments 25
4
Introduction
● How could regular expressions rescue the world of text processing?
○ Data validation
○ Data extraction
○ Content transformation
5
Basics of
Ballerina regular expressions
6
First-class support for regular expressions
● Why do we need Regex as a first-class element?
○ Generate compile-time errors for invalid Regex patterns
○ Language tooling support
■ code completion
■ code action
7
Regex syntax
string:RegExp letter = re `[A-Z]?`;
● Specify the pattern between
two backticks after the re
notation
● Supports inserting values
dynamically into a regular
expression
function getDynPattern(string name) {
    string:RegExp a = re `.*${name}.*`;
}
8
Regex syntax
● Based on the ECMAScript
2022 specification
● Supports a subset of the
ECMAScript specification
9
Regex declaration using template expression
import ballerina/lang.regexp;
function initRegexVars() {
    regexp:RegExp a = re `[Reg]ular[Exp]ression$`;
    string:RegExp b = re `[Reg]ular[Exp]ression$`;
}
10
Regex declaration with fromString()
import ballerina/lang.regexp;
function getPattern() returns error? {
    string:RegExp a = check regexp:fromString("[Reg]ex$");
    regexp:RegExp b = check regexp:fromString("[Reg]ex$");
}
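Unlike a `re` template, a pattern passed to fromString() is only checked at runtime, so an invalid pattern surfaces as an error value rather than a compile-time error. A minimal sketch of handling that error explicitly instead of using `check`; the broken pattern `[Reg` is a made-up example, and the exact error message is not shown because it may vary between versions:

```ballerina
import ballerina/io;
import ballerina/lang.regexp;

public function main() {
    // "[Reg" has an unterminated character class, so it is not a valid pattern.
    string:RegExp|error result = regexp:fromString("[Reg");
    if result is error {
        // The pattern was rejected at runtime; report the reason.
        io:println("Invalid pattern: ", result.message());
    } else {
        io:println("Pattern compiled");
    }
}
```

Prefer the `re` template when the pattern is known up front, and keep fromString() for patterns that arrive as data (configuration, user input).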
11
Advanced regular expression
uses
12
Capturing groups
● Allow you to extract parts of
a Regex match for further
processing
import ballerina/lang.regexp;
string text = "my name is John";
function testCaptureGroups() {
    regexp:RegExp r = re `name is (\w+)`;
    regexp:Groups? grp = r.findGroups(text);
    if grp is regexp:Groups {
        // 0th element contains the entire match
        string matchedStr = grp[0].substring();
        regexp:Span? nameSpan = grp[1];
        if nameSpan is regexp:Span {
            string name = nameSpan.substring();
        }
    }
}
13
Non-capturing groups
string:RegExp letterWord = re `(?i:[A-Z]+)`;
● Used to group patterns
together without creating a
capturing group
string:RegExp title = re `(?:Mr|Mrs|Ms)\.`;
14
Regex flags
import ballerina/lang.regexp;
function testFlags() {
    regexp:RegExp r = re `(?i:hello.*)`;
    regexp:Span? match1 = r.matchAt("Hello");
    regexp:Span? match2 = r.matchAt("HELLO!");
    regexp:Span? match3 = r.matchAt("HelLO!");
    regexp:Span? match4 = r.matchAt("hello!");
}
● Flags can be used to modify
how the non-capturing group
is treated
○ i: enables case-insensitivity
○ m: enables multi-line matching
○ s: enables the dot character to match any character
○ x: allows comments and whitespace in the pattern
15
Unicode property escape
● Allows matching characters
based on their Unicode
properties
function testUnicodeProperties() {
    // `\p` matches the property value.
    string:RegExp upperCaseLetter = re `\p{Lu}`;
    // `\P` matches the negation of `\p`.
    // The `gc=` prefix for general categories is optional.
    string:RegExp nonDigitChar = re `\P{gc=N}`;
    // `sc=` specifies the Script property.
    string:RegExp latin = re `\p{sc=Latin}`;
}
16
Using Ballerina’s regular
expressions
17
String manipulation with Regex functions
● Supports a set of `regexp` functions to search and manipulate string values
○ find(text, startIndex = 0): returns the first match of a regular expression within a string
○ findAll(text, startIndex = 0): returns a list of all the matches of a regular expression within a string
○ findGroups(text, startIndex = 0): returns the captured groups for the first match of a regular expression within a string
○ matchAt(text, startIndex = 0): checks whether there is a match of a regular expression at a specific index in the string
○ replace(text, replacement, startIndex = 0): replaces the first match of a regular expression
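Of the functions listed, findAll() is the only one without an example elsewhere in the deck. A minimal sketch of how it might be used to collect every match; the sample text and the simplified e-mail pattern are illustrative assumptions, not part of the original slides:

```ballerina
import ballerina/io;
import ballerina/lang.regexp;

public function main() {
    // Hypothetical input; the pattern is deliberately simple and is not
    // a complete e-mail validator.
    string text = "Contact alice@example.com or bob@example.org";
    string:RegExp email = re `[a-z]+@[a-z]+\.[a-z]+`;

    // findAll returns one Span per match, in order of appearance.
    regexp:Span[] matches = email.findAll(text);
    foreach regexp:Span m in matches {
        io:println(m.substring(), " at index ", m.startIndex);
    }
}
```

Each Span carries its own startIndex and endIndex, so the matches can be mapped back to positions in the original string.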
18
Regex functions
● Two constructs are used to
provide the output
import ballerina/lang.regexp;
function useSpan() {
    // Named `namePattern` so it does not clash with `name` below.
    regexp:RegExp namePattern = re `John`;
    regexp:Span? result = namePattern.find("John Doe");
    if result is regexp:Span {
        string name = result.substring();
        int startIndex = result.startIndex;
        int endIndex = result.endIndex;
    }
}
○ Span: object type that acts as a container for a substring of another string
○ Groups: tuple type that contains Span objects for the matched substrings of each capturing group in the regular expression
import ballerina/lang.regexp;
function useGroups() {
    regexp:RegExp nm = re `(John)`;
    regexp:Groups? g = nm.findGroups("John Doe");
    if g is regexp:Groups {
        string completeMatch = g[0].substring();
        regexp:Span? s = g[1];
        if s is regexp:Span {
            string c = s.substring();
        }
    }
}
19
Regex functions - find()
import ballerina/io;
import ballerina/lang.regexp;
string text = string `Build microservices with ease using https://ballerina.io/,
explore integrations at https://central.ballerina.io/, and
join our community at https://discord.gg/ballerinalang.`;
public function main() {
    // `\.` matches a literal dot before the top-level domain.
    string:RegExp urlRegex = re `(http|https)://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/?`;
    regexp:Span? url = urlRegex.find(text);
    if url is regexp:Span {
        io:println(`Found a URL: ${url.substring()}`); // https://ballerina.io/
    } else {
        io:println("No URL found");
    }
}
Matched URL: https://ballerina.io/
20
Regex functions - findGroups()
import ballerina/io;
import ballerina/lang.regexp;
string text = string `Build microservices with ease using https://ballerina.io/,
explore integrations at https://central.ballerina.io/, and
join our community at https://discord.gg/ballerinalang.`;
public function main() {
    // `\.` matches the literal dot between the domain and the top-level domain.
    string:RegExp urlRegex = re `(http|https)://([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})/?`;
    regexp:Groups? urlGroup = urlRegex.findGroups(text);
    if urlGroup is regexp:Groups {
        string foundUrl = urlGroup[0].substring(); // https://ballerina.io/
        string scheme = (<regexp:Span> urlGroup[1]).substring(); // https
        string domain = (<regexp:Span> urlGroup[2]).substring(); // ballerina
        string topLevelDomain = (<regexp:Span> urlGroup[3]).substring(); // io
    } else {
        io:println("No match found");
    }
}
Matched groups: https://ballerina.io/, https, ballerina, io
21
Regex functions - replace()
import ballerina/io;
import ballerina/lang.regexp;
string text = string `Build microservices with ease using https://www.abc.com,
explore integrations at https://central.ballerina.io/, and
join our community at https://discord.gg/ballerinalang.`;
public function main() {
    // `\.` matches a literal dot before the top-level domain.
    string:RegExp urlRegex = re `(http|https)://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/?`;
    string newText = urlRegex.replace(text, "https://ballerina.io/");
    io:println(newText);
}
Replaces https://www.abc.com with https://ballerina.io/
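Because replace() only touches the first match, the other two URLs in the text above are left as they are. The `lang.regexp` module also provides replaceAll(), which is not covered on the slides; the sketch below contrasts the two on a made-up input string:

```ballerina
import ballerina/io;
import ballerina/lang.regexp;

public function main() {
    // Hypothetical input chosen so the difference is easy to see.
    string text = "a1b22c333";
    string:RegExp digits = re `[0-9]+`;

    // replace() substitutes only the first run of digits.
    io:println(digits.replace(text, "#"));    // a#b22c333
    // replaceAll() substitutes every run of digits.
    io:println(digits.replaceAll(text, "#")); // a#b#c#
}
```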
22
Demonstration
23
Real-world use case demonstration
24
Future developments
25
Future developments
● Syntax highlighting
○ Will be available from Swan Lake Update 7
○ Mark and color the start and end bounds of the brackets inside the Regex
26
Questions?
Find out more…
● Ballerina documentation
○ Learn guide: Ballerina regular expressions
■ ballerina.io/learn/distinctive-language-features/advanced-general-purpose-language-features/#regular-expressions
○ Ballerina by example
■ ballerina.io/learn/by-example/#regular-expressions
○ Ballerina regular expressions specification
■ ballerina.io/spec/lang/master/#section_10.1
● Join the Ballerina community
28
ballerinalang WSO2 Collective @ballerinalang ballerina-lang
Thank you!
If you have any further questions, please email contact@ballerina.io or raise them in
the Ballerina Discord server.

Ballerina Tech Talk - May 2023
