- ReDoS -
Regular
Expression
Denial of Service
What are we talking about?
The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that
exploits the fact that most Regular Expression implementations may reach extreme
situations that cause them to work very slowly (exponentially related to input size). An
attacker can then cause a program using a Regular Expression to enter these extreme
situations and then hang for a very long time.
- OWASP Definition
Initial
Knowledge
Regex Interpreter in a Nutshell
Regular expression engines are implemented as finite state machines (FSM).
The pattern you supply (called Regular Expression) is compiled into a data structure that
represents this state machine.
When you match a string against this pattern, the regex engine takes each character and
decides the state transition within the FSM. If there are no valid state transitions for an
input character the match fails.
One of the states in the FSM is a terminating/end state. If the regex engine gets there it
reports success.
Regex engines can arrange the state machines in two different ways: DFA and NFA.
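To make the state-machine idea concrete, here is a minimal sketch of a hand-built machine for the pattern ab*c. The state names and the transition table are hypothetical, chosen only for illustration; a real engine compiles the pattern into its own internal structure.

```python
# Hypothetical hand-built state machine for the pattern ab*c.
# Keys are (current state, input character); values are the next state.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "seen_a",
    ("seen_a", "c"): "accept",
}
ACCEPTING = {"accept"}


def fsm_fullmatch(text: str) -> bool:
    """Walk the machine one character at a time over the whole text."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            # No valid transition for this character: the match fails.
            return False
    # Success only if we stopped in the terminating state.
    return state in ACCEPTING


print(fsm_fullmatch("abbbc"))
print(fsm_fullmatch("abx"))
```

Each input character is looked at exactly once, so the work grows linearly with the length of the text.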
Regex Engines - DFA vs NFA
Deterministic Finite Automaton (DFA)
Called deterministic because it can always choose the next
state for a given input character. If it cannot go to a valid
state, the input string does not match the regex pattern.
Non-Deterministic Finite Automaton (NFA)
Called non-deterministic because there are cases where the
regex engine has to guess which state to go to next. If it
guesses wrong, it has to go back to a previous state and try
a different transition.
This process is called backtracking.
The NFA has to try all possible routes through the state
machine until it finds the terminating state, the possible
routes are exhausted, or there are no more input characters.
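The guess-and-go-back behaviour can be sketched with a tiny recursive matcher. This is a toy written for this deck, not the algorithm of any particular engine: it supports only literal characters, '.', and a starred single character, and the stats dictionary is a hypothetical helper that counts how many match attempts were made.

```python
def backtrack_match(pattern: str, text: str, stats: dict) -> bool:
    """Full-match text against pattern; supports literals, '.', and 'c*'."""
    stats["steps"] += 1
    if not pattern:
        return not text
    first_ok = bool(text) and pattern[0] in (".", text[0])
    if len(pattern) >= 2 and pattern[1] == "*":
        # Greedy guess: let the starred token consume one more character.
        if first_ok and backtrack_match(pattern, text[1:], stats):
            return True
        # The guess failed: go back and try moving past the starred token.
        return backtrack_match(pattern[2:], text, stats)
    return first_ok and backtrack_match(pattern[1:], text[1:], stats)


def run(pattern: str, text: str):
    """Return (matched, number of attempts) for one pattern/text pair."""
    stats = {"steps": 0}
    return backtrack_match(pattern, text, stats), stats["steps"]


print(run("a*ab", "aaab"))
print(run("a*a*b", "a" * 10))
print(run("a*a*b", "a" * 20))
```

On the failing inputs the matcher has to exhaust every route before giving up, so the attempt count grows faster than the input length.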
Backtracking Example - the apple
Which one do we choose? and Why NFA?
DFA regex engines don’t need to backtrack, so they are faster.
On the other side, NFA regex engines need to backtrack, and it is possible to structure a
pattern in such a way that the backtracking takes an impractically long time on certain input
sequences. In exchange, backtracking allows for features such as capture groups.
For this reason, what you see in practice is that most modern languages implement their
regex engines as NFAs.
Did you notice the Security Problem?
If you didn’t, please read slides 5 and 7 again, focusing on the parts highlighted in red.
In some cases a poorly structured regular expression can make an NFA engine behave like a
near-infinite loop because of its naïve nature:
“The algorithm tries one by one all the possible paths (if needed) until a match is found (or
all the paths are tried and fail).”
The Backtracking Problem
An attacker can use this knowledge to look for applications whose regular
expressions contain an Evil Regex, and send well-crafted input that will hang
the system.
Catastrophic
Backtracking
&
CPU
Exhaustion
Quantifiers
To specify the number of times a token should be matched by the regex engine, you can
choose one of the following quantifiers:
? - match the token zero times (not at all) or exactly once
* - match the token zero or more times
+ - match the token one or more times
{m,n} - match the token between m and n times (both inclusive), where m and n are
natural numbers and n ≥ m.
By default these quantifiers are greedy: they tell the engine to match as many
instances of the quantified token or subpattern as possible.
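The four quantifiers can be tried out with Python's standard re module. The patterns and sample strings below are illustrative choices, not taken from the slides.

```python
import re


def fully_matches(pattern: str, text: str) -> bool:
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# ?      zero or one 'b'
print(fully_matches(r"ab?c", "ac"), fully_matches(r"ab?c", "abbc"))
# *      zero or more 'b'
print(fully_matches(r"ab*c", "ac"), fully_matches(r"ab*c", "abbbc"))
# +      one or more 'b'
print(fully_matches(r"ab+c", "ac"), fully_matches(r"ab+c", "abc"))
# {m,n}  between m and n 'b', both inclusive
print(fully_matches(r"ab{2,3}c", "abc"), fully_matches(r"ab{2,3}c", "abbbc"))

# Greedy by default: a+ takes every 'a' it can reach.
greedy = re.match(r"a+", "aaaab").group()
print(greedy)
```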
Why are Greedy Quantifiers so Heavy?
A greedy quantifier will try to match as much as it possibly can. Every time the engine
greedily consumes one more input token, it has to remember that it made that choice. It
therefore saves its current state so it can come back to it later in the
backtracking process. When the regular expression engine backtracks, it performs another
match attempt at a different position in the pattern.
Storing this backtracking position doesn’t come for free, and neither does the actual
backtracking process.
Why are Nested Greedy Quantifiers so Evil?
- Part 1
Examples of Evil Patterns:
● (a+)+
● ([a-zA-Z]+)*
● (a|aa)+
● (a|a?)+
● (.*a){x}, for x > 10
Matching against these patterns can take exponential time, O(2^n) in the input length n.
This occurs because each quantifier adds a layer of alternative paths that the NFA has to try before it
can tell with certainty that there is no match (fail situation).
Whenever you see that a quantifier applies to a token that is already quantified, there is potential for the number of
steps to explode.
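One way to see where the 2^n comes from is to count how many ways (a+)+ can carve a run of n 'a' characters into groups. Each way is a route a backtracking engine may have to try before it can report failure. The function name is hypothetical, and real engines may prune some of these routes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n: int) -> int:
    """Number of ways (a+)+ can split n consecutive 'a' characters into groups."""
    if n == 0:
        return 1  # nothing left to consume: one complete split
    # The inner a+ takes k characters (1..n); the outer + repeats on the rest.
    return sum(count_splits(n - k) for k in range(1, n + 1))


for n in (5, 10, 20):
    print(n, count_splits(n))
```

Adding one character to the input doubles the count, which is the exponential growth described above.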
Why are Nested Greedy Quantifiers so Evil?
- Part 2
Example of Evil Regex: /(x+x+)+y/
The above regex turns ugly when the “y” is missing from the subject string.
With an input string of 21 x’s, the debugger bows out at 2.8 million steps, diagnosing a bad case of
catastrophic backtracking, just to find out that there’s no “y”.
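The same effect can be observed with Python's re module, which is a backtracking engine. This is a rough sketch: the input sizes are kept deliberately small so it finishes quickly, and the measured times depend on the machine and the Python version, so no particular numbers are claimed.

```python
import re
import time

EVIL = re.compile(r"(x+x+)+y")


def time_failed_search(n: int):
    """Search n x's (no 'y' present) and return (matched, seconds elapsed)."""
    text = "x" * n
    start = time.perf_counter()
    matched = EVIL.search(text) is not None
    elapsed = time.perf_counter() - start
    return matched, elapsed


# Small sizes on purpose: each extra 'x' makes the failing search much slower.
for n in (10, 14, 18):
    matched, seconds = time_failed_search(n)
    print(f"{n} x's: matched={matched}, {seconds:.4f}s")
```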
The
Cloudflare
Case
A simple WAF Rule deployment
On July 2, 2019, Cloudflare deployed a new rule in the
WAF that caused CPUs to become exhausted.
The update contained a regular expression that
backtracked enormously and exhausted every
CPU core that handles HTTP/HTTPS traffic on
the Cloudflare network worldwide; this brought
down Cloudflare’s core proxying, CDN and WAF
functionality.
[Figure: graph showing CPUs dedicated to serving HTTP/HTTPS traffic spiking to
nearly 100% usage across the servers in Cloudflare’s network.]
The regular expression
that was at the heart of the outage
/(?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*)))/
The rule that caused the outage was targeting Cross-site
scripting (XSS) attacks.
The critical part is .*(?:.*=.*), highlighted in red on the slide. The (?: and matching ) form a non-capturing group
(the engine uses it to match the text, but does not keep it as a captured group in the final result).
For the purposes of the discussion of why this pattern causes CPU exhaustion we can safely
ignore it and treat the pattern as .*.*=.*.
“Any real-world expression that asks the engine to ‘match anything followed by anything’ can
lead to catastrophic backtracking.”
Needed Steps:
The offending regex takes 23 steps to match the string
“x=x”.
With 20 x’s after the =, the engine takes 555
steps to match. That’s not linear.
Worst Case:
If the x= were missing, so the string was just
20 x’s, the engine would take 4,067 steps to
find that the pattern doesn’t match. That is
because of the naive nature of NFA regex
engines (they have to try all the possible paths,
multiplied by the greedy quantifiers).
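The non-linear growth can be reproduced with a small counting sketch for the simplified pattern .*.*=.*. It counts how many ways the two leading .* tokens divide the text before the engine tests for '=', starting from position 0 only. The function name is hypothetical, and the counts are in a different unit from the debugger steps quoted above, so the numbers are not expected to agree with them.

```python
def count_split_attempts(text: str) -> int:
    """Count (first .*, second .*) splits tried before '=' is found, from position 0."""
    n = len(text)
    attempts = 0
    # Greedy: the first .* starts by taking everything, then gives characters back.
    for first in range(n, -1, -1):
        # The second .* takes all that is left, then also gives characters back.
        for second in range(n - first, -1, -1):
            attempts += 1
            pos = first + second
            if pos < n and text[pos] == "=":
                return attempts  # '=' found; the trailing .* always succeeds
    return attempts  # every split tried, no '=' anywhere


for n in (5, 10, 20, 40):
    print(n, count_split_attempts("x" * n))
```

When no '=' is present the count is (n+1)(n+2)/2 for a single starting position, so it grows quadratically; an unanchored search repeats this from every starting position.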
Mitigation
Lazy Quantifiers (quantifier?)
In contrast to the standard greedy quantifier, which eats up as many instances of the
quantified token as possible, a lazy quantifier tells the engine to match as few of the quantified
tokens as needed.
A lazy quantifier gives you the shortest match.
If the quantified token has matched so few characters that the rest of the pattern cannot
match, the engine backtracks to the quantified token and makes it expand its match, one step
at a time. After matching each new character or subexpression, the engine tries once again to
match the rest of the pattern (this behavior is called “helpful”).
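The difference between greedy and lazy matching is easy to see with Python's re module. The sample strings are illustrative; the last two lines apply both forms of the simplified Cloudflare pattern to the same input.

```python
import re

text = "<b>bold</b>"

# Greedy: .+ runs to the end of the text, then gives back until '>' matches.
greedy = re.match(r"<.+>", text).group()
# Lazy: .+? takes one character, then expands only until '>' can match.
lazy = re.match(r"<.+?>", text).group()
print(greedy)
print(lazy)

# The simplified Cloudflare pattern, greedy and lazy, on the same input.
greedy_cf = re.search(r".*.*=.*", "x=xxxx").group()
lazy_cf = re.search(r".*?.*?=.*?", "x=xxxx").group()
print(greedy_cf)
print(lazy_cf)
```

The lazy version stops as soon as the rest of the pattern is satisfied, which is why it returns the shorter match.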
They are expensive too
Using lazy rather than greedy matches helps control the amount of
backtracking that occurs in some cases, such as the Cloudflare regex:
/.*?.*?=.*?/ matches “x=x” in 11 steps instead of 23, and takes the same number of steps to match
“x=xxxxxxxxxxxxxxxxxx”.
From a computing standpoint, this process of matching one item, advancing,
failing, backtracking and expanding is expensive in the other direction in some
situations.
In conclusion, lazy quantifiers don’t fix every backtracking problem, but they
can be a way to reduce it.
Possessive Quantifiers (quantifier+)
In contrast to the standard docile quantifier, which gives up characters if needed in order to allow the rest of
the pattern to match, a possessive quantifier tells the engine that even if what follows in the pattern fails to
match, it will hang on to its characters.
Possessive quantifiers match fragments of string as solid blocks that cannot be backtracked into: it's all or
nothing. This behavior is particularly useful when you know there is no valid reason why the engine should
ever backtrack into a section of matched text, as you can save the engine a lot of needless work.
Useful, but you need to be sure that no backtracking is needed in your search pattern.
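A short sketch of the all-or-nothing behaviour follows. It assumes Python 3.11 or newer, where the standard re module accepts possessive quantifiers; on older versions the possessive pattern is skipped. The patterns x*x and x*+x are illustrative choices.

```python
import re
import sys

# Assumption: possessive quantifiers need Python 3.11+ in the standard re module.
SUPPORTS_POSSESSIVE = sys.version_info >= (3, 11)


def compare(text: str):
    """Return (docile_matches, possessive_matches) for x*x versus x*+x."""
    # Docile: x* gives back one 'x' so the final x can match.
    docile = re.fullmatch(r"x*x", text) is not None
    possessive = None  # None means "not supported on this Python version"
    if SUPPORTS_POSSESSIVE:
        # Possessive: x*+ keeps every 'x', so the final x has nothing to match.
        possessive = re.fullmatch(r"x*+x", text) is not None
    return docile, possessive


print(compare("xxx"))
```

The possessive form fails here precisely because it never backtracks, which is the reason to use it only where backtracking could never help.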
The only real solution:
Fully rewrite the pattern to be more
specific and move away from a regular
expression engine that backtracks
when a partially successful search path
fails.
This means having a DFA in which the
algorithm executes in time linear in the
size of the string being matched
against.
[Figure: “NFA version of the offending Cloudflare regex”]
[Figure: “DFA version of the offending Cloudflare regex converted with
https://cyberzhg.github.io/toolbox/nfa2dfa”]
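As a minimal sketch of the linear-time idea: when used as an unanchored search, the simplified pattern .*.*=.* succeeds exactly when the text contains an '='. A hand-written two-state machine can check that in a single pass. This is an illustration written for this deck, not the automaton shown in the figures and not Cloudflare's actual fix.

```python
import re


def contains_equals_dfa(text: str) -> bool:
    """Two-state machine: state 0 = no '=' seen yet, state 1 = '=' seen (accepting)."""
    state = 0
    for ch in text:
        if state == 0 and ch == "=":
            state = 1
        # In state 1 every character keeps us in state 1.
    return state == 1


# Compare against the backtracking engine on a few short inputs.
for sample in ("x=x", "xxxx", "", "=", "a\n=b"):
    backtracking = re.search(r".*.*=.*", sample) is not None
    print(repr(sample), contains_equals_dfa(sample), backtracking)
```

Every character is examined once and there is nothing to go back to, so the running time is linear in the length of the input.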
End
Sitography:
https://www.rexegg.com/regex-explosive-quantifiers.html - The Explosive Quantifier Trap
http://wstoop.co.za/wregex.php - How Regular Expression Engines works
https://mariusschulz.com/blog/why-using-the-greedy-in-regular-expressions-is-almost-never-what-you-actually-want#bad-performance-and-incorrect-matches - Why Using the .* Greedy Quantifier is Almost never What you actually want
https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS - OWASP ReDoS
https://www.regular-expressions.info/catastrophic.html - Catastrophic Backtracking
https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/ - Details of the 2 July Outage by Cloudflare
