This presentation covers regular expressions (regex) and strategies for keeping them fast. It compares several implementations of a custom word-checking function, only one of which uses a regex, and reports their relative performance. The key points are:
- Implementation #1, which uses a regex, is slower than implementation #2, which iterates over the characters, and implementation #3, which uses string functions such as isalpha().
- When writing a regex, watch out for repetitions, wildcards, and long target strings, since all of them hurt performance.
- Avoiding unnecessary capturing groups, ordering patterns, and minimizing repetitions can improve regex performance.
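The three-way comparison above can be sketched in Python. This is a minimal illustration, not the presenter's code: the function names, the sample strings, and the definition of a "word" (a non-empty string of ASCII letters) are all assumptions made for the example. Timings depend on the interpreter and the input, so no results are claimed here.

```python
import re
import timeit

# Assumption: a "word" is a non-empty string of ASCII letters only.
WORD_RE = re.compile(r"[A-Za-z]+")


def is_word_regex(s: str) -> bool:
    """Implementation #1 (hypothetical): precompiled regex, whole-string match."""
    return WORD_RE.fullmatch(s) is not None


def is_word_loop(s: str) -> bool:
    """Implementation #2 (hypothetical): iterate through the characters."""
    if not s:
        return False
    for ch in s:
        if not ("a" <= ch <= "z" or "A" <= ch <= "Z"):
            return False
    return True


def is_word_builtin(s: str) -> bool:
    """Implementation #3 (hypothetical): built-in string methods."""
    # isalpha() alone also accepts non-ASCII letters, so restrict to ASCII first.
    return s.isascii() and s.isalpha()


def benchmark(samples, number=20000):
    """Time each implementation over the same samples and return seconds per run."""
    results = {}
    for fn in (is_word_regex, is_word_loop, is_word_builtin):
        results[fn.__name__] = timeit.timeit(
            lambda: [fn(s) for s in samples], number=number
        )
    return results


if __name__ == "__main__":
    samples = ["hello", "Plugin", "abc123", "", "two words"]
    for name, seconds in benchmark(samples).items():
        print(f"{name}: {seconds:.3f}s")
```

Checking that all three functions agree on the same inputs before timing them keeps the benchmark honest: a faster function that accepts different strings is not a like-for-like replacement.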
2. About A3Sec
● AlienVault's spin-off
● Professional Services, SIEM deployments
● AlienVault's Authorized Training Center (ATC)
for Spain and LATAM
● Team of more than 25 Security Experts
● Own developments and tool integrations
● Advanced Health Check Monitoring
● Web: www.a3sec.com, Twitter: @a3sec
3. About Me
● David Gil <dgil@a3sec.com>
● Developer, Sysadmin, Project Manager
● Really believes in Open Source model
● Programming since he was 9 years old
● Ossim developer at its early stage
● Agent core engine (full regex) and first plugins
● Python lover :-)
● Debian package maintainer (a long, long time ago)
● Sci-Fi books reader and mountain bike rider
4. Summary
1. What is a regexp?
2. When to use regexp?
3. Regex basics
4. Performance Tests
5. Writing regexp (Performance Strategies)
6. Writing plugins (Performance Strategies)
7. Tools
7. Regular Expressions
What is a regex?
Regular expression:
(bb|[^b]{2})\d\d
Input strings:
bb445, 2ac3357bb, bb3aa2c7,
a2ab64b, abb83fh6l3hi22ui
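To see which of the input strings above match, the pattern can be run through Python's re module. This is a minimal sketch that assumes the pattern is meant to be (bb|[^b]{2})\d\d, i.e. two digits after the group; the helper name first_match is made up for illustration.

```python
import re

# Assumption: the slide's pattern ends in two digit classes (\d\d).
PATTERN = re.compile(r'(bb|[^b]{2})\d\d')

INPUTS = ['bb445', '2ac3357bb', 'bb3aa2c7', 'a2ab64b', 'abb83fh6l3hi22ui']


def first_match(text):
    """Return the first substring matched by PATTERN, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


results = {text: first_match(text) for text in INPUTS}

for text, matched in results.items():
    print(text, '->', matched)
```

Under that assumption, three of the five inputs contain a match (bb44, ac33 and bb83); bb3aa2c7 and a2ab64b do not.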
12. Regular Expressions
To RE or not to RE
● Regular expressions are almost never the
right answer
○ Difficult to debug and maintain
○ Performance reasons, slower for simple matching
○ Learning curve
● Python string functions are small C loops:
super fast!
○ startswith(), endswith(), split(), etc.
● Use standard parsing libraries!
Formats: JSON, HTML, XML, CSV, etc.
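As a rough illustration of the two points above, the sketch below handles a log-like line with plain string methods and leaves structured formats to standard-library parsers. The sample line, field names and records are invented for the example.

```python
import csv
import io
import json

# Hypothetical log line, for illustration only.
line = "TRAFFIC_ALLOW src=10.0.0.1 dst=10.0.0.2"

# Plain string methods: no regex needed for prefix checks and simple splits.
is_traffic = line.startswith("TRAFFIC_")
action, rest = line.split(" ", 1)
fields = dict(pair.split("=", 1) for pair in rest.split())

# Structured formats: use the parser that already exists.
record = json.loads('{"user": "alice", "action": "login"}')
row = next(csv.reader(io.StringIO('alice,"Doe, Jane",42')))

print(is_traffic, action, fields)
print(record["action"], row)
```

The CSV row shows why a parser is the safer choice: the quoted field contains a comma that a naive split(",") would break apart.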
13. Regular Expressions
To RE or not to RE
Example: URL parsing
● regex:
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
● PHP's parse_url() function:
$url = "http://username:password@hostname/path?arg=value#anchor";
print_r(parse_url($url));
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)
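Python has a comparable standard-library parser. The sketch below uses urllib.parse.urlsplit (Python 3) on the same URL; the dictionary keys simply mirror the PHP output above and are not part of the Python API.

```python
from urllib.parse import urlsplit

url = "http://username:password@hostname/path?arg=value#anchor"
parts = urlsplit(url)

# Keys chosen to mirror parse_url(); the attributes are urlsplit's own.
summary = {
    "scheme": parts.scheme,
    "host": parts.hostname,
    "user": parts.username,
    "pass": parts.password,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for key, value in summary.items():
    print("[%s] => %s" % (key, value))
```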
14. Regular Expressions
To RE or not to RE
But, there are a lot of reasons to use regex:
● powerful
● portable
● fast (with performance in mind)
● useful for complex patterns
● save development time
● short code
● fun :-)
● beautiful?
21. Regular Expressions
Greedy & Lazy quantifiers: *?, +?
● Greedy vs non-greedy (lazy)
>>> re.findall('A+', 'AAAA')
['AAAA']
>>> re.findall('A+?', 'AAAA')
['A', 'A', 'A', 'A']
● An overall match takes precedence over an
overall non-match
>>> re.findall('<.*>.*</.*>', '<B>i am bold</B>')
>>> re.findall('<(.*)>.*</(.*)>', '<B>i am bold</B>')
● Minimal matching, non-greedy
>>> re.findall('<(.*)>.*', '<B>i am bold</B>')
>>> re.findall('<(.*?)>.*', '<B>i am bold</B>')
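The slide lists the calls without their results. Running them (here under Python 3) gives the values in the comments below; the variable names are only labels for this sketch.

```python
import re

TEXT = '<B>i am bold</B>'

# Greedy vs lazy on a simple run of characters.
greedy_run = re.findall('A+', 'AAAA')        # ['AAAA']
lazy_run = re.findall('A+?', 'AAAA')         # ['A', 'A', 'A', 'A']

# Greedy quantifiers backtrack until the whole pattern can match.
whole = re.findall('<.*>.*</.*>', TEXT)      # ['<B>i am bold</B>']
tags = re.findall('<(.*)>.*</(.*)>', TEXT)   # [('B', 'B')]

# With nothing after it forcing a backtrack, the greedy group runs
# to the last '>'; the lazy group stops at the first one.
greedy_tag = re.findall('<(.*)>.*', TEXT)    # ['B>i am bold</B']
lazy_tag = re.findall('<(.*?)>.*', TEXT)     # ['B']

for name in ('greedy_run', 'lazy_run', 'whole', 'tags',
             'greedy_tag', 'lazy_tag'):
    print(name, '=', globals()[name])
```

In the tags case the first greedy group gives back characters until the rest of the pattern (including the closing tag) can match, which is why it ends up capturing only B.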
26. Regular Expressions
Performance Test #1
    import re, string, timeit

    def is_a_word(word):
        # string.ascii_* replaces Python 2's string.uppercase/lowercase
        CHARS = string.ascii_uppercase + string.ascii_lowercase
        regexp = r'^[%s]+$' % CHARS
        return "YES" if re.search(regexp, word) else "NOP"

    # s: setup code that defines is_a_word; w: the string under test
    timeit.timeit('is_a_word(%r)' % w, setup=s)

    seconds        result  len  input
    1.49650502205  YES     4    word
    1.65614509583  YES     25   wordlongerthanpreviousone..
    1.92520785332  YES     60   wordlongerthanpreviosoneplusan..
    2.38850092888  YES     120  wordlongerthanpreviosoneplusan..
    1.55924701691  NOP     10   not a word
    1.7087020874   NOP     25   not a word, just a phrase..
    1.92521882057  NOP     50   not a word, just a phrase bigg..
    2.39075493813  NOP     102  not a word, just a phrase bigg..

The longer the target string, the slower the regex
match, whether it succeeds or fails.
29. Regular Expressions
Performance Test #2
    # CHARS as in test #1: the upper- and lowercase ASCII letters
    def is_a_word(word):
        for char in word:
            if char not in CHARS:
                return "NOP"
        return "YES"

    # s: setup code that defines is_a_word; w: the string under test
    timeit.timeit('is_a_word(%r)' % w, setup=s)

    seconds         result  len  input
    0.687522172928  YES     4    word
    1.0725839138    YES     25   wordlongerthanpreviousone..
    2.34717106819   YES     60   wordlongerthanpreviosoneplusan..
    4.31543898582   YES     120  wordlongerthanpreviosoneplusan..
    0.54797577858   NOP     10   not a word
    0.547253847122  NOP     25   not a word, just a phrase..
    0.546499967575  NOP     50   not a word, just a phrase bigg..
    0.553755998611  NOP     102  not a word, just a phrase bigg..

On success, two nested Python loops run over the whole word (slow).
On failure, every input stops at the same point and time (the first space).
32. Regular Expressions
Performance Test #3
    def is_a_word(word):
        return "YES" if word.isalpha() else "NOP"

    # s: setup code that defines is_a_word; w: the string under test
    timeit.timeit('is_a_word(%r)' % w, setup=s)

    seconds         result  len  input
    0.146447896957  YES     4    word
    0.212563037872  YES     25   wordlongerthanpreviousone..
    0.318686008453  YES     60   wordlongerthanpreviosoneplusan..
    0.493942975998  YES     120  wordlongerthanpreviosoneplusan..
    0.14647102356   NOP     10   not a word
    0.146160840988  NOP     25   not a word, just a phrase..
    0.147103071213  NOP     50   not a word, just a phrase bigg..
    0.146239995956  NOP     102  not a word, just a phrase bigg..
Python string functions (fast and small C loops)
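To reproduce the comparison, the three variants can be timed side by side. The script below is a sketch for Python 3, not the author's original harness: the function names, the sample strings and the iteration count are assumptions, and absolute timings will differ from the figures on the slides.

```python
import re
import string
import timeit

CHARS = string.ascii_uppercase + string.ascii_lowercase


def is_a_word_regex(word):
    # Test #1: character-class regex, anchored at both ends.
    return "YES" if re.search(r'^[%s]+$' % CHARS, word) else "NOP"


def is_a_word_loop(word):
    # Test #2: explicit Python loop with a membership test per character.
    for char in word:
        if char not in CHARS:
            return "NOP"
    return "YES"


def is_a_word_isalpha(word):
    # Test #3: built-in string method.
    return "YES" if word.isalpha() else "NOP"


# Hypothetical samples: two words and two phrases of different lengths.
SAMPLES = ["word", "wordlonger" * 6,
           "not a word", "not a word, just a phrase " * 4]
VARIANTS = (is_a_word_regex, is_a_word_loop, is_a_word_isalpha)


def benchmark(number=10000):
    """Time every variant on every sample; return {(name, sample): seconds}."""
    timings = {}
    for func in VARIANTS:
        for sample in SAMPLES:
            timings[(func.__name__, sample)] = timeit.timeit(
                lambda: func(sample), number=number)
    return timings


if __name__ == "__main__":
    for (name, sample), seconds in benchmark().items():
        print("%-18s len=%-4d %.4f s" % (name, len(sample), seconds))
```

The variants are not strictly equivalent: isalpha() also accepts non-ASCII letters, and the loop version returns "YES" for an empty string while the other two return "NOP".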
42. Regular Expressions
Performance Strategies
Writing regex
● Use a non-capturing group when you don't need
to capture and save text to a variable
(?:abc|def|ghi) instead of (abc|def|ghi)
● Put the pattern most likely to match first
(TRAFFIC_ALLOW|TRAFFIC_DROP|TRAFFIC_DENY)
TRAFFIC_(ALLOW|DROP|DENY)   (shared prefix factored out)
● Use anchors (^ and $) to limit the scope
re.findall(r'(ab){2}', 'abcabcabc')
re.findall(r'^(ab){2}','abcabcabc') #failures occur faster
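The three tips can be tried together in a few lines. This is an illustrative sketch: the log lines are invented, and it checks only what each pattern matches, not how fast it runs.

```python
import re

# Hypothetical log lines, for illustration only.
LINES = ["TRAFFIC_DROP src=10.0.0.1",
         "kernel: TRAFFIC_DROP src=10.0.0.1"]

capturing = re.compile(r'(TRAFFIC_ALLOW|TRAFFIC_DROP|TRAFFIC_DENY)')
factored = re.compile(r'TRAFFIC_(?:ALLOW|DROP|DENY)')    # prefix factored, no capture
anchored = re.compile(r'^TRAFFIC_(?:ALLOW|DROP|DENY)')   # only tried at the line start

# A capturing group stores the matched text; a non-capturing group does not.
capturing_groups = capturing.search(LINES[0]).groups()   # ('TRAFFIC_DROP',)
factored_groups = factored.search(LINES[0]).groups()     # ()

# The anchor restricts where a match may start.
unanchored_hits = [bool(factored.search(line)) for line in LINES]  # [True, True]
anchored_hits = [bool(anchored.search(line)) for line in LINES]    # [True, False]

# The slide's own example: neither call finds 'abab', but the anchored
# pattern can give up after trying position 0 only.
slide_unanchored = re.findall(r'(ab){2}', 'abcabcabc')   # []
slide_anchored = re.findall(r'^(ab){2}', 'abcabcabc')    # []

print(capturing_groups, factored_groups)
print(unanchored_hits, anchored_hits)
print(slide_unanchored, slide_anchored)
```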
45. Regular Expressions
Performance Strategies
Writing Agent plugins
● A new process is forked for each loaded
plugin
○ Use the plugins that you really need!
● A plugin is a set of rules (regexp operations)
for matching log lines
○ If a plugin doesn't match a log entry, the entry has been
tested against ALL its rules and failed every one!
○ Reduce the number of rules, use a [translation] table
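The idea of replacing several near-identical rules with one rule plus a translation table can be sketched in plain Python. This is not the Agent's plugin configuration format: the rule, the table contents, the numeric IDs and the classify helper are all hypothetical.

```python
import re

# One rule instead of three (ALLOW / DROP / DENY each with its own regex).
RULE = re.compile(r'^TRAFFIC_(?P<action>ALLOW|DROP|DENY)\s+src=(?P<src>\S+)')

# Hypothetical translation table: captured text -> event identifier.
TRANSLATION = {"ALLOW": 1, "DROP": 2, "DENY": 3}
DEFAULT_ID = 99  # assumed fallback for values missing from the table


def classify(line):
    """Return (event_id, source) for a matching line, else None."""
    match = RULE.search(line)
    if match is None:
        return None
    event_id = TRANSLATION.get(match.group("action"), DEFAULT_ID)
    return event_id, match.group("src")


print(classify("TRAFFIC_DROP src=10.0.0.1"))
print(classify("unrelated log line"))
```

A non-matching line now costs one failed regex instead of three, which is the saving the bullet above is after.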
47. Regular Expressions
Performance Strategies
Writing Agent plugins
● Rules are matched in alphabetical order
○ Order your rules by priority, with the pattern most
likely to match first
● Divide and conquer
○ A plugin is configured to read from a source file, use
dedicated source files per technology
○ Also, use dedicated plugins for each technology