This document discusses challenges and approaches for performing deep packet inspection (DPI) at speeds of 100 gigabits per second and beyond on x86 platforms. It begins by explaining why DPI is needed at such high speeds, for tasks like large-scale intrusion detection. It then examines the performance requirements for scanning payloads at 100Gbps rates. The document reviews different software approaches for payload matching, such as regular expressions, and hardware that can assist, such as Intel's Hyperscan technology. It also provides examples of how Hyperscan can be integrated into real-world intrusion detection and prevention systems.
High-Performance Networking Using eBPF, XDP, and io_uringScyllaDB
In the networking world there are a number of ways to increase performance over naive use of basic Berkeley sockets. These techniques have ranged from polling blocking sockets, non-blocking sockets controlled by Epoll, all the way through completely bypassing the Linux kernel for maximum network performance where you talk directly to the network interface card by using something like DPDK or Netmap. All these tools have their place, and generally occupy a space from convenience to performance. But in recent years, that landscape has changed massively.. The tools available to the average Linux systems developer have improved from the creation of io_uring, to the expansion of bpf from a simple filtering language to a full-on programming environment embedded directly in the kernel. Along with that came something called XDP (express datapath). This was Linux kernel's answer to kernel-bypass networking. AF_XDP is the new socket type created by this feature, and generally works very similarly to something like DPDK. History lessons out of the way, this talk will look into, and discuss the merits of this technology, it's place in the broader ecosystem and how it can be used to attain the highest level of performance possible. This talk will dive into crucial details, such as how AF_XDP works, how it can be integrated into a larger system and finally more advanced topics such as request sharding/load balancing. There will be detailed look at the design of AF_XDP, the eBpf code used, as well as the userspace code required to drive it all. It will also include performance numbers from this setup compared to regular kernel networking. And most importantly how to put all this together to handle as much data as possible on a single modern multi-core system.
Accelerating Envoy and Istio with Cilium and the Linux KernelThomas Graf
This talk will provide an introduction to injection options of Envoy and then deep dive into ongoing Linux kernel work that enables injecting Envoy while introducing as little latency as possible.
The servicemesh and the sidecar proxy model are on a steep trajectory to redefine many networking and security use cases. This talk explains and demos a new socket redirect Linux kernel technology that allows running Envoy with similar performance as if the sidecar was linked to the application using a UNIX domain socket. The talk will also give an outlook on how Envoy can use the recently merged kernel TLS functionality to gain access to the clear text payload transparently for end to end encrypted applications without requiring to decrypt and re-encrypt any data to further reduce the overhead and latency.
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
Talk for Kernel Recipes 2017 by Brendan Gregg. "Linux perf is a crucial performance analysis tool at Netflix, and is used by a self-service GUI for generating CPU flame graphs and other reports. This sounds like an easy task, however, getting perf to work properly in VM guests running Java, Node.js, containers, and other software, has been at times a challenge. This talk summarizes Linux perf, how we use it at Netflix, the various gotchas we have encountered, and a summary of advanced features."
This presentation shortly describes key features of Apache Cassandra. It was held at the Apache Cassandra Meetup in Vienna in January 2014. You can access the meetup here: http://www.meetup.com/Vienna-Cassandra-Users/
High-Performance Networking Using eBPF, XDP, and io_uringScyllaDB
In the networking world there are a number of ways to increase performance over naive use of basic Berkeley sockets. These techniques have ranged from polling blocking sockets, non-blocking sockets controlled by Epoll, all the way through completely bypassing the Linux kernel for maximum network performance where you talk directly to the network interface card by using something like DPDK or Netmap. All these tools have their place, and generally occupy a space from convenience to performance. But in recent years, that landscape has changed massively.. The tools available to the average Linux systems developer have improved from the creation of io_uring, to the expansion of bpf from a simple filtering language to a full-on programming environment embedded directly in the kernel. Along with that came something called XDP (express datapath). This was Linux kernel's answer to kernel-bypass networking. AF_XDP is the new socket type created by this feature, and generally works very similarly to something like DPDK. History lessons out of the way, this talk will look into, and discuss the merits of this technology, it's place in the broader ecosystem and how it can be used to attain the highest level of performance possible. This talk will dive into crucial details, such as how AF_XDP works, how it can be integrated into a larger system and finally more advanced topics such as request sharding/load balancing. There will be detailed look at the design of AF_XDP, the eBpf code used, as well as the userspace code required to drive it all. It will also include performance numbers from this setup compared to regular kernel networking. And most importantly how to put all this together to handle as much data as possible on a single modern multi-core system.
Accelerating Envoy and Istio with Cilium and the Linux KernelThomas Graf
This talk will provide an introduction to injection options of Envoy and then deep dive into ongoing Linux kernel work that enables injecting Envoy while introducing as little latency as possible.
The servicemesh and the sidecar proxy model are on a steep trajectory to redefine many networking and security use cases. This talk explains and demos a new socket redirect Linux kernel technology that allows running Envoy with similar performance as if the sidecar was linked to the application using a UNIX domain socket. The talk will also give an outlook on how Envoy can use the recently merged kernel TLS functionality to gain access to the clear text payload transparently for end to end encrypted applications without requiring to decrypt and re-encrypt any data to further reduce the overhead and latency.
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
Talk for Kernel Recipes 2017 by Brendan Gregg. "Linux perf is a crucial performance analysis tool at Netflix, and is used by a self-service GUI for generating CPU flame graphs and other reports. This sounds like an easy task, however, getting perf to work properly in VM guests running Java, Node.js, containers, and other software, has been at times a challenge. This talk summarizes Linux perf, how we use it at Netflix, the various gotchas we have encountered, and a summary of advanced features."
This presentation shortly describes key features of Apache Cassandra. It was held at the Apache Cassandra Meetup in Vienna in January 2014. You can access the meetup here: http://www.meetup.com/Vienna-Cassandra-Users/
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU based on heap cache but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
Using eBPF for High-Performance Networking in CiliumScyllaDB
The Cilium project is a popular networking solution for Kubernetes, based on eBPF. This talk uses eBPF code and demos to explore the basics of how Cilium makes network connections, and manipulates packets so that they can avoid traversing the kernel's built-in networking stack. You'll see how eBPF enables high-performance networking as well as deep network observability and security.
Vectorized Query Execution in Apache Spark at FacebookDatabricks
A standard query execution system processes one row at a time. Vectorized query execution batches multiples rows together in a columnar format, and each operator uses simple loops to iterate over data within a batch. This feature greatly reduces the CPU usage for reading, writing and query operations like scanning, filtering. In this talk, we will take a deep dive into Facebook's ORC-based vectorized reader and writer implementation, discuss how vectorization affects performance of various data types in Hive/Spark, and quantify the improvements vectorization brings to the Facebook Warehouse.
Speaker: Chen Yang
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...Daniel Hochman
Talk given at RedisConf 17 on June 1, 2017 by Daniel Hochman. A video will be published by the conference organizers.
Abstract:
Built-in GEO commands in Redis provide a solid foundation for location-based applications. The scale of Lyft requires a completely different approach to the problem. Learn how to push beyond your constraints to build a highly available, high throughput, horizontally scalable Redis architecture. The techniques presented in this case study are broadly applicable to scaling any type of application powered by Redis. The talk will cover data modeling, open-source solutions, reliability engineering, and Lyft platform.
The Network File System (NFS) Version 4 is a distributed file system similar to previous versions of NFS in its straightforward design, simplified error recovery, and independence of transport protocols and operating systems for file access in a heterogeneous network.
NFS, was developed by Sun Microsystems to provide distributed transparent file access in a heterogeneous network. It achieves this by being relatively simple in design and not relying too heavily on any particular file system model.
This presentation is based on the paper of “The NFS Version 4 Protocol” written by Brian Pawlowski, Spencer Shepler, Carl Beame, Brent Callaghan, Michael Eisler, David Noveck, David Robinson and Robert Thurlow.
Kafka is becoming an ever more popular choice for users to help enable fast data and Streaming. Kafka provides a wide landscape of configuration to allow you to tweak its performance profile. Understanding the internals of Kafka is critical for picking your ideal configuration. Depending on your use case and data needs, different settings will perform very differently. Lets walk through performance essentials of Kafka. Let's talk about how your Consumer configuration, can speed up or slow down the flow of messages to Brokers. Lets talk about message keys, their implications and their impact on partition performance. Lets talk about how to figure out how many partitions and how many Brokers you should have. Let's discuss consumers and what effects their performance. How do you combine all of these choices and develop the best strategy moving forward? How do you test performance of Kafka? I will attempt a live demo with the help of Zeppelin to show in real time how to tune for performance.
Ariel Waizel discusses the Data Plane Development Kit (DPDK), an API for developing fast packet processing code in user space.
* Who needs this library? Why bypass the kernel?
* How does it work?
* How good is it? What are the benchmarks?
* Pros and cons
Ariel worked on kernel development at the IDF, Ben Gurion University, and several companies. He is interested in networking, security, machine learning, and basically everything except UI development. Currently a Solution Architect at ConteXtream (an HPE company), which specializes in SDN solutions for the telecom industry.
In this session, you'll learn how RBD works, including how it:
Uses RADOS classes to make access easier from user space and within the Linux kernel.
Implements thin provisioning.
Builds on RADOS self-managed snapshots for cloning and differential backups.
Increases performance with caching of various kinds.
Uses watch/notify RADOS primitives to handle online management operations.
Integrates with QEMU, libvirt, and OpenStack.
On-demand recording: nginx.com/resources/webinars/nginx-basics-best-practices
You’ve heard of NGINX and the benefits it can provide to your web application, but maybe you’re not sure how to get started. There are a lot of tutorials online, but they can be outdated and contradict each other, making things more challenging. In this webinar we’ll cover the basics of NGINX to help you effectively begin using it as part of your existing or new web app.
This webinar covers how to:
* Install NGINX and verify it's properly running
* Create NGINX configurations for reverse proxy, load balancer, etc.
* Improve performance using keepalives and other NGINX directives
* Debug and troubleshoot using NGINX logs
Delivering High-Availability Web Services with NGINX Plus on AWSNGINX, Inc.
Over 1/3 of websites running on Amazon Web Services (AWS) are delivered and accelerated using NGINX. In this webinar Nginx and Amazon explain how to get started with NGINX Plus on AWS and how to further increase performance and availability of large, dynamic, cloud-based applications integrating with critical AWS services.
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg. "
At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
DPDK (Data Plane Development Kit) Overview by Rami Rosen
* Background and short history
* Advantages and disadvantages
- Very High speed networking acceleration in L2
- How this acceleration is achieved (hugepages, optimizations)
- rte_kni (and KCP)
- VPP (and FD.io project) , providing routing and switching.
- TLDK (Transport Layer Development Kit, TCP/UDP)
* Anatomy of a simple DPDK application.
* Development and governance model
* Testpmd: DPDK CLI tool
* DDP - Dynamic Device Profiles
Rami Rosen is a Linux Kernel expert, the author of "Linux Kernel Networking", Apress, 2014.
Rami had published two articles about DPDK in the last year:
"Network acceleration with DPDK"
https://lwn.net/Articles/725254/
"Userspace Networking with DPDK"
https://www.linuxjournal.com/content/userspace-networking-dpdk
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU based on heap cache but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
Using eBPF for High-Performance Networking in CiliumScyllaDB
The Cilium project is a popular networking solution for Kubernetes, based on eBPF. This talk uses eBPF code and demos to explore the basics of how Cilium makes network connections, and manipulates packets so that they can avoid traversing the kernel's built-in networking stack. You'll see how eBPF enables high-performance networking as well as deep network observability and security.
Vectorized Query Execution in Apache Spark at FacebookDatabricks
A standard query execution system processes one row at a time. Vectorized query execution batches multiples rows together in a columnar format, and each operator uses simple loops to iterate over data within a batch. This feature greatly reduces the CPU usage for reading, writing and query operations like scanning, filtering. In this talk, we will take a deep dive into Facebook's ORC-based vectorized reader and writer implementation, discuss how vectorization affects performance of various data types in Hive/Spark, and quantify the improvements vectorization brings to the Facebook Warehouse.
Speaker: Chen Yang
Geospatial Indexing at Scale: The 15 Million QPS Redis Architecture Powering ...Daniel Hochman
Talk given at RedisConf 17 on June 1, 2017 by Daniel Hochman. A video will be published by the conference organizers.
Abstract:
Built-in GEO commands in Redis provide a solid foundation for location-based applications. The scale of Lyft requires a completely different approach to the problem. Learn how to push beyond your constraints to build a highly available, high throughput, horizontally scalable Redis architecture. The techniques presented in this case study are broadly applicable to scaling any type of application powered by Redis. The talk will cover data modeling, open-source solutions, reliability engineering, and Lyft platform.
The Network File System (NFS) Version 4 is a distributed file system similar to previous versions of NFS in its straightforward design, simplified error recovery, and independence of transport protocols and operating systems for file access in a heterogeneous network.
NFS, was developed by Sun Microsystems to provide distributed transparent file access in a heterogeneous network. It achieves this by being relatively simple in design and not relying too heavily on any particular file system model.
This presentation is based on the paper of “The NFS Version 4 Protocol” written by Brian Pawlowski, Spencer Shepler, Carl Beame, Brent Callaghan, Michael Eisler, David Noveck, David Robinson and Robert Thurlow.
Kafka is becoming an ever more popular choice for users to help enable fast data and Streaming. Kafka provides a wide landscape of configuration to allow you to tweak its performance profile. Understanding the internals of Kafka is critical for picking your ideal configuration. Depending on your use case and data needs, different settings will perform very differently. Lets walk through performance essentials of Kafka. Let's talk about how your Consumer configuration, can speed up or slow down the flow of messages to Brokers. Lets talk about message keys, their implications and their impact on partition performance. Lets talk about how to figure out how many partitions and how many Brokers you should have. Let's discuss consumers and what effects their performance. How do you combine all of these choices and develop the best strategy moving forward? How do you test performance of Kafka? I will attempt a live demo with the help of Zeppelin to show in real time how to tune for performance.
Ariel Waizel discusses the Data Plane Development Kit (DPDK), an API for developing fast packet processing code in user space.
* Who needs this library? Why bypass the kernel?
* How does it work?
* How good is it? What are the benchmarks?
* Pros and cons
Ariel worked on kernel development at the IDF, Ben Gurion University, and several companies. He is interested in networking, security, machine learning, and basically everything except UI development. Currently a Solution Architect at ConteXtream (an HPE company), which specializes in SDN solutions for the telecom industry.
In this session, you'll learn how RBD works, including how it:
Uses RADOS classes to make access easier from user space and within the Linux kernel.
Implements thin provisioning.
Builds on RADOS self-managed snapshots for cloning and differential backups.
Increases performance with caching of various kinds.
Uses watch/notify RADOS primitives to handle online management operations.
Integrates with QEMU, libvirt, and OpenStack.
On-demand recording: nginx.com/resources/webinars/nginx-basics-best-practices
You’ve heard of NGINX and the benefits it can provide to your web application, but maybe you’re not sure how to get started. There are a lot of tutorials online, but they can be outdated and contradict each other, making things more challenging. In this webinar we’ll cover the basics of NGINX to help you effectively begin using it as part of your existing or new web app.
This webinar covers how to:
* Install NGINX and verify it's properly running
* Create NGINX configurations for reverse proxy, load balancer, etc.
* Improve performance using keepalives and other NGINX directives
* Debug and troubleshoot using NGINX logs
Delivering High-Availability Web Services with NGINX Plus on AWSNGINX, Inc.
Over 1/3 of websites running on Amazon Web Services (AWS) are delivered and accelerated using NGINX. In this webinar Nginx and Amazon explain how to get started with NGINX Plus on AWS and how to further increase performance and availability of large, dynamic, cloud-based applications integrating with critical AWS services.
How Netflix Tunes EC2 Instances for PerformanceBrendan Gregg
CMP325 talk for AWS re:Invent 2017, by Brendan Gregg. "
At Netflix we make the best use of AWS EC2 instance types and features to create a high performance cloud, achieving near bare metal speed for our workloads. This session will summarize the configuration, tuning, and activities for delivering the fastest possible EC2 instances, and will help other EC2 users improve performance, reduce latency outliers, and make better use of EC2 features. We'll show how we choose EC2 instance types, how we choose between EC2 Xen modes: HVM, PV, and PVHVM, and the importance of EC2 features such SR-IOV for bare-metal performance. SR-IOV is used by EC2 enhanced networking, and recently for the new i3 instance type for enhanced disk performance as well. We'll also cover kernel tuning and observability tools, from basic to advanced. Advanced performance analysis includes the use of Java and Node.js flame graphs, and the new EC2 Performance Monitoring Counter (PMC) feature released this year."
DPDK (Data Plane Development Kit) Overview by Rami Rosen
* Background and short history
* Advantages and disadvantages
- Very High speed networking acceleration in L2
- How this acceleration is achieved (hugepages, optimizations)
- rte_kni (and KCP)
- VPP (and FD.io project) , providing routing and switching.
- TLDK (Transport Layer Development Kit, TCP/UDP)
* Anatomy of a simple DPDK application.
* Development and governance model
* Testpmd: DPDK CLI tool
* DDP - Dynamic Device Profiles
Rami Rosen is a Linux Kernel expert, the author of "Linux Kernel Networking", Apress, 2014.
Rami had published two articles about DPDK in the last year:
"Network acceleration with DPDK"
https://lwn.net/Articles/725254/
"Userspace Networking with DPDK"
https://www.linuxjournal.com/content/userspace-networking-dpdk
Prezentacja dotycząca wydajnego przetwarzania ruchu IP na PC wygłoszona podczas IT Conference na WAT (http://itacademicday.azurewebsites.net/), listopad 2015.
(Polish only) Talk regarding effective IP traffic processing on x86 platforms, given at IT Academic Day / Military University of Technology in Warsaw, November 2015.
Baromètre EY du capital risque en France - Bilan annuel 2016EY
Le Baromètre EY du capital risque en France recense les opérations de financement en fonds propres des entreprises en phase de création ou durant les premières années d’existence, en date d’opération du 1er janvier au 31 décembre 2016, publiées avant le 20 janvier 2017.
Cloud-native Data: Every Microservice Needs a Cachecornelia davis
Presented at the Pivotal Toronto Users Group, March 2017
Cloud-native applications form the foundation for modern, cloud-scale digital solutions, and the patterns and practices for cloud-native at the app tier are becoming widely understood – statelessness, service discovery, circuit breakers and more. But little has changed in the data tier. Our modern apps are often connected to monolithic shared databases that have monolithic practices wrapped around them. As a result, the autonomy promised by moving to a microservices application architecture is compromised.
With lessons from the application tier to guide us, the industry is now figuring out what the cloud-native architectural patterns are at the data tier. Join us to explore some of these with Cornelia Davis, a five year Cloud Foundry veteran who is now focused on cloud-native data. As it happens, every microservice needs a cache and this evening will drill deep on that topic. She’ll cover a variety of caching patterns and use cases, and demonstrate how their use helps preserve the autonomy that is driving agile software delivery practices today.
Propositions et actions du medef pour le numériqueAdm Medef
Le MEDEF présente ses propositions et ses actions
pour la digitalisation de l’économie française
Le MEDEF a défini une stratégie en faveur de la transformation numérique de l’économie française autour de 5 axes, déclinés en propositions de réformes et en actions :
- Axe 1 / Filière technologique : faire de la France la « Silicon Valley » de l’Europe autour des technologies et des plateformes de la filière IoT (Internet of Things, c’est-à-dire l’Internet des objets) ;
- Axe 2 / Ecosystème industriel : créer un écosystème attractif et compétitif en France autour du prototypage, de la préindustrialisation et de la fabrication de solutions IoT ;
- Axe 3 / Entreprises : Accompagner 100.000 TPE PME et ETI françaises dans leur transformation vers la « Smart economy » : le Programme METAMORPHOSE (sensibilisation, formation, accompagnement et financement)
- Axe 4 / Attractivité : rendre la France « business friendly » pour attirer les investisseurs et favoriser la croissance de nos start-up et PME en ETI et en grandes entreprises ;
- Axe 5 / Communication : mettre en place une stratégie de communication internationale autour de notre vision et de notre stratégie « smart economy ».
LIBER Webinar: Are the FAIR Data Principles really fair?LIBER Europe
The FAIR Data Principles are a hot topic in research data managment. Their adoption within the H2020 funding programme means researchers now have to pay much more attention to how their share, publish and archive their data.
In this light, how can libraries help their research communities implement the FAIR principles? And write better data management plans?
This questions were addressed in a LIBER webinar containing some guidance and reflections on the principles themselves. Presented by Alastair Dunning, Head Research Data Services at the TU Delft (hosts of the 4TU.Centre for Research Data), it is based on a study of 37 data repositories (from subject specific repositories, to generic data archives, to national infrastructures), seeing how far they comply with each of the individual facets of the Data principles.
Project Management Principles to Improve Work, Life, and your Mental HealthDenise (Dee) Teal
Having moved from Front End development has presented some fantastic learning opportunities for the author on how to bring order from the chaos of beginning a large scale WordPress project.
These principles can be appropriated to small business AND to personal and home based projects. The net result is can be reduced stress and anxiety...
Here are some of those principles.
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Andrés de la Peña
This presentation introduces the open sourced Lucene based implementation of the Cassandra secondary indexes developed by Stratio. It allows users to make complex queries in Cassandra using CQL3, including full text search, top-k queries and free multivariable search. Relevance queries and filters can be combined to make searches such as “give me the 100 tweets that best matches this phrase of those written in a certain date range”.
Cluster-wide relevance search allows retrieving the N more relevant results that meet a given condition. It’s done through a modified version of Cassandra’s storage proxy in which the coordinator node requests the N best results of each node in the cluster in parallel and combines their partial results to get the N best of them.
Stratio’s index is fully compatible with Apache Spark and Apache Hadoop because it supports all the key/token restrictions in the CQL3 statements. Filters are a powerful help when analyzing the data stored in Cassandra with MapReduce frameworks such as Hadoop or, even better, Spark. Filtering the job input avoids full data scanning, dramatically reducing the amount of data to be processed.
Any cell in the tables can be indexed, including primary keys as well as collections. CQL3 wide rows are also supported.
Buyers no longer use voicemails and emails from strangers to learn about products. This information is online, whenever buyers are interested. This SlideShare presentation show sellers how to connect in a meaningful way by starting conversations around the buyer’s plans, goals and challenges.
This presentation is one class in HubSpot Academy's free sales training course. You can enroll here: http://certification.hubspot.com/inbound-sales-certification
Modern Prospecting Techniques for Connecting with Prospects (from Sales Hacke...HubSpot
Sales is a difficult world to be in because buyers aren't putting up with salespeople anymore. Instead of helping and building relationships, sales reps are still focused on closing prospects - even when they aren't ready to buy! So buyers ignore them. Because of that, even great sales reps would be lucky to get on the phone with someone.
While buyers have evolved and become more sophisticated, sales reps and training programs have been slow to adapt to that change.
Learn actionable modern prospecting techniques you can apply immediately from two best selling authors and sales experts: Max Altschuler CEO of Sales Hacker, and Mark Roberge CRO of HubSpot.
Class 1: Email Marketing Certification course: Email Marketing and Your BusinessHubSpot
*From HubSpot Academy*
Over the past few decades, people have radically changed the way they live, work and buy. This class will give you an overview of an adaptive, inbound approach to sending emails that provide value and drive growth for your business. It will also teach you about the four big themes of a modern email marketing program: segmentation, personalization, mobile, and optimization.
Apache Solr on Hadoop is enabling organizations to collect, process and search larger, more varied data. Apache Spark is is making a large impact across the industry, changing the way we think about batch processing and replacing MapReduce in many cases. But how can production users easily migrate ingestion of HDFS data into Solr from MapReduce to Spark? How can they update and delete existing documents in Solr at scale? And how can they easily build flexible data ingestion pipelines? Cloudera Search Software Engineer Wolfgang Hoschek will present an architecture and solution to this problem. How was Apache Solr, Spark, Crunch, and Morphlines integrated to allow for scalable and flexible ingestion of HDFS data into Solr? What are the solved problems and what's still to come? Join us for an exciting discussion on this new technology.
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
A presentation cum workshop on Real time Analytics with Apache Kafka and Apache Spark. Apache Kafka is a distributed publish-subscribe messaging while other side Spark Streaming brings Spark's language-integrated API to stream processing, allows to write streaming applications very quickly and easily. It supports both Java and Scala. In this workshop we are going to explore Apache Kafka, Zookeeper and Spark with a Web click streaming example using Spark Streaming. A clickstream is the recording of the parts of the screen a computer user clicks on while web browsing.
As presented at LinuxCon/CloudOpen 2015 Seattle Washington, August 19th 2015. Sagi Brody & Logan Best
This session will focus on real world deployments of DDoS mitigation strategies in every layer of the network. It will give an overview of methods to prevent these attacks and best practices on how to provide protection in complex cloud platforms. The session will also outline what we have found in our experience managing and running thousands of Linux and Unix managed service platforms and what specifically can be done to offer protection at every layer. The session will offer insight and examples from both a business and technical perspective.
The overall volume of Internet traffic has been growing in a tremendous rate day-by-day which also contains unwanted malicious traffic. It has been a continuous challenge for the network operators to effectively identify the threats from line rate traffic. Hyperscan is a pattern (in terms of regular expression) matching software ideal for applications such as intrusion prevention/detection system, antivirus, unified threat management, deep packet inspection systems, etc.
Hyperscan works in two phases. At first, the customer patterns are parsed and compiled into databases in terms of bytecode. During runtime, these bytecode are used to search for patterns against blocks/streams data. Hyperscan library runs entirely in software and scales with IA processors to provide the maximum throughput of 293 Gbps.
http://bit.ly/1BTaXZP – As organizations look for even faster ways to derive value from big data, they are turning to Apache Spark is an in-memory processing framework that offers lightning-fast big data analytics, providing speed, developer productivity, and real-time processing advantages. The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine-learning and graph analysis. Spark is quickly establishing itself as a leading environment for doing fast, iterative in-memory and streaming analysis. This talk will give an introduction the Spark stack, explain how Spark has lighting fast results, and how it complements Apache Hadoop. By the end of the session, you’ll come away with a deeper understanding of how you can unlock deeper insights from your data, faster, with Spark.
Large-scale projects development (scaling LAMP)Alexey Rybak
This 8-hours tutorial was given at various conferences including Percona conference (London), DevConf (Moscow), Highload++ (Moscow).
ABSTRACT
During this tutorial we will cover various topics related to high scalability for the LAMP stack. This workshop is divided into three sections.
The first section covers basic principles of shared nothing architectures and horizontal scaling for the app//cache/database tiers.
Section two of this tutorial is devoted to MySQL sharding techniques, queues and a few performance-related tips and tricks.
In section three we will cover the practical approach for measuring site performance and quality, porviding a "lean" support philosophy, connecting buesiness and technology metrics.
In addition we will cover a very useful Pinba real-time statistical server, it's features and various use cases. All of the sections will be based on real-world examples built in Badoo, one of the biggest dating sites on the Internet.
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseDatabricks
The ever-growing continuous influx of data causes every component in a system to burst at its seams. GPUs and ASICs are helping on the compute side, whereas in-memory and flash storage devices are utilized to keep up with those local IOPS. All of those can perform extremely well in smaller setups and under contained workloads. However, today's workloads require more and more power that directly translates into higher scale. Training major AI models can no longer fit into humble setups. Streaming ingestion systems are barely keeping up with the load. These are just a few examples of why enterprises require a massive versatile infrastructure, that continuously grows and scales. The problems start when workloads are then scaled out to reveal the hardships of traditional network infrastructures in coping with those bandwidth hungry and latency sensitive applications. In this talk, we are going to dive into how intelligent hardware offloads can mitigate network bottlenecks in Big Data and AI platforms, and compare the offering and performance of what's available in major public clouds, as well as a la carte on-premise solutions.
WarsawITDays_ ApacheNiFi202
Apache NiFi 202: Integration and Best Practices
Timothy Spann, Principal Developer Advocate, Cloudera
APACHE NIFI, BEST PRACTICES, STREAMING, KAFKA, PULSAR, FLINK
MNIEJ
https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html https://github.com/tspannhw/EverythingApacheNiFi https://www.datainmotion.dev/2020/12/basic-understanding-of-cloudera-flow.html https://www.datainmotion.dev/2020/10/top-25-use-cases-of-cloudera-flow.html In this talk, we will walk step by step through Apache NiFi from the first load to the first application. I will include slides, articles, and examples to take away as a Quick Start to utilizing Apache NiFi in your real-time dataflows. I will help you get up and running locally on your laptop or Docker. We will also show how to connect NiFi to Kafka, Pulsar, Flink, MQTT, and databases. I will cover: Advanced Record Processing Provenance Scheduling and Cron Relationships Routing Basic Cluster Architecture Listeners Controller Services Handling Errors Sources and Sinks
Timothy Spann,
Principal Developer Advocate, Cloudera
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...inside-BigData.com
In this deck from the Stanford HPC Conference, DK Panda from Ohio State University presents: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing.
"This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern HPC clusters. An overview of RDMA-based designs for Hadoop (HDFS, MapReduce, RPC and HBase), Spark, Memcached, Swift, and Kafka using native RDMA support for InfiniBand and RoCE will be presented. Enhanced designs for these components to exploit NVM-based in-memory technology and parallel file systems (such as Lustre) will also be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project (http://hibd.cse.ohio-state.edu) will be shown."
Watch the video: https://youtu.be/iLTYkTandEA
Learn more: http://web.cse.ohio-state.edu/~panda.2/
and
http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
Splice Machine is an open-source database that combines the benefits of modern lambda architectures with the full expressiveness of ANSI-SQL. Like lambda architectures, it employs separate compute engines for different workloads - some call this an HTAP database (Hybrid Transactional and Analytical Platform). This talk describes the architecture and implementation of Splice Machine V2.0. The system is powered by a sharded key-value store for fast short reads and writes, and short range scans (Apache HBase) and an in-memory, cluster data flow engine for analytics (Apache Spark). It differs from most other clustered SQL systems such as Impala, SparkSQL, and Hive because it combines analytical processing with a distributed Multi-Value Concurrency Method that provides fine-grained concurrency which is required to power real-time applications. This talk will highlight the Splice Machine storage representation, transaction engine, cost-based optimizer, and present the detailed execution of operational queries on HBase, and the detailed execution of analytical queries on Spark. We will compare and contrast how Splice Machine executes queries with other HTAP systems such as Apache Phoenix and Apache Trafodian. We will end with some roadmap items under development involving new row-based and column-based storage encodings.
Speakers:
Monte Zweben, is a technology industry veteran. Monte’s early career was spent with the NASA Ames Research Center as the Deputy Chief of the Artificial Intelligence Branch, where he won the prestigious Space Act Award for his work on the Space Shuttle program. He then founded and was the Chairman and CEO of Red Pepper Software, a leading supply chain optimization company, which merged in 1996 with PeopleSoft, where he was VP and General Manager, Manufacturing Business Unit. In 1998, he was the founder and CEO of Blue Martini Software – the leader in e-commerce and multi-channel systems for retailers. Blue Martini went public on NASDAQ in one of the most successful IPOs of 2000, and is now part of JDA. Following Blue Martini, he was the chairman of SeeSaw Networks, a digital, place-based media company. Monte is also the co-author of Intelligent Scheduling and has published articles in the Harvard Business Review and various computer science journals and conference proceedings. He currently serves on the Board of Directors of Rocket Fuel Inc. as well as the Dean’s Advisory Board for Carnegie-Mellon’s School of Computer Science.
Similar to Spy hard, challenges of 100G deep packet inspection on x86 platform (20)
Using open source tools for network device dataplane testing.
Our experiences from redGuardian DDoS mitigation scrubber testing.
Presented at PLNOG 20 (2018).
Ochrona przed atakami DDoS na platformie x86. Czy można mieć jednocześnie wyd...Redge Technologies
(Polish only)
Gaming/DDoS mitigation/x86 performance and elasticity.
Talk given at Net::IP meetup in Wrocław, Poland (2017.05): https://www.meetup.com/Wroclaw-Net-IP-Meetup/events/238738376/
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
3. Deep packet inspection (DPI)
no DPI
• packet header lookup
• route based on destination (unless
PBR)
• classify with static rules or state data
• cheap
DPI
• packet header and payload lookup
• may route based on content (e.g.
uplinks for priority and `bulky’ traffic)
• classify with static rules, state data,
multiple patterns and custom logic
• expensive?
3
4. 100+ Gbit DPI – why?
• end customers typically < 10G uplinks
– L7 filtering (WAF, IPS etc.) requested by enterprises
– multiple IDS, IPS, NGFW, UTM and WAFs on the market
– can be handled with open source tools
• 100G+ speeds: ISP/Telco/large DCs
– do not want to interfere with traffic
• unless hit by huge DDoS attack
• or kindly asked by local régime
4
5. Mirai botnet attacks – examples
• attack_tcp_stomp
– establish legal TCP connection, then flood it
– not to confuse with STOMP protocol
• attack_udp_dns
– DNS „water torture”, FQDN with random host
• attack_app_http
– HTTP request flood
• attack_app_cfnull
– HTTP POST junk
5
source: https://github.com/rosgos/Mirai-Source-Code
DPI may help
easy :)
6. Large DDoS attacks in 2016 – examples
1. 150M pps (650Gbps) of TCP SYN packets (mixed size), spoofed IPs
2. 1.75M rps peak of HTTP requests (~121B/r) from ~52k src IPs
3. 220k rps (360Gbps) of large HTTP requests from ~128k src IPs
4. ~1Tbps of recursive „water torture” DNS queries
sources:
• https://blog.cloudflare.com/say-cheese-a-snapshot-of-the-massive-ddos-attacks-coming-from-iot-cameras/
• https://www.incapsula.com/blog/650gbps-ddos-attack-leet-botnet.html
• http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/
6
DPI may help
7. 100Gbit/s sizing
• ~148.8 Mpps in small frames, but no payload to scan
• ~8.127 Mpps in 1514B frames
• ~12.19 GB/s of IP payload
• given 16 core machine, our target is:
– ~0.5M – 2M lookups /s per core
– up to ~762 MB/s per core
– note: not all packets and not entire payloads have to be scanned
7
8. Payload lookup – position
• fixed
– e.g. NTP
• network protocol aware
– e.g. DNS
• application aware
– e.g. HTTP
• anywhere in the packet
– bad idea
$ strings /usr/bin/* | grep -c sex
93
8
9. Protocol design rant
"string: variable-length byte field, encoded in UTF-8, terminated by 0x00”
source: https://developer.valvesoftware.com/wiki/Server_queries
9
10. Software payload lookup – approaches
Method Example
fixed position literal matching (sequence) <you name it>
fixed position literal matching (trie) DPDK ACL
computed position literal matching tc u32
application aware classifier nDPI, netfilter l7-filter
application level gateway (ALG) netfilter nf_conntrack_*
programmable data path netfilter xt_bpf, nftables, XDP+eBPF
embedded scripting language NPFLua, pflua
hybrid with state machines Hyperscan, Tempesta FW
regexp engine Bro, Snort, Suricata
10
13. Finite–state machine
• abstract machine
• has states and transitions
• some states are "accept states"
• input updates machine state
• accepts and rejects input sequence
of symbols
sources:
• https://en.wikipedia.org/wiki/State_diagram
• https://en.wikipedia.org/wiki/Deterministic_finite_automaton
example: accepts binary strings with even number of zeroes
13
14. DFA vs. NFA
• Deterministic finite automaton (DFA)
– each of its transitions is uniquely determined by its source state and input
symbol
– reading an input symbol is required for each state transition.
• Nondeterministic finite automaton (NFA) otherwise
• NFA can be converted to DFA
– DFA is efficient to execute, but may grow
– NFA is easier to construct, but may be slower
tools:
• http://hackingoff.com/compilers/regular-expression-to-nfa-dfa
• http://ivanzuzak.info/noam/webapps/fsm_simulator/
14
15. PCRE vs. DFA and NFA
• PCRE (Perl Compatible Regular Expression) engine is powerful
• typical PCRE engine comes as NFA + backtracking
• DFA matches regular language (pure) thus can be used to match only
some of PCREs
• less features, faster engines!
– Hyperscan, https://01.org/hyperscan
– Perl Incompatible Regular Expressions, https://github.com/yandex/pire
15
16. Features considered harmful
• back-tracking (trial and error)
• back references 1
• lookarounds (lookahead, lookbehind) (?<!a)b
• conditional regexps (?(?=regex)then|else)
16
see also: http://www.regular-expressions.info
17. Case: catastrophic backtracking
• 34 min Stack Overflow outage in 2016
• s+$
• „malformed post contained roughly 20,000 consecutive
characters of whitespace on a comment line”
• O(n2)
• in other cases it may be 2n
sources:
• http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016
• http://www.regular-expressions.info/catastrophic.html
17
>>> sum(range(0,20001))
200010000
18. Sources
1. „Finite State Machine Parsing for Internet Protocols: Faster Than You Think”,
http://www.cs.dartmouth.edu/~pete/pubs/LangSec-2014-fsm-parsers.pdf
2. „100G Intrusion Detection”, http://go.lbl.gov/100g
3. „DotStar: Breaking the Scalability and Performance Barriers in Regular Expression Set Matching”,
http://domino.watson.ibm.com/library/cyberdig.nsf/papers/F38C0227DBF5C7E78525758C005BD05C/$File/rc24645.pdf
4. „Fast Regular Expression Matching Using Dual Glushkov NFA”,
https://www-alg.ist.hokudai.ac.jp/~thomas/TCSTR/tcstr_14_73/tcstr_14_73.pdf
5. PIRE discussion: https://news.ycombinator.com/item?id=10209775
18
20. What is Hyperscan?
• „high-performance multiple regex matching library”
• C (run-time, API) and C++ (compiler), BSD licensed
• runs on Intel CPUs only, uses:
– SIMD (Single Instruction, Multiple Data)
– BMI (Bit Manipulation Instruction Sets)
• „typically used in a DPI library stack”
20
21. Hyperscan history
• developed by Sensory Networks
• 2003-2008 hardware prototypes (GPGPU, FPGA), NodalCore C-series accelerators
• 2009 software-based Hyperscan created (note: hardware approach dead end)
• 2009-2015 evolution (commercial)
• 2015 acquired by Intel, released on BSD license
• 2017 v4.4 release
sources:
• https://01.org/hyperscan
• https://lists.01.org/pipermail/hyperscan/2017-January/000078.html
• "Hyperscan In SURICATA: STATE OF THE UNION"
21
23. How it works – regexp database
# pattern flags min offset max offset min length
0 ^foo
1 bar$
2 w+bazs{2} singlematch
3 d+ leftmost 5
4 loremnipsum dotall 10
n ^(all|your|base) caseless 15
23
database is a group of regexps and their settings, thousands of regexps possible
24. How it works – independent scanning contexts
24
regex
database
compiled
earlierinput core 0
matcher, local data (scratch)
input core n
matcher, local data (scratch)
25. How it works
• may return multiple matches
• by default, returns only end offset
• not greedy
• regexp expression parsed and split into:
– literals (fixed strings)
– DFA engines
– NFA engines
– custom engines (prefix, suffix, infix, outfix)
– not Aho-Corasick
• scanning mode – block, streaming, vectored
25
PCMPEQB (compare packed bytes in
xmm2/m128 and xmm1 for equality)
POPCNT (return the Count of Number
of Bits Set to 1)
26. DPDK ACL vs. Hyperscan regexp
DPDK ACL
• compiled to „ACL”
• fixed position pattern
• looks up all fields in the packet
• looks up multiple packets at once in
one ACL (up to 16 categories)
• predictable speed
• returns one match (highest priority) per
category
regexp as ACL1
• compiled to „DB”
• dynamic position pattern
• skip not relevant fields
• looks up one packet in DB (multiple
regexps at once)
• speed depends on input
• may return multiple matches
26
1 speculation, v4.5 is not released yet
27. Sources (Hyperscan)
1. http://01org.github.io/hyperscan/
2. http://www.slideshare.net/harryvanhaaren/hyperscan-mohammad-abdul-awal
3. „HYPERSCAN PERFORMANCE BENCHMARK ON INTEL XEON PROCESSORS, Delivering 160 Gbps DPI Throughput on the Intel
Xeon Processor E5-2600 Series”,
https://networkbuilders.intel.com/docs/1645-Hyperscan-Performance-Benchmark-on-Intel-Xeon-Processors.pdf
4. „HOW WE MATCH REGULAR EXPRESSIONS”, https://01.org/node/3777
5. „Hyperscan Glossary, a few philosophical points”, https://lists.01.org/pipermail/hyperscan/2016-September/000035.html
6. „Software-based Acceleration of Deep Packet Inspection on Intel Architecture”,
https://openisf.files.wordpress.com/2015/11/oisf-keynote-2015-geoff-langdale.pdf
7. "Hyperscan In SURICATA: STATE OF THE UNION",
http://suricon.net/wp-content/uploads/2016/11/SuriCon2016_GeoffLangdale.pdf
8. „Hyperscan in Rspamd”, http://www.slideshare.net/VsevolodStakhov/rspamdhyperscan
9. https://www.reddit.com/r/cpp/comments/3picdx/hyperscan_highperformance_multiple_regex_matching/
27
29. Basic benchmark
• Xeon E3-1231 v3 @ 3.40GHz, turbo mode disabled, 10G ixgbe port, 1 core
• two cache lines prefetched
• results in Mpps
29
network net.1 acl
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 0
pass
end
regex baz "^foobar"
network net.1 acl
regex drop baz pass udp
pass
end
plnog_udp_acl rx_median 12.912; tx_median 0.000; gen_rx 0.000; gen_tx 14.881
plnog_udp_regexp rx_median 9.832; tx_median 0.000; gen_rx 0.000; gen_tx 14.881
30. Basic benchmark
// ETH() / IP() / UDP() / ('x'*64 + 'foobar')
regex baz "^(.{8}){0,8}foobar"
network net.1 acl
regex drop baz pass udp
pass
end
matching
plnog_udp_acl_many rx_median 5.846; tx_median 0.000; gen_rx 0.000; gen_tx 9.191
plnog_udp_regexp_many rx_median 2.921; tx_median 0.000; gen_rx 0.000; gen_tx 9.191
not matching
plnog_udp_acl_many rx_median 4.518; tx_median 4.518; gen_rx 4.517; gen_tx 9.124
plnog_udp_regexp_many rx_median 5.352; tx_median 5.352; gen_rx 5.353; gen_tx 9.124
30
network net.1 acl
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 0
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 8
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 16
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 24
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 32
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 40
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 48
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 56
drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 64
pass
end
31. Summary
• header and payload are the same
• regexp engines can be fast
• careful benchmarking required
• x86 platform can compete with „hardware appliances”
31