Netty is a Java framework that provides tools for developing high-performance, event-driven network applications. It uses non-blocking I/O and zero-copy techniques to minimize overhead and maximize throughput and scalability. Netty provides buffers, codecs, pipelines and handlers that allow applications to be built as a stack of processing layers. Example applications include a discard server and an HTTP file server that demonstrate Netty's core features and event-driven architecture.
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
Apache Hudi is an open data lake platform, designed around the streaming data model. At its core, Hudi provides transactions, upserts and deletes on data lake storage, while also enabling CDC capabilities. Hudi also provides a coherent set of table services that can clean, compact, cluster and optimize the storage layout for better query performance. Finally, Hudi's data services provide out-of-the-box support for streaming data from event systems into lake storage in near real time.
In this talk, we will walk through an end-to-end use case for change data capture from a relational database, starting with capturing changes using the Pulsar CDC connector and then demonstrating how you can use the Hudi DeltaStreamer tool to apply these changes to a table on the data lake. We will discuss various tips for operationalizing and monitoring such pipelines. We will conclude with some guidance on future integrations between the two projects, including a native Hudi/Pulsar connector and Hudi tiered storage.
A brief introduction to Apache Kafka and its usage as a platform for streaming data. It introduces some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...HostedbyConfluent
Imagine a world where you can access metrics, events, traces, and logs in seconds without changing code. Even more, a world where you can run scripts to debug metrics as code. In this session, you will learn about eBPF, a powerful technology with origins in the Linux kernel that holds the potential to fundamentally change how Networking, Observability, and Security are delivered.
We’ll see eBPF in action applied to the Kafka world: identify Kafka consumers, producers, and brokers, see how they interact with each other and how many resources they consume. We'll even learn how to measure consumer lag without external components. If you want to know what’s next in Kafka observability, this session is for you.
Watch this talk here: https://www.confluent.io/online-talks/from-zero-to-hero-with-kafka-connect-on-demand
Integrating Apache Kafka® with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren't working.
This talk will discuss the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We'll do a live demo of building pipelines with Apache Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we'll go hands-on in methodically diagnosing and resolving common issues encountered with Apache Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Apache Kafka Connect in containers.
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from the perspective of MySQL/PHP users. Given for 2nd-year students of the professional bachelor in ICT at Kaho St. Lieven, Gent.
Meta/Facebook's database serving social workloads runs on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depend a lot on RocksDB. Not just MyRocks: we also have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
Did you like it? Check out our E-book: Apache NiFi - A Complete Guide
https://ebook.getindata.com/apache-nifi-complete-guide
Apache NiFi is one of the most popular services for running ETL pipelines, even though it is not the youngest technology. The talk describes the details of migrating pipelines from an old Hadoop platform to Kubernetes, managing everything as code, monitoring all the corner cases of NiFi and making it a robust solution that is user-friendly even for non-programmers.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
___
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of the best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies, including Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Kafka's basic terminology, its architecture, its protocol and how it works.
Kafka at scale: its caveats, guarantees and the use cases it supports.
How we use it @ZaprMediaLabs.
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
ksqlDB: A Stream-Relational Database Systemconfluent
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017; it is hosted on GitHub and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB's architecture, which is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high availability. Furthermore, we explore ksqlDB's streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job?
This session explores the DOs and DON'Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
No matter if you think about open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, check out this slide deck.
A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
This is the presentation I gave at JavaDay Kiev 2015 on the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level stuff, and can be used as an introduction to Apache Spark.
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
Apache Kafka is a new breed of messaging system built for the "big data" world. Coming out of LinkedIn (and donated to Apache), it is a distributed pub/sub system built in Scala. It has been an Apache TLP now for several months with the first Apache release imminent. Built for speed, scalability, and robustness, Kafka should definitely be one of the data tools you consider when designing distributed data-oriented applications.
The talk will cover a general overview of the project and technology, with some use cases, and a demo.
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking VN
In recent years, with the explosion of startups and of technologies such as machine learning, the amount of data that needs to be collected and processed by our systems has kept growing.
As a result, for large systems, storing and processing data on a single database node is no longer enough; multiple nodes have to be connected together to form a database cluster.
For database clusters in particular, and distributed systems in general, there are many interesting topics to dig into. In this session we will limit ourselves to looking at how three systems, Redis, Elasticsearch and Cassandra, organize their clusters, and at the trade-off each of them makes between consistency and availability.
- Speaker: Lộc Võ - Lead Software Engineer @ Grab
Apache Kafka is an open-source message broker project, written in Scala and developed by the Apache Software Foundation. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Apache Calcite: A Foundational Framework for Optimized Query Processing Over ...Julian Hyde
A talk given at ACM SIGMOD 2018 in support of the paper "Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources" (https://arxiv.org/abs/1802.10233).
Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization.
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
Igor Anishchenko
Odessa Java TechTalks
Lohika - May, 2012
Let's take a step back and compare data serialization formats, of which there are plenty. What are the key differences between Apache Thrift, Google Protocol Buffers and Apache Avro? Which is "the best"? The truth of the matter is, they are all very good and each has its own strong points. Hence, the answer is as much a personal choice as it is an understanding of the historical context of each and a correct identification of your own, individual requirements.
A short presentation about practical aspects of asynchronous data transfer with Netty
HTML version: http://vcherkassky.github.com/reveal.js/netty.html
Additional resources (from last slide):
* https://github.com/netty/netty
* http://seeallhearall.blogspot.co.uk/2012/05/netty-tutorial-part-1-introduction-to.html
Done with reveal.js: https://github.com/hakimel/reveal.js/
Building scalable network applications with Netty (as presented on NLJUG JFal...Jaap ter Woerds
The presentation I gave on creating server applications with Netty, including an example of how it is used to power XMS, the mobile messaging platform of eBuddy.
Example code is on github: https://github.com/jaapterwoerds/jfall-netty4
More information on eBuddy: xms.me and tech.ebuddy.com
This presentation on building servers explains what Netty is and why you would choose it, and shows how, with very little code, you can build an asynchronous app server.
Netty Notes Part 3 - Channel Pipeline and EventLoopsRick Hightower
Learning more about Netty helps me understand Vert.x better. Netty in Action is a great book. The threading model of Netty is very important to understanding event loops and reactive programming.
Netty is an asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers & clients. AND IT'S TRUE!
In this talk given at JBCNConf 2015 in Barcelona, we will see how we have used Netty at Trovit since 2013, what it brought to us and how it opened our minds. We will share tips that helped us learn more about Netty, some performance tricks and all the things that worked for us.
Netty Notes Part 2 - Transports and BuffersRick Hightower
Continues on from Part 1 of Netty Notes which covered an overview of Netty concepts. Dives into transports and buffer usage, and why Netty matters for performance.
Asynchronous, Event-driven Network Application Development with NettyErsin Er
"Asynchronous, Event-driven Network Application Development with Netty" presented at Ankara JUG in 2015, June.
The presentation starts with the motivations for non-blocking I/O and continues with a general overview of NIO and Netty. The actual talk was supplemented with Netty's own examples.
High speed networks and Java (Ryan Sciampacone)Chris Bailey
Networking technology has improved constantly over time, and it is now regularly possible to get bandwidths of 10 Gbps and often considerably more. Is this purely “free speed,” or does it simply create new application bottlenecks and scaling challenges? This session begins by discussing how to enable Java for high-speed communications, such as SDP, and then moves on to sharing some hard-learned real-world experiences showing how improving network speeds often results in unexpected surprises. Come hear about the amazing promise of RDMA and the sometimes sobering reality of high-speed networks. Take away a clear view of the issues, and hear some practical advice on achieving great performance when moving Java applications to high-speed networks.
Seven years ago at LCA, Van Jacobson introduced the concept of net channels, but since then the concept of user-mode networking has not hit the mainstream. There are several different user-mode networking environments: Intel DPDK, BSD netmap, and Solarflare OpenOnload. Each of these provides higher performance than standard Linux kernel networking, but also creates new problems. This talk will explore the issues created by user-space networking, including performance, internal architecture, security and licensing.
Seastar at Linux Foundation Collaboration SummitDon Marti
We have developed a new framework, Seastar, for high-throughput server applications, along with a key-value store capable of millions of transactions per second. Seastar, which runs on OSv and Linux, is completely asynchronous and based on shared-nothing data structures that eliminate costly locking between CPUs. SeaStar is event-driven and supports writing non-blocking, asynchronous server code in a straightforward manner that facilitates debugging and reasoning about performance.
Notes on a High-Performance JSON ProtocolDaniel Austin
This is my presentation from JSConf 2011. I am proposing a new Web protocol to improve performance across the Internet. It's based on a dual-band protocol layered over TCP/IP and UDP and is backward compatible with existing HTTP-based systems.
Internet of Threads (IoTh), di Renzo Davoli (VirtualSquare) Codemotion
What does the Internet connect? While it was initially designed to connect machines, today it connects knowledge. This change of perspective can be re-read from the point of view of the structure of operating systems and networks, moving beyond the logic of a single stack per machine (or per virtual machine or "container"). With IoTh, creating network stacks and assigning IP addresses becomes an ordinary operation (not one reserved for sysadmins), like choosing a printer. This opens up new horizons.
Overview of the HARE project looking at new models for operating systems services and application runtimes at the scale of thousands or millions of nodes.
The Transmission Control Protocol (TCP) is used by the vast majority of applications to transport their data reliably across the Internet and in the cloud. TCP was designed in the 1970s and has slowly evolved since then. Today's networks are multipath: mobile devices have multiple wireless interfaces, datacenters have many redundant paths between servers, and multihoming has become the norm for big server farms. Meanwhile, TCP is essentially a single-path protocol: when a TCP connection is established, the connection is bound to the IP addresses of the two communicating hosts and these cannot change. Multipath TCP (MPTCP) is a major modification to TCP that allows multiple paths to be used simultaneously by a single transport connection. Multipath TCP circumvents the issues mentioned above and several others that affect TCP. The IETF is currently finalising the Multipath TCP RFC and an implementation in the Linux kernel is available today.
This tutorial will present in detail the design of Multipath TCP and the role that it could play in cloud environments. We will start with a presentation of the current Internet landscape and explain how various types of middleboxes have influenced the design of Multipath TCP. Second, we will describe in detail the connection establishment and release procedures as well as the data transfer mechanisms that are specific to Multipath TCP. We will then discuss several use cases for the deployment of Multipath TCP, including improving the performance of datacenters and mobile WiFi offloading on smartphones. All these use cases are key both when accessing cloud-based services and when providing them. We will end the tutorial with some open research issues.
This tutorial was given at the IEEE Cloud'Net 2012 conference in November 2012.
The pptx version containing animations that are not shown here is available from http://www.multipath-tcp.org
Production high-performance networking with Snabb and LuaJIT (Linux.conf.au 2...Igalia
By Andy Wingo.
It used to be that to set up a serious network, you needed to stock racks and racks with specialized proprietary single-purpose boxes. This was because only specialized hardware could handle the hundreds of gigabits per second that might flow through any given box.
Things have changed. With the rise of cheap commodity Xeon-based servers and widespread availability of 10 gigabit network cards, an off-the-shelf server with a few NICs can now handle the workload. The age of open source software-driven routers is fully here -- but it doesn't look like what we thought it would, 10 years ago.
We thought it would be Linux everywhere, but it turns out that Linux's networking stack is just too slow. To get around this problem, modern high-speed software switches bypass the kernel entirely, instead booting network cards and handling traffic entirely from user-space. The up-side of this is that now we have the possibility of using pleasant, hackable, open source, standalone software stacks to deliver network applications that are tailored to specific needs.
This talk presents Snabb, a toolkit for building user-space network functions. Snabb is entirely written in the expressive Lua language, minimizing the amount of code that you have to write to get stuff done. Snabb specifically uses the LuaJIT implementation of Lua, giving us excellent code generation as well as efficient access to low-level binary data and AVX2 assembly generation.
Snabb's goal is to be "rewritable software": software that's so simple that you could explain it to someone and they could write their own. By the end of the presentation, you too should have this feeling.
We will also describe how Snabb is used in practice in major telecoms and ISPs to provide IPv6 transition technologies to entire countries. Using Snabb allowed a small team of open-source hackers to ship a product that competed favorably against offerings from traditional network vendors.
(c) linux.conf.au 2017, CC-BY-SA
Hobart, 16-20 January 2017
https://linux.conf.au
Pysense: wireless sensor computing in Python?Davide Carboni
PySense aims at bringing wireless sensor (and "internet of things") macroprogramming to the audience of Python programmers. WSN macroprogramming is an emerging approach where the network is seen as a whole and the programmer focuses only on the application logic. The PySense runtime environment partitions the code and transmits code snippets to the right nodes, finding a balance between energy consumption and computing performance.
Softwerkskammer Lübeck 08/2018 Event Sourcing and CQRSDaniel Bimschas
Introductory-level talk about event sourcing and command query responsibility segregation (CQRS), held at the "Softwerkskammer Lübeck" Meetup (https://www.meetup.com/Softwerkskammer-Luebeck/events/gjsxslyxlbdb/) in August 2018. Code examples that were shown live can be found at https://github.com/danbim/ledger-example and https://github.com/danbim/pwa-scoring.
Tutorial slides about the wireless sensor network SmartSantander/WISEBED experimental facility. Held at the Senzations Summer School in Palic, Serbia 2013.
Tutorial slides about how to run interactive node-level experiment on the wireless sensor network SmartSantander/WISEBED experimental facility. Held at the Senzations Summer School in Palic, Serbia 2013.
Tutorial slides about how to set up your own wireless sensor network testbed using SmartSantander/WISEBED technologies. Held at the Senzations Summer School in Palic, Serbia 2013.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
4. Traditional Approach
• Copying and context switches between kernel and user space → poor performance!
[Diagram: data flows from the NIC buffer into a kernel-space read buffer and is then copied across the slow kernel/application context boundary into the application's own buffer]
Slide Source: Distributed Systems Course 2011/2012 by Dennis Pfisterer, Institute of Telematics, University of Lübeck, Germany
5. Zero-Copy Approach
• Kernel handles the copy process via Direct Memory Access (DMA)
– No CPU load
– Lower load on the bus system
– No copying between kernel space and user space
[Diagram: with DMA the NIC buffer, the kernel read buffer and the application buffer reference the same data, so no copy crosses the kernel/application boundary]
Slide Source: Distributed Systems Course 2011/2012 by Dennis Pfisterer, Institute of Telematics, University of Lübeck, Germany
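The classic way to get this zero-copy behaviour from plain Java is FileChannel.transferTo(), which asks the kernel to move file data to a socket without surfacing it in user space. A minimal sketch, with an invented file name and port:

import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopyFileSender {
    public static void main(String[] args) throws Exception {
        try (FileChannel file = new FileInputStream("payload.bin").getChannel();
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                // transferTo() lets the kernel move the bytes (typically via DMA),
                // avoiding the read()/write() copy through a user-space buffer.
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}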
6. Simple Benchmark: Copy vs. Zero-Copy
[Chart: transfer duration in ms versus data volume in Mbyte, comparing the copy and zero-copy approaches]
Slide Source: Distributed Systems Course 2011/2012 by Dennis Pfisterer, Institute of Telematics, University of Lübeck, Germany
7. Zero-Copy Between Communication Layers
• Often copying is not necessary
– If data is not modified, a slice can be passed forward without copying to a different buffer
[Diagram: the same Ethernet | IP | TCP | HTTP | XML packet buffer is handed from the link layer up through the internet, transport and application layers; each layer works on a slice of the original buffer instead of its own copy]
Slide Source: Distributed Systems Course 2011/2012 by Dennis Pfisterer, Institute of Telematics, University of Lübeck, Germany
8. Zero-Copy Between Communication Layers
• Sometimes slices of multiple packets can be combined to extract, e.g., a payload that is split over multiple packets
• The newly "created" buffer points to the original buffers → no copying necessary
[Diagram: a virtual buffer exposing HTTP (Part 1) + HTTP (Part 2) as one payload, backed by pointers into the two received TCP buffers]
10. Request Processing in Multi-Thread Servers
[Sequence diagram: a thread calls accept() on the ServerSocket, then repeatedly waits for data on the resulting socket, reads bytes, creates a Decoder, decodes the bytes into a request, creates a Servlet, calls processRequest(), queries the database and writes the response. The thread waits most of the time without doing actual work; the idle phases dominate the timeline.]
11. Request Processing in Multi-Thread Servers
• Usually one thread per request
– Thread idle most of the time (e.g. waiting for I/O)
– Thread even more idle when the network is slow
– Number of simultaneous clients mostly limited by the maximum number of threads
• Thread construction is expensive
– Higher latency when constructing on request
– Can be improved using pools of threads (see Java's ExecutorService & Executors classes)
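To make the cost concrete, here is a minimal sketch of the thread-per-request style described above, using a fixed pool from Executors; the port, pool size and echo behaviour are arbitrary illustration choices:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingEchoServer {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(200); // one worker thread per open connection
        try (ServerSocket server = new ServerSocket(9000)) {
            while (true) {
                Socket socket = server.accept();        // blocks until a client connects
                pool.execute(() -> handle(socket));     // the worker blocks on I/O most of the time
            }
        }
    }

    private static void handle(Socket socket) {
        try (Socket s = socket;
             InputStream in = s.getInputStream();
             OutputStream out = s.getOutputStream()) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {    // blocking read: the thread idles while waiting
                out.write(buffer, 0, read);             // echo the bytes back
            }
        } catch (Exception ignored) {
            // connection dropped; nothing to do in this sketch
        }
    }
}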
12. Request Processing in Event-Driven Servers
[Sequence diagram: a single NioWorker multiplexes two sockets; each dataAvailable() event reads the available bytes and hands them via handleEvent() to an ExecutorThread, where a per-connection Decoder accumulates them. Once a request is fully decoded it is processed and the response is written, with the handling of request 1 and request 2 interleaved on the same few threads.]
Disclaimer: this slide may contain errors and is far from real implementation code, but it should do fine for illustrative purposes.
13. Request Processing in Event-Driven Servers
• Calls to the I/O functions of the OS are non-blocking
• Heavy usage of zero-copy strategies
• Everything is an event
– Data available for reading
– Writing data
– Connection established / shut down
• Different requests share threads
• Work is split into small tasks
– Tasks are solved by worker threads from a pool
– Thread number and control decoupled from individual connections / simultaneous requests
• Number of simultaneous clients can be very high
– Netty: 50,000 on commodity hardware!
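For contrast with the thread-per-request sketch above, this is roughly what the event-driven style looks like with plain java.nio: one thread, one Selector, and reads driven by readiness events. Netty hides this loop behind its worker threads; the port is again an invented example value:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorEchoServer {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                          // one thread waits for events on all channels
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {               // event: connection established
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {          // event: data available for reading
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int read = client.read(buffer);     // non-blocking read
                    if (read == -1) {
                        client.close();                 // event: connection shut down
                    } else {
                        buffer.flip();
                        client.write(buffer);           // echo back (simplified: assumes the write completes)
                    }
                }
            }
        }
    }
}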
15. Introduction to Netty
• "The Netty project is an effort to provide an asynchronous event-driven network application framework for rapid development of maintainable high-performance protocol servers & clients."
Source: http://netty.io
• Good reasons to use Netty:
• Simplifies development
• Amazing performance
• Amazing documentation (tutorials, JavaDocs), clean concepts
• Lots of documenting examples
• Unit testability for protocols
• Heavily used e.g. at Twitter for performance-critical applications
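To give a feel for how little code a Netty server needs, here is a minimal discard-style server sketched against the Netty 3.x API used elsewhere in this deck; the class names, port and thread-pool choices are illustrative, not taken from the slides:

import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.*;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class DiscardServer {
    public static void main(String[] args) {
        // Boss threads accept connections, worker threads handle the I/O events.
        ServerBootstrap bootstrap = new ServerBootstrap(
                new NioServerSocketChannelFactory(
                        Executors.newCachedThreadPool(),
                        Executors.newCachedThreadPool()));

        // Each new connection gets its own pipeline with our handler at the end.
        bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
            public ChannelPipeline getPipeline() {
                return Channels.pipeline(new DiscardHandler());
            }
        });

        bootstrap.bind(new InetSocketAddress(8080));
    }

    // Upstream handler: receives events for data flowing from the socket towards the application.
    static class DiscardHandler extends SimpleChannelUpstreamHandler {
        @Override
        public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
            // Silently drop whatever was received.
        }

        @Override
        public void exceptionCaught(ChannelHandlerContext ctx, ExceptionEvent e) {
            e.getChannel().close();
        }
    }
}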
17. Introduction to Netty - Buffers
• Netty uses a zero-copy strategy for efficiency
• Primitive byte[] arrays are wrapped in a ChannelBuffer
• Simple read/write operations, e.g.:
– writeByte()
– writeLong()
– readByte()
– readLong()
– ...
• Hides complexities such as byte order
• Uses low-overhead index pointers for its realization
18. Introduction to Netty - Buffers
• Combine & slice ChannelBuffers without copying any payload data by using, e.g.,
– ChannelBuffer.slice(int index, int length)
– ChannelBuffers.wrappedBuffer(ChannelBuffer... buffers)
• Pseudo-code example:
requestPart1 = buffer1.slice(OFFSET_PAYLOAD, buffer1.readableBytes() - OFFSET_PAYLOAD);
requestPart2 = buffer2.slice(OFFSET_PAYLOAD, buffer2.readableBytes() - OFFSET_PAYLOAD);
request = ChannelBuffers.wrappedBuffer(requestPart1, requestPart2);
[Diagram: the resulting virtual buffer exposes HTTP (Part 1) + HTTP (Part 2) as one payload while pointing into the two received TCP buffers]
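A self-contained version of that pseudo-code, using the Netty 3.x buffer API; the 4-byte header offset and the example payloads are invented for illustration:

import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.buffer.ChannelBuffers;
import org.jboss.netty.util.CharsetUtil;

public class SliceAndWrapExample {
    private static final int OFFSET_PAYLOAD = 4; // pretend each packet starts with a 4-byte header

    public static void main(String[] args) {
        ChannelBuffer buffer1 = ChannelBuffers.copiedBuffer("HDR1Hello, ", CharsetUtil.UTF_8);
        ChannelBuffer buffer2 = ChannelBuffers.copiedBuffer("HDR2Netty!", CharsetUtil.UTF_8);

        // slice() creates views into the original buffers; no payload bytes are copied.
        ChannelBuffer part1 = buffer1.slice(OFFSET_PAYLOAD, buffer1.readableBytes() - OFFSET_PAYLOAD);
        ChannelBuffer part2 = buffer2.slice(OFFSET_PAYLOAD, buffer2.readableBytes() - OFFSET_PAYLOAD);

        // wrappedBuffer() builds a composite view over both slices, still without copying.
        ChannelBuffer request = ChannelBuffers.wrappedBuffer(part1, part2);
        System.out.println(request.toString(CharsetUtil.UTF_8)); // prints "Hello, Netty!"
    }
}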
20. Introduction to Netty - Codecs
• Many protocol encoders/decoders included
– Base64
– Zlib
– Framing/Deframing
– HTTP
– WebSockets
– Google Protocol Buffers
– Real-Time Streaming Protocol (RTSP)
– Java Object Serialization
– String
– (SSL/TLS)
21. Introduction to Netty - Codecs
• Abstract base classes for easy implementation
– OneToOneEncoder
– OneToOneDecoder
• Converts one object (e.g. a ChannelBuffer) into another (e.g. an HttpServletRequest)
– ReplayingDecoder
• The bytes necessary to decode an object (e.g. an HttpServletRequest) may be split over multiple data events
• A manual implementation forces you to check and accumulate data all the time
• ReplayingDecoder allows you to implement decoding methods just as if all required bytes had already been received
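A minimal ReplayingDecoder sketch against the Netty 3.x API, for a hypothetical length-prefixed frame format (a 4-byte length followed by the payload); the decoder is written as if all bytes were already available, and Netty replays the call when they are not:

import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.handler.codec.replay.ReplayingDecoder;
import org.jboss.netty.handler.codec.replay.VoidEnum;

// Decodes frames of the form: [int length][length bytes of payload].
public class LengthPrefixedFrameDecoder extends ReplayingDecoder<VoidEnum> {
    @Override
    protected Object decode(ChannelHandlerContext ctx, Channel channel,
                            ChannelBuffer buffer, VoidEnum state) {
        // If fewer than 4 bytes have arrived, this call is "replayed" later
        // instead of us having to check readableBytes() ourselves.
        int length = buffer.readInt();
        // Same here: readBytes() only succeeds once the full payload is available.
        return buffer.readBytes(length);
    }
}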
23. Introduction to Netty – Pipelines & Handlers
• Every socket is attached to a ChannelPipeline
• It contains a stack of handlers
– Protocol encoders / decoders
– Security layers (SSL/TLS, authentication)
– Application logic
– ...
24. Introduction to Netty – Pipelines & Handlers
• One ChannelPipeline per connection
• Handlers can handle
– Downstream events
– Upstream events
– Or both
• Everything is an event
– Data available for reading
– Writing data
– Connection established / shut down
– ...
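What such a handler stack can look like when assembled in code, again sketched with Netty 3.x classes; the SSLEngine, the handler names and the application handler are placeholders rather than anything from the slides:

import javax.net.ssl.SSLEngine;
import org.jboss.netty.channel.ChannelHandler;
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.handler.codec.http.HttpRequestDecoder;
import org.jboss.netty.handler.codec.http.HttpResponseEncoder;
import org.jboss.netty.handler.ssl.SslHandler;

public class ServerPipelineExample {
    // Builds the per-connection handler stack: security layer, protocol codec, application logic.
    public static ChannelPipeline buildPipeline(SSLEngine sslEngine, ChannelHandler appHandler) {
        ChannelPipeline pipeline = Channels.pipeline();
        pipeline.addLast("ssl", new SslHandler(sslEngine));        // security layer, closest to the socket
        pipeline.addLast("decoder", new HttpRequestDecoder());     // upstream: bytes -> HTTP request objects
        pipeline.addLast("encoder", new HttpResponseEncoder());    // downstream: HTTP response objects -> bytes
        pipeline.addLast("handler", appHandler);                   // application logic on top
        return pipeline;
    }
}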
25. Netty – ChannelPipeline Example: HTTP(S) Client
• Applications based on Netty are built as a stack
• Application logic sits on top of the channel
• Everything else (decoding, securing, ...) is done inside the pipeline
[Diagram: the client application calls read(httpResponse) and write(httpRequest) on the Channel; inside the ChannelPipeline the messages pass through HttpRequestDecoder / HttpRequestEncoder, then StringDecoder / StringEncoder, then SSLDecoder / SSLEncoder, being converted between HTTP objects, Strings and ChannelBuffers on their way to and from the OS socket object]
Disclaimer: this slide is imprecise, may contain errors and there is no one-to-one implementation. It shows a logical, conceptual view of the Netty pipeline.