This document provides a quick reference to key concepts for working with Apache Kafka, including:
1. Out of the box, Kafka provides at-most-once or at-least-once delivery, and the guide explains how messages can be lost or duplicated; exactly-once semantics were introduced in later versions.
2. Using more partitions can increase unavailability when a broker fails uncleanly, since a leader election must occur for each affected partition, and can increase end-to-end latency because replication is serialized across partitions.
3. The Schema Registry helps enforce schemas when serializing data to Kafka with Avro, avoiding breakage from schema changes.
A Quick Guide to Refresh Kafka Skills
1. Limitations of Kafka
A. Out of the box, Kafka is an at-least-once or at-most-once delivery system, but not exactly-once.
At-most-once scenario: loss of messages.
At-least-once scenario: duplicate messages.
An at-most-once scenario happens when the commit interval has elapsed, which triggers Kafka to automatically commit the last used offset. Suppose the consumer crashes before it finishes processing the messages. When the consumer restarts, it receives messages from the last committed offset, so it can lose the few messages in between.

An at-least-once scenario happens when the consumer processes a message and writes it into its persistent store, then crashes at that point. Suppose Kafka did not get a chance to commit the offset to the broker because the commit interval has not passed. When the consumer restarts, it is delivered a few older messages from the last committed offset.

https://dzone.com/articles/kafka-clients-at-most-once-at-least-once-exactly-o
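To get at-least-once behavior on purpose, the usual pattern is to disable auto-commit and commit offsets only after processing. A minimal sketch of that pattern; the broker address, group id, and topic name are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("group.id", "demo-group");               // placeholder group id
        props.put("enable.auto.commit", "false");          // no automatic offset commits
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // a crash before commitSync() means redelivery, not loss
                }
                consumer.commitSync(); // commit only after processing: at-least-once
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}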
More Partitions May Increase Unavailability

In the common case, when a broker is shut down cleanly, the controller proactively moves the leaders off the shutting-down broker one at a time. Moving a single leader takes only a few milliseconds, so from the client's perspective there is only a small window of unavailability during a clean broker shutdown.

However, when a broker is shut down uncleanly (e.g., kill -9), the observed unavailability can be proportional to the number of partitions. Suppose a broker has a total of 2000 partitions, each with 2 replicas. Roughly, this broker will be the leader for about 1000 partitions. When it fails uncleanly, all 1000 of those partitions become unavailable at the same time. If it takes 5 ms to elect a new leader for a single partition, it will take up to 5 seconds to elect new leaders for all 1000 partitions. So, for some partitions, the observed unavailability can be 5 seconds plus the time taken to detect the failure.
More Partitions May Increase End-to-end Latency

End-to-end latency in Kafka is defined as the time from when a message is published by the producer to when it is read by the consumer. Kafka only exposes a message to a consumer after it has been committed, i.e., replicated to all the in-sync replicas, so the time to commit a message can be a significant portion of the end-to-end latency. By default, a Kafka broker uses only a single thread to replicate data from another broker, for all partitions that share replicas between the two brokers. Experiments show that replicating 1000 partitions from one broker to another can add about 20 ms of latency, which implies the end-to-end latency is at least 20 ms. That can be too high for some real-time applications.

https://de.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
High-Level Streams DSL: using the Streams DSL, a user can express transformations, aggregations, grouping, etc. With each transformation, data has to be serialized and written to a topic; for the next operation in the chain it has to be read back from the topic, meaning all the side operations (partition-key calculation, persisting to disk, etc.) happen for every entity.

Kafka Streams assumes that the Serde class used for serialization and deserialization is the one provided in the config. When the format of the data changes within the operation chain, the user has to provide the appropriate Serde. If the existing Serdes can't handle the format, the user has to create a custom one: extend the Serde class and implement a custom serializer and deserializer. From one class we end up with four, which is not optimal; for each custom data format in the operation chain we create three additional classes. An alternative approach is to use generic JSON or Avro Serdes. One more thing: the user has to specify a Serde for both the key and the value parts of the message.
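One way to cut that boilerplate is Serdes.serdeFrom, which wraps a serializer/deserializer pair into a single Serde. A minimal sketch, assuming a hypothetical Order type with a trivial CSV encoding:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;

// Hypothetical domain type used in the topology.
class Order {
    final String id;
    final double amount;
    Order(String id, double amount) { this.id = id; this.amount = amount; }
    String toCsv() { return id + "," + amount; }
    static Order fromCsv(String line) {
        String[] parts = line.split(",");
        return new Order(parts[0], Double.parseDouble(parts[1]));
    }
}

public class OrderSerde {
    // serdeFrom builds a Serde from a serializer/deserializer lambda pair,
    // instead of three hand-written classes per payload format.
    public static Serde<Order> instance() {
        return Serdes.serdeFrom(
            (topic, order) -> order.toCsv().getBytes(StandardCharsets.UTF_8),
            (topic, bytes) -> Order.fromCsv(new String(bytes, StandardCharsets.UTF_8)));
    }
}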
Restarting the application: after a breakdown, the application has to scan each internal topic up to the last valid offset, and this can take some time, especially if log compaction is not used and/or a retention period is not set up.

https://dzone.com/articles/problem-with-kafka-streams-1
2. Exactly once delivery in Kafka
A. Exactly-once semantics were introduced in the Apache Kafka 0.11 release and Confluent Platform 3.3.

https://medium.com/@jaykreps/exactly-once-support-in-apache-kafka-55e1fdd0a35f

For previous versions of Kafka we can partially achieve this with an exactly-once static consumer via assign (one-and-only-one message delivery).
Steps (a sketch follows the list):
Step 1: Set enable.auto.commit = false.
Step 2: Do not call consumer.commitSync() after processing a message.
Step 3: Register the consumer to a specific partition using the 'assign' call.
Step 4: On startup, seek the consumer to a specific message offset by calling consumer.seek(topicPartition, offset).
Step 5: While processing the messages, get hold of the offset of each message. Store the processed message's offset along with the processed message itself in an atomic transaction. When data is stored in a relational database, atomicity is easy to implement; for a non-relational data store such as HDFS or a NoSQL store, one way to achieve atomicity is to store the offset together with the message.
Step 6: Implement idempotence as a safety net.

https://dzone.com/articles/kafka-clients-at-most-once-at-least-once-exactly-o
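A minimal sketch of these steps; the broker address, topic, partition number, and the offset-persistence helpers (loadOffsetFromStore, storeAtomically) are hypothetical placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignSeekConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("enable.auto.commit", "false");         // Step 1: no auto-commit
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        TopicPartition partition = new TopicPartition("demo-topic", 0); // placeholder
        long lastProcessedOffset = loadOffsetFromStore(); // offset persisted with the data

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(partition)); // Step 3: static assignment
            consumer.seek(partition, lastProcessedOffset + 1);     // Step 4: resume after last processed
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Step 5: store the result and record.offset() in one atomic write.
                    storeAtomically(record.value(), record.offset());
                }
                // Step 2: no consumer.commitSync(); the external store owns the offset.
            }
        }
    }

    private static long loadOffsetFromStore() { return -1L; }                 // hypothetical
    private static void storeAtomically(String value, long offset) { /* hypothetical */ }
}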
3. Why Avro producer and consumer is preferred
A. Avro is an open-source binary message exchange protocol. Avro helps send optimized messages across the wire, reducing network overhead. Avro can enforce a schema for messages, defined using JSON, and can generate binding objects in various programming languages from these schemas. Message payloads are automatically bound to these generated objects on the consumer side. Avro is natively supported and highly recommended for use with Kafka.

https://dzone.com/articles/kafka-clients-at-most-once-at-least-once-exactly-o
4. Schema registry
A. Kafka takes bytes as input and sends bytes as output, with no data verification. Obviously, your data has meaning beyond bytes, so your consumers need to parse it and later interpret it. Parsing problems mainly occur in two situations: the field you're looking for doesn't exist anymore, or the type of the field has changed (e.g. what used to be a String is now an Integer).

What are our options to prevent and overcome these issues?

Catch exceptions on parsing errors. Your code becomes ugly and very hard to maintain. 👎

Never ever change the data producer, and triple-check that your producer code will never forget to send a field. That's what most companies do, but after a few key people quit, all your "safeguards" are gone. 👎👎

Adopt a data format and enforce rules that allow you to perform schema evolution while guaranteeing not to break your downstream applications. 👏 (Sounds too good to be true?) That data format is Apache Avro. Avro is one of the fastest formats to serialize and deserialize, and it supports schema evolution.
The Kafka Avro Serializer

The engineering beauty of this architecture is that your producers now use a new serializer, provided courtesy of Confluent, named KafkaAvroSerializer. Upon producing Avro data to Kafka, the following happens (simplified version):

Your producer checks whether the schema is available in the Schema Registry. If it is not available, the producer registers and caches it.

The Schema Registry verifies that the schema is either the same as before or a valid evolution. If not, it returns an exception and the KafkaAvroSerializer crashes your producer. Better safe than sorry.

If the schema is valid and all checks pass, the producer includes only a reference to the schema (the schema ID) in the message sent to Kafka, not the whole schema. The advantage is that the messages sent to Kafka are much smaller!

https://medium.com/@stephane.maarek/introduction-to-schemas-in-apache-kafka-with-the-confluent-schema-registry-3bf55e401321
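A minimal producer sketch using the Confluent KafkaAvroSerializer with an Avro GenericRecord; the broker address, registry URL, topic name, and User schema are placeholder assumptions:

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {
    // Hypothetical schema; in practice this usually lives in a .avsc file.
    private static final String USER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder registry

        Schema schema = new Schema.Parser().parse(USER_SCHEMA);
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);

        // The serializer registers (or validates) the schema against the registry and
        // writes only the schema id plus the Avro-encoded payload into each message.
        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("users-avro", "alice", user)); // placeholder topic
        }
    }
}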
5. In-Sync Replicas (AKA ISR)
A. Every topic partition in Kafka is replicated n times, where n is the replication factor of the topic. This allows Kafka to automatically fail over to these replicas when a server in the cluster fails, so that messages remain available in the presence of failures. Replication in Kafka happens at the partition granularity, where the partition's write-ahead log is replicated in order to n servers. Out of the n replicas, one is designated the leader while the others are followers. As the name suggests, the leader takes the writes from the producer and the followers merely copy the leader's log in order.

When a producer sends a message to the broker, it is written by the leader and replicated to all the partition's replicas. A message is committed only after it has been successfully copied to all the in-sync replicas.
What does it mean for a replica to be caught up to the leader? A replica that has not "caught up" to the leader's log may be marked as an out-of-sync replica. Take the example of a single-partition topic foo with a replication factor of 3. Assume that the replicas for this partition live on brokers 1, 2 and 3, and that 3 messages have been committed on topic foo. The replica on broker 1 is the current leader, replicas 2 and 3 are followers, and all replicas are part of the ISR. Also assume that replica.lag.max.messages is set to 4, which means that as long as a follower is behind the leader by no more than 3 messages, it will not be removed from the ISR. And replica.lag.time.max.ms is set to 500 ms, which means that as long as the followers send a fetch request to the leader every 500 ms or sooner, they will not be marked dead and will not be removed from the ISR.
A replica can be out of sync with the leader for several reasons:

Slow replica: a follower replica that is consistently unable to catch up with the writes on the leader for a certain period of time. One of the most common causes is an I/O bottleneck on the follower, causing it to append the copied messages at a rate slower than it can consume from the leader.

Stuck replica: a follower replica that has stopped fetching from the leader for a certain period of time. A replica can be stuck due to a GC pause or because it has failed or died.

Bootstrapping replica: when the user increases the replication factor of the topic, the new follower replicas are out of sync until they are fully caught up to the leader's log.

https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
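The committed-message guarantee above is what a producer leans on when it sets acks=all: the send is not acknowledged until every in-sync replica has the write. A minimal sketch, with broker address and topic as placeholders; pairing acks=all with the topic-level min.insync.replicas setting is a common durability configuration, though that setting is not discussed in the source above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("acks", "all");   // wait until all in-sync replicas have the write
        props.put("retries", "3");  // retry transient send failures
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // With acks=all, the send completes successfully only once the message is
            // committed, i.e. replicated to every replica currently in the ISR.
            producer.send(new ProducerRecord<>("foo", "key", "value"));
        }
    }
}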
6. Kafka log compaction
A. In a Kafka cluster, the retention policy can be set on a per-topic basis: time-based, size-based, or log-compaction-based. Log compaction retains at least the last known value for each record key within a single topic partition. Compacted logs are useful for restoring state after a crash or system failure. Another important use case is CDC (change data capture).

Kafka log compaction also allows for deletes. A message with a key and a null payload acts like a tombstone, a delete marker for that key. Tombstones are cleared after a period. Log compaction runs periodically in the background by recopying log segments. Compaction does not block reads and can be throttled to avoid impacting the I/O of producers and consumers.

The Kafka Log Cleaner performs log compaction. The Log Cleaner has a pool of background compaction threads that recopy log segment files, removing older records whose keys reappear later in the log.

The topic config min.compaction.lag.ms guarantees a minimum period that must pass before a message can be compacted. The consumer sees all tombstones as long as it reaches the head of the log within a period less than the topic config delete.retention.ms (the default is 24 hours). Log compaction never reorders messages; it only removes some, and the partition offset of a message never changes.

http://cloudurable.com/blog/kafka-architecture-log-compaction/index.html
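A compacted topic can be created programmatically with the AdminClient. A minimal sketch; the broker address, topic name, and the specific config values are placeholder assumptions:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CompactedTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 3) // placeholder name
                .configs(Map.of(
                    "cleanup.policy", "compact",         // keep at least the latest value per key
                    "min.compaction.lag.ms", "60000",    // no compaction for 60 s after append
                    "delete.retention.ms", "86400000")); // tombstones visible for 24 h
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}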
7. Kafka record keys for partition strategy
A. The key is optional metadata that can be sent with a Kafka message, and by default it is used to route the message to a specific partition. E.g. if you're sending a message m with key k to a topic mytopic that has p partitions, then m goes to partition Hash(k) % p in mytopic. The key has no connection to the offset of a partition whatsoever; offsets are used by consumers to keep track of the position of the last read message in a partition.

If a PartitionKeyStrategy is used with a topic, the value is used as the message key, and is then implicitly used to select the partition according to the default behavior of the Kafka client:

If a valid partition number is specified, that partition is used when sending the record. If no partition is specified but a key is present, a partition is chosen using a hash of the key. If neither key nor partition is present, a partition is assigned in a round-robin fashion.

It might be desirable in some cases to control these independently. For example, you might wish to have a message key that is more fine-grained than the partition key, for use with Kafka log compaction on sub-graphs of the entity state.

https://blog.newrelic.com/engineering/effective-strategies-kafka-topic-partitioning/
https://stackoverflow.com/questions/51245962/how-to-choose-a-key-and-offset-for-a-kafka-producer
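A minimal sketch of key-based routing: records sent with the same key always land in the same partition, preserving per-key ordering. The broker address, topic, key, and value are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class KeyedProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key -> same partition: all events for user-42 stay in order.
            RecordMetadata meta = producer
                .send(new ProducerRecord<>("mytopic", "user-42", "logged-in"))
                .get();
            System.out.printf("partition=%d offset=%d%n", meta.partition(), meta.offset());
        }
    }
}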
8. Kafka MirrorMaker
A. Kafka's mirroring feature makes it possible to maintain a replica of an existing Kafka cluster. MirrorMaker is just a regular Java producer/consumer pair: data is read from topics in the origin cluster and written to a topic with the same name in the destination cluster. You can run many such mirroring processes to increase throughput and for fault tolerance (if one process dies, the others take over the additional load).

An alternative is Confluent Replicator, a more complete solution that handles topic configuration as well as data, and integrates with Kafka Connect and Control Center to improve availability, scalability, and ease of use.

https://docs.confluent.io/current/multi-dc-replicator/mirrormaker.html
9. Challenges faced while working with Kafka
A. Duplicates (at-least-once behavior): remove the duplicates by checking against previously persisted messages, or implement exactly-once behavior.

Consumer liveness: a consumer may take too long to process a message. The group coordinator expects group members to send regular heartbeats to indicate that they remain active. A background heartbeat thread runs in the consumer, sending regular heartbeats to the coordinator. If the coordinator does not receive a heartbeat from a group member within the session timeout, it removes the member from the group and starts a rebalance of the group. The session timeout can be much shorter than the maximum polling interval, so that the time taken to detect a failed consumer can be short even if message processing takes a long time.
You can configure the maximum polling interval using the max.poll.interval.ms property and the session timeout using the session.timeout.ms property. You will typically not need to change these settings unless it takes more than 5 minutes to process a batch of messages.

If you have problems with message handling caused by message flooding, you can set consumer options to control the speed of message consumption: use fetch.max.bytes and max.poll.records to control how much data a call to poll() can return.

https://console.bluemix.net/docs/services/EventStreams/eventstreams114.html#consuming_messages
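A minimal sketch of a consumer configured for slow processing and message flooding, using the four properties above; the broker address, group id, and the specific values are placeholder assumptions to be tuned to your workload:

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TunedConsumerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "slow-processing-group");   // placeholder group id
        props.put("session.timeout.ms", "10000");    // fast heartbeat-based failure detection
        props.put("max.poll.interval.ms", "600000"); // allow up to 10 min between poll() calls
        props.put("max.poll.records", "100");        // cap records returned per poll()
        props.put("fetch.max.bytes", "1048576");     // cap bytes fetched per request (1 MiB)
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe and poll as usual; the settings above bound both how much work
            // each poll() returns and how long processing may take before a rebalance.
        }
    }
}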