This document provides an agenda and overview of Kafka on Kubernetes. It begins with an introduction to Kafka fundamentals and messaging systems. It then discusses key ideas behind Kafka's architecture like data parallelism and batching. The rest of the document explains various Kafka concepts in detail like topics, partitions, producers, consumers, and replication. It also introduces Kubernetes concepts relevant for running Kafka like StatefulSets, StorageClasses and the operator pattern. The goal is to help understand how to build event-driven systems using Kafka and deploy it on Kubernetes.
2. Agenda
■ Kafka Fundamentals - Pub/Sub Done Right
■ Kafka On K8s
■ Building Event Driven Systems
■ Demo
○ Provision a Kafka Cluster On PKS
5. Messaging Systems
Why not traditional messaging systems for the centralized pipeline?
Transient vs Durable Messages
Consumption - Push vs Pull Based Mechanism
Offset Tracking - Replay Messages On Consumer Failures
Horizontal Scalability
Distributed - Partitioning & Replication
6. Key Ideas
Key Idea 1: Data parallelism leads to scale out - randomly distribute clients across partitions
Key Idea 2: Disks are fast when used sequentially - store messages as a write-ahead log
Key Idea 3: Batching makes best use of the network - batched transfer, compression, no JVM caching (low memory footprint) & zero copy
13. Why File System & Not Memory?
Little difference between sequential file system access & memory speeds
Kafka runs on the JVM:
● Heavy object overheads for data stored in memory
● Increased GC time
14. Zero Copy
[Diagram: with OS send-file, data moves from the Page Cache to the Socket Buffer and NIC Buffer entirely in kernel context, bypassing the User Space Buffer in application context]
15. Brokers
Cluster Aware
Receives messages from Producers, assigns offsets & writes to disk
Fetches messages for consumers reading partitions & responds with committed messages
One elected as Controller - admin, assigns partitions to brokers & monitoring
Topic Retention - time or size based
[Diagram: a Kafka Cluster with Broker 0 (Controller) and Broker 1, each hosting Topic A Partition 0 and Topic A Partition 1 as either Leader or Replica; the Producer sends messages for A/0 and A/1 to the partition leaders, and the Consumer reads messages from them]
16. Producers
Producers accept a ProducerRecord
ProducerRecord key & value are serialized into byte arrays by a Serializer
Partitioner - chooses the partition by key if no partition is specified & adds the record to a batch for that partition
A separate thread handles sending batches to the brokers
Three methods: 1. Fire & Forget 2. Synchronous 3. Asynchronous
17. Consumers
Consumer Groups For Consumption Scaling
Topic Partitions distributed among consumers in a group
Partitions are rebalanced on consumer additions or crashes (consumer unavailability & loss of consumer cache)
The story of Kafka began at LinkedIn, where the engineering team was challenged with redefining their infrastructure. Breaking down monolithic applications into microservices allowed LinkedIn to scale search, profiles, communications, etc. efficiently. However, there was a need to share data among these different services.
Data sources: 1. User Activity - Page views, Ad Impressions, etc. 2. Server Log Metrics & Monitoring Data 3. Computationally derived data from downstream systems.
Data driven products: 1. Recommendation Engine - Connections, Endorsements 2. Profile Stats - How many searches did you appear in this week? Who viewed your profile? 3. Visualizations - Graphs showing increase or dip in profile views 4. Infrastructure Monitoring - Restarts, Upgrades, Utilization, etc.
Challenges: Varied Systems/Applications, Volume & Durability
What did this unified log/data pipeline model bring?
Decoupled producers & consumers
Simplified addition of new producers or consumers.
Single source of truth for real time and batch processing applications.
Distributed scaling for varying volume demands
Traditional messaging systems follow a push-based mechanism, pushing messages to consumers. If messages are pushed faster than a consumer can process them, the consumer may become overwhelmed (requiring a back-pressure protocol). These systems also do not offer a centralized data pipeline that serves real-time and batch processing consumers uniformly.
RabbitMQ uses a push model and prevents overwhelming consumers via a consumer-configured prefetch limit. This is great for low-latency messaging and works well for RabbitMQ's queue-based architecture. Kafka, on the other hand, uses a pull model where consumers request batches of messages from a given offset. To avoid tight loops when no messages exist beyond the current offset, Kafka allows for long polling.
A pull model makes sense for Kafka because of its partitions. Since Kafka guarantees message order within a partition with no competing consumers, it can leverage batching of messages for more efficient delivery and higher throughput.
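As a minimal sketch of this pull model (assuming a broker at localhost:9092 and a hypothetical page-views topic; not tied to any particular deployment), a consumer long-polls the broker in a loop:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PullConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "demo-group");              // hypothetical group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("page-views")); // hypothetical topic
            while (true) {
                // poll() long-polls: it blocks up to the timeout when no records
                // exist beyond the current offset, avoiding a tight request loop.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```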
In traditional systems, messages are transient at the exchange/routing layer, which assumes consumers are available, and they are removed once consumed.
Distributed: a distributed system, in its simplest definition, is a group of computers working together so as to appear as a single computer to the end user. These machines share state, operate concurrently, and can fail independently without affecting the whole system's uptime.
To achieve high throughput, Kafka implements a three-tiered architecture and can scale out any of the tiers. In the middle is the Kafka cluster, which runs one or more brokers; the producer tier publishes data into the cluster, and the consumer tier consumes data from the cluster.
Topics in Kafka are the logical categories to which messages/records are published and from which they are consumed. The little blue boxes in the Kafka brokers represent partitions of a topic. A topic can be partitioned into multiple topic partitions, and a topic partition is the unit that is distributed to the brokers.
A message is the unit of data within Kafka, also known as a record. A message is simply an array of bytes as far as Kafka is concerned, so the data contained within it has no specific format or meaning to Kafka. A message can have an optional key, which enables controlled distribution to partitions.
The hash of the key is computed, and the partition is chosen as that hash modulo the number of partitions for the topic (sketched in code after the example below).
Example: Assuming there are 5 partitions, the hashes of the keys map to partitions as:
0%5=0
2%5=2
4%5=4
7%5=2
9%5=4
10%5=0
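Here is a minimal sketch of that keyed-partitioning rule. The method name is illustrative, not Kafka's internal API; Kafka's built-in default partitioner hashes the serialized key bytes with murmur2, but the modulo step is the same:

```java
import java.nio.charset.StandardCharsets;

public class KeyPartitioner {
    // partition = hash(key) mod numPartitions
    static int choosePartition(byte[] keyBytes, int numPartitions) {
        int hash = java.util.Arrays.hashCode(keyBytes) & 0x7fffffff; // force non-negative
        return hash % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8); // hypothetical key
        // The same key always maps to the same partition, which is what
        // gives per-key ordering within a partition.
        System.out.println(choosePartition(key, 5));
    }
}
```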
A topic is a named stream of records/messages.
Topics are stored in commit logs as partition segments.
Topic partitions are units of parallelism.
Record order is guaranteed only within a partition.
Records in a partition are appended in sequential order and assigned a sequential id called the offset. Offsets are ever-growing numbers.
The offset identifies each record location within a partition.
A commit log is not a new concept. It has been used in the database world for a long time: databases write out information about the records they will be modifying, before applying the changes to the various data structures they maintain - transaction logs. The log is the record of what happened, and each table or index is a projection of this history into some useful data structure or index. Since the log is immediately persisted, it is used as the authoritative source for restoring all other persistent structures in the event of a crash. Eventually logs were also used for replicating data between databases; many databases allow transmitting portions of the log to replica databases.
To handle retention, Kafka often needs to find messages that are due to be purged. With a single long partition file, this would be slow, so a partition is segmented into multiple segments. On disk, a partition is a directory and each segment is a commit log file.
When Kafka writes to a partition, it writes to a segment - the active segment. If the segment's size limit is reached, a new segment is opened and becomes the new active segment. Segments are named by their base offset. The base offset of a segment is an offset greater than the offsets in previous segments and less than or equal to the offsets in that segment.
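An illustrative on-disk layout for a hypothetical topic my-topic (partition directories are named topic-partition; segment files are named by zero-padded base offset; the offsets shown are made up):

```
my-topic-0/                     # partition 0 is a directory
  00000000000000000000.log     # first segment (base offset 0)
  00000000000000000000.index
  00000000000000368769.log     # next segment (base offset 368769)
  00000000000000368769.index
```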
Each message carries its value, offset, timestamp, key, message size, compression codec, checksum, and the version of the message format.
The data format on disk is exactly the same as what the broker receives from the producer over the network and sends to its consumers. This allows Kafka to efficiently transfer data with zero copy.
Each segment comprises two files: its log and its index.
The segment index maps offsets to their message positions in the segment. The index file is memory-mapped, and the offset lookup uses binary search to find the nearest offset less than or equal to the target offset. The index file is made up of 8-byte entries: 4 bytes to store the offset relative to the base offset and 4 bytes to store the position. The offset is stored relative to the base offset so that only 4 bytes are needed. For example, if the base offset is 10000000000000000000, rather than having to store the subsequent offsets 10000000000000000001 and 10000000000000000002, they are stored as just 1 and 2.
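A minimal sketch of that lookup, assuming the 8-byte entries described above have already been parsed into memory (the names and types here are illustrative, not Kafka's internal API):

```java
public class SegmentIndex {
    // One parsed 8-byte index entry: 4-byte offset relative to the
    // segment's base offset + 4-byte position in the log file.
    record IndexEntry(int relativeOffset, int position) {}

    /** Binary search for the greatest entry whose offset is <= the target. */
    static IndexEntry lookup(IndexEntry[] index, long baseOffset, long targetOffset) {
        int rel = (int) (targetOffset - baseOffset);
        int lo = 0, hi = index.length - 1, best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index[mid].relativeOffset() <= rel) { best = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        return best >= 0 ? index[best] : null; // null: target precedes this segment
    }
}
```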
Kafka relies on the filesystem for storage and caching. The problem is that disks are slower than RAM, because the seek time of a disk is large compared to the time required to actually read data. But if you can avoid seeking, you can achieve latencies as low as RAM in some cases. Kafka does this through sequential I/O. One advantage of sequential I/O is that you get a cache without writing any caching logic in your application: modern operating systems allocate most of their free memory to disk caching, so if you read in an ordered fashion, the OS can read ahead and store data in a cache on each disk read.
This is much better than maintaining a cache in a JVM application, because JVM objects are "heavy" and can lead to high garbage collection overhead, which becomes worse as data size increases.
One of the major inefficiencies of data processing systems is the serialization and deserialization of data into formats suitable for storage and transmission (e.g., JSON). How does Kafka avoid this?
By using a standardized binary data format shared between producers, consumers, and brokers.
Zero Copy
Keeping data in the same format in which it is sent over the network allows copying directly from the page cache to the socket buffer, cutting the application context out of the transfer path.
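On the JVM this is exposed through FileChannel.transferTo(), which maps to sendfile(2) on Linux. A minimal sketch; where the segment path and socket come from is assumed, not shown:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    // transferTo() lets the kernel move bytes from the page cache straight
    // to the socket (sendfile on Linux), never copying them into user space.
    static void sendSegment(Path segmentLog, SocketChannel socket) throws IOException {
        try (FileChannel ch = FileChannel.open(segmentLog, StandardOpenOption.READ)) {
            long position = 0, size = ch.size();
            while (position < size) {
                position += ch.transferTo(position, size - position, socket);
            }
        }
    }
}
```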
The controller is responsible for administrative operations, including assigning partitions to brokers and monitoring for broker failures. A partition is owned by a single broker in the cluster, and that broker is called the leader of the partition. A partition may be assigned to multiple brokers, which will result in the partition being replicated. This provides redundancy of messages in the partition, such that another broker can take over leadership if there is a broker failure. However, all consumers and producers operating on that partition must connect to the leader.
Kafka brokers are configured with a default retention setting for topics, either retaining messages for some period of time (e.g., 7 days) or until the topic reaches a certain size in bytes (e.g., 1 GB). Once these limits are reached, messages are expired and deleted so that the retention configuration is a minimum amount of data available at any time. Individual topics can also be configured with their own retention settings so that messages are stored for only as long as they are useful.
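A hedged sketch of setting such a per-topic override with Kafka's Admin API; the broker address and topic name are assumptions, while retention.ms and retention.bytes are the standard per-topic retention configs:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "page-views");
            // Override retention to 7 days for this topic only.
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();
        }
    }
}
```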
Producers balance load as described above.
Fire-and-forget
We send a message to the server and don't really care if it arrives successfully or not. Most of the time it will arrive successfully, since Kafka is highly available and the producer retries sending messages automatically. However, some messages can get lost using this method.
Synchronous send
We send a message, the send() method returns a Future object, and we use get() to wait on the future and see if the send() was successful or not.
Asynchronous send
We call the send() method with a callback function, which gets triggered when it receives a response from the Kafka broker.
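A minimal sketch of the three send styles side by side (assuming a broker at localhost:9092 and a hypothetical page-views topic):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SendStyles {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("page-views", "user-42", "viewed /jobs"); // hypothetical data

            producer.send(record); // 1. fire & forget: ignore the returned Future

            // 2. synchronous: block on the Future to learn the outcome
            RecordMetadata md = producer.send(record).get();
            System.out.printf("written to partition %d at offset %d%n", md.partition(), md.offset());

            // 3. asynchronous: callback fires when the broker responds
            producer.send(record, (metadata, exception) -> {
                if (exception != null) exception.printStackTrace();
            });
            producer.flush();
        }
    }
}
```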
A producer object can be used by multiple threads to send messages. It is typical to start with one producer and one thread.
If better throughput is needed, more threads that share the same producer can be added. Once that ceases to increase throughput, more producers can be added to the application to achieve even higher throughput.
If we add more consumers to a single group on a single topic than we have partitions, some of the consumers will be idle and get no messages at all.
The main way we scale data consumption from a Kafka topic is by adding more consumers to a consumer group. It is common for Kafka consumers to do high-latency operations such as write to a database or a time-consuming computation on the data. In these cases, a single consumer can’t possibly keep up with the rate data flows into a topic, and adding more consumers that share the load by having each consumer own just a subset of the partitions and messages is our main method of scaling. This is a good reason to create topics with a large number of partitions—it allows adding more consumers when the load increases.
In addition to having multiple consumers within a group, we may also have multiple consumer groups subscribed to the same topic. Kafka scales to a large number of consumer groups and consumers without impacting performance.
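Since the partition count caps how many consumers in a group can be active, topics are often created with headroom. A hedged sketch using the Admin API; the topic name, partition count, and replication factor here are illustrative choices, not recommendations from this deck:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (Admin admin = Admin.create(props)) {
            // 12 partitions leave room to add consumers to the group later;
            // replication factor 3 spreads replicas across brokers for redundancy.
            admin.createTopics(List.of(new NewTopic("page-views", 12, (short) 3))).all().get();
        }
    }
}
```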