Real-Time Dynamic Data Export Using the Kafka Ecosystem

The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight

Integrating Apache Kafka with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren’t working. This talk will discuss the key design concepts within Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We’ll do a live demo of building pipelines with Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we’ll go hands-on in methodically diagnosing and resolving common issues encountered with Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Kafka Connect in containers.

From Zero to Hero with Kafka Connect (Robin Moffatt, Confluent) Kafka Summit L...

Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do you qualify Kafka out when it is not the right tool for the job? This session explores the DOs and DON'Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka. Whether you are considering open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, check out this slide deck. A detailed article about this topic: https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/

Internal Hive

Recruit Technologies

When NOT to use Apache Kafka?

Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. Many ML platforms cover data collection, feature engineering, training, deploying, productionalization, and monitoring, but few, if any, do all of the above seamlessly. Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python and Spark and can be used in modular pieces, as each ML problem presents unique challenges. Through standardization of the path to production, training environments and the methods for collecting and transforming data on Spark, each model is reproducible and iterable. This talk covers the architecture, the problems that each individual component and the overall system aims to solve, and a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, and we have a variety of models running in production. We have seen the overall model development time go down from many months to days on Bighead. We plan to open source Bighead to allow the wider community to benefit from our work.

Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...

Kafka and Machine Learning in Banking and Insurance Industry

Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the recent past, new tools have emerged that are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to a traditional Enterprise Service Bus infrastructure, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at message level and integrate very well with the Hadoop ecosystem. This session will present and compare Apache NiFi, StreamSets and the Kafka Ecosystem and show how they handle data ingestion in a Big Data solution architecture.

Oracle Advanced Analytics

aghosh_us

Data Ingestion in Big Data and IoT platforms

Guido Schmutz

Lyft is on a mission to improve people’s lives with the world’s best transportation. Since 2019, Lyft has been running both batch ETL and ML Spark workloads primarily on Kubernetes with the Apache Spark on k8s operator. However, as workloads grew in frequency and resource requirements, we started hitting numerous reliability issues related to IP allocation, container images, IAM role assignment, and the Kubernetes control plane. To continue supporting growing Spark usage at Lyft, the team came up with a hybrid architecture, based on Kubernetes and YARN, optimized for containerized and non-containerized workloads. In this talk, we will also cover a dynamic runtime controller that helps with per-environment config overrides and easy switchover between resource managers.

Hybrid Apache Spark Architecture with YARN and Kubernetes

Live commerce combines instant purchasing of a featured product and audience participation. This talk explores the need for real-time data streaming with Apache Kafka between applications to enable live commerce across online stores and brick & mortar stores across regions, countries, and continents in any retail business. The discussion covers several building blocks of a live commerce enterprise architecture, including transactional data processing, omnichannel, natural language processing, augmented reality, edge computing, and more.

Databricks Partner Enablement Guide.pdf

ssuserb74636

Kafka for Live Commerce to Transform the Retail and Shopping Metaverse

Bring Satellite and Drone Imagery into your Data Science Workflows

The Journey to Data Mesh with Confluent

Companies are dealing with increasingly large data sets and looking for ways to significantly improve the scale and cost of Big Data analysis with AWS. This hands-on session shows you how you can achieve that. With hundreds of pre-built connectors, you will learn how to get your on-premises and cloud data into Redshift in minutes, not days, and at significantly reduced cost using Informatica Cloud Integration. With fully certified support for large-scale RDS deployments and Informatica’s Vibe Data Stream solution for automated streaming data collection for Kinesis, Informatica offers a comprehensive cloud integration solution for Big Data analytics with AWS. The ability to seamlessly migrate Informatica’s PowerCenter to Amazon Cloud (EC2) offers customers a cloud migration path, with even higher performance and lower costs.

Power Big Data Analytics with Informatica Cloud Integration for Redshift, Kin...

Amazon Web Services

Streaming all over the World: Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka. Learn about various case studies for event streaming with Apache Kafka across industries. The talk explores architectures for real-world deployments from Audi, BMW, Disney, Generali, PayPal, Tesla, Unity, Walmart, William Hill, and more. Use cases include fraud detection, mainframe offloading, predictive maintenance, cybersecurity, edge computing, track & trace, live betting, and much more.

Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka

This session discusses use cases that leverage the Apache Kafka open source ecosystem as a streaming platform to process IoT data. See use cases, architectural alternatives and a live demo of how devices connect to Kafka via MQTT. Learn how to analyze the IoT data either natively on Kafka with Kafka Streams/KSQL, or on an external big data cluster like Spark, Flink or Elastic leveraging Kafka Connect, and how to leverage TensorFlow for Machine Learning. The focus is on connected cars / connected vehicles, V2X use cases, and mobility services. A live demo shows how to build a cloud-native IoT infrastructure on Kubernetes to connect and process streaming data in real time from 100,000 cars to do predictive maintenance at scale. Code for the live demo on GitHub: https://github.com/kaiwaehner/hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference

Connected Vehicles and V2X with Apache Kafka

Deploying Kafka to support multiple teams or even an entire company has many benefits. It reduces operational costs, simplifies onboarding of new applications as your adoption grows, and consolidates all your data in one place. However, this makes applications sharing the cluster vulnerable to any one or a few of them taking all cluster resources. The combined cluster load also becomes less predictable, increasing the risk of overloading the cluster and of data unavailability. In this talk, we will describe how to use the quota framework in Apache Kafka to ensure that a misconfigured client or an unexpected increase in client load does not monopolize broker resources. You will get a deeper understanding of bandwidth and request quotas and how they get enforced, and gain intuition for setting the limits for your use cases. While quotas limit individual applications, there must be enough cluster capacity to support the combined application load. Onboarding new applications or scaling the usage of existing applications may require manual quota adjustments and upfront capacity planning to ensure high availability. We will describe the steps we took toward solving this problem in Confluent Cloud, where we must immediately support unpredictable load with high availability. We implemented a custom broker quota plugin (KIP-257) to replace static per-broker quota allocation with dynamic and self-tuning quotas based on the available capacity (which we also detect dynamically). By learning about our journey, you will gain more insight into the relevant problems and the techniques to address them.

Sharing is Caring: Toward Creating Self-tuning Multi-tenant Kafka (Anna Povzn...

HostedbyConfluent

What's hot (20)

What is API Product Management by PayPal Director of Product

ADF Mapping Data Flows Training Slides V1

Apache Spark at Airbnb

Similar to Real-Time Dynamic Data Export Using the Kafka Ecosystem

Meetup #7 | Session 2 | 21/03/2018 | Taboola

In this talk, we will present our multi-DC Kafka architecture, and discuss how we tackle sending and handling 10B+ messages per day, with maximum availability and no tolerance for data loss. Our architecture includes technologies such as Cassandra, Spark, HDFS, and Vertica, with Kafka as the backbone that feeds them all.

Distributed Kafka Architecture Taboola Scale

Apache Kafka TLV

goto; London: Keeping your Cloud Footprint in Check

Coburn Watson

Jitney, Kafka at Airbnb

alexismidon

Festive Tech Calendar 2021

Callon Campbell

Building cloud native data microservice

Nilanjan Roy

OSMC 2023 | Current State of Icinga by Bernd Erk

NETWAYS

Distributed applications are becoming more popular with the increasing popularity of microservices (however you want to define that term). But the principles of distributed application development are key if you want to build a system that is resilient, responsive, elastic and maintainable. In this workshop, we’ll review the principles of CQRS and the Reactive Manifesto, and how they complement each other. We’ll build an application that can handle a large stream of data, and allow users to still have a responsive experience while interacting with real-time and near-real-time data. We’ll look at Akka.NET as the workhorse inside your services, and how the principles of CQRS can help with your service-to-service communications. We’ll also look at how Event Sourcing can aid in managing your domain state, and how an event stream can be used to project data for your system for a number of different uses. We’ll build our own simple event store, but also look at commercially available stores, too. This session will focus on using Akka.NET along with a few other tools and technologies, such as EventStore and MongoDB. The concepts learned in this session will be applicable to a number of different tools, technologies and languages.

Reactive Development: Commands, Actors and Events. Oh My!!

David Hoerster

Serverless in the Azure World

Kasun Kodagoda

Automated Data Synchronization: Data Loader, Data Mirror & Beyond

JeremyOtt5

A leading Ad Tech company serves billions of advertising impressions and collects terabytes of data from several data centers across the world. In batch data processing, data is collected at different geographic locations and processed at regular intervals. This introduces a delay of at least 1 hour before an event is accounted for. The goal of real-time streaming was to provide publishers, Demand Side Platforms (DSPs) and agencies with actionable insights within a few minutes of event generation. This Ad Tech company uses DataTorrent RTS powered by Apex for:

• Real time reporting
• Resource monitoring
• Real time learning
• Allocation engine

Tushar Gosavi from DataTorrent will take the audience through the architecture, the custom operators developed, use cases for real time and the challenges involved in implementing streaming systems at scale where multiple data centers are in play. Tushar is a Senior Engineer at DataTorrent and has worked in distributed systems and storage domains.

Real Time Insights for Advertising Tech

Apache Apex

CouchbasetoHadoop_Matt_Michael_Justin v4

Michael Kehoe

DataEngConf SF16 - Scalable and Reliable Logging at Pinterest

Hakka Labs

At Pinterest, hundreds of services and third-party tools that are implemented in various programming languages generate billions of events every day. To achieve scalable and reliable low-latency logging, there are several challenges: (1) uploading logs that are generated in various formats from tens of thousands of hosts to Kafka in a timely manner; (2) running Kafka reliably on Amazon Web Services, where the virtual instances are less reliable than on-premises hardware; (3) moving tens of terabytes of data per day from Kafka to cloud storage reliably and efficiently, and guaranteeing exactly-once persistence per message. In this talk, we will present Pinterest’s logging pipeline, and share our experience addressing these challenges. We will dive deep into the three components we developed: data uploading from service hosts to Kafka, data transportation from Kafka to S3, and data sanitization. We will also share our experience in operating Kafka at scale in the cloud.

Scalable and Reliable Logging at Pinterest

Krishna Gade

Serverless architecture is the next big shift in computing: it completely abstracts the underlying infrastructure and lets you focus 100% on the business logic. Today we can create applications directly in our browser and leave the decision of how they are hosted and scaled to the cloud provider. Moreover, this approach gives us incredible control over the granularity of our applications, since most of the time we are dealing with a single function at a time. In this presentation we will:

• Introduce Serverless Architectures
• Talk about the advantages of Serverless Architectures
• Discuss event-driven computing in detail
• Cover common Serverless approaches
• See practical applications with Azure Functions
• Compare AWS Lambda and Azure Functions
• Talk about open source alternatives
• Explore the relation between Microservices and Serverless Architectures

Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...

Tokyo Azure Meetup

This talk will cover two years of Zabbix deployment at Globo.com, the web branch of Rede Globo, a major media player in Brazil providing access to media products to more than 1.8 million visitors hourly and 45 million each day. The case study will include migration from legacy systems, integration, templates and deployment, summing up the challenges, the solutions and the knowledge gained. Zabbix Conference 2015

Filipe paternot - Case Study: Zabbix Deployment at Globo.com

Zabbix

AWS for Java Developers workshop

Rory Preddy

Serverless architectures let you build and deploy applications and services with infrastructure resources that require zero administration. In the past, you had to provision and scale servers to run your application code, install and operate distributed databases, and build and run custom software to handle API requests. Now, AWS provides a stack of scalable, fully-managed services that eliminates these operational complexities. In this session, you will learn about the benefits of serverless architectures and the basics of the serverless stack AWS provides. We will also walk through how you can use serverless architectures for everything from data processing to mobile and web backends. AWS DevDay San Francisco, June 21, 2016. Presenter: Jeremy Edberg, Co-Founder, CloudNative, & AWS Community Hero

Getting Started with Serverless Architectures

Amazon Web Services

Introduction to Google Cloud Platform

Sujai Prakasam

MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...

MongoDB

Evolving s3 story

Avi Perez

More from confluent

Evolving Data Governance for the Real-time Streaming and AI Era

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...

Santander Stream Processing with Apache Flink

Unlocking the Power of IoT: A comprehensive approach to real-time insights

Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines. It enables greater data portability, optimized resource utilization, and a better customer experience by processing data streams in real time. In our hybrid hands-on workshop, you will learn how to easily filter, join, and enrich real-time data within Confluent Cloud using our serverless Flink service.

Workshop híbrido: Stream Processing con Flink

Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace. In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT-Platforms. You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes. Don't miss out on this opportunity to learn from industry experts and take your business to the next level.

Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...

Event-driven architecture (EDA) will be the heart of the MAPFRE ecosystem. To remain competitive, today's companies depend more and more on real-time data analysis, which gives them faster insights and response times. Doing business with real-time data means gaining situational awareness, and detecting and responding to what is happening in the world right now.

AWS Immersion Day Mapfre - Confluent

Eventos y Microservicios - Santander TechTalk

Q&A with Confluent Experts: Navigating Networking in Confluent Cloud

Citi TechTalk Session 2: Kafka Deep Dive

Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.

Build real-time streaming data pipelines to AWS with Confluent

Q&A with Confluent Professional Services: Confluent Service Mesh

Citi Tech Talk: Event Driven Kafka Microservices

An in-depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and gain benefits from their real-time data capabilities. The session looks more deeply into some specific use cases and shows how Confluent technology is used to manage costs and mitigate risks. It is aimed at Solutions Architects, Sales Engineers and Pre-Sales, and also at more technically minded, business-aligned people. Whilst this is not a deeply technical session, some knowledge of Kafka would be helpful.

Confluent & GSI Webinars series - Session 3

Transforming applications built with traditional messaging solutions such as TIBCO, MQ and Solace to be scalable, reliable and ready for the move to cloud. How can applications built with traditional messaging technologies like TIBCO, Solace and IBM MQ be modernised and made cloud ready? What are the advantages of event streaming approaches to pub/sub versus traditional message queues? What are the strengths and weaknesses of both approaches, and what use cases and requirements are actually a better fit for messaging than Kafka?

Citi Tech Talk: Messaging Modernization

Citi Tech Talk: Data Governance for streaming and real time data

Confluent & GSI Webinars series: Session 2

You will also learn how to:

• Build products and features faster using a complete suite of connectors and stream management tools, and connect your environments to data pipelines
• Protect your most critical data and workloads with built-in security, governance, and resilience guarantees
• Deploy Kafka at scale in minutes while reducing the associated costs and operational burden

Data In Motion Paris 2023

Confluent Partner Tech Talk with Synthesis

The Future of Application Development - API Days - Melbourne 2023