From Postgres to Event-Driven: using docker-compose to build CDC pipelines into Apache Kafka®

Meetup: Streaming Data Pipeline Development

In this interactive session, Tim will lead participants through how best to build streaming data pipelines. He will cover how to build applications for some common use cases and highlight tips, tricks, best practices and patterns. He will show how to build the easy way and then dive deep into the underlying open source technologies, including Apache NiFi, Apache Flink, Apache Kafka and Apache Iceberg. If you wish to follow along, please download the open source projects beforehand. You can also download this helpful streaming platform: https://docs.cloudera.com/csp-ce/latest/installation/topics/csp-ce-installing-ce.html
All source code and slides will be shared for those interested in building their own FLaNK Apps: https://www.flankstack.dev/
You can join the meeting virtually here: https://cloudera.zoom.us/j/91603330726
Speaker: Tim Spann
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.

Most data practitioners grapple with data quality issues and data pipeline complexities; it's the bane of their existence. Data engineers, in particular, strive to design and deploy robust data pipelines that serve reliable data in a performant manner so that their organizations can make the most of their valuable corporate data assets. Databricks Delta, part of Databricks Runtime, is a next-generation unified analytics engine built on top of Apache Spark. Built on open standards, Delta employs co-designed compute and storage and is compatible with Spark APIs. It powers high data reliability and query performance to support big data use cases, from batch and streaming ingests and fast interactive queries to machine learning. In this tutorial we will discuss the requirements of modern data pipelines, the challenges data engineers face when it comes to data reliability and performance, and how Delta can help. Through presentation, code examples and notebooks, we will explain pipeline challenges and the use of Delta to address them. You will walk away with an understanding of how you can apply this innovation to your data architecture and the benefits you can gain. This tutorial will be both an instructor-led and a hands-on interactive session. Instructions on how to get tutorial materials will be covered in class.
WHAT YOU’LL LEARN:
– Understand the key data reliability and performance challenges of data pipelines
– How Databricks Delta helps build robust pipelines at scale
– Understand how Delta fits within an Apache Spark™ environment
– How to use Delta to realize data reliability improvements
– How to deliver performance gains using Delta
PREREQUISITES:
– A fully-charged laptop (8-16GB memory) with Chrome or Firefox
– Pre-register for Databricks Community Edition
Speakers: Steven Yu, Burak Yavuz

Apache Iceberg - A Table Format for Huge Analytic Datasets

Alluxio, Inc.

Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...

HostedbyConfluent

Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ram Dhakne | Current 2022
A well-architected data lakehouse provides an open data platform that combines streaming with data warehousing, data engineering, data science and ML. This opens a world beyond streaming to solving business problems in real-time with analytics and AI. See how companies like Albertsons have used Databricks and Confluent together to combine Kafka streaming with Databricks for their digital transformation. In this talk, you will learn:
- The built-in streaming capabilities of a lakehouse
- Best practices for integrating Kafka with Spark Structured Streaming
- How Albertsons architected their data platform for real-time data processing and real-time analytics

Change Data Streaming Patterns for Microservices With Debezium

(Gunnar Morling, Red Hat) Kafka Summit SF 2018 Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/): secret sauce for change data capture (CDC), streaming changes from your datastore, that enables you to solve multiple challenges: synchronizing data between microservices, gradually extracting microservices from existing monoliths, maintaining different read models in CQRS-style architectures, updating caches and full-text indexes, and feeding operational data to your analytics tools. Join this session to learn what CDC is about, how it can be implemented using Debezium, an open source CDC solution based on Apache Kafka, and how it can be utilized for your microservices. Find out how Debezium captures all the changes from datastores such as MySQL, PostgreSQL and MongoDB, how to react to the change events in near real time, and how Debezium is designed not to compromise on data correctness and completeness even if things go wrong. In a live demo we’ll show how to set up a change data stream out of your application’s database without any code changes needed. You’ll see how to sink the change events into other databases and how to push data changes to your clients using WebSockets.
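To make "reacting to change events" concrete, here is a minimal sketch of a consumer that keeps an in-memory copy of a table in sync. It assumes events shaped like Debezium's change-event envelope, with an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) and `before`/`after` row states. The table contents, the `id` key column, and the function name are hypothetical, and a real consumer would read these events from a Kafka topic rather than a list.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_change_event(replica: Dict[Any, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory replica (hypothetical helper)."""
    op = event["op"]
    before: Optional[Row] = event.get("before")
    after: Optional[Row] = event.get("after")

    if op in ("c", "r", "u"):
        # Create, snapshot read, and update all carry the new row state in "after".
        # Writing it as an upsert makes replaying the same event harmless.
        if after is None:
            raise ValueError(f"event with op={op!r} must carry an 'after' state")
        replica[after[key]] = dict(after)
    elif op == "d":
        # A delete carries the old row state (at least its key) in "before".
        if before is None:
            raise ValueError("delete event must carry a 'before' state")
        replica.pop(before[key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical sample events for a "customers" table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
     "after": {"id": 1, "email": "b@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "c@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "c@example.com"}, "after": None},
]

replica: Dict[Any, Row] = {}
for event in events:
    apply_change_event(replica, event)
print(replica)
```

Because every branch is an upsert or a keyed delete, replaying the whole event list after a restart leaves the replica in the same final state, which is one simple way to tolerate redelivered events.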

Iceberg: a fast table format for S3

Cicero Joasyo Mateus de Moura

Netflix’s Big Data Platform team manages a data warehouse in Amazon S3 with over 60 petabytes of data and writes hundreds of terabytes of data every day. With a data warehouse at this scale, it is a constant challenge to keep improving performance. This talk will focus on Iceberg, a new table metadata format that is designed for managing huge tables backed by S3 storage. Iceberg decreases job planning time from minutes to under a second, while also isolating reads from writes to guarantee jobs always use consistent table snapshots. In this session, you'll learn:
• Some background about big data at Netflix
• Why Iceberg is needed and the drawbacks of the current tables used by Spark and Hive
• How Iceberg maintains table metadata to make queries fast and reliable
• The benefits of Iceberg's design and how it is changing the way Netflix manages its data warehouse
• How you can get started using Iceberg
Speaker: Ryan Blue, Software Engineer, Netflix
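The idea of isolating reads from writes through consistent table snapshots can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Snapshot` and `Table` classes below are hypothetical and do not reflect Iceberg's actual API or on-disk metadata format. Each commit produces a new immutable snapshot, and a writer may only commit on top of the snapshot it started from.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table: the exact set of data files visible to a reader."""
    snapshot_id: int
    data_files: Tuple[str, ...]


@dataclass
class Table:
    """Toy table whose state is an append-only list of snapshots (hypothetical model)."""
    snapshots: List[Snapshot] = field(default_factory=lambda: [Snapshot(0, ())])

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit_append(self, base: Snapshot, new_files: List[str]) -> Snapshot:
        # Optimistic concurrency: refuse the commit if another writer
        # has already moved the table past the snapshot we started from.
        if base.snapshot_id != self.current().snapshot_id:
            raise RuntimeError("concurrent commit detected; retry on the new snapshot")
        snap = Snapshot(base.snapshot_id + 1, base.data_files + tuple(new_files))
        self.snapshots.append(snap)
        return snap


table = Table()
reader_view = table.current()                      # a long-running job pins this snapshot
table.commit_append(table.current(), ["part-0001.parquet"])
print(reader_view.data_files)                      # the pinned view is unchanged by the commit
print(table.current().data_files)                  # new readers see the appended file
```

A reader that holds `reader_view` keeps seeing the same file set for the life of its job, no matter how many commits land afterwards.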

Data platform modernization with Databricks.pptx

CalvinSim10

High-speed Database Throughput Using Apache Arrow Flight SQL

ScyllaDB

Data Streaming and Data Lake with Debezium, Delta Lake and EMR

Stream Processing using Apache Flink in Zalando's World of Microservices - Re...

Zalando Technology

In this talk we present Zalando's microservices architecture, introduce Saiki, our next-generation data integration and distribution platform on AWS, and show how we employ stream processing for near-real-time business intelligence. Zalando is one of the largest online fashion retailers in Europe. In order to secure our future growth and remain competitive in this dynamic market, we are transitioning from a monolithic to a microservices architecture and from a hierarchical to an agile organization. We first look at how business intelligence processes have worked inside Zalando in recent years and present our current approach, Saiki. It is a scalable, cloud-based data integration and distribution infrastructure that makes data from our many microservices readily available for analytical teams. We no longer live in a world of static data sets, but are instead confronted with an endless stream of events that constantly inform us about relevant happenings from all over the enterprise. The processing of these event streams enables us to do near-real-time business intelligence. In this context we have evaluated Apache Flink vs. Apache Spark in order to choose the right stream processing framework. Given our requirements, we decided to use Flink as part of our technology stack, alongside Kafka and Elasticsearch. With these technologies we are currently working on two use cases: a near-real-time business process monitoring solution and streaming ETL. Monitoring our business processes enables us to check whether the Zalando platform works technically. It also helps us analyze data streams on the fly, e.g. order velocities and delivery velocities, and to control service level agreements. Streaming ETL, on the other hand, is used to free up resources in our relational data warehouse, as it struggles with increasingly high loads. In addition, it reduces latency and facilitates platform scalability. Finally, we give an outlook on our future use cases, e.g. near-real-time sales and price monitoring. Another aspect to be addressed is lowering the entry barrier of stream processing for our colleagues coming from a relational database background.

Oracle RAC on Extended Distance Clusters - Presentation

Building Reliable Lakehouses with Apache Flink and Delta Lake

Flink Forward

Flink Forward San Francisco 2022. Apache Flink and Delta Lake together allow you to build the foundation for your data lakehouses by ensuring the reliability of your concurrent streams from processing to the underlying cloud object-store. Together, the Flink/Delta Connector enables you to store data in Delta tables such that you harness Delta’s reliability by providing ACID transactions and scalability while maintaining Flink’s end-to-end exactly-once processing. This ensures that the data from Flink is written to Delta Tables in an idempotent manner such that even if the Flink pipeline is restarted from its checkpoint information, the pipeline will guarantee no data is lost or duplicated thus preserving the exactly-once semantics of Flink. by Scott Sandre & Denny Lee
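The claim that idempotent writes preserve exactly-once results across restarts can be sketched with a toy sink. This is an illustration under assumptions, not the Flink/Delta Connector's implementation: the `IdempotentSink` class, the writer id, and the checkpoint numbering are hypothetical. The sink remembers the last checkpoint it committed per writer and ignores any batch it has already seen.

```python
from typing import Dict, List


class IdempotentSink:
    """Toy table that records the last checkpoint committed by each writer (hypothetical)."""

    def __init__(self) -> None:
        self.rows: List[str] = []
        self.last_committed: Dict[str, int] = {}  # writer id -> last committed checkpoint id

    def commit(self, writer_id: str, checkpoint_id: int, batch: List[str]) -> bool:
        """Append the batch unless this checkpoint was already committed.

        In a real table the append and the marker update would have to happen
        in one atomic transaction; here they are simply two in-memory steps.
        """
        if checkpoint_id <= self.last_committed.get(writer_id, -1):
            return False  # replay after a restart: skip instead of duplicating
        self.rows.extend(batch)
        self.last_committed[writer_id] = checkpoint_id
        return True


sink = IdempotentSink()
sink.commit("pipeline-a", 1, ["order-1", "order-2"])
sink.commit("pipeline-a", 2, ["order-3"])

# The pipeline restarts from checkpoint 1 and replays checkpoint 2.
replayed = sink.commit("pipeline-a", 2, ["order-3"])
print(replayed, sink.rows)
```

The replayed commit is rejected, so the table ends up with each row exactly once even though the batch was delivered twice.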

The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...

DataStax

In this presentation, we will look into JIRAs, JavaDocs and system log entries to gain a deeper understanding of how LCS works under the hood. We will explain what scenarios don't work well for LCS and (more importantly) why. We will leverage legacy TRACE/DEBUG level logs for compaction-related objects as well as some newer compaction logging information introduced in C* 3.6 (CASSANDRA-10805) to gain better insights. About the Speakers: Wei Deng, Solutions Architect, DataStax. I have a strong interest in big data, cloud application and distributed computing practices.

The delta architecture

Prakash Chockalingam

Lambda architecture is a popular technique where records are processed by a batch system and a streaming system in parallel. The results are then combined at query time to provide a complete answer. Strict latency requirements for processing old and recently generated events made this architecture popular. The key downside of this architecture is the development and operational overhead of managing two different systems. There have been attempts in the past to unify batch and streaming into a single system, but organizations have not been very successful in those attempts. With the advent of Delta Lake, however, we are seeing a lot of engineers adopting a simple continuous data flow model to process data as it arrives. We call this the Delta Architecture.
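The query-time merge at the heart of the Lambda architecture can be shown in a few lines. This is a minimal sketch, assuming events are `(timestamp, key)` pairs and that the batch layer has processed everything before a cutoff timestamp; the function names and sample data are hypothetical. It also makes the overhead visible: the same counting logic has to exist twice.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, key), a hypothetical event shape


def batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: counts per key for everything before the cutoff."""
    return Counter(key for ts, key in events if ts < cutoff)


def speed_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Speed layer: counts per key for recent events the batch layer has not seen yet."""
    return Counter(key for ts, key in events if ts >= cutoff)


def query(batch: Counter, speed: Counter, key: str) -> int:
    """Serving layer: combine both views at query time for a complete answer."""
    return batch[key] + speed[key]


events = [(1, "checkout"), (2, "checkout"), (3, "search"), (10, "checkout"), (11, "search")]
cutoff = 5  # the last batch run covered timestamps below 5

batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
print(query(batch, speed, "checkout"), query(batch, speed, "search"))
```

A single continuous flow, as described above, would replace the two view functions with one incremental computation over the events as they arrive.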

"It can always get worse!" – Lessons Learned in over 20 years working with Or...

Introducing Change Data Capture with Debezium

ChengKuan Gan

Build Real-Time Applications with Databricks Streaming

In this presentation, we will study a use case we implemented recently, in which we are working with a large metropolitan fire department. Our company has already created a complete analytics architecture for the department based upon Azure Data Factory, Databricks, Delta Lake, Azure SQL and Azure SQL Server Analysis Services (SSAS). While this architecture works very well for the department, they would like to add a real-time channel to their reporting infrastructure. This channel should serve up the following information:
• The most up-to-date locations and status of equipment (fire trucks, ambulances, ladders etc.)
• The current locations and status of firefighters, EMT personnel and other relevant fire department employees
• The current list of active incidents within the city
The above information should be visualized through an automatically updating dashboard. The central component of the dashboard will be a map which automatically updates with the locations and incidents. This view should be as real-time as possible and will be used by the fire chiefs to assist with real-time decision-making on resource and equipment deployments. In this presentation, we will leverage Databricks, Spark Structured Streaming, Delta Lake and the Azure platform to create this real-time delivery channel.

Free Training: How to Build a Lakehouse

Every business today wants to leverage data to drive strategic initiatives with machine learning, data science and analytics, but runs into challenges from siloed teams, proprietary technologies and unreliable data. That’s why enterprises are turning to the lakehouse: it offers a single platform to unify all your data, analytics and AI workloads. Join our How to Build a Lakehouse technical training, where we’ll explore how to use Apache Spark™, Delta Lake, and other open source technologies to build a better lakehouse. This virtual session will include concepts, architectures and demos. Here’s what you’ll learn in this 2-hour session:
- How Delta Lake combines the best of data warehouses and data lakes for improved data reliability, performance and security
- How to use Apache Spark and Delta Lake to perform ETL processing, manage late-arriving data, and repair corrupted data directly on your lakehouse

Stream processing using Kafka

Knoldus Inc.

Building Real-Time Travel Alerts

In this session, we will walk through how to build a complete streaming application to send alerts based on travel advisories from public data. We will also join in other data sources of relevance and push out alerts. We will show you how to build this streaming application with Apache NiFi, Apache Kafka, and Apache Flink and show you when/why/how, and what to build to maximize performance, productivity, and ease of development. Let's get streaming.
Apache Flink, Apache Kafka, Apache NiFi, FLaNK Stack, Tim Spann, Big Data Conference Europe 2023

Present and future of unified, portable, and efficient data processing with A...

The world of big data involves an ever-changing field of players. Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. In a way, Apache Beam is a glue that can connect the big data ecosystem together; it enables users to "run any data processing pipeline anywhere." This talk will briefly cover the capabilities of the Beam model for data processing and discuss its architecture, including the portability model. We’ll focus on the present state of the community and the current status of the Beam ecosystem. We’ll cover the state of the art in data processing and discuss where Beam is going next, including completion of the portability framework and Streaming SQL. Finally, we’ll discuss areas of improvement and how anybody can join us on the path of creating the glue that interconnects the big data ecosystem. Speaker: Davor Bonaci, Apache Software Foundation; Simbly, V.P. of Apache Beam; Founder/CEO at Operiant

What's hot

Best Practices for Building Robust Data Platform with Apache Spark and Delta

A Thorough Comparison of Delta Lake, Iceberg and Hudi

Building Robust Production Data Pipelines with Databricks Delta

Similar to From Postgres to Event-Driven: using docker-compose to build CDC pipelines into Apache Kafka®

Building Real-Time Travel Alerts

Present and future of unified, portable, and efficient data processing with A...

Apache Kafka - Scalable Message Processing and more!

Guido Schmutz

After a quick overview and introduction of Apache Kafka, this session covers two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL. Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out of the box, either provided by the community or by Confluent or other vendors. You simply configure these connectors and off you go. Kafka Streams is a lightweight component which extends Kafka with stream processing functionality. With it, Kafka can not only reliably and scalably transport events and messages through the Kafka broker but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense, which can be inside a “normal” Java application, inside a Web container or on a more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
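As an illustration of "you simply configure these connectors", the sketch below builds the JSON document that Kafka Connect's REST API accepts when a connector is registered (a POST to `/connectors`, commonly on port 8083). The connector shown is the Debezium PostgreSQL source, in keeping with the CDC theme of this page. The host, database, credentials, connector name and topic prefix are all hypothetical placeholders, and the `topic.prefix` property assumes Debezium 2.x (older releases used `database.server.name`).

```python
import json
from typing import Any, Dict


def debezium_postgres_connector(
    name: str,
    host: str,
    database: str,
    user: str,
    password: str,
    topic_prefix: str,
    port: int = 5432,
) -> Dict[str, Any]:
    """Build a Kafka Connect registration payload for a Debezium PostgreSQL source."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",         # logical decoding plugin built into PostgreSQL 10+
            "database.hostname": host,
            "database.port": str(port),        # Connect config values are strings
            "database.user": user,
            "database.password": password,
            "database.dbname": database,
            "topic.prefix": topic_prefix,      # assumption: Debezium 2.x property name
        },
    }


# Hypothetical values, e.g. service names from a docker-compose file.
payload = debezium_postgres_connector(
    name="inventory-connector",
    host="postgres",
    database="inventory",
    user="debezium",
    password="change-me",
    topic_prefix="shop",
)
print(json.dumps(payload, indent=2))
```

The printed document could be saved to a file and submitted to a running Connect worker with any HTTP client; no application code changes are involved, only configuration.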

Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...

Apache NiFi, Apache Flink, Apache Kafka
Timothy Spann, Principal Developer Advocate, Cloudera Data in Motion (US)
https://budapestdata.hu/2023/en/speakers/timothy-spann/
June 8 · Online · English talk
Building Modern Data Streaming Apps with NiFi, Flink and Kafka
In my session, I will show you some best practices I have discovered over the last 7 years in building data streaming applications including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Kafka. From there we build streaming ETL with Apache Flink SQL. We will stream data into Apache Iceberg. We use the best streaming tools for the current applications with FLaNK. flankstack.dev
BIO: Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.

Mule soft meetup_chandigarh_#7_25_sept_2021

Lalit Panwar

Camel Day Italia 2021 - Camel K

Nicola Ferraro

AIDEVDAY_ Data-in-Motion to Supercharge AI

https://www.meetup.com/futureofdata-newyork/events/295376737/
Lightning Talk 2: Data-in-Motion to Supercharge AI
Speaker: Timothy Spann @Cloudera
Abstract: A quick look at the current state of real-time streaming for powering both data ingest and transformation to provide training and enhancement data to models. Also, how to use streaming to feed a pipeline of data against your models or models hosted at HuggingFace or elsewhere.
https://huggingface.co/bigscience/bloom
https://www.aicamp.ai/event/eventdetails/W2023082314
Apache NiFi, Apache Kafka, Apache Flink, HuggingFace, WatsonX.AI, REST API, Cloudera Machine Learning (CML), Bloom, Deep Learning, AI

Building Event-Driven Systems with Apache Kafka

Brian Ritchie

Event-driven systems provide simplified integration, easy notifications, inherent scalability and improved fault tolerance. In this session we'll cover the basics of building event-driven systems and then dive into utilizing Apache Kafka for the infrastructure. Kafka is a fast, scalable, fault-tolerant publish/subscribe messaging system developed by LinkedIn. We will cover the architecture of Kafka and demonstrate code that utilizes this infrastructure, including C#, Spark, ELK and more. Sample code: https://github.com/dotnetpowered/StreamProcessingSample

Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...

DataStax Academy

We will be talking about the solution we developed using Mesos, Docker, Kafka, Spark, Cassandra and Solr (DataStax Enterprise Edition), all developed in Go, for doing real-time log analysis at scale. Many organizations either need or want log analysis in real time, where you can see within a second what is happening within your entire infrastructure. Today, with the hardware available and the software systems we have in place, you can develop, build and use these solutions as a service.

JConWorld_ Continuous SQL with Kafka and Flink

JConWorld: Continuous SQL with Kafka and Flink
In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas and publishing data. We will then cover consuming Kafka data, joining Kafka topics and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips and examples of how to do this.
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
https://www.datainmotion.dev/p/about-me.html
https://dzone.com/users/297029/bunkertor.html
https://www.youtube.com/channel/UCDIDMDfje6jAvNE8DGkJ3_w?view_as=subscriber

Leverage Kafka to build a stream processing platform

Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra

Joe Stein

Slides for the solution we developed using Mesos, Docker, Kafka, Spark, Cassandra and Solr (DataStax Enterprise Edition), all developed in Go, for doing real-time log analysis at scale. Many organizations either need or want log analysis in real time, where you can see within a second what is happening within your entire infrastructure. Today, with the hardware available and the software systems we have in place, you can develop, build and use these solutions as a service.

Present and future of unified, portable and efficient data processing with Ap...

Apache Kafka - A modern Stream Processing Platform

Guido Schmutz

Data streaming

Alberto Paro

26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup

South Tyrol Free Software Conference

26Oct2023_ Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup.pdf
Details
Important: Please complete your registration in this short form. For on-site attendance we have limited room, so please confirm if you are attending in person in Manhattan, NYC.
We at StarTree are excited to join forces with our friends at Cloudera for a meetup that is all about The Latest in Real-Time Analytics: Generative AI and LLM, featuring Apache Pinot and Apache NiFi. Join us for an insightful discussion about cutting-edge analytics, meet the community in person, and catch up over drinks and snacks.
What's the plan?
05:30-06:00 Pizza and Networking
06:00-06:35 Adding Generative AI to Real-Time Streaming Pipelines | Tim Spann, Principal Developer Advocate, Cloudera
06:35-07:10 Apache Pinot and Kafka an excellent pairing for refined palates | Tim Veil, VP of Solutions Engineering and Enablement, StarTree
07:10-07:20 Q&A
07:20-07:30 More Snacks and Networking ;)
Important: Seats are limited.
Adding Generative AI to Real-Time Streaming Pipelines | Timothy Spann
In this talk, Tim will discuss the basics of real-time streaming, walk through the tools used, including Apache NiFi, Apache Kafka and Apache Flink, and show how to build a real-time streaming pipeline that sends prompts to LLMs hosted by the likes of Hugging Face, IBM and Cloudera. He will also discuss where real-time data stores like Apache Pinot come into play. He will show a detailed demonstration of a few use cases involving different sources of data, including Kafka, Medium articles and interactive question and response in Slack. He will then show you how you can build your own and where areas of growth exist.
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. https://github.com/tspannhw/SpeakerProfile
Apache Pinot and Kafka an excellent pairing for refined palates | Tim Veil
The other Tim, Tim Veil, will dive into the history and architect

SFSCON23 - Roberto Innocenti - From the design to reality is here the Communi...

The Open Hardware PowerPC Notebook designed around GNU/Linux will be shown at NOI Techpark. We presented its motherboard design here in 2018. We will give updates on the latest developments for u-boot AMD video drivers, the redesign of the heat pipes, and the CE test certification process. We will give future availability milestones for this notebook and details regarding the GNU/Linux distributions or other operating systems that could run on it.

Databricks Meetup @ Los Angeles Apache Spark User Group

Paco Nathan

Confluent Partner Tech Talk with Synthesis

Introduction of eBPF - The Hottest Linux Technology Right Now

Jace Liang

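@ 2020/04/20 SDN x Cloud Native Meetup #27. With the CNCF accepting Falco as an incubator project, eBPF, a technology hidden inside the Linux kernel, has started to attract attention. eBPF is a technology that first appeared in 1992; what changes has it gone through along the way? And what impact will it have on your work and mine, now or in the future? This talk covers the past and present of eBPF to help you understand what eBPF is, and demonstrates some special uses of eBPF tools through practical examples.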

More from confluent

Speed Wins: From Kafka to APIs in Minutes

Evolving Data Governance for the Real-time Streaming and AI Era

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...

Santander Stream Processing with Apache Flink

Unlocking the Power of IoT: A comprehensive approach to real-time insights

Hybrid workshop: Stream Processing with Flink

Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines. It enables greater data portability, optimized resource utilization and a better customer experience by processing data streams in real time. In our hybrid hands-on workshop, you will learn how to easily filter, join and enrich real-time data within Confluent Cloud using our serverless Flink service.

Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...

Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace. In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT-Platforms. You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes. Don't miss out on this opportunity to learn from industry experts and take your business to the next level.

AWS Immersion Day Mapfre - Confluent

Event-driven architecture (EDA) will be the heart of the MAPFRE ecosystem. To stay competitive, today's companies increasingly depend on real-time data analysis, which gives them faster insights and response times. Doing business with real-time data means gaining situational awareness, and detecting and responding to what is happening in the world right now.

Events and Microservices - Santander TechTalk

Q&A with Confluent Experts: Navigating Networking in Confluent Cloud

Citi TechTalk Session 2: Kafka Deep Dive

Build real-time streaming data pipelines to AWS with Confluent

Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.

Q&A with Confluent Professional Services: Confluent Service Mesh

Citi Tech Talk: Event Driven Kafka Microservices

Confluent & GSI Webinars series - Session 3

An in-depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and benefit from their real-time data capabilities. The session looks more deeply into some specific use cases and shows how Confluent technology is used to manage costs and mitigate risks. It is aimed at Solutions Architects, Sales Engineers and Pre Sales, as well as more technically minded, business-aligned people. Whilst this is not a deeply technical session, some knowledge of Kafka would be helpful.

Citi Tech Talk: Messaging Modernization

Transforming applications built with traditional messaging solutions such as TIBCO, MQ and Solace to be scalable, reliable and ready for the move to cloud. How can applications built with traditional messaging technologies like TIBCO, Solace and IBM MQ be modernised and made cloud ready? What are the advantages of event streaming approaches to pub/sub vs traditional message queues? What are the strengths and weaknesses of both approaches, and what use cases and requirements are actually a better fit for messaging than Kafka?

Citi Tech Talk: Data Governance for streaming and real time data

Confluent & GSI Webinars series: Session 2

Data In Motion Paris 2023

You will also learn how to: • Build products and features faster using a complete suite of connectors and stream management tools, and connect your environments to data pipelines • Protect your most critical data and workloads with built-in security, governance, and resilience guarantees • Deploy Kafka at scale in minutes while reducing the associated costs and operational burden

The Future of Application Development - API Days - Melbourne 2023