Westpac Bank Tech Talk 2: Introduction to Streaming Data and Stream Processing with Apache Kafka®

Watch the webcast here: https://videos.confluent.io/watch/P5up2YQX9QdVMhmYfsXy7Q
Speaker: Brett Randall


Better Together: Westpac Bank and Confluent Tech Talk Series

Schedule
Tech Talk | Date | Time
TT#1 Dive into Apache Kafka® | June 4th (Thursday) | 10:30am - 11:30am AEST
TT#2 Introduction to Streaming Data and Stream Processing with Apache Kafka | July 2nd (Thursday) | 10:30am - 11:30am AEST
TT#3 Confluent Schema Registry | August 6th (Thursday) | 10:30am - 11:30am AEST
TT#4 Kafka Connect | September 3rd (Thursday) | 10:30am - 11:30am AEST
TT#5 Avoiding Pitfalls with Large-Scale Kafka Deployments | October 1st (Thursday) | 10:30am - 11:30am AEST

Tech Talk #2
Introduction to Streaming Data and Stream Processing with Apache Kafka

Confluent Platform
Apache Kafka
Unrestricted Developer Productivity (Developer)
- Multi-language Development: Non-Java clients | REST Proxy
- Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry
- SQL-based Stream Processing: KSQL (ksqlDB)
Efficient Operations at Scale (Operator)
- GUI-driven Mgmt & Monitoring: Control Center
- Flexible DevOps Automation: Operator | Ansible
- Dynamic Performance & Elasticity: Auto Data Balancer | Tiered Storage
Production-stage Prerequisites (Architect)
- Enterprise-grade Security: RBAC | Secrets | Audit logs
- Data Compatibility: Schema Registry | Schema Validation
- Global Resilience: Multi-Region Clusters | Replicator
Freedom of Choice: Self Managed Software | Fully Managed Cloud Service
Committer-driven Expertise: Enterprise Support | Professional Services | Training | Partners
Legend: Open Source | Community licensed

Resources
Free E-Books from Confluent!
I Heart Logs: https://www.confluent.io/ebook/i-heart-logs-event-data-stream-processing-and-data-integration/
Kafka: The Definitive Guide: https://www.confluent.io/resources/kafka-the-definitive-guide/
Confluent Blog: https://www.confluent.io/blog
Join us on Slack: http://cnfl.io/slack


Using Kafka to stream data into TigerGraph, a distributed graph database, is a common pattern in our customers’ data architecture. In the TigerGraph database, the Kafka Connect framework was used to build the native S3 data loader. In TigerGraph Cloud, we will be building native integrations with many data sources, such as Azure Blob Storage and Google Cloud Storage, using Kafka as an integrated component of the Cloud Portal. In this session, we will discuss both architectures: 1. the Kafka Connect framework built into the TigerGraph database; 2. using a Kafka cluster for cloud-native integration with other popular data sources. Demos will be provided for both data streaming processes.

Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre...

If you want to build an ecosystem of streaming data around your Kafka platform, your developers need a much easier way to move data from the source to your cluster. Better yet, the connector can be serverless, so it does not waste resources while idle, and a trusted partner can manage your Kafka infrastructure for you. In this session, we will show how we have made streaming data easy, with a great user experience and flexible resource management, using our new secret weapon from the Apache Camel project: Kamelets. We’ll also demonstrate how Red Hat OpenShift Streams for Apache Kafka simplifies provisioning Kafka deployments in a public cloud, managing the cluster and topics, and configuring secure access to the Kafka cluster for your developers.

ksqlDB Workshop

Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...

Apache Kafka is critical to PayPal's analytics platform. It handles a stream of over 20 billion events per day across 300 partitions. To democratize access to analytics data, PayPal built a Connect platform leveraging Kafka to process and send data in real time to the tools of customers' choice. The platform scales to process over 40 billion events daily, using reactive architectures with Akka and Alpakka Kafka connectors to consume and publish events within Akka streams. Challenges include throughput being limited by the number of partitions, and issues that required tuning for optimal performance.
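The remark that throughput is limited by partitions can be made concrete with a widely used sizing rule of thumb: if one partition sustains p MB/s when producing and c MB/s when consuming, a target of t MB/s needs at least max(t/p, t/c) partitions. A minimal sketch; the per-partition figures are hypothetical and have to be measured on the actual cluster.

```python
import math


def min_partitions(target_mb_s: float, producer_mb_s_per_partition: float,
                   consumer_mb_s_per_partition: float) -> int:
    """Rule-of-thumb lower bound on partition count for a target throughput.

    Each partition is read by at most one consumer in a group, so both the
    produce side and the consume side must keep up with the target rate.
    """
    if min(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition) <= 0:
        raise ValueError("all throughput figures must be positive")
    needed_for_producers = target_mb_s / producer_mb_s_per_partition
    needed_for_consumers = target_mb_s / consumer_mb_s_per_partition
    return math.ceil(max(needed_for_producers, needed_for_consumers))


# Hypothetical figures: 500 MB/s target, 50 MB/s per partition when producing,
# 20 MB/s per partition when consuming (the consumers are the bottleneck).
print(min_partitions(500, 50, 20))  # -> 25
```

The bound is a floor, not a target: headroom for growth and for rebalancing usually pushes the real number higher.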

Cloud native Kafka | Sascha Holtbruegge and Margaretha Erber, HiveMQ

Joins in Kafka Streams and ksqlDB are a killer feature for data processing, and basic join semantics are well understood. However, in a streaming world, records are associated with timestamps that affect the semantics of joins: welcome to the fabulous world of temporal join semantics. For joins, timestamps are as important as the actual data, and it is important to understand how they affect the join result. In this talk, we deep dive into the different types of joins, with a focus on their temporal aspect. Furthermore, we relate the individual join operators to the overall "time engine" of the Kafka Streams query runtime and explain its relationship to operator semantics. To help developers apply their knowledge of temporal join semantics, we provide best practices, tips and tricks to "bend" time, and configuration advice to get the desired join results. Finally, we give an overview of recent development that improves joins even further, and an outlook on future work.
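To illustrate why timestamps matter for joins, here is a simplified, in-memory model of a temporal stream-table join: each stream record is joined with the version of the table row that was current at the stream record's timestamp, rather than whichever version happens to be latest when the record is processed. This sketches the semantics only, not Kafka Streams' implementation; the class, function and record shapes are hypothetical.

```python
import bisect
from typing import Optional


class VersionedTable:
    """Keeps every version of each key so lookups can be made 'as of' a timestamp."""

    def __init__(self) -> None:
        self._timestamps: dict[str, list[int]] = {}
        self._values: dict[str, list[Optional[str]]] = {}

    def put(self, key: str, value: Optional[str], ts: int) -> None:
        # Insert in timestamp order so out-of-order table updates are handled.
        # A None value acts as a delete marker from ts onwards.
        stamps = self._timestamps.setdefault(key, [])
        values = self._values.setdefault(key, [])
        i = bisect.bisect_right(stamps, ts)
        stamps.insert(i, ts)
        values.insert(i, value)

    def get_as_of(self, key: str, ts: int) -> Optional[str]:
        # Latest version whose timestamp is <= ts; None if no live version existed.
        stamps = self._timestamps.get(key, [])
        i = bisect.bisect_right(stamps, ts)
        return self._values[key][i - 1] if i > 0 else None


def temporal_join(stream, table: VersionedTable):
    """Inner join: emit (key, stream_value, table_value) only when a table version existed."""
    out = []
    for key, value, ts in stream:
        match = table.get_as_of(key, ts)
        if match is not None:
            out.append((key, value, match))
    return out


prices = VersionedTable()
prices.put("AAPL", "price=100", ts=10)
prices.put("AAPL", "price=105", ts=20)

orders = [("AAPL", "order-1", 5), ("AAPL", "order-2", 15), ("AAPL", "order-3", 25)]
print(temporal_join(orders, prices))
# -> [('AAPL', 'order-2', 'price=100'), ('AAPL', 'order-3', 'price=105')]
```

A join that ignored timestamps and always used the latest table value would pair order-2 with price=105 if both price updates had already been loaded, which is exactly the kind of result difference the talk is about.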

Testing Event Driven Architectures: How to Broker the Complexity | Frank Kilc...

This document discusses testing event-driven architectures. It begins by defining common event-driven architecture patterns like event notifications and event sourcing. It then discusses brokering the complexity of event-driven architectures by describing how events are communicated between producers and consumers via channels. The document outlines what information should be included in events like payloads and headers. It also discusses the difference between orchestration and choreography in event-driven systems. It provides an example of how events can be used to mediate changes within a system using order validation. Finally, it demonstrates how to test event-driven architectures using specifications and discusses accelerating API quality through testing tools that support multiple protocols and definitions.

Achieving end-to-end visibility into complex event-sourcing transactions usin...

Usage of event-sourcing systems such as Kafka is growing rapidly among Node.js applications. Building systems around an event-driven architecture simplifies horizontal scaling in distributed computing models and makes them more resilient to failure. With these advantages come new challenges, such as how to get visibility into these complex processes. Event-driven architecture is asynchronous by nature, so tracking the communication between components is both extremely difficult and important when debugging or identifying bottlenecks in the system. In this talk, I will present ways to achieve end-to-end, granular visibility into complex event-sourcing transactions using distributed tracing. I will use open-source tools such as OpenTelemetry, Jaeger, and Zipkin to showcase a complex Node.js system using Kafka.
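Tracing across Kafka works by carrying trace context inside each record's headers. The sketch below hand-rolls the W3C Trace Context `traceparent` header (format `version-traceid-parentid-flags`) on a plain list of `(key, bytes)` header tuples. In practice an OpenTelemetry propagator does this for you; the helper names here are hypothetical, and the parsing is simplified (for example, it does not reject all-zero IDs).

```python
import re
import secrets

# version "00", 32-hex trace id, 16-hex parent span id, 2-hex trace flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: list[tuple[str, bytes]], trace_id: str, span_id: str,
           sampled: bool = True) -> None:
    """Producer side: attach the current span's context to the outgoing record headers."""
    flags = "01" if sampled else "00"
    headers.append(("traceparent", f"00-{trace_id}-{span_id}-{flags}".encode("ascii")))


def extract(headers: list[tuple[str, bytes]]):
    """Consumer side: return (trace_id, parent_span_id, sampled) or None if absent/invalid."""
    for key, value in headers:
        if key == "traceparent":
            match = TRACEPARENT.match(value.decode("ascii", errors="replace"))
            if match:
                trace_id, parent_id, flags = match.groups()
                return trace_id, parent_id, bool(int(flags, 16) & 0x01)
    return None


def start_child_span(headers: list[tuple[str, bytes]]) -> tuple[str, str]:
    """Continue the producer's trace with a fresh span id, or start a new trace."""
    ctx = extract(headers)
    trace_id = ctx[0] if ctx else secrets.token_hex(16)
    return trace_id, secrets.token_hex(8)


# Producer: a span in trace 4bf9... publishes an event.
record_headers: list[tuple[str, bytes]] = []
inject(record_headers, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")

# Consumer: the processing span joins the same trace with a new span id.
trace_id, span_id = start_child_span(record_headers)
print(trace_id)  # -> 4bf92f3577b34da6a3ce929d0e0e4736
```

Because the context travels with the record, a tracing backend such as Jaeger or Zipkin can stitch the producer and consumer spans together even though they run asynchronously in different processes.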

Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...

At Wells Fargo, we move 150 TB of log data from our syslogs to Splunk forwarders, where it is indexed and organized for analytic queries. As we modernize and migrate our applications to our hybrid cloud, the performance expectations for this infrastructure will increase proportionately, including the resilience of the end-to-end infrastructure. First, we decoupled the applications from their logging interface through a logging library that sent the log streams from their sources to Kafka, which routed them to two separate destinations, Splunk and ELK. We used Prometheus and Grafana to monitor the metrics, and deployed Kafka, Splunk, ELK, Prometheus and Grafana on Kubernetes clusters. Confluent had released a version of Kafka without ZooKeeper, replacing its functionality with the Quorum Controller. The Quorum Controller version exhibited better disposability, one of the twelve factors that are important for cloud-native applications. We packaged this version with KEDA (Kubernetes Event-driven Autoscaling) and deployed it for autoscaling. We tested this by simulating the volume of log data we typically generate in production. Building on this, we have also implemented distributed tracing and made it just as resilient. We will share our lessons learned, and the patterns and practices for modernizing both our underlying runtime platforms and our applications with high-performing, resilient event-driven architectures.
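Autoscaling Kafka consumers on Kubernetes is usually driven by consumer lag. As a rough model of how a lag-based scaler such as KEDA's Kafka scaler arrives at a replica count (the exact behaviour depends on the KEDA version and its settings), the desired replicas are the total lag divided by a per-replica lag threshold, rounded up, and capped at the partition count, because extra consumers in a group would sit idle. The function, its parameters and its defaults are assumptions for illustration.

```python
import math


def desired_replicas(partition_lags: dict[int, int], lag_threshold: int,
                     min_replicas: int = 1, max_replicas: int = 100,
                     allow_idle_consumers: bool = False) -> int:
    """Approximate a lag-based autoscaling decision for one consumer group.

    partition_lags maps partition number -> (log end offset - committed offset).
    """
    if lag_threshold <= 0:
        raise ValueError("lag_threshold must be positive")
    total_lag = sum(max(lag, 0) for lag in partition_lags.values())
    wanted = math.ceil(total_lag / lag_threshold)
    if not allow_idle_consumers:
        # A partition is read by at most one consumer in a group, so replicas
        # beyond the partition count would receive no data.
        wanted = min(wanted, len(partition_lags))
    return max(min_replicas, min(wanted, max_replicas))


lags = {0: 1200, 1: 900, 2: 40, 3: 0}
print(desired_replicas(lags, lag_threshold=500))  # -> 4 (5 wanted, capped at 4 partitions)
```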

Kafka for Real-Time Event Processing in Serverless Environments

(Jeff Sharpe + Alex Srisuwan, Capital One) Kafka Summit SF 2018
Using Kafka as a platform messaging bus is common, but bridging communication between real-time and asynchronous components can become complicated, especially when dealing with serverless environments. This has become increasingly common in modern banking where events need to be processed at near-real-time speed. Serverless environments are well-suited to address these needs, and Kafka remains an excellent solution for providing the reliable, resilient communication layer between serverless components and dedicated stream processing services. In this talk, we will examine some of the strengths and weaknesses of using Kafka for real-time communication, some tips for efficient interactions with Kafka and AWS Lambda, and a number of useful patterns for maximizing the strengths of Kafka and serverless components.
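When AWS Lambda consumes from Kafka through an event source mapping (Amazon MSK or self-managed Kafka), each invocation receives a batch of records grouped under `topic-partition` keys, with record values base64-encoded. A minimal handler sketch, assuming JSON-encoded values; the event below is a hand-written sample in that shape, and the returned summary stands in for real business logic.

```python
import base64
import json


def handler(event, context=None):
    """Decode every record in a Kafka-triggered Lambda batch and return the parsed values."""
    results = []
    # Records arrive grouped by "<topic>-<partition>"; ordering is only
    # meaningful within each partition's list.
    for records in event.get("records", {}).values():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            results.append({
                "topic": record["topic"],
                "partition": record["partition"],
                "offset": record["offset"],
                "payload": payload,
            })
    return results


sample_event = {
    "eventSource": "aws:kafka",
    "records": {
        "payments-0": [
            {
                "topic": "payments",
                "partition": 0,
                "offset": 42,
                "timestamp": 1700000000000,
                "timestampType": "CREATE_TIME",
                "value": base64.b64encode(json.dumps({"amount": 25}).encode()).decode(),
            }
        ]
    },
}

print(handler(sample_event))
# -> [{'topic': 'payments', 'partition': 0, 'offset': 42, 'payload': {'amount': 25}}]
```

Since a failed invocation causes the batch to be delivered again, keeping the per-record processing idempotent is a sensible default.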

The Road Most Traveled: A Kafka Story | Heikki Nousiainen, Aiven

When moving to a cloud-native architecture, Moogsoft knew they needed more scale than RabbitMQ could provide. Moogsoft moved to Kafka, which is known for fast writes and for driving heavy event-driven workloads, on top of niceties such as replayability. Choosing the tool was easy; finding a vendor that ticked all their boxes was not. They needed to ensure scalability, upgradability, builds via existing IaC pipelines, and observability via existing tools. When Moogsoft found Aiven, they were impressed with its offering and its ability to scale on demand. In this presentation, we will explore how Moogsoft used Aiven for Kafka to manage and scale their data in the cloud.

Accelerating Innovation with Apache Kafka, Heikki Nousiainen | Heikki Nousiai...

As a pioneer in the interactive gaming industry, Sony PlayStation has played a vital role in implementing technological advancements, helping bring the global video gaming community together. With the recent launch of the next-generation PS5 console, in partnership with thousands of game developers and millions of gamers across the globe, huge volumes of data are inevitably generated on PlayStation servers. This presentation describes how we leveraged big data technologies along with Apache Kafka to solve some real-time data analytics problems. Two important case studies we carried out recently are "competitive pricing analysis of game titles across online video game marketplaces" and "understanding gamer sentiment by streaming data from social feeds and performing NLP". Along with Apache Kafka, the technologies we used to architect the solution are REST APIs, ZooKeeper, D3.js visualization, Domo, Python, SQL, NLP, AWS Cloud and JSON.

Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...

Watch this talk here: https://www.confluent.io/online-talks/using-apache-kafka-to-optimize-real-time-analytics-financial-services-iot-applications
When it comes to the fast-paced nature of capital markets and IoT, the ability to analyze data in real time is critical to gaining an edge. It’s not just about the quantity of data you can analyze at once; it’s about the speed, scale, and quality of the data you have at your fingertips. Modern streaming data technologies like Apache Kafka and the broader Confluent Platform can help detect opportunities and threats in real time, improving profitability, yield, and performance. Combining Kafka with Panopticon visual analytics provides a powerful foundation for optimizing your operations. Use cases in capital markets include transaction cost analysis (TCA), risk monitoring, surveillance of trading and trader activity, compliance, and optimizing the profitability of electronic trading operations. Use cases in IoT include monitoring manufacturing processes, logistics, and connected vehicle telemetry and geospatial data. This online talk includes in-depth practical demonstrations of how Confluent and Panopticon together support several key applications. You will learn:
- Why Apache Kafka is widely used to improve the performance of complex operational systems
- How Confluent and Panopticon open new opportunities to analyze operational data in real time
- How to quickly identify and react immediately to fast-emerging trends, clusters, and anomalies
- How to scale data ingestion and data processing
- How to build new analytics dashboards in minutes

Exposing and Controlling Kafka Event Streaming with Kong Konnect Enterprise |...

Event streaming allows companies to build more scalable and loosely coupled real-time applications supporting massive concurrency demands and simplifying the construction of services. At the same time, API management provides capabilities to securely control the upstream services consumption, including the event processing infrastructure. This session shows how Kong Konnect Enterprise can complement Kafka Event Streaming, exposing it to new and external consumers while applying specific and critical policies to control its consumption, including API key, OAuth/OIDC and others for authentication, rate limiting, caching, log processing, etc.

Maximize the Business Value of Machine Learning and Data Science with Kafka (...

Today, many companies that have lots of data are still struggling to derive value from machine learning (ML) and data science investments. Why? Accessing the data may be difficult. Or maybe it’s poorly labeled. Or vital context is missing. Or there are questions around data integrity. Or standing up an ML service can be cumbersome and complex. At Nuuly, we offer an innovative clothing rental subscription model and are continually evolving our ML solutions to gain insight into the behaviors of our unique customer base as well as provide personalized services. In this session, I’ll share how we used event streaming with Apache Kafka® and Confluent Cloud to address many of the challenges that may be keeping your organization from maximizing the business value of machine learning and data science. First, you’ll see how we ensure that every customer interaction and its business context is collected. Next, I’ll explain how we can replay entire interaction histories using Kafka as a transport layer as well as a persistence layer and a business application processing layer. Order management, inventory management, logistics, subscription management – all of it integrates with Kafka as the common backbone. These data streams enable Nuuly to rapidly prototype and deploy dynamic ML models to support various domains, including pricing, recommendations, product similarity, and warehouse optimization. Join us and learn how Kafka can help improve machine learning and data science initiatives that may not be delivered to their full potential.

Matching the Scale at Tinder with Kafka

(Krunal Vora, Tinder) Kafka Summit San Francisco 2018
At Tinder, we have been using Kafka for streaming and processing events, data science processes and many other integral jobs. Forming the core of the pipeline at Tinder, Kafka has been accepted as the pragmatic solution to match the ever-increasing scale of users, events and backend jobs. We at Tinder are investing time and effort to optimize our usage of Kafka, solving the problems we face in the dating app context. Kafka forms the backbone of the company's plans to sustain performance at the envisioned scale as the company grows into unexplored markets. Come and learn about the implementation of Kafka at Tinder and how Kafka has helped solve the use cases for dating apps, and hear the success story behind the business case for Kafka at Tinder.

Benefits of Stream Processing and Apache Kafka Use Cases

Watch this talk here: https://www.confluent.io/online-talks/benefits-of-stream-processing-and-apache-kafka-use-cases-on-demand
This talk explains how companies are using event-driven architecture to transform their business and how Apache Kafka serves as the foundation for streaming data applications. Learn how major players in the market are using Kafka in a wide range of use cases such as microservices, IoT and edge computing, core banking and fraud detection, cyber data collection and dissemination, ESB replacement, data pipelining, ecommerce, mainframe offloading and more. Also discussed in this talk are the differences between Apache Kafka and Confluent Platform. This session is part 1 of 4 in our Fundamentals for Apache Kafka series.

Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...

Studying the "how" of Kafka makes you better at using Kafka, but studying its "whys" makes you better at so much more. In looking at the tradeoffs behind a system like Kafka, we learn to reason more clearly about distributed systems and to make high-stakes technology adoption decisions more effectively. These are skills we all want to improve! In this talk, we'll examine trade-offs on which our favorite distributed messaging system takes opinionated positions:
- Whether to store data contiguously or using an index
- How many storage tiers are best?
- Where should metadata live?
- And more.
It's always useful to dissect a modern distributed system with the goal of understanding it better, and it's even better to learn deeper architectural principles in the process. Come to this talk for a generous helping of both.

Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...

In this talk, we'll discuss how VillageMD is able to use Kafka topic compaction for rapidly scaling our reprocessing pipelines to encompass hundreds of feeds. Within healthcare data ecosystems, privacy and data minimalism are key design priorities. Being able to handle data deletion in a reliable, timely manner within event-driven architectures is becoming more and more necessary with key governance frameworks like the GDPR and HIPAA. We'll be giving an overview of the building and governance of dead-letter queues for streaming data processing. We'll discuss:
1. How to architect a data sink for failed records.
2. How topic compaction can reduce duplicate data and enable idempotency.
3. Building a tombstoning system for removing successfully reprocessed records from the queues.
4. Considerations for monitoring a reprocessing system in production: what metrics, dataops, and SLAs are useful?
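The compaction-plus-tombstone idea in points 2 and 3 can be modelled in a few lines: a compacted topic eventually retains only the latest value per key, and a record with a null value (a tombstone) marks that key for deletion. The in-memory simulation below is a hypothetical sketch of a retry queue keyed by record ID, not Kafka's log cleaner; in real Kafka, compaction runs asynchronously and tombstones are themselves kept for `delete.retention.ms` before removal.

```python
from typing import Optional


def compact(log: list[tuple[str, Optional[str]]]) -> dict[str, str]:
    """Return what a fully compacted topic converges to: the latest value per key.

    A None value is a tombstone: it removes the key entirely.
    """
    state: dict[str, str] = {}
    for key, value in log:  # log is in offset order
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Hypothetical dead-letter topic keyed by record ID.
dlq_log: list[tuple[str, Optional[str]]] = [
    ("rec-1", "attempt=1 error=timeout"),
    ("rec-2", "attempt=1 error=bad-schema"),
    ("rec-1", "attempt=2 error=timeout"),  # retry overwrites: no duplicate survives compaction
]


def reprocess(record_id: str, succeeded: bool) -> None:
    """On success, append a tombstone so the record drops out of the queue."""
    if succeeded:
        dlq_log.append((record_id, None))


reprocess("rec-1", succeeded=True)
print(compact(dlq_log))  # -> {'rec-2': 'attempt=1 error=bad-schema'}
```

Keying the queue by record ID is what makes retries idempotent here: however many times a record fails, the compacted topic holds at most one entry for it.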

Deep Dive Series #3: Schema Validation + Structured Audit Logs

Another new security-related feature in Confluent Platform 5.4 is Structured Audit Logs. Of course, everything in Kafka is a log, but Kafka does not record what Kafka does with Kafka, only what is written to its topics. In the third part of the Deep Dive sessions, alongside Structured Audit Logs we also discuss the "evolution" of the familiar Schema Registry: Schema Validation operates at the topic level and ensures that every single message produced to a given topic is checked against Schema Registry. We explain more in our Deep Dive #3.
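Broker-side schema validation relies on the framing used by Confluent's serializers: each message value starts with a magic byte `0`, then a 4-byte big-endian schema ID, then the serialized payload. Conceptually, the broker reads that ID and rejects the write if it is not a schema registered for the topic's subject (`<topic>-value` under the default subject naming strategy). The check below is a simplified, hypothetical model of that logic, with a plain dict standing in for Schema Registry; on Confluent Server the real check is switched on per topic via `confluent.value.schema.validation`.

```python
import struct

MAGIC_BYTE = 0


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix a serialized payload with the magic byte and 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def is_valid_for_topic(topic: str, value: bytes, registry: dict[str, set[int]]) -> bool:
    """Simplified model: accept only framed values whose schema ID is registered
    under the topic's value subject ('<topic>-value')."""
    if len(value) < 5:
        return False
    magic, schema_id = struct.unpack(">bI", value[:5])
    if magic != MAGIC_BYTE:
        return False
    return schema_id in registry.get(f"{topic}-value", set())


# Hypothetical registry contents: subject -> registered schema IDs.
registry = {"orders-value": {1, 7}}

print(is_valid_for_topic("orders", frame(7, b"\x02\x10"), registry))  # -> True
print(is_valid_for_topic("orders", frame(9, b"\x02\x10"), registry))  # -> False
print(is_valid_for_topic("orders", b'{"not":"framed"}', registry))    # -> False
```

The point of doing this on the broker rather than only in clients is that a misconfigured producer that skips the serializer entirely (the third case) is still rejected.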

What is Apache Kafka®?

Viktor Gamov, Developer Advocate, Confluent
Apache Kafka is an open source distributed streaming platform that allows you to build applications and process events as they occur. Viktor Gamov (Developer Advocate at Confluent) walks through how it works and its important underlying concepts. As a real-time, scalable, and durable system, Kafka can be used for fault-tolerant storage as well as for other use cases, such as stream processing, centralized data management, metrics, log aggregation, event sourcing, and more. This talk will explain what a streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use, including several examples of where it is solving real business problems.
https://www.meetup.com/Chennai-Kafka/events/269942117/

Developing custom transformation in the Kafka connect to minimize data redund...

Compacted topics grow over time and often use high-performance, low-latency and relatively expensive storage. Reducing duplicated data plays a critical role in the size of compacted topics: with less data on the topics, the Kafka cluster consumes less disk space, which in turn leads to lower operating costs. In this use-case-driven talk, we demonstrate how our team at UnitedHealth Group leveraged existing transformations to extract data from the message metadata in the topic, and how we developed our own custom transformations to minimize the amount of duplicated data in each message in the topic.
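Kafka Connect Single Message Transforms (SMTs) are written in Java against the `org.apache.kafka.connect.transforms.Transformation` interface; to keep the idea language-neutral, the sketch below models one as a plain function over a record. It drops value fields whose contents already appear, identically, in the record key or headers, so that data is not stored twice on a compacted topic. The record shape and field names are hypothetical (headers are modelled as a dict of strings, whereas real Kafka headers carry bytes), and this is not UnitedHealth Group's implementation.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Record:
    key: dict
    value: dict
    headers: dict = field(default_factory=dict)


def drop_duplicated_fields(record: Record) -> Record:
    """Return a copy of the record without value fields that repeat a key or header entry."""
    already_stored = {**record.headers, **record.key}
    slim_value = {
        name: v for name, v in record.value.items()
        # Only drop a field when the same name carries the same value elsewhere.
        if not (name in already_stored and already_stored[name] == v)
    }
    return replace(record, value=slim_value)


rec = Record(
    key={"member_id": "M123"},
    value={"member_id": "M123", "source_system": "claims", "status": "ACTIVE"},
    headers={"source_system": "claims"},
)
print(drop_duplicated_fields(rec).value)  # -> {'status': 'ACTIVE'}
```

The trade-off is on the read side: consumers must now take `member_id` from the key and `source_system` from the headers instead of finding everything in the value.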

5 lessons learned for successful migration to Confluent cloud | Natan Silinit...

Confluent Cloud makes DevOps engineers' lives a lot easier. Yet moving 1,500 microservices, 10K topics and 100K partitions to a multi-cluster Confluent Cloud setup can be a challenge. In this talk you will hear about 5 lessons that Wix learned in order to successfully meet this challenge. These lessons include:
1. Automation, automation, automation: the whole process has to be completely automated at such scale.
2. Prefer a gradual approach, e.g. migrate topics in small chunks rather than all at once. This reduces risk if things go wrong.
3. Clean up first: avoid migrating unused topics or topics with too many unnecessary partitions.
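Lessons 2 and 3 translate into a simple planning step: filter out unused topics, then split the rest into small migration waves. A hypothetical sketch; the topic metadata, the "unused" criterion (no writes within a cutoff) and the wave size are all assumptions, not Wix's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class TopicInfo:
    name: str
    partitions: int
    days_since_last_write: int


def plan_waves(topics: list[TopicInfo], max_idle_days: int = 30,
               max_partitions_per_wave: int = 500) -> list[list[str]]:
    """Skip idle topics, then group the rest into waves bounded by total partitions.

    A single topic larger than the budget still gets a wave of its own.
    """
    active = [t for t in topics if t.days_since_last_write <= max_idle_days]
    waves: list[list[str]] = []
    current: list[str] = []
    budget = 0
    for topic in sorted(active, key=lambda info: info.name):
        if current and budget + topic.partitions > max_partitions_per_wave:
            waves.append(current)
            current, budget = [], 0
        current.append(topic.name)
        budget += topic.partitions
    if current:
        waves.append(current)
    return waves


topics = [
    TopicInfo("billing", 300, 1),
    TopicInfo("legacy-audit", 50, 400),  # idle: cleaned up instead of migrated
    TopicInfo("orders", 240, 0),
    TopicInfo("payments", 200, 2),
]
print(plan_waves(topics))  # -> [['billing'], ['orders', 'payments']]
```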

SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka®

Watch this talk here: https://www.confluent.io/online-talks/siem-modernization-build-a-situationally-aware-organization-with-apache-kafka
Of all security breaches, 85% are conducted with compromised credentials, often at the administration level or higher. A lot of IT groups think “security” means authentication, authorization and encryption (AAE), but these are often tick-boxes that rarely stop breaches. The internal threat surfaces of data streams or disk drives in a raidset in a data centre are not the threat surface of interest. Cyber or Threat organizations must conduct internal investigations of IT, subcontractors and supply chains without implicating the innocent. Therefore, they are organizationally air-gapped from IT. Some surveys indicate up to 10% of IT is under investigation at any given time. Deploying a signal processing platform, such as Confluent Platform, allows organizations to evaluate data as soon as it becomes available, enabling them to assess and mitigate risk before it arises. In Cyber or Threat Intelligence, events can be considered signals, and when analysts are hunting for threat actors, these don't appear as a single needle in a haystack, but as a series of needles. In this paradigm, streams of signals aggregate into signatures. This session shows how various sub-systems in Apache Kafka can be used to aggregate, integrate and attribute these signals into signatures of interest. In this talk you will learn:
- The current threat landscape
- The difference between Security and Threat Intelligence
- The value of Confluent Platform as an ideal complement to hardware endpoint detection systems and batch-based SIEM warehouses

What is Apache Kafka and What is an Event Streaming Platform?

Speaker: Gabriel Schenker, Lead Curriculum Developer, Confluent
Streaming platforms have emerged as a popular new trend, but what exactly is a streaming platform? Part messaging system, part Hadoop made fast, part fast ETL and scalable data integration. With Apache Kafka® at the core, event streaming platforms offer an entirely new perspective on managing the flow of data. This talk will explain what an event streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use, including several examples of where it is solving real business problems. New developments in this area such as KSQL will also be discussed.

Kafka in Context, Cloud, & Community (Simon Elliston Ball, Cloudera) Kafka Su...

The document discusses Kafka in the context of cloud platforms and open source communities. It describes several Apache projects that can be used with Kafka, such as Apache NiFi for data collection, Apache Flink for stream processing, and Apache Ranger for security. It also outlines features of Cloudera's platform for managing Kafka deployments, including unified governance tools, monitoring, and services to simplify operations. Finally, it discusses how Kafka can be deployed across cloud, on-premise, and hybrid environments with auto-scaling and other management capabilities.

Operational Analytics on Event Streams in Kafka

Speaker: Anirudh Ramanthan, Product Manager, Rockset
Tracking key events and analyzing these event streams are critical to many enterprises. We highlight how organizations are using Apache Kafka® as a fast, reliable event streaming platform alongside Rockset, a serverless search and analytics engine, to create stateful microservices to analyze their event streams. In this talk, we will discuss a stateful microservices architecture, where events from multiple channels are collected and streamed into Kafka and continuously ingested into Rockset with no explicit schema or metadata specification required. Developers then use serverless compute frameworks, like AWS Lambda, in conjunction with serverless data management from Rockset to build microservices to derive insights on the data from Kafka. Organizations can leverage this pattern to support low-latency queries on event streams, providing immediate insight on their business.

Nordstrom's Event-Sourced Architecture and Kafka-as-a-Service | Adam Weyant a...

As a 120-year-old company, Nordstrom was facing numerous challenges as a result of an aging service-oriented architecture. Developers needing to implement reporting for analytics separately from core functionality resulted in questionable data quality for analytical purposes. Scaling dependent services in harmony, so as not to overwhelm each other, was a struggle faced by many, if not most, teams. Several years into a company-wide transition to an event-sourced architecture, Nordstrom has solved these and various other problems. By leveraging the capabilities of Apache Kafka and Confluent, combined with a deep organizational focus on well-defined business event schemas, a single event can be used for analytical, functional, operational, and model-building purposes. This session will describe this architecture and the lessons learned while building it, with a focus on the internally built, multi-tenant, multi-cluster Kafka-as-a-Service platform that enables it.

What's new in Confluent 3.2 and Apache Kafka 0.10.2

With the introduction of the Connect and Streams APIs in 2016, Apache Kafka is becoming the de facto solution for anyone looking to build a streaming platform. The community continues to add capabilities to make it the complete solution for streaming data. Join us as we review the latest additions in Apache Kafka 0.10.2. In addition, we’ll cover what’s new in Confluent Enterprise 3.2 that makes it possible to run Kafka at scale.

2018-07-11 - Kafka integration patterns

Alberto Paro


Similar to Westpac Bank Tech Talk 2: Introduction to Streaming Data and Stream Processing with Apache Kafka®


All Day DevOps - FLiP Stack for Cloud Data Lakes

https://www.alldaydevops.com/addo-speakers/timothy-spann
Timothy Spann, StreamNative (Modern Infrastructure)
Session Name: FLiP Stack for Cloud Data Lakes
Using an all-Apache stack of Apache Flink, Apache Pulsar, and Apache NiFi for rapid data lake population and querying. We can quickly stream data to and from any data lake, lakehouse, database or data mart, regardless of cloud or size. FLiP allows Java and Python developers to build scalable solutions that span messaging and streaming in a cloud-native fashion with full monitoring.
Speaker Bio: Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData, and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, Pulsar, NiFi, the blockchain, and Spark.

Netflix Keystone—Cloud scale event processing pipeline

Monal Daxini

[Demo session] Managed Kafka Service - Oracle Event Hub Service

Oracle Korea

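Oracle Cloud offers Kafka as a managed service. In this meetup session, we introduce the convenience of the managed Kafka service and give a demo of it. Kafka holds a central position as infrastructure for MSA, big data, and blockchain, and it is also significant as a core integration component of Oracle Cloud. We introduce Kafka's role as an integration component of Oracle Cloud and the configuration of its main services. * This session suits newcomers, beginners, and intermediate attendees alike.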

Apache kafka-a distributed streaming platform

Apache Kafka - A Distributed Streaming Platform

Paolo Castagna

Cloud lunch and learn real-time streaming in azure

ApacheCon 2021 Apache Deep Learning 302

ApacheCon 2021, Tuesday 18:00 UTC. Apache Deep Learning 302, Timothy Spann. This talk will discuss and show examples of using Apache Hadoop, Apache Kudu, Apache Flink, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi, and Apache Spark for deep learning applications. It follows up on the previous talks Apache Deep Learning 101, 201, and 301 at ApacheCon, DataWorks Summit, Strata, and other events. As part of this talk, the presenter will walk through using Apache MXNet pre-built models and integrating new open-source deep learning libraries with Python and Java. The talk also covers running real-time AI streams from edge devices to servers with Apache NiFi and Apache NiFi - MiNiFi. This talk is geared towards data engineers interested in the basics of architecting deep learning pipelines with open-source Apache tools in a big data environment. The presenter will also walk through source code examples available on GitHub and run the code live on Apache NiFi and Apache Flink clusters. Tim Spann is a Developer Advocate at StreamNative, where he works with Apache NiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a senior solutions architect at AirisData, and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
* https://github.com/tspannhw/ApacheDeepLearning302/
* https://github.com/tspannhw/nifi-djl-processor
* https://github.com/tspannhw/nifi-djlsentimentanalysis-processor
* https://github.com/tspannhw/nifi-djlqa-processor
* https://www.linkedin.com/pulse/2021-schedule-tim-spann/

[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story

Joan Viladrosa Riera

Apache Spark Streaming + Kafka 0.10 with Joan Viladrosariera

Spark Summit

Spark Streaming has supported Kafka since its inception, but a lot has changed since then, on both the Spark and Kafka sides, to make this integration more fault-tolerant and reliable. Apache Kafka 0.10 (actually since 0.9) introduced the new Consumer API, built on top of a new group coordination protocol provided by Kafka itself. So a new Spark Streaming integration enters the picture, with a design similar to the 0.8 Direct DStream approach. However, there are notable differences in usage, and many exciting new features. In this talk, we will cover the main differences between this new integration and the previous one (for Kafka 0.8), and why Direct DStreams have replaced Receivers for good. We will also see how to achieve different delivery semantics (at-least-once, at-most-once, exactly-once) with code examples. Finally, we will briefly introduce how we use this integration at Billy Mobile to ingest and process the continuous stream of events from our AdNetwork.
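The delivery semantics mentioned in this abstract come down to when a consumer commits its offset relative to processing a record. The sketch below is a toy Python simulation and not the Spark or Kafka API. The `run` function, the `mode` names, and the single injected crash are all hypothetical. A consumer crashes once while handling one record and then restarts from its last committed offset. The `exactly_once` mode only models the general idea of making results and offsets visible atomically. Real systems achieve this with mechanisms such as transactions or idempotent sinks.

```python
from typing import List


def run(log: List[str], crash_at: int, mode: str) -> List[str]:
    """Toy consumer (hypothetical) that crashes once while handling offset `crash_at`,
    then restarts from its last committed offset. Returns what reached the sink."""
    outputs: List[str] = []  # results visible downstream
    committed = 0            # next offset to read after a restart
    crashed = False
    while committed < len(log):
        offset = committed
        result = log[offset].upper()                  # stand-in for real processing
        will_crash = offset == crash_at and not crashed
        if mode == "at_most_once":
            committed = offset + 1                    # 1) commit the offset first
            if will_crash:
                crashed = True
                continue                              # crash: record is never processed (lost)
            outputs.append(result)                    # 2) then process
        elif mode == "at_least_once":
            outputs.append(result)                    # 1) process first
            if will_crash:
                crashed = True
                continue                              # crash: offset not committed, record replayed
            committed = offset + 1                    # 2) then commit
        elif mode == "exactly_once":
            staged = [result]                         # 1) stage the result, not yet visible
            if will_crash:
                crashed = True
                continue                              # crash: staged result and offset both dropped
            outputs.extend(staged)                    # 2) publish result and commit offset together
            committed = offset + 1
        else:
            raise ValueError(f"unknown mode: {mode}")
    return outputs


log = ["a", "b", "c"]
for mode in ("at_most_once", "at_least_once", "exactly_once"):
    print(mode, run(log, crash_at=1, mode=mode))
# at_most_once ['A', 'C']
# at_least_once ['A', 'B', 'B', 'C']
# exactly_once ['A', 'B', 'C']
```

Committing before processing risks losing a record but never duplicates one. Committing after processing never loses a record but may duplicate it. This is why at-least-once delivery is commonly paired with idempotent downstream writes.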

Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...

Helena Edelson

MQTT. Kafka. InfluxDB. SQL. IoT Harmony. #tutorial by Stefan Bocutiu

landoop

Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...

Big Data Spain

This document provides an overview of Apache Kafka and Spark Streaming and their integration. It discusses:
- What Apache Kafka is and how it works as a publish-subscribe messaging system with topics, partitions, producers, and consumers.
- What Apache Spark Streaming is and how it provides streaming data processing using micro-batching, leveraging Spark's APIs and engine.
- The evolution of the integration between Kafka and Spark Streaming, from receivers to the direct approach without receivers in Spark 1.3+.
- Details on how to use the new direct Kafka integration in Spark 2.0+, including location strategies, consumer strategies, and committing offsets directly to Kafka.
- Considerations around at-least

OSS EU: Deep Dive into Building Streaming Applications with Apache Pulsar

In this session I will get you started with real-time, cloud-native streaming programming with Java, Golang, Python, and Apache NiFi. If the attendees prefer a particular language, we will focus only on that one. I will start with an introduction to Apache Pulsar and set up your first standalone cluster in Docker. We will then cover terminology and architecture so you understand what is happening with your events. I will then show you how to produce and consume messages to and from Pulsar topics. I will also show how to use some of the command-line and REST interfaces to monitor, manage, and perform CRUD operations on tenants, namespaces, and topics. We will discuss Functions, Sinks, Sources, and the Pulsar SQL, Flink SQL, and Spark SQL interfaces. We will also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ), or KoP (Kafka) to your cluster, and look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript. After this session you will be able to build simple real-time streaming and messaging applications with the language or tool of your choice.

Monitoring Apache Kafka with Confluent Control Center

Presentation by Nick Dearden, Director, Product and Engineering, Confluent. It’s 3 am. Do you know how your Kafka cluster is doing? With over 150 metrics to think about, operating a Kafka cluster can be daunting, particularly as a deployment grows. Confluent Control Center is the only complete monitoring and administration product for Apache Kafka and is designed specifically to make the Kafka operator's life easier. Join Confluent as we cover how Control Center is used to simplify deployment and operations and to ensure message delivery. Watch the recording: https://www.confluent.io/online-talk/monitoring-and-alerting-apache-kafka-with-confluent-control-center/

Apache Kafka - Scalable Message-Processing and more !

Guido Schmutz

Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming increasingly important in the world of sensors, social media streams, and the Internet of Things. Events have to be accepted quickly and reliably. They also have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker, built for exchanging huge volumes of messages between a source and a target. This session will start with an introduction to Apache Kafka and present its role in a modern data/information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka in the Oracle Stack. Products such as GoldenGate, Service Bus, and Oracle Stream Analytics are all able to act as a Kafka consumer or producer.

Trivadis TechEvent 2016 Apache Kafka - Scalable Message Processing and more! ...

Trivadis

Apache Kafka - Scalable Message Processing and more!

Guido Schmutz

After a quick overview and introduction of Apache Kafka, this session covers two components that extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL. Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it into a Kafka topic. Conversely, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. Many connectors for different source and target systems are available out of the box, provided by the community, by Confluent, or by other vendors. You simply configure these connectors and off you go. Kafka Streams is a lightweight component that extends Kafka with stream processing functionality. With it, Kafka can not only transport events and messages reliably and scalably through the Kafka broker, but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure, and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense. That can be inside a “normal” Java application, inside a web container, or on more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes, or Docker. Kafka Streams has many interesting features, such as reliable state handling, queryable state, and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
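The “reliable state handling” and “queryable state” mentioned above can be illustrated conceptually. The sketch below is a toy Python example and not the Kafka Streams API, which is Java. The `CountingProcessor` class and its methods are hypothetical. It keeps a per-key count in a local store and records every update in an append-only changelog. After a failure, it rebuilds the store by replaying that log, which mirrors the idea of backing a state store with a changelog topic.

```python
from typing import Dict, Iterable, List, Tuple


class CountingProcessor:
    """Toy stateful processor (hypothetical): counts events per key."""

    def __init__(self) -> None:
        self.store: Dict[str, int] = {}             # local state store
        self.changelog: List[Tuple[str, int]] = []  # every state update, in order

    def process(self, key: str) -> Tuple[str, int]:
        new_count = self.store.get(key, 0) + 1
        self.store[key] = new_count
        self.changelog.append((key, new_count))     # log the update so state can be rebuilt
        return key, new_count                       # updated count emitted downstream

    def query(self, key: str) -> int:
        """Look up the current count for a key (cf. 'queryable state')."""
        return self.store.get(key, 0)

    @classmethod
    def restore(cls, changelog: Iterable[Tuple[str, int]]) -> "CountingProcessor":
        """Rebuild a processor by replaying a changelog; the last value per key wins."""
        entries = list(changelog)
        restored = cls()
        for key, count in entries:
            restored.store[key] = count
        restored.changelog = entries
        return restored


# Usage: the event keys below are made-up examples.
proc = CountingProcessor()
for event_key in ["card-1", "card-2", "card-1"]:
    proc.process(event_key)

restored = CountingProcessor.restore(proc.changelog)  # e.g. after a crash
print(restored.query("card-1"))  # 2
```

Storing the latest value per key is sufficient for recovery, which is why such a changelog can be compacted rather than kept in full.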

OSSNA Building Modern Data Streaming Apps

https://ossna2023.sched.com/event/1Jt05/virtual-building-modern-data-streaming-apps-with-open-source-timothy-spann-streamnative
Timothy Spann, Cloudera Principal Developer Advocate, Data in Motion. In my session, I will show you some best practices I have discovered over the last seven years of building data streaming applications, including IoT, CDC, logs, and more. In my modern approach, we combine several open-source frameworks to get the best features of each. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We make continuous queries against our topics with Flink SQL. We will stream data into various open-source data stores, including Apache Iceberg, Apache Pinot, and others. We use the best streaming tools for the current applications with the open-source stack FLiPN (https://www.flipn.app/). Updates: This will be in person, with live coding based on feedback from the crowd. It will also include new data stores, new sources, and data relevant to and from the Vancouver area. Platform updates and the inclusion of Apache Iceberg, Apache Pinot, and some other new tech are also planned. Speaker profile: https://github.com/tspannhw/SpeakerProfile
Tim Spann is a Principal Developer Advocate for Cloudera. He works with Apache Kafka, Apache Flink, Flink SQL, Apache NiFi, MiNiFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal, and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit, and many more. He holds a BS and MS in computer science.
Timothy J Spann, Cloudera Principal Developer Advocate, Hightstown, NJ. Website: https://datainmotion.dev/

More from Confluent

Building API data products on top of your real-time data infrastructure

This talk and live demonstration will examine how Confluent and Gravitee.io integrate to unlock value from streaming data through API products. You will learn how data owners and API providers can document and secure data products on top of Confluent brokers, including schema validation, topic routing, and message filtering. You will also see how data and API consumers can discover and subscribe to products in a developer portal. Consumers can then integrate with Confluent topics through protocols like REST, WebSockets, Server-Sent Events, and Webhooks. Whether you want to monetize your real-time data, enable new integrations with partners, or provide self-service access to topics through various protocols, this webinar is for you!

Speed Wins: From Kafka to APIs in Minutes

Evolving Data Governance for the Real-time Streaming and AI Era

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...

Santander Stream Processing with Apache Flink

Unlocking the Power of IoT: A comprehensive approach to real-time insights

Hybrid Workshop: Stream Processing with Flink

Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines. By processing data streams in real time, it enables greater data portability, optimized resource utilization, and a better customer experience. In our hands-on hybrid workshop, you will learn how to easily filter, join, and enrich real-time data within Confluent Cloud using our serverless Flink service.

Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...

Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace. Our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT platforms. You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes. Don't miss out on this opportunity to learn from industry experts and take your business to the next level.

AWS Immersion Day Mapfre - Confluent

Event-driven architecture (EDA) will be at the heart of the MAPFRE ecosystem. To remain competitive, companies today depend increasingly on real-time data analytics, which gives them faster insights and response times. Doing business with real-time data means gaining situational awareness: detecting and responding to what is happening in the world right now.

Events and Microservices - Santander TechTalk

Q&A with Confluent Experts: Navigating Networking in Confluent Cloud

This document discusses networking options and best practices for Confluent Cloud. It provides an overview of public endpoints, private link, and peering options. It then discusses best practices for private networking architectures on Azure using hub-and-spoke and private link designs. Finally, it addresses networking considerations and challenges for Kafka Connect managed connectors, as well as planned enhancements for DNS peering and outbound private link support.

Citi TechTalk Session 2: Kafka Deep Dive

Build real-time streaming data pipelines to AWS with Confluent

Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.

Q&A with Confluent Professional Services: Confluent Service Mesh

Citi Tech Talk: Event Driven Kafka Microservices

Confluent & GSI Webinars series - Session 3

An in-depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are using data in motion to solve common problems and benefit from their real-time data capabilities. It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks. This session is aimed at Solutions Architects, Sales Engineers, and Pre-Sales, as well as more technically minded, business-aligned people. Whilst this is not a deeply technical session, some knowledge of Kafka would be helpful.

Citi Tech Talk: Messaging Modernization

This document discusses moving to an event-driven architecture using Confluent. It begins by outlining some of the limitations of traditional messaging middleware approaches. Confluent provides benefits like stream processing, persistence, scalability and reliability while avoiding issues like lack of structure, slow consumers, and technical debt. The document then discusses how Confluent can help modernize architectures, enable new real-time use cases, and reduce costs through migration. It provides examples of how companies like Advance Auto Parts and Nord/LB have benefitted from implementing Confluent platforms.

Citi Tech Talk: Data Governance for streaming and real time data

Confluent & GSI Webinars series: Session 2

Data In Motion Paris 2023