Neha Narkhede talks about the experience at LinkedIn moving from batch-oriented ETL to real-time streams using Apache Kafka and how the design and implementation of Kafka was driven by this goal of acting as a real-time platform for event data. She covers some of the challenges of scaling Kafka to hundreds of billions of events per day at Linkedin, supporting thousands of engineers, etc.
Data Pipelines Made Simple with Apache Kafkaconfluent
Presentation by Ewen Cheslack-Postava, Engineer, Apache Kafka Committer, Confluent
In streaming workloads, often times data produced at the source is not useful down the pipeline or it requires some transformation to get it into usable shape. Similarly, where sensitive data is concerned, filtering of topics is helpful to ensure that the wrong data doesn't get to the wrong place.
The newest release of Apache Kafka now offers the ability to do transformations on individual messages, making is possible to implement finer grained transformations customized to your unique needs. In this session we’ll talk about the new single message transform capabilities, how to use them to implement things like data masking and advanced partitioning, and when you’ll need to use more complex tools like the Kafka Streams API instead.
Un'introduzione ad Apache Kafka e Kafka Connect APIs (part of Apache Kafka), in particolare come Kafka possa essere usato assieme ad Elasticsearch.
Grazie a Seacom per averci invitato all'evento a Roma.
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelinesconfluent
ETL can be painful with dirty data and outdated batch processes slowing you down; there has to be a better way. In this talk we’ll discuss the benefits of introducing a streaming platform to your architecture including how it can greatly simplify complexity, speed up performance, and help your team deliver the features they need with real-time data integration.
Pandora’s Lawrence Weikum will discuss what they’ve done to bring real-time data integration to the team. We’ll review their Kafka-powered data pipelines and how they make the most of Kafka’s Connect API to make it surprisingly system to keep systems in sync.
Presented by:
Lawrence Weikum, Senior Software Engineer, Pandora
Gehrig Kunz, Technical Product Marketing Manager, Confluent
Presentation by Gwen Shapira, Product Manager, Confluent.
With the rapid increase of Apache Kafka use within organizations, issues of data governance and data quality take center stage. When more and more disparate departments and teams depend on the data in Apache Kafka, it’s important to provide a way to make sure "bad data" does not make its way into critical topics. Every organization that uses Kafka at large scale realize they need a way to deliver these guarantees.
In this talk, Kafka committer, Gwen Shapira will review the benefits of a schema registry for large-scale Kafka deployments and will give high-level overview of how the Confluent schema registry is being used in enterprise architectures across industry.
Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...HostedbyConfluent
SIEM platforms are essential to the new cybersecurity paradigm and data collection layer is a very important piece of it.
When you deliver a new platform, you can easily get lost in a variety of different vendors and solutions, too many challenges are facing. What if I change vendors, will I keep my data? How to feed multiple tools with the same data? How to collect data from custom apps and services? How to pay less for an expensive platform? How to keep data without a huge cost?
Join us if you are looking for the answers. In this session, you will learn how we replaced the vendor-provided data collection layer with kafka connect and the lessons we learnt. After the talk you will know:
- architecture and real-life examples of the flexible and highly available data collection platform
- custom connectors that do most of the work for us and how to extend the connectors to consume new data, we made them open sourced
- easy way to receive data from thousands of servers and many cloud services
- how to archive data at low cost
You will leave armed with a set of free tools and recipes to build a truly vendor-agnostic data collection platform. It will allow you to take you SIEM costs under control. You will feed your analytics tools with what they need and archive the rest at low cost. You will feed your SIEM smart!
Neha Narkhede talks about the experience at LinkedIn moving from batch-oriented ETL to real-time streams using Apache Kafka and how the design and implementation of Kafka was driven by this goal of acting as a real-time platform for event data. She covers some of the challenges of scaling Kafka to hundreds of billions of events per day at Linkedin, supporting thousands of engineers, etc.
Data Pipelines Made Simple with Apache Kafkaconfluent
Presentation by Ewen Cheslack-Postava, Engineer, Apache Kafka Committer, Confluent
In streaming workloads, often times data produced at the source is not useful down the pipeline or it requires some transformation to get it into usable shape. Similarly, where sensitive data is concerned, filtering of topics is helpful to ensure that the wrong data doesn't get to the wrong place.
The newest release of Apache Kafka now offers the ability to do transformations on individual messages, making is possible to implement finer grained transformations customized to your unique needs. In this session we’ll talk about the new single message transform capabilities, how to use them to implement things like data masking and advanced partitioning, and when you’ll need to use more complex tools like the Kafka Streams API instead.
Un'introduzione ad Apache Kafka e Kafka Connect APIs (part of Apache Kafka), in particolare come Kafka possa essere usato assieme ad Elasticsearch.
Grazie a Seacom per averci invitato all'evento a Roma.
ETL as a Platform: Pandora Plays Nicely Everywhere with Real-Time Data Pipelinesconfluent
ETL can be painful with dirty data and outdated batch processes slowing you down; there has to be a better way. In this talk we’ll discuss the benefits of introducing a streaming platform to your architecture including how it can greatly simplify complexity, speed up performance, and help your team deliver the features they need with real-time data integration.
Pandora’s Lawrence Weikum will discuss what they’ve done to bring real-time data integration to the team. We’ll review their Kafka-powered data pipelines and how they make the most of Kafka’s Connect API to make it surprisingly system to keep systems in sync.
Presented by:
Lawrence Weikum, Senior Software Engineer, Pandora
Gehrig Kunz, Technical Product Marketing Manager, Confluent
Presentation by Gwen Shapira, Product Manager, Confluent.
With the rapid increase of Apache Kafka use within organizations, issues of data governance and data quality take center stage. When more and more disparate departments and teams depend on the data in Apache Kafka, it’s important to provide a way to make sure "bad data" does not make its way into critical topics. Every organization that uses Kafka at large scale realize they need a way to deliver these guarantees.
In this talk, Kafka committer, Gwen Shapira will review the benefits of a schema registry for large-scale Kafka deployments and will give high-level overview of how the Confluent schema registry is being used in enterprise architectures across industry.
Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...HostedbyConfluent
SIEM platforms are essential to the new cybersecurity paradigm and data collection layer is a very important piece of it.
When you deliver a new platform, you can easily get lost in a variety of different vendors and solutions, too many challenges are facing. What if I change vendors, will I keep my data? How to feed multiple tools with the same data? How to collect data from custom apps and services? How to pay less for an expensive platform? How to keep data without a huge cost?
Join us if you are looking for the answers. In this session, you will learn how we replaced the vendor-provided data collection layer with kafka connect and the lessons we learnt. After the talk you will know:
- architecture and real-life examples of the flexible and highly available data collection platform
- custom connectors that do most of the work for us and how to extend the connectors to consume new data, we made them open sourced
- easy way to receive data from thousands of servers and many cloud services
- how to archive data at low cost
You will leave armed with a set of free tools and recipes to build a truly vendor-agnostic data collection platform. It will allow you to take you SIEM costs under control. You will feed your analytics tools with what they need and archive the rest at low cost. You will feed your SIEM smart!
How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Str...Lightbend
In this webinar, Engineering Manager at Credit Karma, Dustin Lyons, discusses how not long ago his team was facing a common challenge shared by many financial services architects and engineering leaders: not only how to move from the offline, batch-mode processing of Big Data to streaming, Fast Data, and how to enable real-time decision making based on the information flowing in from over 60 million members.
Dustin reviews how his team migrated away from PHP and successfully implemented Akka Streams with Apache Kafka to ingest, process and route real-time events throughout their data ecosystem. At the end of this presentation, you’ll better understand:
* The design considerations for new Fast Data architectures, from streaming to microservices to real-time analysis.
* Some lessons learned when it comes to progressing from batch to streaming using Akka, Spark and Kafka
* Why Akka’s self-healing actor model and the resilience that it provides is actually what matters most when delivering real-time customer experiences
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...confluent
How Priceline uses Kafka Streams technology to effectively save TBs on daily licenses of our monitoring systems. Kafka Streams powers a big part of our analytics and monitoring pipelines and delivers operational metrics transformations in real time. All logs and operational metrics from all of the APIs of Priceline’s products flow into Kafka and is ingested into our Monitoring System Splunk for Alerting and Monitoring. We have now implemented data transformations, aggregations and summarizations using Kafka Streams technologies to effectively eliminate PCI/PII violations on the log data; do aggregations on metrics to avoid ingesting sub-second metrics and ingest metrics only at the granularity that we need to. We will cover the need for custom Serdes, custom partitioners, and why we don’t use the confluent registry. You will also learn how Priceline uses a self service model to configure its streams, topics and consumers using Data Collection Console, which is our UI for managing the Kafka streaming pipelines.
Stream Processing Live Traffic Data with Kafka StreamsTom Van den Bulck
In this workshop we will set up a streaming framework which will process realtime data of traffic sensors installed within the Belgian road system.
Starting with the intake of the data, you will learn best practices and the recommended approach to split the information into events in a way that won't come back to haunt you.
With some basic stream operations (count, filter, ... ) you will get to know the data and experience how easy it is to get things done with Spring Boot & Spring Cloud Stream.
But since simple data processing is not enough to fulfill all your streaming needs, we will also let you experience the power of windows.
After this workshop, tumbling, sliding and session windows hold no more mysteries and you will be a true streaming wizard.
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...confluent
Responding to a global pandemic presents a unique set of technical and public health challenges. The real challenge is the ability to gather data coming in via many data streams in variety of formats influences the real-world outcome and impacts everyone. The Centers for Disease Control and Prevention CELR (COVID Electronic Lab Reporting) program was established to rapidly aggregate, validate, transform, and distribute laboratory testing data submitted by public health departments and other partners. Confluent Kafka with KStreams and Connect play a critical role in program objectives to:
o Track the threat of COVID-19 virus
o Provide comprehensive data for local, state, and federal response
o Better understand locations with an increase in incidence
Agile Data Integration: How is it possible?confluent
In this talk, we are going to tell you the story of building the Connection Platform (CoPa). This is an endeavor undertaken at Generali Switzerland over the course of the last year, in a collaboration with Innovation Process Technology. The goal was to design a general purpose, state of the art integration platform, which covers all integration needs of the enterprise. The central data distribution and integration layer are powered by Confluent Kafka. We will throw a spotlight on three different aspects of this platform that, all in their own right, are essential for agile data integration.
First of all, the platform is hosted on the container platform Redhat Openshift. Everything is set up in flexible Docker containers. Automated pipelines are used to build, provision and deploy everything on the platform from infrastructure to data pipeline
dotScale 2017 Keynote: The Rise of Real Time by Neha Narkhedeconfluent
Slides from Neha Narkhede's keynote at the dotScale conference in Paris on April 24th, 2017.
There is a tectonic shift happening in how data powers the core of a company's business. This shift is about the rise of real-time. Apache Kafka was built with the vision to help companies navigate this change and become the central nervous system that makes data available in real-time to all the applications that need to use it.
This talk is about how you can put Apache Kafka to practice to help your company make this shift to real-time. And how the Connect and Streams API in Apache Kafka capture the entire scope of what it means to put streams into practice.
Monitoring Apache Kafka with Confluent Control Center confluent
Presentation by Nick Dearden, Direct, Product and Engineering, Confluent
It’s 3 am. Do you know how your Kafka cluster is doing?
With over 150 metrics to think about, operating a Kafka cluster can be daunting, particularly as a deployment grows. Confluent Control Center is the only complete monitoring and administration product for Apache Kafka and is designed specifically for making the Kafka operators life easier.
Join Confluent as we cover how Control Center is used to simplify deployment, operability, and ensure message delivery.
Watch the recording: https://www.confluent.io/online-talk/monitoring-and-alerting-apache-kafka-with-confluent-control-center/
Partner Development Guide for Kafka Connectconfluent
This guide is intended to provide useful background to developers implementing Kafka Connect sources and sinks for their data stores. Visit www.confluent.io for more information.
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIconfluent
For many industries the need to group together related events based on a period of activity or inactivity is key. Advertising businesses, content producers are just a few examples of where session windows can be used to better understand user behavior.
While such sessionization has been possible in Apache Kafka up to this point, implementing it has been rather complex and required leveraging low-level APIs. In the most recent release of Kafka, however, new capabilities have been added making session windows much easier to implement.
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
Streaming Transformations - Putting the T in Streaming ETLconfluent
Speaker: Nick Dearden, Director of Engineering, Confluent
We’ll discuss how to leverage some of the more advanced transformation capabilities available in both KSQL and Kafka Connect, including how to chain them together into powerful combinations for handling tasks such as data-masking, restructuring and aggregations. Using KSQL, you can deliver the streaming transformation capability easily and quickly.
This is part 3 of 3 in Streaming ETL - The New Data Integration series.
Watch the recording: https://videos.confluent.io/watch/en56Qt3KAdrpQ4JE5EZNHj?.
Using Apache Kafka to Analyze Session Windowsconfluent
Speaker: Michael Noll, Product Manager, Confluent
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
Death of the dumb pipes: Using Apache Kafka® for Integration projectsHostedbyConfluent
Guru Sattanathan, Confluent, Senior Solutions Engineer
Enterprise Integration technologies (aka Middleware) are the key enablers when it comes to Real-time data flows or Event Driven Architecture. Starting from real-time payments, e-commerce, travel booking systems, etc, everything is powered by a middleware underneath. It did transform a lot of things but with caveats!
Are ESB’s & MQ’s enough for today’s integration needs? Do you know their technical debts?
If you are someone looking at integrating your applications or an Integration Architect this session is for you. It's time to refresh yourself and see how organizations are building integrations today.
In this session, we will go in this order:
-Recap on Enterprise Integration technologies
-What are the key flaws & What needs improvement?
-What is Apache Kafka?
-Rethinking Integration using Apache Kafka
https://www.meetup.com/KafkaMelbourne/events/280590162/
Leveraging Mainframe Data for Modern Analyticsconfluent
“The mainframe is going away” is as true now as it was 10, 20 and 30 years ago. Mainframes are still crucial in handling critical business transactions, they were however built for an era where batch data movement was the norm and can be difficult to integrate into today’s data-driven, real-time, analytics-focused business processes as well as the environments that support them. Until now.
Join experts from Confluent, Attunity, and Capgemini for a one-hour online talk session where you’ll learn how to:
Unlock your mainframe data with unique change data capture (CDC) functionality without incurring the complexity and expense that come with sending ongoing queries into the mainframe database
How using CDC benefits advanced analytics approaches such as deep machine learning and predictive analytics
Deliver ongoing streams of data in real-time to the most demanding analytics environments
Ensure that your analytics environment includes the broadest possible range of data sources and destinations while ensuring true enterprise-grade functionality
Identify use cases that can help you get started delivering value to the business moving from POC to Pilot to Production
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
Speakers: Ravi Dubey, Senior Manager, Software Engineering, Capital One + Jeff Sharpe, Software Engineer, Capital One
Capital One supports interactions with real-time streaming transactional data using Apache Kafka®. Kafka helps deliver information to internal operation teams and bank tellers to assist with assessing risk and protect customers in a myriad of ways.
Inside the bank, Kafka allows Capital One to build a real-time system that takes advantage of modern data and cloud technologies without exposing customers to unnecessary data breaches, or violating privacy regulations. These examples demonstrate how a streaming platform enables Capital One to act on their visions faster and in a more scalable way through the Kafka solution, helping establish Capital One as an innovator in the banking space.
Join us for this online talk on lessons learned, best practices and technical patterns of Capital One’s deployment of Apache Kafka.
-Find out how Kafka delivers on a 5-second service-level agreement (SLA) for inside branch tellers.
-Learn how to combine and host data in-memory and prevent personally identifiable information (PII) violations of in-flight transactions.
-Understand how Capital One manages Kafka Docker containers using Kubernetes.
Watch the recording: https://videos.confluent.io/watch/6e6ukQNnmASwkf9Gkdhh69?.
Confluent building a real-time streaming platform using kafka streams and k...Thomas Alex
Jeremy Custenborder from Confluent talked about how Kafka brings an event-centric approach to building streaming applications, and how to use Kafka Connect and Kafka Streams to build them.
How Credit Karma Makes Real-Time Decisions For 60 Million Users With Akka Str...Lightbend
In this webinar, Engineering Manager at Credit Karma, Dustin Lyons, discusses how not long ago his team was facing a common challenge shared by many financial services architects and engineering leaders: not only how to move from the offline, batch-mode processing of Big Data to streaming, Fast Data, and how to enable real-time decision making based on the information flowing in from over 60 million members.
Dustin reviews how his team migrated away from PHP and successfully implemented Akka Streams with Apache Kafka to ingest, process and route real-time events throughout their data ecosystem. At the end of this presentation, you’ll better understand:
* The design considerations for new Fast Data architectures, from streaming to microservices to real-time analysis.
* Some lessons learned when it comes to progressing from batch to streaming using Akka, Spark and Kafka
* Why Akka’s self-healing actor model and the resilience that it provides is actually what matters most when delivering real-time customer experiences
Data Transformations on Ops Metrics using Kafka Streams (Srividhya Ramachandr...confluent
How Priceline uses Kafka Streams technology to effectively save TBs on daily licenses of our monitoring systems. Kafka Streams powers a big part of our analytics and monitoring pipelines and delivers operational metrics transformations in real time. All logs and operational metrics from all of the APIs of Priceline’s products flow into Kafka and is ingested into our Monitoring System Splunk for Alerting and Monitoring. We have now implemented data transformations, aggregations and summarizations using Kafka Streams technologies to effectively eliminate PCI/PII violations on the log data; do aggregations on metrics to avoid ingesting sub-second metrics and ingest metrics only at the granularity that we need to. We will cover the need for custom Serdes, custom partitioners, and why we don’t use the confluent registry. You will also learn how Priceline uses a self service model to configure its streams, topics and consumers using Data Collection Console, which is our UI for managing the Kafka streaming pipelines.
Stream Processing Live Traffic Data with Kafka StreamsTom Van den Bulck
In this workshop we will set up a streaming framework which will process realtime data of traffic sensors installed within the Belgian road system.
Starting with the intake of the data, you will learn best practices and the recommended approach to split the information into events in a way that won't come back to haunt you.
With some basic stream operations (count, filter, ... ) you will get to know the data and experience how easy it is to get things done with Spring Boot & Spring Cloud Stream.
But since simple data processing is not enough to fulfill all your streaming needs, we will also let you experience the power of windows.
After this workshop, tumbling, sliding and session windows hold no more mysteries and you will be a true streaming wizard.
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...confluent
Responding to a global pandemic presents a unique set of technical and public health challenges. The real challenge is the ability to gather data coming in via many data streams in variety of formats influences the real-world outcome and impacts everyone. The Centers for Disease Control and Prevention CELR (COVID Electronic Lab Reporting) program was established to rapidly aggregate, validate, transform, and distribute laboratory testing data submitted by public health departments and other partners. Confluent Kafka with KStreams and Connect play a critical role in program objectives to:
o Track the threat of COVID-19 virus
o Provide comprehensive data for local, state, and federal response
o Better understand locations with an increase in incidence
Agile Data Integration: How is it possible?confluent
In this talk, we are going to tell you the story of building the Connection Platform (CoPa). This is an endeavor undertaken at Generali Switzerland over the course of the last year, in a collaboration with Innovation Process Technology. The goal was to design a general purpose, state of the art integration platform, which covers all integration needs of the enterprise. The central data distribution and integration layer are powered by Confluent Kafka. We will throw a spotlight on three different aspects of this platform that, all in their own right, are essential for agile data integration.
First of all, the platform is hosted on the container platform Redhat Openshift. Everything is set up in flexible Docker containers. Automated pipelines are used to build, provision and deploy everything on the platform from infrastructure to data pipeline
dotScale 2017 Keynote: The Rise of Real Time by Neha Narkhedeconfluent
Slides from Neha Narkhede's keynote at the dotScale conference in Paris on April 24th, 2017.
There is a tectonic shift happening in how data powers the core of a company's business. This shift is about the rise of real-time. Apache Kafka was built with the vision to help companies navigate this change and become the central nervous system that makes data available in real-time to all the applications that need to use it.
This talk is about how you can put Apache Kafka to practice to help your company make this shift to real-time. And how the Connect and Streams API in Apache Kafka capture the entire scope of what it means to put streams into practice.
Monitoring Apache Kafka with Confluent Control Center confluent
Presentation by Nick Dearden, Direct, Product and Engineering, Confluent
It’s 3 am. Do you know how your Kafka cluster is doing?
With over 150 metrics to think about, operating a Kafka cluster can be daunting, particularly as a deployment grows. Confluent Control Center is the only complete monitoring and administration product for Apache Kafka and is designed specifically for making the Kafka operators life easier.
Join Confluent as we cover how Control Center is used to simplify deployment, operability, and ensure message delivery.
Watch the recording: https://www.confluent.io/online-talk/monitoring-and-alerting-apache-kafka-with-confluent-control-center/
Partner Development Guide for Kafka Connectconfluent
This guide is intended to provide useful background to developers implementing Kafka Connect sources and sinks for their data stores. Visit www.confluent.io for more information.
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIconfluent
For many industries the need to group together related events based on a period of activity or inactivity is key. Advertising businesses, content producers are just a few examples of where session windows can be used to better understand user behavior.
While such sessionization has been possible in Apache Kafka up to this point, implementing it has been rather complex and required leveraging low-level APIs. In the most recent release of Kafka, however, new capabilities have been added making session windows much easier to implement.
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
Streaming Transformations - Putting the T in Streaming ETLconfluent
Speaker: Nick Dearden, Director of Engineering, Confluent
We’ll discuss how to leverage some of the more advanced transformation capabilities available in both KSQL and Kafka Connect, including how to chain them together into powerful combinations for handling tasks such as data-masking, restructuring and aggregations. Using KSQL, you can deliver the streaming transformation capability easily and quickly.
This is part 3 of 3 in Streaming ETL - The New Data Integration series.
Watch the recording: https://videos.confluent.io/watch/en56Qt3KAdrpQ4JE5EZNHj?.
Using Apache Kafka to Analyze Session Windowsconfluent
Speaker: Michael Noll, Product Manager, Confluent
In this online talk, we’ll introduce the concept of a session window, talk about common use cases, and walk through how Apache Kafka can be used for session-oriented use cases.
Death of the dumb pipes: Using Apache Kafka® for Integration projectsHostedbyConfluent
Guru Sattanathan, Confluent, Senior Solutions Engineer
Enterprise Integration technologies (aka Middleware) are the key enablers when it comes to Real-time data flows or Event Driven Architecture. Starting from real-time payments, e-commerce, travel booking systems, etc, everything is powered by a middleware underneath. It did transform a lot of things but with caveats!
Are ESB’s & MQ’s enough for today’s integration needs? Do you know their technical debts?
If you are someone looking at integrating your applications or an Integration Architect this session is for you. It's time to refresh yourself and see how organizations are building integrations today.
In this session, we will go in this order:
-Recap on Enterprise Integration technologies
-What are the key flaws & What needs improvement?
-What is Apache Kafka?
-Rethinking Integration using Apache Kafka
https://www.meetup.com/KafkaMelbourne/events/280590162/
Leveraging Mainframe Data for Modern Analyticsconfluent
“The mainframe is going away” is as true now as it was 10, 20 and 30 years ago. Mainframes are still crucial in handling critical business transactions, they were however built for an era where batch data movement was the norm and can be difficult to integrate into today’s data-driven, real-time, analytics-focused business processes as well as the environments that support them. Until now.
Join experts from Confluent, Attunity, and Capgemini for a one-hour online talk session where you’ll learn how to:
Unlock your mainframe data with unique change data capture (CDC) functionality without incurring the complexity and expense that come with sending ongoing queries into the mainframe database
How using CDC benefits advanced analytics approaches such as deep machine learning and predictive analytics
Deliver ongoing streams of data in real-time to the most demanding analytics environments
Ensure that your analytics environment includes the broadest possible range of data sources and destinations while ensuring true enterprise-grade functionality
Identify use cases that can help you get started delivering value to the business moving from POC to Pilot to Production
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
Speakers: Ravi Dubey, Senior Manager, Software Engineering, Capital One + Jeff Sharpe, Software Engineer, Capital One
Capital One supports interactions with real-time streaming transactional data using Apache Kafka®. Kafka helps deliver information to internal operation teams and bank tellers to assist with assessing risk and protect customers in a myriad of ways.
Inside the bank, Kafka allows Capital One to build a real-time system that takes advantage of modern data and cloud technologies without exposing customers to unnecessary data breaches, or violating privacy regulations. These examples demonstrate how a streaming platform enables Capital One to act on their visions faster and in a more scalable way through the Kafka solution, helping establish Capital One as an innovator in the banking space.
Join us for this online talk on lessons learned, best practices and technical patterns of Capital One’s deployment of Apache Kafka.
-Find out how Kafka delivers on a 5-second service-level agreement (SLA) for inside branch tellers.
-Learn how to combine and host data in-memory and prevent personally identifiable information (PII) violations of in-flight transactions.
-Understand how Capital One manages Kafka Docker containers using Kubernetes.
Watch the recording: https://videos.confluent.io/watch/6e6ukQNnmASwkf9Gkdhh69?.
Confluent building a real-time streaming platform using kafka streams and k...Thomas Alex
Jeremy Custenborder from Confluent talked about how Kafka brings an event-centric approach to building streaming applications, and how to use Kafka Connect and Kafka Streams to build them.
Protecting your data at rest with Apache Kafka by Confluent and Vormetricconfluent
Learn how data in motion is secure within Apache Kafka and the broader Confluent Platform, while data at rest can be secured by solutions like Vormetric Data Security Manager.
Real-time Data Integration with Kafka and Cassandra (Ewen Cheslack-Postava, C...DataStax
Apache Kafka is a high throughput messaging system that companies like LinkedIn, Netflix, and AirBnB are adopting to handle massive real-time datasets. These datasets originate from dozens of systems -- from databases like Cassandra, to log files, to application data. And companies often need to adopt just as many tools to integrate that data for processing. This presentation introduces Kafka Connect, Kafka's new tool for scalable, fault-tolerant data import and export. We'll discuss existing tools in the space and how they fall short when applied to real-time data integration at scale. Then we'll explore Kafka Connect's design and how it compares to systems with similar goals, including key design decisions and tradeoffs. Finally, we'll discuss the current support for Cassandra connectors and how they can be combined with other connectors and stream processing frameworks to help you get more out of your data.
About the Speaker
Ewen Cheslack-Postava Engineer, Confluent, Inc.
Ewen Cheslack-Postava is a Kafka committer and engineer at Confluent building a stream data platform based on Apache Kafka to help organizations reliably and robustly capture and leverage all their real-time data.
Paolo Castagna is a Senior Sales Engineer at Confluent. His background is on 'big data' and he has, first hand, saw the shift happening in the industry from batch to stream processing and from big data to fast data. His talk will introduce Kafka Streams and explain why Apache Kafka is a great option and simplification for stream processing.
We share our experience with Apache Kafka for event-driven collaboration in microservices-based architecture. Talk was a part of Meetup: https://www.meetup.com/de-DE/Apache-Kafka-Germany-Munich/events/236402498/
In this presentation we describe the design and implementation of Kafka Connect, Kafka’s new tool for scalable, fault-tolerant data import and export. First we’ll discuss some existing tools in the space and why they fall short when applied to data integration at large scale. Next, we will explore Kafka Connect’s design and how it compares to systems with similar goals, discussing key design decisions that trade off between ease of use for connector developers, operational complexity, and reuse of existing connectors. Finally, we’ll discuss how standardizing on Kafka Connect can ultimately lead to simplifying your entire data pipeline, making ETL into your data warehouse and enabling stream processing applications as simple as adding another Kafka connector.
Getting started with Azure Event Hubs and Stream Analytics servicesVladimir Bychkov
The total amount of data in the world almost doubles every 2 years. Storing data for offline processing is no longer a viable business model. In the past few years, new technologies for real-time data processing emerged. Microsoft Azure offers a comprehensive set of tools to ingest and process data in motion. In this presentation we will go over and learn how to collect data from devices, how to process data in real time using Azure Stream Analytic jobs, and how to produce and handle actionable insights.
Landoop presenting how to simplify your ETL process using Kafka Connect for (E) and (L). Introducing KCQL - the Kafka Connect Query Language & how it can simplify fast-data (ingress & egress) pipelines. How KCQL can be used to set up Kafka Connectors for popular in-memory and analytical systems and live demos with HazelCast, Redis and InfluxDB. How to get started with a fast-data docker kafka development environment. Enhance your existing Cloudera (Hadoop) clusters with fast-data capabilities.
Apache Kafka is a distributed streaming platform that forms a key part of the infrastructure at many companies including Uber, Netflix and LinkedIn. In this talk, Matt gave a technical overview of Apache Kafka, discussed practical use cases of Kafka for IoT data and demonstrated how to ingest data from an MQTT server using Kafka Connect.
Streaming Data Ingest and Processing with Apache KafkaAttunity
Apache™ Kafka is a fast, scalable, durable, and fault-tolerant
publish-subscribe messaging system. It offers higher throughput, reliability and replication. To manage growing data volumes, many companies are leveraging Kafka for streaming data ingest and processing.
Join experts from Confluent, the creators of Apache™ Kafka, and the experts at Attunity, a leader in data integration software, for a live webinar where you will learn how to:
-Realize the value of streaming data ingest with Kafka
-Turn databases into live feeds for streaming ingest and processing
-Accelerate data delivery to enable real-time analytics
-Reduce skill and training requirements for data ingest
The recorded webinar on slide 32 includes a demo using automation software (Attunity Replicate) to stream live changes from a database into Kafka and also includes a Q&A with our experts.
For more information, please go to www.attunity.com/kafka.
Au delà des brokers, un tour de l’environnement Kafka | Florent Ramièreconfluent
During the Confluent Streaming event in Paris, Florent Ramière, Technical Account Manager at Confluent, goes beyond brokers, introducing a whole new ecosystem with Kafka Streams, KSQL, Kafka Connect, Rest proxy, Schema Registry, MirrorMaker, etc.
Data Streaming with Apache Kafka & MongoDBconfluent
Explore the use-cases and architecture for Apache Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Data Streaming with Apache Kafka & MongoDB - EMEAAndrew Morgan
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
This webinar explores the use-cases and architecture for Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Webinar: Data Streaming with Apache Kafka & MongoDBMongoDB
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesKai Wähner
This session introduces Apache Kafka, an event-driven open source streaming platform. Apache Kafka goes far beyond scalable, high volume messaging. In addition, you can leverage Kafka Connect for integration and the Kafka Streams API for building lightweight stream processing microservices in autonomous teams. The Confluent Platform adds further components such as a Schema Registry, REST Proxy, KSQL, Clients for different programming languages and Connectors for different technologies.
The session discusses how tech giants like LinkedIn, Ebay or Airbnb leverage Apache Kafka as event streaming platform to solve various different business problems and how to create a scalable, flexible microservice architecture. A live demo shows how you can easily process and analyze streams of events using Apache Kafka and KSQL.
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureKai Wähner
AWS Data Lake / Lake House + Confluent Cloud for Serverless Apache Kafka. Learn about use cases, architectures, and features.
Data must be continuously collected, processed, and reactively used in applications across the entire enterprise - some in real time, some in batch mode. In other words: As an enterprise becomes increasingly software-defined, it needs a data platform designed primarily for "data in motion" rather than "data at rest."
Apache Kafka is now mainstream when it comes to data in motion! The Kafka API has become the de facto standard for event-driven architectures and event streaming. Unfortunately, the cost of running it yourself is very often too expensive when you add factors like scaling, administration, support, security, creating connectors...and everything else that goes with it. Resources in enterprises are scarce: this applies to both the best team members and the budget.
The cloud - as we all know - offers the perfect solution to such challenges.
Most likely, fully-managed cloud services such as AWS S3, DynamoDB or Redshift are already in use. Now it is time to implement "fully-managed" for Kafka as well - with Confluent Cloud on AWS.
Building a central integration layer that doesn't care where or how much data is coming from.
Implementing scalable data stream processing to gain real-time insights
Leveraging fully managed connectors (like S3, Redshift, Kinesis, MongoDB Atlas & more) to quickly access data
Confluent Cloud in action? Let's show how ao.com made it happen!
Translated with www.DeepL.com/Translator (free version)
Cloud-Native Patterns for Data-Intensive ApplicationsVMware Tanzu
Are you interested in learning how to schedule batch jobs in container runtimes?
Maybe you’re wondering how to apply continuous delivery in practice for data-intensive applications? Perhaps you’re looking for an orchestration tool for data pipelines?
Questions like these are common, so rest assured that you’re not alone.
In this webinar, we’ll cover the recent feature improvements in Spring Cloud Data Flow. More specifically, we’ll discuss data processing use cases and how they simplify the overall orchestration experience in cloud runtimes like Cloud Foundry and Kubernetes.
Please join us and be part of the community discussion!
Presenters :
Sabby Anandan, Product Manager
Mark Pollack, Software Engineer, Pivotal
Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job?
This session explores the DOs and DONTs. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
No matter if you think about open source Apache Kafka, a cloud service like Confluent Cloud, or another technology using the Kafka protocol like Redpanda or Pulsar, check out this slide deck.
A detailed article about this topic:
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
When NOT to Use Apache Kafka? With Kai Waehner | Current 2022HostedbyConfluent
When NOT to Use Apache Kafka? With Kai Waehner | Current 2022
Apache Kafka is the de facto standard for data streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job? This session explores the DOs and DONTs. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.
Time's Up! Getting Value from Big Data NowEric Kavanagh
The Briefing Room with Dr. Robin Bloor and CASK
We all know the promise of big data, but who gets the value? There are plenty of success stories already, and most of them involve one key ingredient: facilitated access to important data sets. Most research studies suggest that the Pareto principle applies: 80 percent goes to data integration, and only 20 to analysis. Inverting that balance is the Holy Grail.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why the time has finally come for turning the tables on the status quo in analytics. He'll be briefed by CASK CEO Jonathan Gray, who will showcase his company's big data integration platform, CDAP, which was specifically designed to expedite time-to-value for big data.
Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022HostedbyConfluent
Streaming Time Series Data With Kenny Gorman and Elena Cuevas | Current 2022
Modern streaming use cases are generating massive amounts of data - much of it needs to be organized and queried over time. The sheer amount and complexity of this problem presents new challenges for data engineers and developers alike.
To solve this problem Apache Kafka and MongoDB Time Series collections are a powerful combination. In this talk, Kenny Gorman and Elena Cuevas will present how Apache Kafka on Confluent Cloud can stream massive amounts of data to Time Series Collections via the MongoDB Connector for Apache Kafka. Elena and Kenny will discuss the required configuration details and critical components of Confluent Cloud and MongoDB Atlas as well as some tips, tricks and best practices.
You will leave armed with the knowledge of how Confluent Cloud, Apache Kafka, MongoDB Atlas, and Time Series collections fit into your event-driven architecture.
Most data visualisation solutions today still work on data sources which are stored persistently in a data store, using the so called “data at rest” paradigms. More and more data sources today provide a constant stream of data, from IoT devices to Social Media streams. These data stream publish with high velocity and messages often have to be processed as quick as possible. For the processing and analytics on the data, so called stream processing solutions are available. But these only provide minimal or no visualisation capabilities. One was is to first persist the data into a data store and then use a traditional data visualisation solution to present the data.
If latency is not an issue, such a solution might be good enough. An other question is which data store solution is necessary to keep up with the high load on write and read. If it is not an RDBMS but an NoSQL database, then not all traditional visualisation tools might already integrate with the specific data store. An other option is to use a Streaming Visualisation solution. They are specially built for streaming data and often do not support batch data. A much better solution would be to have one tool capable of handling both, batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solution and highlights some of the products available to implement these blueprints.
Similar to Confluent kafka meetupseattle jan2017 (20)
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the used of Advanced Process Control at the Wastewater Treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
1. 11Confidential
State of the Streaming
Platform 2017
What’s new in Apache Kafka and the Confluent Platform
David Tucker, Confluent
2. 44Confidential
The shift to streams
“By 2020, 70% of organizations will adopt data
streaming to enable real-time analytics.”1
1: Gartner: Harness Streaming Data for Real-Time Analytics - Nov 2016
2: Forrester’s 2016 Predictions: Turn Data Into Insight And Action - Nov 2015
“Streaming ingestion and analytics will become
a must-have for digital winners.”2
3. 55Confidential
Vision of a Streaming Enterprise
Search
NewSQL / NoSQL
RDBMS Monitoring
Document StoreReal-time Analytics Data Warehouse
Mobile Apps
Legacy Apps
Hadoop
Streaming Platform
4. 66Confidential
What Can You Do with a Streaming Platform ?
• Publish and Subscribe to streams of data
• Analogous to traditional messaging systems
• Store streams of data
• Consumers can look back in time
• Process streams of data
• Analyze and correlate events in real time
5. 77Confidential
The typical integration architecture
Search Security
Fraud Detection Application
User Tracking Operational Logs Operational Metrics
Hadoop
Data
Warehouse
MySQL Cassandra Oracle
App
Databases
Storage
Interfaces
Monitoring
App
Databases
Storage
Interfaces
6. 88Confidential
Challenges abound
Search Security
Fraud Detection Application
User Tracking Operational Logs Operational Metrics
Hadoop
Data
Warehouse
Espresso Cassandra Oracle
App
Databases
Storage
Interfaces
Monitoring
App
Databases
Storage
Interfaces
Difficult to handle
massive amounts of data
Diverse data sets, arriving
at an increasing rate
Many complex data
pipelines
Require a separate
cluster for real-time
Difficult & time
consuming to change
Require mission critical
availability into most
recent/relevant data
7. 99Confidential
Modernized architecture using Apache Kafka
Search Security
Fraud Detection Application
User Tracking Operational Logs Operational MetricsEspresso Cassandra Oracle
Hadoop
Streams API
App
Streams API
Monitoring
App
Data
Warehouse
Apache Kafka
8. 1010Confidential
Challenges addressed by a streaming platform
Search Security
Fraud Detection Application
User Tracking Operational Logs Operational MetricsEspresso Cassandra Oracle
Hadoop
Streams API
App
Streams API
Monitoring
App
Data
Warehouse
Apache Kafka
Rewind data stream to re-
load into any target system
Scale to meet demands
of diverse streams
Pub/sub to data
streams
Lightweight, easy to
modify with minimal
disruption
Decoupled from upstream
apps creating agility Real-time, context specific
data in the moment
9. 1111Confidential
Stream Data is
The Faster the Better
Stream Data can be
Big or Fast (Lambda)
Stream Data will be
Big AND Fast (Kappa)
From Big Data to Stream Data
Apache Kafka is the Enabling Technology of this Transition
Big Data was
The More the Better
ValueofData
Volume of Data
ValueofData
Age of Data
Job 1 Job 2
Streams
Table 1 Table 2
DB
Speed Table Batch Table
DB
Streams Hadoop
10. 1212Confidential
Ingest, Process, Load, and Serve Data at a Global Scale
Data Systeam A
…
Data System B
…
Kafka cluster
Applications
Other data
stores
Kafka cluster
FIX
Raw data / Events
Kafka Streams
(Data Enrichment and Transformation)
Kafka Connect
(Connectors to Extract and Load data)
Confluent
Replicator
Confluent
Replicator
Custom
Replication
Custom
Replication
11. 1313Confidential
Confluent: Enterprise Streaming Platform based on Apache Kafka™
Confluent
Platform
Database
Changes
Log
Events
loT Data
Web
Events
…
CRM
Data
Warehouse
Database
Hadoop
Data
Integration
…
Monitoring
Analytics
Custom Apps
Transformations
Real-time
Applications
…
Apache Open Source Confluent Open Source Confluent Commercial
Confluent Enterprise
Apache Kafka™
Data Compatibility
Monitoring & Administration
Operations
Clients Connectors
Complete
Open
Trusted
Enterprise Grade
13. 1717Confidential
Apache KafkaTM Connect – Streaming Data Capture
JDBC
IRC /
Twitter
MySQL
Elastic
NoSQL
HDFS
Kafka Connect API
Kafka Pipeline
Connector
Connector
Connector
Connector
Connector
Connector
Sources Sinks
Fault tolerant
Manage hundreds of
data sources and sinks
Preserves data schema
Part of Apache Kafka
project
Integrated within
Confluent Platform’s
Control Center
14. 1818Confidential
Apache KafkaTM Connect – Let the framework do the hard work
• Serialization / de-serialization
• Schema Registry integration
• Fault tolerance, automatic fail-over
• Partitioning and scale-out
• … and let the developer focus on domain specific details on copying data
15. 1919Confidential
Kafka Connect Architecture: Logical Model
Connect has three main components: Connectors, Tasks, and Workers
Data flowing into / out of the connectors is a stream; each stream is 1 or more
partitions. In practice, a stream partition could be a database table, a log file, etc.
There may or may not be an exact alignment of streams to Kafka topics.
17. 2121Confidential
Kafka Connect API Library of Connectors
* Denotes Connectors developed at Confluent and distributed by Confluent. Extensive validation and testing has been performed.
Databases
*
Datastore/File Store
*
Analytics
*
Applications / Other
21. 2525Confidential
The Challenge of Data Compatibility at Scale
App 1
App 2
App 3
Many sources without a policy
causes mayhem in a centralized
data pipeline
Ensuring downstream systems
can use the data is key to an
operational stream pipeline
Example: Date formats
Even within a single application,
different formats can be
presented
Incompatibly formatted message
24. 2828Confidential
Architecture of Kafka Streams API, a Part of Apache Kafka
Kafka
Streams API
Producer
Kafka Cluster
Topic TopicTopic
Consumer Consumer
Key benefits
• No additional cluster
• Easy to run as a service
• Supports large aggregations and joins
• Security and permissions fully
integrated from Kafka
Example Use Cases
• Microservices
• Continuous queries
• Continuous transformations
• Event-triggered processes
25. 2929Confidential
Kafka Streams API: the Easiest Way to Process Data in Apache Kafka™
Example Use Cases
• Microservices
• Large-scale continuous queries and transformations
• Event-triggered processes
• Reactive applications
• Customer 360-degree view, fraud detection, location-
based marketing, smart electrical grids, fleet
management, …
Key Benefits of Apache Kafka’s Streams API
• Build Apps, Not Clusters: no additional cluster required
• Elastic, highly-performant, distributed, fault-tolerant,
secure
• Equally viable for small, medium, and large-scale use
cases
• “Run Everywhere”: integrates with your existing
deployment strategies such as containers, automation,
cloud
Your App
Kafka
Streams API
26. 3030Confidential
Architecture Example
Before: Complexity for development and operations, heavy footprint
1 2 3
Capture business
events in Kafka
Must process events with
separate, special-purpose
clusters
Write results
back to Kafka
Your Processing Job
27. 3131Confidential
Architecture Example
With Kafka Streams: App-centric architecture that blends well into your existing infrastructure
1 2 3a
Capture business
events in Kafka
Process events fast, reliably, securely
with standard Java applications
Write results
back to Kafka
Your App
3b
External apps can directly
query the latest results
AppApp
Kafka
Streams API
31. 3636Confidential
Confluent Control Center: Alerting
Alerts
• Configure alerts on incomplete data
delivery, high latency, Kafka connector
status, and more
• Manage alerts for different users and
applications from a web UI
• Manage alerts for different users and
applications from a web UI
User authentication
• Control access to Confluent Control Center
• Integrates with existing enterprise
authentication systems
33. 3838Confidential
Demo Scenario: Multiple Streaming Data Pipelines
• IRC feed of Wikipedia updates
• IRC Source connector publishes real-time stream of Wikipedia updates to Kafka topic
• Kafka Streams application parses records and re-writes to new topic
• Elasticsearch Sink connector indexes parsed data
• Kibana dashboards visualize Wikipedia updates in real time
• Twitter feed augmented with sentiment data
• Twitter Source connector configured to publish data to Kafka topic
• Kafka Streams application strips extraneous twitter fields and adds sentiment score
• Sink connector saves K-Streams output to key-value store (eg Couchbase or DynamoDB)
• Key-value queries can track sentiment trends
35. 4040Confidential
Wikipedia Transformation
• Raw input records
{"createdat":1485386068652,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-
pmtpa","hostname":"special.user"},"message":"[[List of Iranian
Americans]] https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313 *
01:445:4080:1510:F1A4:7C08:B276:FA8B * (+0) /* Media/Journalism */"}
{"createdat":1485386069199,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-
pmtpa","hostname":"special.user"},"message":"[[In the Bleak
Midwinter]] https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970 * Grover cleveland *
(+422) /* Settings */"}
• Parsed records
{"createdat":1485386068652,"wikipage":"List of Iranian
Americans","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/i
ndex.php?diff=761978901&oldid=760575313","username":"01:445:4080:1510:F1A4:7C08:B276:FA8B","bytech
ange":0,"commitmessage":"/* Media/Journalism */"}
{"createdat":1485386069199,"wikipage":"In the Bleak
Midwinter","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/in
dex.php?diff=761978902&oldid=761960970","username":"Grover
cleveland","bytechange":422,"commitmessage":"/* Settings */"}
36. 4141Confidential
Twitter Transformation
• Raw input records
"CreatedAt": 1479252348000,
"Id": 798668350956126200,
"Text": "Iago Aspas pays tribute to #Spain players for making his international debut “easy” vs
#England… https://t.co/G13NUaZj8W",
"Source": "<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>",
"User": { }
128 separate fields
• Filtered records
{"sentiment":"Negative","sentimentScore":1,"UserName":"tits","CreatedAt":1485387765000,"Text":"RT
@STsportsdesk: Football: Real Madrid eliminated from #CopaDelRey by Celta Vigo
https://t.co/QfCLayqRsH
https://t.co/53GWANPDXj","id":"824402156707049475","UserScreenName":"titusanghongwen"}