This document discusses cloud-native Apache Kafka and Kubernetes integrations. It introduces Apache Kafka as a proven technology for real-time data processing and describes how it is used for digital experiences, microservices applications, streaming ETL, and real-time analytics. It then makes the case for using a cloud-native, managed Kafka platform like Red Hat OpenShift Streams, which reduces complexity by providing a managed Apache Kafka cluster. Finally, it provides an overview of Red Hat OpenShift Streams and demonstrates how to use Kamelets to easily integrate and develop stream-based applications on Kubernetes.
New Features in Confluent Platform 6.0 / Apache Kafka 2.6 | Kai Wähner
New Features in Confluent Platform 6.0 / Apache Kafka 2.6, including REST Proxy and API, Tiered Storage for AWS S3 and GCP GCS, Cluster Linking (On-Premise, Edge, Hybrid, Multi-Cloud), Self-Balancing Clusters, and ksqlDB.
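As a rough illustration of the REST Proxy mentioned above, the sketch below posts a JSON record to a topic over HTTP. The proxy address, topic name, and payload are illustrative assumptions; the v2 JSON content type is the one the proxy's produce endpoint expects.

```python
# Minimal sketch: produce a JSON record through the Kafka REST Proxy.
# Assumes a REST Proxy listening on localhost:8082 and a topic named "orders".
import json
import requests

REST_PROXY = "http://localhost:8082"   # assumed proxy address
TOPIC = "orders"                        # hypothetical topic

payload = {"records": [{"value": {"order_id": 42, "amount": 9.99}}]}
resp = requests.post(
    f"{REST_PROXY}/topics/{TOPIC}",
    headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    data=json.dumps(payload),
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # partition/offset information for the written record
```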
Concepts and Patterns for Streaming Services with Kafka | QAware GmbH
Cloud Native Night March 2020, Mainz: Talk by Perry Krol (@perkrol, Confluent)
Abstract: Proven approaches such as service-oriented and event-driven architectures are joined by newer techniques such as microservices, reactive architectures, DevOps, and stream processing. Many of these patterns are successful by themselves, but they provide a more holistic and compelling approach when applied together. In this session Confluent will provide insights into how service-based architectures and stream processing tools such as Apache Kafka® can help you build business-critical systems. You will learn why streaming beats request-response based architectures in complex, contemporary use cases, and why replayable logs such as Kafka provide a backbone for both service communication and shared datasets.
Based on these principles, we will explore how event collaboration and event sourcing patterns increase safety and recoverability with functional, event-driven approaches, how to apply patterns including Event Sourcing and CQRS, and how to build multi-team systems with microservices and SOA using patterns such as “inside-out databases” and “event streams as a source of truth”.
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa... | Michael Noll
Talk URL: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/77360
Abstract: Would you cross the street with traffic information that’s a minute old? Certainly not. Modern businesses have the same needs nowadays, whether it’s due to competitive pressure or because their customers have much higher expectations of how they want to interact with a product or service. At the heart of this movement are events: in today’s digital age, events are everywhere. Every digital action, from online purchases to ride-sharing requests to bank deposits, creates a set of events around transaction amount, transaction time, user location, account balance, and much more. The technology that allows businesses to read, write, store, and process these events in real time is the event-streaming platform, and tens of thousands of companies like Netflix, Audi, PayPal, Airbnb, Uber, and Pinterest have picked Apache Kafka as the de facto choice to implement event-driven architectures and reshape their industries.
Michael Noll explores why and how you can use Apache Kafka and its growing ecosystem to build event-driven architectures that are elastic, scalable, robust, and fault tolerant, whether it’s on-premises, in the cloud, on bare metal machines, or in Kubernetes with Docker containers. Specifically, you’ll look at Kafka as the storage and publish and subscribe layer; Kafka’s Connect framework for integrating external data systems such as MySQL, Elastic, or S3 with Kafka; and Kafka’s Streams API and KSQL as the compute layer to implement event-driven applications and microservices in Java and Scala and streaming SQL, respectively, that process the events flowing through Kafka in real time. Michael provides an overview of the most relevant functionality, both current and upcoming, and shares best practices and typical use cases so you can tie it all together for your own needs.
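Kafka Streams and KSQL themselves are JVM- and SQL-based, so purely as a rough, language-neutral illustration of the consume-transform-produce idea behind that compute layer, here is a sketch using the confluent-kafka Python client; the broker address, topic names, and the enrichment rule are assumptions.

```python
# Sketch of the "compute layer" idea with a plain Python client:
# consume events, apply a transformation, and produce the result to another topic.
# Broker address and topic names are assumptions; requires the confluent-kafka package.
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "enricher",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["payments"])  # hypothetical input topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        event["amount_eur"] = round(event["amount_usd"] * 0.92, 2)  # toy enrichment rule
        producer.produce("payments-enriched", json.dumps(event).encode("utf-8"))
        producer.poll(0)  # serve delivery callbacks
finally:
    consumer.close()
    producer.flush()
```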
Event streaming: A paradigm shift in enterprise software architecture | Sina Sojoodi
This talk helps developers and architects understand the benefits, opportunities and challenges in moving from traditional point-to-point integration in application architecture to one with event streaming. Apache Kafka and Spring provide a solid foundation for enterprise and large organizations to implement event streaming solutions. Examples and common patterns are covered towards the end.
Many thanks to James Watters and all the original content authors, editors and aggregators referenced in the slides.
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services | confluent
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services, Perry Krol, Head of Systems Engineering, CEMEA, Confluent
https://www.meetup.com/Frankfurt-Apache-Kafka-Meetup-by-Confluent/events/269751169/
Battle-tested event-driven patterns for your microservices architecture - Sca... | Natan Silnitsky
During the past couple of years I have implemented, or witnessed implementations of, several key event-driven messaging patterns on top of Kafka. These patterns have helped us create a robust distributed microservices system at Wix that can easily handle increasing traffic and storage needs across many different use cases.
In this talk I will share these patterns with you, including:
* Consume and Project (data decoupling)
* End-to-end Events (Kafka+websockets)
* In memory KV stores (consume and query with 0-latency)
* Event transactions (Exactly Once Delivery)
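To make the first pattern above concrete, here is a rough Consume and Project sketch: a consumer keeps only the fields a read side needs and maintains them in an in-memory key-value view. The broker address, topic, and field names are illustrative assumptions, and the snippet uses the confluent-kafka client.

```python
# Consume and Project (sketch): build a narrow, query-optimized view from a wide event stream.
# Broker, topic, and field names are illustrative assumptions; uses confluent-kafka.
import json
from confluent_kafka import Consumer

view = {}  # in-memory KV store: user_id -> projected fields, queryable with no extra hop

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "profile-view",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["user-events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    # Project only what the read side needs, ignoring the rest of the payload.
    view[event["user_id"]] = {
        "display_name": event.get("display_name"),
        "plan": event.get("plan"),
    }
```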
What is Apache Kafka and What is an Event Streaming Platform? | confluent
Speaker: Gabriel Schenker, Lead Curriculum Developer, Confluent
Streaming platforms have emerged as a popular, new trend, but what exactly is a streaming platform? Part messaging system, part Hadoop made fast, part fast ETL and scalable data integration. With Apache Kafka® at the core, event streaming platforms offer an entirely new perspective on managing the flow of data. This talk will explain what an event streaming platform such as Apache Kafka is and some of the use cases and design patterns around its use—including several examples of where it is solving real business problems. New developments in this area such as KSQL will also be discussed.
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the... | Natan Silnitsky
Kafka is the bedrock of Wix's distributed microservices system. For the last 5 years we have learned a lot about how to successfully scale our event-driven architecture to roughly 1500 microservices.
We’ve managed to achieve higher decoupling and independence for our various services and dev teams that have very different use-cases while maintaining a single uniform infrastructure in place.
In these slides you will learn about 8 key decisions and steps you can take in order to safely scale up your Kafka-based system. These include:
* How to increase dev velocity of event-driven style code.
* How to optimize working with Kafka in a polyglot setting.
* How to support a growing amount of traffic and a growing number of developers.
1) Sam Vanhoutte discusses using Azure services like IoT Edge, IoT Hub, Stream Analytics, and Azure Databricks for real-time data analytics in IoT from edge to cloud.
2) A traffic camera scenario is presented where IoT Edge is used at the edge for tasks like license plate recognition while the cloud is used for analytics like detecting speeding tickets and suspicious vehicles.
3) Stream Analytics is used both at the edge and in the cloud to process streaming data in real-time while Azure Databricks is used for structured streaming and continuous aggregations using Apache Spark.
Real-time data has traditionally been analyzed using batch processing in DWH/Hadoop environments. Common use cases involve data lakes, data science, and machine learning (ML). Creating serverless data-driven architectures and serverless streaming solutions with services like Amazon Kinesis, AWS Lambda, and Amazon Athena can solve real-time ingestion, storage, and analytics challenges, and help you focus on application logic without managing infrastructure. Learn design patterns and best practices for serverless stream processing.
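To make the serverless stream-processing idea concrete, here is a minimal sketch of an AWS Lambda handler for Kinesis records. The records arrive base64-encoded in the standard Kinesis event shape; the field names and alerting rule are invented for illustration.

```python
# Sketch of a serverless stream processor: an AWS Lambda handler for Kinesis records.
# The event structure is the standard Kinesis -> Lambda shape; field names and the
# threshold are illustrative assumptions.
import base64
import json

def handler(event, context):
    alerts = 0
    records = event.get("Records", [])
    for record in records:
        payload = base64.b64decode(record["kinesis"]["data"])
        reading = json.loads(payload)
        if reading.get("temperature_c", 0) > 90:  # toy real-time rule
            alerts += 1
            print(f"ALERT sensor={reading.get('sensor_id')} temp={reading['temperature_c']}")
    return {"processed": len(records), "alerts": alerts}
```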
GCP for Apache Kafka® Users: Stream Ingestion and Processing | confluent
Watch this talk here: https://www.confluent.io/online-talks/gcp-for-apache-kafka-users-stream-ingestion-processing
In private and public clouds, stream analytics commonly means stateless processing systems organized around Apache Kafka® or a similar distributed log service. GCP took a somewhat different tack, with Cloud Pub/Sub, Dataflow, and BigQuery, distributing the responsibility for processing among ingestion, processing and database technologies.
We compare the two approaches to data integration and show how Dataflow allows you to join, transform, and deliver data streams among on-prem and cloud Apache Kafka clusters, Cloud Pub/Sub topics, and a variety of databases. The session will have a mix of architectural discussions and practical code reviews of Dataflow-based pipelines.
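As a rough sketch of the Dataflow-centric approach described above, the following Apache Beam (Python SDK) pipeline reads from Cloud Pub/Sub, reshapes each message, and writes to BigQuery; the project, subscription, table, and schema names are placeholders.

```python
# Sketch of a Dataflow-style pipeline with the Apache Beam Python SDK:
# Pub/Sub in, light transformation, BigQuery out. Resource names are placeholders.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # add --runner=DataflowRunner etc. when deploying

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadPubSub" >> beam.io.ReadFromPubSub(
              subscription="projects/my-project/subscriptions/clicks-sub")
        | "Parse" >> beam.Map(json.loads)
        | "Shape" >> beam.Map(lambda e: {"user": e["user"], "url": e["url"]})
        | "WriteBQ" >> beam.io.WriteToBigQuery(
              "my-project:analytics.clicks",
              schema="user:STRING,url:STRING",
              write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```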
How to build 1000 microservices with Kafka and thrive | Natan Silnitsky
This talk is about the Wix ecosystem for event driven architecture on top of Kafka.
I share the best practices, SDKs and tools we have created in order to be able to scale our distributed system to more than 1000 microservices.
Should we manage events like APIs? | Kim Clark, IBM | Hosted by Confluent
APIs have become ubiquitous as a way of exposing the capabilities of the enterprise both internally and externally. However, are APIs alone enough? There is a strong resurgence in interest in asynchronous communication and event-driven architecture. Applications want to receive events immediately so they can respond in real time, and furthermore they also want the benefit of being decoupled from the availability and performance characteristics of the systems providing that data. However, whilst the way that APIs are socialized, exposed, and versioned is well matured in the form of API management technology, we are only now on the cusp of seeing first-class support for event endpoint management to provide the same sophistication for discovering, exposing, and consuming events.
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka | Kai Wähner
Apache Kafka and Event Streaming are two of the most relevant buzzwords in tech these days. Ever wonder what the predicted TOP 5 Event Streaming Architectures and Use Cases for 2021 are? Check out the following presentation. Learn about edge deployments, hybrid and multi-cloud architectures, service mesh-based microservices, streaming machine learning, and cybersecurity.
On-demand video recording: https://videos.confluent.io/watch/XAjxV3j8hzwCcEKoZVErUJ
Apache Kafka as Event Streaming Platform for Microservice Architectures | Kai Wähner
This session introduces Apache Kafka, an event-driven open source streaming platform. Apache Kafka goes far beyond scalable, high volume messaging. In addition, you can leverage Kafka Connect for integration and the Kafka Streams API for building lightweight stream processing microservices in autonomous teams. The Confluent Platform adds further components such as a Schema Registry, REST Proxy, KSQL, Clients for different programming languages and Connectors for different technologies.
The session discusses how tech giants like LinkedIn, eBay, or Airbnb leverage Apache Kafka as an event streaming platform to solve various business problems and how to create a scalable, flexible microservice architecture. A live demo shows how you can easily process and analyze streams of events using Apache Kafka and KSQL.
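To show what Kafka Connect based integration can look like in practice, the sketch below registers a hypothetical JDBC source connector through the Connect REST API; the worker address, connector class, and connection settings are assumptions that depend on which plugins are installed.

```python
# Sketch: register a source connector via the Kafka Connect REST API.
# The Connect worker address is assumed to be localhost:8083; the connector class
# and connection settings are illustrative and depend on the installed plugins.
import requests

connector = {
    "name": "orders-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:mysql://db:3306/shop",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "mysql-",
        "tasks.max": "1",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
print(resp.json()["name"], "registered")
```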
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi... | confluent
Watch this talk here: https://www.confluent.io/online-talks/using-apache-kafka-to-optimize-real-time-analytics-financial-services-iot-applications
When it comes to the fast-paced nature of capital markets and IoT, the ability to analyze data in real time is critical to gaining an edge. It’s not just about the quantity of data you can analyze at once, it’s about the speed, scale, and quality of the data you have at your fingertips.
Modern streaming data technologies like Apache Kafka and the broader Confluent platform can help detect opportunities and threats in real time. They can improve profitability, yield, and performance. Combining Kafka with Panopticon visual analytics provides a powerful foundation for optimizing your operations.
Use cases in capital markets include transaction cost analysis (TCA), risk monitoring, surveillance of trading and trader activity, compliance, and optimizing profitability of electronic trading operations. Use cases in IoT include monitoring manufacturing processes, logistics, and connected vehicle telemetry and geospatial data.
This online talk will include in-depth practical demonstrations of how Confluent and Panopticon together support several key applications. You will learn:
-Why Apache Kafka is widely used to improve performance of complex operational systems
-How Confluent and Panopticon open new opportunities to analyze operational data in real time
-How to quickly identify and react immediately to fast-emerging trends, clusters, and anomalies
-How to scale data ingestion and data processing
-How to build new analytics dashboards in minutes
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ... | confluent
A powerful stream processing platform and an end-user-friendly spreadsheet interface: if this combination rings a bell, you should definitely attend our “Streamsheets and Apache Kafka” webinar. While development is interactive with a web user interface, Streamsheets applications can run as mission-critical applications. They directly consume and produce event streams in Apache Kafka. One popular option is to run everything in the cloud, leveraging the fully managed Confluent Cloud service on AWS, GCP, or Azure. Without any coding or scripting, end users leverage their existing spreadsheet skills to build customized streaming apps for analysis, dashboarding, condition monitoring, or any kind of real-time pre- and post-processing of Kafka or ksqlDB streams and tables.
Hear Kai Waehner of Confluent and Kristian Raue of Cedalo on these topics:
• Where Apache Kafka and Streamsheets fit in the data ecosystem (Industrial IoT, Smart Energy, Clinical Applications, Finance Applications)
• Customer Story: How the Freiburg University Hospital uses Kafka and Streamsheets for dashboarding the utilization of clinical assets
• 15-Minute Live Demonstration: Building a financial fraud detection dashboard based on Confluent Cloud, ksqlDB and Cedalo Cloud Streamsheets, just using spreadsheet formulas.
Speaker:
Kai Waehner, Technology Evangelist, Confluent
Kristian Raue, Founder & Chief Technologist, cedalo
Battle Tested Event-Driven Patterns for your Microservices Architecture - Dev... | Natan Silnitsky
During the past couple of years I have implemented, or witnessed implementations of, several key event-driven messaging patterns on top of Kafka. These patterns have helped us create a robust distributed microservices system at Wix that can easily handle increasing traffic and storage needs across many different use cases.
In this talk I will share these patterns with you, including:
* Consume and Project (data decoupling)
* End-to-end Events (Kafka+websockets)
* In memory KV stores (consume and query with 0-latency)
* Event transactions (Exactly Once Delivery)
Serverless London 2019: FaaS composition using Kafka and CloudEvents | Neil Avery
FaaS composition using Kafka and Cloud-Events
LOCATION: Burton & Redgrave, DATE: November 7, 2019, TIME: 2:30 pm - 3:15 pm
https://serverlesscomputing.london/sessions/faas-composition-using-kafka-and-cloud-events/
Serverless functions, or FaaS, are all the rage. By leveraging well-established event-driven microservice design principles and applying them to serverless functions, we can build a homogeneous ecosystem to run FaaS applications.
Kafka’s natural ability to store and replay events means serverless functions can not only be replayed, but can also be used to choreograph call chains or be driven using orchestration. Kafka also means we can democratize and organize FaaS environments in a way that scales across the enterprise.
Underpinning this mantra is the use of Cloud Events by the CNCF serverless working group (of which Confluent is an active member).
Objective of the talk
You will leave the talk with an understanding of what the future of cloud holds, a methodology for embracing serverless functions and how they become part of your journey to a cloud-native, event-driven architecture.
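As a rough sketch of the CloudEvents-on-Kafka idea behind this talk, the snippet below builds a CloudEvent with the Python cloudevents SDK, serializes it in structured mode, and publishes it with confluent-kafka; the event type, source, topic, and broker address are invented, and the exact import paths may vary between SDK versions.

```python
# Sketch: publish a CloudEvent to Kafka in structured mode.
# Uses the cloudevents and confluent-kafka packages; type, source, topic, and broker
# address are illustrative assumptions.
from cloudevents.http import CloudEvent
from cloudevents.conversion import to_structured
from confluent_kafka import Producer

event = CloudEvent(
    {"type": "com.example.order.created", "source": "orders-service"},
    {"order_id": 42, "amount": 9.99},
)
headers, body = to_structured(event)  # JSON body plus content-type header

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce(
    "order-events",
    value=body,
    headers=list(headers.items()),
)
producer.flush()
```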
Why Cloud-Native Kafka Matters: 4 Reasons to Stop Managing it Yourself | DATAVERSITY
The document discusses 4 reasons to use a cloud-native Kafka service like Confluent Cloud instead of managing Kafka yourself. It notes that managing Kafka requires significant investment of time and resources for tasks like architecture planning, cluster sizing, software upgrades, and more. A cloud-native service handles all operational overhead automatically so you can focus on your core business. Confluent Cloud specifically offers elastic scaling, infinite data retention, global access across clouds, and integrations to make it a complete data streaming platform.
Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka | Kai Wähner
Spoilt for Choice – Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka:
Apache Kafka is a de facto standard streaming data processing platform. It is widely deployed as an event streaming platform. Part of Kafka is its stream processing API, “Kafka Streams”. In addition, the Kafka ecosystem now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax.
This session discusses and demos the pros and cons of Kafka Streams and KSQL to understand when to use which stream processing alternative for continuous stream processing natively on Apache Kafka infrastructures. The end of the session compares the trade-offs of Kafka Streams and KSQL to separate stream processing frameworks such as Apache Flink or Spark Streaming.
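KSQL itself is just SQL text submitted to a server, so as a small sketch of that side of the comparison, the snippet below sends a stream definition and a filtering query to a ksqlDB server's REST endpoint from Python; the server address, stream, column, and topic names are assumptions.

```python
# Sketch: submit KSQL statements to a ksqlDB server over its REST API.
# Server address, stream names, columns, and topic are illustrative assumptions.
import requests

KSQL = """
CREATE STREAM payments (user_id VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC='payments', VALUE_FORMAT='JSON');

CREATE STREAM large_payments AS
  SELECT user_id, amount FROM payments WHERE amount > 1000 EMIT CHANGES;
"""

resp = requests.post(
    "http://localhost:8088/ksql",
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    json={"ksql": KSQL, "streamsProperties": {}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```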
Bridge Your Kafka Streams to Azure Webinar | confluent
With a fully managed Apache Kafka(R) as-a-service on Microsoft Azure, businesses can focus on building applications and not managing clusters. Build a persistent bridge from on-premises data systems to the cloud with a hybrid Kafka service or stream across public clouds for multi-cloud data pipelines.
In this session for business and technical data leaders, you can learn about powering business applications with the managed Kafka service that streams data into Azure SQL Data Warehouse, Cosmos DB, Azure Data Lake Storage and Azure Blob Storage.
More info: https://cnfl.io/cloud-native-experience-for-kafka-in-cloud | Neha Narkhede is co-founder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Prior to founding Confluent, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s streaming infrastructure built on top of Apache Kafka and Apache Samza. She is one of the initial authors of Apache Kafka and a committer and PMC member on the project.
Talk Python To Me: Stream Processing in your favourite Language with Beam on ... | Aljoscha Krettek
Flink is a great stream processor, Python is a great programming language, Apache Beam is a great programming model and portability layer. Using all three together is a great idea! We will demo and discuss writing Beam Python pipelines and running them on Flink. We will cover Beam's portability vision that led here, what you need to know about how Beam Python pipelines are executed on Flink, and where Beam's portability framework is headed next (hint: Python pipelines reading from non-Python connectors)
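A minimal sketch of what running a Beam Python pipeline on Flink can look like, assuming a locally reachable Flink cluster and the portable runner setup the talk describes; the master address and the toy word-count data are placeholders.

```python
# Sketch: a tiny Beam Python pipeline configured for the (portable) Flink runner.
# The Flink master address is an assumption; point it at your own cluster.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=localhost:8081",
    "--environment_type=LOOPBACK",  # run Python user code in-process for local testing
])

with beam.Pipeline(options=options) as p:
    (
        p
        | beam.Create(["kafka", "flink", "beam", "beam"])
        | beam.Map(lambda w: (w, 1))
        | beam.CombinePerKey(sum)
        | beam.Map(print)
    )
```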
What you need to know about .NET Core 3.0 and beyond | Jon Galloway
The document provides an overview of .NET Core 3.0 including its top features, upcoming release schedule, and what is coming next. It discusses the key features in .NET Core 3.0 such as Windows desktop apps, microservices, gRPC, and machine learning. It also outlines the future of .NET with .NET 5 which will unify the different .NET implementations into a single platform.
Flink Forward Berlin 2017: Aljoscha Krettek - Talk Python to me: Stream Proce... | Flink Forward
Flink is a great stream processor, Python is a great programming language, Apache Beam is a great programming model and portability layer. Using all three together is a great idea! We will demo and discuss writing Beam Python pipelines and running them on Flink. We will cover Beam's portability vision that led here, what you need to know about how Beam Python pipelines are executed on Flink, and where Beam's portability framework is headed next (hint: Python pipelines reading from non-Python connectors)
Apache Pulsar Development 101 with Python | Timothy Spann
Apache Pulsar Development 101 with Python PS2022_Ecosystem_v0.0
There is always the fear that a speaker cannot make it, so, as the MC for the ecosystem track, I put together a backup talk just in case.
Here it is: never seen or presented.
Flink Forward Berlin 2018: Thomas Weise & Aljoscha Krettek - "Python Streamin... | Flink Forward
Python is popular amongst data scientists and engineers for data processing tasks. The big data ecosystem has traditionally been rather JVM centric. Often Java (or Scala) is the only viable option to implement data processing pipelines. That sometimes poses an adoption barrier for organizations that have already invested in other language ecosystems. The Apache Beam project provides a unified programming model for data processing and its ongoing portability effort aims to enable multiple language SDKs (currently Java, Python and Go) on a common set of runners. The combination of Python streaming on the Apache Flink runner is one example. Let’s take a look at how the Flink runner translates the Beam model into the native DataStream (or DataSet) API, how the runner is changing to support portable pipelines, how Python user code execution is coordinated with gRPC based services and how a sample pipeline runs on Flink.
Workshop híbrido: Stream Processing con Flink | confluent
Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines.
It enables greater data portability, optimized resource utilization, and a better customer experience by processing data streams in real time.
In our hands-on hybrid workshop, you will learn how to easily filter, join, and enrich real-time data within Confluent Cloud using our serverless Flink service.
Serverless Event Streaming Applications as Functions on K8 | DoKC
We will walk through how to build serverless event streaming applications as functions running in a function mesh on kubernetes with cloud native messaging via Apache Pulsar.
In this talk, you will deploy ML functions to transform real-time data on Kubernetes.
This talk was given by Timothy Spann for DoK Day Europe @ KubeCon 2022.
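For a feel of what a Pulsar Function looks like in Python, here is a minimal transformation function in the style of the Pulsar Functions SDK; the class name and the transformation itself are invented, and deployment details (function mesh configuration, CLI flags) are left out.

```python
# Sketch of a Pulsar Function in Python: transform each incoming message and
# return the result, which Pulsar publishes to the function's configured output topic.
# Class name and message format are illustrative assumptions.
from pulsar import Function

class Exclaim(Function):
    def process(self, input, context):
        # Log the incoming payload, then return the transformed value.
        context.get_logger().info("received: %s", input)
        return input + "!"
```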
Ballerina: A Cloud Native Programming Language | WSO2
This slide deck explores Ballerina features and development - most of the talk will be a live demo together with a discussion on the motivation for creating a new programming language and the design inspirations.
SuperConnectivity: One company’s heroic mission to deliver on the promises of... | 4DK Technologies, Inc.
A high level deck illustrating 4DK's SuperConnectivity product suite. Suitable for product managers in the wireless industry, including network and device executives.
CocoaConf: The Language of Mobile Software is APIs | Tim Burks
We’re all excited about using the same language to write our mobile apps and cloud services, but as we do, we’ll still need to work with a few things that aren’t written with Swift. Fortunately, there are some great patterns that we can use for doing that. In this session we’ll talk about two technologies that you can use to make your app speak with APIs written in any language: OpenAPI and Protocol Buffers, and then we’ll see how to use them from clients and servers that are written in Swift.
Presented Friday November 4, 2016 in San Jose.
Serverless Event Streaming Applications as Functions on K8 | Timothy Spann
This document discusses Apache Pulsar, a cloud-native messaging and event streaming platform. It provides an overview of key Pulsar concepts including messaging vs streaming, the Pulsar cluster architecture using brokers and bookies, and Pulsar Functions which allow processing data streams using multiple programming languages. Examples of using Pulsar Functions with Java, Python and deploying on Kubernetes are also presented. Benefits of using Pulsar for building microservices, asynchronous communication, real-time applications and tiered storage are highlighted.
Serverless on Google Cloud covers a lot: compute, Cloud Functions, Cloud Run, App Engine, containers, Kubernetes, Firebase, and much more. We'll also cover storage, containers vs. apps vs. functions, and ML and AI.
Portable batch and streaming pipelines with Apache Beam (Big Data Application... | Malo Denielou
Apache Beam is a top-level Apache project which aims at providing a unified API for efficient and portable data processing pipelines. Beam handles both batch and streaming use cases and neatly separates properties of the data from runtime characteristics, allowing pipelines to be portable across multiple runtimes, both open-source (e.g., Apache Flink, Apache Spark, Apache Apex, ...) and proprietary (e.g., Google Cloud Dataflow). This talk will cover the basics of Apache Beam, describe the main concepts of the programming model and talk about the current state of the project (new Python support, first stable version). We'll illustrate the concepts with a use case running on several runners.
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ... | Flink Forward
This document introduces Apache Beam, a unified model for batch and stream processing, and discusses its portability across languages and backends. It also introduces TFX, a TensorFlow tool for building end-to-end machine learning pipelines that addresses data collection, preprocessing, analysis, serving, and monitoring using components like TensorFlow Transform and TensorFlow Model Analysis. A demo of TFX's model analysis capabilities on a Chicago taxi dataset is provided.
Creating microservices architectures using node.js and Kubernetes | Paul Goldbaum
This document discusses creating microservices architectures using Node.js and Kubernetes. It covers the author's migration from a monolithic PHP stack to a microservices architecture using Node.js, Express, MongoDB, Kubernetes, and other tools. Key points include choosing programming languages, communicating between microservices, deploying services to Kubernetes, monitoring services, and creating a custom microservices framework to make developing services easier.
What I learned about APIs in my first year at Google | Tim Burks
Tim Burks spent a decade building Electronic Design Automation systems and another building mobile apps. Now he's focused on the thing that holds them all together: APIs. In 2016 he joined Google where he works on open source software that helps developers use gRPC and OpenAPI.
Rpc Case Studies (Distributed computing) | Sri Prasanna
The document provides an overview of various remote procedure call (RPC) systems including Sun RPC, DCE RPC, DCOM, CORBA, and Java RMI. It summarizes the key aspects of each system such as how interfaces are defined, how clients locate and invoke remote objects, how data is marshaled and transported, and improvements made in newer systems over older ones.
Amit Kumar is a technical professional with 3+ years of experience in Spark, Scala, Java, Hadoop and AWS. He has experience developing data ingestion frameworks using these technologies. His current project involves ingesting data from multiple sources into AWS S3 and creating a golden record for each customer. He is responsible for data quality checks, creating jobs to ingest and process the data, and automating the workflow using AWS Lambda and EMR. Previously he has worked on projects involving data migration from Teradata to Hadoop, converting graphs to XML/Java code to replicate workflows, and developing software for aircraft cabin systems.
Similar to Stream processing for the masses with beam, python and flink (20)
Artificia Intellicence and XPath Extension FunctionsOctavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to be helpful by students
in learning programming -- could variable roles help deep neural models in
performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
Odoo ERP software
Odoo ERP software, a leading open-source software for Enterprise Resource Planning (ERP) and business management, has recently launched its latest version, Odoo 17 Community Edition. This update introduces a range of new features and enhancements designed to streamline business operations and support growth.
The Odoo Community serves as a cost-free edition within the Odoo suite of ERP systems. Tailored to accommodate the standard needs of business operations, it provides a robust platform suitable for organisations of different sizes and business sectors. Within the Odoo Community Edition, users can access a variety of essential features and services essential for managing day-to-day tasks efficiently.
This blog presents a detailed overview of the features available within the Odoo 17 Community edition, and the differences between Odoo 17 community and enterprise editions, aiming to equip you with the necessary information to make an informed decision about its suitability for your business.
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar’s Aviation Industry, Quarterly Incident Report, provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
Hand Rolled Applicative User ValidationCode KataPhilip Schwarz
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather, to provide a small, rough-and ready exercise to reinforce your muscle-memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended by your needs. This session will showcase various tooling extensions which can boost your development experience by far so that you can really work offline, transpile your code in your project to use even newer versions of EcmaScript (than 2022 which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, using different kind of proxies, and even stitching UI5 projects during development together to mimic your target environment.
DDS Security Version 1.2 was adopted in 2024. This revision strengthens support for long runnings systems adding new cryptographic algorithms, certificate revocation, and hardness against DoS attacks.
WhatsApp offers simple, reliable, and private messaging and calling services for free worldwide. With end-to-end encryption, your personal messages and calls are secure, ensuring only you and the recipient can access them. Enjoy voice and video calls to stay connected with loved ones or colleagues. Express yourself using stickers, GIFs, or by sharing moments on Status. WhatsApp Business enables global customer outreach, facilitating sales growth and relationship building through showcasing products and services. Stay connected effortlessly with group chats for planning outings with friends or staying updated on family conversations.
5. STREAMS POWER YELP
Powered by streaming:
Notifications
Real-time visit detection
User Search
Indexing pipeline
User personalization
Purchase flows
ML feature ETL
Ads
Product development
Transactions
Real-time campaign shut-off
Experimentation infrastructure and guardrail metrics
8. DATA PIPELINE
Strong data schematization and documentation
Standardized wire protocol (Avro)
Contract between data producers and consumers
Centralized schema registry
Decouple data ETL
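To make the producer/consumer contract concrete, here is a minimal sketch of the kind of Avro record schema that would live in the centralized schema registry; the record and field names are illustrative, not Yelp's actual schemas.

review_schema = {
    "type": "record",
    "name": "ReviewMessage",            # illustrative name, not a real Yelp schema
    "namespace": "datapipeline.example",
    "fields": [
        {"name": "business_id", "type": "string"},
        {"name": "rating", "type": "double"},
        {"name": "timestamp_micros", "type": "long"},
    ],
}

Once a schema like this is registered, producers and consumers resolve it from the registry, so the wire format stays compact and the contract between teams is enforced in one place.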
9. PROCESSOR: Paastorm
Paastorm was Yelp's answer to the lack of good open source Python stream processors
Paastorm provides a thin wrapper around the Kafka producer/consumer
Good fit to perform map/flatmap transformations
10. PROCESSOR: Paastorm Adoption
The Paastorm API is a class called Spolt
Users extend the Spolt and implement process_message(self, message)
Over 150 production Paastorm applications
11. PROCESSOR: Paastorm Code

class GreatReviewsSpolt(Spolt):
    def process_message(self, message):
        payload = message.payload_data
        if payload['rating'] >= 4.0:
            yield message

if __name__ == '__main__':
    Paastorm(GreatReviewsSpolt()).run()
12. Need shiny new tools to leverage the value of real-time data
13. 2017: Apache Flink
Unlocked real-time data processing at scale
Stateful processing
Powerful streaming-oriented (DataStream) API
Event-time processing
16. LIMITATIONS
Tightly coupled to Kafka
No high-level primitives: groupBy, windowing, filter, etc.
No stateful processing support
High cost to implement and maintain new features
17. CHALLENGES
Hundreds of Python libraries implementing business logic
Flink SQL is good mostly for simple SQL-like transformations
High barrier to entry for JVM languages
19. BEAM
Programming model
Driver program: responsible for defining the pipeline in the Beam SDK
Pipeline: represents the logical data processing tasks
Execution: runs on any supported distributed processing framework, and more ...
21. BEAM Python SDK
High-level API: ParDo, Map, FlatMap, Filter, GroupByKey, CoGroupByKey
Support for Windows and Triggers: fixed, sliding and session windows and a variety of triggers
Side inputs and tagged outputs: ParDo with two or more inputs and two or more outputs
State and Timers: can be combined to build complex stateful applications
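As a concrete illustration of these primitives (not code from the deck), the following minimal Beam Python pipeline combines Create, Map, Filter, WindowInto and GroupByKey; the data, timestamps and step names are made up.

import apache_beam as beam
from apache_beam.transforms import window

with beam.Pipeline() as p:  # DirectRunner by default
    (p
     | "Reviews" >> beam.Create([("biz-1", 5.0, 10), ("biz-1", 3.0, 30), ("biz-2", 4.5, 200)])
     | "Stamp" >> beam.Map(lambda r: window.TimestampedValue((r[0], r[1]), r[2]))  # assign event timestamps
     | "OnlyGreat" >> beam.Filter(lambda kv: kv[1] >= 4.0)                          # keep ratings >= 4.0
     | "Window" >> beam.WindowInto(window.FixedWindows(60))                         # 60-second fixed windows
     | "Group" >> beam.GroupByKey()
     | "Count" >> beam.Map(lambda kv: (kv[0], len(list(kv[1]))))                    # great reviews per business per window
     | "Print" >> beam.Map(print))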
22. EXECUTION: Beam Portability API
Portability API: defines the protocols used by the runner (e.g. Flink) to translate and run the pipeline
Python SDK: makes use of a containerized SDK harness to run language-specific UDFs
Fn API: relies on gRPC for runner - SDK worker communication
[Diagram: pipelines are constructed against the Beam Model in a language SDK (Beam Java, Beam Python, other languages), translated by the Beam Model Fn Runners, and executed on Apache Flink, Apache Spark, or Cloud Dataflow]
24. INTEGRATION: Data Pipeline Source and Sink
Yelp-specific implementation to discover Kafka clusters
Team expertise around the Flink Consumer/Producer
Customize the portable translation to "attach" existing Flink components to a Beam pipeline
[Diagram: Flink DPSource -> Coder deserializer -> Beam pipeline -> Coder serializer -> Flink DPSink, with WindowedValue<byte[]> crossing each boundary]
25. INTEGRATION: Invoking Flink code
PBegin for the source and PDone for the sink
Can use JSON to pass parameters from Python Beam to Java Flink
A Beam urn identifies a PTransform during the translation

class FlinkYelpDatapipelineSource(PTransform):
    def expand(self, pbegin):
        return pvalue.PCollection(pbegin.pipeline)

    def infer_output_type(self, unused_input_type):
        return Message

    def to_runner_api_parameter(self, context):
        api_parameters = ('yelp:flinkYelpDatapipelineSource', json.dumps({...}))
        return api_parameters

    @staticmethod
    @PTransform.register_urn('yelp:flinkYelpDatapipelineSource', None)
    def from_runner_api_parameter(spec_parameter, _unused_context):
        instance = FlinkYelpDatapipelineSource()
        params = json.loads(spec_parameter)
        ....
        return instance
26. INTEGRATION: Translate to a Flink operator
Fork FlinkStreamingPortablePipelineTranslator.java
Add your urn and translation function to the translatorMap
The result of the translation is a "chunk" of Flink pipeline
The output/input Flink DataStream is of type WindowedValue<byte[]>
27. INTEGRATION: Message Envelope
[Diagram: the Envelope unpacks bytes from Kafka into a Message carrying Timestamp Micros, UUID, Metadata / Headers, Message Type, a Payload with Field 1 ... Field n, and a Kafka Position (Cluster, Topic, Partition, Offset)]
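For readers who prefer code to diagrams, here is a rough sketch of what the Message described above could look like as a Python dataclass; the class and field names are an approximation drawn from the diagram, not Yelp's actual data pipeline clientlib.

from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class KafkaPosition:
    cluster: str
    topic: str
    partition: int
    offset: int

@dataclass
class Message:
    message_type: str
    timestamp_micros: int
    uuid: str
    metadata: Dict[str, Any] = field(default_factory=dict)   # metadata / headers
    payload: Dict[str, Any] = field(default_factory=dict)    # field 1 ... field n
    kafka_position: Optional[KafkaPosition] = None           # populated once read from Kafka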
28. SERIALIZATION: Beam Coder
Data needs to be properly serialized between Flink and the Beam SDK worker
Extend the Beam Coder class to implement a custom coder for the Message class
Register the coder when the source/sink is being used

class DataPipelineCoder(Coder):
    def encode(self, value: Message):
        envelope = Envelope()
        return envelope.pack(value)

    def decode(self, value: bytes) -> Message:
        return create_from_kafka_message_value(value)

registry.register_coder(Message, DataPipelineCoder)
29. SERIALIZATION: Beam typehints
Critical to make sure that the proper Coder is being used
Every PTransform that returns a Message must use the typehint annotation @typehints.with_output_types(Message)
34. DEPLOYMENT: Running Beam
Run on Kubernetes
One Flink cluster per Beam service
One base Docker image extended with service-specific deps
Run the SDK Worker in the same Task Manager container
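As an illustration of how such a service might be launched (the endpoint and image name are hypothetical, and the exact options depend on the Beam version in use), a portable Beam Python pipeline targeting a Flink job server typically uses options along these lines:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=flink-jobserver:8099",            # Flink job server (hypothetical address)
    "--environment_type=DOCKER",                      # containerized SDK harness for Python UDFs
    "--environment_config=example/beam-python-sdk",   # hypothetical base image with service deps
    "--streaming",
])

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)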
38. DEPLOYMENT: Can do better?
BEAM-7966 Write portable Beam application jar
BEAM-7980 External environment with containerized worker pool (Beam 2.16)
Pluggable custom portable translations
40. ADOPTION: Paastorm on Beam

@typehints.with_output_types(Message)
class Spolt(beam.DoFn):
    def process_message(self, message):
        raise NotImplementedError()

    def process(self, element):
        return self.process_message(element)

class Paastorm:
    def __init__(self, paastorm_fn):
        self.paastorm_fn = paastorm_fn

    def run(self):
        p = beam.Pipeline(options=options)
        messages = (p
                    | DataPipelineSource()
                    # key each message by its Kafka partition (attribute access illustrative)
                    | beam.Map(lambda message: (message.kafka_position.partition, message))
                    | beam.ParDo(self.paastorm_fn)
                    | DataPipelineSink())
        p.run()
41. TAKEAWAYS: The future is now
Run all of our stream processing on one engine: Flink
Legacy Paastorm easily migrated
Feature parity across languages
New applications use native Beam
Yelp's Indexing pipeline as the first use case