This is the slide deck that was presented at the Hadoop Users Group at LinkedIn on November 5, 2013.
The presentation covers what Samza is, why we built it, and how it works.
Samza: Real-time Stream Processing at LinkedIn (C4Media)
Video and slides synchronized, mp3 and slide download available at http://bit.ly/1eGbVJv.
Chris Riccomini discusses Samza's feature set, how Samza integrates with YARN and Kafka, how it's used at LinkedIn, and what's next on the roadmap. Filmed at qconsf.com.
Chris Riccomini is a Staff Software Engineer at LinkedIn, where he is currently a committer and PMC member for Apache Samza. He's been involved in a wide range of projects at LinkedIn, including "People You May Know", REST.li, Hadoop, engineering tooling, and OLAP systems. Prior to LinkedIn, he worked on data visualization and fraud modeling at PayPal.
Event Stream Processing with Kafka and Samza (Zach Cox)
This document discusses event stream processing with Kafka and Samza. It describes how businesses generate event data that can be ingested into a unified log like Kafka for integration. Event streams can then be processed in real time by systems like Samza to act quickly on the data and gain insights. Samza is a scalable, reliable stream processing framework that can perform stateless and stateful operations on event streams from Kafka.
Streaming Analytics & CEP - Two Sides of the Same Coin? (Till Rohrmann)
Talk I gave together with Fabian Hueske at the Berlin Buzzwords 2016 conference.
The talk demonstrates how we can combine streaming analytics and complex event processing (CEP) on the same execution engine, namely Apache Flink. This combination opens up a new field of applications in which aggregations can easily be combined with temporal pattern detection.
This document provides an overview of Apache Flink, an open-source framework for distributed stream and batch data processing. It discusses key aspects of Flink including that it executes everything as data streams, supports iterative and cyclic data flows, allows mutable state in operators, and provides high availability and checkpointing of operator state. It also provides examples of using Flink's DataStream API to perform operations like hourly and daily tweet impression counts on a continuous stream of tweet data from Kafka.
Unified Stream Processing at Scale with Apache Samza - BDS2017 (Jacob Maes)
The shift to stream processing at LinkedIn has accelerated over the past few years. We now have over 200 Samza applications in production processing more than 260B events per day. Many of these are new applications, but there have also been more migrations from existing online and offline applications. To support the influx of new use cases, we have improved the flexibility, efficiency and reliability of Apache Samza.
In this talk, we will take a brief look at the broader streaming ecosystem at LinkedIn, then we will zoom in on a few representative use cases and explain how they are powered by recent advancements to Apache Samza including a unified high level API, flexible deployment model, batch processing, and more.
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft), Kafka Summit SF ... (Confluent)
Getting Kafka running on Kubernetes is only step one of a journey to create a production-ready Kafka cluster. This talk walks through the other steps: 1) Monitoring and remediating faults. 2) Updates to Kubernetes nodes for clusters not using shared storage. 3) Automating Kafka updates and restarts. We present how to create fault-tolerant Kafka clusters on Kubernetes without sacrificing availability, durability, or latency. Learn about Lyft's overlay-free Kubernetes networking driver and how we use it to keep performance on par with non-Kubernetes clusters.
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa... (Flink Forward)
Apache Mesos allows operators to run distributed applications across an entire datacenter and is attracting ever-increasing interest. Just as distributed applications see increased use enabled by Mesos, Mesos itself sees increasing use thanks to a growing ecosystem of well-integrated applications. One of the latest additions to the Mesos family is Apache Flink. Flink is one of the most popular open source systems for real-time, high-scale data processing and lets users run low-latency streaming analytical workloads on Mesos. In this talk we explain the challenges solved while integrating Flink with Mesos, including how Flink's distributed architecture can be modeled as a Mesos framework and how Flink was integrated with Fenzo. Next, we describe how Flink was packaged to run easily on DC/OS.
Fabian Hueske - Stream Analytics with SQL on Apache Flink (Ververica)
Fabian Hueske presented on stream analytics using SQL on Apache Flink. Flink provides a scalable platform for stream processing that is fast, accurate, and reliable. Its relational APIs allow querying both batch and streaming data using standard SQL or a LINQ-style Table API. Queries on streaming data produce continuously updating results. Windows can be used to compute aggregates over tumbling time intervals. The dynamic tables representing streaming data can be converted to output streams encoding updates as insertions and deletions. While not all queries can be supported, techniques like limiting state size allow bounding computational resources. Use cases like continuous ETL, dashboards, and event-driven architectures were discussed.
Data Streaming Ecosystem Management at Booking.com (Confluent)
This document provides an overview of the data streaming ecosystem at Booking.com. It discusses how Booking.com uses Apache Kafka, Kafka Connect, and related tools across over 300 clusters containing over 350 brokers to handle large volumes of streaming data from its various services and applications. Key aspects of Booking.com's data streaming infrastructure are highlighted, including its use of multiple data centers, global and local clusters, monitoring and alerting systems, and operational best practices.
Building data products requires a lambda architecture to bridge batch and stream processing. AirStream is a framework built on top of HBase that lets users easily build data products at Airbnb. It has proven HBase to be impactful and useful in production for mission-critical data products.
In the talk, we will present applications that leverage HBase to compute moving averages, distinct counts, window-based joins, and more in streaming computation.
We will also talk about how to leverage HBase to bridge the gap between batch and streaming queries, including building a Presto-HBase connector to serve near-real-time ad-hoc queries.
by Liyin Tang of Airbnb
Debunking Common Myths in Stream Processing (Kostas Tzoumas)
This document discusses stream processing with Apache Flink. It begins by defining streaming as the continuous processing of never-ending data streams. It then debunks four common myths about stream processing: 1) that there is always a throughput/latency tradeoff, showing that Flink can achieve high throughput and low latency; 2) that exactly-once processing is not possible, but Flink provides exactly-once state guarantees with checkpoints; 3) that streaming is only for real-time applications, whereas it can also be used for historical data; and 4) that streaming is too hard, whereas most data problems are actually streaming problems. The document concludes by discussing Flink's community and examples of companies using Flink in production.
Principles in Data Stream Processing (Matthias J. Sax, Confluent; hosted by Confluent)
Data stream processing is, for many of us, a new paradigm for processing data and building applications. In this talk, we will take you on a journey through the theoretical foundations of stream processing and discuss the underlying principles and unique problems that need to be addressed. What actually is a data stream, anyway? And how do I use it? How do streams relate to application state, and when do I use one or the other?
ksqlDB and Kafka Streams are both, at their core, designed to help build stream processing applications. We will explain how stream processing principles are reflected in the design of each system and what trade-offs were chosen (and - more importantly! - why). Finally, we look at how the stream processing space, and in particular ksqlDB and Kafka Streams, may evolve over the next few years as we outline extensions and improvements to the underlying conceptual model. So bring your thinking hats and notepads, and prepare to learn WHY these systems are the way they are!
Samza SQL allows users to write stream processing jobs using SQL queries. It converts SQL queries into a Samza job by translating the SQL into a logical query plan composed of relational algebra operators. These operators are then converted into a Samza operator graph using the high-level Samza API. The operator graph processes streaming messages in real-time. Samza SQL supports features like filtering, projections, aggregations, and joins on streaming data.
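The translation described above, from a SQL query to a chain of relational operators over a stream, can be sketched as follows. This is an illustrative model, not Samza SQL's actual API: the operator names (`scan`, `filter_op`, `project`) and the example `payments` records are hypothetical.

```python
# Hypothetical sketch: how a SQL-like query such as
#   SELECT user, amount FROM payments WHERE amount > 100
# might be compiled into a pipeline of relational operators
# over a message stream. Names here are illustrative, not Samza's API.

def scan(messages):
    # Source operator: yields each incoming message.
    for msg in messages:
        yield msg

def filter_op(stream, predicate):
    # Relational selection: drop messages failing the predicate.
    for msg in stream:
        if predicate(msg):
            yield msg

def project(stream, fields):
    # Relational projection: keep only the requested fields.
    for msg in stream:
        yield {f: msg[f] for f in fields}

payments = [
    {"user": "alice", "amount": 250, "region": "us"},
    {"user": "bob", "amount": 50, "region": "eu"},
    {"user": "carol", "amount": 500, "region": "us"},
]

plan = project(filter_op(scan(payments), lambda m: m["amount"] > 100),
               ["user", "amount"])
results = list(plan)
# results == [{"user": "alice", "amount": 250}, {"user": "carol", "amount": 500}]
```

In a real engine the same plan runs continuously over an unbounded stream rather than a finite list; generators make that difference invisible in the sketch.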
Beyond the DSL: Unlocking the Power of Kafka Streams with the Processor API (A..., Confluent)
Kafka Streams is a flexible and powerful framework. The Domain Specific Language (DSL) is an obvious place from which to start, but not all requirements fit the DSL model. Many people are unaware of the Processor API (PAPI), or are intimidated by it because of sinks, sources, edges and stores – oh my! But most of the power of the PAPI can be leveraged simply through the DSL's "#process" method, which lets you attach the general building-block "Processor" interface to your easy-to-use DSL topology, combining the best of both worlds.
In this talk you'll get a look at the flexibility of the DSL's process method and the possibilities it opens up. We'll use real-world use cases born of extensive field experience with multiple customers to explore the power of direct write access to the state stores and how to perform range sub-selects. We'll also see the options that punctuators bring to the table, as well as opportunities for major latency optimizations.
Key takeaways:
* Understanding of how to combine DSL and Processors
* Capabilities and benefits of Processors
* Real-world uses of Processors
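The processor capabilities listed above (direct state-store access plus scheduled punctuators) can be sketched in miniature. This is not the real Kafka Streams Java API; the class and field names are illustrative stand-ins.

```python
# Illustrative sketch (not Kafka Streams' actual API): a processor with
# direct access to a key-value state store, and a punctuator that emits
# accumulated counts downstream on a schedule.

class CountingProcessor:
    def __init__(self):
        self.store = {}        # stands in for a Kafka Streams state store
        self.forwarded = []    # stands in for forwarding to downstream nodes

    def process(self, key, value):
        # Direct write access to the state store on every record.
        self.store[key] = self.store.get(key, 0) + 1

    def punctuate(self):
        # Called periodically by the runtime; emit current counts downstream.
        for key, count in sorted(self.store.items()):
            self.forwarded.append((key, count))

p = CountingProcessor()
for k in ["a", "b", "a"]:
    p.process(k, None)
p.punctuate()
# p.forwarded == [("a", 2), ("b", 1)]
```

The point of the pattern is the separation: `process()` handles each record, while the punctuator controls *when* results leave the processor, which is where the latency optimizations mentioned in the talk come from.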
Advanced Streaming Analytics with Apache Flink and Apache Kafka (Stephan Ewen, Confluent)
Flink and Kafka are popular components to build an open source stream processing infrastructure. We present how Flink integrates with Kafka to provide a platform with a unique feature set that matches the challenging requirements of advanced stream processing applications. In particular, we will dive into the following points:
Flink’s support for event-time processing, how it handles out-of-order streams, and how it can perform analytics on historical and real-time streams served from Kafka’s persistent log using the same code. We present Flink’s windowing mechanism that supports time-, count- and session- based windows, and intermixing event and processing time semantics in one program.
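Event-time tumbling windows, as described above, can be sketched in a few lines: because each event carries its own timestamp, the result is identical whether the stream is consumed live or replayed from Kafka's log. The window size and event fields below are illustrative, not Flink's API.

```python
# A minimal sketch of tumbling event-time windows, assuming events are
# (timestamp_ms, user) pairs. Each event is assigned to the window that
# contains its event timestamp, independent of arrival order.

from collections import defaultdict

WINDOW_MS = 60_000  # 1-minute tumbling windows (illustrative)

def tumbling_counts(events):
    counts = defaultdict(int)
    for ts, user in events:
        # Align the timestamp down to the start of its window.
        window_start = (ts // WINDOW_MS) * WINDOW_MS
        counts[window_start] += 1
    return dict(counts)

events = [(5_000, "a"), (59_999, "b"), (60_000, "c"), (125_000, "d")]
print(tumbling_counts(events))
# {0: 2, 60000: 1, 120000: 1}
```

Count- and session-based windows replace the alignment step with a record counter or a gap-based grouping, but the per-window aggregation is the same.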
How Flink’s checkpointing mechanism integrates with Kafka for fault-tolerance, for consistent stateful applications with exactly-once semantics.
We will discuss "Savepoints", which allow users to save the state of a streaming program at any point in time. Together with a durable event log like Kafka, savepoints allow users to pause/resume streaming programs, go back to prior states, or switch to different versions of the program, while preserving exactly-once semantics.
We explain the techniques behind the combination of low-latency and high-throughput streaming, and how the latency/throughput trade-off can be configured.
We will give an outlook on current developments for streaming analytics, such as streaming SQL and complex event processing.
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac... (Ververica)
Stream Processing is emerging as a popular paradigm for data processing architectures, because it handles the continuous nature of most data and computation and gets rid of artificial boundaries and delays.
The fact that stream processing is gaining rapid adoption is also due to more powerful and maturing technology (much of it open source at the ASF) that has solved many of the hard technical challenges.
We discuss Apache Flink's approach to high performance stream processing with state, strong consistency, low latency, and sophisticated handling of time. With such building blocks, Apache Flink can handle classes of problems previously considered out of reach for stream processing. We also take a sneak preview at the next steps for Flink.
This document provides a summary of the top 10 Kafka configuration settings for optimal performance and robustness. It begins with a brief introduction to Kafka and then discusses important broker configurations like enabling JMX metrics, unclean leader election, retention policies, and minimum in-sync replicas. Client-side configurations like max poll interval and committing offsets for consumers as well as linger time, acknowledgement levels, retries, and maintaining ordering for producers are also covered. The document emphasizes that proper configuration is key to the health of a Kafka cluster and recommends understanding the goals and measuring performance before and after any changes.
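A few of the client-side settings the summary mentions can be sketched as config dictionaries in the style of a Python Kafka client. The broker address, group id, and the specific values are illustrative starting points, not universal recommendations; measure before and after changing them, as the document advises.

```python
# Sketch of producer/consumer settings highlighted above, expressed as
# confluent-kafka-python style config dicts. Values are illustrative.

producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumption: local broker
    "acks": "all",                # wait for all in-sync replicas to ack
    "retries": 5,                 # retry transient send failures
    "linger.ms": 10,              # small batching delay to boost throughput
    "enable.idempotence": True,   # keep ordering correct across retries
}

consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",          # assumption: illustrative group
    "max.poll.interval.ms": 300000,       # max gap between polls before
                                          # the consumer is kicked out
    "enable.auto.commit": False,          # commit offsets only after the
                                          # records are fully processed
}
```

The producer settings trade a little latency (`linger.ms`) for throughput and use idempotence to preserve ordering; the consumer settings make offset commits explicit so a crash cannot silently skip records.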
Jingwei Lu and Jason Zhang (Airbnb)
AirStream is a realtime stream computation framework built on top of Spark Streaming and HBase that allows our engineers and data scientists to easily leverage HBase to get real-time insights and build real-time feedback loops. In this talk, we will introduce AirStream, and then go over a few production use cases.
The document discusses new features in Apache Flink 1.2, including queryable state and dynamic scaling. It provides an overview of Flink 1.2 features like security enhancements, metrics, and improvements to table API and SQL. It then examines queryable state and dynamic scaling in more detail, covering motivations and implementations for making state queryable and allowing jobs to scale resources dynamically in response to changing workloads. The document concludes by looking briefly beyond Flink 1.2 to future work on automatic scaling without restarts.
Speaker: Neil Avery, Technologist, Office of the CTO, Confluent
Stream processing is now at the forefront of many company strategies. Over the last couple of years we have seen streaming use cases explode, and they now permeate the landscape of any modern business.
Use cases including digital transformation, IoT, real-time risk, payments microservices, and machine learning are all built on the fundamental requirement that they need fast data, and they need it at scale.
Apache Kafka® has long been the streaming platform of choice; its origins as dumb pipes for big data are long behind it, and it is now the go-to streaming platform.
Stream processing beckons as the vehicle for driving those streams, and with it brings a world of real-time semantics surrounding windowing, joining, correctness, elasticity, and accessibility. 'The current state of stream processing' walks through the origins of stream processing and applicable use cases, then dives into the challenges currently facing the world of stream processing as it drives the next data revolution.
Neil is a Technologist in the Office of the CTO at Confluent, the company founded by the creators of Apache Kafka. He has over 20 years of expertise of working on distributed computing, messaging and stream processing. He has built or redesigned commercial messaging platforms, distributed caching products as well as developed large scale bespoke systems for tier-1 banks. After a period at ThoughtWorks, he went on to build some of the first distributed risk engines in financial services. In 2008 he launched a startup that specialised in distributed data analytics and visualization. Prior to joining Confluent he was the CTO at a fintech consultancy.
Watch the recording: https://videos.confluent.io/watch/rmU6GHrd4EKFaZrRhdTE3s?.
Flink Forward SF 2017: Shaoxuan Wang, Xiaowei Jiang - Blink's Improvements to F... (Flink Forward)
This document summarizes recent improvements to Flink SQL and Table API by Blink, Alibaba's distribution of Flink. Key improvements include support for stream-stream joins, user-defined functions, table functions and aggregate functions, retractable streams, and over/group aggregates. Blink aims to make Flink work well at large scale for Alibaba's search and recommendation systems. Many of the improvements will be included in upcoming Flink releases.
Bootstrapping Microservices with Kafka, Akka and Spark (Alex Silva)
Compared to a traditional analysis-centric data hub, today's data platforms need to fulfill many different use cases. Real-time, transport-agnostic data protocols have become a crucial feature shared across many of them.
During this talk, we will discuss our approach to bootstrapping and bounded context subscription, leveraging a mix of open source technologies and home-grown services aimed at providing a full end-to-end solution.
We will demonstrate and discuss our use of Kafka, Spark Streaming and Akka to orchestrate a unified data transfer protocol that frees developers from having to listen to and process events within their bounded contexts. More specifically:
- Leveraging Kafka as the source of truth
- Topic serialization formats, Avro, and retention rules
- Using Kafka's distributed commit logs to produce durable datasets
- Log compaction and its role in service bootstrapping
- Ingestion at scale: consuming data in different formats across different teams at different latencies
- Using Spark Streaming and Akka to perform near real-time replication protocols
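Log compaction, mentioned in the list above, is the mechanism that makes service bootstrapping practical: a new consumer can rebuild current state from the compacted topic without replaying every historical message. A minimal sketch of the semantics (not Kafka's actual implementation, which compacts segments in the background):

```python
# Sketch of log-compaction semantics: retain only the latest value per
# key; a None value is a tombstone that deletes the key entirely.

def compact(log):
    # log is a sequence of (key, value) records in append order.
    latest = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone removes the key
        else:
            latest[key] = value    # later values supersede earlier ones
    return latest

log = [("user1", "v1"), ("user2", "v1"), ("user1", "v2"), ("user2", None)]
print(compact(log))
# {'user1': 'v2'}
```

A bootstrapping service reads the compacted topic from the beginning and ends up with exactly this latest-value-per-key view, which is far smaller than the full history.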
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ... (Confluent)
We've built a real-time streaming platform that enables prediction based on user behavior, with events occurring in virtual and augmented reality environments. The solution enables organizations to train people in an extended reality environment where real-life training may be costly and dangerous. Kafka Streams enables analyzing spatial and event data to detect gestural features and analyze user behavior in real time, so we can predict any future mistake the user might make. Kafka is the backbone of our real-time analytics and extended reality communication platform, with our cluster and applications deployed on Kubernetes.
In this talk, we will mainly focus on the following: 1. Why Extended Reality with Kafka is a step in the right direction. 2. The architecture, and the power of Schema Registry in building a generic platform for pluggable XR apps and analytics models. 3. How KSQL and Kafka Streams fit into the Kafka ecosystem to help analyze human motion data and detect features for real-time prediction. 4. A demo of a VR application with real-time analytics feedback, which assists in training people to work with chemical laboratory equipment.
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018) (Confluent)
The document discusses streams and tables as two sides of the same coin. It proposes a dual streaming model that handles out-of-order data by continuously updating stateful operators and emitting changelog streams. This allows processing data with low latency while avoiding buffering and reordering. The model is implemented in Apache Kafka Streams and adopted in industry through tools like KSQL.
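The stream/table duality at the heart of the model can be sketched directly: a changelog stream materializes into a table by applying updates in order, and a table can be re-emitted as a changelog. The function names below are illustrative, not from Kafka Streams or KSQL.

```python
# Sketch of the stream/table duality: a changelog stream of (key, value)
# updates materializes into a table, and a table can be turned back into
# a (compacted) changelog stream.

def materialize(changelog):
    # Apply each update in order; later updates overwrite earlier ones.
    table = {}
    for key, value in changelog:
        table[key] = value
    return table

def to_changelog(table):
    # Emit the table's current contents as an update stream.
    return list(table.items())

changelog = [("a", 1), ("b", 2), ("a", 3)]
table = materialize(changelog)
# table == {"a": 3, "b": 2}

# Round-tripping through a changelog reproduces the same table:
assert materialize(to_changelog(table)) == table
```

This is also how the model handles out-of-order data gracefully: a stateful operator just keeps updating the table and emitting the corresponding changelog records, rather than buffering and reordering the input.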
Slides for my talk at Hadoop Summit Dublin, April 2016.
The talk motivates how streaming can subsume batch use cases, using continuous counting as an example.
Apache Flink: Streaming Done Right @ FOSDEM 2016 (Till Rohrmann)
The talk I gave at FOSDEM 2016 on the 31st of January.
The talk explains how we can do stateful stream processing with Apache Flink, using the example of counting tweet impressions. It covers Flink's windowing semantics, stateful operators, fault tolerance, and performance numbers. The talk ends with an outlook on what is going to happen in the next couple of months.
Deck36 is a small team of engineers who specialize in designing, implementing, and operating complex web systems. They discuss their approach to logging everything through a data pipeline that ingests data from producers, transports it via RabbitMQ, stores it in Hadoop HDFS and Amazon S3, runs analytics with Hadoop MapReduce and Amazon EMR, and performs real-time stream processing with Twitter Storm. They also live demo their JavaScript data collector client and a PHP/Storm example that processes click stream data.
Samza at LinkedIn: Taking Stream Processing to the Next Level (Martin Kleppmann)
Slides from my talk at Berlin Buzzwords, 27 May 2014. Unfortunately Slideshare has screwed up the fonts. See https://speakerdeck.com/ept/samza-at-linkedin-taking-stream-processing-to-the-next-level for a version of the deck with correct fonts.
Stream processing is an essential part of real-time data systems, such as news feeds, live search indexes, real-time analytics, metrics and monitoring. But writing stream processes is still hard, especially when you're dealing with so much data that you have to distribute it across multiple machines. How can you keep the system running smoothly, even when machines fail and bugs occur?
Apache Samza is a new framework for writing scalable stream processing jobs. Like Hadoop and MapReduce for batch processing, it takes care of the hard parts of running your message-processing code on a distributed infrastructure, so that you can concentrate on writing your application using simple APIs. It is in production use at LinkedIn.
This talk will introduce Samza, and show how to use it to solve a range of different problems. Samza has some unique features that make it especially interesting for large deployments, and in this talk we will dig into how they work under the hood. In particular:
• Samza is built to support many different jobs written by different teams. Isolation between jobs ensures that a single badly behaved job doesn't affect other jobs. It is robust by design.
• Samza can handle jobs that require large amounts of state, for example joining multiple streams, augmenting a stream with data from a database, or aggregating data over long time windows. This makes it a very powerful tool for applications.
Apache Samza is a framework for reliable stream processing using Apache Kafka and Hadoop YARN. It provides low-latency, real-time stream processing capabilities. Samza jobs consume and process data from Kafka topics as input streams and output results to other topics. Samza tasks are distributed and run reliably across a YARN cluster, processing and maintaining state for assigned Kafka partitions. The Samza API allows developers to build stream processing applications using a simple process() method to consume messages and emit results.
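The per-message model described above can be sketched as follows. This mirrors the shape of Samza's task interface, a `process()` method invoked per message with local state per partition, but it is a hypothetical Python stand-in, not Samza's actual Java API; the task class, message fields, and output list are all illustrative.

```python
# Hypothetical sketch of a Samza-style stream task: consume messages from
# an input partition, update local state, and emit results to an output
# stream. (Mirrors the process() model; not Samza's actual API.)

class PageViewCountTask:
    def __init__(self):
        self.state = {}     # local state held for this task's partition
        self.output = []    # stands in for an output Kafka topic

    def process(self, message):
        # message is a dict consumed from the input topic.
        page = message["page"]
        self.state[page] = self.state.get(page, 0) + 1
        # Emit the updated count downstream.
        self.output.append({"page": page, "views": self.state[page]})

task = PageViewCountTask()
for msg in [{"page": "/home"}, {"page": "/jobs"}, {"page": "/home"}]:
    task.process(msg)
# task.state == {"/home": 2, "/jobs": 1}
```

Because each Kafka partition is assigned to exactly one task instance, this local state needs no cross-machine coordination; YARN restarts the task (and Samza restores its state) if the container fails.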
Data Streaming Ecosystem Management at Booking.com confluent
This document provides an overview of the data streaming ecosystem at Booking.com. It discusses how Booking.com uses Apache Kafka, Kafka Connect, and related tools across over 300 clusters containing over 350 brokers to handle large volumes of streaming data from its various services and applications. Key aspects of Booking.com's data streaming infrastructure are highlighted, including its use of multiple data centers, global and local clusters, monitoring and alerting systems, and operational best practices.
Building data product requires having lambda architecture to bridge the batch and streaming processing. AirStream is a framework built on top of HBase to allow users to easily build data products at Airbnb. It proved HBase is impactful and useful in the production for mission critical data products.
In the talk, we will present the applications to leverage HBase to compute moving average, distinct count, window based join and etc. in the streaming computation.
Also, we will talk about how to leverage HBase to bridge the gap between batch and streaming queries, including building presto-hbase connector to serve near real time ad-hoc query.
by Liyin Tang of AirBnB
Debunking Common Myths in Stream ProcessingKostas Tzoumas
This document discusses stream processing with Apache Flink. It begins by defining streaming as the continuous processing of never-ending data streams. It then debunks four common myths about stream processing: 1) that there is always a throughput/latency tradeoff, showing that Flink can achieve high throughput and low latency; 2) that exactly-once processing is not possible, but Flink provides exactly-once state guarantees with checkpoints; 3) that streaming is only for real-time applications, whereas it can also be used for historical data; and 4) that streaming is too hard, whereas most data problems are actually streaming problems. The document concludes by discussing Flink's community and examples of companies using Flink in production.
Principles in Data Stream Processing | Matthias J Sax, ConfluentHostedbyConfluent
Data stream processing is, for many of us, a new paradigm with which you process data and build applications. In this talk, we will take you on a journey through the theoretical foundations of stream processing and discuss the underlying principles and unique problems that need to be addressed. What actually is a data stream anyway? And how do I use it? How do streams relate to application state and when do I use the one or the other?
ksqlDB and Kafka Streams are both, at their core, designed to help build stream processing applications and we will explain how stream processing principles are reflected in the design of each system and what trade-offs were chosen (and - more importantly! - why). Finally, we take a look into the future how the stream processing space, and in particular ksqlDB and Kafka Streams, may evolve over the next few years as we outline extensions and improvements to the underlying conceptual model. So, bring your thinking hats and notepads and prepare to learn WHY these systems are the way they are!
Samza SQL allows users to write stream processing jobs using SQL queries. It converts SQL queries into a Samza job by translating the SQL into a logical query plan composed of relational algebra operators. These operators are then converted into a Samza operator graph using the high-level Samza API. The operator graph processes streaming messages in real-time. Samza SQL supports features like filtering, projections, aggregations, and joins on streaming data.
Beyond the DSL-Unlocking the Power of Kafka Streams with the Processor API (A...confluent
Kafka Streams is a flexible and powerful framework. The Domain Specific Language (DSL) is an obvious place from which to start, but not all requirements fit the DSL model. Many people are unaware of the Processor API (PAPI) – or are intimidated by it because of sinks, sources, edges and stores – oh my! But most of the power of the PAPI can be leveraged simply through the DSL's "process" method, which lets you attach the general "Processor" building-block interface to your easy-to-use DSL topology, combining the best of both worlds.
In this talk you'll get a look at the flexibility of the DSL's process method and the possibilities it opens up. We'll use real-world use cases, borne from extensive experience in the field with multiple customers, to explore the power of direct write access to the state stores and how to perform range sub-selects. We'll also see the options that punctuators bring to the table, as well as opportunities for major latency optimisations.
Key takeaways:
* Understanding of how to combine DSL and Processors
* Capabilities and benefits of Processors
* Real-world uses of Processors
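To illustrate what a processor with a state store and a punctuator looks like conceptually, here is a toy Python sketch; the class and method names only mirror the shape of the Kafka Streams Processor API and are not the real interfaces:

```python
# Toy sketch of the Processor idea behind the PAPI: direct access to a
# key-value state store plus a punctuator that emits on a schedule.
# Class and method names only mirror the real interfaces.

class CountingProcessor:
    def __init__(self):
        self.store = {}      # stands in for a KeyValueStore
        self.forwarded = []  # stands in for context.forward()

    def process(self, key, value):
        # Direct write access to the state store, per record.
        self.store[key] = self.store.get(key, 0) + 1

    def punctuate(self, timestamp):
        # On a schedule, emit the current counts downstream.
        for key, count in sorted(self.store.items()):
            self.forwarded.append((timestamp, key, count))

p = CountingProcessor()
for k in ["a", "b", "a"]:
    p.process(k, None)
p.punctuate(1000)
print(p.forwarded)  # [(1000, 'a', 2), (1000, 'b', 1)]
```

The punctuator is what enables the latency optimisation mentioned above: instead of emitting on every record, the processor batches state changes and flushes them on a timer.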
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen | confluent
Flink and Kafka are popular components to build an open source stream processing infrastructure. We present how Flink integrates with Kafka to provide a platform with a unique feature set that matches the challenging requirements of advanced stream processing applications. In particular, we will dive into the following points:
Flink’s support for event-time processing, how it handles out-of-order streams, and how it can perform analytics on historical and real-time streams served from Kafka’s persistent log using the same code. We present Flink’s windowing mechanism that supports time-, count- and session-based windows, and intermixing event-time and processing-time semantics in one program.
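The event-time windowing idea can be sketched in a few lines of toy Python: because each event carries its own timestamp, out-of-order arrivals still land in the correct window. This is illustrative only, not Flink's API:

```python
# Toy event-time tumbling window: events carry their own timestamps,
# so late/out-of-order arrivals still land in the correct window.
# Illustrative only -- not Flink's windowing API.

from collections import defaultdict

def tumbling_event_time_counts(events, window_ms):
    # events: (event_time_ms, key) pairs, possibly out of order.
    windows = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_ms) * window_ms
        windows[(window_start, key)] += 1
    return dict(windows)

# The 900ms event arrives after the 2500ms event, yet is still
# counted in the [0, 1000) window for key "x".
events = [(100, "x"), (2500, "x"), (900, "x"), (1200, "y")]
print(tumbling_event_time_counts(events, 1000))
# {(0, 'x'): 2, (2000, 'x'): 1, (1000, 'y'): 1}
```

A processing-time window, by contrast, would assign each event to the window that is open when it happens to arrive, so the late 900ms event would be counted in the wrong bucket.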
How Flink’s checkpointing mechanism integrates with Kafka for fault-tolerance, for consistent stateful applications with exactly-once semantics.
We will discuss "savepoints", which allow users to save the state of the streaming program at any point in time. Together with a durable event log like Kafka, savepoints allow users to pause/resume streaming programs, go back to prior states, or switch to different versions of the program, while preserving exactly-once semantics.
We explain the techniques behind the combination of low-latency and high-throughput streaming, and how the latency/throughput trade-off can be configured.
We will give an outlook on current developments for streaming analytics, such as streaming SQL and complex event processing.
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac... | Ververica
Stream Processing is emerging as a popular paradigm for data processing architectures, because it handles the continuous nature of most data and computation and gets rid of artificial boundaries and delays.
The fact that stream processing is gaining rapid adoption is also due to more powerful and maturing technology (much of it open source at the ASF) that has solved many of the hard technical challenges.
We discuss Apache Flink's approach to high performance stream processing with state, strong consistency, low latency, and sophisticated handling of time. With such building blocks, Apache Flink can handle classes of problems previously considered out of reach for stream processing. We also take a sneak preview at the next steps for Flink.
This document provides a summary of the top 10 Kafka configuration settings for optimal performance and robustness. It begins with a brief introduction to Kafka and then discusses important broker configurations like enabling JMX metrics, unclean leader election, retention policies, and minimum in-sync replicas. Client-side configurations like max poll interval and committing offsets for consumers as well as linger time, acknowledgement levels, retries, and maintaining ordering for producers are also covered. The document emphasizes that proper configuration is key to the health of a Kafka cluster and recommends understanding the goals and measuring performance before and after any changes.
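As a concrete illustration, the producer- and broker-side settings mentioned above might be collected as follows. The property names are standard Kafka configuration keys, but the values are illustrative starting points only, not one-size-fits-all recommendations:

```python
# Producer-side settings discussed above, expressed as standard Kafka
# client property names. Values are illustrative starting points only.
producer_config = {
    "acks": "all",               # wait for all in-sync replicas to acknowledge
    "retries": 2147483647,       # retry transient failures indefinitely
    "enable.idempotence": True,  # no duplicates introduced by retries
    "max.in.flight.requests.per.connection": 5,  # ordering preserved with idempotence
    "linger.ms": 10,             # small batching delay to trade latency for throughput
}

# Broker-side settings from the same list.
broker_config = {
    "unclean.leader.election.enable": False,  # prefer consistency over availability
    "min.insync.replicas": 2,  # with acks=all, tolerate the loss of one replica
}
```

As the document stresses, measure performance before and after any change; settings like `linger.ms` shift the latency/throughput balance rather than improving both at once.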
Jingwei Lu and Jason Zhang (Airbnb)
AirStream is a realtime stream computation framework built on top of Spark Streaming and HBase that allows our engineers and data scientists to easily leverage HBase to get real-time insights and build real-time feedback loops. In this talk, we will introduce AirStream, and then go over a few production use cases.
The document discusses new features in Apache Flink 1.2, including queryable state and dynamic scaling. It provides an overview of Flink 1.2 features like security enhancements, metrics, and improvements to table API and SQL. It then examines queryable state and dynamic scaling in more detail, covering motivations and implementations for making state queryable and allowing jobs to scale resources dynamically in response to changing workloads. The document concludes by looking briefly beyond Flink 1.2 to future work on automatic scaling without restarts.
Speaker: Neil Avery, Technologist, Office of the CTO, Confluent
Stream processing is now at the forefront of many company strategies. Over the last couple of years we have seen streaming use cases explode, and they now pervade the landscape of any modern business.
Use cases including digital transformation, IoT, real-time risk, payments microservices and machine learning are all built on the fundamental requirement that they need fast data, and they need it at scale.
Apache Kafka® has long been the streaming platform of choice; its origins as dumb pipes for big data have long since been left behind, and it is now the go-to streaming platform.
Stream processing beckons as being the vehicle for driving those streams, and along with it brings a world of real-time semantics surrounding windowing, joining, correctness, elasticity, and accessibility. The ‘current state of stream processing’ walks through the origins of stream processing, applicable use cases and then dives into the challenges currently facing the world of stream processing as it drives the next data revolution.
Neil is a Technologist in the Office of the CTO at Confluent, the company founded by the creators of Apache Kafka. He has over 20 years of expertise of working on distributed computing, messaging and stream processing. He has built or redesigned commercial messaging platforms, distributed caching products as well as developed large scale bespoke systems for tier-1 banks. After a period at ThoughtWorks, he went on to build some of the first distributed risk engines in financial services. In 2008 he launched a startup that specialised in distributed data analytics and visualization. Prior to joining Confluent he was the CTO at a fintech consultancy.
Watch the recording: https://videos.confluent.io/watch/rmU6GHrd4EKFaZrRhdTE3s?.
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F... | Flink Forward
This document summarizes recent improvements to Flink SQL and Table API by Blink, Alibaba's distribution of Flink. Key improvements include support for stream-stream joins, user-defined functions, table functions and aggregate functions, retractable streams, and over/group aggregates. Blink aims to make Flink work well at large scale for Alibaba's search and recommendation systems. Many of the improvements will be included in upcoming Flink releases.
Bootstrapping Microservices with Kafka, Akka and Spark | Alex Silva
Compared to a traditional analysis-centric data hub, today's data platforms need to fulfill many different use cases. The need for real-time, transport-agnostic data protocols has become a crucial feature shared across many different use cases.
During this talk, we will discuss our approach to bootstrapping and bounded context subscription, leveraging a mix of open source technologies and home-grown services aimed at providing a full end-to-end solution.
We will demonstrate and discuss our use of Kafka, Spark Streaming and Akka to orchestrate a unified data transfer protocol that frees developers from having to listen to and process events within their bounded contexts. More specifically:
- Leveraging Kafka as the source of truth
- Topic serialization formats, Avro, and retention rules
- Using Kafka's distributed commit logs to produce durable datasets
- Log compaction and its role in service bootstrapping
- Ingestion at scale: consuming data in different formats across different teams at different latencies
- Using Spark Streaming and Akka to perform near real-time replication protocols
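The role of log compaction in service bootstrapping can be sketched with a toy Python model: the compacted log keeps only the latest record per key, so replaying it from the beginning rebuilds the current state. This is illustrative only, not Kafka's actual log cleaner:

```python
# Toy model of Kafka log compaction: the compacted log retains only the
# latest record per key, which is what lets a service bootstrap its
# state by replaying the topic from the beginning. Illustrative only.

def compact(log):
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets win
    # Keep one record per key, in offset order.
    return [(off, key, val) for key, (off, val) in
            sorted(latest.items(), key=lambda kv: kv[1][0])]

log = [("user1", "v1"), ("user2", "v1"), ("user1", "v2"), ("user3", "v1")]
print(compact(log))
# [(1, 'user2', 'v1'), (2, 'user1', 'v2'), (3, 'user3', 'v1')]

# Bootstrapping: replaying the compacted log rebuilds the full state.
state = {key: val for _, key, val in compact(log)}
print(state)  # {'user2': 'v1', 'user1': 'v2', 'user3': 'v1'}
```

This is why a compacted topic can serve as a durable dataset: a new service instance consumes it end-to-end once and arrives at the same state as every other consumer.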
Closing the Loop in Extended Reality with Kafka Streams and Machine Learning ... | confluent
We’ve built a real-time streaming platform that enables prediction based on user behavior, with events occurring in virtual and augmented reality environments. The solution enables organizations to train people in an extended reality environment, where real-life training may be costly and dangerous. Kafka Streams enables analyzing spatial and event data to detect gestural features and analyze user behavior in real time, to be able to predict any future mistake the user might make. Kafka is the backbone of our real-time analytics and extended reality communication platform, with our cluster and applications being deployed on Kubernetes.
In this talk, we will mainly focus on the following:
1. Why Extended Reality with Kafka is a step in the right direction.
2. Architecture and the power of Schema Registry in building a generic platform for pluggable XR apps and analytics models.
3. How KSQL and Kafka Streams fit into the Kafka ecosystem to help analyze human motion data and detect features for real-time prediction.
4. Demo of a VR application with real-time analytics feedback, which assists people in training to work with chemical laboratory equipment.
Streams and Tables: Two Sides of the Same Coin (BIRTE 2018) | confluent
The document discusses streams and tables as two sides of the same coin. It proposes a dual streaming model that handles out-of-order data by continuously updating stateful operators and emitting changelog streams. This allows processing data with low latency while avoiding buffering and reordering. The model is implemented in Apache Kafka Streams and adopted in industry through tools like KSQL.
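The duality can be sketched in a few lines of toy Python: aggregating a stream yields a continuously updated table, each table update emits a changelog record, and replaying the changelog reconstructs the table. Illustrative only, not the Kafka Streams API:

```python
# Toy model of the stream/table duality: a stream aggregation is a
# table, and a table's sequence of updates is a changelog stream.
# Illustrative only -- not the Kafka Streams API.

def aggregate(stream):
    table, changelog = {}, []
    for key, value in stream:
        table[key] = table.get(key, 0) + value  # update the table in place
        changelog.append((key, table[key]))     # emit the new value downstream
    return table, changelog

def replay(changelog):
    # A table is just the latest value per key in its changelog.
    return {key: value for key, value in changelog}

stream = [("a", 1), ("b", 2), ("a", 3)]
table, changelog = aggregate(stream)
print(table)      # {'a': 4, 'b': 2}
print(changelog)  # [('a', 1), ('b', 2), ('a', 4)]
assert replay(changelog) == table
```

Emitting the changelog immediately, rather than buffering and reordering input, is what lets the dual streaming model handle out-of-order data with low latency: a late record simply produces another update for its key.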
Slides for my talk at Hadoop Summit Dublin, April 2016.
The talk motivates how streaming can subsume batch use cases, using the example of continuous counting.
Apache Flink: Streaming Done Right @ FOSDEM 2016 | Till Rohrmann
The talk I gave at the FOSDEM 2016 on the 31st of January.
The talk explains how we can do stateful stream processing with Apache Flink, using the example of counting tweet impressions. It covers Flink's windowing semantics, stateful operators, fault tolerance and performance numbers. The talk ends with an outlook on what's going to happen in the next couple of months.
Deck36 is a small team of engineers who specialize in designing, implementing, and operating complex web systems. They discuss their approach to logging everything through a data pipeline that ingests data from producers, transports it via RabbitMQ, stores it in Hadoop HDFS and Amazon S3, runs analytics with Hadoop MapReduce and Amazon EMR, and performs real-time stream processing with Twitter Storm. They also live demo their JavaScript data collector client and a PHP/Storm example that processes click stream data.
Samza at LinkedIn: Taking Stream Processing to the Next Level | Martin Kleppmann
Slides from my talk at Berlin Buzzwords, 27 May 2014. Unfortunately Slideshare has screwed up the fonts. See https://speakerdeck.com/ept/samza-at-linkedin-taking-stream-processing-to-the-next-level for a version of the deck with correct fonts.
Stream processing is an essential part of real-time data systems, such as news feeds, live search indexes, real-time analytics, metrics and monitoring. But writing stream processes is still hard, especially when you're dealing with so much data that you have to distribute it across multiple machines. How can you keep the system running smoothly, even when machines fail and bugs occur?
Apache Samza is a new framework for writing scalable stream processing jobs. Like Hadoop and MapReduce for batch processing, it takes care of the hard parts of running your message-processing code on a distributed infrastructure, so that you can concentrate on writing your application using simple APIs. It is in production use at LinkedIn.
This talk will introduce Samza, and show how to use it to solve a range of different problems. Samza has some unique features that make it especially interesting for large deployments, and in this talk we will dig into how they work under the hood. In particular:
• Samza is built to support many different jobs written by different teams. Isolation between jobs ensures that a single badly behaved job doesn't affect other jobs. It is robust by design.
• Samza can handle jobs that require large amounts of state, for example joining multiple streams, augmenting a stream with data from a database, or aggregating data over long time windows. This makes it a very powerful tool for applications.
Apache Samza is a framework for reliable stream processing using Apache Kafka and Hadoop YARN. It provides low-latency, real-time stream processing capabilities. Samza jobs consume and process data from Kafka topics as input streams and output results to other topics. Samza tasks are distributed and run reliably across a YARN cluster, processing and maintaining state for assigned Kafka partitions. The Samza API allows developers to build stream processing applications using a simple process() method to consume messages and emit results.
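A toy Python sketch of the process()-centric programming model described above; the envelope and collector here are plain Python stand-ins that only mirror the shape of Samza's Java StreamTask API:

```python
# Toy sketch mirroring the shape of Samza's low-level task API: the
# framework delivers each incoming message to process(), and the task
# emits results through a collector. Python stand-ins, not the real
# Java interfaces.

class FilterTask:
    """Forward only error-level log events to an output stream."""

    def process(self, envelope, collector):
        # envelope stands in for an IncomingMessageEnvelope.
        message = envelope["message"]
        if message.get("level") == "ERROR":
            # collector stands in for a MessageCollector sending to
            # an (assumed, illustrative) "errors" output topic.
            collector.append({"stream": "errors", "message": message})

task = FilterTask()
out = []
for msg in [{"level": "INFO", "text": "ok"}, {"level": "ERROR", "text": "boom"}]:
    task.process({"message": msg}, out)
print(out)  # [{'stream': 'errors', 'message': {'level': 'ERROR', 'text': 'boom'}}]
```

The framework owns partition assignment, offsets, and restarts; the application author only supplies the per-message logic, which is what the abstract means by a simple process() method.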
This document provides an overview of Apache Samza, an open source stream processing framework. It discusses why stream processing is useful, Samza's design of processing streams of data across jobs and tasks, how its design is implemented using Apache Kafka for messaging and YARN for resource management, and how to use Samza by developing stream and stateful tasks.
Benchmarking Apache Samza: 1.2 million messages per sec per node | Tao Feng
This document summarizes benchmarking tests of Apache Samza's performance processing streaming data. The tests measured Samza's performance on different processing tasks: message passing achieved 1.2 million messages per second per node; key counting with an in-memory store achieved 1 million messages per second; key counting with RocksDB storage was 443k messages per second; and key counting with RocksDB storage and changelog was 300k messages per second. The benchmarks provide a foundation for developing a capacity model for Samza's performance on high-volume streaming data applications.
Apache Samza is a framework for reliable stream processing using Apache Kafka and Hadoop YARN. It provides low-latency stream processing by allowing users to write stream processing jobs that consume messages from Kafka topics and process them using simple process functions. Samza jobs are distributed and run across clusters using YARN to provide reliability and scalability. The process functions in Samza allow users to easily integrate stream processing with state storage and message output to other Kafka topics.
Stream Processing with Samza introduces an architecture and concepts for processing streaming data in real-time. It describes use cases at LinkedIn including data standardization and call graph assembly. The key concepts discussed are streams, tasks, jobs, and stateful stream processing. Streams are partitioned, and tasks process the partitions in parallel to perform aggregations and generate real-time metrics and monitoring information.
Stream processing involves processing unbounded streams of data in near real-time to produce derived data outputs. Samza is a distributed stream processing framework that allows processing of streams at large scale. At LinkedIn, Samza is used to process over 1 trillion events per day across many jobs and clusters for applications like tracking, analytics, and data standardization. Upcoming Samza features include improvements to local state storage, dynamic configuration, easier deployment of standalone jobs, and a high-level query language.
This document appears to be a presentation about optimizing Sitecore performance. It discusses various Sitecore performance counters that can be monitored, such as cache hits/clears and data reads/writes. It also covers monitoring server resources like memory, disk, and processor usage. The presentation recommends setting IIS and SQL Server options to improve performance, such as enabling compression and setting the SQL compatibility level. Potential symptoms of performance issues are also listed.
- The document discusses the Samza high-level API, which allows expressing stream processing pipelines in a single program using built-in functions, providing a more flexible deployment model that can run Samza applications either embedded or in a cluster.
- It also covers convergence between batch and stream processing in Samza, where the same application logic can run on either streaming or batch data with only configuration changes.
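A toy Python sketch of the single-program, fluent style that the high-level API enables; the Stream class here is a hypothetical stand-in, not the actual Samza descriptors. The same pipeline logic works whether the input is a bounded list (batch) or an unbounded generator (streaming), which is the batch/stream convergence point made above:

```python
# Toy fluent pipeline in the spirit of a high-level streaming API,
# where one program declares the whole dataflow:
# input -> transforms -> output. Illustrative only; the Stream class
# is a hypothetical stand-in, not the actual Samza API.

class Stream:
    def __init__(self, events):
        self.events = iter(events)  # works for lists and generators alike

    def filter(self, pred):
        return Stream(e for e in self.events if pred(e))

    def map(self, fn):
        return Stream(fn(e) for e in self.events)

    def send_to(self, sink):
        sink.extend(self.events)  # drain the pipeline into the sink
        return sink

output = []
Stream(range(6)).filter(lambda n: n % 2 == 0).map(lambda n: n * 10).send_to(output)
print(output)  # [0, 20, 40]
```

Swapping `range(6)` for an unbounded generator changes nothing in the pipeline definition, which is the sense in which the same application logic can run on streaming or batch data with only configuration changes.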
With Lakehouse as the future of data architecture, Delta becomes the de facto data storage format for all the data pipelines. By using Delta to build curated data lakes, users achieve efficiency and reliability end-to-end. Curated data lakes involve multiple hops in the end-to-end data pipeline, which are executed regularly (mostly daily) depending on the need. As data travels through each hop, its quality improves and becomes suitable for end-user consumption. Real-time capabilities are key for any business and an added advantage; luckily, Delta has seamless integration with Structured Streaming, which makes it easy for users to achieve real-time capability using Delta. Overall, Delta Lake as a streaming source is a marriage made in heaven for various reasons, and we are already seeing the rise in adoption among our users.
In this talk, we will discuss various functional components of Structured Streaming with Delta as a streaming source. We will deep-dive into Query Progress Logs (QPL) and their significance for operating streams in production: how to track the progress of any streaming job and map it to the source Delta table using QPL, what exactly gets persisted in the checkpoint directory, and how the contents of the checkpoint directory map to QPL metrics for Delta streams.
This document summarizes a presentation about stream processing at LinkedIn using Apache Samza. It discusses how LinkedIn uses Samza for several use cases, including notifications, viewport tracking, and others. It provides details on how notifications are handled through a stream processing-based ATC (Air Traffic Controller) system. It also describes how viewport tracking works to power relevant content on LinkedIn's feed by processing billions of client-side tracking events daily through Samza.
At Jimdo we collect a large number of metrics across all parts of our system. Data accrues at every level: infrastructure, system, and application. It is important that all developers can inspect the real-time metrics of their services at any time. To guarantee that, we spent quite some time integrating Prometheus into our systems.
In our talk we will cover both operating Prometheus and its integrations with the rest of the Jimdo platform. We will share the pitfalls and tricks we learned along the way, and give an insight into our tool landscape.
1404 app dev series - session 8 - monitoring & performance tuning | MongoDB
This document discusses MongoDB monitoring tools and key metrics. It provides an overview of tools like mongostat, the MongoDB shell, MMS, and mtools for monitoring operations per second, memory usage, page faults, and other metrics. It also discusses using logs to analyze query performance and disk saturation. The importance of monitoring queued readers/writers, page faults, background flush processes, memory usage, locks, and other core metrics is highlighted.
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing | DoiT International
Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Cloud Dataflow frees you from operational tasks like resource management and performance optimization.
SVR17: Data-Intensive Computing on Windows HPC Server with the ... | butest
This document discusses using the DryadLINQ framework to perform data-intensive computing on Windows HPC Server. DryadLINQ allows developers to write LINQ queries over distributed datasets using a declarative programming model. It automatically parallelizes queries by generating execution plans that leverage both intra-node parallelism using PLINQ and inter-node parallelism using the Dryad distributed execution engine. DryadLINQ integrates with .NET and provides type safety while handling serialization, distribution, and failure recovery of queries across large clustered datasets.
Stream processing with Apache Flink - Maximilian Michels, Data Artisans | Evention
Apache Flink is an open source platform for distributed stream and batch data processing. At its core, Flink is a streaming dataflow engine which provides data distribution, communication, and fault tolerance for distributed computations over data streams. On top of this core, APIs make it easy to develop distributed data analysis programs. Libraries for graph processing or machine learning provide convenient abstractions for solving large-scale problems. Apache Flink integrates with a multitude of other open source systems like Hadoop, databases, or message queues. Its streaming capabilities make it a perfect fit for traditional batch processing as well as state of the art stream processing.
Continuous Application with Structured Streaming 2.0 | Anyscale
Introduction to Continuous Application with Apache Spark 2.0 Structured Streaming. This presentation is a culmination and curation from talks and meetups presented by Databricks engineers.
The notebooks on Structured Streaming demonstrate aspects of the Structured Streaming APIs.
Kafka summit london 2019 - the art of the event-streaming app | Neil Avery
Have you ever imagined what it would be like to build a massively scalable streaming application on Kafka, the challenges, the patterns and the thought process involved? How much of the application can be reused? What patterns will you discover? How does it all fit together? Depending upon your use case and business, this can mean many things. Starting out with a data pipeline is one thing, but evolving into a company-wide real-time application that is business critical and entirely dependent upon a streaming platform is a giant leap. Large-scale streaming applications are also called event streaming applications. They are classically different from other data systems; event streaming applications are viewed as a series of interconnected streams that are topologically defined using stream processors; they hold state that models your use case as events. Almost like a deconstructed real-time database.
In this talk, I step through the origins of event streaming systems, understanding how they are developed from raw events to evolve into something that can be adopted at an organizational scale. I start with event-first thinking, Domain Driven Design to build data models that work with the fundamentals of Streams, Kafka Streams, KSQL and Serverless (FaaS).
Building upon this, I explain how to build common business functionality by stepping through the patterns for scalable payment processing; instrumentation and monitoring ("run it on rails"); and control-flow patterns. Finally, all of these concepts are combined in a solution architecture that can be used at an enterprise scale. I will introduce enterprise patterns such as events-as-a-backbone, events as APIs, and methods for governance and self-service. You will leave this talk with an understanding of how to model events with event-first thinking, how to work towards reusable streaming patterns and, most importantly, how it all fits together at scale.
The Art of The Event Streaming Application: Streams, Stream Processors and Sc... | confluent
1) The document discusses the art of building event streaming applications using various techniques like bounded contexts, stream processors, and architectural pillars.
2) Key aspects include modeling the application as a collection of loosely coupled bounded contexts, handling state using Kafka Streams, and building reusable stream processing patterns for instrumentation.
3) Composition patterns involve choreographing and orchestrating interactions between bounded contexts to capture business workflows and functions as event-driven data flows.
The art of the event streaming application: streams, stream processors and sc... | confluent
The document discusses event streaming applications and microservices. It introduces event streaming as an architectural style where applications are composed of loosely coupled services that communicate asynchronously through streams of events. Key aspects covered include handling state using event streams and Kafka Streams, building applications as bounded contexts with choreography and orchestration, and establishing pillars for instrumentation, control and operations. Overall the document promotes event streaming as a paradigm that addresses complexity by providing simplicity and scalability through convergent data and logic processing.
Kafka summit SF 2019 - the art of the event-streaming app | Neil Avery
Have you ever imagined what it would be like to build a massively scalable streaming application on Kafka, the challenges, the patterns and the thought process involved? How much of the application can be reused? What patterns will you discover? How does it all fit together? Depending upon your use case and business, this can mean many things. Starting out with a data pipeline is one thing, but evolving into a company-wide real-time application that is business critical and entirely dependent upon a streaming platform is a giant leap. Large-scale streaming applications are also called event streaming applications. They are classically different from other data systems; event streaming applications are viewed as a series of interconnected streams that are topologically defined using stream processors; they hold state that models your use case as events. Almost like a deconstructed realtime database.
In this talk, I step through the origins of event streaming systems, understanding how they are developed from raw events to evolve into something that can be adopted at an organizational scale. I start with event-first thinking and Domain Driven Design to build data models that work with the fundamentals of Streams, Kafka Streams, KSQL and Serverless (FaaS). Building upon this, I explain how to build common business functionality by stepping through patterns for scalable payment processing; instrumentation and monitoring ("run it on rails"); and control-flow patterns (start, stop, pause). Finally, all of these concepts are combined in a solution architecture that can be used at enterprise scale. I will introduce enterprise patterns such as events-as-a-backbone, events as APIs and methods for governance and self-service. You will leave this talk with an understanding of how to model events with event-first thinking, how to work towards reusable streaming patterns and, most importantly, how it all fits together at scale.
Big Data-Driven Applications with Cassandra and Spark | Artem Chebotko
This document discusses using Cassandra and Spark for big data applications. It describes how Cassandra is well-suited for operational workloads with millisecond response times and linear scalability, while Spark can handle real-time, streaming and batch analytics up to 100x faster than Hadoop. The Spark-Cassandra connector allows Spark to efficiently read from and write into Cassandra by optimizing for predicate pushdown, data locality, joins and grouping. The document provides an architecture overview and examples of modeling data in Cassandra and interacting with it from Spark using the connector.
Similar to Apache Incubator Samza: Stream Processing at LinkedIn (20)
Key Takeaways:
Understanding the Customer Journey: Dr. Hill emphasized the importance of mapping and understanding the complete customer journey to identify touchpoints and opportunities for improvement.
Personalization Strategies: We discussed how to leverage data and insights to create personalized experiences that resonate with customers.
Technology Integration: Insights were shared on how inQuba’s advanced technology can streamline customer interactions and drive operational efficiency.
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsScyllaDB
ScyllaDB monitoring provides a lot of useful information. But sometimes it’s not easy to find the root of the problem if something is wrong or even estimate the remaining capacity by the load on the cluster. This talk shares our team's practical tips on: 1) How to find the root of the problem by metrics if ScyllaDB is slow 2) How to interpret the load and plan capacity for the future 3) Compaction strategies and how to choose the right one 4) Important metrics which aren’t available in the default monitoring setup.
Session 1 - Intro to Robotic Process Automation.pdfUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Keywords: AI, Containeres, Kubernetes, Cloud Native
Event Link: https://meine.doag.org/events/cloudland/2024/agenda/#agendaId.4211
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that enable us to use the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
107. Let’s be Friends!
• We are incubating, and you can help!
• Get up and running in 5 minutes
http://bit.ly/hello-samza
• Grab some newbie JIRAs
http://bit.ly/samza_newbie_issues
Editor's Notes
- Stream processing for us = anything asynchronous, but not batch computed.
- 25% of code is async. 50% is rpc/online. 25% is batch.
- Stream processing is worst supported.
- Compute top shares, pull in, scrape, entity tag
- Language detection
- Send emails: friend was in the news
- Requirement: has to be fast, since news is trendy
- relevance pipeline
- We send relatively data-rich emails
- Some emails are time sensitive (need to be sent soon)
- Time sensitive
- Data ingestion pattern
- Other systems that follow this pattern: real-time OLAP system, and social graph system
- Ecosystem at LinkedIn (some unique traits)
- Hard unsolved problems in this space
- Once we had all this data in Kafka, we wanted to do stuff with it.
- Persistent, reliable, distributed message queue
- Kafka = first among equals, but stream systems are pluggable. Just like Hadoop with HDFS vs. S3.
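The notes above describe Kafka as a persistent, partitioned message queue. A minimal sketch of that abstraction, assuming nothing about Kafka's actual API (`PartitionedLog`, `send`, and `read` are invented names for illustration):

```python
# Sketch of a persistent, partitioned, append-only log -- the abstraction the
# notes attribute to Kafka. Illustrative only, not Kafka's real API.
class PartitionedLog:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # The same key always hashes to the same partition, so per-key
        # ordering is preserved.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset=0):
        # Reading does not delete messages, so any consumer can replay
        # from any offset.
        return self.partitions[partition][offset:]
```

Because messages are retained after being read, many independent consumers (a stream processor, a real-time OLAP loader, a Hadoop ingest job) can each read the same log at their own pace.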
- Started with just a simple web service that consumes and produces Kafka messages.
- Realized that there are a lot of hard problems that needed to be solved.
- Reprocessing: what if my algorithm changes and I need to reprocess all events?
- Non-determinism: queries to external systems, time dependencies, ordering of messages.
- Open area of research
- Been around for 20 years
partitioned
- Re-playable, ordered, fault tolerant, infinite
- Very heavyweight definition of a stream (vs. S4, Storm, etc.)
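A toy illustration of what "re-playable, ordered" buys a consumer: after a crash it can resume from a checkpointed offset, or replay from the beginning, and see the same messages in the same order. All names here are illustrative, not Samza's API:

```python
# Sketch: a consumer over an ordered, retained stream partition. The stream
# itself is just a list; real systems checkpoint offsets durably.
log = ["a", "b", "c", "d", "e"]

def consume(log, start_offset, n):
    """Read up to n messages starting at start_offset.
    Returns (messages, next_offset) so the caller can checkpoint."""
    msgs = log[start_offset:start_offset + n]
    return msgs, start_offset + len(msgs)

first, checkpoint = consume(log, 0, 3)    # process some, checkpoint the offset
replayed, _ = consume(log, 0, 3)          # replay: same data, same order
resumed, _ = consume(log, checkpoint, 2)  # resume after a simulated crash
```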
At-least-once messaging. Duplicates are possible. Future: exactly-once semantics. Transparent to user. No ack'ing API.
Connected by stream name only. Fully buffered.
- group by, sum, count
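The group by / sum / count operations mentioned above become incremental aggregation over local state in a stream setting: each message updates per-key counters as it arrives. A minimal sketch with invented names:

```python
# Sketch of streaming "group by / sum / count": per-key state updated
# message by message. Illustrative, not Samza's API.
from collections import defaultdict

class GroupByAggregate:
    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def on_message(self, key, amount):
        # Update per-key aggregates incrementally as each message arrives.
        self.count[key] += 1
        self.total[key] += amount
```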
- stream to stream, stream to table, table to table
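Of the join flavors listed, stream-to-table is the easiest to sketch: a changelog stream keeps a local table current, and each event on the main stream is enriched against it. Illustrative names throughout, not Samza's API:

```python
# Sketch of a stream-to-table join: one input maintains local table state,
# the other is joined against it event by event.
class StreamTableJoin:
    def __init__(self):
        self.table = {}  # locally materialized table, e.g. user profiles

    def on_table_update(self, key, row):
        # Changelog stream: upsert into the local table.
        self.table[key] = row

    def on_event(self, key, event):
        # Main stream: enrich the event with the current table state.
        return (event, self.table.get(key))
```

Keeping the table local avoids a remote lookup per event, which is the usual reason stream-table joins are co-partitioned with the processor.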
- buffered sorting
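One way to read "buffered sorting" is: hold messages in a bounded buffer and emit them in timestamp order, tolerating mildly out-of-order arrival. A sketch under that assumption (class and method names are invented):

```python
# Sketch of buffered sorting: a bounded min-heap on timestamp. Once the
# buffer is full, the earliest buffered message is emitted. Illustrative only.
import heapq

class BufferedSorter:
    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.heap = []  # min-heap keyed on timestamp

    def push(self, ts, msg):
        heapq.heappush(self.heap, (ts, msg))
        if len(self.heap) > self.buffer_size:
            return heapq.heappop(self.heap)  # emit earliest buffered message
        return None  # still buffering

    def drain(self):
        # Flush remaining messages in timestamp order (e.g. at shutdown).
        out = []
        while self.heap:
            out.append(heapq.heappop(self.heap))
        return out
```

Messages more out of order than the buffer size can still be emitted late; picking the buffer size trades latency against reordering tolerance.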
UDP is an over-optimization, since most processors try to remote join, which is very slow.